An Exploration Of The Expectations, Ecosystem, and Realities Of Real-Time Data Applications

Data Engineering Podcast

Select Star’s data discovery platform solves that out of the box, with an automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Go to dataengineeringpodcast.com/ascend and sign up for a free trial.

Apache Spark Use Cases & Applications

Knowledge Hut

As per Apache, “Apache Spark is a unified analytics engine for large-scale data processing.” Spark is a cluster computing framework, somewhat similar to MapReduce but with many more capabilities and features and greater speed, and it provides developer APIs in several languages, including Scala, Python, Java and R.
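To make the MapReduce comparison concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases for a word count. It is plain Python written for illustration, not Spark code; the function names are hypothetical, and a real cluster framework would partition each phase across machines.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every lower-cased word."""
    return [(word, 1) for line in lines for word in line.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values that share a key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: combine each key's values by summing them."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases, as a MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["spark is fast", "Spark is a unified engine"]))
```

In PySpark the same job is usually expressed as a chain of RDD transformations such as `flatMap`, `map` and `reduceByKey`, with Spark handling the shuffle and keeping intermediate data in memory where it can, which is a large part of its speed advantage over classic MapReduce.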

Exploring Processing Patterns For Streaming Data Integration In Your Data Lake

Data Engineering Podcast

Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. The batch model for processing is intuitive despite its latency problems.

Maintaining Your Data Lake At Scale With Spark

Data Engineering Podcast

This conversation was useful for getting a better idea of the challenges that exist in large scale data analytics, and the current state of the tradeoffs between data lakes and data warehouses in the cloud. What are some of the common antipatterns in data lake implementations and how does Delta Lake address them?

An Overview of Real Time Data Warehousing on Cloudera

Cloudera

Users today are asking ever more from their data warehouse. As an example of this, in this post we look at Real Time Data Warehousing (RTDW), a category of use cases that customers are building on Cloudera and that is becoming increasingly common among our customers. What is Real Time Data Warehousing?

Building A Data Lake For The Database Administrator At Upsolver

Data Engineering Podcast

To bring the DBA into the new era of data management, the team at Upsolver added a SQL interface to their data lake platform. How do those challenges influence the adoption or viability of a data lake? What are the advantages of a data lake over a data warehouse if everything is being managed via SQL anyway?

12 Big Data Project Topics with Source Code 2023

Knowledge Hut

There are many uses and benefits for real-time traffic simulation and prediction projects using big data. This project is a Lambda Architecture program that tracks traffic conditions, including congestion and safety, on Chicago's streets. If you are familiar with SQL, you should have no trouble completing this project.
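As a rough illustration of what a Lambda Architecture does, the sketch below keeps a batch view recomputed from the full event history up to a cutoff, a speed layer that incrementally counts events newer than that cutoff, and a serving query that adds the two. It is not the project's code: the event schema `(segment_id, timestamp)`, the street-segment IDs and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event schema: (street_segment_id, unix_timestamp)
Event = Tuple[str, int]


def batch_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute per-segment counts from scratch up to `cutoff`."""
    return Counter(segment for segment, ts in events if ts <= cutoff)


class SpeedLayer:
    """Speed layer: incrementally count only events newer than the last batch run."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.counts: Counter = Counter()

    def ingest(self, event: Event) -> None:
        segment, ts = event
        if ts > self.cutoff:  # older events are already covered by the batch view
            self.counts[segment] += 1


def serving_query(batch: Counter, speed: Counter, segment: str) -> int:
    """Serving layer: merge the precomputed batch view with the real-time delta."""
    return batch[segment] + speed[segment]


if __name__ == "__main__":
    history = [("seg-1", 10), ("seg-2", 50), ("seg-1", 90)]
    live = [("seg-1", 120), ("seg-3", 130)]
    batch = batch_view(history, cutoff=100)
    speed = SpeedLayer(cutoff=100)
    for event in live:
        speed.ingest(event)
    print(serving_query(batch, speed.counts, "seg-1"))
```

The split trades duplicated logic for latency: the batch layer stays simple and can be rerun to correct mistakes, while the speed layer only has to cover the gap since the last batch run and can be discarded once a new batch view supersedes it.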