Remove projects big-data-projects apache-kafka-projects
article thumbnail

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

When that system is responsible for the data layer the process becomes more challenging. Sriram Panyam has been involved in several projects that required migration of large volumes of data in high traffic environments. Can you start by sharing some of your experiences with data migration projects?

Systems 130
article thumbnail

Brief History of Data Engineering

Jesse Anderson

Doug Cutting took those papers and created Apache Hadoop in 2005. They were the first companies to commercialize open source big data technologies and pushed the marketing and commercialization of Hadoop. Hadoop was hard to program, and Apache Hive came along in 2010 to add SQL. We lacked a scalable pub/sub system.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Scala In Demand Technologies Built On Scala

Knowledge Hut

Developers are now much more interested in having Scala training to excel in the big data field. Play Framework, Akka, Apache Spark, etc are some of the tools and projects created using Scala. Apache Spark Apache Spark can be considered as the replacement of MapReduce.

Scala 52
article thumbnail

Fundamentals of Apache Spark

Knowledge Hut

Introduction Before getting into the fundamentals of Apache Spark, let’s understand What really is ‘Apache Spark’ is? Apache Spark is a fast and general-purpose, cluster computing system. One would find multiple definitions when you search the term Apache Spark. General Purpose: Apache spark is a unified framework.

Scala 98
article thumbnail

Top 12 Data Engineering Project Ideas [With Source Code]

Knowledge Hut

Welcome to the world of data engineering, where the power of big data unfolds. If you're aspiring to be a data engineer and seeking to showcase your skills or gain hands-on experience, you've landed in the right spot. What are Data Engineering Projects?

article thumbnail

Streaming Data Pipelines: What Are They and How to Build One

Precisely

The concept of streaming data was born of necessity. But insights derived from day-old data don’t cut it. Business success is based on how we use continuously changing data. That’s where streaming data pipelines come into play. What is a streaming data pipeline? How do streaming data pipelines work?

article thumbnail

Druid Deprecation and ClickHouse Adoption at Lyft

Lyft Engineering

Introduction At Lyft, we have used systems like Apache ClickHouse and Apache Druid for near real-time and sub-second analytics. Sub-second query systems allow for near real-time data explorations and low latency, high throughput queries, which are particularly well-suited for handling time-series data.

Kafka 104