Aggregated Data, Blog, Data Ingestion and Engineering

Druid Deprecation and ClickHouse Adoption at Lyft

Lyft Engineering

NOVEMBER 29, 2023

In this blog post, we explain how Druid has been used at Lyft and what led us to adopt ClickHouse for our sub-second analytic system. Druid at Lyft: Apache Druid is an in-memory, columnar, distributed, open-source data store designed for sub-second queries on real-time and historical data.

Kafka, Data Ingestion, Datasets, Architecture

Using other CDP services with Cloudera Operational Database

Cloudera

FEBRUARY 16, 2021

In the previous blog post, we looked at some of the application development concepts for the Cloudera Operational Database (COD). In this blog post, we’ll see how you can use other CDP services with COD. Integrated across the Enterprise Data Lifecycle. Cloudera Data Engineering to ingest bulk data and data from mainframes.

Database, Machine Learning, Data Lake, Kafka

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

Trending Sources

Machine Learning with Python, Jupyter, KSQL and TensorFlow

Confluent

FEBRUARY 6, 2019

The blog posts "How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka" and "Using Apache Kafka to Drive Cutting-Edge Machine Learning" describe the benefits of leveraging the Apache Kafka® ecosystem as a central, scalable and mission-critical nervous system. For now, we’ll focus on Kafka.

Machine Learning, Python, Kafka, Java

Tips to Build a Robust Data Lake Infrastructure

DareData

JULY 5, 2023

In this blog post, we aim to share practical insights and techniques based on our real-world experience in developing data lake infrastructures for our clients - let's start! The Data Lake acts as the central repository for aggregating data from diverse sources in its raw format.

Data Lake, Building, Raw Data, ETL Tools

How Rockset Enables SQL-Based Rollups for Streaming Data

Rockset

AUGUST 30, 2021

The latest Rockset release, SQL-based rollups, has made real-time analytics on streaming data a lot more affordable and accessible. Anyone who knows SQL, the lingua franca of analytics, can now roll up, transform, enrich and aggregate real-time data at massive scale. You can also optionally use WHERE clauses to filter out data.

SQL, Kafka, MongoDB, MySQL

Build Internal Apps in Minutes with Retool and Rockset: A Customer 360 Example

Rockset

DECEMBER 17, 2020

Together, they empower developers to build performant internal tools, such as customer 360 and logistics monitoring apps, by solely using data APIs and pre-built UI components. In this blog, we’ll be building a customer 360 app using Rockset and Retool. From there, we’ll create a data API for the SQL query we write in Rockset.

Building, Aggregated Data, SQL, Data Ingestion

Data Pipeline- Definition, Architecture, Examples, and Use Cases

ProjectPro

DECEMBER 7, 2021

Data pipelines are a significant part of the big data domain, and every professional working or willing to work in this field must have extensive knowledge of them. This process enables quick data analysis and consistent data quality, crucial for generating quality insights through data analytics or building machine learning models.

Data Pipeline, Architecture, Kafka, AWS

20+ Data Engineering Projects for Beginners with Source Code

ProjectPro

AUGUST 24, 2021

Nevertheless, that is not the only job in the data world. Data professionals who work with raw data, like data engineers, data analysts, machine learning scientists, and machine learning engineers, also play a crucial role in any data science project.

Data Engineering, Data Engineer, Coding, Project

Case Study: How Rockset's Real-Time Analytics Platform Propels the Growth of Our NFT Marketplace

Rockset

OCTOBER 26, 2022

One was to create another data pipeline that would aggregate data as it was ingested into DynamoDB. After finding Rockset through an AWS blog on creating leaderboards, we wasted no time in starting to build a new customer-facing leaderboard based on Rockset. Both would have required a lot of work.

SQL, NoSQL, Database, Aggregated Data

5 Steps for Migrating from Elasticsearch to Rockset for Real-Time Analytics

Rockset

NOVEMBER 1, 2022

This blog outlines best practices from customers I have helped migrate from Elasticsearch to Rockset, reducing risk and avoiding common pitfalls. So popular still today that Rockset engineers use it for our own internal log search functions. In this blog, we distilled their migration journeys into 5 steps.

Database-centric, Pipeline-centric, SQL, Aggregated Data

AWS Glue-Unleashing the Power of Serverless ETL Effortlessly

ProjectPro

FEBRUARY 8, 2023

Do ETL and data integration activities seem complex to you? Read this blog to understand everything about AWS Glue that makes it one of the most popular data integration solutions in the industry. Did you know the global big data market will likely reach $268.4 Businesses are leveraging big data now more than ever.

AWS, Scala, Metadata, Data Lake

A Beginner’s Guide to Learning PySpark for Big Data Processing

ProjectPro

JANUARY 25, 2022

Here’s What You Need to Know About PySpark: This blog will take you through the basics of PySpark, the PySpark architecture, and a few popular PySpark libraries, among other things. Finally, you'll find a list of PySpark projects to help you gain hands-on experience and land an ideal job in Data Science or Big Data.

Big Data, Data Process, Process, Kafka

20 Best Open Source Big Data Projects to Contribute on GitHub

ProjectPro

NOVEMBER 15, 2021

There are thousands of open-source projects in action today. This blog will walk through the most popular and fascinating open source big data projects.

Big Data, Project, Metadata, Programming Language

Handling Out-of-Order Data in Real-Time Analytics Applications

Rockset

APRIL 15, 2022

This is the second post in a series by Rockset's CTO Dhruba Borthakur on Designing the Next Generation of Data Systems for Real-Time Analytics. We'll be publishing more posts in the series in the near future, so subscribe to our blog so you don't miss them! He was also a contributor to the open source Apache HBase project.

Analytics Application, Data Warehouse, Raw Data, Kafka

Data Engineering Digest
