article thumbnail

Handling Bursty Traffic in Real-Time Analytics Applications

Rockset

This is the third post in a series by Rockset's CTO Dhruba Borthakur on Designing the Next Generation of Data Systems for Real-Time Analytics. We'll be publishing more posts in the series in the near future, so subscribe to our blog so you don't miss them! There are many other occasions where data traffic balloons suddenly.

article thumbnail

Data News — Week 23.01

Christophe Blefari

The blog crossed the 2000 members mark (❤️) and I won the best data science newsletter award. I talked in Robin's podcast about the newsletter and my data engineering journey. And you know what, this is the first time in 7 years of teaching that more than 80% of the class wants to become data engineer.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Azure Databricks: A Comprehensive Guide

Analytics Vidhya

Introduction Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform that is built on top of the Microsoft Azure cloud. A collaborative and interactive workspace allows users to perform big data processing and machine learning tasks easily.

Big Data 310
article thumbnail

Demystifying Modern Data Platforms

Cloudera

Cloudera Contributor: Mark Ramsey, PhD ~ Globally Recognized Chief Data Officer. July brings summer vacations, holiday gatherings, and for the first time in two years, the return of the Massachusetts Institute of Technology (MIT) Chief Data Officer symposium as an in-person event. Luke: What is a modern data platform?

article thumbnail

Discover and Explore Data Faster with the CDP DDE Template

Cloudera

We wanted to do something about this search-engine deployment-related pain point and created a pre-configured service template to expedite the ditch-rich path of getting to a reliable Solr service, deployed for application developers to start using in just minutes. What is ‘Data Discovery and Exploration’ in CDP Data Hub?

article thumbnail

5 Apache Spark Best Practices

Data Science Blog: Data Engineering

Already familiar with the term big data, right? Despite the fact that we would all discuss Big Data, it takes a very long time before you confront it in your career. Apache Spark is a Big Data tool that aims to handle large datasets in a parallel and distributed manner. Begin with a small sample of the data.

Hadoop 52
article thumbnail

Why Mutability Is Essential for Real-Time Data Analytics

Rockset

This is the first post in a series by Rockset's CTO Dhruba Borthakur on Designing the Next Generation of Data Systems for Real-Time Analytics. We'll be publishing more posts in the series in the near future, so subscribe to our blog so you don't miss them! Immutable data is the opposite — it cannot be deleted or modified.