Sat. Dec 22, 2018 - Fri. Dec 28, 2018

Continuously Query Your Time-Series Data Using PipelineDB with Derek Nelson and Usman Masood - Episode 62

Data Engineering Podcast

Summary: Processing high-velocity time-series data in real time is a complex challenge. The team at PipelineDB has built a continuous query engine that simplifies the task of computing aggregates across incoming streams of events. In this episode, Derek Nelson and Usman Masood explain how it is architected, strategies for designing your data flows, how to scale it up and out, and edge cases to be aware of.

Creating Multi-language NLP Pipelines with Apache Spark

Domino Data Lab: Data Engineering

In this guest post, Holden Karau, Apache Spark committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting spaCy in Java. She has already written a complementary blog post on using spaCy to process text data for Domino. Karau is a Developer Advocate at Google as well as a co-author of High Performance Spark and Learning Spark.
