Moving Machine Learning Into The Data Pipeline at Cherre

Data Engineering Podcast

Summary: Most of the time, when you think about a data pipeline or ETL job, what comes to mind is a purely mechanistic progression of functions that move data from point A to point B. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

SiriDB: Scalable Open Source Timeseries Database with Jeroen van der Heijden - Episode 11

Data Engineering Podcast

Preamble: Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project, you’ll need somewhere to deploy it. Enterprise add-ons and professional support are available for added peace of mind. What was the inspiration for the name?

Language Models, Explained: How GPT and Other Models Work

AltexSoft

Models then use the patterns they learn from this training data to predict the next word in a sentence or generate new text that is grammatically correct and semantically coherent. In 2020, a remarkable AI took Silicon Valley by storm. GPT-3 has a spin-off called ChatGPT that is specifically fine-tuned for conversational tasks.

Kafka Streams’ Take on Watermarks and Triggers

Confluent

You get to focus on the logic of your data processing pipeline. Whether Streams emits every single update or groups updates is irrelevant to the semantics of a data processing application. The continuous-refinement-with-operational-parameters model is a very simple and powerful design for a data processing system.

The Good and the Bad of Apache Kafka Streaming Platform

AltexSoft

Similar to Google in web browsing and Photoshop in image processing, it became a gold standard in data streaming, preferred by 70 percent of Fortune 500 companies. Apache Kafka is an open-source, distributed streaming platform for messaging, storing, processing, and integrating large data volumes in real time. What is Kafka?
