Sun.Nov 27, 2022

article thumbnail

Data Engineering Weekly #109

Data Engineering Weekly

Data Engineering Weekly Is Brought to You by RudderStack RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Sign up free to test out the tool today. Data Contracts for SaaS Developers with Benn Stancil The Thanksgiving break gives me enough time to catch up on a few podcasts.

article thumbnail

Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase

Data Engineering Podcast

Summary The most expensive part of working with massive data sets is the work of retrieving and processing the files that contain the raw information. FeatureBase (formerly Pilosa) avoids that overhead by converting the data into bitmaps. In this episode Matt Jaffee explains how to model your data as bitmaps and the benefits that this representation provides for fast aggregate computation.

Data Lake 100
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Finagle Tutorial: Twitter’s RPC Library for Scala

Rock the JVM

This article is for Scala developers of all levels - you don’t need any fancy Scala knowledge to make the best out of this piece. For those who don’t know yet, almost the entire Twitter backend runs on Scala, and the Finagle library is at the core of almost all Twitter services. They also built an entire ecosystem of libraries, tools and frameworks on top of Finagle, which have been successfully used in production at other big companies (e.g.

Scala 40
article thumbnail

Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data

Data Engineering Podcast

Summary The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This has led to inefficiencies in how data is stored, accessed, and shared across process and system boundaries. The Arrow project is designed to eliminate wasted effort in translating between languages, and Voltron Data was created to help grow and support its technology and community.

article thumbnail

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enahance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.