Remove Blog Remove Bytes Remove Coding Remove Metadata
article thumbnail

Open-Sourcing AvroTensorDataset: A Performant TensorFlow Dataset For Processing Avro Data

LinkedIn Engineering

In this blog post, we will discuss the AvroTensorDataset API, techniques we used to improve data processing speeds by up to 162x over existing solutions (thereby decreasing overall training time by up to 66%), and performance results from benchmarks and production. an array within a map, within a union, etc…). Default is 128 * 1024 (128KB).

Datasets 102
article thumbnail

Improving Efficiency Of Goku Time Series Database at Pinterest (Part?—?1)

Pinterest Engineering

In the first blog, we will share a short summary on the GokuS and GokuL architecture, data format for Goku Long Term, and how we improved the bootstrap time for our storage and serving components. More information about the architecture can be found in the GokuL blog and the cost reduction blog.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Launching the Engineering Blog

Zalando Engineering

Our Engineering Blog was launched in June 2020 after a long break of the previous tech blog. What customizations we applied to design the blog and the publishing process. Static Site Generator Our previous tech blog used a CMS which only a limited number of people had access to. So which static site generator to choose?

article thumbnail

Data Vault Architecture, Data Quality Challenges, And How To Solve Them

Monte Carlo

In this blog post we’ll dive into data vault architecture; challenges and best practices for maintaining data quality; and how data observability can help. architecture (with some minor deviations) to achieve their data integration objectives around scalability and use of metadata. “A What is a Data Vault model?

article thumbnail

Processing medical images at scale on the cloud

Tweag

In this blog post, I will explain the underlying technical challenges and share the solution that we helped implement at kaiko.ai , a MedTech startup in Amsterdam that is building a Data Platform to support AI research in hospitals. A solution is to read the bytes that we need when we need them directly from Blob Storage. width , spec.

Medical 60
article thumbnail

Deploying Kafka Streams and KSQL with Gradle – Part 2: Managing KSQL Implementations

Confluent

We’ll demonstrate using Gradle to execute and test our KSQL streaming code, as well as building and deploying our KSQL applications in a continuous fashion. The first requirement to tackle: how to express dependencies between KSQL queries that exist in script files in a source code repository. Sample repository. gradlew composeUp.

Kafka 95
article thumbnail

ZIO Streams: A Long-Form Introduction

Rock the JVM

For a more concrete example, we are going to write a program that will parse markdown files, extract words identified as tags, and then regenerate those files with tag-related metadata injected back into them. code, which was officially released on June 24th, 2022. Set up We’re going to base this discussion off of the latest ZIO 2.0

Scala 40