Sat.Aug 28, 2021 - Fri.Sep 03, 2021


Understand & Deliver on Your Data Engineering Task

Start Data Engineering

Contents: 1. Introduction. 2. Understanding your data engineering task (2.1. Data infrastructure overview; 2.2. What exactly; 2.3. Why exactly; 2.4. Current state; 2.5. Downstream impact). 3. Delivering your data engineering task (3.1. How; 3.2. Breakdown into sub-tasks; 3.3. Delivering the finished task). 4. Conclusion. 5. Further reading. Introduction: Congratulations! You are given a quick overview of the business and data architecture and are assigned your very first data engineering task.


Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

Uber Engineering

Introduction. Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The team transforms Uber’s ideas into agile, global solutions by designing and implementing scalable solutions. One …

AWS 144

Announcing Elastic Data Streams Support for Confluent’s Elasticsearch Sink Connector

Confluent

Today, as part of our expanded partnership with Elastic, we are announcing an update to the fully managed Elasticsearch Sink Connector in Confluent Cloud. This update allows you to take […].

Cloud 120

When Data Redefines Companies

Cloudera

The more an enterprise wants to know about itself and its business prospects, the more data it needs to collect and analyze. Additionally, the more data it collects and stores, the better its ability to know customers, to find new ones, and to provide more of what they want to buy. Sounds simple, but a surprising majority of U.S. companies (about two-thirds, according to CIO.com) are only now getting tuned in to become fully functioning data-driven enterprises by starting new initiatives, scali

Hadoop 96

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.


Towards a Reliable Device Management Platform

Netflix Tech

By Benson Ma, Alok Ahuja. Introduction: At Netflix, hundreds of different device types, from streaming sticks to smart TVs, are tested every day through automation to ensure that new software releases continue to deliver the quality of the Netflix experience that our customers enjoy. In addition, Netflix continuously works with its partners (such as Roku, Samsung, LG, Amazon) to port the Netflix SDK to their new and upcoming devices (TVs, smart boxes, etc), to ensure the quality bar is reached be


Chugai Pharmaceutical

Teradata

Accelerating drug discovery and development with Teradata Vantage on AWS.

More Trending


Optimizing Cloudera Data Engineering Autoscaling Performance

Cloudera

The shift to cloud has been accelerating, and with it, a push to modernize data pipelines that fuel key applications. That is why cloud-native solutions that take advantage of capabilities such as disaggregated storage and compute, elasticity, and containerization are more important than ever. At Cloudera, we introduced Cloudera Data Engineering (CDE) as part of our Enterprise Data Cloud product, Cloudera Data Platform (CDP), to meet these challenges.


Grouparoo v0.6 release

Grouparoo

The newest release of Grouparoo has a few updates that make working with data easier. Staying in sync with your data warehouse: if rows are deleted in your data warehouse, then Grouparoo profiles get deleted. Combine or use logic to make profile properties: use code to re-mix your data and get the perfect formats. New destinations: Mixpanel, Mailjet. Profile deletion: Data systems are often quite good at ingesting new data, but things get complicated when it gets deleted.


Data Quality + Data Lineage = ???

Datakin

Written by Peter Hicks on Sep 2, 2021. In a prior life, I dwelled in the day-to-day cycles of an e-commerce platform. I worked with a quite generalized system with orders, products, variants, SKUs, and customers that pined for every discount they could come by. The system built around the core business schema was the kind of chaos that data engineers are all too familiar with; large volumes of clickstream data, etl_warehouses, read replicas, and machine learning

Bytes 52

Terraform Databricks Labs

Advancing Analytics: Data Engineering

In late 2020, Databricks introduced Databricks Labs, a collection of Terraform providers that gives you the ability to deploy nearly all Databricks resources onto the Azure and Amazon Web Services (AWS) cloud platforms. This means you can deploy Databricks workspaces, clusters, secrets, libraries, notebooks, and automated jobs (and many more) at the time of provisioning the infrastructure, making it easy to manage and configure Databricks.


Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.


Accenture’s Smart Data Transition Toolkit Now Available for Cloudera Data Platform

Cloudera

Cloudera and Accenture demonstrate strength in their relationship with an accelerator called the Smart Data Transition Toolkit for migration of legacy data warehouses into Cloudera Data Platform. Accenture’s Smart Data Transition Toolkit: Data warehousing is the backbone of every data-driven organization, providing mission-critical analytics. Today, modern data warehousing has evolved to meet the intensive demands of the newest analytics required for a business to be data driven.


What is a Data Incident Commander?

Monte Carlo

Incident management isn’t just for software engineers. With the rise of data platforms and the data-as-a-product mentality, building more reliable processes and workflows to handle data quality has emerged as a top concern for data engineers. In a previous post, we discussed how to set up automatic detection and alerting for bad data; now, guest author Glen Willis shares how the best data teams handle triaging and severity assessment for your broken data pipelines with the help of an emerging r


How Rockset Enables SQL-Based Rollups for Streaming Data

Rockset

Until Now: The Slow Crawl from Batch to Real-Time Analytics. The world is moving from batch to real-time analytics, but it's been at a crawl. Apache Kafka has made acquiring real-time data more mainstream, but only a small sliver are turning batch analytics, run nightly, into real-time analytical dashboards with alerts and automatic anomaly detection.

SQL 52

Acquiring is Dead. Long Live Acquiring.

Teradata

Data-driven services can help merchant acquirers add value to their core capabilities. However, to succeed, they need to be armed with the necessary data governance capabilities & know-how.


Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.


The Data Janitor Letters - July 2021

Pipeline Data Engineering

Data engineering salon. News and interesting reads about the world of data. "Building a data team at a mid-stage startup: a short story" (Erik Bernhardsson, Working on something, "Bernco"): The data culture is driven both from above (the CEO pushing for it) as well as from below (people in the trenches). It's OK to fail if at least you learned something from it.

SQL 52

A day in the life of a Technical Fellow

Eventbrite Engineering

In my two most recent blog posts, I talked about how to write a Long-Term Technical Vision and a Golden Path. These are future-looking and high-level artifacts so the question I keep hearing is: do I need to give up coding to grow in my career and become a Technical Fellow? In this post I will …

Coding 40

Using Internal Mobility For Growth

Zalando Engineering

Long time readers of this blog will remember that back in 2019, we published a feature on the benefits of rotating engineers between teams. For those of you who have not seen it, the article described an initiative that aimed to establish cross-functional knowledge sharing, encourage cross team collaboration, and bring greater product awareness, by providing engineers with an opportunity to work on different teams within our Developer Productivity department.


Build Your CFO Analytics Foundation

Teradata

A core finance foundation, supported by the right data management tools, creates a trusted, auditable, and traceable source of all things financial. Read more.


Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.


Replacing Segment Computed & SQL Traits With dbt & RudderStack Reverse ETL

RudderStack

Learn to use dbt & RudderStack Reverse ETL to leverage the power of your data warehouse to sync enriched users, audiences, and other data to downstream tools.

SQL 40

50 ML Projects To Strengthen Your Portfolio and Get You Hired

ProjectPro

The most trusted way to learn and master the art of machine learning is to practice with hands-on projects. Projects help you build a strong foundation in various machine learning algorithms and strengthen your resume. As the saying goes, a journey of a thousand miles begins with a single step, so we present a guide to 50 first steps on your machine learning journey.


Practical API Design at Netflix, Part 1: Using Protobuf FieldMask

Netflix Tech

By Alex Borysov, Ricky Gardiner. Background: At Netflix, we heavily use gRPC for the purpose of backend to backend communication. When we process a request it is often beneficial to know which fields the caller is interested in and which ones they ignore. Some response fields can be expensive to compute, some fields can require remote calls to other services.
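
The idea in this teaser can be illustrated with a minimal sketch in plain Python. It is not Netflix's implementation and does not use the Protobuf `FieldMask` type: the resolver functions, field names, and the "empty mask means all fields" rule are all assumptions made for illustration. The point is only that a list of requested field paths lets a server skip computing fields nobody asked for.

```python
from typing import Callable, Dict, Iterable, List

# Records which resolvers actually ran, so the skipped work is visible.
calls: List[str] = []


def cheap_title() -> str:
    """Hypothetical inexpensive field."""
    calls.append("title")
    return "Example"


def expensive_schedule() -> List[str]:
    """Hypothetical field that would need a remote call in a real service."""
    calls.append("schedule")
    return ["2021-09-01"]


# Hypothetical mapping from response field name to the function that computes it.
RESOLVERS: Dict[str, Callable[[], object]] = {
    "title": cheap_title,
    "schedule": expensive_schedule,
}


def build_response(
    resolvers: Dict[str, Callable[[], object]],
    mask_paths: Iterable[str],
) -> Dict[str, object]:
    """Compute only the fields named in mask_paths.

    Assumption: an empty mask is treated as "return every field".
    """
    requested = set(mask_paths) or set(resolvers)
    return {
        name: resolve()
        for name, resolve in resolvers.items()
        if name in requested
    }


if __name__ == "__main__":
    # Only "title" is requested, so expensive_schedule never runs.
    print(build_response(RESOLVERS, ["title"]))
    print(calls)
```

A caller that only needs the title passes `["title"]`, and the expensive resolver is never invoked.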


Designing And Building Data Platforms As A Product

Data Engineering Podcast

Summary: The term "data platform" gets thrown around a lot, but have you stopped to think about what it actually means for you and your organization? In this episode Lior Gavish, Lior Solomon, and Atul Gupte share their view of what it means to have a data platform, discuss their experiences building them at various companies, and provide advice on how to treat them like a software product.

Designing 100

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.


Send Form Data From Marketo to Multiple Destinations Using RudderStack

RudderStack

See how you can leverage RudderStack to easily track Marketo form submissions without disrupting Marketo or your marketing team.

Data 40

Learner Spotlight: Gino Parages

Dataquest

Meet Gino Parages, a former sales and IT business analyst with no coding skills who decided it was time to learn coding to give his career a boost. He chose Dataquest to help him achieve his learning goals and land the job he wanted. Here’s his story… Q: First, what are your preferred pronouns? A: He/him Q: All right, Gino! What’s your current job title?


Why Your Data Warehouse Should Be the Foundation of Your CDP

RudderStack

This post explores how RudderStack’s warehouse-first approach separates it from the traditional marketing CDP.


Using RudderStack To Power Your Machine Learning Models

RudderStack

This post explores three interesting ways you can use RudderStack to unlock the power of machine learning.


From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.


Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

Data Engineering Podcast

Summary: The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. In recent months the community has focused their efforts on making it the fastest possible option for running your analytics in the cloud. In this episode Dipti Borkar discusses the work that she and her team are doing at Ahana to simplify the work of running your own PrestoDB environment in the cloud.

Data Lake 100