February, 2021

Build your data pipelines like the Toyota Way

François Nguyen

If there is only one book to read about lean manufacturing, this is the one. It is the kind of book you can read again and again and still learn something about your current context. It is also a book you can read whatever your industry; you will always find situations it covers. Today, we are going to apply these principles to data pipelines. “The right process will deliver the right results” (Toyota Way, section II). In the 14 Toyota Way principles, you have

How to set up a dbt data-ops workflow, using dbt cloud and Snowflake

Start Data Engineering

Contents: Introduction; Pre-requisites; Setting up the data-ops pipeline; Snowflake; Local development environment; dbt cloud; Connect to Snowflake; Link to github repository; Setup deployment(release/prod) environment; Setup CI; PR -> CI -> merge cycle; Schedule jobs; Host data documentation; Conclusion and next steps; Further reading; References. Introduction: With companies realizing the importance of having correct data, there has been a lot of attention on the data-ops side of things.

Why Data Capabilities Follow Up a Digital Transformation

Team Data Science

Companies can now make data useful to elevate decision making and to optimise products and processes. But what organizational capabilities are necessary and how to get started? It's currently easy to acquire data strategically. First, consider that smartphones function like questionnaires that customers are frequently filling out in a passive or active manner [1].

Node.js ❤️ Apache Kafka – Getting Started with KafkaJS

Confluent

One of the great things about using an Apache Kafka® based architecture is that it naturally decouples systems and allows you to use the best tool for the job. While […].

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Self Service Open Source Data Integration With AirByte

Data Engineering Podcast

Summary: Data integration is a critical piece of every data pipeline, yet it is still far from being a solved problem. There are a number of managed platforms available, but the list of options for an open source system that supports a large variety of sources and destinations is still embarrassingly short. The team at Airbyte is adding a new entry to that list with the goal of making robust and easy to use data integration more accessible to teams who want or need to maintain full control of thei

Is Your Data Holding You Back Instead of Driving You Forward?

Teradata

Everyone knows that data is vital for success in retail. But without a clear data strategy, retailers often eat up resources fighting small-scale battles, whilst gradually losing the war.

More Trending

Apache Superset Tutorial

Start Data Engineering

Contents: Why data exploration; Apache Superset architecture; Setup (Prerequisites, Seed data); Using Apache Superset (1. Connecting to a data warehouse, 2. Querying data in SQL Lab, 3. Creating a chart, 4. Creating a dashboard); Pros and Cons; Conclusion. Why data exploration: In most companies the end users of a data warehouse are analysts, data scientists and business people.

#ClouderaLife Spotlight: Kevin Smith, Staff Customer Operations Engineer

Cloudera

Meet Kevin Smith, a Staff Customer Operations Engineer within the US Public Sector support team. He sums up his day-to-day by saying he works directly with clients on technical cases and provides support and guidance as they troubleshoot unexpected behavior. He also serves as a member of several project teams focusing on upgrade experiences, internal tools, product testing, training, and documentation.

Lessons Learned from Running Apache Kafka at Scale at Pinterest

Confluent

Apache Kafka® is at the heart of the data transportation layer at Pinterest. The amount of data that runs through Kafka has constantly grown over the years. This growth sometimes […].

Building The Foundations For Data Driven Businesses at 5xData

Data Engineering Podcast

Summary: Every business aims to be data driven, but not all of them succeed in that effort. In order to be able to truly derive insights from the data that an organization collects, there are certain foundational capabilities that they need to have capacity for. In order to help more businesses build those foundations, Tarush Aggarwal created 5xData, offering collaborative workshops to assist in setting up the technical and organizational systems that are necessary to succeed.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Open Sourcing the Netflix Domain Graph Service Framework: GraphQL for Spring Boot

Netflix Tech

By Paul Bakker and Kavitha Srinivasan, Images by David Simmer, Edited by Greg Burrell. Netflix has developed a Domain Graph Service (DGS) framework and it is now open source. The DGS framework simplifies the implementation of GraphQL, both for standalone and federated GraphQL services. Our framework is battle-hardened by our use at scale. By open-sourcing the project, we hope to contribute to the Java and GraphQL communities and learn from and collaborate with everyone who will be using the fra

Is Devops the future of Agile ?

François Nguyen

Let’s start with maybe the best definition you can find of DevOps (credit to AWS): “DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes.

How to Join a fact and a type 2 dimension (SCD2) table

Start Data Engineering

Contents: Introduction; What is an SCD2 table and why use it?; Application table; Dimension table; Setup; Joining fact and SCD2 tables; high_spenders; user_items; Educating end users; Conclusion; Further reading. Introduction: If you are using a data warehouse, you have probably heard of fact and dimension tables. Simply put, fact tables record business events, and dimension tables record the attributes of business items (e.g., the user and item tables in an e-commerce app).

Data – the Octane Accelerating Intelligent Connected Vehicles

Cloudera

The digital revolution is making a deep impact on the automotive industry, offering practically unlimited possibilities for more efficient, convenient, and safe driving and travel experiences in connected vehicles. This revolution is just beginning to accelerate – in fact, according to a recent Allied Market Research study, the global connected car market was valued at $63.03 billion in 2019, and is projected to reach $225.16 billion by 2027, registering a CAGR of 17.1% from 2020 to 2027.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies that are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground up, graduating from just getting started with data-driven development to operating

42 Things You Can Stop Doing Once ZooKeeper Is Gone from Apache Kafka

Confluent

Soon, Apache Kafka® will no longer need ZooKeeper! With KIP-500, Kafka will include its own built-in consensus layer, removing the ZooKeeper dependency altogether. The next big milestone in this effort […].

How Shopify Is Building Their Production Data Warehouse Using DBT

Data Engineering Podcast

Summary: With all of the tools and services available for building a data platform it can be difficult to separate the signal from the noise. One of the best ways to get a true understanding of how a technology works in practice is to hear from people who are running it in production. In this episode Zeeshan Qureshi and Michelle Ark share their experiences using DBT to manage the data warehouse for Shopify.

Pitching a DataOps Project That Matters

DataKitchen

Every DataOps initiative starts with a pilot project. How do you choose a project that matters to people? DataOps addresses a broad set of use cases because it applies workflow process automation to the end-to-end data-analytics lifecycle. DataOps reduces errors, shortens cycle time, eliminates unplanned work, increases innovation, improves teamwork, and more.

The agile manifesto : 20 years later

François Nguyen

Or Robert C. Martin, the uncle you should visit more often. Where was I 20 years ago, when these 17 brilliant folks gathered at a ski resort to write the Agile Manifesto? I was part of a small team of great individuals; in fact, we were an alternative to an IT department unable to deliver what we wanted, so we decided to do it ourselves. Without knowing it, we were fully in that agile mindset: valuing interactions, working software, our collaboration with the users and being able to change be

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Packaging award-winning shows with award-winning technology

Netflix Tech

By Cyril Concolato. Introduction: In previous blog posts, our colleagues at Netflix have explained how 4K video streams are optimized, how even legacy video streams are improved and more recently how new audio codecs can provide better aural experiences to our members. In all these cases, prior to being delivered through our content delivery network Open Connect, our award-winning TV shows, movies and documentaries like The Crown need to be packaged to enable crucial features for our members.

Express Cloudera POV on 2021 data trends in insurance

Cloudera

Almost a year into the pandemic, the accelerated digital transformation has begun to feel less abrupt and more sustained. 2021 looks likely to be defined by a new phase: Thriving on digital transformation, rather than just surviving through it. We’ve written about the changes forced on the traditionally risk-averse insurance industry by COVID-19. In 2021, with the crisis hopefully fading, insurance will have time to evaluate the changes made in 2020, assessing what worked and what didn’t

Introducing Confluent Platform 6.1

Confluent

We are pleased to announce the release of Confluent Platform 6.1. With this release, we are further simplifying management tasks for Apache Kafka® operators and providing even higher availability for […].

From Product Cycle to Digital Thread

Teradata

In order to survive, the auto industry needs to leverage ‘digital threads’ that connect data from customers to dealers to products, & link R&D to production line & the aftermarket.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Data-driven performance improvements: Football and retail execution

Retail Insight

When I left school to start a professional football career, I understood very little about data – I did keep a note of the goals I scored, the assists I made and, most likely, the keepie-ups I could perform, but that was about it.

Declarative Data Sync

Grouparoo

Developers have been using the Grouparoo UI to set up automated data movement from their databases to Mailchimp, Marketo, Salesforce, and more. While having these integrations already written for them saved plenty of time, there was something they missed: their normal developer workflow. Grouparoo now supports declarative data models and integrations to continuously sync your data to all of your cloud-based tools.

Intro to databases on Azure: Basics for aspiring data engineers

A Cloud Guru: Data Engineering

How do you get started with an Azure database? As a database novice or someone new to Microsoft Azure, there are so many options it can be hard to know where to begin. Which is right for you as you get started on the path to becoming a data engineer? Let’s turn the question around […]

Cloudera Operational Database application development concepts

Cloudera

Cloudera Operational Database is now available in three different form factors in Cloudera Data Platform (CDP). If you are new to Cloudera Operational Database, see this blog post, and check out the documentation here. In this blog post, we’ll look at both Apache HBase and Apache Phoenix concepts relevant to developing applications for Cloudera Operational Database.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Oracle CDC Source Premium Connector is Now Generally Available

Confluent

One of the most common relational database systems that connects to Apache Kafka® is Oracle, which often holds highly critical enterprise transaction workloads. While Oracle Database (DB) excels at many […].

How I Built an Algorithm to Help Doctors Fight COVID-19

Teradata

Read how a principal data scientist at Teradata leveraged his cross-industry expertise to build an algorithm to help doctors better understand & fight COVID-19.

Data Observability: How Blinkist Prevents Broken Data Pipelines at Scale with Monte Carlo

Monte Carlo

Companies spend upwards of $15 million annually tackling data downtime, in other words, periods of time where data is missing, broken, or otherwise erroneous, and over 88 percent of U.S. businesses have lost money as a result of data quality issues. Fortunately, there’s hope in the next frontier of data engineering: observability. Here’s how the data engineering team at Blinkist, a book-summarizing subscription service, increases cost savings, collaboration, and productivity with data observ

Rockset Is Up to 9.4x Faster than Apache Druid on the Star Schema Benchmark

Rockset

Rockset released new numbers for the Star Schema Benchmark in April 2022. Learn how Rockset is 1.67 times faster than ClickHouse and 1.12 times faster than Druid in the latest performance blog post. Real-time analytics is all about deriving insights and taking actions as soon as data is produced. When broken down into its core requirements, real-time analytics means two things: access to fresh data and fast responses to queries.

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs as well as identifying and de-risking the different kinds of hypotheses built into your roadmap.