Sat.Feb 27, 2021 - Fri.Mar 05, 2021

Build your data pipelines like the Toyota Way

François Nguyen

If there is only one book to read about lean manufacturing, this is the one. It is the kind of book you can read again and again and still learn something relevant to your current context. It is also a book you can read whatever your industry: you will always find situations it covers. Today, we are going to apply these principles to data pipelines. “The right process will deliver the right results” – The Toyota Way (section II). In the 14 Toyota Way principles, you have

How to set up a dbt data-ops workflow, using dbt cloud and Snowflake

Start Data Engineering

Contents: Introduction; Pre-requisites; Setting up the data-ops pipeline; Snowflake; Local development environment; dbt cloud; Connect to Snowflake; Link to github repository; Setup deployment (release/prod) environment; Setup CI; PR -> CI -> merge cycle; Schedule jobs; Host data documentation; Conclusion and next steps; Further reading; References.

Introduction: With companies realizing the importance of having correct data, there has been a lot of attention on the data-ops side of things.

To Pull or to Push Your Data with Kafka Connect? That Is the Question.

Confluent

Today, every company is a data company. There are many different data pipeline, integration, and ingestion tools in the market, but before you can feed your data analytics needs, data […].

CFO Analytics: What Is It and Why Should You Care?

Teradata

Finance-driven analytics might be the largest untapped opportunity for organizations & a catalyst for driving business value & strategic vision. But, what exactly is CFO analytics?

International Women’s Day 2021: Challenging what’s possible

Cloudera

This year’s International Women’s Day (IWD) on March 8th comes at a time when global communities, businesses, and governments find themselves continuing to pirouette, pivot, and adapt in the face of a relentless global pandemic. COVID-19 has touched every aspect of our lives. As women, we suddenly found overnight that we had a portfolio career – comprising our day jobs and the roles of caregiver, school teacher and house cleaner – that we had neither asked for nor been consulted on.

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

Netflix Tech

By Stephanie Lane, Wenjing Zheng and Mihir Tendulkar. Within the rapid expansion of data-related roles in the last decade, the title Data Scientist has emerged as an umbrella term for myriad skills and areas of business focus. What does this title mean within a given company, or even within a given industry? It can be hard to know from the outside.

Enhancing Customer Experience with Every Journey

Teradata

Big Tech giants dominate by using data to improve product & experience. The auto industry can emulate this by analyzing data to improve customer experience & guide individual choices.

In-memory Caching in Finance

Data Science Blog: Data Engineering

Big data has gradually crept into industry after industry over the years, and no type of business appears to be exempt. Businesses, understandably, are scrambling to catch up with new technological developments and innovations in data processing, storage, and analytics. Companies are in a race to discover how they can make big data work for them and bring them closer to their business goals.

Space-Time Tradeoff: Examining Snowflake's Compute Cost

Rockset

Imagine you had a big book, and you were looking for the section that talks about dinosaurs. Would you read through every page or use the index? The index will save you a lot of time and energy. Now imagine that it’s a big book with a lot of words in really tiny print, and you need to find all the sections that talk about animals. Using the index will save you a LOT of time and energy.
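The book analogy describes the classic space-time tradeoff. An index costs extra storage and one pass to build, but after that a lookup no longer has to read every page. Below is a minimal Python sketch of the analogy using a toy in-memory "book". The page data and function names are hypothetical. This is not how Snowflake or Rockset implement scans or indexes.

```python
from collections import defaultdict

# Hypothetical "book": each entry is the topic covered on that page number.
pages = ["plants", "dinosaurs", "rocks", "dinosaurs", "oceans", "birds"]

def full_scan(pages, topic):
    """Read every page: no extra storage, but O(n) work for every lookup."""
    hits, pages_read = [], 0
    for number, page_topic in enumerate(pages):
        pages_read += 1
        if page_topic == topic:
            hits.append(number)
    return hits, pages_read

def build_index(pages):
    """Spend memory and one full pass up front: topic -> list of page numbers."""
    index = defaultdict(list)
    for number, page_topic in enumerate(pages):
        index[page_topic].append(number)
    return index

def index_lookup(index, topic):
    """Average O(1) dictionary lookup instead of reading every page."""
    return index.get(topic, [])

hits, pages_read = full_scan(pages, "dinosaurs")
index = build_index(pages)
print(hits, pages_read)                  # [1, 3] 6
print(index_lookup(index, "dinosaurs"))  # [1, 3]
```

The index pays its build cost once. Each later query then skips unrelated pages, so the tradeoff favours the index when the same data is searched many times.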

Kafka Summit Europe 2021 – A Look at the Agenda

Confluent

As you may have heard, we are hosting not one, not two, but three Kafka Summits in 2021. No matter where you are in the world, there’s a Summit event […].

Is the Centralized Data Warehouse Dead?

Teradata

Learn how Teradata's founding vision, along with its technology, has evolved over time to deliver on its core principle: bringing data together to drive analytics that matter.

Open Source Highlight: PostHog

Data Council

PostHog provides open-source product analytics, which users can deploy on their own infrastructure to collect every event on their website or app without having to send the data to third parties - an increasing source of concern in times of GDPR and CCPA.

Monte Carlo is SOC 2 Certified

Monte Carlo

When it comes to managing your company’s data, security is high on your list of priorities. Today, I’m thrilled to share that Monte Carlo has achieved SOC 2 Type I certification, an industry-leading standard for security, availability, and confidentiality that our organization has adopted. What does this mean for you? Our SOC 2 designation means that Monte Carlo has designed a set of internal controls, systems, policies, and procedures that meet industry best practices for protecting our custom

5 Tips to Create a Job-Winning Data Science Resume in 2023

ProjectPro

You are about to create the best data science resume out there, but first: data scientists are unicorns. However, most overburdened hiring managers don't know this. They can't see the wonders you work with data-driven insights; it's all Greek to them. You need to fit all your data science superpowers onto your resume to prove that you are the best candidate for the open data science job.

Certifying Ripple's System and Organization Controls: SOC 2

Ripple Engineering

More than a year of cross-team collaboration has resulted in an important achievement: Ripple has been awarded SOC 2 certification! How do you make a computer system maximally secure and reliable? Disconnect it from all networks and never change any of the software or data. How do you make a computer system maximally useful? Connect it to networks and make frequent changes to the software and data!

The Future Of Business Intelligence Is Open Source

Preset

It's time for the future of business intelligence to go open source, preventing lock-in, providing extensibility, and fostering a community for innovation.

Why Production Machine Learning Fails — And How To Fix It

Monte Carlo

Machine learning has emerged as a must-have tool for any serious data team: augmenting processes, generating smarter and more accurate predictions, and generally improving our ability to make use of data. However, discussing machine learning applications in theory is very different from actually running machine learning models at scale in production.

Building an End to End load test automation system on top of Kubernetes

Zalando Engineering

At Zalando we continuously invent new ways for customers to interact with fashion. To provide an excellent customer experience, we must ensure our systems can technically handle high-traffic events such as Cyber Week or other sales campaigns. We have published a detailed article on how Zalando prepares for Cyber Week. Checkout- and payment-related systems are particularly important during sales events.

The Netflix Cosmos Platform

Netflix Tech

Orchestrated Functions as a Microservice, by Frank San Miguel on behalf of the Cosmos team. Cosmos is a computing platform that combines the best aspects of microservices with asynchronous workflows and serverless functions. Its sweet spot is applications that involve resource-intensive algorithms coordinated via complex, hierarchical workflows that last anywhere from minutes to years.

Time-series Analysis With Druid, Superset and Prophet

Preset

Time-series analysis with Druid and Superset with in-chart analytics from Facebook's Prophet library.

SQL Dialect differences in Sequelize

Grouparoo

Like many applications, Grouparoo stores data in a relational database. Unlike most applications, Grouparoo works with two different types of databases: Postgres and SQLite. We enable our customers to run Grouparoo in a number of different ways, from their laptop with no external dependencies to a large cluster with many servers processing data in parallel.

Recommender Systems Python-Methods and Algorithms

ProjectPro

Welcome to the World of Recommender Systems! Table of Contents: What is a Recommender System?; Recommender Systems – An Introduction; Types of Recommender Systems: 1) Content-Based Filtering, 2) Collaborative Filtering; Content-Based Recommender Systems; Grab Some Popcorn and Coke – We’ll Build a Content-Based Movie Recommender System; Analyzing Documents with TF-IDF; Creating a TF-IDF Vectorizer; Calculating the Cosine Similarity – The Dot Product of Normalized Vectors; It’s
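The TF-IDF and cosine-similarity steps named in that table of contents can be sketched in plain Python. This is an illustration under stated assumptions, not ProjectPro's code:

- the movie descriptions are hypothetical;
- tokens come from simple whitespace splitting;
- idf is the unsmoothed log(N / df).

Because of that last choice, scores will differ from library implementations such as scikit-learn's `TfidfVectorizer`, which smooths idf by default.

```python
import math
from collections import Counter

# Hypothetical movie descriptions standing in for a real catalogue.
docs = {
    "A": "space adventure with aliens",
    "B": "aliens invade earth adventure",
    "C": "romantic comedy in paris",
}

def tfidf_vectors(docs):
    """Return {doc_id: {term: tf * idf}} with raw counts and idf = log(N / df)."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for d, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[d] = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return vectors

def normalize(vec):
    """Scale a sparse vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def cosine_similarity(u, v):
    """Cosine similarity as the dot product of the two L2-normalized vectors."""
    u, v = normalize(u), normalize(v)
    return sum(w * v.get(t, 0.0) for t, w in u.items())

vecs = tfidf_vectors(docs)
print(round(cosine_similarity(vecs["A"], vecs["B"]), 3))  # shared terms: aliens, adventure
print(cosine_similarity(vecs["A"], vecs["C"]))            # 0.0: no shared terms
```

A content-based recommender would rank the other movies by this score and suggest the highest ones.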

The New Rules of Data Quality

Monte Carlo

There are two types of data quality issues in this world: those you can predict (known unknowns) and those you can’t (unknown unknowns). Here’s how some of the best data teams are taking a more comprehensive approach to tackling both of them at scale. For the past several years, data teams have leveraged the equivalent of unit testing to detect data quality issues.
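Unit-test-style checks of the kind mentioned here only catch issues someone anticipated (known unknowns). Below is a minimal Python sketch. It assumes a hypothetical in-memory orders table and hypothetical rule names, and it is not Monte Carlo's product or API.

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    failed_rows: list

def check_not_null(rows, column):
    """Flag rows where `column` is missing or None."""
    bad = [i for i, row in enumerate(rows) if row.get(column) is None]
    return CheckResult(f"{column} not null", bad)

def check_range(rows, column, low, high):
    """Flag rows whose non-null `column` value falls outside [low, high]."""
    bad = [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return CheckResult(f"{column} in [{low}, {high}]", bad)

# Hypothetical orders table.
orders = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": -4.0},
]

for result in (check_not_null(orders, "amount"), check_range(orders, "amount", 0, 10_000)):
    status = "PASS" if not result.failed_rows else f"FAIL rows {result.failed_rows}"
    print(f"{result.name}: {status}")
# amount not null: FAIL rows [1]
# amount in [0, 10000]: FAIL rows [2]
```

Both failures are caught only because someone wrote a rule for them in advance. An issue nobody anticipated would pass these checks unnoticed.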

Build a Slack Activity Dashboard Using Airbyte and Superset

Preset

In this post, we'll walk through how to use Airbyte with Superset to build a Slack dashboard.

How we use GraphQL at Europe's largest fashion e-commerce company

Zalando Engineering

Today's large-scale organizations that leverage a microservice architecture face a plethora of problems at the data aggregation and presentation layers. Managing consistent, backwards-compatible APIs for web and mobile app frontends is definitely one of the most complex. Balancing frontend developers' need for a consistent data source against product managers' need to deliver new features quickly in a fast-paced, large organization is a tough nut to crack.

Bridging The Gap Between Machine Learning And Operations At Iguazio

Data Engineering Podcast

The process of building and deploying machine learning projects requires a staggering number of systems and stakeholders to work in concert. In this episode Yaron Haviv, co-founder of Iguazio, discusses the complexities inherent to the process, as well as how he has worked to democratize the technologies necessary to make machine learning operations maintainable.

Using SQL to democratize streaming data

Cloudera

Streaming analytics is crucial to modern business – it opens up new product opportunities and creates massive operational efficiencies. In many cases, it’s the difference between creating an outstanding customer experience versus a poor one – or losing the customer altogether. However, in the typical enterprise, only a small team has the core skills needed to gain access and create value from streams of data.
