Sat.Oct 30, 2021 - Fri.Nov 05, 2021

Airflow Timetable: Schedule your DAGs like never before

Marc Lamberti

Airflow Timetable. This new concept introduced in Airflow 2.2 is going to change your way of scheduling your data pipelines. Or I would say, you’re finally going to have all the freedom and flexibility you ever dreamt of for scheduling your DAGs. What if you want to run your DAG for specific schedule intervals with “holes” in between?
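To make the idea of intervals with "holes" concrete, here is a minimal, dependency-free sketch of the scheduling logic a timetable encodes. It does not use Airflow's API: `DataInterval`, `RUN_WEEKDAYS` and `next_interval` are hypothetical names chosen for illustration, and the weekday-only rule is an assumed example. In Airflow 2.2 itself, comparable logic would live in a subclass of `airflow.timetables.base.Timetable` that implements `next_dagrun_info()` and `infer_manual_data_interval()` and is registered through a plugin.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class DataInterval:
    """Hypothetical stand-in for the [start, end) window a DAG run covers."""
    start: datetime
    end: datetime


# Hypothetical rule: daily runs on weekdays only, leaving "holes" on weekends.
RUN_WEEKDAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4


def next_interval(last: Optional[DataInterval], start_date: datetime) -> DataInterval:
    """Return the next daily interval after `last`, skipping excluded weekdays."""
    candidate = last.end if last is not None else start_date
    # Align on midnight so every interval covers exactly one calendar day.
    candidate = candidate.replace(hour=0, minute=0, second=0, microsecond=0)
    while candidate.weekday() not in RUN_WEEKDAYS:
        candidate += timedelta(days=1)
    return DataInterval(start=candidate, end=candidate + timedelta(days=1))


if __name__ == "__main__":
    interval = None
    for _ in range(3):
        interval = next_interval(interval, datetime(2021, 10, 29))
        print(interval.start.date(), "->", interval.end.date())
```

Starting from Friday 2021-10-29, the three intervals cover Friday, then Monday 2021-11-01 and Tuesday 2021-11-02: the weekend simply never gets a run, which is the kind of gap a plain cron expression plus `schedule_interval` made awkward to express.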

Design Patterns for Machine Learning Pipelines

KDnuggets

ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.

The Future of SQL: Databases Meet Stream Processing

Confluent

SQL has proven to be an invaluable asset for most software engineers building software applications. Yet, the world as we know it has changed dramatically since SQL was created in […].

Exploring The Evolution And Adoption of Customer Data Platforms and Reverse ETL

Data Engineering Podcast

Summary: The precursor to widespread adoption of cloud data warehouses was the creation of customer data platforms. Acting as a centralized repository of information about how your customers interact with your organization, they drove a wave of analytics about how to improve products based on actual usage data. A natural outgrowth of that capability is the more recent growth of reverse ETL systems, which use those analytics to feed back into the operational systems used to engage with the customer.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Centralize Your Data Processes With a DataOps Process Hub

DataKitchen

Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows. The DataKitchen Platform is a “process hub” that masters and optimizes those processes.

ORDAINED: The Python Project Template

KDnuggets

Recently I decided to take the time to better understand the Python packaging ecosystem and create a project boilerplate template as an improvement over copying a directory tree and doing find and replace.
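As a rough illustration of why a template beats copying a directory tree and running find and replace, the sketch below renders a tiny project skeleton from `string.Template` placeholders in both file paths and file contents. The `SKELETON` layout, the `render_project` helper and the `setuptools>=61` build backend are assumptions for this example, not the ORDAINED template's actual structure or tooling.

```python
import tempfile
from pathlib import Path
from string import Template
from typing import Dict, List

# Hypothetical skeleton: relative path template -> file content template.
SKELETON: Dict[str, str] = {
    "pyproject.toml": (
        "[build-system]\n"
        'requires = ["setuptools>=61"]\n'
        'build-backend = "setuptools.build_meta"\n\n'
        "[project]\n"
        'name = "$name"\n'
        'version = "0.1.0"\n'
    ),
    "src/$package/__init__.py": '"""$name package."""\n',
    "tests/test_$package.py": (
        "import $package\n\n\n"
        "def test_import():\n"
        "    assert $package is not None\n"
    ),
}


def render_project(root: Path, name: str) -> List[Path]:
    """Write SKELETON under root, substituting the project and package names."""
    # Distribution names may contain hyphens; import names may not.
    values = {"name": name, "package": name.replace("-", "_")}
    written: List[Path] = []
    for path_tpl, body_tpl in SKELETON.items():
        target = root / Template(path_tpl).substitute(values)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(Template(body_tpl).substitute(values))
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in render_project(Path(tmp), "my-tool"):
            print(path.relative_to(tmp))
```

Because every project-specific name lives in one `values` mapping, a new project is one function call rather than a manual search for every occurrence of the old name.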

Cloudera Ireland Center of Excellence Certified as a Great Place to Work

Cloudera

Today is an exciting day for Cloudera as our Ireland Centre of Excellence (COE) in Cork has been certified as a Great Place To Work. It is an outstanding achievement that is a testament to the culture of Cloudera, and we’re delighted that we smashed many of the set benchmarks. To achieve certification we needed a composite score above 64.5% on the Employee Engagement Survey and Culture Audit Submission.

The vast majority of data engineers are burnt out. Those working in healthcare are no exception

DataKitchen

The post The vast majority of data engineers are burnt out. Those working in healthcare are no exception first appeared on DataKitchen.

Machine Learning Safety: Unsolved Problems

KDnuggets

There remain critical challenges in machine learning that, if left unresolved, could lead to unintended consequences and unsafe use of AI in the future. As this is an important and active area of research, roadmaps are being developed to help guide continued ML research and use toward meaningful and robust applications.

Readings in Streaming Database Systems

Confluent

What will the next important category of databases look like? For decades, relational databases were the undisputed home of data. They powered everything: from websites to analytics, from customer data […].

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

#ClouderaLife Spotlight: William Dailey, Senior Technical Instructor

Cloudera

On November 11th we celebrate Veterans Day and Armistice Day, honoring those who have served in the military. To commemorate this special occasion, this month we will spotlight two Clouderans who have served in the military in the United States and the United Kingdom. For this week’s spotlight, I sat down with Clouderan William Dailey, who served in the United States Navy.

Battle for Data Pros Heats Up as Burnout Builds

DataKitchen

The post Battle for Data Pros Heats Up as Burnout Builds first appeared on DataKitchen.

Data Scientist Career Path from Novice to First Job

KDnuggets

If you are beginning your data science journey, then you must be prepared to plan it out as a step-by-step process that will guide you from being a total newbie to getting your first job as a data scientist. These tips and educational resources should be useful for you and add confidence as you take that first big step.

4 Key Design Principles and Guarantees of Streaming Databases

Confluent

Classic relational database management systems (RDBMS) distribute and organize data in a relatively static storage layer. When queries are requested, they compute on the stored data and then return results […].
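The contrast the blurb draws, between computing on stored data when a query arrives and keeping results current as data flows in, can be shown with a toy example. This is a hypothetical sketch, not Confluent's or ksqlDB's API: `Event`, `pull_query` and `PushQuery` are illustrative names, and the per-customer sum is an assumed query.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical event schema: (customer_id, order_amount).
Event = Tuple[str, float]


def pull_query(stored: List[Event]) -> Dict[str, float]:
    """RDBMS-style: scan the stored rows and compute the answer when asked."""
    totals: DefaultDict[str, float] = defaultdict(float)
    for customer, amount in stored:
        totals[customer] += amount
    return dict(totals)


class PushQuery:
    """Streaming-style: keep the answer current as each event arrives."""

    def __init__(self) -> None:
        self.totals: DefaultDict[str, float] = defaultdict(float)

    def on_event(self, event: Event) -> Dict[str, float]:
        customer, amount = event
        self.totals[customer] += amount
        # Emit only the row that changed, like a changelog update.
        return {customer: self.totals[customer]}


if __name__ == "__main__":
    events = [("a", 10.0), ("b", 5.0), ("a", 2.5)]
    query = PushQuery()
    for e in events:
        print(query.on_event(e))
    print(pull_query(events))
```

Both approaches end with the same totals; the difference is when the work happens. The pull query rescans everything on each request, while the push query does a constant amount of work per event and always has an up-to-date answer ready.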

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

A Fresh Squeeze on Data

Cloudera

Guest Author Roozbeh Aliabadi is CEO at ReadyAI. Our children have the right to be AI-educated so they can thrive intellectually, emotionally, and morally alongside AI. In the next decade or so, for most children, AI will be their co-workers, drivers, insurance agents, customer service reps, bank tellers, receptionists, radiologists, in short, a natural part of their lives.

Case Study: Powering Customer-Facing Dashboards at Scale Using Rockset with PostgreSQL at DataBrain

Rockset

Summary: DataBrain, a SaaS company, was using PostgreSQL through Amazon RDS to land and query incoming customer data. However, PostgreSQL couldn’t scale, quickly ingest schemaless data, or efficiently run analytics as DataBrain’s data grew. Plus, incoming customer data had a dynamic schema, making it painful and expensive for DataBrain to clean the data for PostgreSQL and run queries.

AI Infinite Training & Maintaining Loop

KDnuggets

Productizing AI is an infrastructure orchestration problem. In planning your solution design, you should use continuous monitoring, retraining, and feedback to ensure stability and sustainability.
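The monitoring, feedback and retraining cycle described above can be reduced to a small loop. The sketch below is a toy under loud assumptions: `MeanModel` stands in for a real model, and the mean-absolute-error check, `threshold` and `window` values are hypothetical choices, not the article's design.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class MeanModel:
    """Toy model: always predicts the mean of its training data."""
    value: float = 0.0

    def fit(self, ys: List[float]) -> None:
        self.value = mean(ys)

    def predict(self) -> float:
        return self.value


@dataclass
class TrainingLoop:
    model: MeanModel
    threshold: float  # hypothetical: max tolerated mean absolute error
    window: int = 5   # hypothetical: how many recent observations to monitor
    recent: List[float] = field(default_factory=list)
    retrain_count: int = 0

    def observe(self, y: float) -> None:
        # Feedback: record the latest ground-truth value, keeping a sliding window.
        self.recent = (self.recent + [y])[-self.window:]
        # Monitoring: error of the current model on the recent window.
        error = mean(abs(v - self.model.predict()) for v in self.recent)
        if error > self.threshold:
            # Retraining: refit on fresh data once quality drifts too far.
            self.model.fit(self.recent)
            self.retrain_count += 1
```

For example, a model fitted on values near 1.0 stays untouched while observations stay near 1.0; when the data shifts to 10.0, the loop retrains a few times until its predictions catch up, then goes quiet again.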

The Data Janitor Letters - October 2021

Pipeline Data Engineering

Data engineering salon: news and interesting reads about the world of data. "Eating the Cloud from Outside In" (Shawn Wang, Developer Experience, Temporal.io): AWS is playing Chess, Cloudflare is playing Go. "Why Lightspeed invested in ClickHouse: a database built for speed" (Gaurav Gupta, VC, Lightspeed Venture Partners): the $250M Series B financing of ClickHouse.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

7 Step Guide to Become a Freelance Data Scientist

ProjectPro

If you are tired of googling how to become a freelance data scientist, you can relax, because your search is finally over. In this blog, we present a step-by-step guide to becoming a freelance data scientist and a quick and easy way of getting hired as one. So, sit back and simply continue reading. With COVID-19 restrictions forcing companies to lay off their employees, millions of individuals who lost their jobs decided to navigate a freelance

Getting Started with Apache Spark, S3 and Rockset for Real-Time Analytics

Rockset

Apache Spark is an open-source project that was started at UC Berkeley's AMPLab. It provides an in-memory computing framework that can process data workloads both in batch and in real time. Even though Spark is written in Scala, you can interact with it from multiple languages, such as Scala, Python, and Java. Here are some examples of what you can do in your apps with Apache Spark: build continuous ETL pipelines for stream processing, run SQL BI and analytics, do machine learning, and much more!

NLP for Business in the Time of BERTera: Seven Misplaced Passions

KDnuggets

This article is a brief summary of our observations on some common client misperceptions with respect to recent developments in NLP, especially the use of large-scale models and datasets.

Three Essential Elements of a Digital Fabric for Automotive

Teradata

Auto businesses must quickly evolve to become data-centric. Establishing & pulling on the digital threads that connect data through every aspect of the lifecycle of a vehicle will be critical.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

10 Tips to Overcome Data Engineer Burnout

DataKitchen

data.world's Bryon Jacob and DataKitchen's Chris Bergh discuss why data engineers are burnt out and how data teams can fix and prevent burnout with DataOps.

How The Modern Data Stack Is Reshaping Data Engineering

Preset

Data engineering is being reshaped heavily by different trends in the modern data stack arena. Superset creator Max Beauchemin shares his take.

A First Principles Theory of Generalization

KDnuggets

New research from the University of California, Berkeley sheds light on how to quantify the knowledge of neural networks.

If Facebook Can Go Down, What About Your Cloud Provider?

Teradata

Banks’ reliance on a handful of global cloud providers presents regulators with a new headache. Find out more.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Introducing RudderStack's New, High-performance JavaScript SDK

RudderStack

We're thrilled to introduce our new, high-performance JavaScript SDK. We reduced the package size by 70% and the load time by 60%. Read our blog to learn more.

Data Scientist vs Data Engineer: Differences and Why You Need Both

AltexSoft

Was Nikola Tesla a scientist or an engineer? How about Edison? Or Da Vinci? It’s hard to give a solid answer, right? These men didn’t stop at scientific research; they went on to conceive and engineer their inventions. One discipline goes hand in hand with the other. In the modern world, the distinction is even blurrier. Engineers are no longer only the people wearing helmets and working on construction sites.

7 of The Coolest Machine Learning Topics of 2021 at ODSC West

KDnuggets

At our upcoming event this November 16th-18th in San Francisco, ODSC West 2021 will feature a plethora of talks, workshops, and training sessions on machine learning topics, deep learning, NLP, MLOps, and so on. You can register now for 20% off all ticket types, or register for a free AI Expo Pass to see what some big names in AI are doing now.

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.