Sat.Jun 04, 2022 - Fri.Jun 10, 2022

An In-Depth Data Mesh Discussion with Zhamak Dehghani

Jesse Anderson

In 2021 I had the pleasure of getting to know and speaking with Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks, in season one of the Data Dream Team series. Zhamak is a software engineer and architect who is (in)famously known as the founder of the data mesh concept, a paradigm shift in how we manage data-driven value at scale. My interview with Zhamak last season served mainly as an introduction to Data Mesh.

Python: The programming language of machine learning

KDnuggets

You can't avoid learning Python if you work on machine learning problems. You need to know what other people's code means and you need to convey your ideas to them too.

Simplify Data Security For Sensitive Information With The Skyflow Data Privacy Vault

Data Engineering Podcast

Summary: The best way to make sure that you don’t leak sensitive data is to never have it in the first place. The team at Skyflow decided that the second best way is to build a storage system dedicated to securely managing your sensitive information and making it easy to integrate with your applications and data systems. In this episode Sean Falconer explains the idea of a data privacy vault and how this new architectural element can drastically reduce the potential for making a mistake wit

The Future Is Hybrid Data, Embrace It

Cloudera

We live in a hybrid data world. In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB. In fact, the total amount of data is expected to nearly triple by 2025.

Beyond the Basics of A/B Tests: Innovative Experimentation Tactics You Need to Know as a Data or Product Professional

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

A Model Implementation

Teradata

How do you take the first steps to free the power of analytics from on-premise systems whilst protecting valuable data and de-risking transformation? Find out more.

Learn MLOps with This Free Course

KDnuggets

Learn to train and track your experiments, create ML pipelines, model deployment, monitor the performance in production, and adopt best practices from DevOps.

More Trending

#ClouderaLife Spotlight: Hassan Mirza

Cloudera

In this #ClouderaLife Spotlight Hassan talks about three life themes that have kept him moving and motivated: learning from his father’s work ethic despite his family’s forcible displacement from their country of origin, his early experience with organized sports, and the value of mentorship. Hassan describes how these experiences led him to give back to his family and community by becoming a Mental Health First Aider and a mentor for refugees seeking a better life.

Data Engineering Annotated Monthly – May 2022

Big Data Tools

It’s the start of June. That means it’s time to start taking summer vacations and enjoying some fresh juice alongside your fresh news! Hi, I’m Pasha Finkelshteyn, and I’ll be your guide through this month’s news. I’ll offer my impressions of recent developments in the data engineering space and highlight new ideas from the wider community.

NLP, NLU, and NLG: What’s The Difference? A Comprehensive Guide

KDnuggets

This article aims to quickly cover the similarities and differences between NLP, NLU, and NLG and talk about what the future for NLP holds.

How Confluent Treats Incidents in the Cloud

Confluent

Fast infrastructure growth often comes with issues. Don't panic - learn from them! Here's how we analyze, monitor, and fix incidents at Confluent, and what we do to prevent risk.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Apache Hop 2.0 released!

know.bi

The Apache Hop PMC and community released Apache Hop 2.0.0 late last week. This is the second major release of the platform and the first major release after Hop graduated as a Top-Level ASF Project.

A Structured Approach To Building a Machine Learning Model

KDnuggets

This article gives you a glimpse of how to approach a machine learning project with a clear outline of an easy-to-implement 5-step process.

How Do We Transform and Model Data at Cloud Academy?

Cloud Academy

“Data is the new gold”: a common phrase over the last few years. For all organizations, data and information have become crucial to making good decisions for the future and having a clear understanding of how they’re making progress, or otherwise. At Cloud Academy, we strive to make data-informed decisions.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Is the 4-Year Degree Obsolete?

Elder Research

The post Is the 4-Year Degree Obsolete? appeared first on Elder Research.

Bringing The Modern Data Stack To Everyone With Y42

Data Engineering Podcast

Summary: Cloud services have made highly scalable and performant data platforms economical and manageable for data teams. However, they are still challenging to work with and manage for anyone who isn’t in a technical role. Hung Dang understood the need to make data more accessible to the entire organization and created Y42 as a better user experience on top of the "modern data stack". In this episode he shares how he designed the platform to support the full spectrum of technical ex

How is Data Mining Different from Machine Learning?

KDnuggets

How about we take a closer look at data mining and machine learning so we know how to catch their different ends?

Streaming Edge Data Collection and Global Data Distribution

Cloudera

In the first blog of the Universal Data Distribution blog series, we discussed the emerging need within enterprise organizations to take control of their data flows. From origin through all points of consumption both on-prem and in the cloud, all data flows need to be controlled in a simple, secure, universal, scalable, and cost-effective way. With the rapid increase of cloud services where data needs to be delivered (data lakes, lakehouses, cloud warehouses, cloud streaming systems, cloud busi

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Building An External Data Product Is Different. Trust Me. (but read this anyway)

Monte Carlo

The data world moves unapologetically fast. It seems like just last year we started talking about how data teams were transitioning from providing a service, to treating data like a product or even building internal products across a decentralized data mesh architecture. Wait, that was *checks notes* January of this year?? Wow. Who knows, maybe Ferris Bueller became a data engineer.

Scaling Appsec at Netflix (Part 2)

Netflix Tech

By Astha Singhal, Lakshmi Sudheer, Julia Knecht. The Application Security teams at Netflix are responsible for securing the software footprint that we create to run the Netflix product, the Netflix studio, and the business. Our customers are product and engineering teams at Netflix that build these software services and platforms. The Netflix cultural values of ‘Context not Control’ and ‘Freedom and Responsibility’ strongly influence how we do Security at Netflix.

Top Posts May 30 – June 5: 21 Cheat Sheets for Data Science Interviews

KDnuggets

Also: Decision Tree Algorithm, Explained; How to Become a Machine Learning Engineer; The Complete Collection of Data Science Books – Part 2; 15 Python Coding Interview Questions You Must Know For Data Science.

Cloudera’s Applied ML Prototype Catalog Continues to Grow

Cloudera

Here at Cloudera, we’re committed to helping make the lives of data practitioners as painless as possible. For data scientists, we continue to provide new Applied Machine Learning Prototypes (AMPs), which are open source and available on GitHub. These pre-built reference examples are complete end-to-end data science projects. In Cloudera Machine Learning (CML), you can deploy them with the single click of a button, bringing data scientists that much closer to providing value.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

MongoDB vs DynamoDB Head-to-Head: Which Should You Choose?

Rockset

Editor's note: We have updated this post to reflect comments and corrections we received from readers. We thank those who sent in comments for helping us make this post more accurate and useful. Databases are a key architectural component of many applications and services. Traditionally, organizations have chosen relational databases like SQL Server, Oracle, MySQL and Postgres.

Accelerate testing in Apache Airflow through DAG versioning

Zalando Engineering

Introduction: In the Performance Marketing department, we run paid advertisement campaigns for Zalando. To do so, we build services that allow us to manage campaigns, optimize and distribute content, and measure the performance of the campaigns at scale. Talking about measurement, one of the core systems we’ve built and continuously extended over the years is our so-called marketing ROI (return on investment) pipeline.

KDnuggets News, June 8: 21 Cheat Sheets for Data Science Interviews; Top 18 Data Science Group on LinkedIn

KDnuggets

21 Cheat Sheets for Data Science Interviews; Top 18 Data Science Group on LinkedIn; A Beginner's Guide to Q Learning; 3 Ways Understanding Bayes Theorem Will Improve Your Data Science; Machine Learning Is Not Like Your Brain Part 3: Fundamental Architecture.

Snowflake Observability and 4 Reasons Data Teams Should Invest In It

Monte Carlo

Adopting a cloud data warehouse like Snowflake is an important investment for any organization that wants to get the most value out of its data. Forrester’s Total Economic Impact of Snowflake report found a customer ROI of 612% with total benefits of more than $21 million across three years. This immediate value is just scratching the surface.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Understanding Functions for Data Science

KDnuggets

Most data science problems boil down to finding the mathematical function that describes the relationship between feature and target variables.

Must-haves on Your Data Science Resume

KDnuggets

Recruiters look at a resume for 7.4 seconds before making a decision on the candidate. That means you have less than 10 seconds to make a good impression. 10 seconds is not a lot of time, especially when you really want this job.

European AI Act: The Simplified Breakdown

KDnuggets

The AI Act aims to ensure excellence in the EU, provide the correct conditions for the development of AI, and guarantee that AI systems are beneficial to people.

Top 18 Data Science Facebook Groups

KDnuggets

Join the best data science groups on Facebook to share insights and experiences, ask for guidance, and build valuable connections.

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.