Sat, Oct 15, 2022 - Fri, Oct 21, 2022

Pollen’s enormous debt left behind: exclusive details

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe. Pollen, the events festival tech startup, went bankrupt in August after raising more than $200M in venture funding. In an exclusive investigative article, I covered the events and details leading up to this bankruptcy.

Rust for Data Engineering

Simon Späti

Will Rust kill Python for Data Engineers? If you only came here to know this, my answer is no. Betteridge’s Law strikes again! But then again, you have to ask: was Python made for Data Engineering in the first place? Rust may not replace Python outright, but it has taken over more and more of the JavaScript tooling ecosystem, and a growing number of projects are trying to do the same for Python and Data Engineering.

Independent Anniversary

Jesse Anderson

I have a calendar reminder that tells me when I founded Big Data Institute. It just told me I founded the company eight years ago. The reminder is called “Independent Anniversary.” It’s the day I split off and executed my vision for an independent, big data consulting company. Independence has all sorts of manifestations. For you, it’s an independent look at technology and vendors from someone who’s worked at a vendor (Cloudera) and worked in distributed systems for even longer.

Frameworks for Approaching the Machine Learning Process

KDnuggets

This post is a summary of 2 distinct frameworks for approaching machine learning tasks, followed by a distilled third. Do they differ considerably (or at all) from each other, or from other such processes available?

An Exploration Of The Open Data Lakehouse And Dremio's Contribution To The Ecosystem

Data Engineering Podcast

Summary: The "data lakehouse" architecture balances the scalability and flexibility of data lakes with the ease of use and transaction support of data warehouses. Dremio is one of the companies leading the development of products and services that support the open lakehouse. In this episode, Jason Hughes explains what it means for a lakehouse to be "open" and describes the different components that the Dremio team builds and contributes to.

Working With Sparse Features In Machine Learning Models

KDnuggets

Sparse features can cause problems like overfitting and suboptimal results in learning models, and understanding why this happens is crucial when developing models. Multiple methods, including dimensionality reduction, are available to overcome issues due to sparse features.
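
As a rough illustration of the dimensionality-reduction route mentioned above, here is a minimal NumPy sketch, assuming a toy bag-of-words matrix. The documents and the helper names `bag_of_words` and `truncated_svd` are hypothetical and not taken from the KDnuggets article, which may use different methods or libraries. It projects sparse term counts onto the top-k right singular vectors, the idea behind truncated SVD (latent semantic analysis).

```python
import numpy as np


def bag_of_words(docs: list[str]) -> tuple[np.ndarray, list[str]]:
    """Build a document-term count matrix; most entries end up zero."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for tok in doc.split():
            X[row, index[tok]] += 1.0
    return X, vocab


def truncated_svd(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Project rows of X onto its top-k right singular vectors (no centering)."""
    # A full SVD is fine for a toy matrix; large sparse inputs would normally
    # use an iterative solver such as scipy.sparse.linalg.svds instead.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    components = vt[:k]  # shape (k, n_features)
    return X @ components.T, components


docs = [
    "spark kafka stream",
    "kafka stream flink",
    "pandas numpy python",
    "python pandas sql",
    "sql warehouse dbt",
]
X, vocab = bag_of_words(docs)
sparsity = 1.0 - np.count_nonzero(X) / X.size  # fraction of zero entries
Z, components = truncated_svd(X, k=2)

print(f"{X.shape} with {sparsity:.0%} zeros -> dense {Z.shape}")
```

The reduced columns are dense, so a downstream model sees two informative features instead of ten mostly-zero ones; the trade-off is that whatever the discarded components captured is lost.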

Speeding Up The Time To Insight For Supply Chains And Logistics With The Pathway Database That Thinks

Data Engineering Podcast

Summary: Logistics and supply chains have come under increased stress and scrutiny in recent years. To stay ahead of customer demands, businesses need to react quickly and intelligently to changes, which requires fast and accurate insights into their operations. Pathway is a streaming database engine that embeds artificial intelligence into the storage, with functionality designed to support the spatiotemporal data that is crucial for shipping and logistics.

Public SQL Endpoints in Rockset

Rockset

Making use of real-time data for analytics is a deeply collaborative project. We’ve helped data engineers, data architects, engineering leaders, ML teams, and product managers connect the dots between various systems to deliver on Rockset’s promise of fast queries on fresh data. Not only do we collaborate with customers on analytics projects, but we also use our own product daily and collaborate across teams internally.

Public or On-Prem? Telco giants are optimizing the network with the Hybrid Cloud

Cloudera

The telecommunications industry continues to develop hybrid data architectures to support data workload virtualization and cloud migration. However, while the promise of the cloud remains essential, not just for data workloads but also for network virtualization and B2B offerings, the sheer volume and scale of data in the industry require careful management of the “journey to the cloud.”

7 Free Platforms for Building a Strong Data Science Portfolio

KDnuggets

Outshine others and increase your odds of getting hired by maintaining a data science portfolio with projects, resumes, blogs, and reports.

Data and Analytics Keep the Wheels on the Bus!

Teradata

The complexity of modern vehicles makes it difficult to spot the root causes that prevent them from working. Mechanics, operators & OEMs must step into a new era of digital, data-based diagnostics.

Building Real-Time Recommendations with Kafka, S3, Rockset and Retool

Rockset

Real-time customer 360 applications are essential in allowing departments within a company to have reliable and consistent data on how a customer has engaged with the product and services. Ideally, when someone from a department has engaged with a customer, you want up-to-date information so the customer doesn’t get frustrated and repeat the same information multiple times to different people.

Cybersecurity: A Big Data Problem

Cloudera

Information technology has been at the heart of governments around the world, enabling them to deliver vital citizen services, such as healthcare, transportation, employment, and national security. All of these functions rest on technology and share a valuable commodity: data. Data is produced and consumed in ever-increasing amounts and therefore must be protected.

A Data Science Portfolio That Will Land You The Job in 2022

KDnuggets

Check out this article on crafting a data science portfolio that will get you that job. And learn 4 resume mistakes to avoid at any cost.

What Is Data Collection? Methods, Types, Tools, and Techniques

U-Next

The primary goal of data collection is to gather high-quality information that can answer the open-ended questions at hand. By collecting the data they need, businesses and management obtain the high-quality information required to make educated decisions. Gathering data is necessary to draw conclusions, establish what is factual, and improve the quality of the information.

React SEO: How To Optimize React Websites for SEO

Trio

React enables much of the modern web you’re familiar with: fluid, responsive, and animation-rich websites. It’s no wonder that React.js is the most-used JavaScript framework for web development, according to the 2021 State of JavaScript survey.

Apache Hop 2.1.0 is available

know.bi

The Apache Hop team just released version 2.1.0. This new release is the result of four and a half months of work on over 200 tickets and comes packed with new functionality, bug fixes and improvements.

5 Free Courses to Master Calculus

KDnuggets

Calculus is one of the foundational pillars of understanding the mathematics behind machine learning algorithms. The post shares five free courses to help you master calculus and learn its real-world applications.

Medical Datasets for Machine Learning: Aims, Types and Common Use Cases

AltexSoft

Every day, the global healthcare system generates tons of medical data that, at least theoretically, could be used for machine learning purposes. Regardless of industry, data is considered a valuable resource that helps companies outperform their rivals, and healthcare is no exception. In this post, we’ll briefly discuss the challenges you face when working with medical data and give an overview of publicly available healthcare datasets, along with the practical tasks they help solve.

Recovering from Crashes with Safe Mode

Lyft Engineering

Feature flags are everywhere in modern software development: they’re a great tool for running A/B experiments, slowly rolling out changes to users, and even turning off problematic codepaths during incidents. When an engineer implements a new feature, it’s practically second nature to gate it behind a feature flag. While this practice is largely beneficial, incidents are occasionally caused when a feature flag enables a buggy codepath and causes a crash or an otherwise degraded
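
The excerpt does not describe how Lyft’s Safe Mode actually works, so the following is only a hypothetical sketch of the general idea, assuming a client that counts consecutive crashed launches and, past a threshold, ignores remotely served flag values in favor of known-safe defaults. `SafeModeFlags`, `record_launch`, and the threshold of 2 are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SafeModeFlags:
    """Hypothetical flag client that stops trusting remote flags after repeated crashes."""

    defaults: dict[str, bool]  # known-safe value for every flag
    remote: dict[str, bool]  # values served by a flag service (hypothetical)
    crash_threshold: int = 2  # consecutive crashed launches before safe mode
    consecutive_crashes: int = 0

    def record_launch(self, previous_launch_crashed: bool) -> None:
        # Count crashes in a row; a clean launch resets the counter.
        if previous_launch_crashed:
            self.consecutive_crashes += 1
        else:
            self.consecutive_crashes = 0

    @property
    def safe_mode(self) -> bool:
        return self.consecutive_crashes >= self.crash_threshold

    def is_enabled(self, name: str) -> bool:
        # In safe mode, ignore remote values so a bad flag cannot keep crashing the app.
        if self.safe_mode:
            return self.defaults.get(name, False)
        return self.remote.get(name, self.defaults.get(name, False))


flags = SafeModeFlags(defaults={"new_checkout": False}, remote={"new_checkout": True})
print(flags.is_enabled("new_checkout"))  # remote value is used
for _ in range(2):
    flags.record_launch(previous_launch_crashed=True)
print(flags.is_enabled("new_checkout"))  # safe mode: default is used
```

A real client would need to persist the crash counter across process restarts (for example, on disk), since the whole point is surviving a crash; the sketch keeps it in memory for brevity.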

Announcing ksqlDB 0.28.2 and Improvements to ksqlDB in Confluent Cloud

Confluent

With ksqlDB 0.28.2 comes an easier getting-started experience with auto-topic import, new query options, and more support for data pipelines.

KDnuggets Top Posts for September 2022: Free Python for Data Science Course

KDnuggets

Free Python for Data Science Course • 7 Machine Learning Portfolio Projects to Boost the Resume • Free Algorithms in Python Course • How to Select Rows and Columns in Pandas • 5 Data Science Skills That Pay & 5 That Don't • Everything You’ve Ever Wanted to Know About Machine Learning • Free SQL and Database Course • 7 Data Analytics Interview Questions & Answers.

What Is Customer Journey Mapping, and Why Is It Important?

U-Next

Customer behavior is very difficult to interpret. Just when you think you finally understand your customers, some new factor comes into play and proves you wrong. This is where customer journey mapping comes into the picture. The journey begins when a customer first interacts with your business and ends when they make a purchase from you.

How to get your data from an AWS RDS database into Snowflake

Propel Data

Learn how to move data from Amazon RDS into Snowflake so that it can be used for analytics.

Lyft Employees Making Impact Through Self-led Fundraisers

Lyft Engineering

When the Roe v. Wade decision was announced, a lot of us felt disheartened.

Implementing Adaboost in Scikit-learn

KDnuggets

It is called Adaptive Boosting because the weights are reassigned to each instance, with higher weights assigned to incorrectly classified instances; therefore it ‘adapts’.
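
The reweighting step described above can be sketched in plain Python. This is the textbook discrete AdaBoost update for binary labels in {-1, +1}, not scikit-learn’s internal code (its `AdaBoostClassifier` uses the multi-class SAMME variant); the function name `adaboost_reweight` is hypothetical.

```python
import math


def adaboost_reweight(
    weights: list[float], y_true: list[int], y_pred: list[int]
) -> tuple[list[float], float]:
    """One round of discrete AdaBoost: return updated sample weights and the learner's vote."""
    total = sum(weights)
    # Weighted error rate of this round's weak learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must err on some samples but beat chance")
    # Learner weight: grows as the error shrinks.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # t * p is -1 for a mistake, so misclassified samples are scaled by e^alpha
    # and correct ones by e^-alpha: the weights "adapt" toward the hard cases.
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return [w / norm for w in updated], alpha


weights = [0.2] * 5
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, 1, 1]  # the fourth sample is misclassified
new_weights, alpha = adaboost_reweight(weights, y_true, y_pred)
print([round(w, 3) for w in new_weights], round(alpha, 3))
# → [0.125, 0.125, 0.125, 0.5, 0.125] 0.693
```

With this choice of alpha, the misclassified samples always end up carrying exactly half of the total weight after the update, which forces the next weak learner to focus on them.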

How to Create, Communicate, and Deliver Value in Strategic Sales Management

U-Next

The success of any organization depends largely on its sales department. The unique and crucial function of sales is to bridge the gap between potential customers’ wants and the products or services the company provides. Sales greatly influence customer loyalty and how much faith customers place in brands.

Designing Events and Event Streams: Introduction and Best Practices

Confluent

Stronger together: Python, dataframes, and SQL

dbt Developer Hub

For years working in data and analytics engineering roles, I treasured the daily camaraderie sharing a small office space with talented folks using a range of tools - from analysts using SQL and Excel to data scientists working in Python. I always sensed that there was so much we could work on in collaboration with each other - but siloed data and tooling made this much more difficult.
