Sat. Feb 05, 2022 - Fri. Feb 11, 2022

Managing Your Reusable Python Code as a Data Scientist

KDnuggets

Here are a few approaches that I have settled on for managing my own reusable Python code as a data scientist, presented from most to least general code use, and aimed at beginners.

Python 158

Scalable Strategies For Protecting Data Privacy In Your Shared Data Sets

Data Engineering Podcast

Summary: There are many dimensions to the work of protecting the privacy of users in our data. When you need to share a data set with other teams, departments, or businesses, it is of utmost importance that you eliminate or obfuscate personal information. In this episode Will Thompson explores the many ways that sensitive data can be leaked, re-identified, or otherwise be at risk, as well as the different strategies that can be employed to mitigate those attack vectors.

#ClouderaLife Spotlight: Marque Blackman, Director of Global Workplace

Cloudera

As we celebrate Black History Month, for this Employee Spotlight I sat down with Marque Blackman, co-lead of the Cloudera Black Employee Network (CBEN). We discussed his experience at Cloudera, his career transitions, and what he learned along the way. We also discussed his work with CBEN and his perspective on Black History Month. Meet Marque Blackman, Director of Global Workplace.

New Data Horizons: Data Prep, Data Visualization, and Data Catalogs Are Ready for Prime Time

DataKitchen

The post New Data Horizons: Data Prep, Data Visualization, and Data Catalogs Are Ready for Prime Time first appeared on DataKitchen.

Data 98

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

The Complete Collection of Data Science Cheat Sheets – Part 1

KDnuggets

A collection of cheat sheets that will help you prepare for a technical interview, assessment tests, class presentation, and help you revise core data science concepts.

ETL Testing Process

Grouparoo

Today, organizations are adopting modern ETL tools and approaches to gain as many insights as possible from their data. However, to ensure the accuracy and reliability of such insights, effective ETL testing needs to be performed. So what is an ETL tester’s responsibility? In this ETL testing tutorial, we’ll look at what ETL testing involves, the different types of ETL tests, and some challenges of ETL testing.

Process 52

More Trending

Palantir Developers: Learn to build in Palantir Foundry

Palantir

Introducing new resources for developers to elevate their impact in Foundry. Everyone in an organization should be able to use the right data to make the best decisions. That’s why Palantir is committed to making Foundry as intuitive and accessible as possible — not only for data scientists and engineers, but also for sales, product development, recruiting, and more.

How to Learn Math for Machine Learning

KDnuggets

So how much math do you need to know in order to work in the data science industry? The answer: Not as much as you think.

Monte Carlo Data Observability Insights Now Available in the Snowflake Data Marketplace

Monte Carlo

Is your data quality improving? What is your most used data? Where in the pipeline are your most frequent data issues occurring? With Snowflake Secure Data Sharing, building custom workflows and dashboards to answer these questions has never been easier. I am excited to announce Monte Carlo Data Observability Insights, end-to-end operational analytics of an organization’s data platform, is now available in the Snowflake Data Marketplace.

The JaffleGaggle Story: Data Modeling for a Customer 360 View

dbt Developer Hub

Editor's note: In this tutorial, Donny walks through the fictional story of a SaaS company called JaffleGaggle, who needs to group their freemium individual users into company accounts (aka a customer 360 view) in order to drive their product-led growth efforts. You can follow along with Donny's data modeling technique for identity resolution in this dbt project repo.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Time Series Forecasting: What, Why, and, How?

ProjectPro

This blog introduces the concept of time series forecasting models in the most detailed form. First, there will be a simple introduction to highlight the significance of such models. Next, you will find a section that presents the definition of a time series forecasting article. After that, you will explore popular time-series-forecasting models. The blog's last two parts cover various use cases of these models and projects related to time series analysis and forecasting problems.

The Not-so-Sexy SQL Concepts to Make You Stand Out

KDnuggets

Databases are the houses of our data and data scientists HAVE TO HAVE A KEY! In this article, I discuss some lesser known concepts of SQL that data scientists do not familiarize themselves with.

SQL 116

Releasing Connexion to the Community

Zalando Engineering

Connexion is a Python framework that automagically handles HTTP requests based on the OpenAPI specification (formerly known as Swagger Spec) of your API, described in YAML format. Connexion allows you to write an OpenAPI specification, then maps the endpoints to your Python functions; this makes it unique, as many tools generate the specification based on your Python code.

Scala 52

Building the Business Case for DataOps

DataKitchen

The post Building the Business Case for DataOps first appeared on DataKitchen.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

Announcing the GA of Cloudera DataFlow for the Public Cloud on Microsoft Azure

Cloudera

After the launch of Cloudera DataFlow for the Public Cloud (CDF-PC) on AWS a few months ago, we are thrilled to announce that CDF-PC is now generally available on Microsoft Azure, allowing NiFi users on Azure to run their data flows in a cloud-native runtime. With CDF-PC, NiFi users can import their existing data flows into a central catalog from where they can be deployed to a Kubernetes-based runtime through a simple flow deployment wizard or with a single CLI command.

Cloud 117

Junior Data Scientist: The Next Level

KDnuggets

Junior, Mid-Level, and Senior Data Scientists differ in their level of experience. This article will go through the expectations for each job role and what is required to move up the ladder.

Data 118

Data pipeline asset management with Dataflow

Netflix Tech

by Sam Setegne, Jai Balani, Olek Gorajek. Glossary. asset: any business logic code in a raw (e.g. SQL) or compiled (e.g. JAR) form to be executed as part of the user defined data pipeline. data pipeline: a set of tasks (or jobs) to be executed in a predefined order (a.k.a. DAG) for the purpose of transforming data using some business logic. Dataflow ...

Principal Engineering at Zalando

Zalando Engineering

In many companies, Senior Engineers who do not pursue Engineering Management end up in a dead end in terms of their career progression. At Zalando, we have had a career path for individual contributors since 2016. Senior Software Engineers can choose one of the three possible career paths: Engineering Management, Principal Engineering, or Technical Program Management. In this post, we detail how we leverage our senior individual contributors (Principal Engineers) throughout the company.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Scale Your Spatial Analysis By Building It In SQL With Syntax Extensions

Data Engineering Podcast

Summary: Along with globalization of our societies comes the need to analyze the geospatial and geotemporal data that is needed to manage the growth in commerce, communications, and other activities. In order to make geospatial analytics more maintainable and scalable, there has been an increase in the number of database engines that provide extensions to their SQL syntax that support manipulation of spatial data.

SQL 100

5 Ways to Apply AI to Small Data Sets

KDnuggets

When applied correctly, AI algorithms can produce results on small data sets that are free of human errors and false results. Here are some methods to apply AI to small data sets.

Algorithm 113

Gartner® Recognizes Cloudera in Critical Capabilities for Cloud Database Management Systems for Operational Use Cases

Cloudera

Cloudera has been recognized as a Visionary in 2021 Gartner® Magic Quadrant for Cloud Database Management Systems (DBMS) and for the first time, evaluated CDP Operational Database (COD) against the 12 critical capabilities for Operational Databases. Overall, Gartner recognized 20 vendors for the Magic Quadrant of which 16 were evaluated in the 2021 Gartner Critical Capabilities for Cloud Database Management Systems for Operational Use Cases and 18 vendors for the 2021 Gartner Critical Capabil

Data Engineering Annotated Monthly – January 2022

Big Data Tools

Due to the public holidays in Russia and my own vacation time, I didn’t get a chance to write an Annotated for December. Waiting a little longer might not be such a bad thing in this case, because now we have even more interesting releases to talk about! Hi, I’m Pasha Finkelshteyn, and I’ll be your guide through this month’s news. I’ll offer my impressions of recent developments in the data engineering sector and highlight new ideas from the wider community.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

How To Join Data in MongoDB

Rockset

MongoDB is one of the most popular databases for modern applications. It enables a more flexible approach to data modeling than traditional SQL databases. Developers can build applications more quickly because of this flexibility and also have multiple deployment options, from the cloud MongoDB Atlas offering through to the open-source Community Edition.

MongoDB 52

KDnuggets™ News 22:n06, Feb 9: Data Science Programming Languages and When To Use Them; Complete Collection of Data Science Cheat Sheets

KDnuggets

Data Science Programming Languages and When To Use Them; The Complete Collection of Data Science Cheat Sheets – Part 1; Build a Web Scraper with Python in 5 Minutes; 8 Best Data Science Courses to Enroll in 2022 For Steep Career Advancement; Classifying Long Text Documents Using BERT.

Build a Web Scraper with Python in 5 Minutes

KDnuggets

In this article, I will show you how to create a web scraper from scratch in Python.

Python 150

Building a Visual Search Engine – Part 1: Data Exploration

KDnuggets

Ever wonder how Google or Bing finds similar images to your image? The algorithms for generating text based 10 blue-links are very different from finding visually similar or related images. In this article, we will explain one such method to build a visual search engine. We will use the Caltech 101 dataset which contains images of common objects used in daily life.

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.

The motivation behind using graph convolutions

KDnuggets

This article is an excerpt from Machine Learning with PyTorch and Scikit-Learn, the new book from the widely acclaimed and bestselling Python Machine Learning series, fully updated and expanded to cover PyTorch, transformers, graph neural networks, and best practices.

Data Mesh & Its Distributed Data Architecture

KDnuggets

Going forward, data professionals have found a new way to address the scalability of sources through data mesh.

Data Science Definition Humor: A Collection of Quirky Quotes Related to Data Science Definitions

KDnuggets

Read this collection of humorous, insightful quotes around data science that will hopefully brighten your day and make you laugh!

What’s Your Passion? Make It a Reality During Our Challenge

KDnuggets

TigerGraph’s Graph for All Million Dollar Challenge is now open to engineers, innovators, founders, and dreamers wanting to transform their vision into reality - and potentially claim a piece of the $1 million prize pool at the same time.

IT 81

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.