Sat.Jul 03, 2021 - Fri.Jul 09, 2021


Airflow on Kubernetes: Get started in 10 mins

Marc Lamberti

Airflow on Kubernetes is quite popular, isn’t it? There is a good chance that you know Kubernetes, that you even have a Kubernetes cluster, and that you would like to deploy and run Airflow on it. However, Kubernetes is hard. There are so many things to deal with that it can be really laborious just to deploy an application. Fortunately for us, some super smart people have created Helm.
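As a hedged illustration of the Helm route, the sketch below assembles (and, if asked, runs) the Helm commands commonly used to install Airflow from the Apache Airflow project's chart. The repository URL `https://airflow.apache.org`, the chart name `apache-airflow/airflow`, and the release and namespace name `airflow` are assumptions, not taken from the article; check them against the current chart documentation before relying on them.

```python
import shlex
import subprocess

# Assumed values: the Apache Airflow chart repository and a
# release/namespace both called "airflow". Adjust for your cluster.
CHART_REPO_NAME = "apache-airflow"
CHART_REPO_URL = "https://airflow.apache.org"
RELEASE = "airflow"
NAMESPACE = "airflow"


def helm_commands(release: str = RELEASE, namespace: str = NAMESPACE) -> list[list[str]]:
    """Return the Helm invocations (as argument lists) that add the chart repo and install Airflow."""
    return [
        ["helm", "repo", "add", CHART_REPO_NAME, CHART_REPO_URL],
        ["helm", "repo", "update"],
        [
            "helm", "install", release, f"{CHART_REPO_NAME}/airflow",
            "--namespace", namespace, "--create-namespace",
        ],
    ]


def run(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in helm_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # Requires helm on PATH and a kubeconfig pointing at your cluster.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dry_run=True)
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting problems and makes the dry run a faithful preview of what would be executed.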


Elastic Distributed Training with XGBoost on Ray

Uber Engineering

Since we productionized distributed XGBoost on Apache Spark™ at Uber in 2017, XGBoost has powered a wide spectrum of machine learning (ML) use cases at Uber, spanning from optimizing marketplace dynamic pricing policies for Freight, improving times of …


Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager

Data Engineering Podcast

Summary At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a large fraction of a data team’s energy, so it is important to pick something that provides the power and flexibility that you need. SaaSGlue is a managed service that lets you connect all of your systems, across clouds and physical infrastructure, and spanning all of your programming languages.


Reflecting on Cloudera’s Commitment to Address Workplace Inequality: One Year Later

Cloudera

It’s been a year of awakening and change across the U.S. and around the world. One year ago our CEO Rob Bearden vowed to take decisive action to make Cloudera a more diverse, equitable, and inclusive place to work, and to have Cloudera take an active role in promoting those attributes in the tech industry and our communities. There is no one-size-fits-all solution to creating an intentional and strategic plan for a diverse workforce.


Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.


What to Look Forward to at Kafka Summit APAC

Confluent

Kafka Summit, now in its sixth year, is coming to Asia-Pacific! After launching in the U.S. in 2016 and in Europe in 2018, Kafka Summit APAC will feature speakers and […].


Tired of First Dates? How to Build a Long-Term Relationship with Data

Teradata

Integrating data from R&D to customer experience and the after-market can deliver stand-out returns for auto companies. But how to go about it? Find out more.


4 Considerations When Building Your Government Data Strategy

Cloudera

If you’ve followed Cloudera for a while, you know we’ve long been singing the praises (or harping on the importance, depending on perspective) of a solid, standalone enterprise data strategy. While a data strategy is certainly not a new concept, government missions are wholly dependent on real-time access to and analysis of data, wherever it may be (legacy data centers or public cloud), to render insights that support operational decisions.


5 Can't Miss MongoDB.live Talks

Rockset

MongoDB.live is coming up on July 13-14, and we're going to be there! As with last year, it's going to be a virtual conference, so register (for free), find a comfy spot and surf the numerous sessions available to anyone interested in the MongoDB ecosystem. We spend a lot of time thinking about running analytics on MongoDB, as do many MongoDB users we speak with.


Open Finance and Smart Ecosystems Won’t Wait for Banks

Teradata

Smart Ecosystems deliver innovation in financial services – converting a product-based industry to a continuum in financial services. Find out more.


The Ultimate Guide to Data Quality

Monte Carlo

Companies spend upwards of $15 million per year firefighting bad data, with data engineering teams spending 30-50 percent of their time tackling broken pipelines, errant models, and stale dashboards. It’s no secret: data quality isn’t given the diligence it deserves. Fortunately, some of the best data teams are investing in new, smarter approaches to solving it.


From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.


Two Ways to Migrate Hortonworks DataFlow to Cloudera Flow Management

Cloudera

Hortonworks DataFlow (HDF) 3.5.2 was released at the end of 2020. New releases will not continue under HDF, as Cloudera is bringing the best and latest of Apache NiFi to the new Cloudera Flow Management (CFM) product. Getting the latest improvements and new features of NiFi is one of many reasons for you to move your legacy deployments of NiFi to this new platform.


Automating Databricks with Terraform

Scribd Technology

The long-term success of our data platform relies on putting tools into the hands of developers and data scientists to “choose their own adventure”. A big part of that story has been Databricks, which we recently integrated with Terraform to make it easy to scale a top-notch developer experience. At the 2021 Data and AI Summit, Core Platform infrastructure engineer Hamilton Hord and Databricks engineer Serge Smertin presented on the Databricks Terraform provider and how it’s been used by Scribd.


RudderStack Product News Vol. #008 - UI Refresh and New Integrations

RudderStack

This month's RudderStack product updates cover a UI refresh and new integrations: new Product, Advertising, Analytics, Customer Success, and Data Infrastructure updates.


Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.


Cloudera Operational Database Replication in a Nutshell

Cloudera

In a previous blog post, we provided a high-level overview of the Cloudera Replication Plugin, explaining how it brings cross-platform replication with little configuration. In this post, we will cover how this plugin can be applied in CDP clusters and explain how it enables strong authentication between systems that do not share mutual authentication trust.


15 Neural Network Projects Ideas for Beginners to Practice 2023

ProjectPro

A curated list of interesting, simple, and cool neural network project ideas for beginners and professionals looking to make a career transition into machine learning or deep learning in 2021. Table of contents: Top 15 Neural Network Project Ideas for 2023; What Is a Neural Network?; Applications of Neural Networks; Why Is Building Neural Network Projects the Best Way to Learn Deep Learning?


How to Handle Database Joins in Apache Druid vs Rockset

Rockset

Apache Druid is a real-time analytics database, providing business intelligence to drive clickstream analytics, analyze risk, monitor network performance, and more. When Druid was introduced in 2011, it did not initially support joins, but a join feature was added in 2020. This is important because it’s often helpful to include fields from multiple Druid files — or multiple tables in a normalized data set — in a single query, providing the equivalent of an SQL join in a relational database.
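To make “the equivalent of an SQL join” concrete, here is a minimal in-memory hash join in plain Python. It is not Druid’s or Rockset’s implementation, and the tables and fields (`clicks`, `users`, `user_id`, `page`, `country`) are hypothetical; it only shows what an inner join produces.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def hash_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Inner-join two lists of rows on `key`, similar to
    SELECT * FROM left JOIN right USING (key) in SQL."""
    # Build phase: index the right-hand table by the join key.
    index: dict[Any, list[Row]] = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Probe phase: for each left row, emit one merged row per matching right row.
    joined: list[Row] = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical fact table (clickstream events) and dimension table (users).
clicks = [
    {"user_id": 1, "page": "/home"},
    {"user_id": 2, "page": "/pricing"},
    {"user_id": 1, "page": "/docs"},
    {"user_id": 3, "page": "/blog"},  # no matching user, so dropped by the inner join
]
users = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": "DE"},
]

for r in hash_join(clicks, users, "user_id"):
    print(r)
```

Building the hash index on the smaller (dimension) table and probing with the larger one is the usual choice, since only the smaller side has to be held in memory.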


Propensity Model: How to Predict Customer Behavior Using Machine Learning

AltexSoft

It’s common practice for companies and their marketing teams to try to estimate how likely certain groups of customers are to act in a given way under certain circumstances. For this purpose, they create propensity models. When these models are built in a traditional statistical fashion, the predictions they provide aren’t always accurate. To help companies unlock the full potential of personalized marketing, propensity models should use the power of machine learning technologies.
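As a rough sketch of the machine-learning approach, the snippet below trains a logistic-regression propensity model with plain batch gradient descent and scores customers by their estimated probability of taking an action (for example, purchasing). The two features (email open rate and share of sessions on a pricing page, both scaled to 0–1) and the labels are hypothetical toy values; a production model would typically use a library such as scikit-learn and far richer data.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w: list[float], x: list[float]) -> float:
    """Propensity: estimated probability that a customer with features x takes the action."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_propensity(X: list[list[float]], y: list[int],
                     lr: float = 0.5, epochs: int = 5000) -> list[float]:
    """Fit logistic-regression weights (bias first) by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi  # derivative of log loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical features: [email open rate, share of sessions on pricing page]; label: purchased.
X = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.2], [0.7, 0.6], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 0, 1, 1, 1]

w = train_propensity(X, y)
for customer in ([0.15, 0.05], [0.85, 0.8]):
    print(customer, round(score(w, customer), 3))
```

The output is a probability rather than a hard yes/no, which is what lets marketing teams rank customers and target the most likely responders first.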


Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.


Democratize Data Cleaning Across Your Organization With Trifacta

Data Engineering Podcast

Summary Every data project, whether it’s analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and benefits from being accessible to the domain experts. Trifacta is a platform for managing your data engineering workflow to make curating, cleaning, and preparing your information more approachable for everyone in the business.


Top 10 Deep Learning Algorithms in Machine Learning [2023]

ProjectPro

When firing questions at Siri or Alexa, people often wonder how machines achieve super-human accuracy. It’s all thanks to deep learning, an incredibly intimidating area of data science. Deep learning methods are inspired by the way networks of neurons function in the human brain. With the help of natural language processing (NLP) tools, deep learning has led to the development of exciting artificial intelligence applications like language recognition, autonomous vehicles, and computer vision.


Apache Kafka Architecture and Its Components-The A-Z Guide

ProjectPro

A detailed introduction to Apache Kafka Architecture, one of the most popular messaging systems for distributed applications. The first COVID-19 cases were reported in the United States in January 2020. By the end of the year, over 200,000 cases were reported per day, which climbed to 250,000 cases in early 2021. Responding to a pandemic on such a large scale involves technical and public health challenges.


15 Deep Learning Projects Ideas for Beginners to Practice 2023

ProjectPro

As a beginner in the data industry, it can be overwhelming to step into AI and deep learning. After taking a deep learning course or two, you might find yourself stuck on how to proceed. You don't know what to learn next because you have theoretical knowledge of the concepts but no hands-on experience working with diverse deep learning frameworks and tools. This article will break down the steps you can take to enhance your deep learning skills.


How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating