Sat. Dec 05, 2020 - Fri. Dec 11, 2020

Transactional Machine Learning at Scale with MAADS-VIPER and Apache Kafka

Confluent

This blog post shows how transactional machine learning (TML) integrates data streams with automated machine learning (AutoML), using Apache Kafka® as the data backbone, to create a frictionless machine learning […].

Proven Patterns For Building Successful Data Teams

Data Engineering Podcast

Summary: Building data products is complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is also challenging because of the number of roles and capabilities needed to go from idea to delivery. Different organizations have tried a multitude of organizational strategies to improve the success rate of these data teams, with mixed results.

3x better performance with CDP Data Warehouse compared to EMR in TPC-DS benchmark

Cloudera

In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 benchmark. Amazon recently announced their latest EMR version 6.1.0 with support for ACID transactions.

Books to level up your data skills!

Start Data Engineering

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

Getting Started with Scala and Apache Kafka

Confluent

If you’re getting started with Apache Kafka® and event streaming applications, you’ll be pleased to see the variety of languages available to start interacting with the event streaming platform. It […].

Data.What? What Can I Buy in a Data Marketplace?

Teradata

How does a Data Marketplace relate to Data Sharing? Here's a hint: enabling both internal and external users to access integrated data on demand to bring agility to business.

What I've Learned in 2020: A Technical Version

Rockset

I'm on paternity leave until the end of the year because my daughter is on the way, and since I have a little time left before things get really busy, I want to reflect on how I've grown as an engineer in 2020. I left Facebook at the end of 2019 to join Rockset, and it has been a fun year. For those who don't know, Rockset is a real-time analytics database.

How to Run Apache Kafka on Windows

Confluent

Is Windows your favorite development environment? Do you want to run Apache Kafka® on Windows? Thanks to the Windows Subsystem for Linux 2 (WSL 2), now you can, and with […].

Looking Forwards Not Backwards: New Ways of Working for the CFO

Teradata

The bold CFO who steps into the breach and takes ownership of the bank’s data asset can transform the way they work and add massive value.

Cost Conscious Data Warehousing with Cloudera Data Platform

Cloudera

Why worry about costs with cloud-native data warehousing? Have you been burned by the unexpected costs of a cloud data warehouse? If so, you know about the failed economics of some cloud-native solutions on the market today. If not, before adopting a cloud data warehouse, consider the true costs of a cloud-native data warehouse. Data warehouses have been broadly adopted to provide timely reports and valuable insights.

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

Preset Getting Started Guide is Now Available

Preset

The end-user documentation takes you step by step through the entire Preset Cloud onboarding experience, from connecting your data to building your very first chart and dashboard.

Apache Kafka Lag Monitoring at AppsFlyer

Confluent

This article covers one crucial piece of every distributed system: visibility. At AppsFlyer, we call ourselves metrics obsessed and truly believe that you cannot know what you cannot see. We […].
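
Consumer lag, the quantity Kafka lag monitoring is built around, is the gap between a partition's latest (log-end) offset and the offset the consumer group has committed. A minimal Python sketch of that arithmetic, assuming the offsets have already been fetched from the cluster; the topic name and offset values are made-up inputs, and this is not AppsFlyer's implementation:

```python
from typing import Dict, Optional, Tuple

# (topic, partition) pair identifying one Kafka partition
TopicPartition = Tuple[str, int]


def consumer_lag(
    end_offsets: Dict[TopicPartition, int],
    committed: Dict[TopicPartition, Optional[int]],
) -> Dict[TopicPartition, int]:
    """Per-partition lag = log-end offset minus the group's committed offset."""
    lag: Dict[TopicPartition, int] = {}
    for tp, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts as lagging
        # from offset 0 (real behavior depends on auto.offset.reset).
        done = committed.get(tp) or 0
        # Clamp at zero in case the two offsets were sampled at different times.
        lag[tp] = max(end - done, 0)
    return lag


# Hypothetical offsets for a topic named "events" with two partitions.
end = {("events", 0): 1500, ("events", 1): 980}
committed = {("events", 0): 1450}
lag = consumer_lag(end, committed)
print(lag)               # {('events', 0): 50, ('events', 1): 980}
print(sum(lag.values()))  # 1030
```

Keeping the per-partition values, rather than only the total, makes it visible when a single partition is falling behind.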

The Economic Value of Supply Chain Investments

Teradata

What is the impact of adjusting various supply chain levers on a company's stock price? How do they impact shareholder value?

Covid Data: An anomalous blip, or the new normal?

Cloudera

COVID-19 has forced virtually every industry to embrace an acceleration in digital capabilities. While it can be argued that digital transformation was already underway, it’s hard to dispute that it has accelerated in recent months. A recent McKinsey survey, cited in CRN, shows that worldwide, 58 percent of customer interactions were digital as of July 2020.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Supporting content decision makers with machine learning

Netflix Tech

by Melody Dye*, Chaitanya Ekanadham*, Avneesh Saluja*, Ashish Rastogi* (*contributed equally). Netflix is pioneering content creation at an unprecedented scale. Our catalog of thousands of films and series caters to 195M+ members in over 190 countries who span a broad and diverse range of tastes. Content, marketing, and studio production executives make the key decisions that aspire to maximize each series’ or film’s potential to bring joy to our subscribers as it progresses from pitch to play […]

Booking’s Journey with Brotli

Booking.com Engineering

Booking.com’s Journey with Brotli: The challenges of improving performance in a complex environment. The Transfăgărășan road in Romania is known for its jaw-dropping views. But you’re gonna have to work for it. (Photo CC BY-SA 2.0 by Antony Stanley, from Flickr.) Brotli is a lossless compression algorithm, designed and released by Google for use on the web.

How Much Security Is Too Much Security?

Teradata

In these budget conscious times, how much security is too much security? That depends on how much you value your data.

2020 Data Impact Award Winner Spotlight: Globe Telecom

Cloudera

It’s been a few weeks since we celebrated the 2020 Data Impact Awards, and everyone at Cloudera is still on a high. It was a brilliant event, and we are so pleased we were able to celebrate our fantastic customers virtually. Thank you again to all those who tuned in! The Connect the Data Lifecycle award was our fifth award at this year’s ceremony.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

How Netflix Scales its API with GraphQL Federation (Part 2)

Netflix Tech

In our previous post and QConPlus talk, we discussed GraphQL Federation as a solution for distributing our GraphQL schema and implementation. In this post, we shift our attention to what is needed to run a federated GraphQL platform successfully, from our journey implementing it to lessons learned. Our Journey so Far: Over the past year, we’ve implemented the core infrastructure pieces necessary for a federated GraphQL architecture as described in our previous post. Studio Edge Architecture […]

How Cloudera Supports Government Data Encryption Standards

Cloudera

As part of our ongoing commitment to supporting government regulations and standards in our enterprise solutions, including data protection, Cloudera recently introduced a version of our Cloudera Data Platform, Private Cloud Base product (7.1.5 release) that can be configured to use FIPS-compliant cryptography. We have accomplished this significant improvement by supporting the deployment of Cloudera Data Platform (CDP) Private Cloud Base on FIPS-mode-enabled Red Hat Enterprise Linux (RH […]

Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance

Cloudera

There are lessons to be learned from the brick-and-mortar and pure-play digital retailers that have succeeded amid the Covid-19 chaos. While the pandemic’s stress test of e-commerce, in-store insights, supply chain visibility, and fulfillment capabilities has revealed shortcomings as well as long-lasting consumer experiences, it has also allowed many companies to pivot to very successful strategies built on enterprise data and the digitization efforts that accompany it.

How to configure clients to connect to Apache Kafka Clusters securely – Part 2: LDAP

Cloudera

In the previous post, we talked about Kerberos authentication and explained how to configure a Kafka client to authenticate using Kerberos credentials. In this post we will look into how to configure a Kafka client to authenticate using LDAP, instead of Kerberos. We will not cover the server-side configuration in this article but will add some references to it when required to make the examples clearer.
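
On the client side, Kafka's LDAP authentication is commonly exposed through the SASL/PLAIN mechanism: the client presents a username and password, and the broker (whose configuration is not covered here) checks them against LDAP. The sketch below renders a Java-client properties file under that assumption. The helper name `ldap_client_properties`, the host name, credentials, and truststore path are placeholders; the property keys themselves are standard Kafka client settings.

```python
def ldap_client_properties(
    bootstrap_servers: str,
    username: str,
    password: str,
    truststore_path: str,
    truststore_password: str,
) -> str:
    """Render Kafka Java-client properties for SASL/PLAIN over TLS.

    Assumption: the brokers validate PLAIN credentials against LDAP.
    """
    props = {
        "bootstrap.servers": bootstrap_servers,
        # PLAIN sends the password as-is, so it should only travel over TLS.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": (
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            f'username="{username}" password="{password}";'
        ),
        # Truststore holding the CA that signed the brokers' certificates.
        "ssl.truststore.location": truststore_path,
        "ssl.truststore.password": truststore_password,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Placeholder values; replace with your cluster's details.
print(ldap_client_properties(
    "broker1.example.com:9093", "alice", "changeme",
    "/path/to/truststore.jks", "truststore-secret",
))
```

Because the rendered file contains a plaintext password, it should be stored with restrictive file permissions.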

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Medibank

Teradata

Teradata Vantage on AWS transforms private healthcare company to create “Better Health for Better Lives.”

Global View Distributed File System with Mount Points

Cloudera

Apache Hadoop Distributed File System (HDFS) is the most popular file system in the big data world. The Apache Hadoop File System interface has provided integration with many other popular storage systems, such as Apache Ozone, S3, and Azure Data Lake Storage. Some HDFS users want to extend HDFS NameNode capacity by configuring NameNode federation.
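
Conceptually, a mount-point view works like a client-side mount table: each path prefix maps to a target file system (for example, one federated NameNode or an S3 bucket), and every client path is routed to the mount point that covers it. The toy resolver below illustrates only that routing idea; it is not Hadoop's ViewFileSystem code, and the `MOUNT_TABLE`, NameNode URIs, and bucket name are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical mount table: client-visible path -> target file system URI.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "hdfs://nn1.example.com:8020/user",
    "/data": "hdfs://nn2.example.com:8020/data",
    "/archive": "s3a://archive-bucket",
}


def resolve(path: str, table: Dict[str, str] = MOUNT_TABLE) -> Tuple[str, str]:
    """Return (mount point, target URI) for a client path."""
    parts = [p for p in path.split("/") if p]
    # Walk from the longest prefix down, comparing whole path components so
    # that "/database" does not match the "/data" mount point. With
    # non-nested mount points, at most one prefix can match.
    for i in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:i])
        if prefix in table:
            remainder = "/".join(parts[i:])
            target = table[prefix] + ("/" + remainder if remainder else "")
            return prefix, target
    raise FileNotFoundError(f"no mount point covers {path!r}")


print(resolve("/data/events/2020/12/05"))
# ('/data', 'hdfs://nn2.example.com:8020/data/events/2020/12/05')
print(resolve("/archive/2019"))
# ('/archive', 's3a://archive-bucket/2019')
```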

Federated Learning, Machine Learning, Decentralized Data

Cloudera

Two years ago we wrote a research report about Federated Learning. We’re pleased to make the report available to everyone, for free. You can read it online here: Federated Learning. Federated Learning is a paradigm in which machine learning models are trained on decentralized data. Instead of collecting data on a single server or data lake, it remains in place — on smartphones, industrial sensing equipment, and other edge devices — and models are trained on-device.
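
As a rough illustration of training where the data lives, here is a minimal federated-averaging (FedAvg) sketch. FedAvg is one common federated learning algorithm, not necessarily the one the report focuses on. Each hypothetical device fits a one-parameter linear model y ~ w * x on its own data, and only the updated weight, never the raw data, goes back to the server, which averages the weights according to each device's sample count. The learning rate, epoch count, and device data are illustrative assumptions.

```python
from typing import List, Tuple

# One device's private data: (x, y) pairs that never leave the device.
Dataset = List[Tuple[float, float]]


def local_update(w: float, data: Dataset, lr: float = 0.05, epochs: int = 5) -> float:
    """Run a few steps of full-batch gradient descent on-device for y ~ w * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error (1/n) * sum((w*x - y)**2) w.r.t. w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(devices: List[Dataset], rounds: int = 20) -> float:
    """Server loop: broadcast w, collect locally trained weights, average them."""
    w = 0.0
    total = sum(len(d) for d in devices)
    for _ in range(rounds):
        # Each device trains on its own data; only the resulting weight is shared.
        local_weights = [local_update(w, d) for d in devices]
        # Weighted average: devices with more samples count proportionally more.
        w = sum(lw * len(d) for lw, d in zip(local_weights, devices)) / total
    return w


# Hypothetical devices whose data all follow y = 2x.
devices = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0)],
]
w = federated_averaging(devices)
print(f"{w:.4f}")  # 2.0000
```

Real deployments add client sampling, secure aggregation, and handling for devices that drop out; the sketch keeps only the train-locally, average-centrally loop.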

Toward a Better Quality Metric for the Video Community

Netflix Tech

by Zhi Li, Kyle Swanson, Christos Bampis, Lukáš Krasula and Anne Aaron. Over the past few years, we have been striving to make VMAF a more usable tool not just for Netflix, but for the video community at large. This tech blog highlights our recent progress toward this goal. VMAF is a video quality metric that Netflix jointly developed with a number of university collaborators and open-sourced on GitHub.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.