Sat. Jan 16, 2021 - Fri. Jan 22, 2021

The last (but not least) “ops” you need for your data: DataGovOps

François Nguyen

To finish the trilogy (DataOps, MLOps), let’s talk about DataGovOps, or how you can support your data governance initiative. The origin of the term: DataKitchen. We must give credit to Chris Bergh and his team at DataKitchen. You should visit their website; you will find incredibly good stuff there. The article was published in October 2020 with the title “Data Governance as Code”. The idea behind it is a governance practice that “actively promotes the safe use of data with automation

How to unit test sql transforms in dbt

Start Data Engineering

Contents: Introduction; Setup; Code; Conditional logic to read from mock input; Custom macro to test for equality; Set up environment-specific test; Run ELT using dbt; Conclusion; Further reading.

With the recent advancements in data warehouses and tools like dbt, most transformations (the T of ELT) are being done directly in the data warehouse. While this provides a lot of functionality out of the box, it gets tricky when you want to test your SQL code locally before deploying to production.
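
A minimal sketch of the underlying idea, unit testing a SQL transform against mock input, is shown below in plain Python with the standard-library `sqlite3` module rather than dbt. The `orders` table, its columns, and the helper names are hypothetical, and the article's own approach (dbt macros and environment-specific configuration) is not reproduced here.

```python
import sqlite3
from collections import Counter

# Hypothetical transform under test: total order amount per customer.
TRANSFORM_SQL = """
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
"""


def run_transform(mock_orders):
    """Load mock input rows into an in-memory database and run the transform."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", mock_orders)
        return conn.execute(TRANSFORM_SQL).fetchall()
    finally:
        conn.close()


def assert_same_rows(actual, expected):
    """Order-insensitive equality check, in the spirit of an equality test macro."""
    if Counter(actual) != Counter(expected):
        raise AssertionError(f"rows differ: actual={actual!r} expected={expected!r}")


if __name__ == "__main__":
    mock = [(1, 10.0), (1, 5.0), (2, 7.5)]
    expected = [(1, 15.0), (2, 7.5)]
    assert_same_rows(run_transform(mock), expected)
    print("transform test passed")
```

Comparing row multisets rather than ordered lists keeps the test independent of the row order the database happens to return.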

Helpful Tools for Apache Kafka Developers

Confluent

Apache Kafka® is at the core of a large ecosystem that includes powerful components, such as Kafka Connect and Kafka Streams. This ecosystem also includes many tools and utilities that […].

Using Your Data Warehouse As The Source Of Truth For Customer Data With Hightouch

Data Engineering Podcast

Summary: The data warehouse has become the central component of the modern data stack. Building on this pattern, the team at Hightouch has created a platform that synchronizes information about your customers out to third-party systems for use by marketing and sales teams. In this episode, Tejas Manohar explains the benefits of sourcing customer data from one location for your whole organization to use, the technical challenges of synchronizing the data to external systems with varying APIs, and
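
One common way to reduce API calls when syncing warehouse rows to a third-party tool is to diff the latest query result against the last-synced snapshot and push only the changes. The sketch below assumes snapshots keyed by a hypothetical customer id; it illustrates that general technique and is not Hightouch's implementation.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def diff_snapshots(
    previous: Dict[str, Row], current: Dict[str, Row]
) -> Tuple[List[Row], List[Row], List[str]]:
    """Compare two snapshots keyed by a (hypothetical) customer id.

    Returns (added, changed, removed_ids) so that only deltas are pushed
    to the external system instead of the full table.
    """
    added = [row for key, row in current.items() if key not in previous]
    changed = [
        row for key, row in current.items() if key in previous and previous[key] != row
    ]
    removed = [key for key in previous if key not in current]
    return added, changed, removed


if __name__ == "__main__":
    last_sync = {"c1": {"id": "c1", "plan": "free"}, "c2": {"id": "c2", "plan": "pro"}}
    warehouse = {"c1": {"id": "c1", "plan": "pro"}, "c3": {"id": "c3", "plan": "free"}}
    added, changed, removed = diff_snapshots(last_sync, warehouse)
    # Each list would then be mapped onto the destination's own API calls.
    print(added, changed, removed)
```

Each of the three lists would then be translated into create, update, and delete calls for a given destination, which is where per-destination API differences come into play.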

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models

Cloudera

In this last installment, we’ll discuss a demo application that uses PySpark.ML to build a classification model from training data stored in both Cloudera’s Operational Database (powered by Apache HBase) and Apache HDFS. The model is then scored and served through a simple web application. For more context, this demo is based on concepts discussed in the blog post “How to deploy ML models to production”.

Do You Need a DataOps Dojo?

DataKitchen

As DataOps activity takes root within an enterprise, managers face the question of whether to build centralized or decentralized DataOps capabilities. Centralizing analytics brings it under control, but granting analysts free rein is necessary to foster innovation and stay competitive. The beauty of DataOps is that you don’t have to choose between centralization and freedom.

What is the Business Case for Delivering a Good Customer Experience at Your Bank?

Teradata

Most banks talk about developing great customer experiences but don't understand the value that investment would deliver. Learn about the 6 key capabilities banks require to address this problem.

Digital Transformation is a Data Journey From Edge to Insight

Cloudera

Digital transformation is a hot topic for all markets and industries because it is delivering value at explosive growth rates. Consider that manufacturing’s Industrial Internet of Things (IIoT) was valued at $161 billion with an impressive 25% growth rate, that the connected car market will be valued at $225 billion by 2027 with a 17% growth rate, or that in the first three months of 2020 alone, retailers realized ten years’ worth of digital sales penetration.

Demo: Supercharging Data Engineering with Magpie for Snowflake®

Silectis

For those using a robust analytics database, such as the Snowflake® Data Cloud, adding the power of a data engineering platform can help maximize the value you’re getting out of that database. In this demo, we’ll show you how native tools in the Magpie data engineering platform play well with Snowflake, ultimately allowing your team to do more in a centralized data engineering environment.

Event Streaming Across Networks and Corporate Firewalls Using PubNub and Confluent Platform

Confluent

This year’s pandemic has forced businesses all around the world to adopt a “remote-first” approach to their operations, with an emphasis on better enabling collaboration, remote work, and productivity. This […].

Digital Payments Data Drives Increased Usage and Customer Retention

Teradata

Payment data drives opportunities to increase usage and prevent attrition through hyper-segmentation, personalized interactions, and optimized rewards programs.

Fostering community to help drive cultural change

Cloudera

2020 put on full display how humanity shows up in times of hardship. We saw everything from street celebrations to usher weary medical personnel home after long days fighting to save lives to places like food banks receiving more donations and volunteers than ever before. Some communities were harder hit than others, and we’ve seen the same in the global workplace.

Storing Cold Metadata, Snowflake Data Cloud, and More: Top 10 Links From Across the Web

Data Council

Here's our January 2021 roundup of links from across the web that could be relevant to you: 1. Storing Cold Metadata with Alki (Dropbox). Dropbox shared insights into Alki, the petabyte-scale metadata store it designed for infrequently accessed metadata (“cold data”). The post details how Edgestore, its one-size-fits-all database, was reaching capacity limits, and why audit logs were a good candidate to move off costly SSDs.

Hepta Analytics Microsoft Silver Partner

Hepta Analytics

Hepta Analytics is proud to announce that we have attained Silver status within the Microsoft Partner Network! This achievement means that we have demonstrated our proven expertise in delivering quality solutions in one or more specialized areas of business (namely Cloud Platform and, in the future, Data Analytics and Security). Microsoft competencies are designed to prepare companies to meet their customers’ needs, and to help attract new customers who are looking for Microsoft-certified sol

Head Pose Estimation with Computer Vision

InData Labs

Recently, head pose estimation has become a popular area of research. Data scientists have spent over 20 years researching the most effective approaches to it, yet they haven’t settled on one. The technology is needed for facial recognition, eye gaze estimation, and emotion recognition. For instance, it can be used for safety monitoring on the road. The post Head Pose Estimation with Computer Vision first appeared on InData Labs.

How Does UX Design Help in Visualizing Big Data?

Teradata

Learn about the UX principles that help in designing effective Big Data visualizations so users can better understand data and make more informed decisions.

Better to Be Wrong Than Vague: Apache Kafka and Data Architecture Predictions for 2021

Confluent

On a recent episode of Streaming Audio, Gwen Shapira, Michael Noll, and Ben Stopford joined me to hold forth about the near future of Apache Kafka® and software architecture in […].

Cloudera Cares Speaker Series guiding value: Diversity

Cloudera

With intention and creativity, we opened eyes and minds. What now seems like a lifetime ago, our worlds were upended. As stay-at-home orders were extended again and again and we continued to work from home, many of us were faced with reimagining our work. For me, an unexpected challenge as head of Cloudera Cares has been redesigning the employee volunteer experience to keep Clouderans engaged even when in-person activities were no longer possible.

Creating a uniform landscape for macOS Software

Zalando Engineering

At the time of this writing, we have a universe of identified and version-inventoried Mac applications across Zalando's fleet of a little over 3,000 Mac devices. A subset of them, selected by importance, frequency of updates, or size of the install base, is part of a so-called software lifecycle. However, in July 2019, when a vulnerability was discovered in Zoom (long before it became the mainstream video conference app during the COVID-19 pandemic), Information S

2020 Visual Recap of the Apache Superset Project

Preset

The Apache Superset project experienced a critical growth period in 2020 in all aspects. In this post, I'll document how the key facets of the project changed last year.

How to Build a Successful Cloud DataOps Program

DataKitchen

The post How to Build a Successful Cloud DataOps Program first appeared on DataKitchen.

Optimizing the Aural Experience on Android Devices with xHE-AAC

Netflix Tech

By Phill Williams and Vijay Gondi. At Netflix, we are passionate about delivering great audio to our members. We began streaming 5.1 channel surround sound in 2010, Dolby Atmos in 2017, and adaptive bitrate audio in 2019. Continuing in this tradition, we are proud to announce that Netflix now streams Extended HE-AAC with MPEG-D DRC (xHE-AAC) to compatible Android mobile devices (Android 9 and newer).

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

Do you need faster time to value? Does your organization’s success depend on immediate delivery of new reports, applications, or projects? When you go to Central IT for support, are you blocked by insanely long wait times for the resources needed to meet your business goals? If so – you are likely one of the growing group of Line of Business (LoB) professionals forced into creating your own solution – creating your own Shadow IT.

Apache Superset 1.0 is out!

Preset

The best Superset release to date is finally out

Elasticsearch or Rockset for Real-Time Analytics: Managing Clusters vs Going Serverless

Rockset

Having the right analytics backend for your real-time application makes all the difference when it comes to how much time your team spends managing and maintaining the underlying infrastructure. Today, distributed systems that used to require a lot of manual intervention can often be replaced by more operationally efficient solutions. One example of this evolution is the move from Elasticsearch, which has been a great open-source, full-text search and analytics engine, to a low-ops alternative in

Defer Transaction Side-Effects in Node.js

Grouparoo

At Grouparoo, we use Actionhero as our Node.js API server and Sequelize as our Object Relational Mapping (ORM) tool, making it easy to work with complex records from our database. Within our Actions and Tasks, we often want to treat the whole execution as a single database transaction: either all the modifications to the database succeed or they all fail as a unit.
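
The general pattern of deferring side effects until a transaction commits can be sketched in a few lines. The example below uses Python and `sqlite3` rather than Node.js and Sequelize, and the `DeferredTransaction` class is a hypothetical helper, not Grouparoo's or Actionhero's API: queued callbacks run only after `COMMIT` and are discarded on `ROLLBACK`.

```python
import sqlite3


class DeferredTransaction:
    """Hypothetical helper: wrap DB writes in one transaction and queue
    side effects (emails, webhooks, background jobs) until after COMMIT."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self._after_commit = []

    def defer(self, fn, *args):
        # Queue the side effect instead of running it inside the transaction.
        self._after_commit.append((fn, args))

    def __enter__(self):
        self.conn.execute("BEGIN")
        return self

    def __exit__(self, exc_type, exc, tb):
        callbacks, self._after_commit = self._after_commit, []
        if exc_type is not None:
            # Roll back and drop every queued side effect; the exception propagates.
            self.conn.execute("ROLLBACK")
            return False
        self.conn.execute("COMMIT")
        # Only now is it safe for the outside world to observe the change.
        for fn, args in callbacks:
            fn(*args)
        return False


if __name__ == "__main__":
    # isolation_level=None: the helper issues BEGIN/COMMIT/ROLLBACK itself.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, email TEXT)")
    sent = []
    with DeferredTransaction(conn) as tx:
        conn.execute("INSERT INTO records (email) VALUES (?)", ("a@example.com",))
        tx.defer(sent.append, "welcome email to a@example.com")
    print(sent)
```

Opening the connection with `isolation_level=None` leaves transaction control entirely to the explicit `BEGIN`, `COMMIT`, and `ROLLBACK` statements issued by the helper.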

Cloudera Flow Management Continuous Delivery while Minimizing Downtime

Cloudera

Cloudera Flow Management, based on Apache NiFi and part of the Cloudera DataFlow platform, is used by some of the largest organizations in the world as an easy-to-use, powerful, and reliable way to distribute and process data at high velocity in the modern big data ecosystem. Increasingly, customers are adopting CFM to accelerate their enterprise streaming data processing from concept to implementation.

Finding digital transformation in high places – how a ski resort improved operational agility and customer experiences

Cloudera

Most of my past blog posts have focused on Industry 4.0’s digital transformation of the manufacturing industry, which in itself is pretty remarkable. By 2025, Industry 4.0 is expected to generate more than $11 trillion in economic value as connected manufacturing processes, operations, and supply chains become more streamlined, efficient, and agile, delivering improved productivity, uptime, and product quality.

How to configure clients to connect to Apache Kafka Clusters securely – Part 3: PAM authentication

Cloudera

In the previous posts in this series, we have discussed Kerberos and LDAP authentication for Kafka. In this post, we will look into how to configure a Kafka cluster to use a PAM backend instead of an LDAP one. The examples shown here will highlight the authentication-related properties in bold font to differentiate them from other required security properties, as in the example below.
