March, 2022

End-to-end data engineering project - batch edition

Start Data Engineering

Objective: It can be difficult to know where to begin when starting a data engineering side project. If you have wondered What data to use for your data project?

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

Data Engineering Podcast

Summary: Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The growing prominence of cloud and hybrid environments in data management adds stress to an already complex endeavor. Privacera is an enterprise-grade solution for cloud and hybrid data governance built on top of the robust and battle-tested Apache Ranger project.

Machine Learning Pipeline Optimization with TPOT

KDnuggets

Let's revisit the automated machine learning project TPOT, and get back up to speed on using open source AutoML tools on our way to building a fully-automated prediction pipeline.

How to make Apache Kafka clients go fast(er) on Confluent Cloud

Confluent

Imagine your team wants to design a data streaming architecture and you’re in charge of creating the prototype. Within a few minutes, you provision a fully managed Apache Kafka® cluster […].

Beyond the Basics of A/B Tests: Innovative Experimentation Tactics You Need to Know as a Data or Product Professional

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

The Soldiers, Rogues, and Mages of Data Teams

Jesse Anderson

Data teams are like role-playing games (RPGs). If you’re not familiar with RPGs, they feature a person or group of characters all working together toward a common goal. A crucial part of each character is their levels, skills, and stats. In many games, higher levels are required to unlock specific skills. Likewise, stats show how well a character can utilize their skills.

The Telecommunications Service Provider Journey – From Telco to Techco

Cloudera

Earlier this month, the multi-national carrier MTN announced a rebranding, and along with its logo refresh, announced that it was moving to focus on being a technology provider. The new look, “aligns with our evolution from a telecommunications company to a technology company,” said Nompilo Morafo, Chief Corporate Affairs officer at the company. Across APAC too, telcos are looking at the shift to becoming technology companies, and last week’s TMForum Leadership Summit “The Tech Driven Telco” s

More Trending

You Have More Data Quality Issues Than You Think 

Monte Carlo

Say it with me: your data will never be perfect. Any team striving for completely accurate data will be sorely disappointed. Data testing, anomaly detection, and cataloging are important steps, but technology alone will not solve your data quality problem. Like any entropic system, data breaks. And as we’ve learned building solutions to curb the causes and downstream impact of data issues, it happens more often than you think.

WTF is a Tensor?!?

KDnuggets

A tensor is a container which can house data in N dimensions, along with its linear operations, though there is nuance in what tensors technically are and what we refer to as tensors in practice.

Announcing ksqlDB 0.24.0

Confluent

We are excited to announce ksqlDB 0.24! It comes with a slew of improvements and new features. Access to Apache Kafka® record headers will enable a whole host of new […].

This 6-Month Product Management Program Is The Ultimate Choice For Next-Gen Product Experts!

U-Next

Let’s face it! Product Management CAN BE TOUGH, but only if you haven’t laid your hands on the best training experience for Product enthusiasts in all its glory: the PG Certificate Program in Product Management by IIM Indore & Jigsaw. Several present-day Product Experts started their journeys with this exclusive 6-month program & found multiple doors of opportunities, wide open to welcome them.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Do Data Companies Need Chief Ethics Officers?

Cloudera

Sometimes it takes a billion-dollar mistake to bring the murkier side of data ethics into sharp focus. Equifax found this out to their own cost in 2017 when they failed to protect the data of almost 150 million users globally. The catastrophic breach was bad enough on its own — but Equifax waited three months to go public with the news. As the public furore rose to a crescendo, the credit organization dragged its feet on disclosing exactly what kind of information had been leaked.

Crystal Ball, Black Box or Advanced Forecasting and Demand Planning in Retail and CPG

Teradata

Neither crystal balls nor black boxes will provide the agility needed for accurate demand forecasting in today’s retail & CPG environment. Learn more about new approaches to FDP.

Which Google Cloud certification is best for me?

A Cloud Guru: Data Engineering

Considering your options when it comes to Google Cloud (GCP) certification paths? This post will talk about the various GCP cloud certifications, what each cert covers, what it could mean for your career, and how you can set (and achieve) your own personal goals.

A Guide On How To Become A Data Scientist (Step By Step Approach)

KDnuggets

Becoming a data scientist is an exciting path, but you cannot learn data science within one year or six months; instead, it’s a lifetime process that you have to follow with proper dedication and hard work. To guide your journey, the skills outlined here are the first you must acquire to become a data scientist.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Securing Your Logs in Confluent Cloud with HashiCorp Vault

Confluent

Logging is an important component of managing service availability, security, and customer experience. It allows Site Reliability Engineers (SREs), developers, security teams, and infrastructure teams to gain insights to how […].

These Sales Enthusiasts Mastered Strategic Sales In Just 4 Months With The Executive Program in Strategic Sales Management

U-Next

With the onset of the 5th industrial revolution, the world is moving closer towards embracing newer technologies in almost every walk of life. In the business ecosphere, those who upskill & transform into the best professional versions of themselves are bound to be at the forefront of this revolution. The Sales domain, too, cannot be home to traditional sales methods for too long.

Women Leaders in Data Discuss Breaking Bias on International Women’s Day

Cloudera

As an official sponsor of International Women’s Day, Cloudera is excited to celebrate Women’s History Month and International Women’s Day, and to take up the mantle of this year’s theme #BreakTheBias. Even in industries where women are underrepresented, like tech, women have made a lot of progress. Progress over many decades has slowly transformed the workplace into an environment where women’s strengths are recognized and valued.

Women of Teradata: Molly Treese

Teradata

In honor of Women's History Month, we are spotlighting Molly Treese, Teradata's Chief Legal Officer, as she looks back at her career in law & recounts the importance of inclusion in the workplace.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Object equality in Java and Kotlin

Booking.com Engineering

Introduction: We are going to review the subtleties and complications of trying to compare objects for equality in Java, where the problem originates, why it is important, Kotlin’s approach to the problem and some recommendations on the topic. Determining if two entities are the same is a fundamental operation in mathematics and we implement this operation in programming by the weaker notion of equivalency; the difference being that we are content with equality across a specific subset of propert

The Range of NLP Applications in the Real World: A Different Solution To Each Problem

KDnuggets

Most companies look at it like it’s one big technology, and assume the vendors’ offerings might differ in product quality and price but ultimately be largely the same. Truth is, NLP is not one thing; it’s not one tool, but rather a toolbox.

Introducing Stream Processing Use Case Recipes Powered by ksqlDB

Confluent

From fraud detection and predictive analytics, to real-time customer experiences and cyber security, stream processing has countless benefits for use cases big and small. By unlocking the power of continuous […].

Short-Term and Vacation Rental Data: Sources and Analysis

AltexSoft

Vacation and short-term rentals are experiencing a post-COVID renaissance. The data clearly shows the stable, worldwide increase in demand for alternative accommodations, from apartments to farm stays to igloos. The data also indicates that more and more companies in the sector tie their bright future with… data. According to the Global Vacation Rental Report 2022, 40 percent of property managers rely on market business intelligence (BI) or analytics services, a big leap compared to just 13 per

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

#BreakTheBias: It’s a Journey

Cloudera

Bias is everywhere. We’re surrounded by it. And it’s natural. We are alive today as a species because of biases. But it has a tangible impact on our personal and professional lives. Biases shape us and our experience. As primary caregivers, women have felt the impact of biases and expectations more keenly during the pandemic. Last year women in my network felt like they were being expected to do everything at home and at work.

Closing the Gap Left by Third Party Cookie Deprecation

Teradata

Consumers expect personalized experiences when they interact with a brand. But organizations are losing the ability to listen to their customers via digital channels. Fixing this is critical.

Case Study: Rockset Enables Real-Time Operational Analytics in Hardware Manufacturing for PCH

Rockset

Summary: PCH International is a leading hardware manufacturer with global operations that requires ultra-fast analysis of huge volumes of streaming data. The existing data infrastructure built on MongoDB and DynamoDB couldn’t support real-time querying of data. PCH initially considered data warehouses such as Snowflake and Redshift, but found them too costly for real-time analytics.

3 Reasons Why You Should Use Linear Regression Models Instead of Neural Networks

KDnuggets

While there may always seem to be something new, cool, and shiny in the field of AI/ML, classic statistical methods that leverage machine learning techniques remain powerful and practical for solving many real-world business problems.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

An Introduction to Data Mesh

Confluent

Decentralized architectures continue to flourish as engineering teams look to unlock the potential of their people and systems. From Git, to microservices, to cryptocurrencies, these designs look to decentralization as […].

Here Is What Happens Post Completion Of Our IIM Certified Integrated Program in Business Analytics!

U-Next

With data increasingly becoming an irreplaceable part of business growth, organizations and industries have actively embraced the use of Business Analytics to propel their growth to newer heights. However, utilizing data and implementing analytics crucial to making informed, intelligent, and effective business decisions is no easy task. With over a decade of experience in identifying, analyzing, and creating relevant programs in emerging technologies, Jigsaw has been a pioneer in imparting kn

Why Data Governance Is Crucial for All Enterprise-Level Businesses

Cloudera

Whether the enterprise uses dozens or hundreds of data sources for multi-function analytics, all organizations can run into data governance issues. Bad data governance practices lead to data breaches, lawsuits, and regulatory fines, and no enterprise is immune. Everyone Fails Data Governance. In 2019, the U.K.’s Information Commissioner’s Office fined Marriott International over £99 million ($136 million) for violating the General Data Protection Regulation (GDPR), a European law govern

Women of Teradata: Claire Bramley

Teradata

In honor of Women's History Month, we are spotlighting Claire Bramley, Teradata's Chief Financial Officer, as she looks back at her career in finance and tech.

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs as well as identifying and de-risking the different kinds of hypotheses built into your roadmap.