April, 2021

Writing memory efficient data pipelines in Python

Start Data Engineering

Contents: Introduction; 1. Using generators; Using generator expression; Using generator yield; Mini batching; Reading in batches from a database; Pros & Cons; 2. Using distributed frameworks; Pros & Cons; Conclusion; Further reading; References.

Introduction. If you are wondering how to write memory-efficient data pipelines in Python, or working with a dataset that is too large to fit into memory, then this post is for you.
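
To give a flavor of the generator, mini-batching, and database-batching topics named in the outline, here is a minimal sketch using only the Python standard library. It is not the linked post's own code: the helper names `mini_batches` and `read_in_batches`, the `events` table, and the batch sizes are all hypothetical choices for illustration.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def mini_batches(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of at most batch_size items, one batch at a time."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def read_in_batches(
    conn: sqlite3.Connection, query: str, batch_size: int
) -> Iterator[List[Tuple]]:
    """Fetch query results in chunks instead of loading every row at once."""
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        yield rows


# Hypothetical in-memory table standing in for a real source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)", ((i, i * 1.5) for i in range(10))
)

# A generator expression is lazy: no list of a million squares is ever built.
squares = (n * n for n in range(1_000_000))
first_batches = list(islice(mini_batches(squares, 3), 2))

# Each database batch holds at most four rows in memory.
db_batch_sizes = [
    len(batch)
    for batch in read_in_batches(conn, "SELECT id, amount FROM events", 4)
]
```

Because each stage pulls from the previous one only on demand, peak memory is bounded by the batch size rather than by the size of the dataset.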

Flipr: Making Changes Quickly and Safely at Scale

Uber Engineering

Introduction. Uber’s many software systems require a high volume of changes every day. Because of our systems’ size and complexity, it is a significant challenge to implement these changes without unintended consequences, ultimately slowing down developer productivity. Flipr is a … The post Flipr: Making Changes Quickly and Safely at Scale appeared first on Uber Engineering Blog.

What’s New in Apache Kafka 2.8

Confluent

I’m proud to announce the release of Apache Kafka 2.8.0 on behalf of the Apache Kafka® community. The 2.8.0 release contains many new features and improvements. This blog post highlights […].

Self Service Data Exploration And Dashboarding With Superset

Data Engineering Podcast

Summary: The reason for collecting, cleaning, and organizing data is to make it usable by the organization. One of the most common and widely used methods of access is through a business intelligence dashboard. Superset is an open source option that has been gaining popularity due to its flexibility and extensible feature set. In this episode, Maxime Beauchemin discusses how data engineers can use Superset to provide self-service access to data and deliver analytics.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Relationship intelligence will shape the workplace of the future

Cloudera

Our latest Influential Women in Data session featured Brenda Le Sueur from Cambridge Assessments. Brenda has worked across many organisations and continents, but what has always been crucial to her is relationships – how we cultivate them, how we nurture them and how they, in turn, define us. I sat down with Brenda to ask her about her journey as a woman in tech and understand more about the impact of relationships on our career.

DataOps Enables Your Data Fabric

DataKitchen

Industry analysts who follow the data and analytics industry tell DataKitchen that they are receiving inquiries about “data fabrics” from enterprise clients on a near-daily basis. Forrester relates that out of 25,000 reports published by the firm last year, the report on data fabrics and DataOps ranked in the top ten for downloads in 2020. Gartner included data fabrics in their top ten trends for data and analytics in 2019.

Making Customer Experience Your Competitive Advantage

Teradata

Customers expect organizations to know them, provide relevant & personalized experiences, and be good stewards of their data. Yet many businesses still struggle with this. Why?

How to Survive a Kafka Outage

Confluent

There is a class of applications that cannot afford to be unavailable—for example, external-facing entry points into your organization. Typically, anything your customers interact with directly cannot go down. As […].

Moving Machine Learning Into The Data Pipeline at Cherre

Data Engineering Podcast

Summary: Most of the time, when you think about a data pipeline or ETL job, what comes to mind is a purely mechanistic progression of functions that move data from point A to point B. Sometimes, however, one of those transformations is actually a full-fledged machine learning project in its own right. In this episode, Tal Galfsky explains how he and the team at Cherre tackled the problem of messy address data by building a natural language processing and entity resolution system that is served

#ClouderaLife Spotlight: Suzy Tonini, Talent Researcher

Cloudera

As we continue to work toward diversity, equality, and inclusion in every aspect of our company culture and beyond, we’ve learned so much from our employees’ unique perspectives on allyship. One such employee is Suzy Tonini, a Talent Researcher with a globe-trotting childhood. Growing up with parents who worked for the U.S. State Department, Suzy had the opportunity to hop from country to country with her family, experiencing a variety of cultures.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

10 Upcoming Data Science Platforms for Massive Disruption

DataKitchen

The post 10 Upcoming Data Science Platforms for Massive Disruption first appeared on DataKitchen.

Monte Carlo and Snowflake partner to help organizations achieve more trustworthy data

Monte Carlo

Monte Carlo, the data reliability company, today announced a partnership with Snowflake, the Data Cloud company, to help data teams trust their data and accelerate the adoption of analytics in the Data Cloud. This combination can provide Snowflake customers with end-to-end Data Observability across their entire Snowflake Data Cloud, from ingestion to analytics.

Meet the New Analytics Superhero - The CFO

Teradata

The CFO’s broad remit & natural ownership of core financial data can provide the foundation for an enhanced role that leverages data analytics to enable new value opportunities.

Building the Confluent UI with React Hooks – Benefits and Lessons Learned

Confluent

Updating a fundamental paradigm in your React app can be as easy as search and replace, or at other times, as difficult as convincing your entire frontend engineering to buy […].

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies that are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while remaining lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground up, graduating from just getting started with data-driven development to operating

Exploring The Expanding Landscape Of Data Professions with Josh Benamram of Databand

Data Engineering Podcast

Summary: "Business as usual" is changing, with more companies investing in data as a first-class concern. As a result, the data team is growing and introducing more specialized roles. In this episode, Josh Benamram, CEO and co-founder of Databand, describes the motivations for these emerging roles, how these positions affect the team dynamics, and the types of visibility that they need into the data platform to do their jobs effectively.

Next Stop – Predicting on Data with Cloudera Machine Learning

Cloudera

This is part 4 in this blog series. You can read part 1 here and part 2 here, and watch part 3 here. This blog series follows the manufacturing and operations data lifecycle stages of an electric car manufacturer – typically experienced in large, data-driven manufacturing companies. The first blog introduced a mock vehicle manufacturing company, The Electric Car Company (ECC), and focused on Data Collection.

DevOps and agile still hindered by enterprise silos, inertia

DataKitchen

The post DevOps and agile still hindered by enterprise silos, inertia first appeared on DataKitchen.

Apple Migration Tips for M1 Macs

Grouparoo

Last week, I upgraded to an M1 MacBook Pro. I got it configured for development and 48 hours later, through a series of unfortunate events and hardware failure, I ended up with a second M1 MacBook Pro instead. The transition between computers wasn’t too bad thanks to Apple’s Migration Assistant. I ran into an interesting situation, though. About 90% of the migration worked as expected or better, but the other 10% presented some puzzling blockers.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Reshaping the supermarket post-pandemic

Retail Insight

Social distancing and a life lived largely online have been the reality for over a year. But, as the world gradually emerges from lockdown, has the shape of retail really changed forever?

Announcing ksqlDB 0.17.0

Confluent

We’re excited to announce ksqlDB 0.17, a big release for 2021. This version adds support for managing the lifecycle of your queries from CI servers, a first-class timestamp data type, […].

Put Your Whole Data Team On The Same Page With Atlan

Data Engineering Podcast

Summary: One of the biggest obstacles to success in delivering data products is cross-team collaboration. Part of the problem is the difference in the information that each role requires to do their job and where they expect to find it. This introduces a barrier to communication that is difficult to overcome, particularly in teams that have not reached a significant level of maturity in their data journey.

What is Streaming Analytics?

Cloudera

Streaming analytics is a type of data analysis that processes data streams for real-time analytics. It continuously processes data from multiple streams and performs anything from simple calculations to complex event processing in order to deliver sophisticated use cases. The primary purpose is to present the most up-to-date operational events so that the user can stay on top of business needs and take action as changes happen in real time.
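
As a rough illustration of the "simple calculations" end of that spectrum, the sketch below counts events per key in fixed (tumbling) time windows as they arrive. It is a toy, single-process example rather than how any particular streaming product works: the function name `tumbling_window_counts`, the `(timestamp, key)` event shape, and the assumption that events arrive in timestamp order are all choices made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts) each time a window closes.

    Assumes events are (timestamp, key) pairs in non-decreasing
    timestamp order; late or out-of-order events are not handled.
    """
    current_start = None
    counts: Counter = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The previous window is complete, so emit it and start a new one.
            yield current_start, dict(counts)
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        # Flush the final, possibly partial, window when the stream ends.
        yield current_start, dict(counts)


# Hypothetical sample stream of (seconds, event type) pairs.
sample = [
    (0, "login"), (3, "click"), (7, "click"),
    (12, "login"), (14, "click"), (25, "click"),
]
results = list(tumbling_window_counts(sample, 10))
```

Only the counts for the current window are kept in memory, which is what lets this style of processing run continuously over an unbounded stream.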

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

How to Approach Your Data Engineering Transformation

Silectis

Should you build your own tooling, take a “best of breed” approach, or buy a turnkey data engineering platform? We’ve got you covered. Data Engineering Platforms: Build, Best of Breed, or Buy? Every company wants to be data-driven. Modern organizations that thrive based on data have a common strength: a solid data engineering practice.

Welcome, Pedro!

Grouparoo

Building an open source tool to connect data to many different services means a lot of integrations. It can be pretty tricky, so we were lucky to meet Pedro S Lopez a few weeks back when he started adding several plugins to that integration list. He has now come aboard officially and will work more on the core product. Pedro makes the Grouparoo team an international one.

Roadside convenience retail: technology and data insights

Retail Insight

The last year has seen dramatic shifts in food and grocery retail. And the rate of change shows no signs of slowing down. However, there are some common truths that exist no matter the time or channel – features that usually indicate success.

Debuting a Modern C++ API for Apache Kafka

Confluent

Morgan Stanley uses Apache Kafka® to publish market data to internal clients and to persist it for replay purposes. We started out using librdkafka’s C++ API, which maintains C++98 compatibility. […].

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

CFO Analytics - CFO of the Future

Teradata

As finance teams evolve into the providers of strategic insights, leveraging analytics will result in a new user base, new insights & reposition the CFO to a predictor of the future.

Converting HBase ACLs to Ranger policies

Cloudera

CDP uses Apache Ranger for data security management. If you wish to use Ranger for centralized security administration, HBase ACLs need to be migrated to policies. This can be done via the Ranger web UI, accessible from Cloudera Manager. But first, let’s take a quick look at HBase's method for access control. HBase Authorization. If authorization is set up (for example, with Kerberos and by setting the hbase.security.authorization property to true), users can have rules defined on r

Data Engineer, Data Analyst, Data Scientist — What’s the Difference?

Dataquest

Data engineer, data analyst, and data scientist: these are job titles you’ll often hear mentioned together when people are talking about the fast-growing field of data science. There are plenty of other job titles in data science and data analytics too. But here, we’re going to talk about: the “big three” roles (data analyst, data scientist, and data engineer); how they differ from each other; and which role is best for you. Although precisely how these roles are defined can va

Sync modes - Intentional data syncing

Grouparoo

Grouparoo supports syncing data to an ever-growing number of destinations. While building these integrations and talking to our users, we have found it's important to be intentional about how exactly data syncing to these destinations is performed. For example, our Salesforce data integration has a "Sync Mode" option that allows you to control whether contacts will be created, deleted or only updated.

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs, and learn to identify and de-risk the different kinds of hypotheses built into your roadmap.