2017

The Rise of the Data Engineer

Maxime Beauchemin

I joined Facebook in 2011 as a business intelligence engineer. By the time I left in 2013, I was a data engineer. I wasn’t promoted or assigned to this new role. Instead, Facebook came to realize that the work we were doing transcended classic business intelligence. The role we’d created for ourselves was a new discipline entirely. My team was at the forefront of this transformation.

Wallaroo with Sean T. Allen - Episode 12

Data Engineering Podcast

Summary Data-oriented applications that need to operate on large, fast-moving streams of information can be difficult to build and scale due to the need to manage their state. In this episode Sean T. Allen, VP of engineering for Wallaroo Labs, explains how Wallaroo was designed and built to reduce the cognitive overhead of building this style of project.

Kafka 100

Evolving Distributed Tracing at Uber Engineering

Uber Engineering

Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber Engineering, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds … The post Evolving Distributed Tracing at Uber Engineering appeared first on Uber Engineering Blog.

8 Key Facts You Should know if You are a HR Professional

U-Next

Two of the most common reasons why people think they can be great HR professionals are either they are very organized and systematic or they have good people skills. But these two qualities alone are not enough for anyone to make it big in their career in human resource management. The two attributes can land them jobs but to move up the ladder, they definitely need some qualities that will set them apart from other employees.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Constant Gardening

Zalando Engineering

How effective management is a continuing story of growth. Producers’ Style: One of the things I struggled with the most in the past year was identifying the best way to lead my teams. I worked a lot on myself, observed my peers, and tried to learn from my leads, but in the end, I ran into the well-known dilemma: task-focused or people-focused management, which one is best?

Recap of Hadoop News for June 2017

ProjectPro

News on Hadoop - June 2017 Hadoop Servers Expose Over 5 Petabytes of Data. BleepingComputer.com, June 2, 2017. John Matherly, the founder of Shodan, a search engine used for discovering IoT devices, found that improperly configured HDFS-based servers exposed over 5 PB of information. He found approximately 4487 HDFS servers available without authentication through public IP addresses that in total exposed 5120 TB of data. The expert said that 47820 MongoDB servers exp

Hadoop 52

What is a Data Engineer?

Dataquest

From helping cars drive themselves to helping Facebook tag you in photos, data science has attracted a lot of buzz recently. Data scientists have become extremely sought after, and for good reason — a skilled data scientist can add incredible value to a business. But what about data engineers? Who are they, and what do they do? A data scientist is only as good as the data they have access to.

SiriDB: Scalable Open Source Timeseries Database with Jeroen van der Heijden - Episode 11

Data Engineering Podcast

Summary Time series databases have long been the cornerstone of a robust metrics system, but the existing options are often difficult to manage in production. In this episode Jeroen van der Heijden explains his motivation for writing a new database, SiriDB, the challenges that he faced in doing so, and how it works under the hood. Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll

Database 100

Confluent Schema Registry with Ewen Cheslack-Postava - Episode 10

Data Engineering Podcast

Summary To process your data you need to know what shape it has, which is why schemas are important. When you are processing that data in multiple systems it can be difficult to ensure that they all have an accurate representation of that schema, which is why Confluent has built a schema registry that plugs into Kafka. In this episode Ewen Cheslack-Postava explains what the schema registry is, how it can be used, and how they built it.

Kafka 100

data.world with Bryon Jacob - Episode 9

Data Engineering Podcast

Summary We have tools and platforms for collaborating on software projects and linking them together, wouldn’t it be nice to have the same capabilities for data? The team at data.world are working on building a platform to host and share data sets for public and private use that can be linked together to build a semantic web of information. The CTO, Bryon Jacob, discusses how the company got started, their mission, and how they have built and evolved their technical infrastructure.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Data Serialization Formats with Doug Cutting and Julien Le Dem - Episode 8

Data Engineering Podcast

Summary With the wealth of formats for sending and storing data it can be difficult to determine which one to use. In this episode Doug Cutting, creator of Avro, and Julien Le Dem, creator of Parquet, dig into the different classes of serialization formats, what their strengths are, and how to choose one for your workload. They also discuss the role of Arrow as a mechanism for in-memory data sharing and how hardware evolution will influence the state of the art for data formats.

Hadoop 100

Buzzfeed Data Infrastructure with Walter Menendez - Episode 7

Data Engineering Podcast

Summary Buzzfeed needs to be able to understand how its users are interacting with the myriad articles, videos, etc. that they are posting. This lets them produce new content that will continue to be well-received. To surface the insights that they need to grow their business they need a robust data infrastructure to reliably capture all of those interactions.

Astronomer with Ry Walker - Episode 6

Data Engineering Podcast

Summary Building a data pipeline that is reliable and flexible is a difficult task, especially when you have a small team. Astronomer is a platform that lets you skip straight to processing your valuable business data. Ry Walker, the CEO of Astronomer, explains how the company got started, how the platform works, and their commitment to open source.

Rebuilding Yelp's Data Pipeline with Justin Cunningham - Episode 5

Data Engineering Podcast

Summary Yelp needs to be able to consume and process all of the user interactions that happen in their platform in as close to real-time as possible. To achieve that goal they embarked on a journey to refactor their monolithic architecture to be more modular and modern, and then they open sourced it! In this episode Justin Cunningham joins me to discuss the decisions they made and the lessons they learned in the process, including what worked, what didn’t, and what he would do differently

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

ScyllaDB with Eyal Gutkind - Episode 4

Data Engineering Podcast

Summary If you like the features of Cassandra DB but wish it ran faster with fewer resources then ScyllaDB is the answer you have been looking for. In this episode Eyal Gutkind explains how Scylla was created and how it differentiates itself in the crowded database market. Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch

Database 100

Defining Data Engineering with Maxime Beauchemin - Episode 3

Data Engineering Podcast

Summary What exactly is data engineering? How has it evolved in recent years and where is it going? How do you get started in the field? In this episode, Maxime Beauchemin joins me to discuss these questions and more. Transcript provided by CastSource Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.

Dask with Matthew Rocklin - Episode 2

Data Engineering Podcast

Summary There is a vast constellation of tools and platforms for processing and analyzing your data. In this episode Matthew Rocklin talks about how Dask fills the gap between a task-oriented workflow tool and an in-memory processing framework, and how it brings the power of Python to bear on the problem of big data. Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the news

Hadoop 100

Pachyderm with Daniel Whitenack - Episode 1

Data Engineering Podcast

Summary Do you wish that you could track the changes in your data the same way that you track the changes in your code? Pachyderm is a platform for building a data lake with a versioned file system. It also lets you use whatever languages you want to run your analysis with its container based task graph. This week Daniel Whitenack shares the story of how the project got started, how it works under the covers, and how you can get started using it today!

Data Lake 100

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Introducing The Show

Data Engineering Podcast

Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, share it on social media, and tell your friends and co-workers.

Media 100

Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop

Uber Engineering

With the evolution of storage formats like Apache Parquet and Apache ORC and query engines like Presto and Apache Impala, the Hadoop ecosystem has the potential to become a general-purpose, unified serving layer for workloads that can tolerate latencies … The post Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop appeared first on Uber Engineering Blog.

Hadoop 105

Re-Architecting Cash and Digital Wallet Payments for India with Uber Engineering

Uber Engineering

Uber is developing a payment platform for India that enables operations teams to more seamlessly collect and distribute cash and digital wallet payments to drivers. In this article, San Francisco-based software engineer Yijun Liu reflects on his experiences working with … The post Re-Architecting Cash and Digital Wallet Payments for India with Uber Engineering appeared first on Uber Engineering Blog.

The Road to uChat: Building Uber’s Internal Chat Solution

Uber Engineering

Two years ago, Uber’s previous chat application began showing signs that it would not be able to adapt to our growth. There were app crashes, performance hiccups, and outages that crippled our company’s ability to effectively communicate online. With user … The post The Road to uChat: Building Uber’s Internal Chat Solution appeared first on Uber Engineering Blog.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Engineering Uber Predictions in Real Time with ELK

Uber Engineering

Uber’s services rely on the accuracy of our event prediction and forecasting tools. From estimating rider demand on a given date to predicting … The post Engineering Uber Predictions in Real Time with ELK appeared first on Uber Engineering Blog.

Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform

Uber Engineering

Uber facilitates seamless and more enjoyable user experiences by channeling data from a variety of real-time sources. These insights range from in-the-moment traffic conditions that provide guidance on trip routes to the Estimated Time of Delivery (ETD) of an UberEATS … The post Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform appeared first on Uber Engineering Blog.

Engineering On-Demand Transportation for Business with Uber Central

Uber Engineering

When Uber launched in 2009, our mission was simple: make transportation as reliable as running water everywhere, for everyone. While our mission remains the same today, the number of Uber use cases has grown dramatically, motivating our engineers to think … The post Engineering On-Demand Transportation for Business with Uber Central appeared first on Uber Engineering Blog.

Spaghetti and Marshmallows at Zalando: An Exercise to Inspire Deep Learning

Zalando Engineering

Some months ago I had the opportunity, with two fellow Zalandos, to organize the “Dortmund 5PM”, a gathering across all Dortmund teams, scheduled once a month on Fridays in our local event space. We want to foster further cross-team collaboration between individuals, making these meetings a memorable experience for all. We opted for running The Marshmallow Challenge, a fun design exercise that encourages teams to experience simple yet profound lessons in collaboration, innovation, and creativ

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Recap of Hadoop News for May 2017

ProjectPro

News on Hadoop - May 2017 High-end backup kid Datos IO embraces relational, Hadoop data. theregister.co.uk, May 3, 2017. Datos IO has extended its on-premise and public cloud data protection to RDBMS and Hadoop distributions. The latest version of its RecoverX distributed database backup product, v2.0, now provides Hadoop support. RecoverX is described as app-centric and can back up application data whilst being capable of recovering it at various granularity levels to enhance storage efficiency.

Hadoop 52

Hack Around The Clock – Hack Night @ Zalando Hamburg

Zalando Engineering

Here at Zalando Tech, hacking has a long tradition and it’s not merely limited to writing lines of code. Back in the day, when there were no blinds to protect our screens from the reflection of the sun, staff covered the office windows with paper bags. This pragmatism is what makes working at Zalando unique. With our first Hack Night in Hamburg, a similar approach could be felt.

Media 52

Recap of Hadoop News for April 2017

ProjectPro

News on Hadoop - April 2017 AI Will Eclipse Hadoop, Says Forrester, So Cloudera Files For IPO As A Machine Learning Platform. Forbes.com, April 3, 2017. Apache Hadoop was one of the revolutionary technologies in the big data space, but now it is buried deep by Deep Learning. According to a Forrester Research report released in March 2017, artificial intelligence will eclipse Hadoop.

Hadoop 52

Improving Swift Compilation Times from 12 to 2 Minutes

Zalando Engineering

With our Fleek app growing and new features being introduced, its compile times have started to become a real challenge. We recently discovered that to compile the app or just make a minor change it would take approximately 12 minutes. We wanted to cut this time dramatically to improve the customer experience overall, as well as our work overhead. In this article, I’ll show how we managed to decrease it to just 2 minutes.

Coding 52

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs as well as identifying and de-risking the different kinds of hypotheses built into your roadmap.