February, 2019

Machine Learning In The Enterprise

Data Engineering Podcast

Summary: Machine learning is a class of technologies that promise to revolutionize business. Unfortunately, it can be difficult to identify and execute on ways to use it in large companies. Kevin Dewalt founded Prolego to help Fortune 500 companies build, launch, and maintain their first machine learning projects so that they can remain competitive in our landscape of constant change.

Machine Learning with Python, Jupyter, KSQL and TensorFlow

Confluent

Building a scalable, reliable and performant machine learning (ML) infrastructure is not easy. It takes much more effort than just building an analytic model with Python and your favorite machine learning framework. After all, machine learning with Python requires the use of algorithms that allow computer programs to constantly learn, but building that infrastructure is several levels higher in complexity.

Managing Uber’s Data Workflows at Scale

Uber Engineering

At Uber’s scale, thousands of microservices serve millions of rides and deliveries a day, generating more than a hundred petabytes of raw data. Internally, engineering and data teams across the company leverage this data to improve the Uber experience. …

Building a Cross-platform In-app Messaging Orchestration Service

Netflix Tech

George Abraham, Devika Chawla, Chris Beaumont, and Daniel Huang. Thoughtful, relevant, and timely messaging is an integral part of a customer’s Netflix experience. The Netflix Messaging Engineering team builds the platform and the messages to communicate with Netflix customers. Messages in the Netflix App: In-app messages at Netflix fall broadly into two channels…

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Cash Is Still King – Make Sure Your Business Is Prepared for the Next Recession

Teradata

If your organization understands customer profitability in detail, then your organization can easily navigate through a recession.

How ATB Financial is Utilizing Hybrid Cloud to Reduce the Time to Value for Big Data Analytics by 90 Percent

Cloudera

ATB Financial is Alberta’s largest home grown financial institution, and prides itself on its customer obsession, putting the over 750,000 Albertans at the centre of all that they do. As a result, ATB is constantly transforming in order to ensure it can continue to deliver unparalleled value to Albertans. A key pillar in the transformation journey is focused on robust data operations that can help ATB deliver timely, relevant and delightful service.

More Trending

All About the Kafka Connect Neo4j Sink Plugin

Confluent

Only a little more than one month after the first release, we are happy to announce another milestone for our Kafka integration. Today, you can grab the Kafka Connect Neo4j Sink from Confluent Hub. Neo4j extension – Kafka sink refresher. We’ve been using the work we did for the Kafka sink – Neo4j extension and have made it available via remote connections over our binary bolt protocol.

Kafka 87

How to Run SQL on PDF Files

Rockset

PDFs are the de facto standard for distributing and sharing fixed-layout documents today. A quick survey of my laptop folders reveals account statements, receipts, technical papers, book chapters, and presentation slides—all PDFs. Lots of valuable information finds its way into all manner of PDF files. Which is a great reason for Rockset to support SQL queries on PDF files, in our mission to make data more usable to everyone.

SQL 52

How to Make Space for Research & Innovation?

Zalando Engineering

Redesigning research and product development so that the explorative nature of data science becomes a driver for innovation. Zalando leverages cutting-edge machine learning technologies to be Europe’s leading online platform for fashion and lifestyle. In order to develop these products, data scientists and product roles have to work together closely.

It's the Relationship - Not Just the Data - That is Critical to Success

Teradata

Rob Armstrong explains that while data is important, the real key is preserving the relationships across the data models that lead to insight and successful business outcomes.

IT 40

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

How HelloFresh is Disrupting the Grocery Industry Using Deep Customer Insights.

Cloudera

We’ve just published our most recent customer success story! This story gives a look at how HelloFresh is becoming a more data-centric organization to better serve its customers. HelloFresh is the leading global provider of fresh ingredients and recipes that help families enjoy wholesome home-cooked meals with no planning or shopping. The company packages over 10 million meals a month for more than one and a half million customers worldwide.

What Is Readable Code?

Pandora Engineering

Code creates interfaces. But code itself is also an interface.

Coding 52

Sysmon Security Event Processing in Real Time with KSQL and HELK

Confluent

During a recent talk titled Hunters ATT&CKing with the Right Data, which I presented with my brother Jose Luis Rodriguez at ATT&CKcon, we talked about the importance of documenting and modeling security event logs before developing any data analytics while preparing for a threat hunting engagement. Defining relationships among Windows security event logs such as Sysmon, for example, helped us to appreciate the extra context that two or more events together can provide for a hunt.

Process 81

How to Build a Facebook Messenger Chatbot Powered by Fast SQL on CSV

Rockset

A chatbot, like any human customer service rep, needs data about your business and products in order to respond to customers with the correct information. What is an efficient way to hook up your data to a chat application without significant data engineering? In this blog, I will demonstrate how you can build a Facebook Messenger chatbot to help users find vacation rentals using CSV data on Airbnb rentals.

SQL 40

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

A Journey On End To End Testing A Microservices Architecture

Zalando Engineering

End-to-end testing is a testing technique used to test the flow of an application through a business transaction. In a microservices architecture there are different components working together to enable a business capability, so testing all of them can get tricky. In this article you can read about our team’s journey: What our system looks like; What do you get from e2e testing?

Is There Such a Thing as Too Much Parallelism?

Teradata

In her blog, Carrie Ballinger discusses parallelism and how you can tailor it to specific needs by using the new sparse map capability.

IT 45

Three Takeaways from Gartner’s 2019 Magic Quadrant for Data Management Solutions for Analytics

Cloudera

The Magic Quadrant (MQ) is an established, widely-referenced series of research reports published by the analyst firm Gartner, Inc. The January 2019 “Magic Quadrant for Data Management Solutions for Analytics” provides valuable insights into the status, direction, and players in the DMSA market. A total of 19 vendors satisfied Gartner’s extensive inclusion criteria for insertion in this year’s MQ DMSA report.

Protecting a Story’s Future with History and Science

Netflix Tech

By Kylee Peña, Chris Clark, and Mike Whipple. Kylee’s parents after their wedding in 1978. I, Kylee, have two photos from my parents’ wedding. Just two. This year they celebrated 40 years of marriage, so both photos were shot on film. Both capture a joy and awkwardness that come with young weddings. They’re fresh and full of life, candid captures from another era.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Kafka Summit 2019: 3 Big Things!

Confluent

How many Kafka Summits should there be in a year? Experts disagree. Some say there should be one giant event where everybody gathers at once. Some say there should be one once a month in different regions of the world. Others say you should live every day like it’s Kafka Summit. As you may know, we have adopted a happy medium: three Summits in 2019.

Kafka 66

Using Smart Schema to Accelerate Insights from Nested JSON

Rockset

Developers often need to work with datasets without a fixed schema, like heavily nested JSON data with several deeply nested arrays and objects, mixed data types, null values, and missing fields. In addition, the shape of the data is prone to change when continuously syncing new data. Understanding the shape of a dataset is crucial to constructing complex queries for building applications or performing data science investigations.

Open Source: January Updates - Celebrate 'I Love Free Software Day'

Zalando Engineering

Project Highlights: Lionel Montrieux brought Nakadi to FOSDEM 2019. This is one of the largest open source projects released by Zalando. Nakadi is a distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues. It is used in production by over a hundred teams daily and handles over 100 TB of data every day. Try out Nakadi!

What Lessons Can Apollo 13 Teach Us About Analytics?

Teradata

Tom Casey explains lessons from the Apollo 13 program and how they can be applied to day to day dealings in the analytics world.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Transforming the business of communication with 5G

Cloudera

3.2 billion. That is the number of unique mobile subscribers that Asia Pacific is projected to have by 2025, which accounts for more than half of the world’s mobile subscribers. Mobile data traffic is predicted to grow at a rate of 40 to 50 percent annually, and Internet of Things (IoT) connections at 25 to 30 percent. As technology adoption increases, more service providers require 5G to support the surge of incoming data.

Deep Learning For Data Engineers

Data Engineering Podcast

Summary: Deep learning is the latest class of technology that is gaining widespread interest. As data engineers we are responsible for building and managing the platforms that power these models. To help us understand what is involved, we are joined this week by Thomas Henson. In this episode he shares his experiences experimenting with deep learning, what data engineers need to know about the infrastructure and data requirements to power the models that your team is building, and how it can be u

Journey to Event Driven – Part 3: The Affinity Between Events, Streams and Serverless

Confluent

With serverless being all the rage, it brings with it a tidal change of innovation. Given that it is at a relatively early stage, developers are still trying to grok the best approach for each cloud vendor and often face the following question: Should I go cloud native with AWS Lambda, GCP functions, etc., or invest in a vendor-agnostic layer like the serverless framework?

Kafka 109

Extending Vector with eBPF to inspect host and container performance

Netflix Tech

By Jason Koch, with Martin Spier, Brendan Gregg, and Ed Hunter. Improving the tools available to our engineers to help them diagnose, triage, and work through software performance challenges in the cloud is a key goal for the cloud performance engineering team at Netflix. Today we are excited to announce latency heatmaps and improved container support for our on-host monitoring solution…

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Defining a company policy to handle harassment in open source

Zalando Engineering

Open Source Participation: When you as a Zalando employee engage in open source communities as part of your work, you will interact with the wider open source communities outside Zalando. This is generally a good experience, and collaborating with many different types of developers with different backgrounds is a positive input to your personal development.

Coding 40

The Utah Jazz Uses Pervasive Data Intelligence for Next Generation Sports Analytics

Teradata

Larry H. Miller is using data and analytics to successfully increase customer satisfaction from a multitude of data sources and customer touchpoints.

Data 40

Introducing Cloudera DataFlow (CDF)

Cloudera

Late last year, the news of the merger between Hortonworks and Cloudera shook the industry and gave birth to the new Cloudera – the combined company with a focus on being an Enterprise Data Cloud leader and a product offering that spans from edge to AI. One of the most promising technology areas in this merger that already had a high growth potential and is poised for even more growth is the Data-in-Motion platform called Hortonworks DataFlow (HDF).

Speed Up Your Analytics With The Alluxio Distributed Storage System

Data Engineering Podcast

Summary: Distributed storage systems are the foundational layer of any big data stack. There are a variety of implementations which support different specialized use cases and come with associated tradeoffs. Alluxio is a distributed virtual filesystem which integrates with multiple persistent storage systems to provide a scalable, in-memory storage layer for scaling computational workloads independent of the size of your data.

Systems 100

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs as well as identifying and de-risking the different kinds of hypotheses built into your roadmap.