March 2020

How to process simple data stream and consume with Lambda

Team Data Science

I built a serverless architecture for my simulated credit card complaints stream using AWS S3, AWS Lambda, and AWS Kinesis; the above picture gives a high-level view of the data flow. I treat uploading the CSV file as the data producer: once you upload a file, S3 generates an object-created event and the Lambda function is invoked asynchronously. The file's data content is written to the Kinesis stream as a record (record = data + partition key), which triggers another Lambda function and persist th
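A minimal sketch of the first Lambda function in this flow follows. It assumes the standard S3 `ObjectCreated` event shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`). So that it runs without AWS, reading the object and writing to Kinesis are passed in as callables. In a deployment, a `lambda_handler(event, context)` entry point would supply callables wrapping boto3's `s3.get_object(...)["Body"].read().decode()` and `kinesis.put_records(StreamName=..., Records=...)`. The helper names, the header-row assumption, and the use of the first CSV column as the partition key are illustrative assumptions, not the author's code.

```python
import csv
import io
import json
from typing import Callable
from urllib.parse import unquote_plus


def build_kinesis_records(csv_text: str, key_column: int = 0) -> list:
    """Turn each CSV data row into a Kinesis-style record (data + partition key)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumption: the first row is a header
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        payload = dict(zip(header, row))
        records.append({
            "Data": json.dumps(payload).encode("utf-8"),
            # Hypothetical choice: partition on the first column (e.g. a complaint ID).
            "PartitionKey": row[key_column],
        })
    return records


def handler(event: dict,
            read_object: Callable[[str, str], str],
            put_records: Callable[[list], None]) -> int:
    """Process an S3 ObjectCreated event and return the number of records written."""
    total = 0
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(rec["s3"]["object"]["key"])
        records = build_kinesis_records(read_object(bucket, key))
        # Kinesis PutRecords accepts at most 500 records per request.
        for i in range(0, len(records), 500):
            put_records(records[i:i + 500])
        total += len(records)
    return total
```

Injecting the two I/O functions keeps the parsing and batching logic testable locally with a fake event.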

Scheduling a SQL script, using Apache Airflow, with an example

Start Data Engineering

One of the most common use cases for Apache Airflow is to run scheduled SQL scripts. Developers who start with Airflow often ask the following questions: “How to use Airflow to orchestrate SQL?”

20+ Machine Learning Datasets & Project Ideas

KDnuggets

Upgrading your machine learning, AI, and Data Science skills requires practice. To practice, you need to develop models with a large amount of data. Finding good datasets to work with can be challenging, so this article discusses more than 20 great datasets along with machine learning project ideas for you to tackle today.

Why We Leverage Multi-tenancy in Uber’s Microservice Architecture

Uber Engineering

The performance of Uber’s services relies on our ability to quickly and stably launch new features on our platform, regardless of where the corresponding service lives in our tech stack. Foundational to our platform’s power is its microservice-based architecture … The post Why We Leverage Multi-tenancy in Uber’s Microservice Architecture appeared first on Uber Engineering Blog.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Advanced Analytics for Coronavirus – Trends, Patterns, Predictions

Teradata

Advanced analytics and AI can significantly accelerate data processing required to get the insights, answers and recommendations to handle and address the COVID-19 pandemic.

The Life Of A Non-Profit Data Professional

Data Engineering Podcast

Summary: Building and maintaining a system that integrates and analyzes all of the data for your organization is a complex endeavor. Operating on a shoestring budget makes it even more challenging. In this episode Tyler Colby shares his experiences working as a data professional in the non-profit sector. From managing Salesforce data models to wrangling a multitude of data sources and compliance challenges, he describes the biggest challenges that he is facing.

10 Key skills, to help you become a data engineer

Start Data Engineering

This article gives you an overview of the 10 key skills you need to become a better data engineer. If you are struggling to get started on what to learn, start with the first topic and proceed through the list.

Coronavirus Data and Poll Analysis – yes, there is hope, if we act now

KDnuggets

We examine the growth of daily coronavirus cases in the most affected countries and show evidence that social distancing works in reducing the rate of spread. We also analyze KDnuggets Poll results: the scale of the shift to online work, and how Data Science work is likely to increase or drop in different regions. Stay healthy and practice social distancing!

Kafka Connect Elasticsearch Connector in Action

Confluent

The Elasticsearch sink connector helps you integrate Apache Kafka® and Elasticsearch with minimum effort. You can take data you’ve stored in Kafka and stream it into Elasticsearch to then be […].
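As a hedged illustration of how such a connector is typically set up, the sketch below builds a minimal configuration for Confluent's Elasticsearch sink connector and the Kafka Connect REST call that registers it. The config keys shown (`connector.class`, `tasks.max`, `topics`, `connection.url`, `key.ignore`, `schema.ignore`) are standard connector settings, but the topic name `orders`, the connector name `es-sink`, and the localhost URLs are assumptions for the example. The request is built, not sent.

```python
import json
import urllib.request


def elasticsearch_sink_config(topic: str, es_url: str) -> dict:
    """Minimal config for the Elasticsearch sink connector (values are strings, as Connect expects)."""
    return {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": "1",
        "topics": topic,
        "connection.url": es_url,
        # Use topic+partition+offset as the document ID instead of the Kafka message key.
        "key.ignore": "true",
        # Let Elasticsearch infer the mapping instead of deriving it from the record schema.
        "schema.ignore": "true",
    }


def build_create_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) the Kafka Connect REST call for this connector."""
    # PUT /connectors/<name>/config creates the connector, or updates it if it exists.
    return urllib.request.Request(
        url=f"{connect_url}/connectors/{name}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Usage, assuming a Connect worker on :8083 and Elasticsearch on :9200:
req = build_create_request("http://localhost:8083", "es-sink",
                           elasticsearch_sink_config("orders", "http://localhost:9200"))
# urllib.request.urlopen(req) would submit it to a running worker.
```

Using `PUT .../config` rather than `POST /connectors` makes the call idempotent, so re-running a deployment script does not fail if the connector already exists.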

Improving Prediction of the Unconfirmed COVID-19 Cases

Teradata

Given the lack of available tests and the uncertainty around the true number of COVID-19 cases, Teradata Epidemiologist Daniel Ulatowski and Data Scientist Jack McCush hypothesize how symptomatic data and the Vantage ML Engine can be used to predict cases.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Behind The Scenes Of The Linode Object Storage Service

Data Engineering Podcast

Summary: There are a number of platforms available for object storage, including self-managed open source projects. But what goes on behind the scenes of the companies that run these systems at scale so you don’t have to? In this episode Will Smith shares the journey that he and his team at Linode recently completed to bring a fast and reliable S3 compatible object storage to production for your benefit.

Introducing Dispatch

Netflix Tech

By Kevin Glisson, Marc Vilanova, Forest Monsen. Netflix is pleased to announce the open-source release of our crisis management orchestration framework: Dispatch! Okay, but what is Dispatch? Put simply, Dispatch is: All of the ad-hoc things you’re doing to manage incidents today, done for you, and a bunch of other things you should’ve been doing, but have not had the time!

Learn to Optimize Algorithms in Our New Algorithm Complexity Course

Dataquest

Algorithms are at the center of almost any programming job. And particularly in the world of data engineering, using efficient algorithms is important enough that it’s a common topic to be quizzed about in job interviews. That’s why we’ve just launched a new course! Algorithm Complexity is the latest course in our Data Engineer career path.
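To make the idea of algorithm complexity concrete, here is an illustrative example (not taken from the course): two ways to check a list for duplicates. The pairwise version does up to n(n-1)/2 comparisons, so doubling the input roughly quadruples the work; the set-based version makes one pass with average O(1) hash lookups, at the cost of O(n) extra memory and the requirement that items be hashable.

```python
def has_duplicates_quadratic(items: list) -> bool:
    """O(n^2): compare every pair of elements."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items: list) -> bool:
    """O(n) on average: one pass with a hash set (items must be hashable)."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Both functions return the same answers; the difference only shows up as inputs grow, which is why interviewers tend to ask for the complexity rather than just a working solution.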

The 4 Best Jupyter Notebook Environments for Deep Learning

KDnuggets

Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at 3 such environments.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies that are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while remaining lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground up, graduating from just getting started with data-driven development to operating

Building a Cloud ETL Pipeline on Confluent Cloud

Confluent

As enterprises move more and more of their applications to the cloud, they are also moving their on-prem ETL (extract, transform, load) pipelines to the cloud, as well as building […].

People, We Need to Talk About Mass Electronic Surveillance

Teradata

With the COVID-19 epidemic in full swing, the countries that are faring the best are employing large-scale testing and electronic surveillance. But what does this mean for our civil liberties?

Building A New Foundation For CouchDB

Data Engineering Podcast

Summary: CouchDB is a distributed document database built for scale and ease of operation. With a built-in synchronization protocol and an HTTP interface, it has become popular as a backend for web and mobile applications. Created 15 years ago, it has accrued some technical debt, which is being addressed with a refactored architecture based on FoundationDB.

Open-Sourcing riskquant, a library for quantifying risk

Netflix Tech

Netflix has a program in our Information Security department for quantifying the risk of deliberate (attacker-driven) and accidental… Continue reading on Netflix TechBlog ».

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Query Lambdas: Increasing Developer Velocity for Application Development

Rockset

At Rockset we strive to make building modern data applications easy and intuitive. Data-backed applications come with an inherent amount of complexity: managing the database backend, exposing a data API (often using hard-coded SQL or an ORM to write queries), keeping the data and application code in sync, and the list goes on. Just as Rockset has reimagined and dramatically simplified the traditional ETL pipeline on the data-loading side, we’re now proud to release a new product feature: Query La
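The general idea behind a feature like this is to store a named, parameterized SQL query once and have applications invoke it by name with parameters, instead of hard-coding SQL in application code. The sketch below illustrates that idea only, using Python's built-in `sqlite3` as a stand-in database; the `QueryRegistry` class, its methods, and the sample `events` table are hypothetical and are not Rockset's API.

```python
import sqlite3
from typing import Optional


class QueryRegistry:
    """Hypothetical store of named, versioned, parameterized SQL queries."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = {}  # query name -> list of SQL texts, one per version

    def register(self, name: str, sql: str) -> int:
        """Save a new version of a named query and return its 1-based version number."""
        versions = self.queries.setdefault(name, [])
        versions.append(sql)
        return len(versions)

    def execute(self, name: str, params: dict, version: Optional[int] = None) -> list:
        """Run a stored query by name; parameters are bound by SQLite, never string-formatted."""
        versions = self.queries[name]
        sql = versions[-1] if version is None else versions[version - 1]
        return self.conn.execute(sql, params).fetchall()


# Usage: callers refer to "events_by_user" instead of embedding the SQL themselves.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, kind TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("u1", "click"), ("u1", "view"), ("u2", "click")])
registry = QueryRegistry(conn)
registry.register("events_by_user",
                  "SELECT kind FROM events WHERE user_id = :user ORDER BY kind")
print(registry.execute("events_by_user", {"user": "u1"}))  # [('click',), ('view',)]
```

Keeping old versions addressable lets a query be changed without breaking callers pinned to an earlier version.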

What is the most effective policy response to the new coronavirus pandemic?

KDnuggets

Where Test/Trace/Quarantine are working, the number of cases per day has declined empirically. Furthermore, this appears to be a radically superior strategy where it can be deployed. I’ll review the evidence, discuss the other strategies and their consequences, and then discuss what can be done.

Sharpening your Stream Processing Skills with Kafka Tutorials

Confluent

In the Apache Kafka® ecosystem, ksqlDB and Kafka Streams are two popular tools for building event streaming applications that are tightly integrated with Apache Kafka. While ksqlDB and Kafka Streams […].

Five Books Every CX Leader Should Read in this Time of Social Distancing

Teradata

Check out this curated reading list of books on customer experience, ranging from updated classics to new research and insights into how large enterprises can drive business outcomes from a CX initiative.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Scaling Data Governance For Global Businesses With A Data Hub Architecture

Data Engineering Podcast

Summary: Data governance is a complex endeavor, but scaling it to meet the needs of a complex or globally distributed organization requires a well-considered and coherent strategy. In this episode Tim Ward describes an architecture that he has used successfully with multiple organizations to scale compliance. By treating governance as a graph problem, where each hub in the network has localized control and inherits higher-level controls, this approach reduces overhead and provides greater flexibility.
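A toy model of the inheritance idea described above may help: each hub holds its own local controls and inherits every control from the hubs above it, so its effective policy is the union along the path to the root. This is a hedged sketch, not the architecture from the episode; it assumes a simple tree (the simplest kind of graph), represents controls as plain strings, and the `Hub` class and control names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hub:
    """A node in a hypothetical governance hierarchy (e.g. global -> region -> country)."""
    name: str
    local_controls: set = field(default_factory=set)
    parent: Optional["Hub"] = None

    def effective_controls(self) -> set:
        """Local controls plus everything inherited from ancestor hubs."""
        inherited = self.parent.effective_controls() if self.parent else set()
        return inherited | self.local_controls


# Usage: a country hub adds its own rule on top of regional and global ones.
global_hub = Hub("global", {"encrypt-at-rest", "audit-logging"})
eu = Hub("eu", {"gdpr-consent"}, parent=global_hub)
de = Hub("de", {"data-residency-de"}, parent=eu)
print(sorted(de.effective_controls()))
# ['audit-logging', 'data-residency-de', 'encrypt-at-rest', 'gdpr-consent']
```

Because effective controls are computed from the hierarchy rather than copied into each hub, a control added once at the global hub applies everywhere below it, which is one way such a design can reduce overhead.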

How Netflix uses Druid for Real-time Insights to Ensure a High-Quality Experience

Netflix Tech

By Ben Sykes. Continue reading on Netflix TechBlog ».

How to Use KSQL Stream Processing and Real-Time Databases to Analyze Streaming Data in Kafka

Rockset

Intro: In recent years, Kafka has become synonymous with “streaming,” and with features like Kafka Streams, KSQL, joins, and integrations into sinks like Elasticsearch and Druid, there are more ways than ever to build a real-time analytics application around streaming data in Kafka. With all of these stream processing and real-time data store options, though, come questions about when each should be used and what their pros and cons are.

When Will AutoML replace Data Scientists? Poll Results and Analysis

KDnuggets

Will AI always be 5-10 years away? The majority of respondents to this poll think that AutoML will reach expert level in 5-10 years. Interestingly, it is about the same as 5 years ago. We examine the trends by AutoML experience, industry, and region.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Announcing ksqlDB 0.8.0

Confluent

The latest ksqlDB release introduces long-awaited features such as tunable retention and grace period for windowed aggregates, new built-in functions including LATEST_BY_OFFSET, a peek at the new server API under […].
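To show what a grace period on a windowed aggregate means, here is a conceptual sketch in plain Python, not ksqlDB's implementation. It counts events per key in tumbling windows and keeps accepting out-of-order records until stream time (the highest timestamp seen so far) reaches the window end plus the grace period; records arriving later are dropped. Timestamps are plain integers, and `tumbling_counts` is a hypothetical name. In ksqlDB itself, as we understand this release, the grace period is declared in the `WINDOW` clause, for example `WINDOW TUMBLING (SIZE 1 HOUR, GRACE PERIOD 10 MINUTES)`.

```python
from collections import defaultdict


def tumbling_counts(events, size: int, grace: int):
    """Count events per (key, window_start) with a grace period for late data.

    events: iterable of (timestamp, key) pairs in arrival order.
    Returns (counts, dropped): counts maps (key, window_start) -> count, and
    dropped lists the events that arrived after their window had closed.
    """
    counts = defaultdict(int)
    dropped = []
    stream_time = None  # highest timestamp observed so far
    for ts, key in events:
        stream_time = ts if stream_time is None else max(stream_time, ts)
        window_start = ts - ts % size
        # Window [start, start + size) closes once stream time reaches end + grace.
        if stream_time >= window_start + size + grace:
            dropped.append((ts, key))
            continue
        counts[(key, window_start)] += 1
    return dict(counts), dropped


# Usage: 10-unit windows with a 5-unit grace period.
counts, dropped = tumbling_counts(
    [(1, "a"), (12, "a"), (8, "a"), (21, "a"), (9, "a"), (13, "b")], size=10, grace=5)
```

Here the event at time 8 arrives late but within grace (stream time 12 is below 10 + 5), so it is counted in window 0; the event at time 9 arrives after stream time has reached 21, so it is dropped. A longer grace period tolerates more disorder at the cost of emitting final results later.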

Saudi Telecom Company

Teradata

STC uses Teradata to serve each segment as one team, increasing response rates, customer satisfaction, and revenue as well as reducing operating and call center costs.

Easier Stream Processing On Kafka With ksqlDB

Data Engineering Podcast

Summary: Building applications on top of unbounded event streams is a complex endeavor, requiring careful integration of multiple disparate systems that were engineered in isolation. The ksqlDB project was created to address this state of affairs by building a unified layer on top of the Kafka ecosystem for stream processing. Developers can work with the SQL constructs that they are familiar with while automatically getting the durability and reliability that Kafka offers.

How to work remotely at Zalando

Zalando Engineering

This document is heavily informed by remote work guidance from other companies and authors. Notable sources include FYI's 11 Best Practices for Working Remotely and Laurel Farrer’s How to Design Powerful Rituals for Successful Distributed Companies. Special thanks to Timo from GiantSwarm for sharing learnings in an ad-hoc phone call. Other sources are linked in the appendix.

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs, as well as identify and de-risk the different kinds of hypotheses built into your roadmap.