Sat. Jan 30, 2021 - Fri. Feb 05, 2021


Open Sourcing the Netflix Domain Graph Service Framework: GraphQL for Spring Boot

Netflix Tech

By Paul Bakker and Kavitha Srinivasan, Images by David Simmer, Edited by Greg Burrell. Netflix has developed a Domain Graph Service (DGS) framework, and it is now open source. The DGS framework simplifies the implementation of GraphQL for both standalone and federated GraphQL services. Our framework is battle-hardened by our use at scale. By open-sourcing the project, we hope to contribute to the Java and GraphQL communities and to learn from and collaborate with everyone who will be using the framework.


Pitching a DataOps Project That Matters

DataKitchen

Every DataOps initiative starts with a pilot project. How do you choose a project that matters to people? DataOps addresses a broad set of use cases because it applies workflow process automation to the end-to-end data-analytics lifecycle. DataOps reduces errors, shortens cycle time, eliminates unplanned work, increases innovation, improves teamwork, and more.


8 Years of Event Streaming with Apache Kafka

Confluent

In the eight years since I first started using Apache Kafka®, I have gone from being a student who had just heard about event streaming to contributing to the transformational, company-wide event […].


Cloudera wins Risk Markets Technology Award for Data Management Product of the year

Cloudera

Financial services institutions need the ability to analyze and act on massive volumes of data from diverse sources in order to monitor, model, and manage risk across the enterprise. They need a comprehensive data and analytics platform to model risk exposures on demand. Cloudera is that platform. I am pleased to announce that Cloudera was just named the Risk Data Repository and Data Management Product of the Year in the Risk Markets Technology Awards 2021.


Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.


How I Built an Algorithm to Help Doctors Fight COVID-19

Teradata

Read how a principal data scientist at Teradata leveraged his cross-industry expertise to build an algorithm to help doctors better understand & fight COVID-19.


Why You Need to Set SLAs for Your Data Pipelines

Monte Carlo

For today’s data engineering teams, the demand for real-time, accurate data has never been higher, yet data downtime is an all-too-common reality. So, how can we break this vicious cycle and achieve reliable data? Just like our software engineering counterparts 20 years ago, data teams in the early 2020s are facing a significant challenge: reliability.


Embracing the conversation – The reawakening of civil rights movements in the workplace

Cloudera

The 2020 murders of Ahmaud Arbery, Breonna Taylor, and George Floyd within a three-month span of one another brought discussions about racial and social justice to dining rooms and boardrooms alike; and just as with the African-American catalysts before them, their tragedies reopened the door to larger discussions around economic, social, and civil rights.


Digital Payments Analytics Rapidly Respond to Changing Preferences and Emerging Value Propositions

Teradata

Data & analytics now allow rapid response to changing preferences and emerging value propositions to seed future growth in the digital payments area. Read more.


Data Lineage Now Available with Silectis Magpie Data Engineering Platform

Silectis

We’re excited to share that Silectis has released a new suite of automated data lineage features within Magpie, the end-to-end data engineering platform. These features equip users with knowledge of where data originates, when it was published, and who it was published by. This additional context provides users the transparency and accountability necessary to trust their data and react to inevitable data quality issues.


Announcing the Confluent Community Forum

Confluent

Today, we’re delighted to launch the Confluent Community Forum. Built on Discourse, a platform many developers will already be familiar with, this new forum is a place for the community […].


Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.


Open Source Highlight: OpenLineage

Data Council

OpenLineage is an API for collecting data lineage and metadata at runtime. While initiated by Datakin, the company behind Marquez, it was developed with the aim of creating an open standard. As Datakin’s CTO Julien Le Dem explained in a blog post announcing the launch, OpenLineage is meant to answer the industry-wide need for data lineage while making sure efforts in that direction aren’t fragmented or duplicated.


Building CI/CD with Airflow, GitLab and Terraform in GCP

Ripple Engineering

The Ripple Data Engineering team is expanding, which means higher-frequency changes to our data pipeline source code. This means we need to build better, more configurable, and more collaborative tooling that prevents code collisions and enforces software engineering best practices. To ensure the quality of incoming features, the team sought to create a pipeline that automatically validated those features and built them to verify their interoperability with existing features.


Data Governance for Self-Service Analytics Best Practices

DataKitchen

The post Data Governance for Self-Service Analytics Best Practices first appeared on DataKitchen.


Stop using constants. Feed randomized input to test cases.

Zalando Engineering

Introduction Testing is a widely accepted practice in the software industry. I am an iOS engineer and, like most of us, have been writing tests. The way I approach testing changed radically a few years back, and I have since used and shared this new technique within Zalando and outside. In this post, I will explain what is wrong with most test cases and how to apply randomized input to improve tests.
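The idea the post argues for can be sketched as follows. This is a minimal, hypothetical example (the `clamp` helper is invented for illustration; the article itself is about iOS tests): instead of asserting against hand-picked constants, the test feeds random inputs and checks properties that must hold for every input.

```python
import random

def clamp(value, lo, hi):
    """Clamp value into the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))

def test_clamp_randomized():
    # Randomized inputs replace magic constants: each run exercises
    # many different values, so edge cases are found over time.
    for _ in range(1000):
        lo = random.randint(-100, 100)
        hi = random.randint(lo, lo + 200)
        value = random.randint(-500, 500)
        result = clamp(value, lo, hi)
        assert lo <= result <= hi        # result always stays in range
        if lo <= value <= hi:
            assert result == value       # in-range inputs pass through unchanged

test_clamp_randomized()
```

The assertions state invariants rather than single expected values, which is what makes randomized input workable.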


Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.


Announcing Preset Cloud Beta!

Preset

Preset Cloud offers an extremely easy way to get up and running with Superset on a highly scalable, performant, and secure cloud service.


Vantage Trial Delights Cloud Data Analytic Users

Teradata

Vantage Trial provides free, 30-day access to Teradata Vantage in the cloud for analysts, developers, and operations personnel. Find out more.


System Observability For The Cloud Native Era With Chronosphere

Data Engineering Podcast

Summary Collecting and processing metrics for monitoring use cases is an interesting data problem. It is eminently possible to generate millions or billions of data points per second; the information needs to be propagated to a central location, processed, and analyzed in timeframes on the order of milliseconds or single-digit seconds; and the consumers of the data need to be able to query the information quickly and flexibly.


Data, The Unsung Hero of the Covid-19 Solution

Cloudera

COVID-19 vaccines from various manufacturers are being approved by more countries, but that doesn’t mean they will be available at your local pharmacy or mass vaccination center anytime soon. Creating, scaling up, and manufacturing the vaccine is just the first step; now the world needs to coordinate an incredibly complex supply chain system to deliver more vaccines to more places than ever before.


Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.


How to update millions of records in MySQL?

Start Data Engineering

Contents: Introduction, Setup, Problems with a single large update, Updating in batches, Conclusion, Further reading. Introduction: When updating a large number of records in an OLTP database, such as MySQL, you have to be mindful of locking the records. While those records are locked, they cannot be updated or deleted by other transactions on your database.
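The batching pattern the post describes can be sketched as follows. This is a hedged, self-contained illustration: it uses SQLite in place of MySQL purely so it runs without a server, and the `orders` table and its columns are made up. The idea is the same on MySQL: update a bounded slice of rows by primary key, commit, and repeat, so locks are held briefly instead of for the entire table.

```python
import sqlite3

# Stand-in for a MySQL connection; the batching pattern is identical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(i, "pending") for i in range(1, 10_001)],
)
conn.commit()

BATCH_SIZE = 1_000
for start in range(0, 10_000, BATCH_SIZE):
    # Each statement touches at most BATCH_SIZE rows, and the commit
    # after it releases the row locks before the next batch begins.
    conn.execute(
        "UPDATE orders SET status = 'shipped' "
        "WHERE id > ? AND id <= ? AND status = 'pending'",
        (start, start + BATCH_SIZE),
    )
    conn.commit()
```

On a real MySQL deployment you would typically also pause between batches and monitor replication lag; this sketch only shows the key-range slicing itself.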


How DataOps Kitchens Enable Version Control

DataKitchen

This blog builds on earlier posts that defined Kitchens and showed how they map to technical environments. We’ve also discussed how toolchains are segmented to support multiple kitchens. DataOps automates the source code integration, release, and deployment workflows related to analytics development. To use software dev terminology, DataOps supports continuous integration, continuous delivery, and continuous deployment.


It's Never Too Late For a Career Change

Zalando Engineering

Is it ever too late to follow your dream and start a new career? Well, I was 30 and had been working for Zalando for more than 4 years when I decided to change my career path for the second time. I made the decision a year ago, joined my new team in April 2020, and I didn't regret it for a single day. Since that transition, a lot of people approached me with questions and asked me for advice.


How to configure clients to connect to Apache Kafka Clusters securely – Part 4: TLS Client Authentication

Cloudera

In the previous posts in this series, we discussed Kerberos, LDAP, and PAM authentication for Kafka. In this post we will look at how to configure a Kafka cluster and client to use TLS client authentication. The examples shown here highlight the authentication-related properties in bold font to differentiate them from other required security properties, as in the example below.
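A minimal client configuration of this kind might look like the snippet below. The property keys are standard Apache Kafka client settings; the file paths and passwords are placeholders, and the exact set of properties in the Cloudera post may differ. The keystore entries are the authentication-related ones (they hold the client certificate the broker verifies), while the truststore entries let the client verify the broker.

```properties
security.protocol=SSL

# Client certificate used for TLS client authentication (auth-related)
ssl.keystore.location=/path/to/client.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit

# Truststore used to verify the broker's certificate
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=changeit
```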


Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.


MLOps = more money

DareData

If you are a business person wondering why you should invest in DevOps / MLOps, this is your guide in terms of real money. We'll try to keep this as non-technical as possible; the technical terms we do mention are there because they are ABSOLUTELY essential and should have budget allocated to them. Be sure to read to the end for instructions on how to execute an analysis of the expected gains from the infrastructure changes required for MLOps in your organisation.


Exploring MNIST Dataset using PyTorch to Train an MLP

ProjectPro

From visual search for improved product discoverability to face recognition on social networks, image classification is fueling a visual revolution online and has taken the world by storm. Image classification, a subfield of computer vision, helps in processing and classifying objects based on trained algorithms. Image classification had its Eureka moment back in 2012 when AlexNet won the ImageNet challenge, and since then there has been exponential growth in the field.