Data Engineering Digest: Top Google Cloud Data Management Content for April, 2019

April, 2019

Running Your Database On Kubernetes With KubeDB

Data Engineering Podcast

APRIL 28, 2019

Summary: Kubernetes is a driving force in the renaissance around deploying and running applications. However, managing the database layer is still a separate concern. The KubeDB project was created to provide a simple mechanism for running your storage system on the same platform as your application. In this episode, Tamal Saha explains how the KubeDB project got started, why you might want to run your database on Kubernetes, and how to get started.

12 Programming Languages Walk into a Kafka Cluster…

Confluent

APRIL 23, 2019

When it was first created, Apache Kafka® had client APIs for only Scala and Java. Since then, Kafka client APIs have been developed for many other programming languages, so you can pick the language you want. This freedom of choice ultimately lets you build an event streaming platform in the language best suited to your business needs.

Trending Sources

3 Ways New As-a-Service Offerings Bring Choice and Flexibility to Teradata Vantage

Teradata

APRIL 23, 2019

At Teradata, we think a lot about our customers in the cloud, and we continue to deliver on our promise of choice and flexibility by adding new as-a-service options for Teradata Vantage.

Introducing SVT-AV1: a scalable open-source AV1 framework

Netflix Tech

APRIL 17, 2019

By Andrey Norkin, Joel Sole, Kyle Swanson, Mariana Afonso, Anush Moorthy, and Anne Aaron. Netflix headquarters on Winchester Circle, circa 2014. It’s a nice building with good architecture! This was the primary home of Netflix for a number of years during the company’s growth, but at some point Netflix outgrew its home and needed more space.

Breaking Down Data Silos in Financial Services with a Centralized Data Management Platform

Cloudera

APRIL 25, 2019

Organizations in the financial services industry rely on data to make strategic decisions, drive their businesses, and maintain a competitive edge. The Bank of England, the central bank of the United Kingdom founded in 1694, was discovering that legacy tools were no longer sufficient to satisfy the growing demands of its analysts and economists.

Analytics on DynamoDB: Comparing Elasticsearch, Athena and Spark

Rockset

APRIL 29, 2019

In this blog post I compare options for real-time analytics on DynamoDB (Elasticsearch, Athena, and Spark) in terms of ease of setup, maintenance, query capability, and latency. Some of these options offer only limited support for SQL analytics. I also evaluate which use cases each of them is best suited for. Developers often need to serve fast analytical queries over data in Amazon DynamoDB.

Unpacking Fauna: A Global Scale Cloud Native Database

Data Engineering Podcast

APRIL 22, 2019

Summary: One of the biggest challenges for any business trying to grow and reach customers globally is how to scale its data storage. FaunaDB is a cloud native database built by the engineers behind Twitter’s infrastructure and designed to serve the needs of modern systems. Evan Weaver is the co-founder and CEO of Fauna, and in this episode he explains the unique capabilities of Fauna, compares its consensus and transaction algorithm to those used in other NewSQL systems, and describes the…

More Trending

Announcing Confluent Cloud for Apache Kafka as a Native Service on Google Cloud Platform

Confluent

APRIL 9, 2019

I’m excited to announce that we’re partnering with Google Cloud to make Confluent Cloud, our fully managed offering of Apache Kafka®, available as a native offering on Google Cloud Platform (GCP). This means you will be able to use Confluent Cloud’s managed Apache Kafka service with familiar Google tools and processes, including integration into the Google Cloud Console and GCP Marketplace for a seamless sign-up experience, as well as integrated billing and first-line support provided…

Why Smart Cities Need Intelligent Data

Teradata

APRIL 3, 2019

In his blog post, Bob McQueen defines smart cities, outlines their challenges and opportunities, and discusses the use of smart data management.

Open Source: March Updates - A new Kubernetes operator & more Cloud Native Apps

Zalando Engineering

APRIL 24, 2019

Project highlights: A new operator has been added to Zalando’s list of Cloud Native Applications. Elasticsearch Operator is an operator for running Elasticsearch in Kubernetes with a focus on operational aspects, such as safe draining and auto-scaling of Elasticsearch data nodes, rather than just abstracting manifest definitions. To make things even simpler for developers, we also released a new framework that helps build Kubernetes operators in Python.

Machine Learning in Production: Software Architecture

Domino Data Lab: Data Engineering

APRIL 17, 2019

Special thanks to Addison-Wesley Professional for permission to excerpt the following "Software Architecture" chapter from the book, Machine Learning in Production. This chapter excerpt provides data scientists with insights and tradeoffs to consider when moving machine learning models to production. Also, if you’re interested in learning about how Domino provides an API endpoint for your model, check out this video tutorial on the Domino Support site.

What customer centric corporate culture really means and why it is so important

Cloudera

APRIL 4, 2019

All organizations, big or small, have a unique corporate culture that has been nurtured and mastered over the years. A company’s culture is its basic personality and the essence of how employees interact and work. It is the sum of the company’s beliefs, ethics, expectations, goals, values and mission. Company culture is usually where brand promises are either kept or broken.

Index Your Big Data With Pilosa For Faster Analytics

Data Engineering Podcast

APRIL 15, 2019

Summary: Database indexes are critical for fast lookups of your data, but they are inherently tied to the database engine. Pilosa is rewriting that equation by providing a flexible, scalable, performant engine for building an index of your data to enable high-speed aggregate analysis. In this episode, Seebs explains how Pilosa fits into the broader data landscape, how it is architected, and how you can start using it for your own analysis.
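The summary does not describe Pilosa’s internals. Pilosa is built around bitmap indexes, and the toy Python sketch below illustrates why that structure suits aggregate queries: each (field, value) pair gets a bitmap of row IDs, so counting the rows that match several conditions reduces to a bitwise AND followed by a bit count. `BitmapIndex` and its methods are hypothetical names for illustration only; this is not Pilosa’s API, and production engines use compressed bitmaps rather than plain integers.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index (hypothetical, not Pilosa's API).

    One bitmap per (field, value) pair; bit i is set when row i has that value.
    Python ints are arbitrary-precision, so they work as growable bit sets.
    """

    def __init__(self) -> None:
        self.bitmaps: dict[tuple[str, object], int] = defaultdict(int)

    def set_bit(self, field: str, value: object, row_id: int) -> None:
        # Mark row_id as having `value` in `field`.
        self.bitmaps[(field, value)] |= 1 << row_id

    def bitmap(self, field: str, value: object) -> int:
        # Missing pairs behave like an empty bitmap (.get does not insert).
        return self.bitmaps.get((field, value), 0)

    def count_intersect(self, *conditions: tuple[str, object]) -> int:
        # Rows matching ALL conditions: AND the bitmaps, then count set bits.
        if not conditions:
            return 0
        result = self.bitmap(*conditions[0])
        for condition in conditions[1:]:
            result &= self.bitmap(*condition)
        return bin(result).count("1")


# Usage: index four rows, then count rows where country=DE AND device=mobile.
idx = BitmapIndex()
rows = [
    {"country": "DE", "device": "mobile"},
    {"country": "DE", "device": "desktop"},
    {"country": "US", "device": "mobile"},
    {"country": "DE", "device": "mobile"},
]
for row_id, row in enumerate(rows):
    for field, value in row.items():
        idx.set_bit(field, value, row_id)

print(idx.count_intersect(("country", "DE"), ("device", "mobile")))  # 2
```

The query never touches the rows themselves, only the bitmaps, which is what keeps aggregate counts cheap as the number of rows grows.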

KSQL: What’s New in 5.2

Confluent

APRIL 3, 2019

KSQL enables you to write streaming applications expressed purely in SQL. There are a ton of great new features in 5.2, many of which are the result of requests and support from the community. We use GitHub to track these, and I’ve noted the corresponding issue in each point below. If you have suggestions for new features, please search our GitHub issues page and upvote an existing issue, or create a new one as appropriate.

How to Analyze Data at Speed and Scale Using Pervasive Data Intelligence

Teradata

APRIL 9, 2019

Chris Twogood explains why large companies that rely on data need Pervasive Data Intelligence to leverage all of their data, all of the time.

How to set an ideal thread pool size

Zalando Engineering

APRIL 17, 2019

We all know that thread creation in Java is not free. The actual overhead varies across platforms, but thread creation takes time, introduces latency into request processing, and requires some processing activity by the JVM and OS. This is where a thread pool comes to the rescue. A thread pool reuses previously created threads to execute current tasks, which avoids the overhead of the thread life cycle and prevents resource thrashing.
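The article itself is about Java, where `Executors.newFixedThreadPool` plays this role. As a minimal sketch of the same reuse idea, the Python example below uses the standard library’s `concurrent.futures.ThreadPoolExecutor`: it submits many short tasks to a fixed-size pool and records which thread ran each one, showing that a handful of threads are reused rather than one being created per task. The function names are hypothetical, and the sketch does not address how to choose the pool size.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> int:
    # Stand-in for real work; returns the identifier of the thread that ran it.
    return threading.get_ident()


def run_with_pool(num_tasks: int, pool_size: int) -> set[int]:
    # The executor starts at most `pool_size` worker threads and hands every
    # submitted task to one of them, instead of creating a thread per task.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        thread_ids = list(pool.map(handle_request, range(num_tasks)))
    # Distinct identifiers = number of threads that actually did the work.
    return set(thread_ids)


if __name__ == "__main__":
    ids = run_with_pool(num_tasks=100, pool_size=4)
    print(f"100 tasks ran on {len(ids)} distinct threads (at most 4)")
```

The pool may start fewer than `pool_size` threads when tasks finish quickly, but it never exceeds that bound, so the cost of creating threads is paid a few times rather than once per request.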

How We Structure our dbt Projects

dbt Developer Hub

APRIL 30, 2019

As the maintainers of dbt, and as analytics consultants at Fishtown Analytics (now dbt Labs), we build a lot of dbt projects. Over time, we’ve developed internal conventions for how we structure them. This article does not seek to instruct you on how to design a final model for your stakeholders: it won’t cover whether you should denormalize everything into one wide master table, or have many tables that need to be joined together in the BI layer.

Secondary Indexes For Analytics On DynamoDB

Rockset

APRIL 29, 2019

In this post I explore how to support analytical queries without incurring prohibitive scan costs by leveraging secondary indexes in DynamoDB. I also evaluate the pros and cons of this approach compared with extracting data to another system such as Athena, Spark or Elasticsearch. Rockset recently added support for DynamoDB, which means you can run fast SQL on DynamoDB tables without any ETL.
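As a rough illustration of the trade-off the post describes, the hypothetical in-memory model below contrasts a full scan with a lookup through a secondary index on a non-key attribute, counting how many items each approach reads. `ToyTable` and its methods are invented for this sketch and do not use the DynamoDB API; DynamoDB’s own secondary indexes are maintained by the service and have their own cost model.

```python
from collections import defaultdict


class ToyTable:
    """In-memory key-value table with one secondary index (hypothetical model)."""

    def __init__(self, index_attr: str) -> None:
        self.items: dict[str, dict] = {}  # primary key -> item
        self.index_attr = index_attr
        self.index: dict[object, set[str]] = defaultdict(set)  # attr value -> keys

    def put(self, key: str, item: dict) -> None:
        old = self.items.get(key)
        if old is not None and self.index_attr in old:
            # Keep the index consistent when an item is overwritten.
            self.index[old[self.index_attr]].discard(key)
        self.items[key] = item
        if self.index_attr in item:
            self.index[item[self.index_attr]].add(key)

    def scan_where(self, value: object) -> tuple[list[dict], int]:
        # Full scan: every item is read, whether it matches or not.
        matches = [it for it in self.items.values() if it.get(self.index_attr) == value]
        return matches, len(self.items)

    def query_index(self, value: object) -> tuple[list[dict], int]:
        # Index lookup: only the matching items are read.
        keys = self.index.get(value, set())
        matches = [self.items[k] for k in keys]
        return matches, len(matches)


# Usage: 1,000 orders, every tenth one "open".
table = ToyTable(index_attr="status")
for i in range(1000):
    table.put(f"order-{i}", {"id": i, "status": "open" if i % 10 == 0 else "closed"})

scanned, scan_reads = table.scan_where("open")
indexed, index_reads = table.query_index("open")
print(len(scanned), scan_reads)   # 100 1000
print(len(indexed), index_reads)  # 100 100
```

The index keeps reads proportional to the number of matching items, at the price of extra work on every write (see `put`), which mirrors the pros and cons of leaning on secondary indexes instead of exporting data to another system.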

Announcing the General Availability of Cloudera Flow Management and Cloudera Edge Management

Cloudera

APRIL 15, 2019

Last month at Strata in San Francisco, we announced two upcoming products: Cloudera Flow Management and Cloudera Edge Management. Today, we are excited to announce that both products are generally available. While Cloudera Flow Management has been eagerly awaited by Cloudera customers for use on their existing Cloudera platform clusters, Cloudera Edge Management has generated equal buzz across the industry for the possibilities it brings to enterp…

Monitoring Data Replication in Multi-Datacenter Apache Kafka Deployments

Confluent

APRIL 10, 2019

Enterprises run modern data systems and services across multiple cloud providers, private clouds and on-prem multi-datacenter deployments. Instead of having many point-to-point connections between sites, the Confluent Platform provides an integrated event streaming architecture with frictionless data replication between sites. Applications can publish streams of data to a self-hosted on-prem cluster, replicate them to another on-prem cluster or to different cloud providers, and load them into data s…

How U.S. Bank Uses A.I. and Machine Learning to Deeply Personalize Your Banking Experience

Teradata

APRIL 15, 2019

Katherine Knowles-Marchione explains how U.S. Bank is using AI to improve and personalize the banking experience.

Learning DevOps as a Software Engineer

Zalando Engineering

APRIL 24, 2019

At Zalando, teams are autonomous and involved in the entire software development process, from gathering stakeholder requirements to design, implementation, testing and deployment. For me, this was one of the greatest challenges, and opportunities, of joining Zalando, and it allowed me to grow along many dimensions of software development, one of these being DevOps.

Python at Netflix

Netflix Tech

APRIL 29, 2019

By Pythonistas at Netflix, coordinated by Amjith Ramanujam and edited by Ellen Livengood. As many of us prepare to go to PyCon, we wanted to share a sampling of how Python is used at Netflix. We use Python throughout the full content lifecycle, from deciding which content to fund all the way to operating the CDN that serves the final video to 148 million members.

Serverless Data Pipelines On DataCoral

Data Engineering Podcast

APRIL 7, 2019

Summary: How much time do you spend maintaining your data pipeline? How much end-user value does that provide? Raghu Murthy founded DataCoral as a way to abstract away the low-level details of ETL so that you can focus on the actual problem you are trying to solve. In this episode he explains his motivation for building the DataCoral platform, how it leverages serverless computing, the challenges of delivering software as a service to customer environments, and the architecture that he has de…

Intel and Cloudera collaborate to bring improved performance to customers with Optane DC Persistent Memory

Cloudera

APRIL 2, 2019

Cloudera and Intel have a long history of innovation, driving big data analytics and machine learning into the enterprise with unparalleled performance and security. We are pleased to build upon that direction with our collaboration on Intel® Optane DC persistent memory. Available to customers running 2nd Generation Intel® Xeon® Scalable processors, Intel Optane DC persistent memory can significantly enhance the performance of real-time and streaming applications.

Reshaping Entire Industries with IoT and Confluent Cloud

Confluent

APRIL 18, 2019

While the current hype around the Internet of Things (IoT) focuses on smart “things”—smart homes, smart cars, smart watches—the first known IoT device was a simple Coca-Cola vending machine at Carnegie Mellon University in Pittsburgh. Students in the 1980s, tired of long walks to an empty machine, installed a board that tracked the machine’s sensors to determine whether the machine was stocked and the bottles were cold.

The Eight Functions You Should Consider When Choosing a Self-Service Analytics Platform

Teradata

APRIL 2, 2019

This blog post discusses the functions to consider when choosing a self-service analytics platform.

Developing Zalando APIs

Zalando Engineering

APRIL 3, 2019

How Zalando software engineers develop internal and external APIs. Imagine a distributed system consisting of 8,000+ active service applications, developed and operated by 300+ delivery teams in six tech hubs. More than 1,200 software engineers use various technologies to implement business needs and are responsible end-to-end for those components. A pretty complex system of people and software.

How to Use AI and Video Analytics to Give Your Retail Business a Competitive Edge

Teradata

APRIL 21, 2019

Peter Mackenzie explains the advancements in AI and video analytics in the retail sector.

Optimizing Kafka Streams Applications

Confluent

APRIL 30, 2019

With the release of Apache Kafka® 2.1.0, Kafka Streams introduced a processor topology optimization framework at the Kafka Streams DSL layer. This framework opens the door to various optimization techniques from the existing data stream management system (DSMS) and data stream processing literature. In what follows, we provide some context on how a processor topology was generated inside Kafka Streams before 2.1, with a focus on stateful operations like aggregations and joins.

Putting Events in Their Place with Dynamic Routing

Confluent

APRIL 4, 2019

Event-driven architecture means just that: it’s all about the events. In a microservices architecture, events drive microservice actions. No event, no shoes, no service. In the most basic scenario, microservices that need to act on a common stream of events all listen to that stream. In the Apache Kafka® world, this means that each of those microservice client applications subscribes to a common Kafka topic.
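The excerpt stops at the basic one-shared-topic setup. As a minimal sketch of what routing events dynamically can look like, the Python example below picks a destination topic per event from the event’s own contents, so each service can subscribe only to the topic it cares about. The broker is an in-memory stand-in, and `route_by_type`, `ToyBroker` and `dispatch` are hypothetical names. In Kafka Streams, a comparable per-record choice of output topic can be made by passing a `TopicNameExtractor` to `KStream#to`, though the excerpt does not say which mechanism the article itself uses.

```python
from collections import defaultdict
from typing import Callable, Iterable

Event = dict


def route_by_type(event: Event) -> str:
    # Hypothetical rule: derive the destination topic from the event's type field.
    return f"events.{event['type']}"


class ToyBroker:
    """In-memory stand-in for a broker: topic name -> list of events."""

    def __init__(self) -> None:
        self.topics: dict[str, list[Event]] = defaultdict(list)

    def publish(self, topic: str, event: Event) -> None:
        self.topics[topic].append(event)


def dispatch(events: Iterable[Event], broker: ToyBroker,
             router: Callable[[Event], str]) -> None:
    # Instead of writing everything to one shared topic, choose the topic per
    # event, so each microservice only consumes the events it acts on.
    for event in events:
        broker.publish(router(event), event)


# Usage: two event types end up in two separate topics.
broker = ToyBroker()
events = [
    {"type": "order_created", "id": 1},
    {"type": "payment_received", "id": 2},
    {"type": "order_created", "id": 3},
]
dispatch(events, broker, route_by_type)
print({topic: len(evts) for topic, evts in broker.topics.items()})
# {'events.order_created': 2, 'events.payment_received': 1}
```

Passing the routing rule in as a function keeps the dispatch loop unchanged when the rule changes, for example to route on region or customer tier instead of event type.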
