2018

Open-Source Data Warehousing – Druid, Apache Airflow & Superset

Simon Späti

These days, everyone talks about open-source. However, it is still not common in the Data Warehouse (DWH) field. Why is this? In a recent blog post, I researched OLAP technologies; for this post, I chose some open-source technologies and used them together to build a full data architecture for a Data Warehouse system. I went with Apache Druid for data storage, Apache Superset for querying and Apache Airflow as a task orchestrator.

Octopai: Metadata Management for Better Business Intelligence with Amnon Drori - Episode 28

Data Engineering Podcast

Summary: The information about how data is acquired and processed is often as important as the data itself. For this reason, metadata management systems are built to track the journey of your business data to aid in analysis, presentation, and compliance. These systems are frequently cumbersome and difficult to maintain, so Octopai was founded to alleviate that burden.

Netflix OSS and Spring Boot - Coming Full Circle

Netflix Tech

By Taylor Wicksell, Tom Cellucci, Howard Yuan, Asi Bross, Noel Yap, and David Liu. In 2007, Netflix started on a long road towards fully operating in the cloud. Much of Netflix’s backend and mid-tier applications are built using Java, and as part of this effort Netflix engineering built several cloud infrastructure libraries and systems…

Java 110

Maximizing Process Performance with Maze, Uber’s Funnel Visualization Platform

Uber Engineering

At Uber, we spend a considerable amount of resources making the driver sign-up experience as easy as possible. At Uber’s scale, even a one percent increase in the rate of sign-ups to first trips (the driver conversion rate) carries a …

Process 110

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Speaker: Maher Hanafi, VP of Engineering at Betterworks & Tony Karrer, CTO at Aggregage

Executive leaders and board members are pushing their teams to adopt Generative AI to gain a competitive edge, save money, and otherwise take advantage of the promise of this new era of artificial intelligence. There's no question that it is challenging to figure out where to focus and how to advance when it’s a new field that is evolving every day. 💡 This new webinar featuring Maher Hanafi, VP of Engineering at Betterworks, will explore a practical framework to transform Generative AI pr

Do These Things if you Want to Succeed as an HR Professional

U-Next

Success in today’s businesses has taken on several meanings. Beyond pay hikes and promotions, success has gained new dimensions of very recent origin. Today, success has become synonymous with happiness at the workplace, challenging tasks, compensatory rewards, incentives, authoritative job profiles, influential roles, and more. The current talent pools in organizations have become wiser and more mature than their previous-generation counterparts.

Bringing AIOps to Machine Learning & Analytics

Cloudera

Two years ago I founded Hyperpilot with the mission to enable autopilot for container infrastructure. We learned a lot about data center automation based on real-time application and diagnostic feedback using applied machine learning. Last month, I joined Cloudera along with former team members Xiaoyun Zhu and Che-Yuan Liang to bring our expertise in intelligent automation to Cloudera’s modern platform for machine learning and analytics.

More Trending

One Audio Sequencer to Rule Them All

Pandora Engineering

Photo credit: Carol Yepes. Last month Pandora announced a public podcast beta in conjunction with the Podcast Genome Project. This rollout introduced many exciting features to our current mobile application offerings, including fully integrated and native podcast support. Ironically, one of the most interesting features and perhaps our biggest engineering win with this iteration is something that’s transparent to our end users: the inclusion of a new audio playback sequencer used exclusively for

Media 52

Open Source: November Review - Maintainer training, new releases and more

Zalando Engineering

Project Highlights: ExternalDNS version 0.5.9 is ready for testing. This project allows you to control DNS records dynamically via Kubernetes resources in a DNS provider-agnostic way. ExternalDNS also successfully made its way to the Kubernetes Incubator. Check out the list of changes in this new release. Zalando-Incubator welcomed two brand-new open source projects: 1) Darty, a data dependency manager for data science projects.

Announcing my session at #SQLBits - Azure Databricks

Advancing Analytics: Data Engineering

Simon Whiteley and I will be back at #SQLBits 2019 talking about #DataEngineering and #DataScience in Databricks. We will look at #ApacheSpark #Python #Engineering & #MachineLearning in this full-day training day. Register now. Have you looked at Azure Databricks yet? No? Then you need to. Why, you ask? There are many reasons. Number one: knowing how to use Apache Spark will earn you more money.

Making slow queries fast with composite indexes in MySQL

nodeSWAT

This post expects some basic knowledge of SQL. Examples were made using MySQL 5.7.18 and run on my mid-2014 MacBook Pro. Query execution times are based on multiple executions so index caching can kick in. The use case came from a real application and the solution is used in production. So you have inserted preliminary data into your database and run a simple COUNT(*) query against it with a simple WHERE clause and… the spinner is still run

MySQL 52

Leading the Development of Profitable and Sustainable Products

Speaker: Jason Tanner

While growth of software-enabled solutions generates momentum, growth alone is not enough to ensure sustainability. The probability of success dramatically improves with early planning for profitability. A sustainable business model contains a system of interrelated choices made not once but over time. Join this webinar for an iterative approach to ensuring solution, economic and relationship sustainability.

OLAP, what’s coming next?

Simon Späti

Are you on the lookout for a replacement for the Microsoft Analysis Cubes? Are you looking for a big data OLAP system that scales ad libitum? Do you want to have your analytics updated even in real time? In this blog post, I want to show you possible solutions that are ready for the future and fit into an existing data architecture. What is OLAP? OLAP is an acronym for Online Analytical Processing.

Big Data 130

Simplifying Continuous Data Processing Using Stream Native Storage In Pravega with Tom Kaitchuck - Episode 63

Data Engineering Podcast

Summary: As more companies and organizations work to gain a real-time view of their business, they are increasingly turning to stream processing technologies to fulfill that need. However, the storage requirements for continuous, unbounded streams of data are markedly different from those of batch-oriented workloads. To address this shortcoming, the team at Dell EMC has created the open source Pravega project.

Netflix Information Security: Preventing Credential Compromise in AWS

Netflix Tech

By Will Bengtson. Previously we wrote about a method for detecting credential compromise in your AWS environment. The methodology focused on a continuous learning model and first use principle. This solution is still reactive in nature: we only detect credential compromise after it has already happened. Even with detection capabilities, there is a risk that exposed credentials can provide access to sensitive data and/or the ability to cause damage in our environment.

AWS 95

Databook: Turning Big Data into Knowledge with Metadata at Uber

Uber Engineering

From driver and rider locations and destinations, to restaurant orders and payment transactions, every interaction on Uber’s transportation platform is driven by data. Data powers Uber’s global marketplace, enabling more reliable and seamless user experiences across our products for riders, …

Metadata 110

Navigating the Future: Generative AI, Application Analytics, and Data

Generative AI is upending the way product developers & end-users alike are interacting with data. Despite the potential of AI, many are left with questions about the future of product development: How will AI impact my business and contribute to its success? What can product managers and developers expect in the future with the widespread adoption of AI?

Recap of Hadoop News for July 2018

ProjectPro

News on Hadoop - July 2018. Hadoop data governance services surface in wake of GDPR. TechTarget.com, July 2, 2018. GDPR has turned out to be a strong motivator that would bring greater governance to big data. At the recent DataWorks Summit 2018, though most of the attention was focussed on how Hadoop pioneer Hortonworks is all set to expand its service in the cloud, there was great interest and importance put on managing data privacy as well.

Hadoop 52

Meet the newest Data Superheros: The Sixth Annual Data Impact Awards Finalists Are…

Cloudera

Drum roll… Starting from well over 100 nominations, we are excited to announce the finalists for this year’s Data Impact Awards! Each year, nominees have raised the bar, and this year is no exception. The level of impact that organizations have shown and the variety of use cases are inspiring. From AI models that power retail customer decision engines to utility meter analysis that disables underperforming gas turbines, these finalists demonstrate how machine learning and analytics have become

Data Science vs Engineering: Tension Points

Domino Data Lab: Data Engineering

This blog post provides highlights and a full written transcript from the panel, “Data Science Versus Engineering: Does It Really Have To Be This Way?” with Amy Heineike, Paco Nathan, and Pete Warden at Domino HQ. Topics discussed include the current state of collaboration around building and deploying models, tension points that potentially arise, as well as practical advice on how to address these tension points.

AI at the Forefront of Digital Transformation Process in 2018

InData Labs

Digital Transformation Definition: Digital transformation has been a big topic for a few years now, and it has many definitions. From a business perspective, digital transformation is about leveraging digital technologies to improve processes, competencies, and business models. It is also about changing the culture of the company because it requires letting go of old.

Process 52

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

New on Cloud Academy: Machine Learning on Google Cloud and AWS, Big Data Analytics, Terraform, and more

Cloud Academy

A 2017 IDC White Paper “recommend[s] that organizations that want to get the most out of cloud should train a wide range of stakeholders on cloud fundamentals and provide deep training to key technical teams” (emphasis ours). Regular readers of the Cloud Academy blog know we’ve been talking about this for a long time. Future-proofing your organization requires technical excellence, collective experience, business context, and shared understanding.

Investing in the Future of Engineering and Design

Zalando Engineering

Our cooperation with CODE University: At Zalando, we strive to create an environment in which all our engineers, product, and design specialists feel they can inspire each other, make their ideas a reality, and contribute to providing the best possible platform for Zalando’s customers to have the ultimate customer experience. Part of this is making sure we understand what the future generation of product managers, interaction designers, and software engineers are thinking and what ideas and innov

Cloud Nine: All Your Analytics, Wherever You Want Them. Really!

Teradata

Brian Wood explains how Teradata Vantage in the cloud has your back when it comes to analytic simplicity, control, effectiveness, and results.

Cloud 60

Rockset's RocksDB-Cloud Library - Enabling the Next Generation of Cloud Native Databases

Rockset

Rockset and I began collaborating in 2016 due to my interest in their RocksDB-Cloud open-source key-value store. This post is primarily about the RocksDB-Cloud software, which Rockset open-sourced in 2016, rather than Rockset's newly launched cloud service. In it, I will explore how RocksDB-Cloud can be used to build an open-source cloud-friendly storage system.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Continuously Query Your Time-Series Data Using PipelineDB with Derek Nelson and Usman Masood - Episode 62

Data Engineering Podcast

Summary: Processing high-velocity time-series data in real time is a complex challenge. The team at PipelineDB has built a continuous query engine that simplifies the task of computing aggregates across incoming streams of events. In this episode, Derek Nelson and Usman Masood explain how it is architected, strategies for designing your data flows, how to scale it up and out, and edge cases to be aware of.

Cache warming: Agility for a stateful service

Netflix Tech

By Deva Jayaraman, Shashi Madappa, Sridhar Enugula, and Ioannis Papapanagiotou. EVCache has been a fundamental part of the Netflix platform (we call it Tier-1), holding petabytes of data. Our caching layer serves multiple use cases, including signup, personalization, searching, playback, and more. It is comprised of thousands of nodes in production and hundreds of clusters, all of which must routinely scale up due to the increasing growth of our members.

AWS 51

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks …

Big Data 109

Recap of Hadoop News for June 2018

ProjectPro

News on Hadoop - June 2018. RightShip uses big data to find reliable vessels. HoustonChronicle.com, June 15, 2018. RightShip is using IBM’s predictive big data analytics platform to calculate the likelihood of compliance or mechanical troubles that an individual merchant ship will experience within the next year. It also leverages big data to analyse carbon emissions and vessel efficiency.

Hadoop 52

How To Get Promoted In Product Management

Speaker: John Mansour

If you're looking to advance your career in product management, there are more options than just climbing the management ladder. Join our upcoming webinar to learn about highly rewarding career paths that don't involve management responsibilities. We'll cover both career tracks and provide tips on how to position yourself for success in the one that's right for you.

Data Engineering is Critical to Big Data Success

Cloudera

I mentioned in an earlier blog titled “Staffing your big data team” that data engineers are critical to a successful data journey. That said, most companies that are early in their journey lack a dedicated engineering group. And the longer it takes to put a team in place, the likelier it is that your big data project will stall. The data engineering team is responsible for collecting and ingesting batch and stream-oriented data, inventorying the data, working through ingest bottlenecks, and d

Collaboration Between Data Science and Data Engineering: True or False?

Domino Data Lab: Data Engineering

This blog post includes candid insights about addressing tension points that arise when people collaborate on developing and deploying models. Domino’s Head of Content sat down with Don Miner and Marshall Presser to discuss the state of collaboration between data science and data engineering. The blog post provides distilled insights, audio clips, excerpted quotes as well as the full audio and written transcript.

Functional Data Engineering — a modern paradigm for batch data processing

Maxime Beauchemin

Batch data processing, historically known as ETL, is extremely challenging. It’s time-consuming, brittle, and often unrewarding. Not only that, it’s hard to operate, evolve, and troubleshoot. In this post, we’ll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process. This post distills fragments of wisdom accumulated while working at Yahoo, Facebook, Airbnb and Lyft, with the perspective of well over a decade of data warehousing

Postgres Internals: Building a Description Tool

Dataquest

In previous blog posts, we have described the Postgres database and ways to interact with it using Python. Those posts provided the basics, but if you want to work with databases in production systems, then it is necessary to know how to make your queries faster and more efficient. To understand what efficiency means in Postgres, it’s important to learn how Postgres works under the hood.

How Embedded Analytics Gets You to Market Faster with a SAAS Offering

Start-ups & SMBs launching products quickly must bundle dashboards, reports, & self-service analytics into apps. Customers expect rapid value from your product (time-to-value), data security, and access to advanced capabilities. Traditional Business Intelligence (BI) tools can provide valuable data analysis capabilities, but they have a barrier to entry that can stop small and midsize businesses from capitalizing on them.