January 2018

CRDTs and Distributed Consensus with Christopher Meiklejohn - Episode 14

Data Engineering Podcast

Summary: As we scale our systems to handle larger volumes of data, geographically distributed users, and varied data sources, the requirement to distribute the computational resources for managing that information becomes more pronounced. To ensure that all of the distributed nodes in our systems agree with each other, we need to build mechanisms that properly handle data replication and conflict resolution.
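
One widely used way to let replicas accept writes independently and still converge is a conflict-free replicated data type (CRDT). The sketch below is a generic textbook example, not code from the episode: a state-based grow-only counter (G-Counter) in Python, where each node increments only its own entry and merging takes the element-wise maximum. The class and node names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """State-based grow-only counter CRDT (illustrative sketch)."""
    node_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Each replica only ever increments its own slot.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replicas' slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of delivery order or duplication.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


# Two replicas accept writes independently, then exchange state.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())
```

Because merge only ever takes maxima, applying the same state twice or in a different order leaves the result unchanged, which is what removes the need for a consensus round on every write.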

Do These Things if you Want to Succeed as an HR Professional

U-Next

Success in today’s businesses has taken on several meanings. Beyond pay hikes and promotions, success has acquired new dimensions of very recent origin. Today, success has become synonymous with happiness at the workplace, challenging tasks, compensatory rewards, incentives, authoritative job profiles, influential roles, and more. The current talent pools in organizations have become wiser and more mature than their previous-generation counterparts.

Data Engineering is Critical to Big Data Success

Cloudera

I mentioned in an earlier blog, titled “Staffing your big data team,” that data engineers are critical to a successful data journey. That said, most companies that are early in their journey lack a dedicated engineering group. And the longer it takes to put a team in place, the likelier it is that your big data project will stall. The data engineering team is responsible for collecting and ingesting batch and stream-oriented data, inventorying the data, working through ingest bottlenecks, and …

Building a Better Tech Radar

Zalando Engineering

How Zalando helps its engineering teams navigate the tech landscape. Zalando has more than 200 engineering teams, which regularly face tricky technology choices. To help them make good decisions, we created the Zalando Tech Radar as a "navigation" tool. Inspired by ThoughtWorks, it assigns each technology to one of four rings (Adopt, Trial, Assess and Hold); a technology's ring represents the current consensus within Zalando.

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

Functional Data Engineering — a modern paradigm for batch data processing

Maxime Beauchemin

Batch data processing (historically known as ETL) is extremely challenging. It’s time-consuming, brittle, and often unrewarding. Not only that, it’s hard to operate, evolve, and troubleshoot. In this post, we’ll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process. This post distills fragments of wisdom accumulated while working at Yahoo, Facebook, Airbnb and Lyft, with the perspective of well over a decade of data warehousing …
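
A minimal sketch of what this can look like in practice, assuming the paradigm's core ideas include pure tasks and partitions that are overwritten as a whole rather than mutated: the transform is a pure function of its inputs and a partition key (`ds`), and the task replaces the entire partition, so re-runs and backfills produce the same result. The in-memory `warehouse`, the `daily_revenue` table, and the event fields are hypothetical stand-ins for real storage, not anything from the post.

```python
from collections import defaultdict

# Hypothetical in-memory "warehouse": table name -> partition key -> rows.
warehouse: dict[str, dict[str, list[dict]]] = defaultdict(dict)


def daily_revenue(events: list[dict], ds: str) -> list[dict]:
    """Pure transform: the output depends only on the inputs, no side effects."""
    totals: dict[str, float] = defaultdict(float)
    for event in events:
        if event["ds"] == ds:
            totals[event["country"]] += event["amount"]
    return [{"ds": ds, "country": c, "revenue": r} for c, r in sorted(totals.items())]


def run_task(events: list[dict], ds: str) -> None:
    # Overwrite the whole partition instead of appending or updating rows,
    # so re-running the task for the same ds is idempotent.
    warehouse["daily_revenue"][ds] = daily_revenue(events, ds)


events = [
    {"ds": "2018-01-01", "country": "DE", "amount": 10.0},
    {"ds": "2018-01-01", "country": "FR", "amount": 4.5},
    {"ds": "2018-01-01", "country": "DE", "amount": 2.0},
    {"ds": "2018-01-02", "country": "DE", "amount": 7.0},
]
run_task(events, "2018-01-01")
run_task(events, "2018-01-01")  # a re-run or backfill yields the same partition
print(warehouse["daily_revenue"]["2018-01-01"])
```

Keeping the side effect confined to a single "replace this partition" step is what makes retries safe: there is no partial state to clean up, only a partition to recompute.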

Postgres Internals: Building a Description Tool

Dataquest

In previous blog posts, we described the Postgres database and ways to interact with it using Python. Those posts provided the basics, but if you want to work with databases in production systems, you need to know how to make your queries faster and more efficient. To understand what efficiency means in Postgres, it’s important to learn how Postgres works under the hood.

Recap of Hadoop News for December 2017

ProjectPro

News on Hadoop, December 2017. Apache Impala gets top-level status as open source Hadoop tool. TechTarget.com, December 1, 2017. The massively parallel processing engine born at Cloudera acquired the status of a top-level project within the Apache Software Foundation. The main objective of Impala is to provide SQL-like interactivity to big data analytics, just like other big data tools: Hive, Spark SQL, Drill, HAWQ, Presto and others.

The Top 10 Most Popular VISION Blogs of 2017

Cloudera

The New Year is a great time to make resolutions, but it’s also a great time to reflect on the previous year. Before we get too far into 2018, let’s take a look at the ten most popular Cloudera VISION blogs from 2017. “Today is an important day in the life of Cloudera.” On April 28, 2017, Mike Olson, one of the founders of Cloudera, wrote about the initial public offering and what the milestone means.

Why We Do Scala in Zalando

Zalando Engineering

Leveraging the full power of a functional programming language. In Zalando Dublin, you will find that most engineering teams are writing their applications in Scala. We will try to explain why that is the case and why we love Scala. This content comes both from my own experience and from that of the team I'm working with on building the new Zalando Customer Data Platform.

The three certainties in life: death, taxes and GDPR

Cloudera

As the GDPR clock ticks down to implementation, it is clear that this will not be a non-event like the Millennium Bug. It will happen, and there will be dire consequences for non-compliance, potentially including company closures. The three certainties in life: death, taxes and GDPR. 1999 was a milestone year for the development of technology.

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

Cybersecurity On Call: Goodbye 2017, Hello 2018! Top Five Tips from 2017

Cloudera

This was an amazing year for our inaugural “Cybersecurity On Call” season. It was truly an honor hosting amazing guests as we explored the world of cybersecurity. From industry thought leaders, to New York Times best sellers, to hackers, I learned a ton about the future of cybersecurity, and I hope you did as well. Today’s episode isn't our usual programming; it's our end-of-year special, where we will dive into our top five tips from this year’s season.

Breaking through the clouds in Asia Pacific

Cloudera

To quote Sam Walton, Walmart’s founder, “There is only one boss. The customer. And he can fire everybody in the company from the chairman on down, simply by spending his money somewhere else.” This very much forms the lens for our focus here at Cloudera Asia Pacific. It is this unwavering passion and commitment that has driven the team to strive for the very best for our customers and partners, and that has carried us to the milestones we have collectively attained since 2015.

Dat: Distributed Versioned Data Sharing with Danielle Robinson and Joe Hand - Episode 16

Data Engineering Podcast

Summary: Sharing data across multiple computers, particularly when it is large and changing, is a difficult problem to solve. The Dat Project was created to provide a simpler way to distribute and version data sets among collaborators. In this episode, Danielle Robinson and Joe Hand explain how the project got started, how it functions, and some of the many ways that it can be used.

The Faces Behind the Fashion-MNIST

Zalando Engineering

We talk to Han and Kashif from Zalando Research. Nana Yamazaki, Employer Branding Specialist, Data Science, catches up with the team using literal fashion icons in deep learning. Tell us about Fashion-MNIST. What did you want to accomplish? Fashion-MNIST is a freely available dataset of Zalando articles that, most importantly, has the same format as the MNIST dataset.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Rabbit in the Cloud

Zalando Engineering

How we deployed RabbitMQ on AWS. In an effort to move away from our legacy monolithic service, we decided to take on the challenge of building a new communication platform based on a microservice architecture, which would be more focused and more easily manageable. The challenge was exciting and big; we had to make crucial decisions early on, decisions that we would have to live with for the foreseeable future.

Rock Solid Kafka and ZooKeeper Ops on AWS

Zalando Engineering

Reducing ops effort while maintaining Kafka and ZooKeeper. This post is targeted at those looking for ways to reduce ops effort while maintaining Kafka and ZooKeeper deployments on AWS, while also improving their availability and stability. In a nutshell, we are going to explain how using Elastic Network Interfaces can improve on a straight out-of-the-box setup.

Staffing your big data team

Cloudera

Building the right team is as important as assembling the right IT infrastructure, and the needs differ just as dramatically. A traditional BI and analytics organization consists of three main groups. Analysts develop reports, often using sample data. The data management team, made up of modelers, takes requests, finds data, and develops models to answer the questions.

Six Strategies for Advancing Customer Knowledge: Bringing Data Together

Cloudera

I often meet with our customers to help them understand how to connect modern technology to business success. The ever-present question at these encounters is “Where do I start?” They may understand that they need a data-driven strategy, or their culture may aim to shift toward being guided by data. These are often goals set by the executive team with little guidance on how to execute or implement them.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Simplicity by Distributing Complexity

Zalando Engineering

Building an aggregated view of data in an event-driven microservice architecture. In the world of microservices, where a domain model gets decomposed into related but independently handled entities, we often face the challenge of building an aggregate view of the data that brings together different parts of that model. While this can already be interesting with “traditional” designs, the move to event-driven architectures can magnify these difficulties, especially with simplistic event streams.
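
A minimal sketch of one way to build such an aggregate view, assuming each event carries the entity ID, the name of the producing service, and a per-producer version number. These fields, the `OrderView` read model and the service names are hypothetical, not the post's design. The consumer folds events into a per-order view and drops stale events, which matters when the stream delivers them out of order.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    order_id: str
    source: str   # hypothetical producer, e.g. "order", "payment", "shipment"
    version: int  # per-source sequence number assigned by the producer
    data: dict


@dataclass
class OrderView:
    """Aggregated read model combining data owned by several services."""
    order_id: str
    parts: dict[str, dict] = field(default_factory=dict)
    versions: dict[str, int] = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        # Ignore stale or duplicate events so out-of-order delivery
        # cannot overwrite newer state from the same source.
        if event.version <= self.versions.get(event.source, -1):
            return
        self.versions[event.source] = event.version
        self.parts[event.source] = event.data


def build_views(events: list[Event]) -> dict[str, OrderView]:
    # Fold the stream into one view per order ID.
    views: dict[str, OrderView] = {}
    for event in events:
        view = views.setdefault(event.order_id, OrderView(event.order_id))
        view.apply(event)
    return views


stream = [
    Event("o-1", "order", 1, {"status": "placed"}),
    Event("o-1", "payment", 1, {"status": "authorized"}),
    Event("o-1", "order", 3, {"status": "shipped"}),
    Event("o-1", "order", 2, {"status": "packed"}),  # arrives late, ignored
]
view = build_views(stream)["o-1"]
print(view.parts)
```

Tracking a version per source, rather than one global version, lets each service keep ownership of its own part of the aggregate without needing a total order across services.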

Snorkel: Extracting Value From Dark Data with Alex Ratner - Episode 15

Data Engineering Podcast

Summary: The majority of the conversation around machine learning and big data pertains to well-structured, cleaned data sets. Unfortunately, those are just a small percentage of the available information; the rest of a company's knowledge is housed in so-called “Dark Data” sets. In this episode, Alex Ratner explains how the work that he and his fellow researchers are doing on Snorkel can be used to extract value by leveraging labeling functions written by …
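
To illustrate the idea of labeling functions, here is a simplified sketch: each function encodes one heuristic and either votes for a label or abstains. Snorkel itself learns a generative label model that estimates each labeling function's accuracy before combining their outputs; this sketch swaps in plain majority vote, a common baseline, and the spam heuristics, names and example texts are all hypothetical.

```python
from collections import Counter
from typing import Callable, Optional

ABSTAIN = None
SPAM, HAM = 1, 0

LabelingFunction = Callable[[str], Optional[int]]


# Hypothetical heuristics: each returns a label or abstains.
def lf_contains_link(text: str) -> Optional[int]:
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text: str) -> Optional[int]:
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text: str) -> Optional[int]:
    return HAM if len(text.split()) <= 4 else ABSTAIN


LFS: list[LabelingFunction] = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def majority_vote(text: str, lfs: list[LabelingFunction]) -> Optional[int]:
    # Count only non-abstaining votes.
    votes: Counter = Counter()
    for lf in lfs:
        label = lf(text)
        if label is not ABSTAIN:
            votes[label] += 1
    if not votes:
        return ABSTAIN  # no heuristic applies
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie: leave the example unlabeled
    return ranked[0][0]


docs = [
    "Claim your prize at https://example.com",
    "ok thanks!",
    "See you at the meeting tomorrow afternoon then",
]
print([majority_vote(d, LFS) for d in docs])
```

The resulting noisy labels would then serve as training data for a downstream model; a learned label model improves on this by down-weighting heuristics that are often wrong.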

Drawn Together

Zalando Engineering

How to talk about design in the agile world: how we improved design communication in the Retail Ops Team. With an agile and lean approach, most of us here at Zalando changed the way we build digital products. Design processes also evolved, with designers usually working alongside cross-functional product teams. But, at first, one thing did not change much: how we talk about design.