July, 2019

Simplifying Data Integration Through Eventual Connectivity

Data Engineering Podcast

Summary: The ETL pattern that has become commonplace for integrating data from multiple sources has proven useful but complex to maintain. For a small number of sources it is a tractable problem, but as the overall complexity of the data ecosystem continues to expand, it may be time to identify new ways to tame the deluge of information. In this episode Tim Ward, CEO of CluedIn, explains the idea of eventual connectivity as a new paradigm for data integration.

Convolutional Neural Networks: A Python Tutorial Using TensorFlow and Keras

KDnuggets

Different neural network architectures excel in different tasks. This particular article focuses on crafting convolutional neural networks in Python using TensorFlow and Keras.

Python 123

Our Commitment to Open Source Software

Cloudera

Open source has been core to the missions of both Hortonworks and Cloudera and central to our values and culture. With more than 700 engineers in the new Cloudera, our company writes a prodigious amount of open source code each year that’s contributed to more than 30 different open source projects. We’re also a very innovative open source company, having collectively launched more than a dozen new open source projects since the founding of the two companies.

The Power of Integrated Data and Analytics

Teradata

Integrated data and analytics has a proven track record of helping organize operations, enhance customer experience and improve revenue and market growth.

Data 104

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

KSQL Training for Hands-On Learning

Confluent

I’ve been using KSQL from Confluent since its first developer preview in 2017. Reading, writing, and transforming data in Apache Kafka® using KSQL is an effective way to rapidly deliver event streaming applications for clients (e.g., streaming insurance events). Plus, I’ve also had the opportunity to deploy KSQL in some not-so-serious hobby projects (see Noise Mapping with KSQL, a Raspberry Pi and a Software-Defined Radio and ML and KSQL Let Me Know When I’ve Left the Heater Running).

Kafka 83

Evolution of Netflix Conductor: v2.0 and beyond

Netflix Tech

By Anoop Panicker and Kishore Banala. Conductor is a workflow orchestration engine developed and open-sourced by Netflix. If you’re new to Conductor, this earlier blogpost (Netflix Conductor: A microservices orchestrator) and the documentation should help you get started and acclimatized to Conductor. In the last two years since inception, Conductor has seen wide adoption and is instrumental in running numerous core workflows at Netflix.

More Trending

Top 10 Best Podcasts on AI, Analytics, Data Science, Machine Learning

KDnuggets

Check out our latest Top 10 Most Popular Data Science and Machine Learning podcasts available on iTunes. Stay up to date in the field with these recent episodes and join in with the current data conversations.

Crafting the Perfect Internship Playlist

Pandora Engineering

Credit: Kanok Sulaiman. Disclaimer: These are my experiences from being a Pandora software developer intern in the summer of 2019. All opinions expressed are my own, and represent no one except myself. I recently spent the last summer of my undergraduate program as an intern for Pandora Media in Oakland, CA. I gained a lot from my experience, and I’m writing this post to detail the application process, the lessons that I learned, and the company culture.

Java 52

How to Enjoy Hybrid Partitioning with Teradata Columnar

Teradata

Teradata Vantage's NewSQL Engine's performance-enhancing options include column-row hybrid partitioning. Find out how to take advantage of this great feature.

Introduction to Streaming Data

Cloud Academy

Designing a streaming data pipeline presents many challenges, particularly around specific technology requirements. When designing a cloud-based solution, an architect is no longer faced with the question, “How do I get this job done with the technology we have?” but rather, “What is the right technology to support my use case?” In this blog post, we will walk through some initial scoping steps and then work through an example.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Open Source: June Updates - New releases, continue to foster diversity and inclusion in tech

Zalando Engineering

Project Highlights: Kopf (Kubernetes Operator Pythonic Framework) now supports built-in resources and can be used to write controllers of any kind (pods, namespaces, mixed), not only of custom resources. Check out the latest release for more details [link]. Skipper publishes new releases weekly. Several important features were implemented, such as support for proxying the Kubernetes API server and for Kubernetes externalName services from ingress.

AWS 52

Data Labeling That You Can Feel Good About With CloudFactory

Data Engineering Podcast

Summary: Successful machine learning and artificial intelligence projects require large volumes of data that is properly labelled. The challenge is that most data is not clean and well annotated, requiring a scalable data labeling process. Ideally this process can be done using the tools and systems that already power your analytics, rather than sending data into a black box.

This New Google Technique Helps Us Understand How Neural Networks are Thinking

KDnuggets

Recently, researchers from the Google Brain team published a paper proposing a new method called Concept Activation Vectors (CAVs) that takes a new angle to the interpretability of deep learning models.

What is Data Extraction and How It Can Serve Your Business

InData Labs

In the highly competitive business world of today, data reign supreme. Customer personal data, comprehensive operating statistics, sales figures, or inter-company information may play a core role in strategic decision making. It’s vital to keep an eye on the quantity and quality of data that can be captured and extracted from different web sources.

IT 52

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Enterprise Data Strategy: The Upside of Scarce Funding

Teradata

In a cost-cutting culture, directly linking data projects to top business initiatives is a good way to keep them from getting clipped. Learn more.

Data 73

Educating Data Analysts at Scale: Cloudera Launches Modern Big Data Analysis with SQL on Coursera

Cloudera

At a time when machine learning, deep learning, and artificial intelligence capture an outsize share of media attention, jobs requiring SQL skills continue to vastly outnumber jobs requiring those more advanced skills. Influential data scientists often point to SQL as the most important yet underrated skill for anyone who works with data. SQL is today—and will remain for the foreseeable future—a vital foundational skill for a wide range of data professionals working in different roles across dif

From Good to Great: How Operational Analytics Gives Businesses a Real-Time Edge

Rockset

Published on Forbes. All businesses today are a series of real-time events. But what separates the good from the great is how they capture and operationalize that data. Companies like Uber have talked in-depth about how they use real-time analytics to create seamless trip experiences, from determining the most convenient rider pick-up points to predicting the fastest routes.

BI 40

Stress Testing Kafka And Cassandra For Real-Time Anomaly Detection

Data Engineering Podcast

Summary: Anomaly detection is a capability that is useful in a variety of problem domains, including finance, internet of things, and systems monitoring. Scaling the volume of events that can be processed in real-time can be challenging, so Paul Brebner from Instaclustr set out to see how far he could push Kafka and Cassandra for this use case. In this interview he explains the system design that he tested, his findings for how these tools were able to work together, and how they behaved at diffe

Kafka 100

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

7 Tips for Dealing With Small Data

KDnuggets

At my workplace, we produce a lot of functional prototypes for our clients. Because of this, I often need to make Small Data go a long way. In this article, I’ll share 7 tips to improve your results when prototyping with small datasets.

Datasets 121

Getting started with the MongoDB Connector for Apache Kafka and MongoDB

Confluent

Together, MongoDB and Apache Kafka® make up the heart of many modern data architectures today. Integrating Kafka with external systems like MongoDB is best done through the use of Kafka Connect. This API enables users to leverage ready-to-use components that can stream data from external systems into Kafka topics, as well as stream data from Kafka topics into external systems.

MongoDB 21

How Analytics Answer the Most Challenging Business Questions

Teradata

Analytics can help enterprises answer the toughest business questions by leveraging all of the data across an organization.

Data 80

Solving the Pain Points of Big Data Management

Cloudera

Every business aims to deliver products and services quickly and efficiently based upon customer wants and needs. Today, much of that speed and efficiency relies on insights driven by big data. Yet big data management often serves as a stumbling block, because many businesses continue to struggle with how to best capture and analyze their data. Unorganized data presents another roadblock.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

SQL Query Planning for Operational Analytics

Rockset

Rockset is a schemaless SQL data platform. It is designed to support SQL on raw data. While most SQL databases are strongly and statically typed, data within Rockset is strongly but dynamically typed. Dynamic typing makes it difficult for us to adopt off-the-shelf SQL query optimizers since they are designed for statically typed data where the types of the columns are known ahead of time.
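
As a rough illustration of what "strongly but dynamically typed" can mean in practice, here is a minimal Python sketch. It is not Rockset's implementation; the sample rows and the helper names (`field_types`, `add_one`, `safe_add_one`) are hypothetical and exist only for this example. It shows that the type belongs to each value rather than to the column, so a planner cannot assume one type per field ahead of time.

```python
# Hypothetical rows, as a schemaless store might hold them: the same field
# ("age") carries a different type from one document to the next.
rows = [
    {"name": "a", "age": 31},
    {"name": "b", "age": "42"},
    {"name": "c"},  # field absent entirely
]


def field_types(rows, field):
    """Return the set of type names observed for `field` across all rows."""
    return {type(row[field]).__name__ for row in rows if field in row}


def add_one(value):
    """Strong typing: refuse to coerce a non-integer value implicitly."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    return value + 1


def safe_add_one(rows, field):
    """Dynamic typing: the type check happens per row, at run time."""
    results = []
    for row in rows:
        try:
            results.append(add_one(row[field]))
        except (KeyError, TypeError):
            # Missing field or wrong type: yield None instead of guessing.
            results.append(None)
    return results


print(field_types(rows, "age"))
print(safe_add_one(rows, "age"))
```

Because the check has to run once per value, an optimizer that expects a single declared type per column has nothing to rely on at planning time, which is the difficulty the paragraph above describes.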

SQL 40

Scale Your Analytics On The Clickhouse Data Warehouse

Data Engineering Podcast

Summary: The market for data warehouse platforms is large and varied, with options for every use case. ClickHouse is an open source, column-oriented database engine built for interactive analytics with linear scalability. In this episode Robert Hodges and Alexander Zaitsev explain how it is architected to provide these features, the various unique capabilities that it provides, and how to run it in production.

Ten more random useful things in R you may not know about

KDnuggets

I had a feeling that R has developed as a language to such a degree that many of us are using it now in completely different ways. This means that there are likely to be numerous tricks, packages, functions, etc. that each of us uses, but that others are completely unaware of and would find useful if they knew about them.

IT 120

Deploying Kafka Streams and KSQL with Gradle – Part 3: KSQL User-Defined Functions and Kafka Streams

Confluent

Building off part 1 where we discussed an event streaming architecture that we implemented for a customer using Apache Kafka, KSQL, and Kafka Streams, and part 2 where we discussed how Gradle helped us address the challenges we faced developing, building, and deploying the KSQL portion of our application, here in part 3, we’ll explore using Gradle to build and deploy KSQL user-defined functions (UDFs) and Kafka Streams microservices.

Kafka 87

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

What Should Your Enterprise Expect from its Cloud Analytics Vendor?

Teradata

Large enterprises are investing heavily in cloud-based analytics technologies. What qualities should they be looking for in these cloud vendors? Find out more.

Cloud 69

Has the Data Engineer replaced the Business Intelligence Developer?

Advancing Analytics: Data Engineering

It seems these days that every person I talk to is either a scientist, engineer or architect. We’re fairly obsessed with aligning our technical roles to respected professions that denote the amount of education & training that go into them – and that’s fair given how much time & effort goes into attaining these roles… but it really doesn’t help us define them.

Methods for Running SQL on JSON in PostgreSQL, MySQL and Other Relational Databases

Rockset

One of the main hindrances to getting value from our data is that we have to get data into a form that’s ready for analysis. It sounds simple, but it rarely is. Consider the hoops we have to jump through when working with semi-structured data, like JSON, in relational databases such as PostgreSQL and MySQL. JSON in Relational Databases: In the past, when it came to working with JSON data, we’ve had to choose between tools and platforms that worked well with JSON or tools that provided good suppor

Bringing Rich Experiences to Memory-constrained TV Devices

Netflix Tech

By Jason Munning, Archana Kumar, Kris Range. Netflix has over 148M paid members streaming on more than half a billion devices spanning over 1,900 different types. In the TV space alone, there are hundreds of device types that run the Netflix app. We need to support the same rich Netflix experience on not only high-end devices like the PS4 but also memory- and processor-constrained consumer electronic devices that run a similar chipset as w

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs as well as identifying and de-risking the different kinds of hypotheses built into your roadmap.