Sat.Aug 17, 2019 - Fri.Aug 23, 2019

A High Performance Platform For The Full Big Data Lifecycle

Data Engineering Podcast

Summary: Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of the early entrants that predates Hadoop and has since been open sourced is the HPCC (High Performance Computing Cluster) system. Designed as a fully integrated platform to meet the needs of enterprise-grade analytics, it provides a solution for the full lifecycle of data at massive scale.

Big Data 100

Nothing but NumPy: Understanding & Creating Neural Networks with Computational Graphs from Scratch

KDnuggets

Entirely implemented with NumPy, this extensive tutorial provides a detailed review of neural networks followed by guided code for creating one from scratch with computational graphs.

Coding 123

Building the New Uber Freight App as Lists of Modular, Reusable Components

Uber Engineering

As Uber Freight marked its second anniversary, we went back to the drawing board to redesign its app. The original carrier app was successful for owner-operators with one or two drivers, but it wasn’t optimized for larger fleets—feedback we …

Building 111

How We Reduced DynamoDB Costs by Using DynamoDB Streams and Scans More Efficiently

Rockset

Many of our users implement operational reporting and analytics on DynamoDB using Rockset as a SQL intelligence layer to serve live dashboards and applications. As an engineering team, we are constantly searching for opportunities to improve their SQL-on-DynamoDB experience. For the past few weeks, we have been hard at work tuning the performance of our DynamoDB ingestion process.

Bytes 52

How To Get Promoted In Product Management

Speaker: John Mansour

If you're looking to advance your career in product management, there are more options than just climbing the management ladder. Join our upcoming webinar to learn about highly rewarding career paths that don't involve management responsibilities. We'll cover both career tracks and provide tips on how to position yourself for success in the one that's right for you.

Data is Not the New Oil. Data is Water!

Teradata

If you work in data analytics or a related field, you’ve probably heard the mantra that data is the new oil. But data is not oil, it's water. Find out why.

Data 45

Is Kaggle Learn a “Faster Data Science Education?”

KDnuggets

Kaggle Learn is "Faster Data Science Education," featuring micro-courses covering an array of data skills for immediate application. Courses may be made with newcomers in mind, but the platform and its content are proving useful as a review for more seasoned practitioners as well.

Education 118

More Trending

The Kafka Connect Plugin for Rockset and How It Works

Rockset

Rockset continuously ingests data streams from Kafka, without the need for a fixed schema, and serves fast SQL queries on that data. We created the Kafka Connect Plugin for Rockset to export data from Kafka and send it to a collection of documents in Rockset. Users can then build real-time dashboards or data APIs on top of the data in Rockset. This blog covers how we implemented the plugin.

Kafka 40

Teradata Earns Spot (Again x2!) on Constellation ShortList for Hybrid Cloud

Teradata

Teradata is named yet again to the Constellation ShortList™ for “Hybrid and Multi-Cloud Relational Database Management Systems." Read more!

Cloud 15

Top Handy SQL Features for Data Scientists

KDnuggets

Whenever we hear "data," the first thing that comes to mind is SQL! SQL comes with easy, quick-to-learn features to organize and retrieve data, as well as to perform actions on it in order to gain useful insights.

SQL 116

Applying Netflix DevOps Patterns to Windows

Netflix Tech

Baking Windows with Packer. By Justin Phelps and Manuel Correa. Customizing Windows images at Netflix was a manual, error-prone, and time-consuming process. In this blog post, we describe how we improved the methodology, which technologies we leveraged, and how this has improved service deployment and consistency. Artisan Crafted Images: In the Netflix full cycle DevOps culture the team responsible for building a service is also responsible for deploying, testing, infrastructure, and operation of t

AWS 82

Navigating the Future: Generative AI, Application Analytics, and Data

Generative AI is upending the way product developers & end-users alike are interacting with data. Despite the potential of AI, many are left with questions about the future of product development: How will AI impact my business and contribute to its success? What can product managers and developers expect in the future with the widespread adoption of AI?

Announcing Bottom Navigator

Pandora Engineering

An Android Multiple Backstack Bottom Navigation Library. Pandora’s latest mobile redesign brings the bottom navigation pattern to our apps. Bottom navigation has become a popular design choice for many apps due to its many advantages, including easy one-handed use and enhanced discoverability of top app destinations. When Pandora embarked on this project, our designers had a clear vision of how navigation should work, a vision that in many ways is familiar to users of other popular apps like Instag

Optimizing Bulk Load in RocksDB

Rockset

What’s the fastest we can load data into RocksDB? We were faced with this challenge because we wanted to enable our customers to quickly try out Rockset on their big datasets. Even though the bulk load of data in LSM trees is an important topic, not much has been written about it. In this post, we’ll describe the optimizations that increased RocksDB’s bulk load performance by 20x.

Bytes 40

Order Matters: Alibaba’s Transformer-based Recommender System

KDnuggets

Alibaba, the largest e-commerce platform in China, is a powerhouse not only when it comes to e-commerce, but also when it comes to recommender systems research. Their latest paper, Behaviour Sequence Transformer for E-commerce Recommendation in Alibaba, is yet another publication that pushes the state of the art in recommender systems.

Systems 108

A Guide to the Confluent Verified Integrations Program

Confluent

When it comes to writing a connector, there are two things you need to know how to do: write the code itself, and help the world learn about your new connector. This post outlines the process by which we verify partner integrations, and is a means of letting the world know about our partners’ contributions to our connector ecosystem.

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

Detecting stationarity in time series data

KDnuggets

Explore how to determine if your time series data is generated by a stationary process and how to handle the necessary assumptions and potential interpretations of your result.

Data 107

Understanding Decision Trees for Classification in Python

KDnuggets

This tutorial covers decision trees for classification, also known as classification trees, including the anatomy of classification trees, how classification trees make predictions, using scikit-learn to make classification trees, and hyperparameter tuning.

Python 103

Deep Learning for NLP: Creating a Chatbot with Keras!

KDnuggets

Learn how to use Keras to build a Recurrent Neural Network and create a Chatbot! Who doesn’t like a friendly-robotic personal assistant?

Gender Diversity in AI Research

KDnuggets

Through an analysis of 1.5M papers from arXiv, this study reviews the evolution of gender diversity across disciplines, countries, and institutions as well as the semantic differences between AI papers with and without female co-authors.

91

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Proptech and the proper use of technology for house sales prediction

KDnuggets

Using the ATTOM dataset, we extracted data on sales transactions in the USA, loans, and estimated values of property. We developed an optimal prediction model from correlations in the time and status of ownership as well as the time of the year of sales fluctuations.

Manual Coding or Automated Data Integration – What’s the Best Way to Integrate Your Enterprise Data?

KDnuggets

What’s the best way to execute your data integration tasks: writing manual code or using an ETL tool? Find out the approach that best fits your organization’s needs and the factors that influence it.

Automate Stacking In Python: How to Boost Your Performance While Saving Time

KDnuggets

Utilizing stacking (stacked generalizations) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most if not all winning Kaggle submissions nowadays make use of some form of stacking or a variation of it.

Python 84

Math for Programmers

KDnuggets

Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer.

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

Which skills / knowledge areas do you currently have, and which do you want to add or improve?

KDnuggets

New KDnuggets survey looks to find out what skills our readers currently use, and which they are looking to add or improve. Take a few minutes to participate.

Crafting an Elevator Pitch for your Data Science Startup

KDnuggets

If you are launching a data science startup, these tips will give you a head start as you seek capital for seed funding or your next level of growth.

Comparing Decision Tree Algorithms: Random Forest vs. XGBoost

KDnuggets

Check out this tutorial walking you through a comparison of XGBoost and Random Forest. You'll learn how to create a decision tree, how to do tree bagging, and how to do tree boosting.

eBook: How to Enhance Privacy in Data Science

KDnuggets

Check out this eBook, How to Enhance Privacy in Data Science, to equip yourself with the tools to enhance privacy in data science, including transforming data in a manner that protects privacy, an overview of the challenges and opportunities of privacy-aware analytics, and more.

How Embedded Analytics Gets You to Market Faster with a SAAS Offering

Start-ups & SMBs launching products quickly must bundle dashboards, reports, & self-service analytics into apps. Customers expect rapid value from your product (time-to-value), data security, and access to advanced capabilities. Traditional Business Intelligence (BI) tools can provide valuable data analysis capabilities, but they have a barrier to entry that can stop small and midsize businesses from capitalizing on them.

Artificial Intelligence Is Not Intelligence – Interview With Andy Cotgreave (Keynote Speaker at Crunch Conf)

KDnuggets

Crunch is coming to Budapest, Hungary on 16-18 Oct. Use code KDNuggets to save on Data Science, Data Engineering, or BI tracks. But first, read this interview with keynote speaker Andy Cotgreave.

BI 60

How LinkedIn, Uber, Lyft, Airbnb and Netflix are Solving Data Management and Discovery for Machine Learning Solutions

KDnuggets

As machine learning evolves, the need for tools and platforms that automate the lifecycle management of training and testing datasets is becoming increasingly important. Fast growing technology companies like Uber or LinkedIn have been forced to build their own in-house data lifecycle management solutions to power different groups of machine learning models.

Lincoln Clean Energy: Director, Asset Performance [Austin, TX]

KDnuggets

Seeking an Asset Performance Director, a role that requires an individual who possesses a strong technical skill set and the ability to communicate findings effectively throughout the organization.

53

University of North Florida: Data Scientist [Jacksonville, FL]

KDnuggets

Seeking a Data Scientist to design, develop, and execute data analytics projects and initiatives, and deliver findings and recommendations to support the mission of the university.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.