Sat. Sep 25, 2021 - Fri. Oct 01, 2021

dbt (Data Build Tool) Tutorial

Start Data Engineering

Contents: 1. Introduction; 2. Dbt, the T in ELT; 3. Project; 3.1. Prerequisites; 3.2. Configurations and connections; 3.2.1. profiles.yml; 3.2.2. dbt_project.yml; 3.3. Data flow; 3.3.1. Source; 3.3.2. Snapshots; 3.3.3. Staging; 3.3.4. Marts; 3.3.4.1. Core; 3.3.4.2. Marketing; 3.4. dbt run; 3.5. dbt test; 3.6. dbt docs; 3.7. Scheduling; 4. Conclusion; 5. Further reading; 6. References.

How to Take Notes in 2021?

Simon Späti

Taking notes helps you remember things, express yourself, brainstorm your thoughts, research a topic, and much more. I have taken notes all my life. Maybe it’s because I’m Swiss; they say we are well organised. I wrote in OneNote for 10+ years. I have notebooks for my bachelor studies and for every workplace I have worked at.

Building Uber’s Fulfillment Platform for Planet-Scale using Google Cloud Spanner

Uber Engineering

Introduction. The Fulfillment Platform is a foundational Uber domain that enables the rapid scaling of new verticals. The platform handles billions of database transactions each day, ranging from user actions (e.g., a driver starting a trip) and system actions …

How to Securely Connect Confluent Cloud with Services on AWS, Azure, and GCP

Confluent

The rise of fully managed cloud services fundamentally changed the technology landscape and introduced benefits like increased flexibility, accelerated deployment, and reduced downtime. Confluent offers a portfolio of fully managed […].

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

#ClouderaLife Spotlight: Liz Lashgari, Senior Employee Relations Manager

Cloudera

September 15th marks the beginning of National Hispanic Heritage Month – a month in which the contributions and influence of Hispanic people on the history, culture and achievements of the US are recognized. To commemorate the month, we are spotlighting an employee who is as active within the community as they are in the company and LatinX Employee Resource Group (ERG).

How We Build Micro Frontends With Lattice

Netflix Tech

Written by Michael Possumato, Nick Tomlin, Jordan Andree, Andrew Shim, and Rahul Pilani. As we continue to grow here at Netflix, the needs of Revenue and Growth Engineering are rapidly evolving, and our tools must evolve just as rapidly. The Revenue and Growth Tools (RGT) team decided to set off on a journey to build tools in an abstract manner to have solutions readily available within our organization.

More Trending

Kafka Connect Fundamentals: What is Kafka Connect?

Confluent

Apache Kafka® is an enormously successful piece of data infrastructure, functioning as the ubiquitous distributed log underlying the modern enterprise. It is scalable, available as a managed service, and has […].

Serving the Public Through Data

Cloudera

Digital transformation has been talked about for many years, but the pandemic has accelerated the digital transformation journeys of many enterprises. Forced to adapt to changes in the business landscape and customer behavior, businesses have adopted more digital tools and technologies to drive innovation and increase resilience. While going digital may be commonly associated with the private sector, governments and organizations in the public sector have much to gain by going digital as

Rockset Is Now SOC 2 Type II Compliant

Rockset

The Rockset team is proud to announce that we have been accredited as SOC 2 Type II compliant. Our customers entrust Rockset with their data, and now they have rigorous, independent assurance that we protect it by following security best practices. What is SOC 2 Type II? SOC is one of several System and Organization Controls audits developed by the American Institute of CPAs (AICPA), the world’s largest member association of accountants.

Machine Learning (ML) vs NLP - What's the Difference?

ProjectPro

The term artificial intelligence is often used synonymously with complex terms like Machine Learning, Natural Language Processing, and Deep Learning, which are intricately woven together. One of the trending debates concerns the differences between natural language processing and machine learning. This post attempts to explain two of the crucial sub-domains of artificial intelligence - Machine Learning vs.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Trigger AWS Lambda Functions Directly from a Confluent Cloud Apache Kafka Topic

Confluent

The distributed architecture of Apache Kafka® can cause the operational burden of managing it to quickly become a limiting factor for adoption and developer agility. For this reason, it is […].

Closing the Gap Between the Digital Haves and Have-Nots

Cloudera

by Pedro Pereira. The digital race is on. To pull ahead of the pack, a company needs to know what to do with its data. Without a data-driven strategy, you’re bound to lose ground to competitors who apply their data to operational improvements, product development, go-to-market strategies, and the customer experience. It isn’t enough to collect, interpret, and act on the data.

Survey: Cloud, Data Management, and Emerging Technology Needs Are Driving Changes to 2022 IT Plans

Teradata

Our latest global industry survey, in partnership with Vanson Bourne, reveals that enterprises are contemplating long-term, data-focused IT investments to address changing market conditions.

The Ultimate Guide to Statistics for Machine Learning Beginners

ProjectPro

Probability and Statistics are two intertwined topics that smooth one’s path to becoming a Machine Learning pro. In this blog, you will find a detailed description of all you need to learn about probability and statistics for machine learning. If you are a regular user of social media sites, you must have encountered on your timeline at least one of the memes suggesting that machine learning is nothing but glamorised statistics.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Connecting a Linux VPS to an AWS VPC using a S2S VPN with Static Routing

Hepta Analytics

This blogpost will cover how to connect a standalone Virtual Private Server (VPS) running Linux (specifically Debian) to AWS’ Virtual Private Cloud (VPC) using a site-to-site VPN with Static Routing. This blogpost is relevant for those who find themselves having to integrate their AWS infrastructure with external sites where they do not own, or do not have permission to configure the gateway device (e.g. a Cisco ASA appliance).

Migrate to CDP Private Cloud Base – A Step by Step Guide

Cloudera

Our recent blog discussed the four paths to get from legacy platforms to CDP Private Cloud Base. In this blog and accompanying video, we will deep dive into the mechanics of running an in-place upgrade from CDH5 or CDH6 to CDP Private Cloud Base. The overall upgrade follows a seven-step process illustrated below. In the video below we walk through a complete end to end upgrade of CDH to CDP Private Cloud Base.

Databricks Delta Cache and Spark Cache

Advancing Analytics: Data Engineering

As data sizes and demand increase over time, you often see slowness on Databricks. This can be due to a number of factors, from security and network transfers to read/write requests and memory space. A common cause is Databricks having to constantly read Parquet files from the file system, which increases I/O and network throughput. Databricks has to manage and monitor the cluster to ensure it does not exceed the I/O threads threshold and that the workers have enough memory to cope with t

Big Data Engineer Salary - How Much Can You Make in 2023?

ProjectPro

Big Data Engineer is one of the most popular job profiles in the data industry. But, wait. Is it actually worth pursuing? Does it offer good pay? Read this blog to find out! This blog on Big Data Engineer salary gives you a clear picture of the salary range according to skills, countries, industries, job titles, etc. So, let's get started! Big Data gets over 1.2 trillion searches on Google annually.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Best Practices for Leveraging Orphan Data in Your Analytical Ecosystem

Teradata

Businesses struggle to manage orphan data -- data not maintained by traditional transaction systems. Learn what your company can do to turn orphan data challenges into competitive advantages.

Web Scraping & Getting Data with Beautiful Soup | Domino

Domino Data Lab: Data Engineering

Data is all around us, from the spreadsheets we analyse on a daily basis, to the weather forecast we rely on every morning or the webpages we read. In many cases, the data we consume is simply given to us, and a simple glance is enough to make a decision. For example, knowing that the chance of rain today is 75% all day makes me take my umbrella with me.

Armadillo makes audio players in Android easy

Scribd Technology

Armadillo is the fully featured audio player library Scribd uses to play and download all of its audiobooks and podcasts, and it is now open source. It specializes in playing HLS or MP3 content that is broken down into chapters or tracks. It leverages Google’s ExoPlayer library for its audio engine. ExoPlayer wraps a variety of low-level audio and video APIs but has few opinions of its own for actually using audio in an Android app.

Correlation vs. Covariance

ProjectPro

Are you tired of searching the web for ‘correlation vs. covariance’ to understand the two terms better? If yes, read this article that compares correlation vs. covariance and explains the two popular statistical tools in detail. After the birth of the new domain of Data Science, data has become a prized possession for most companies. They rely on data science algorithms to understand customer behavior, predict sales, etc.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

Everything You Need to Know About DataOps Solutions

DataKitchen

RudderStack and Mixpanel Announce Partnership Advancing Product Analytics for the Modern Data Stack

RudderStack

Mixpanel and RudderStack are proud to partner together to deliver better analytics to product teams everywhere, fueled by rich data from the data warehouse.

Delivering Your Personal Data Cloud With Prifina

Data Engineering Podcast

Summary: The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they use more information than you realize for purposes that are not what you intended. There have been many attempts to harness all of the data that you generate for gaining useful insights about yourself, but they are generally difficult to set up and manage or require software development experience.

CNN vs RNN- Choose the Right Neural Network for Your Project

ProjectPro

Machine learning (ML) is the study and implementation of algorithms that can mimic the human learning process. The algorithms’ goal is to enable a computer to think and make decisions without explicit instructions from a human user. Machine learning as we know it today came into existence in 1959, when the pioneering computer programmer and game developer Arthur Samuel coined the phrase.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Group vs Fine-Grained Access Control in Cloudera Data Platform Public Cloud

Cloudera

Cloudera Data Platform (CDP) provides a Shared Data Experience (SDX) for centralized data access control and audit in the Enterprise Data Cloud. The Ranger Authorization Service (RAZ) is a new service added to help provide fine-grained access control (FGAC) for cloud storage. We covered the value this new capability provides in a previous blog. RAZ for S3 and RAZ for ADLS introduce FGAC and Audit on CDP’s access to files and directories in cloud storage making it consistent with the re

Overcoming the Limitations of Client-Side Form Tracking With Webhooks

RudderStack

How to use RudderStack’s Webhook Source to submit form data to RudderStack without it being susceptible to client-side script blocking tools.

Digging Into Data Reliability Engineering

Data Engineering Podcast

Summary: The accuracy and availability of data has become critically important to the day-to-day operation of businesses. Similar to the practice of site reliability engineering as a means of ensuring consistent uptime of web services, there has been a new trend of building data reliability engineering practices in companies that rely heavily on their data.

15 OpenCV Projects Ideas for Beginners to Practice in 2023

ProjectPro

This blog contains OpenCV project ideas for beginners and intermediate professionals. You will find interesting OpenCV based projects that are industry-relevant and easy to implement. Table of Contents What is OpenCV? OpenCV Projects Ideas OpenCV Projects for Beginners Image Processing Projects using OpenCV Simple OpenCV Projects with Source Code Interesting OpenCV Projects for Intermediate Professionals OpenCV Biology Projects Other Projects Ideas What is OpenCV?

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.