June, 2020

AWS Account

Start Data Engineering

1. AWS account: Sign up for an AWS account at AWS Sign Up. First-time sign-ups are eligible for some free services (ref: AWS Free Tier). Get your access key by clicking on your name -> My Security Credentials in the top pane and then clicking Create New Access Key.
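
Once an access key exists, it is commonly kept in an INI-style credentials file (the AWS CLI uses `~/.aws/credentials`). The sketch below is only an illustration of reading such a file with Python's standard library; the helper name `load_access_key` and the sample key values are hypothetical placeholders, not part of the original article.

```python
import configparser

# Hypothetical placeholder values for illustration; never commit real keys.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = AKIAEXAMPLEKEYID
aws_secret_access_key = exampleSecretAccessKey
"""


def load_access_key(text, profile="default"):
    """Return (access_key_id, secret_access_key) for one profile in INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[profile]  # raises KeyError if the profile is missing
    return section["aws_access_key_id"], section["aws_secret_access_key"]


key_id, secret = load_access_key(SAMPLE_CREDENTIALS)
print(key_id)
```

In practice the text would come from the real credentials file rather than an inline string, and the secret should never be printed or logged.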

AWS 130

Business Intelligence meets Data Engineering with Emerging Technologies

Simon Späti

Today we have more requirements, an ever-growing set of tools and frameworks, complex cloud architectures, and a data stack that is changing rapidly. I hear claims: “Business Intelligence (BI) takes too long to integrate new data”, or “understanding how the numbers match up is very hard and needs lots of analysis”. The goal of this article is to make business intelligence easier, faster and more accessible with techniques from the sphere of data engineering.

EC2 & Session Manager (Toronto Project)

Team Data Science

Welcome back to this Toronto-specific data engineering project. We left off last time concluding that finance has the largest demand for data engineers who have skills with AWS, and sketched out what our data ingestion pipeline will look like. I began building out the data ingestion pipeline by launching an EC2 instance. I should note that if you have created an AWS account, but have not yet created an Identity and Access Management (IAM) admin role, and are therefore still using root credentials, I am s

Project 130

Stream Processing with IoT Data: Challenges, Best Practices, and Techniques

Confluent

The rise of IoT devices means that we have to collect, process, and analyze orders of magnitude more data than ever before. As sensors and devices become ever more ubiquitous, […].

Process 125

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Data Collection And Management To Power Sound Recognition At Audio Analytic

Data Engineering Podcast

Summary: We have machines that can listen to and process human speech in a variety of languages, but dealing with unstructured sounds in our environment is a much greater challenge. The team at Audio Analytic are working to impart a sense of hearing to our myriad devices with their sound recognition technology. In this episode Dr. Chris Mitchell and Dr.

There Are No Perfect Words…

Teradata

Juneteenth has been declared a U.S. holiday at Teradata, as we stand with the black community and reflect on what we can do to fight racism and injustice, and embrace diversity.

More Trending

Netflix Studio Engineering Overview

Netflix Tech

By Steve Urban, Sridhar Seetharaman, Shilpa Motukuri, Tom Mack, Erik Strauss, Hema Kannan, CJ Barker. Netflix is revolutionizing the way a modern studio operates. Our mission in Studio Engineering is to build a unified, global, and digital studio that powers the effective production of amazing content. Netflix produces some of the world’s most beloved and award-winning films and series, including The Irishman, The Crown, La Casa de Papel, Ozark, and Tiger King.

Understanding Azure Synapse Analytics

Advancing Analytics: Data Engineering

You might have seen that I’ve been pretty busy recently, digging into the new Azure Synapse Analytics preview, announced back at Microsoft Build 2020. I’ve explored the Spark engine, SQL serverless/On-Demand and various other bits… but I’m still getting the same question of “Cool!… but what actually is it?”. One of the problems here is that Azure SQL Data Warehouse was rebranded as “Azure Synapse Analytics”… but it’s not the same as the full workspace.

SQL 59

My Python/Java/Spring/Go/Whatever Client Won’t Connect to My Apache Kafka Cluster in Docker/AWS/My Brother’s Laptop. Please Help!

Confluent

tl;dr When a client wants to send or receive a message from Apache Kafka®, there are two types of connection that must succeed: The initial connection to a broker (the […].

Kafka 122

Bringing Business Analytics To End Users With GoodData

Data Engineering Podcast

Summary: The majority of analytics platforms are focused on use internal to an organization by business stakeholders. As the availability of data increases and overall literacy in how to interpret it and take action improves, there is a growing need to bring business intelligence use cases to a broader audience. GoodData is a platform focused on simplifying the work of bringing data to employees and end users.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Modernization Means Simplicity and Sophistication

Teradata

When it comes to being a modern data warehouse, your age really is just a number. It’s the underlying capabilities that actually count. Read more.

A proven approach to land a Data Engineering job

Start Data Engineering

I have seen and been asked the following questions by students, backend engineers and analysts who want to get into the data engineering industry. What approach should I take to land a Data Engineering job? I really want to get into DE. What can I do to learn more about it? In this article, I will try to provide a general approach that you as a beginner, student, backend engineer or analyst can use to land your first data engineering job.

How will Cloud HR Software change the human resources function?

U-Next

Today, the way the Human Resources function operates has changed from how it did in the 90s and early 2000s. The introduction of Human Resource Management (HRM) software in the late 90s had already revolutionized the way HR departments across various industries functioned. The HR department is no longer confined to the back office, where dealing with paperwork and recruiting were the only processes it was involved in.

Cloud 59

How to Create Near Real-time Models With Just dbt + SQL

dbt Developer Hub

Before I dive into how to create this, I have to say this. You probably don’t need this. I, along with my other Fishtown colleagues, have spent countless hours working with clients that ask for near-real-time streaming data. However, when we start digging into the project, it is often realized that the use case is not there. There are a variety of reasons why near real-time streaming is not a good fit.

SQL 52

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Spring for Apache Kafka – Beyond the Basics: Can Your Kafka Consumers Handle a Poison Pill?

Confluent

You know the fundamentals of Apache Kafka®. You are a Spring Boot developer working with Apache Kafka. You have chosen Spring Kafka for your integration. You have implemented your first […].

Kafka 120

Accelerate Your Machine Learning With The StreamSQL Feature Store

Data Engineering Podcast

Summary: Machine learning is a process driven by iteration and experimentation which requires fast and easy access to relevant features of the data being processed. In order to reduce friction in the process of developing and delivering models there has been a recent trend toward building a dedicated feature store. In this episode Simba Khadder discusses his work at StreamSQL building a feature store to make creation, discovery, and monitoring of features fast and easy to manage.

Announcing Vantage Trial

Teradata

Vantage Trial provides free, 30-day access to Teradata Vantage in the cloud along with easy-to-use, web-based tools and applications for performing advanced analytics. Learn more.

Cloud 98

3 Key techniques to optimize your Apache Spark code

Start Data Engineering

Intro: A lot of tutorials show how to write Spark code with just the API and code samples, but they do not explain how to write “efficient Apache Spark” code.

Coding 130

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Learnings from Distributed XGBoost on Amazon SageMaker

Zalando Engineering

Overview: XGBoost is a popular Python library for gradient boosted decision trees. The implementation allows practitioners to distribute training across multiple compute instances (or workers), which is especially useful for large training sets. One tool used at Zalando for deploying production machine learning models is the managed service from Amazon called SageMaker.

Top 10 sessions for MongoDB.live 2020

Rockset

MongoDB World is going all virtual with MongoDB.live. Registration is free and there’s tons of content to get excited about. It’s so easy to get overwhelmed by what to pick (heck, you could just watch all of them)! If you’re short on time, fear not: here are our top 10 MongoDB sessions to watch out for. 10. Join the Data Movement: MongoDB and Apache Kafka. One of the go-to picks for companies that need a streaming platform is Apache Kafka.

MongoDB 52

The Cost of Apache Kafka: An Engineer’s Guide to Pricing Out DIY Operations

Confluent

When I have a small software project that I want to share with the world, I don’t write my own version control system with a web UI. I don’t even […].

Kafka 123

Data Management Trends From An Investor Perspective

Data Engineering Podcast

Summary: The landscape of data management and processing is rapidly changing and evolving. There are certain foundational elements that have remained steady, but as the industry matures new trends emerge and gain prominence. In this episode Astasia Myers of Redpoint Ventures shares her perspective as an investor on which categories she is paying particular attention to for the near to medium term.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Identifying the Infodemic Amidst the COVID-19 Pandemic

Teradata

Fighting the pandemic means fighting the misinformation that comes along with it. That's why one of our Global COVID-19 Hackathon teams created a tool to help identify fake news.

IT 98

AWS EMR

Start Data Engineering

EMR: AWS EMR is a managed service provided by AWS to run Spark, HDFS, Hive and other select software.

AWS 130

Launching the Engineering Blog

Zalando Engineering

Our Engineering Blog was launched in June 2020, after the previous tech blog had been on a long break. This post describes the technical setup behind engineering.zalando.com. You will learn: which static site generator we selected and why; what customizations we applied to design the blog and the publishing process; and how we serve static HTML using Skipper and S3.

Getting Started - Time Series Charts

Preset

In this blog we will get a better understanding of what time series are and provide some examples of time series visualizations in Superset.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

How Merging Companies Will Give Rise to Unified Data Streams

Confluent

Company mergers are becoming more common as businesses strive to improve performance and grow market share by saving costs and eliminating competition through acquisitions. But how do business mergers relate […].

Data 113

12 Data Quality Metrics That ACTUALLY Matter

Monte Carlo

One of our customers recently posed this question related to data quality metrics: “I would like to set up an OKR for ourselves [the data team] around data availability. I’d like to establish a single data quality KPI that would summarize availability, freshness, quality. What’s the best way to do this?” I can’t tell you how much joy this request brought me.

Data 59

How to Leverage Advanced Analytics in the Healthcare Domain

Teradata

Learn how Teradata Vantage's advanced analytics capabilities can analyze and predict useful diagnoses and insights in biomedicine and healthcare.

JOINs and Aggregations Using Real-Time Indexing on MongoDB Atlas

Rockset

MongoDB.live took place last week, and Rockset had the opportunity to participate alongside members of the MongoDB community and share our work to make MongoDB data accessible via real-time external indexing. In our session, we discussed the need for modern data-driven applications to perform real-time aggregations and joins, and how Rockset uses MongoDB change streams and Converged Indexing to deliver fast queries on data from MongoDB.

MongoDB 52

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs as well as identifying and de-risking the different kinds of hypotheses built into your roadmap.