Sat. Jan 28, 2023 - Fri. Feb 03, 2023

Getting Started with The Basics of Docker

Analytics Vidhya

Introduction: “Let’s containerize your code to ship worldwide!” If you read the above quote, you may wonder what it all means. Well, my friend, this is what Docker is. Let me explain it with an example. Say Harish and Lisa are two people working on the same project but on two different systems (say Windows and […]

Coding 257

Table file formats - Change Data Capture: Delta Lake

Waitingforcode

It's time to start the 4th part of the Table file formats series. This time the topic is Change Data Capture, that is, how to stream all changes made to a table. As in the 3rd part, I'm going to start with Delta Lake.

Data 147

Apple cracking down to enforce its RTO policy

The Pragmatic Engineer

Originally published 2 February 2023. 👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of seven topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe here. Apple was the first Big Tech giant to mandate a proper return to the office, and back in September 2022 this initiative was in full swing, being rolled out in the US and with 3 days per week in the office mandated in the UK.

IT 144

Data News — Week 23.05

Christophe Blefari

Hey you, it's already February. Every week it's the same analysis for me: I plan too many tasks but deliver slowly. I guess that's how it is. Still, I love this Friday rendezvous that we have together. I'm still amazed by how I changed my old habits to add writing to my workflow. And it brings me a lot of joy.

BI 130

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

The Impact of Big Data on Healthcare Decision Making

Analytics Vidhya

Introduction: Big data is revolutionizing the healthcare industry and changing how we think about patient care. In this case, big data refers to the vast amounts of data generated by healthcare systems and patients, including electronic health records, claims data, and patient-generated data. With the ability to collect, manage, and analyze vast amounts of data, […]

YARN or Kubernetes for Apache Spark?

Waitingforcode

I wrote my first Kubernetes on Apache Spark blog post in 2018, trying to answer the question: what can Kubernetes bring to Apache Spark? Four years later this resource manager is a mature Spark component, but a new question has arisen in my head: should I stay on YARN or switch to Kubernetes?

More Trending

20 Questions (with Answers) to Detect Fake Data Scientists: ChatGPT Edition, Part 2

KDnuggets

Can ChatGPT provide answers to data science questions to the same standard as humans? Check out this attempt to do so, and compare the answers to those from experts.

How to Develop Serverless Code Using Azure Functions?

Analytics Vidhya

Introduction: Azure Functions is a serverless computing service from Azure that lets users run code in response to a variety of events without having to provision or manage infrastructure. Whether we are analyzing IoT data streams, managing scheduled events, processing document uploads, responding to database changes, etc., Azure Functions allow developers […]

Coding 237

What's new on the cloud for data engineers - part 7 (05-08.2022)

Waitingforcode

Four months in cloud history is a huge period of time, even when 2 of the 4 months are the usual "holiday" months. As you can guess from the title, it's time to see what changed recently on the cloud from a data engineering perspective!

AI / ML Survival Guide: Conquer DataOps and Data Composability Challenges and Transform into a Truly Data-Driven Organization

The Modern Data Company

Get to the Future Faster – Modernize Your Manufacturing Data Architecture Without Ripping and Replacing Implementing customer lifetime value as a mission-critical KPI has many challenges. Companies need consistent, high-quality data and a straightforward way to measure CLV. In the past, organizations have struggled to implement CLV as a practical, value-generating metric, but a new data solution could help.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

10 Free Machine Learning Courses from Top Universities

KDnuggets

Learn the basics of machine learning, including classification, SVM, decision tree learning, neural networks, convolutional neural networks, boosting, and K nearest neighbors.

Top 10 Applications of Sentiment Analysis in Business

Analytics Vidhya

Introduction: We are all aware of the Internet’s explosive expansion as a primary source of information and a platform for opinion expression. It has now become essential to gather and analyze the ever-expanding data that follows. While in the past, manual analysis of data has been possible and even served us well, the same cannot […]

Data 234

Predicate pushdown, why it doesn't work every time?

Waitingforcode

Pushdowns in Apache Spark are a great way to delegate some operations to the data sources and so reduce the data volume to be processed in the job. However, there is one important gotcha. Watch out for the definition of your predicate because, from time to time, even though the data source supports the pushdown predicate, the predicate can still be executed by the Apache Spark job!
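
The general split between pushed-down and engine-side filtering can be sketched in plain Python. This is a toy model and not Spark's API: `Predicate`, `ToySource`, `SUPPORTED_OPS` and `run_query` are hypothetical names, and the sketch only assumes that a source evaluates the operators it knows and hands the rest back to the job. It does not reproduce the specific gotcha the article describes.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class Predicate:
    """A filter on one column (hypothetical helper, not a Spark class)."""
    column: str
    op: str                                # label such as ">" or "startswith"
    value: object
    test: Callable[[object, object], bool]  # how to evaluate the filter


class ToySource:
    """A pretend data source that can only evaluate a few operators itself."""
    SUPPORTED_OPS = {"=", ">"}  # assumption made for this sketch

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def scan(self, predicates: List[Predicate]) -> Tuple[List[Row], List[Predicate]]:
        # Filters the source understands are applied while reading the data.
        pushed = [p for p in predicates if p.op in self.SUPPORTED_OPS]
        # Everything else is returned so the engine can apply it afterwards.
        residual = [p for p in predicates if p.op not in self.SUPPORTED_OPS]
        rows = [r for r in self.rows
                if all(p.test(r[p.column], p.value) for p in pushed)]
        return rows, residual


def run_query(source: ToySource, predicates: List[Predicate]) -> List[Row]:
    """Engine side: re-apply whatever the source could not evaluate."""
    rows, residual = source.scan(predicates)
    return [r for r in rows
            if all(p.test(r[p.column], p.value) for p in residual)]


people = [
    {"name": "Ann", "age": 35},
    {"name": "Bob", "age": 41},
    {"name": "Al", "age": 25},
]
filters = [
    Predicate("age", ">", 30, operator.gt),
    Predicate("name", "startswith", "A", lambda v, prefix: str(v).startswith(prefix)),
]
result = run_query(ToySource(people), filters)
```

In this sketch the `age > 30` filter shrinks the scan to two rows inside the source, while the `startswith` filter still runs in the engine on those two rows.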

IT 130

A Year of Modern: Our Top 2022 Blog Posts — Chosen by You

The Modern Data Company

Another year, another chance to learn more about the world of data. In 2023, The Modern Data Company (Modern) hopes to reach more companies and organizations with our data operating system, build incredible value from existing and upcoming data assets, and share insights into major shifts in what it means to be data-driven. If you haven’t been with us long, we had some incredible pieces in the past few years.

Retail 98

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

skops: a new library to improve scikit-learn in production

KDnuggets

There are various challenges in MLOps and model sharing, including security and reproducibility. To tackle these for scikit-learn models, we've developed a new open-source library: skops. In this article, I will walk you through how it works and how to use it with an end-to-end example.

IT 133

Practicing Machine Learning with Imbalanced Dataset

Analytics Vidhya

Introduction: In today’s world, machine learning and artificial intelligence are widely used in almost every sector to improve performance and results. But are they still useful without the data? The answer is no. Machine learning algorithms rely heavily on the data we feed them. The quality of data we feed to the algorithms […]

Table formats - reading: Delta Lake

Waitingforcode

In the previous blog post about Delta Lake you discovered the logic for the writing part. In the meantime, Delta Lake 2 was released, and it is for this brand-new version that I'm going to share some findings related to data reading.

IT 130

Exception Handling Sql Scripting

Cloudyard

Read Time: 1 Minute, 32 Seconds. In this post we will discuss exception handling in Snowflake via SQL scripting. Snowflake Scripting raises an exception if an error occurs while executing a statement. When an exception is raised in a Snowflake Scripting block, Snowflake Scripting attempts to find a handler for that exception. If there is no handler for the exception in the current block or in any enclosing blocks, execution of the block stops and an error is reported.
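
The handler lookup described above (current block first, then enclosing blocks, otherwise stop with an error) follows the same rule as Python's own `try`/`except`, so it can be illustrated with a short Python analogy. This is not Snowflake Scripting syntax; `StatementError`, `inner_with_handler`, `inner_without_handler` and `outer_block` are hypothetical names chosen for the sketch.

```python
class StatementError(Exception):
    """Stands in for an error raised while executing a statement."""


def inner_without_handler():
    # Inner block with no handler: the exception leaves this block.
    raise StatementError("division by zero")


def inner_with_handler():
    # Inner block that declares its own handler for the exception.
    try:
        raise StatementError("division by zero")
    except StatementError as exc:
        return f"inner handler: {exc}"


def outer_block(inner):
    # Enclosing block: its handler runs only if the inner block had none.
    try:
        return inner()
    except StatementError as exc:
        return f"outer handler: {exc}"


handled_inside = outer_block(inner_with_handler)
handled_outside = outer_block(inner_without_handler)
# handled_inside  == "inner handler: division by zero"
# handled_outside == "outer handler: division by zero"
```

Calling `inner_without_handler()` on its own, with no enclosing handler, lets the exception escape and stops execution, which mirrors the "no handler in any enclosing block" case.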

SQL 98

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Learn Machine Learning From These GitHub Repositories

KDnuggets

Kickstart your Machine Learning career with these curated GitHub repositories.

An Ultimate Manual to Apache Oozie

Analytics Vidhya

Introduction: Big data processing is crucial today. Big data analytics and learning help corporations foresee client demands, provide useful recommendations, and more. Hadoop, the open-source software framework for scalable, distributed computation over massive data sets, makes it easy. While MapReduce, Hive, Pig, and Cascading are all useful tools, completing all necessary processing or computing […]

Hadoop 230

Observable metrics

Waitingforcode

Observability is a hot topic nowadays, not only for the data industry but also for the software industry. Apache Spark innovates a lot in this field, including new metrics for Structured Streaming and an important update added in the 3.0.0 release that I missed at the time: observable metrics.

Data 130

The Future of Retail: Key Challenges and Opportunities

The Modern Data Company

Retail 97

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

KDnuggets News, February 1: The ChatGPT Cheat Sheet • An Introduction to Markov Chains

KDnuggets

The ChatGPT Cheat Sheet • An Introduction to Markov Chains • Top 10 Advanced Data Science SQL Interview Questions You Must Know How to Answer • How I Make $3,500 Online Every Month With Data Science • Hyperparameter Optimization: 10 Top Python Libraries

YARN for Large Scale Computing: Beginner’s Edition

Analytics Vidhya

Introduction: YARN stands for Yet Another Resource Negotiator. It is a powerful resource management system for a horizontal server environment. It is designed to be more flexible and generic than the original Hadoop MapReduce system, making it an attractive choice for companies looking to implement Hadoop. It allows companies to process data types and run […]

Hadoop 229

PySpark and vectorized User-Defined Functions

Waitingforcode

The Scala API of Apache Spark SQL has various ways of transforming the data, from the native and User-Defined Function column-based functions, to more custom and row-level map functions. PySpark doesn't have this mapping feature but does have the User-Defined Functions with an optimized version called vectorized UDF!

Scala 130

Creating Health Plan Price Transparency in Coverage With the Lakehouse

databricks

What is price transparency and what challenges does it present? In the United States, health care delivery systems and health plans alike are.

Systems 113

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.

Top Posts January 23-29: The ChatGPT Cheat Sheet

KDnuggets

The ChatGPT Cheat Sheet • ChatGPT as a Python Programming Assistant • How to Select Rows and Columns in Pandas Using [ ], loc, iloc, at and.

Top 8 Interview Questions on Apache Sqoop

Analytics Vidhya

Introduction: In this constantly growing technical era, big data is at its peak, with the need for a tool to import and export the data between RDBMS and Hadoop. Apache Sqoop stands for “SQL to Hadoop,” and is one such tool that transfers data between Hadoop (Hive, HBase, HDFS, etc.) and relational database servers (MySQL, Oracle, PostgreSQL, […]

Hadoop 222

Table file formats - reading path: Apache Hudi

Waitingforcode

After Delta Lake and Apache Iceberg it's time to see the reading part of Apache Hudi. Despite an apparent similarity with the aforementioned table formats, Apache Hudi has an interesting reading specificity related to the different table types.

IT 130

Secure Shared Services with Data Streaming: OAuth, Client Quotas, and more

Confluent

Confluent’s fully managed, cloud data streaming platform introduces new security features: OAuth, Enhanced RBAC, and Cloud Client Quotas.

Cloud 103

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.