7 Python Libraries Every Data Engineer Should Know
KDnuggets
APRIL 25, 2024
Interested in switching to data engineering? Here’s a list of Python libraries you’ll find super helpful.
Snowflake
APRIL 17, 2024
In today’s data-driven world, developer productivity is essential for organizations to build effective and reliable products, accelerate time to value, and fuel ongoing innovation. Recognizing this shift, Snowflake is taking a Python-first approach to bridge the gap and help users leverage the power of both worlds.
Data Engineering Podcast
JUNE 25, 2023
Summary Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers it is critical that everyone is able to collaborate seamlessly. Can you describe what SQLMesh is and the story behind it?
Confessions of a Data Guy
FEBRUARY 26, 2023
Someone on Linkedin recently brought up the point that companies could save gobs of money by swapping out AWS Python lambdas for Rust ones. While it raised the ire of many a Python Data Engineer, I thought it sounded like a great idea. At least it’s an excuse to […] The post AWS Lambdas – Python vs Rust.
Christophe Blefari
JANUARY 20, 2024
Learn data engineering, all the references ( credits ) This is a special edition of the Data News. But right now I'm in holidays finishing a hiking week in Corsica 🥾 So I wrote this special edition about: how to learn data engineering in 2024. Who are the data engineers?
Ascend.io
SEPTEMBER 14, 2023
The rise of data-intensive operations has positioned data engineering at the core of today’s organizations. As the demand to efficiently collect, process, and store data increases, data engineers have started to rely on Python to meet this escalating demand. Why Python for Data Engineering?
Simon Späti
OCTOBER 19, 2022
Will Rust kill Python for Data Engineers? But then again, you have to ask: was Python made for Data Engineering in the first place? Let’s explore why Rust has potential for data engineers, what it does well and why it has become the most loved programming language for 7 years running.
Analytics Vidhya
JUNE 20, 2023
Introduction In today’s data-driven world, organizations across industries are dealing with massive volumes of data, complex pipelines, and the need for efficient data processing.
Towards Data Science
NOVEMBER 4, 2023
Platform Specific Tools and Advanced Techniques Photo by Christopher Burns on Unsplash The modern data ecosystem keeps evolving and new data tools emerge now and then. In this article, I want to talk about crucial things that affect data engineers. Are your data pipelines efficient? Data warehouse example.
Jesse Anderson
DECEMBER 12, 2022
They were the first companies to commercialize open source big data technologies and pushed the marketing and commercialization of Hadoop. With an immutable file system like HDFS, we needed scalable databases to read and write data randomly. Apache Kafka came in 2011 and gave the industry a much better way to move real-time data.
Analytics Vidhya
FEBRUARY 6, 2023
Especially while working with databases, it is often considered a good practice to follow a design pattern. This ensures easy […] The post What are Data Access Object and Data Transfer Object in Python? appeared first on Analytics Vidhya.
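As a rough illustration of the two patterns named in this teaser, here is a minimal sketch that is not taken from the linked article. The names `UserDTO`, `UserDAO`, and the `users` table are hypothetical, and the standard-library `sqlite3` module stands in for whatever database a real project would use.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserDTO:
    """Data Transfer Object: a plain, immutable carrier of user fields (hypothetical)."""
    id: Optional[int]
    name: str
    email: str


class UserDAO:
    """Data Access Object: the only class that knows the SQL and table layout (hypothetical)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        # Create the illustrative table if it does not exist yet.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
        )

    def insert(self, user: UserDTO) -> UserDTO:
        # Parameterized query; returns a new DTO carrying the generated id.
        cur = self._conn.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)",
            (user.name, user.email),
        )
        self._conn.commit()
        return UserDTO(id=cur.lastrowid, name=user.name, email=user.email)

    def get(self, user_id: int) -> Optional[UserDTO]:
        # Returns None when no row matches the given id.
        row = self._conn.execute(
            "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return UserDTO(*row) if row else None
```

Calling code would build a `UserDAO` around a connection such as `sqlite3.connect(":memory:")` and pass `UserDTO` values in and out, so SQL never leaks beyond the DAO.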
Data Engineering Podcast
FEBRUARY 5, 2023
In that time there have been a number of generational shifts in how data engineering is done. Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? __init__ covers the Python language, its community, and the innovative ways it is being used.
Data Engineering Weekly
MARCH 17, 2024
Compliance is mandatory, with strict penalties for violations, emphasizing the importance of data scientists familiarizing themselves with the law to avoid prohibited AI uses and ensure ethical, safe AI development. It discusses the significance of data governance, sharing history, and generative AI's impact on data economy standards.
Knowledge Hut
DECEMBER 26, 2023
Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. What is Data Science? What are the roles and responsibilities of a Data Engineer?
Towards Data Science
OCTOBER 21, 2023
Advanced ETL techniques for beginners Continue reading on Towards Data Science »
Data Engineering Podcast
JULY 2, 2023
In this episode Razi Raziuddin shares how data engineering teams can support the machine learning workflow through the development and support of systems that empower data scientists and ML engineers to build and maintain their own features. What is feature engineering and why/to whom does it matter?
Waitingforcode
FEBRUARY 3, 2023
In this blog post I'll share with you a list of Java and Scala classes I use almost every time in data engineering projects. The part for Python will follow next week! We all have our habits and as programmers, libraries and frameworks are definitely a part of the group.
Confessions of a Data Guy
APRIL 16, 2023
I haven’t seen this since Databricks and Snowflake first came out and started throwing mud at each other. You might think […] The post DuckDB vs Polars for Data Engineering appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
SEPTEMBER 9, 2023
In the vast world of data, it’s not just about gathering and analyzing information anymore; it’s also about ensuring that data pipelines, processes, and platforms run seamlessly and efficiently.
Analytics Vidhya
FEBRUARY 6, 2023
Introduction While working with multiple projects, there are chances of issues with versions of packages in python; for example, a project needs a new version of a package, and another requires a different version. Sometimes the python version itself changes from project to project.
Seattle Data Guy
FEBRUARY 11, 2023
Apache Airflow is a very popular tool that data engineers rely on. Why do data engineers like Airflow? Also, what does Apache Airflow even do? What is a DAG? What are… Read more The post What Is Apache Airflow – Data Engineering Consulting appeared first on Seattle Data Guy.
Confessions of a Data Guy
OCTOBER 6, 2023
Running Rust inside Apache Airflow. I wring my hands sometimes, wishing that things and technologies somehow come together into some bubbling […] The post The Ultimate Data Engineering Chadstack appeared first on Confessions of a Data Guy.
Data Engineering Weekly
FEBRUARY 18, 2024
RudderStack is the Warehouse Native CDP, built to help data teams deliver value across the entire data activation lifecycle, from collection to unification and activation. Our hope is only with the amazing community of data practitioners who constantly support us. We are so over the Big Data Era to Modern Data Stack.
Towards Data Science
MAY 22, 2023
Solving data preparation tasks with ChatGPT Photo by Ricardo Gomez Angel on Unsplash Data engineering makes up a large part of the data science process. In CRISP-DM this process stage is called “data preparation”. It comprises tasks such as data ingestion, data transformation and data quality assurance.
Snowflake
OCTOBER 23, 2023
One of our goals at Snowflake is to ensure we continue to deliver a best-in-class platform for Python developers. Snowflake customers are already harnessing the power of Python through Snowpark , a set of runtimes and libraries that securely deploy and process non-SQL code directly in Snowflake.
Data Engineering Weekly
MARCH 31, 2024
Intuit: How Intuit data analysts write SQL 2x faster with the internal GenAI tool The productivity increase with GenAI is undeniable, and several startups are trying to solve the Text2SQL generation problem. My key highlight is that Excellent data documentation and “clean data” improve results.
Towards Data Science
AUGUST 19, 2023
How I made the transition to an analytics engineer Photo by Campaign Creators on Unsplash A few years ago, I was at a point where I was feeling unfulfilled in my career. I had been working in data engineering for three years and the initial excitement of starting in the world of tech had faded.
Confessions of a Data Guy
MARCH 25, 2024
Ever wondered how to build an end-to-end project for an open source Python package that gets published to PyPI? The post How To Build and Open Source PYPI Python Package appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
FEBRUARY 25, 2024
I love to write Rust … but I deploy Python. I’m not sure if others have this same problem, maybe they are lucky, they get to build in their favorite language 24/7, it’s their tool of choice. Even when I know I […] The post Why I Love Rust, but Deploy Python appeared first on Confessions of a Data Guy.
Cloudera
JULY 13, 2021
After the launch of CDP Data Engineering (CDE) on AWS a few months ago, we are thrilled to announce that CDE, the only cloud-native service purpose-built for enterprise data engineers, is now available on Microsoft Azure. CDP data lifecycle integration and SDX security and governance. Easy job deployment.
Lyft Engineering
MARCH 6, 2024
DEED In this post, we’ll cover how Lyft upgrades Python at scale — 1500+ repos spanning 150+ teams — and the latest iteration of the tools and strategy we’ve built to optimize both the overall time to upgrade and the work required from our engineers. Python, How Do I L̶o̶v̶e̶ Use Thee? Everything starts with data.
Start Data Engineering
OCTOBER 11, 2021
Leetcode: data structures and algorithms 4. Data modeling 4.1 Data warehousing 4.2 Data pipelines 6. Introduction Skills 1. Distributed system fundamentals 7. Event streaming 8. System design 9. Business questions 10. Cloud computing 11.
Towards Data Science
DECEMBER 4, 2023
A Glossary with Use Cases for First-Timers in Data Engineering A happy Data Engineer at work Are you a data engineering rookie interested in knowing more about modern data infrastructures? I bet you are, this article is for you! In this guide Data Engineering meets Formula 1.
Data Engineering Podcast
JANUARY 30, 2022
Summary Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result it has become a standard tool for data engineers for a wide range of applications. The only thing worse than having bad data is not knowing that you have it.
Knowledge Hut
FEBRUARY 1, 2024
Variables in Python are fundamental containers used for storing and manipulating data in a program. In Python programming, variables are the backbone of data manipulation and program logic. They hold and transform data, allowing for the execution of algorithms and the management of large datasets.
Towards Data Science
MAY 11, 2023
As its name suggests, DOP puts data first and foremost. or general-purpose languages (Python, JavaScript). Whereas the author illustrates his examples using JavaScript and Java, this article attempts to demonstrate the ideas in Python. This can be achieved by adhering to four main principles.
Knowledge Hut
MAY 3, 2023
Did you know that data is now an essential component of modern business operations? With companies increasingly relying on data-driven insights to make informed decisions, there has never been a greater need for skilled specialists who can manage and evaluate vast amounts of data.
Knowledge Hut
JUNE 26, 2023
Welcome to the world of data engineering, where the power of big data unfolds. If you're aspiring to be a data engineer and seeking to showcase your skills or gain hands-on experience, you've landed in the right spot. What are Data Engineering Projects?
Christophe Blefari
MARCH 2, 2024
Mistral ( credits ) Hello all, this is the Data News, this week's edition might be smaller than usual in terms of comments as I'm working on a Data News related project that takes me a bit of time, which will probably lead to a series of articles. In 2024 we are more than ever tools to move data from sources to destinations.
Knowledge Hut
MARCH 5, 2024
Data engineers are highly in demand and short in supply. Data engineering is one of the hottest jobs that is trending across the globe. Singapore has a thriving technical market that has been on the lookout for data engineers. Who is Data Engineer and What Do They Do?
Snowflake
SEPTEMBER 18, 2023
As data continues to become more complex, it is critical to have effective ways to present this information. With the explosion of AI/ML, users want to be able to interact with their data and ML models. However, building such data apps has not been easy. No front-end experience is needed and apps are written in pure Python.
Data Engineering Podcast
MAY 22, 2022
Summary Machine learning has become a meaningful target for data applications, bringing with it an increase in the complexity of orchestrating the entire data flow. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform.
Data Engineering Podcast
JULY 10, 2022
Summary Building and maintaining reliable data assets is the prime directive for data engineers. While it is easy to say, it is endlessly complex to implement, requiring data professionals to be experts in a wide range of disparate topics while designing and implementing complex topologies of information workflows.