Trending Articles

Testing DuckDB’s Large Than Memory Processing Capabilities.

Confessions of a Data Guy

I am a glutton for punishment, a harbinger of tidings, a storm crow, a prophet of the data land. My sole purpose is to plumb the depths of the tools we use every day in Data Engineering. I find the good, the bad, the ugly, and splay them out before you, string ’em up and […] The post Testing DuckDB’s Large Than Memory Processing Capabilities. appeared first on Confessions of a Data Guy.

Process 113

7 Computer Vision Projects for All Levels

KDnuggets

Each project, from beginner tasks like Image Classification to advanced ones like Anomaly Detection, includes a link to the dataset and source code for easy access and implementation.

Project 124

Unapologetically Technical Episode 14 – Cliff Crosland

Jesse Anderson

Unapologetically Technical’s newest episode is now live! In this episode of Unapologetically Technical, I interview Cliff Crosland, the co-founder and CEO of Scanner.dev. Cliff Crosland is a data engineer passionate about helping people wrangle massive log volumes. He sees logs as a treasure trove of insights and believes effective log analysis is critical in today’s complex systems.

Robinhood Reports Third Quarter 2024 Results

Robinhood

Robinhood Markets, Inc. (Nasdaq: HOOD) today reported financial results for the quarter ended September 30, 2024. Read our Q3 2024 earnings press release here. Access more information at investors.robinhood.com. The post Robinhood Reports Third Quarter 2024 Results appeared first on Robinhood Newsroom.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

#ClouderaLife Employee Spotlight: Julia Ostrowski

Cloudera

In this Employee Spotlight, we sat down with Julia Ostrowski to learn about her time at Cloudera, what she loves about her job, her experience on both sides of Cloudera’s mentorship program, and her impressive volunteer work. Meet Julia Ostrowski: Julia is the Director of Enterprise Entitlement at Cloudera and has been with the company since 2019, joining via Hortonworks.

Food 71

Skip Lines of CSV files with DuckDB and Polars

Confessions of a Data Guy

There are some things you don’t need until you need them. I ran into that situation recently when I needed to process some CSV / flat files on short notice. At first, it appeared to be easy, but then I realized, as usual, there was a little monkey wrench thrown into the middle of it. It is […] The post Skip Lines of CSV files with DuckDB and Polars appeared first on Confessions of a Data Guy.

Process 130

More Trending

Tools for the Next Era: The Modern Marketing Data Stack 2025

Snowflake

The stage is set for a new era in marketing, and marketers have never had so much data and technology at their fingertips. But to deliver the ROI that enterprises require today, marketers must have a strategic mindset and fine-tune the tools, tactics and approaches in their marketing data stack. Snowflake is here to help marketers evolve and accelerate their marketing impact with our third annual Modern Marketing Data Stack report and global virtual event.

Food 74

Open Source Security at Databricks

databricks

The Databricks Product Security team is deeply committed to ensuring the security and integrity of its products, which are built on top of.

IT 93

Differential Backups in MyRocks Based Distributed Databases at Uber

Uber Engineering

Learn about how the Storage team at Uber significantly reduced costs and improved speed for backups of its Petabyte-scale, MyRocks-based distributed databases by devising a Differential Backups solution.

Modern Data Architecture: Data Mesh and Data Fabric 101

Precisely

Key Takeaways: Data mesh is a decentralized approach to data management, designed to shift creation and ownership of data products to domain-specific teams. Data fabric is a unified approach to data management, creating a consistent way to manage, access, and share data across distributed environments. Both approaches empower your organization to be more agile, data-driven, and responsive so you can make informed decisions in real time.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Building Interactive Data Science Applications with Python

KDnuggets

Use Python to build engaging, interactive applications where users can pass in an input, get feedback, and make use of multimedia elements such as images, videos, and audio.

Python 108

Win the CSP & MSP Markets by Leveraging Confluent’s Data Streaming Platform and OEM Program

Confluent

Deploying Confluent Platform in conjunction with Confluent's OEM Program can help CSPs and MSPs achieve high margins while maintaining operational excellence and lowering risk.

Bringing Together Data Intelligence and Evaluation Intelligence: Databricks Ventures Invests in Galileo

databricks

Our customers say their biggest challenge in getting Generative AI from pilot to production is the "measurement problem." It's hard to.

Data 91

Enabling Seamless Cloud Migration and Real-Time Data Integration for a Nonprofit Educational Healthcare Organization with Striim

Striim

A nonprofit educational healthcare organization is faced with the challenge of modernizing its critical systems while ensuring uninterrupted access to essential services. With Striim’s real-time data integration solution, the institution successfully transitioned to a cloud infrastructure, maintaining seamless operations and paving the way for future advancements.

Changing the Game with MES: Cut Costs, Drive Efficiency, & Achieve Sustainability Goals!

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

In an era where efficiency is king, are you leveraging the right tools to transform your manufacturing processes? A Manufacturing Execution System (MES) is critical for enhancing operational efficiency, reducing waste, and optimizing energy usage—key factors for improving your bottom line and lowering your carbon footprint. Join Nikhil Joshi, a manufacturing technology expert with 18+ years of hands-on experience, in this new webinar as he uncovers the secrets of MES and how to best utilize thes

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

The rise of AI and GenAI has brought about the rise of new questions in the data ecosystem – and new roles. One job that has become increasingly popular across enterprise data teams is the role of the AI data engineer. Demand for AI data engineers has grown rapidly in data-driven organizations. But what does an AI data engineer do? What are they responsible for?

Fine-Tuning GPT-4o

KDnuggets

Learn how to enhance GPT-4o performance for legal text clarification on your old laptop with just a few lines of code.

Coding 112

New Snowflake Deployment: Mexico and South Korea Coming Soon

Snowflake

Snowflake is excited to announce a significant expansion of our AI Data Cloud infrastructure with support for Microsoft Azure Mexico by the end of Snowflake’s fiscal year, and support for Microsoft Azure in Seoul in the first half of 2025. These deployments underscore Snowflake’s continued commitment to providing our customers with a unified and secure experience, regardless of where their data resides.

Announcing General Availability: Publish to Microsoft Power BI Service from Unity Catalog

databricks

We're excited to announce the General Availability of Publish to Microsoft Power BI Service from Unity Catalog, an integration that makes it easy.

BI 77

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Diff Authoring Time: Measuring developer productivity at Meta

Engineering at Meta

At Meta, we’re always looking for ways to enhance the productivity of our engineers and developers. But how exactly do you measure developer productivity? On this episode of the Meta Tech Podcast, Pascal Hartig (@passy) sits down with Sarita and Moritz, two engineers at Meta who have been working on Diff Authoring Time (DAT), a method for measuring how long it takes to submit changes to a codebase.

Upgrading Uber’s MySQL Fleet to version 8.0

Uber Engineering

Learn all about our journey of successfully upgrading our MySQL fleet at Uber from v5.7 to v8.0, enhancing performance and reliability.

MySQL 74

How to Fine-Tune T5 for Question Answering Tasks with Hugging Face Transformers

KDnuggets

Fine-tuning the T5 model for question answering tasks is simple with Hugging Face Transformers: provide the model with questions and context, and it will learn to generate the correct answers.

IT 76

Retain Customers with Faster, Friendlier Claims: 4 Strategies for Insurers

Precisely

Key Takeaways: In the insurance industry, customer satisfaction has a direct impact on your bottom line. Efficient claims processing and transparent communications are key to customer satisfaction. To streamline the claims process and enhance the customer experience, you must adopt automation, self-service, and omnichannel communication solutions. In 2024, property claims customer satisfaction (CSAT) has reached its lowest point in seven years, according to a recent J.D.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Aimpoint Digital: Leveraging Delta Sharing for Secure and Efficient Multi-Region Model Serving in Databricks

databricks

When serving machine learning models, the latency between requesting a prediction and receiving a response is one of the most critical metrics for.

Data Engineering Weekly #195

Data Engineering Weekly

Astasia Myers: The three components of the unstructured data stack. LLMs and vector databases significantly improved the ability to process and understand unstructured data. I never thought of PDF as a self-contained document database, but that seems to be a reality that we can’t deny. The blog is an excellent summary of the existing unstructured data landscape.

Webinar: DataOps For Beginners – 2024

DataKitchen

“That should take two hours, not two months. Can’t your Data & Analytics Team go any faster?” “The executives’ dashboard broke! The data’s wrong! Can I ever trust our data?” If you’ve ever heard (or had) these complaints about speed-to-insight or data reliability, you should watch our webinar, DataOps for Beginners, on demand. DataKitchen’s VP Gil Benghiat breaks down what DataOps is (spoiler: it’s not just DevOps for data) and how DataOps can take your Data & Analytics factory fro

Data 52

10 Useful Python One-Liners for Data Cleaning

KDnuggets

Here are some useful Python one-liners for common data cleaning tasks.

Python 124

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

Continuous deployment for large monorepos

Uber Engineering

In this blog, we share how we reimagined CD at Uber to improve deployment automation and UX of managing microservices, while tackling the peculiar challenges of working with large monorepos.

Unlocking FHIR for Data and AI in a Meaningful Way

databricks

Discover how the Databricks and XponentL partnership is allowing customers to unlock their FHIR needs. Learn more about dbignite. Imagine you’re feeling.

Data 72

Product Management in the Dynamic World of Data Streaming

Confluent

See how Product Manager Surabhi Singh handles the ever-changing world of data streaming, improves platform features for customers, and chases her career ambitions.

Data Migration to the Cloud: Benefits and Best Practices

Precisely

Key Takeaways: Cloud migration enhances agility, cuts operational costs, and helps you stay compliant with evolving regulations. Maintaining data integrity during cloud migration is essential to ensure reliable and high-quality data for better decision-making and future use in advanced applications. Partner with the right providers that offer both technical tools and expertise within your industry and use cases.

Cloud 59

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.