May, 2024

article thumbnail

Zenlytic Is Building You A Better Coworker With AI Agents

Data Engineering Podcast

Summary The purpose of business intelligence systems is to allow anyone in the business to access and decode data to help them make informed decisions. Unfortunately this often turns into an exercise in frustration for everyone involved due to complex workflows and hard-to-understand dashboards. The team at Zenlytic have leaned on the promise of large language models to build an AI agent that lets you converse with your data.

Building 278
article thumbnail

Building cost effective data pipelines with Python & DuckDB

Start Data Engineering

1. Introduction 2. Project demo 3. TL;DR 4. Building efficient data pipelines with DuckDB 4.1. Use DuckDB to process data, not for multiple users to access data 4.2. Cost calculation: DuckDB + Ephemeral VMs = dirt cheap data processing 4.3. Processing data less than 100GB? Use DuckDB 4.4. Distributed systems are scalable, resilient to failures, & designed for high availability 4.5.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Is the “AI developer”a threat to jobs – or a marketing stunt?

The Pragmatic Engineer

This article was published on 14 March 2024 in The Pragmatic Engineer, for subscribers. I'm sharing this piece in public more than a month later, as it provides important context and analysis for the AI dev tools space. Subscribe to The Pragmatic Engineer to stay up-to-date on what is happening with software engineering, Big Tech, and startups.

article thumbnail

Where to Go Next in Your Data Career

KDnuggets

We are all looking for the right opportunities in our career. In the landscape of data-related careers, the roles can be grouped into classes, and future opportunities tend to follow natural migration paths between the class groups.

Data 151
article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

Introducing the Robinhood Crypto Trading API

Robinhood

Robinhood Crypto customers in the United States can now use our API to view crypto market data, manage portfolios and account information, and place crypto orders programmatically Today, we are excited to announce the Robinhood Crypto trading API , ushering in a new era of convenience, efficiency, and strategy for our most seasoned crypto traders. Robinhood Crypto customers in the United States can use our new trading API to set up advanced and automated trading strategies that allow them to st

Insurance 142
article thumbnail

WebSockets in Scala, Part 2: Integrating Redis and PostgreSQL

Rock the JVM

by Herbert Kateu 1. Introduction This article is a follow-up to the websocket article that was published previously. To recap, we created an in-memory chat application using WebSockets with the help of the Http4s library. The chat application had a variety of features implemented through commands directly in the chat window such as the ability to create users, create chat rooms, and switch between chat rooms.

Scala 136

More Trending

article thumbnail

Enable stakeholder data access with Text-to-SQL RAGs

Start Data Engineering

1. Introduction 2. TL;DR 3. Enabling Stakeholder data access with RAGs 3.1. Set up 3.1.1. Pre-requisite 3.1.2. Demo 3.1.3. Key terminology 3.2. Loading: Read raw data and convert them into LlamaIndex data structures 3.2.1. Read data from structured and unstructured sources 3.2.2. Transform data into LlamaIndex data structures 3.3. Indexing: Generate & store numerical representation of your data 3.

article thumbnail

Introducing Confluent Cloud Freight Clusters

Confluent

Confluent Cloud Freight clusters are now available in Early Access. In this blog, learn how Freight clusters can save you up to 90% at GBps+ scale.

Cloud 145
article thumbnail

5 Free University Courses to Learn Machine Learning

KDnuggets

Want to learn machine learning from the best of resources? Check out these free machine learning courses from the top universities of the world.

article thumbnail

Introducing Salesforce BYOM for Databricks

databricks

Salesforce and Databricks are excited to announce an expanded strategic partnership that delivers a powerful new integration - Salesforce Bring Your Own Model.

138
138
article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Snowflake Announces Agreement to Acquire TruEra AI Observability Platform to Bring LLM and ML Observability to the AI Data Cloud 

Snowflake

Accelerating enterprise AI use cases into production is now a board-level priority for most companies. However, one of the key challenges in AI today is ensuring that those use cases are ready for real-life use and continue to perform at a high level in production. Not only must enterprises ensure accurate, reliable, and valuable results they must also address and mitigate critical issues like bias, hallucinations, and toxicity.

Cloud 126
article thumbnail

Mind the map: a new design for the London Underground map

ArcGIS

A modern take on the London tube map with updated accessible colours, a re-classification of lines by type, and line symbols scaled by frequency

Designing 134
article thumbnail

A Notebook is all I want or Don't

Data Engineering Weekly

The tweet received strong reactions on LinkedIn and Twitter. To clarify, I quoted it as a Notebook-style development, but it is not exactly a Notebook. There is a lot of context missing in that tweet, so I decided to write a blog about it. People have reservations about using tools like Jupytor Notebook for the production pipeline for a good reason.

article thumbnail

The right words in the right place

Tweag

tl;dr You may not believe it, but Nix documentation is getting better. Nixpkgs and NixOS still need more time. Table of contents Overview Motivation Statistics Retrospective Thoughts on future work Acknowledgements Overview This is a retrospective of my and many other people’s work on documentation in the Nix ecosystem between October 2022 and March 2024.

article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

5 Free MIT Courses to Learn Math for Data Science

KDnuggets

Learning math is super important for data science. Check out these free courses from MIT to learn linear algebra, statistics, and more.

article thumbnail

Announcing General Availability of Liquid Clustering

databricks

We’re excited to announce the General Availability of Delta Lake Liquid Clustering in the Databricks Data Intelligence Platform. Liquid Clustering is an innovative.

Data 130
article thumbnail

Snowflake Expands Partnership with Microsoft to Improve Interoperability Through Apache Iceberg

Snowflake

Today we’re excited to announce an expansion of our partnership with Microsoft to deliver a seamless and efficient interoperability experience between Snowflake and Microsoft Fabric OneLake, in preview later this year. This will enable our joint customers to experience bidirectional data access between Snowflake and Microsoft Fabric, with a single copy of data with OneLake in Fabric.

Metadata 125
article thumbnail

What’s New in ArcGIS Roads and Highways and ArcGIS Pipeline Referencing (May 2024)

ArcGIS

The latest release of ArcGIS Roads and Highways and ArcGIS Pipeline Referencing includes a variety of new and enhanced features.

article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Post-quantum readiness for TLS at Meta

Engineering at Meta

Today, the internet (like most digital infrastructure in general) relies heavily on the security offered by public-key cryptosystems such as RSA, Diffie-Hellman (DH), and elliptic curve cryptography (ECC). But the advent of quantum computers has raised real questions about the long-term privacy of data exchanged over the internet. In the future, significant advances in quantum computing will make it possible for adversaries to decrypt stored data that was encrypted using today’s cryptosystems.

Bytes 104
article thumbnail

Robinhood Response to Receipt of Wells Notice from the U.S. Securities and Exchange Commission

Robinhood

Robinhood Markets,Inc.(Nasdaq:HOOD) today announced that RHC has received a Wells Notice from the SEC Staff Earlier today we announced Robinhood Crypto (RHC) has received a Wells Notice from the U.S. Securities and Exchange Commission staff indicating they will recommend that the Commission file an enforcement action. “After years of good faith attempts to work with the SEC for regulatory clarity including our well-known attempt to ‘come in and register,’ we are disappointed that the agency has

101
101
article thumbnail

Top SQL Queries for Data Scientists

KDnuggets

SQL seems like a data science underdog compared to Python and R. However, it’s far from it. I’ll show you here how you can use it as a data scientist.

SQL 141
article thumbnail

Executive Overview: The Rise of Open Foundational Models

databricks

Moving generative AI applications from the proof of concept stage into production requires control, reliability and data governance. Organizations are turning to open.

article thumbnail

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Moving Beyond MTEB and BEIR: Snowflake AI Research Joins Forces with the University of Waterloo to Evolve RAG and Retrieval Benchmarks

Snowflake

To accurately answer business questions using LLMs, companies must augment models with their data. Retrieval Augmented Generation (RAG) is a popular solution to this problem, as it integrates the organization’s factual, real-time data into the prompt for the LLM. While the adoption of RAG has increased, an open question remains: How do enterprises know how effective their system is?

Cloud 119
article thumbnail

Join us at the Iceberg Summit 2024

Cloudera

Apache Iceberg is vital to the work we do and the experience that the Cloudera platform delivers to our customers. Iceberg, a high-performance open-source format for huge analytic tables, delivers the reliability and simplicity of SQL tables to big data while allowing for multiple engines like Spark, Flink, Trino, Presto, Hive, and Impala to work with the same tables, all at the same time.

article thumbnail

Latest Computer Science Research Topics for 2024

Knowledge Hut

Everybody sees a dream—aspiring to become a doctor, astronaut, or anything that fits your imagination. If you were someone who had a keen interest in looking for answers and knowing the “why” behind things, you might be a good fit for research. Further, if this interest revolved around computers and tech, you would be an excellent computer researcher!

article thumbnail

Introduction to the Export Attachments geoprocessing tool

ArcGIS

Learn about the new Export Attachments geoprocessing tool in ArcGIS Pro 3.3 and how it simplifies the process of exporting attachments.

Process 113
article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

A Roadmap to Machine Learning Algorithm Selection

KDnuggets

The goal of this article is to help demystify the process of selecting the proper machine learning algorithm, concentrating on "traditional" algorithms and offering some guidelines for choosing the best one for your application.

article thumbnail

Introducing Databricks Assistant Autocomplete

databricks

We are excited to introduce Databricks Assistant Autocomplete now in Public Preview. This feature brings the AI-powered assistant to you in real-time, providing.

127
127
article thumbnail

Snowflake Cortex LLM Functions Moves to General Availability with New LLMs, Improved Retrieval and Enhanced AI Safety

Snowflake

Snowflake Cortex is a fully-managed service that enables access to industry-leading large language models (LLMs) is now generally available. You can use these LLMs in select regions directly via LLM Functions on Cortex so you can bring generative AI securely to your governed data. Your team can focus on building AI applications, while we handle model optimization and GPU infrastructure to deliver cost-effective performance.

article thumbnail

We’ll See You at the Gartner Data and Analytics Summit

Cloudera

The Gartner Data and Analytics Summit in London is quickly approaching on May 13 th to 15 th , and the Cloudera team is ready to hit the show floor! The theme of this year’s summit, “Generating Value Together: Creating Synergies between Data, Analytics & AI,” could not have come at a better time as we push forward on our AI and analytics journey together.

Banking 104
article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.