
Case Study: Matter Uses Rockset to Bring AI-Powered Sustainable Insights to Investors

Rockset

In several of these scenarios, both NoSQL databases and data lakes have been very useful because of their schemaless nature, variable cost profiles, and scalability characteristics. This allows us to correct bad predictions made by the AI via our custom tagging app, tapping into the latest data ingested in our pipeline.


Data Vault 2.0 with dbt Cloud

dbt Developer Hub

Each house does not have a pipe directly from the local river: there is a dam and a reservoir to collect water for the city from all of the sources – the lakes, streams, creeks, and glaciers – before the water is redirected into each neighborhood and finally into each home’s taps. A new development in the city? No problem!


Costwiz: Saving cost for LinkedIn enterprise on Azure

LinkedIn Engineering

The Extract phase utilizes Azure Data Factory to manage data ingestion from sources like Azure Kusto clusters, Delta Live Tables in Azure Databricks, LinkedIn's internal REST endpoints, and Azure Data Lake. Change-tracking-driven watermarking: Pros: the pipeline is idempotent and relies on source-provided delta records.
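The idea behind change-tracking-driven watermarking is that each extract pulls only the records whose source-provided change version exceeds the last stored watermark, so re-running the same extract produces the same delta. A minimal pure-Python sketch (the `Record` shape, `extract_delta` name, and version numbers are hypothetical, not from the Costwiz article):

```python
from dataclasses import dataclass

@dataclass
class Record:
    id: int
    change_version: int  # change-tracking version supplied by the source
    payload: str

def extract_delta(source, watermark):
    """Pull only records changed after the stored watermark.

    Re-running with the same watermark yields the same delta,
    which is what makes the pipeline idempotent.
    """
    delta = [r for r in source if r.change_version > watermark]
    # Advance the watermark to the highest version seen in this batch.
    new_watermark = max((r.change_version for r in delta), default=watermark)
    return delta, new_watermark

# Hypothetical source table with change-tracking versions.
source = [Record(1, 3, "a"), Record(2, 5, "b"), Record(3, 8, "c")]
delta, watermark = extract_delta(source, watermark=4)
```

Running the extract again with the old watermark returns the identical delta, while running it with the new watermark returns nothing until the source changes again.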


Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud 

Snowflake

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns, such as the data warehouse, data lake, and data lakehouse, and distributed patterns such as data mesh.


Apache Spark MLlib vs Scikit-learn: Building Machine Learning Pipelines

Towards Data Science

When working with NLP applications, preprocessing gets even deeper, with stages like stemming, lemmatization, stop-word removal, tokenization, vectorization, and part-of-speech (POS) tagging. It is perfectly possible to execute these steps using libraries like Pandas and NumPy, or NLTK and spaCy for NLP.
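To make those stages concrete, here is a minimal pure-Python sketch of a preprocessing pipeline (the tiny stop-word list and the naive suffix-stripping stemmer are illustrative stand-ins for what NLTK or spaCy would provide):

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "are"}  # tiny illustrative list

def tokenize(text):
    # Lowercase and split on non-alphabetic characters.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    # Naive suffix stripping, standing in for a real stemmer
    # such as NLTK's PorterStemmer.
    for suffix in ("ization", "ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def vectorize(tokens):
    # Bag-of-words counts: the simplest form of vectorization.
    return Counter(tokens)

text = "Tokenization and stemming are stages of the preprocessing pipeline"
tokens = remove_stop_words(tokenize(text))
vector = vectorize(stem(t) for t in tokens)
```

In practice, each of these functions maps onto a pipeline stage in Scikit-learn or Spark MLlib, which is the comparison the article draws.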


Data Vault on Snowflake: Feature Engineering and Business Vault

Snowflake

Based on the Tecton blog. So is this similar to data engineering pipelines into a data lake/warehouse? Automation, because the same loader patterns are used for both and the same metadata tags are expected from both, meaning the applied date timestamp in the business vault will match up with the raw date timestamp it came from.


What is Data Engineering? Everything You Need to Know in 2022

phData: Data Engineering

Data engineers allow an organization to efficiently and effectively collect data from various sources, generally storing that data in a data lake or in several Kafka topics. The ELT use case is commonly seen in data lake architectures or in systems that need raw extracted data from multiple sources.