
Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

Instead of relying on traditional hierarchical structures and predefined schemas, as in the case of data warehouses, a data lake utilizes a flat architecture. This structure is made efficient by data engineering practices that include object storage. Watch our video explaining how data engineering works.
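A minimal sketch of what that flat layout looks like in practice, assuming an S3-compatible object store accessed via boto3; the bucket name and keys are illustrative:

```python
# Flat object storage: there is no real directory tree, only keys in a single
# namespace, with prefixes used for organization. Assumes credentials and the
# (hypothetical) bucket already exist.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-data-lake"

objects = {
    "raw/clickstream/2024-05-01/events.json": b'{"user": 1, "action": "view"}',
    "raw/crm/customers.csv": b"id,name\n1,Acme\n",
    "curated/sales/daily_totals.parquet": b"placeholder-bytes",
}

for key, body in objects.items():
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)

# Consumers discover data by listing key prefixes, not by walking a hierarchy.
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix="raw/clickstream/")
for obj in listing.get("Contents", []):
    print(obj["Key"])
```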


How Windward Built Real-Time Logistics Tracking and AI Insights for the Maritime Industry

Rockset

Lastly, Windward wanted to move their entire platform from batch-based data infrastructure to streaming. This transition supports new use cases that require analyzing events faster than was previously necessary. They used MongoDB as their metadata store to capture vessel and company data.
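A minimal sketch of a MongoDB-backed metadata store for vessel records, assuming a local MongoDB instance; the database, collection, and field names are illustrative, not Windward's actual schema:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
metadata = client["maritime"]["vessel_metadata"]

# Upsert keyed on the vessel's IMO number so repeated streaming updates
# stay idempotent.
metadata.update_one(
    {"imo": 9321483},
    {"$set": {
        "name": "Example Carrier",
        "operator": "Example Shipping Co.",
        "vessel_type": "container",
        "last_position_at": "2024-05-01T12:00:00Z",
    }},
    upsert=True,
)

print(metadata.find_one({"imo": 9321483}, {"_id": 0}))
```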



An Engineering Guide to Data Creation - A Data Contract perspective - Part 1

Data Engineering Weekly

Data engineering starts to add value to the business by capturing events at each step of the business process. The events are then further enriched and analyzed to bring visibility to business operations. There are three types of architectural patterns in data creation.
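A minimal sketch of capturing an event at each step of a business process; the event names, fields, and transport are invented for illustration:

```python
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
import json
import uuid

@dataclass
class BusinessEvent:
    event_type: str   # e.g. "order_placed", "order_shipped"
    entity_id: str    # the business entity the event refers to
    payload: dict
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def emit(event: BusinessEvent) -> None:
    # In practice this would publish to a log or stream (Kafka, Kinesis, ...);
    # printing stands in for the transport here.
    print(json.dumps(asdict(event)))

emit(BusinessEvent("order_placed", "order-123", {"amount": 42.5, "currency": "USD"}))
emit(BusinessEvent("order_shipped", "order-123", {"carrier": "ExampleExpress"}))
```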


Data Collection for Machine Learning: Steps, Methods, and Best Practices

AltexSoft

From the perspective of data science, all the miscellaneous forms of data fall into three large groups: structured, semi-structured, and unstructured. Note, though, that not every type of web scraping is legal.
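A minimal sketch contrasting the three groups; the sample records are invented for illustration:

```python
import json

# Structured: fixed schema, e.g. a row destined for a relational table.
structured_row = {"customer_id": 1, "country": "DE", "signup_date": "2024-05-01"}

# Semi-structured: self-describing but flexible, e.g. nested JSON whose fields
# may vary from record to record.
semi_structured = json.loads(
    '{"customer_id": 1, "tags": ["vip"], "address": {"city": "Berlin"}}'
)

# Unstructured: no predefined schema at all, e.g. free text, images, audio.
unstructured = "Great service, but delivery took two weeks longer than promised."

print(structured_row, semi_structured, unstructured, sep="\n")
```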


Data Lake vs Data Warehouse - Working Together in the Cloud

ProjectPro

This means that a data warehouse is a collection of technologies and components used to store data for strategic use. Data from multiple sources is collected and stored in a warehouse to provide insight into the business. Data in a data warehouse is queried using SQL.
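A minimal sketch of a warehouse-style SQL query, using sqlite3 as a stand-in engine; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, order_date TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('EMEA', '2024-05-01', 120.0),
        ('EMEA', '2024-05-02',  80.0),
        ('APAC', '2024-05-01', 200.0);
""")

# Typical warehouse-style aggregation: revenue per region.
for region, revenue in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
):
    print(region, revenue)
```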


Hadoop vs Spark: Main Big Data Tools Explained

AltexSoft

HDFS follows a master-slave structure. An HDFS Master Node, called a NameNode, keeps metadata with critical information about system files (such as their names, locations, and the number of data blocks in each file) and tracks storage capacity, the volume of data being transferred, and so on.
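A minimal sketch of asking the NameNode for that file metadata (block counts and replica locations) via the standard hdfs fsck command, wrapped in Python for consistency with the other examples; the path is illustrative and a configured Hadoop client is assumed:

```python
import subprocess

# Report each file under the given path, its blocks, and the DataNodes
# holding the replicas.
result = subprocess.run(
    ["hdfs", "fsck", "/data/events", "-files", "-blocks", "-locations"],
    capture_output=True,
    text=True,
)
print(result.stdout)
```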


Sqoop vs. Flume Battle of the Hadoop ETL tools

ProjectPro

Hadoop Sqoop and Hadoop Flume are two tools in Hadoop used to gather data from different sources and load it into HDFS. Sqoop is mostly used to extract structured data from databases like Teradata and Oracle. Both tools enable connecting various data sources to the Hadoop environment.
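A minimal sketch of a Sqoop import pulling a relational table into HDFS, again wrapped in Python for consistency; the JDBC URL, credentials, table, and target directory are all illustrative:

```python
import subprocess

# Import the "orders" table from a hypothetical MySQL database into HDFS,
# splitting the work across four mappers.
subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost:3306/sales",
    "--username", "etl_user",
    "--password-file", "/user/etl/.db_password",  # avoid plain-text passwords
    "--table", "orders",
    "--target-dir", "/data/raw/orders",
    "--num-mappers", "4",
], check=True)
```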