TimescaleDB: Fast And Scalable Timeseries with Ajay Kulkarni and Mike Freedman - Episode 18

Data Engineering Podcast

Can you start by explaining what Timescale is and how the project got started? The landscape of time series databases is extensive and oftentimes difficult to navigate. What impact has the 10.0 release of PostgreSQL had on the design of the project?

15+ Must Have Data Engineer Skills in 2023

Knowledge Hut

Kafka: Kafka is one of the most in-demand open-source messaging and streaming systems, letting you publish, distribute, and consume data streams. Written in Scala and Java, Kafka helps you scale performance in today’s data-driven enterprises.

Python for Data Engineering

Ascend.io

Data Storage: Python integrates smoothly with both SQL and NoSQL databases, including PostgreSQL, MySQL, MongoDB, and Cassandra. Use Case: Storing data with PostgreSQL (example): import psycopg2; conn = psycopg2.connect(dbname="mydb", …

Data Engineering Glossary

Silectis

Kafka: Apache Kafka is the Apache Software Foundation’s open-source software platform for streaming. NoSQL: A non-relational database. Open Source: Software that is available to freely use and modify. Parquet: A column-oriented data storage format that’s part of the Hadoop ecosystem. HDFS: Stands for Hadoop Distributed File System.

Why Mutability Is Essential for Real-Time Data Analytics

Rockset

A platform such as Apache Kafka/Confluent, Spark, or Amazon Kinesis for publishing that stream of event data. Traditionally, this information would be stored in transactional databases (Oracle Database, MySQL, PostgreSQL, etc.) because they allow for mutability: any field stored in these transactional databases is updatable.

Analytics on DynamoDB: Comparing Elasticsearch, Athena and Spark

Rockset

DynamoDB has been one of the most popular NoSQL databases in the cloud since its introduction in 2012. As opposed to a traditional RDBMS like PostgreSQL, DynamoDB scales horizontally, obviating the need for careful capacity planning, resharding, and database maintenance. AWS Glue is a fully managed ETL service that lets us do both.

Top 20+ Big Data Certifications and Courses in 2023

Knowledge Hut

Big Data Frameworks: Familiarity with popular Big Data frameworks used for data processing, such as Hadoop, Apache Spark, Apache Flink, or Kafka. Implement ETL and data pipelines with Bash, Airflow, and Kafka; architect, populate, and deploy data warehouses; create BI reports and interactive dashboards.