article thumbnail

Brief History of Data Engineering

Jesse Anderson

Doug Cutting took those papers and created Apache Hadoop in 2005. They were the first companies to commercialize open source big data technologies and pushed the marketing and commercialization of Hadoop. Hadoop was hard to program, and Apache Hive came along in 2010 to add SQL. They eventually merged in 2012.

article thumbnail

Functional Data Engineering - A Blueprint

Data Engineering Weekly

Hadoop put forward the schema-on-read strategy that leads to the disruption of data modeling techniques as we know until then. We went through a full cycle that “schema-on-read ” led to the infamous GIGO (Garbage In, Garbage Out) problem in data lakes, as noted in this What Happened To Hadoop retrospect.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Industry Interview Series- How Big Data is Transforming Business Intelligence?

ProjectPro

Solocal has taken big data to the next stage of BI by designing a novel vision of BI with the open source distributed computing framework Hadoop. It replaced its traditional BI structure by integrating big data and Hadoop."-April In BI – there is a need to use ETL on top of Hadoop as there is not much scripting.

article thumbnail

Hadoop 2.0 (YARN) Framework - The Gateway to Easier Programming for Hadoop Users

ProjectPro

Hadoop (Hadoop 1.0) has progressed from a more restricted processing model of batch oriented MapReduce jobs to developing specialized and interactive processing models (Hadoop 2.0). With the advent of Hadoop 2.0, In this piece of writing we provide the users an insight on the novel Hadoop 2.0 to Hadoop 2.0.

Hadoop 40
article thumbnail

Cloud Native: What It Means in the Data World

Rockset

Hadoop and RocksDB are two examples I’ve had the privilege of working on personally. The falling price of SATA disks in the early 2000s was one major factor for the popularity of Hadoop, because it was the only software that could cobble together petabytes of these disks to provide a large-scale storage system.

Cloud 40
article thumbnail

Big Data Timeline- Series of Big Data Evolution

ProjectPro

2005 - The tiny toy elephant Hadoop was developed by Doug Cutting and Mike Cafarella to handle the big data explosion from the web. Hadoop is an open source solution for storing and processing large unstructured data sets. Hadoop is an open source solution for storing and processing large unstructured data sets.

article thumbnail

20 Best Open Source Big Data Projects to Contribute on GitHub

ProjectPro

Apache Spark is also quite versatile, and it can run on a standalone cluster mode or Hadoop YARN , EC2, Mesos, Kubernetes, etc. You can also access data through non-relational databases such as Apache Cassandra, Apache HBase, Apache Hive, and others like the Hadoop Distributed File System. Apache CouchDB Source: idroot.us