Big Data File Formats, Explained

Parquet vs ORC vs AVRO vs JSON. Which one to choose and how to use them?

Mike Shakhomirov
Towards Data Science
9 min read · Feb 28, 2023


Photo by James Lee on Unsplash

I’m a big fan of data warehouse (DWH) solutions built on ELT (Extract-Load-Transform) data pipelines. However, at some point I had to process raw event data in Cloud Storage, and that meant choosing a file format for the data files.

