
Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue)

Towards Data Science

Read them later using their “path”.

The Implementation

After reading a line or two about the data processing tools available in AWS, I chose to build a data pipeline with Lambda and Glue as the data processing components, S3 as storage, and a local Airflow instance to orchestrate everything. S3 is AWS’s blob storage.
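To make the idea of an S3 “path” concrete: S3 addresses every stored object by a bucket name plus a key, commonly written as an `s3://bucket/key` URI. The sketch below is only an illustration of that addressing scheme, not code from the article. It uses the Python standard library alone, and the helper name `split_s3_uri`, the bucket name, and the key are all hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into its (bucket, key) parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the "host" part; the key is the rest, without the leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical bucket and key, for illustration only.
bucket, key = split_s3_uri("s3://example-pipeline-bucket/raw/2021/01/data.csv")
```

A pipeline step that receives such a URI could pass the two parts on as the bucket and key arguments of whichever S3 client it uses.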
