Automating Your Production Dataflows On Spark

Data Engineering Podcast

Sean Knapp founded Ascend to address the operational challenges of running production-grade, scalable Spark infrastructure, allowing data engineers to focus on the problems that power their business. Can you describe any limitations that are imposed by your selection of Spark as the processing engine?

Developing Production-Level Databricks Pipelines

Confessions of a Data Guy

A question that comes up often: “How do I develop production-level Databricks pipelines?” Or maybe someone just has a feeling that using notebooks all day long is expensive and an unreliable way to produce Databricks Spark + Delta Lake pipelines that run well, without error.

How to use the DockerOperator

Marc Lamberti

We will use the DockerOperator in this example to run a Spark job. For that, you need a Dockerfile starting with FROM bde2020/spark-python-template:3.3.0-hadoop3.3 … jar /spark/jars/ && mv aws-java-sdk-bundle-1.11.1026.jar … In production, the image registry will be a service like AWS ECR. … alias("quote")).select("timestamp", …

Apache Spark Use Cases & Applications

Knowledge Hut

Apache Spark was developed by a team at UC Berkeley in 2009. Since then, it has seen a very high adoption rate at leading technology companies such as Google, Facebook, Apple, and Netflix. According to a marketanalysis.com survey, the worldwide Apache Spark market was projected to grow at a compound annual growth rate (CAGR) of 67% between 2019 and 2022.
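As a rough illustration of what that projection implies, the snippet below compounds a 67% annual growth rate. It assumes that 2019 to 2022 counts as three compounding years; the function name cagr_multiplier is hypothetical, and the result says nothing about the market's actual size.

```python
def cagr_multiplier(rate: float, years: int) -> float:
    """Total growth factor after compounding `rate` once per year for `years` years."""
    return (1 + rate) ** years


# Assumption: 2019 -> 2022 is treated as three compounding periods.
factor = cagr_multiplier(0.67, 2022 - 2019)
print(f"{factor:.2f}x")  # prints 4.66x
```

In other words, a 67% CAGR sustained over three years would leave the market at roughly 4.7 times its 2019 size.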

Data News — Week 24.08

Christophe Blefari

Spark future: I'm convinced that Apache Spark will have to transform itself if it is not to disappear (disappear in the sense of Hadoop: still present, but niche). Turning ideas into AI use cases: the Product Manager point of view. But for sure I'll add Arrow in v2. Data will not tell you what to do.

Data News — Week 23.40

Christophe Blefari

Goodbye Spark. Hello Polars + Delta Lake: Spark is under attack. In recent years Spark has powered a lot of data use cases, but the modern data stack and, more recently, DuckDB, Polars, and smaller OLAP technologies allow a new way to do data processing. Contentsquare acquires Heap.