April 4, 2023

Fivetran Supports the Automation of the Modern Data Lake on Amazon S3

By Sam Hall

At phData, we try to stay at the forefront of new technologies and capabilities that might benefit our customers. Today we want to introduce Fivetran’s support for Amazon S3 with Apache Iceberg, investigate some of the implications of this feature, and look at how it fits into the modern data architecture as a whole.

What is This New Feature?

Fivetran today announced support for Amazon Simple Storage Service (Amazon S3) with the Apache Iceberg data lake format. Amazon S3 is an object storage service from Amazon Web Services (AWS) that offers industry-leading scalability, data availability, security, and performance.

Apache Iceberg is a widely supported open-source table format that brings atomic, consistent, isolated, and durable (ACID) transactions to data lakes. Fivetran, the automated data movement platform, anonymizes personally identifiable information (PII) while cleansing, normalizing, and automatically loading data into the lake.

With expansive storage capacity and support for multiple data formats, the data lake is a popular destination for teams doing analysis on massive data sets or running extensive data science projects that fuel their business.

Hundreds of thousands of data lakes run on top of Amazon S3. Of the many enterprise teams that have already put them to work, a majority cite greater business agility, improved product and service development, and better customer service and engagement as benefits.

As organizations continue to use data lakes to run analytics and extract insights from their data, progressive marketing intelligence teams are demanding more of those lakes, and solutions like Amazon S3 and automated pipeline support are meeting that demand.

Instead of focusing on all the manual steps required to ingest data, cleanse it, prepare it for usage, hash and block sensitive data, and then start querying it, modern organizations see great value in reducing data lake management efforts through pipeline automation and governance. 
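To make those manual steps concrete, here is a minimal Python sketch of the kind of hand-written cleansing and hashing code that pipeline automation is meant to replace. It is not Fivetran's implementation: the column names, the salt, and the helper functions `hash_value` and `protect_row` are all hypothetical and chosen only for illustration.

```python
import hashlib

# Hypothetical configuration, for illustration only.
PII_COLUMNS = {"email"}        # columns to hash before loading
BLOCKED_COLUMNS = {"ssn"}      # columns to drop entirely
SALT = "example-salt"          # a real pipeline would keep this secret


def hash_value(value: str, salt: str = SALT) -> str:
    """Return the salted SHA-256 hex digest of a string."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def protect_row(row: dict) -> dict:
    """Drop blocked columns, trim string values, and hash PII columns."""
    cleaned = {}
    for column, value in row.items():
        if column in BLOCKED_COLUMNS:
            continue  # blocked data never reaches the lake
        if isinstance(value, str):
            value = value.strip()  # basic cleansing
        if column in PII_COLUMNS and value is not None:
            value = hash_value(str(value))
        cleaned[column] = value
    return cleaned


rows = [
    {"id": 1, "email": " ada@example.com ", "ssn": "000-00-0000", "plan": "pro"},
    {"id": 2, "email": "ada@example.com", "ssn": "111-11-1111", "plan": "free"},
]
protected = [protect_row(row) for row in rows]
```

Even this toy version has to decide how to salt, which columns count as sensitive, and how to clean values before hashing so that equal inputs produce equal digests. Maintaining those decisions by hand for every source is the effort an automated, governed pipeline aims to remove.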

Figure: "Data ingestion into Amazon S3 with Fivetran and Iceberg," a diagram showing how the new feature simplifies data ingestion.

Why We Think This Feature is a Big Deal

Fivetran’s support for the Apache Iceberg format on Amazon S3 as a destination opens up an entirely new set of possibilities for data storage and integration. It also makes Iceberg more accessible to users of the modern data stack.

Customers who don’t necessarily want to put their data directly into a data warehouse like the Snowflake Data Cloud can now use Fivetran to build a performant, governed, managed dataset on top of S3 that can still be efficiently queried and manipulated from their query engine of choice.

This nicely complements modern data architectures built around a cloud data warehouse like Snowflake. Due to regulatory or other constraints, some organizations prefer to (or must) store data in open formats or in external storage like S3.

For such use cases, Snowflake supports unlocking the potential of Apache Iceberg via its native Iceberg Tables. These offer fast performance and familiar query semantics while letting customers manage their own cloud storage, making them ideal for use cases that require full DML, ACID compliance, and many other Snowflake platform features.

Through this new destination, Fivetran and Snowflake can be used to eliminate common barriers to realizing the true value of data by allowing open, secure, and performant sharing of complex datasets both internally and externally.

Conclusion

Overall, we’re very excited about Fivetran’s support for Amazon S3 with Apache Iceberg and will keep you informed with more updates on this feature, as well as future ones from Fivetran.

As Fivetran’s Partner of the Year, phData has a celebrated track record of helping businesses of all sizes unlock Fivetran’s potential with our data and analytics consulting services. If you’re curious about harnessing a platform like Fivetran, reach out to phData to learn how we can help.
