4 ELT Alternatives To Airbyte – How To Ingest Your Data

May 8, 2024 · data engineering

Getting data out of source systems and into a data warehouse or data lake is one of the first steps in making it usable by analysts and data scientists.

The question is: how will your team do that?

Will they write custom data connectors, pay for an out-of-the-box connector, or perhaps use an open-source solution?

If you choose open source, you'll likely go with Airbyte. It's one of the few open-source data integration tools you'll find.

But perhaps you're looking for an alternative to Airbyte. We'll discuss a few below, although most of them aren't open source.

Before diving into alternatives, let’s first talk about what Airbyte is.

What is Airbyte?

Airbyte is an open-source data pipeline platform that serves as an alternative to Stitch Data and Fivetran. Although established data pipeline platforms offer many integrations with well-known sources like Stripe and Salesforce, their model leaves a gap: integrations with smaller, long-tail services.

Airbyte fills this gap by building and maintaining connectors while fostering a community whose members benefit from one another's custom connectors. Companies commonly build custom connectors for the applications they depend on; Airbyte's open-source model lets them share and maintain those connectors together instead of each team building its own.

Each Airbyte connector runs in its own Docker container, so connectors operate independently of one another. You can monitor each connector, refresh it as needed, and schedule syncs. Airbyte certifies new connectors before marking them production-ready. Early on, Airbyte offered more than 46 connectors and over 250 companies were already using the platform; the connector catalog has since grown substantially.

To Open Source Or Not

One of the major challenges in finding an alternative to Airbyte is deciding whether you really need to stay open source. There aren't many open-source data connector solutions. Estuary offers an open-source version, but it's fairly limited.

So most of the alternatives below are solutions you'll have to pay for.

4 Airbyte Alternatives

If you're looking for an alternative to Airbyte, it's also worth asking whether you need a replacement or just an augmentation.

Regardless, here are four other EL(T) solutions you can use in your data stack.

Portable.io


One data connector solution that has matured over the past few years is Portable.io, a cloud-based data integration tool that replicates data to Snowflake, BigQuery, Amazon Redshift, PostgreSQL, and other destinations. What I've enjoyed about Portable is that it covers many of the long-tail data connectors that Airbyte doesn't.

All for a flat fee.

Portable pricing

  • Free Tier – Manual syncs only (so you'd better get used to clicking)
  • One-off scheduled data flow: $200 per data flow with unlimited sources, destinations, and volumes
  • Business Tier: $1,000 for up to 10 data flows
  • Custom: For specific needs
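Because the one-off plan is priced per flow while the Business tier is a flat fee, there is a break-even point between them. The sketch below works it out using only the prices listed above. It assumes both prices cover the same billing period (the list doesn't say), and the function names are hypothetical helpers for illustration, not anything Portable provides.

```python
from typing import Optional

# Prices from the Portable pricing list; both are assumed to cover the same billing period.
ONE_OFF_PRICE_PER_FLOW = 200  # USD per scheduled data flow
BUSINESS_TIER_PRICE = 1_000   # USD flat fee
BUSINESS_TIER_MAX_FLOWS = 10  # the Business tier covers up to 10 data flows


def one_off_cost(flows: int) -> int:
    """Total cost when every data flow is bought individually."""
    return flows * ONE_OFF_PRICE_PER_FLOW


def business_tier_cost(flows: int) -> Optional[int]:
    """Flat Business-tier cost, or None once the flow count exceeds the tier."""
    if flows > BUSINESS_TIER_MAX_FLOWS:
        return None
    return BUSINESS_TIER_PRICE


def cheaper_plan(flows: int) -> str:
    """Return 'one-off', 'business', 'tie', or 'custom' for a given number of flows."""
    business = business_tier_cost(flows)
    if business is None:
        # Past 10 flows the Business tier no longer applies; compare one-off against a custom quote.
        return "custom"
    one_off = one_off_cost(flows)
    if one_off < business:
        return "one-off"
    if one_off > business:
        return "business"
    return "tie"


if __name__ == "__main__":
    for n in range(1, 12):
        print(n, one_off_cost(n), business_tier_cost(n), cheaper_plan(n))
```

Under these assumptions the two plans tie at five flows ($1,000 each), and the Business tier is cheaper from six to ten flows.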

Portable Features

  • 500+ data source connectors
  • Support for major cloud data warehouse providers
  • Unlimited data sources, destinations, and volumes
  • Free development and maintenance of new data integrations
  • Hands-on support

What Stands Out About Portable

One of the reasons I enjoy working with Portable is that any time I needed a custom connector, I would email their support team, and they’d work with me to develop, test, and “productionize” it.

All at no cost to me!

Basically, it is like having an extra engineer on my team.

Pros

  • A flat pricing model, meaning you know what you’re paying upfront
  • Try all connectors for as long as you want with no charge
  • 500+ long-tail connectors that other ETL solutions don’t support
  • Custom connector creation and support at no additional cost

Cons

  • Doesn't yet support the largest enterprise data sources (think Salesforce), so using Portable alongside other solutions makes sense
  • Doesn't focus on databases as sources
  • Not available internationally

Portable.io is a growing contender in this space with a readily accessible team. Its CEO is frequently active on data engineering subreddits, answering all kinds of questions.

But let’s talk about a solution that can also manage your real-time needs.

Estuary.dev

Estuary is a platform for building batch and real-time data pipelines. It offers a no-code approach to building reliable pipelines that don't require scheduling, supporting both batch and streaming workloads as well as materialized views that update in milliseconds. The platform is built on Gazette, an open-source streaming framework that combines millisecond-latency pub/sub with native persistence to cloud storage, effectively creating a real-time data lake.

One standout feature of Estuary Flow is how it stores data. When a source is captured, the data is written to your cloud storage as regular JSON files. You can then materialize that full history, along with ongoing updates, into a variety of data systems, creating identical, up-to-date views of your data in multiple places, now or in the future. Estuary calls this approach "Collections instead of Buffers," and it is a significant advantage over traditional pipeline tools, which typically hold data only briefly in transit.
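The difference between a collection and a buffer is easiest to see in a toy model. The sketch below is not Estuary's API: it keeps every captured document as a JSON line in an in-memory list (standing in for JSON files in cloud storage) and replays that history into keyed views. The `capture` and `materialize` helpers are hypothetical, and the "last document per key wins" rule is a simplifying assumption for illustration.

```python
import json
from typing import Dict, List


def capture(collection: List[str], doc: dict) -> None:
    """Append a document to the collection as a JSON line; nothing is ever discarded."""
    collection.append(json.dumps(doc))


def materialize(collection: List[str], key: str) -> Dict[object, dict]:
    """Replay the full history into a keyed view (assumed: last document per key wins)."""
    view: Dict[object, dict] = {}
    for line in collection:
        doc = json.loads(line)
        view[doc[key]] = doc
    return view


if __name__ == "__main__":
    orders: List[str] = []
    capture(orders, {"id": 1, "status": "new"})
    capture(orders, {"id": 2, "status": "new"})
    capture(orders, {"id": 1, "status": "shipped"})

    warehouse_view = materialize(orders, key="id")
    # A destination added later can be backfilled from the same retained history.
    search_index_view = materialize(orders, key="id")
    print(warehouse_view == search_index_view)  # True
    print(warehouse_view[1]["status"])          # shipped
```

With a buffer, messages are usually dropped once they've been consumed, so a destination added later would have no history to replay. Retaining the collection is what makes identical views in several places possible.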

Estuary offers "Turnkey batch and streaming connectors," supporting both real-time and historical data in one tool, with pre-built connectors for dozens of endpoints. You can also plug in your own connector through Flow's open protocol, which offers a high degree of flexibility.

The platform also offers schema validation and first-class support for testing transformations, with continuous integration whenever you make changes.

Finally, Estuary Flow provides "Managed CDC": simple, efficient change data capture from databases with minimal impact on the source and low latency. It also supports backfills and real-time streaming out of the box, making it a comprehensive data integration solution.
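Conceptually, CDC with a backfill means copying a snapshot of a table once and then applying a stream of insert, update, and delete events to keep a replica in sync. The sketch below illustrates that idea with plain dictionaries. The event shape (`op`, `key`, `row`) and the helper names are hypothetical and are not Estuary's format; real CDC connectors typically read changes from the database's transaction log rather than receiving ready-made events like these.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def backfill(snapshot: Dict[int, Row]) -> Dict[int, Row]:
    """Seed the replica with a copy of the source table's current rows."""
    return {key: dict(row) for key, row in snapshot.items()}


def apply_changes(replica: Dict[int, Row], events: List[Dict[str, Any]]) -> Dict[int, Row]:
    """Apply hypothetical change events, in order, to keep the replica in sync."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            replica[key] = dict(event["row"])
        elif op == "delete":
            replica.pop(key, None)  # deleting a key the replica never saw is a no-op
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return replica


if __name__ == "__main__":
    source = {1: {"name": "Ada", "plan": "free"}, 2: {"name": "Lin", "plan": "pro"}}
    replica = backfill(source)
    changes = [
        {"op": "insert", "key": 3, "row": {"name": "Sam", "plan": "free"}},
        {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
        {"op": "delete", "key": 2},
    ]
    print(apply_changes(replica, changes))
```

Applying events strictly in order matters: an update that arrived before its insert, or a delete replayed out of order, would leave the replica out of sync with the source.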

Estuary pricing

  • Open source: Completely free, as you manage the infrastructure yourself
  • Cloud: $2.50/credit (one million rows = 6 credits; 1 GB = 4 credits)
  • Cloud high volume: Custom pricing for those that need more than 5,000 credits
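The cloud tier converts usage into credits, which makes a rough estimate a matter of simple arithmetic. The list gives both a per-row and a per-GB conversion without saying which applies to a given pipeline, so the sketch below exposes both as separate estimates (an assumption); check Estuary's current pricing before relying on either. The helper names are hypothetical.

```python
from typing import Optional

# Rates from the Estuary Cloud pricing list above.
PRICE_PER_CREDIT = 2.50        # USD per credit
CREDITS_PER_MILLION_ROWS = 6   # one million rows = 6 credits
CREDITS_PER_GB = 4             # 1 GB = 4 credits
HIGH_VOLUME_THRESHOLD = 5_000  # credits; above this the list points to custom pricing


def credits_for_rows(rows: int) -> float:
    """Credits consumed by a number of rows, using the per-row rate."""
    return rows / 1_000_000 * CREDITS_PER_MILLION_ROWS


def credits_for_gb(gigabytes: float) -> float:
    """Credits consumed by a data volume, using the per-GB rate."""
    return float(gigabytes) * CREDITS_PER_GB


def cost_usd(credits: float) -> Optional[float]:
    """List-price cost in USD, or None when usage falls into custom high-volume pricing."""
    if credits > HIGH_VOLUME_THRESHOLD:
        return None
    return credits * PRICE_PER_CREDIT


if __name__ == "__main__":
    row_credits = credits_for_rows(10_000_000)
    gb_credits = credits_for_gb(50)
    print(row_credits, cost_usd(row_credits))  # 60.0 150.0
    print(gb_credits, cost_usd(gb_credits))    # 200.0 500.0
```

At these rates, 10 million rows comes to 60 credits ($150) and 50 GB to 200 credits ($500); the 5,000-credit high-volume threshold corresponds to 1,250 GB at the per-GB rate.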

Estuary Features

  • 75+ data source connectors (not all available on cloud service)
  • Change data capture support for databases
  • Support for data warehouses and data lakes as destinations
  • Data volume-based pricing

What Stands Out About Estuary

Overall, Estuary makes moving large amounts of data fast and affordable. Most other solutions I looked into cost 2x to 10x as much or required me to code everything from scratch. Estuary helped me deliver data quickly into several customers' data warehouses without spending time setting up and managing multiple streaming solutions.

Pros

  • Robust coverage of high-scale technology systems, such as databases
  • Data transformation with built-in testing
  • Real-time data capture and processing

Cons

  • Estuary is a newer solution, so it is in a period of rapid change.
  • Not as many SaaS integrations as some alternatives (this can be supplemented with Portable or Fivetran).

Estuary is a comprehensive option for businesses looking to streamline their data pipelines, and it's worth considering for any team dealing with large data volumes or real-time requirements.

Matillion


Proponents of Matillion's ELT solution feel it often surpasses Fivetran because it does far more than EL. Unlike Airbyte, which lacks fully fleshed-out transformation capabilities and largely relies on dbt for transforms, Matillion gives end users post-load transformations. Users build transformation components in an easy-to-use, point-and-click UI, which appeals to companies looking for a more all-in-one ELT tool.

Overall, Matillion can be a solid replacement.

Matillion pricing

  • Free: Up to one million rows/month
  • Basic: $2.00/credit
  • Advanced: $2.50/credit
  • Enterprise: $2.70/credit

Matillion Features

  • 125+ data source connectors
  • On-premises and cloud deployment options
  • Cloud data transformation is presented with a graphical user interface (GUI)
  • Supports ETL, reverse ETL, CDC, and several other forms of data workflows

Pros

  • Strong data transformation capabilities built-in
  • On-premises option available
  • Since Matillion offers loading and transformation, it can be easier to implement data governance

Cons

  • Matillion’s GUI-based transformations can have a learning curve
  • Fewer data connectors than other competitors, including Airbyte

Rivery

Rivery is an ELT solution that offers a comprehensive suite of data management tools, including data integration, activation, transformation, and orchestration. It's designed around a no-code interface that simplifies building data pipelines, which Rivery calls "rivers." The platform supports over 200 pre-built connectors, enabling data integration from a wide range of sources directly into your data warehouse.

Rivery Features

  • 200+ data source connectors
  • Cloud deployment options
  • Cloud data transformation is presented with a graphical user interface (GUI)
  • Supports ETL, reverse ETL, CDC, and several other forms of data workflows

Rivery Pricing

Rivery offers three plans:

  • Starter: $0.75 per Rivery Pricing Unit (RPU) credit, billed monthly
  • Professional: $1.20 per RPU credit, billed monthly
  • Enterprise: Custom pricing

Rivery Pros

  • Rivery is an all-in-one solution, covering extraction, loading, and transformation

Rivery Cons

  • The pay-per-use pricing is a little abstract because it's expressed in RPUs, which add another layer of abstraction on top of cloud costs
  • The GUI can speed up pipeline development, but it can also feel limiting to people who are used to writing code

Which Solution Works Best For You?

An ELT solution has to mesh with your company's processes and data strategy. As mentioned, an ELT platform can save coding hours, but it needs to be integrated the right way to deliver value. That means knowing which destinations your data will land in and identifying the priority sources you need to centralize.

Any of the tools above can work with the right implementation and design. By automating more of the data movement and replication process, a company eases the burden on its in-house staff and positions itself for better scalability and growth. It's worth evaluating the leading ELT tools side by side to understand how each would fit into your stack and your mix of cloud and SaaS systems.

Thanks for reading! If you’d like to read more about data engineering, check out the articles below.

Normalization Vs Denormalization – Taking A Step Back

Data Warehouse vs Data Lake vs Data Lakehouse: What’s the difference?

Alternatives to dbt (Data Build Tool)

Using The Cloud As A Data Engineer

What Is SSIS and Should You Use It?

 

 
