Amazon S3 and Amazon Aurora are two widely used, highly scalable AWS services. The former is an object storage service, and the latter is a relational database engine designed to be compatible with MySQL and PostgreSQL. While Amazon S3 offers robust features for managing and organizing your data, Aurora provides advanced querying capabilities. By moving data from Amazon S3 into Amazon Aurora, you can combine the potential of the semi-structured and structured data sitting in your S3 buckets. As a result, Aurora can become the single source of truth for your scheduled or real-time analytics.

This guide outlines two straightforward methods for moving Amazon S3 data to an Aurora table. By the end of this article, you will have a clear understanding of the steps needed to connect Amazon S3 to Amazon Aurora.

Let’s dive in. 

Methods to Connect Amazon S3 to Amazon Aurora

You can use one of the following methods to connect Amazon S3 to Amazon Aurora:

  • Method 1: Custom Scripts Approach to Transfer Data from Amazon S3 to Amazon Aurora
  • Method 2: Using a No-Code Tool to Connect Amazon S3 to Amazon Aurora

Method 1: Custom Scripts Approach to Transfer Data from Amazon S3 to Amazon Aurora

This method illustrates how you can load data from Amazon S3 into an Aurora PostgreSQL-compatible database. It involves the following steps:

  • Installing the aws_s3 extension 
  • Setting up access to the S3 bucket 
  • Connecting to Amazon Aurora
  • Using a psql query to copy the data to your Aurora PostgreSQL DB cluster

Here are the details of each step for an Amazon S3 to Amazon Aurora integration:

Step 1: Installing the aws_s3 extension

The aws_s3 extension allows you to read data from S3 into Aurora PostgreSQL tables and write data from Aurora PostgreSQL tables to S3. It provides functions such as aws_s3.table_import_from_s3 and aws_s3.query_export_to_s3 to import and export data between the database and your Amazon S3 bucket.

To install the extension, run the following command in a psql session connected to your Aurora PostgreSQL DB cluster.

postgres=> CREATE EXTENSION aws_s3 CASCADE;

To confirm that the extension installed successfully, you can use the psql \dx metacommand. Alternatively, you can run the query below.

select * from pg_available_extensions where installed_version is not null;

Once installed, the extension enables Aurora PostgreSQL to interact with Amazon S3, allowing you to import and export data between the database and S3.

Step 2: Setting up Access to an Amazon S3 Bucket 

Now collect all the details you need to supply to the import function: the Aurora PostgreSQL DB table name, and the bucket name, file path, file type, and AWS Region where the S3 data is located. If you haven't yet created a table in the Aurora PostgreSQL DB cluster to receive the S3 data, run the following command. It creates a table named info, into which the aws_s3.table_import_from_s3 function will import the data.

postgres=> CREATE TABLE info (column1 varchar(60), column2 varchar(60), column3 varchar(60));

To get the Amazon S3 bucket details, open the Amazon S3 console and select Buckets. Find and choose the bucket containing your data, open its Objects overview, and choose Properties. Here you will find the bucket's name, path, file type, AWS Region, and Amazon Resource Name (ARN).
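If you prefer the command line, you can also retrieve most of these details with the AWS CLI. The following is a minimal sketch assuming the placeholder bucket name nameof-s3-bucket used later in this guide; both commands are standard AWS CLI calls.

List the objects in the bucket to confirm the file path (key) of the file you want to import:

aws s3 ls s3://nameof-s3-bucket/ --recursive

Display the AWS Region the bucket is located in:

aws s3api get-bucket-location --bucket nameof-s3-bucket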

Now, set up permissions on your Aurora PostgreSQL-compatible DB cluster to allow access to the Amazon S3 bucket containing the file. You can use either an AWS Identity and Access Management (IAM) role or security credentials. Here, we will proceed using an IAM role to access the Amazon S3 bucket; the AWS CLI commands below use Windows-style line continuations (^). This involves two steps: first create an IAM policy, and then create an IAM role.

  1. Create an IAM policy

To create an IAM policy, run the following command in your AWS CLI:

aws iam create-policy ^
   --policy-name aurora-s3-import-policy ^
   --policy-document '{
     "Version": "YYYY-MM-DD",
     "Statement": [
       {
         "Sid": "s3import",
         "Action": [
           "s3:GetObject",
           "s3:ListBucket"
         ], 
         "Effect": "Allow",
         "Resource": [
           "arn:aws:s3:::nameof-s3-bucket", 
           "arn:aws:s3:::nameof-s3-bucket/*"
         ] 
       }
     ] 
   }'  

The above command creates an IAM policy named aurora-s3-import-policy that grants access to a bucket named nameof-s3-bucket. This policy provides the bucket and object permissions that allow your Aurora PostgreSQL DB cluster to access Amazon S3.

  2. Create an IAM role

To create an IAM role that your DB cluster can assume, run the following command in your AWS CLI:
aws iam create-role ^
   --role-name aurora-s3-import-role ^
   --assume-role-policy-document '{
     "Version": "YYYY-MM-DD",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
            "Service": "rds.amazonaws.com"
          },
         "Action": "sts:AssumeRole",
         "Condition": {
             "StringEquals": {
                "aws:SourceAccount": "**********",
                "aws:SourceArn": "********"
                }
             }
       }
     ] 
   }'

The above command will create a role named aurora-s3-import-role.

After creating the role and policy, attach the IAM policy to the IAM role that you’ve just created using the following command:

aws iam attach-role-policy ^
   --policy-arn your-policy-arn ^
   --role-name aurora-s3-import-role

Replace your-policy-arn with the ARN of the aurora-s3-import-policy you created above.
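If you did not note the policy ARN when creating it, you can look it up with the AWS CLI. This is a minimal sketch assuming the policy name aurora-s3-import-policy used above:

aws iam list-policies ^
   --query "Policies[?PolicyName=='aurora-s3-import-policy'].Arn" ^
   --output text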

Step 3: Connect to Amazon Aurora 

Now, add the IAM role to your Aurora PostgreSQL DB cluster using either the AWS CLI or the console. Here is the step-by-step process for both:

  1. Using the AWS CLI
aws rds add-role-to-db-cluster ^
--db-cluster-identifier my-db-cluster ^
--feature-name s3Import ^
--role-arn role-arn ^
--region aws-region

Replace my-db-cluster with your Aurora PostgreSQL DB cluster name, role-arn with your role's ARN, and aws-region with your AWS Region.
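If you need to look up the role ARN, the following AWS CLI call returns it, assuming the role name aurora-s3-import-role created earlier:

aws iam get-role ^
   --role-name aurora-s3-import-role ^
   --query "Role.Arn" ^
   --output text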

  2. Using the Console

Log in to your AWS Management Console and open the Amazon RDS console. 

Select your Aurora PostgreSQL DB cluster name. On the Connectivity & security tab, in the Manage IAM roles section, choose the role to add under Add IAM roles to this cluster.

Select s3Import under Feature and click Add role.
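Whichever option you use, you can verify that the role is associated with the cluster and that the s3Import feature is active. A minimal sketch, assuming the cluster identifier my-db-cluster used above:

aws rds describe-db-clusters ^
   --db-cluster-identifier my-db-cluster ^
   --query "DBClusters[0].AssociatedRoles"

The output should list the role ARN with the feature name s3Import and a status of ACTIVE once the association completes.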

Step 4: Use a psql Query to Copy the Data to your Aurora PostgreSQL DB cluster

Finally, run the query below to import data from your S3 bucket using the table_import_from_s3 function of the aws_s3 extension.

postgres=> SELECT aws_s3.table_import_from_s3(
   'info',
   '',
   '(format csv)',
   :'s3_uri'
);

Let’s break down the above syntax:

  • info: The name of the table (created in Step 2) in the Aurora PostgreSQL DB cluster into which the data will be copied. 
  • '' (column list): This parameter specifies which columns of the source file should be copied into the Aurora PostgreSQL table columns. If you leave it empty, all columns are copied. 
  • (format csv): This specifies the data format of the files in S3. In this case, it is set to CSV. You can change this option to match your files; the options string accepts the same formats as the PostgreSQL COPY command (text, csv, or binary).
  • s3_uri: The identifier specifying the file's location within your S3 bucket; see below for one way to construct it.
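The s3_uri value in the query is a psql variable. One way to set it is with the aws_commons.create_s3_uri helper function, which is installed along with the extension. Below is a minimal sketch; the bucket name nameof-s3-bucket, file name data.csv, and Region us-east-1 are example values you should replace with your own:

postgres=> \set s3_uri aws_commons.create_s3_uri(
   'nameof-s3-bucket',
   'data.csv',
   'us-east-1'
)

After the import completes, a quick SELECT count(*) FROM info; is an easy way to confirm that the expected number of rows arrived.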

Likewise, using the aws_s3 extension, you can also import Amazon S3 files that use a custom delimiter, as well as compressed (gzip) or encoded files.
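For example, a file that uses a colon as its delimiter can be imported by adjusting the options string, which accepts the same options as the PostgreSQL COPY command. This is a hypothetical example; adjust the table name, delimiter, and s3_uri to match your data:

postgres=> SELECT aws_s3.table_import_from_s3(
   'info',
   '',
   '(DELIMITER '':'')',
   :'s3_uri'
);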

Note: The above steps are applicable only for Aurora PostgreSQL-compatible databases. For the Aurora MySQL-compatible database, you can follow this guide.

You have successfully established the Amazon S3 to Amazon Aurora connection for moving data from Amazon S3 to Aurora.

While the custom-scripts method may look time-consuming, it is well suited to several use cases:

Flexibility: Custom scripts give you complete flexibility and control over the entire data movement. You can enrich your existing data by combining it with data from other sources, improving the quality of your data for in-depth analysis. You can also transform and filter the data before storing it in Amazon Aurora. This makes custom scripts a viable option for organizations with numerous custom requirements.

Data Security: In highly regulated industries where access to data is heavily restricted, it is often best to handle the data migration with custom scripts. This eliminates the need to move the data through any third-party servers. You can further enhance data security by following secure development practices such as data masking, access control, and encryption.

Data Accuracy: Data validation lets you ensure the accuracy and correctness of the migrated data. By implementing validation checks in the custom script, you can verify that the data transferred from source to destination meets the expected standards. Common validation checks inspect data integrity and remove duplicate records to ensure consistency.
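For instance, a simple post-load validation might compare the imported row count against the source file and look for duplicates. Below is a minimal sketch against the info table created earlier; the duplicate check assumes column1 should be unique, so adapt it to your own keys:

-- Total number of rows imported
SELECT count(*) FROM info;

-- Values of column1 that appear more than once
SELECT column1, count(*)
FROM info
GROUP BY column1
HAVING count(*) > 1;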

Limitations of the Custom Scripts Method to Move Data from Amazon S3 to Amazon Aurora

While using custom scripts to migrate data from Amazon S3 to Amazon Aurora offers flexibility and security, there are some limitations to consider:

  • Developing custom scripts requires technical skills to design, implement, test, and maintain them. As data volumes grow, this can become time-consuming and resource-intensive.
  • Custom scripts not only demand development and maintenance effort but also require robust error handling, typically verified through numerous test cases. You may also need to implement backups through data replication for quick recovery during downtime. These safeguards must be built in to handle data validation failures, network interruptions, and other issues that can arise during migration.

Here’s a better alternative!

Method 2: Using a No-Code Tool to Connect Amazon S3 to Amazon Aurora

Hevo Data is a cloud-based pipeline solution that helps you streamline and automate the process of collecting, transforming, and loading data. With its 150+ pre-built connectors, you can quickly establish connections between databases and platforms without writing a single line of code, extracting data from various sources and moving it to the required destination.

  • Connect and Configure Data Source: Before you connect Amazon S3 as a data source, check the prerequisites here. Once you have all the details required to connect to Amazon S3, log in to your Hevo account to configure your S3 source. Click + Create Pipeline, then search for and select S3. Connect and configure your S3 bucket using either the IAM Role or Access Credentials option. Fill in the required details and click Test and Continue.
Configure S3 as Source
  • Connect and Configure Destination: Before you configure Amazon Aurora as your destination, check the prerequisites here. To connect your Aurora account, select the pipeline you created in the previous step, then search for and select Amazon Aurora and enter a unique name for your destination. Fill in the Aurora account details such as database host, port, username, password, and database name, and click Save & Continue.
Configure Amazon Aurora as Destination

That's it! The Hevo Data pipeline will now start capturing data from Amazon S3 and replicating it into your Amazon Aurora tables.

Check out some of the reasons to choose the Hevo Data platform for your data integration needs: 

  • User-friendly Interface: Hevo provides a user-friendly, no-code interface for setting up data integrations without any coding skills. This ease of use empowers both technical and non-technical users to manage and automate data pipelines.
  • Real-Time Data Integration: Hevo supports real-time data integration, allowing you to capture and process data as soon as it is generated. This lets you gain up-to-date insights and make timely, data-driven decisions for your business.
  • Automatic Schema Detection and Mapping: Hevo detects the schema as it extracts data from the source and automatically maps the transformed data to the target schema. It creates the required tables and columns in the destination database as the schema changes. This schema evolution eliminates the need for manual intervention and ensures data consistency.
  • Monitoring and Alerting: Hevo includes several data monitoring features, including error handling and data lineage tracking. It offers visibility into data pipelines, allowing you to track and ensure data accuracy throughout the integration process. You can also set alerts to be notified as soon as any issue occurs during the migration process.

What can you Achieve by Replicating Data from Amazon S3 to Amazon Aurora?

Amazon S3 and Amazon Aurora integration can provide several benefits and enable various use cases. Here are some of them:

  • By centralizing your data in a relational database, you can perform real-time sales analysis to gain immediate insights into your customers' buying behavior, sales trends, and product performance. This, in turn, allows you to target the right customers with personalized marketing campaigns, product recommendations, and sales promotions.
  • Moving data to Aurora allows you to integrate data from multiple sources, including sales and marketing. With a consolidated view of your business data, you can identify the most profitable product categories on your website. 
  • By migrating data to Aurora, you can create a structured database of all your customer-product relationships, including customer profiles, campaign data, website analytics, and advertising data. Having a unified view enables comprehensive analysis, better decision-making, and improved marketing strategies.

Conclusion

The custom-script approach to Amazon S3 to Amazon Aurora ETL discussed in this article is practical only for one-off scenarios. Because the process involves writing code and gathering several details, it demands manual time and effort, and you need solid technical skills to build the pipeline. A solution like Hevo, on the other hand, goes a step further.

Hevo is simple to use and lets you complete this process by specifying only a few details about your Amazon S3 and Aurora accounts. It has pre-built integrations with 150+ sources. You can connect your SaaS platforms, databases, and more to any data warehouse of your choice without writing code or worrying about maintenance. If you are interested, you can try Hevo by signing up for the 14-day free trial.

Visit our Website to Explore Hevo

Tejaswini Kasture
Freelance Technical Content Writer, Hevo Data

Tejaswini's profound enthusiasm for data science and passion for writing drive her to create high-quality content on software architecture and data integration.
