How to get your Data to AWS

Bring On The Data

Whether you are dealing with a single massive database, millions of log files, or streams of data from thousands of IoT sensors, AWS offers services that can be combined to transport your data to the cloud securely and efficiently.

This article provides an overview of how to handle three common data sources: files, databases, and data streams.

How to Handle Files

The simplest way to get files into AWS is to upload them to Amazon Simple Storage Service (S3). S3 supports multipart uploads, in which parts of up to 5 GB each are combined into a single object of up to 5 TB. By default, S3 transfers data over the public Internet, so if you are transporting sensitive information, be sure to use AWS Key Management Service (KMS) to perform client-side and server-side encryption.
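To get a feel for those multipart limits, here is a minimal sketch (not AWS code, just arithmetic on the documented limits: 5 MB to 5 GB per part, at most 10,000 parts, 5 TB per object) that plans how many parts an upload needs:

```python
import math

# Documented S3 multipart limits: parts of 5 MB to 5 GB,
# at most 10,000 parts, 5 TB maximum object size.
MIN_PART = 5 * 1024**2
MAX_PART = 5 * 1024**3
MAX_PARTS = 10_000
MAX_OBJECT = 5 * 1024**4

def plan_multipart(object_size: int, part_size: int = MAX_PART) -> int:
    """Return how many parts are needed to upload an object of the given size."""
    if object_size > MAX_OBJECT:
        raise ValueError("S3 objects cannot exceed 5 TB")
    parts = math.ceil(object_size / part_size)
    if parts > MAX_PARTS:
        raise ValueError("part size too small: S3 allows at most 10,000 parts")
    return parts

# A maximum-size 5 TB object uploaded in maximum-size 5 GB parts:
print(plan_multipart(5 * 1024**4))  # 1024 parts
```

In practice the AWS SDKs and CLI choose a part size and parallelize the upload for you; the point of the sketch is simply that even the largest allowed object fits comfortably within the part-count limit.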

If you are frequently moving terabytes of data, AWS Direct Connect provides a private, dedicated 1 Gbps or 10 Gbps network connection between your premises and the AWS network. AWS offers discounted data transfer rates over AWS Direct Connect, but remember that on top of the data transfer cost there are hourly port charges and the monthly fee from the APN Partner providing the network connection.

For transferring petabytes (thousands of terabytes) of data, or for locations with limited or expensive network connectivity, AWS offers AWS Snowball. AWS ships you a rugged, tamper-resistant device the size of a PC tower that connects to your on-premises network through a 10 Gbps RJ45, SFP+ copper, or SFP+ optical network interface. One AWS Snowball stores up to 80 TB of data and encrypts it on the fly. When the device is full, you ship it back to AWS and they load the data into the cloud for you.

You can order as many Snowballs as your heart desires. Want a semi-truck capable of transporting 100 PB of data, the equivalent of 1,250 Snowballs? You can do that! It's called AWS Snowmobile.

How urgently you need your data in the cloud, how frequently you will be transferring data, and your budget will determine which transport method fits your needs.

This chart shows the number of days it would take to transfer one Snowball's worth of data (80 TB) at a given ideal, maxed-out transfer rate:

Amount of Data | Direct Connect (1 Gbps) | Direct Connect (10 Gbps)
80 TB (one Snowball) | ~7.4 days | ~0.74 days
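Those figures fall out of simple arithmetic. A short sketch (ideal numbers only: a fully saturated link with no protocol overhead) makes the calculation explicit:

```python
def transfer_days(size_tb: float, link_gbps: float) -> float:
    """Ideal transfer time in days: fully saturated link, no protocol overhead."""
    bits = size_tb * 1e12 * 8           # decimal terabytes converted to bits
    seconds = bits / (link_gbps * 1e9)  # link rate expressed in bits per second
    return seconds / 86_400             # seconds in a day

print(round(transfer_days(80, 1), 1))   # 7.4  -> one Snowball's worth at 1 Gbps
print(round(transfer_days(80, 10), 2))  # 0.74 -> the same data at 10 Gbps
```

Real-world throughput will be lower, which is exactly why Snowball wins once the data volume or the shipping distance outgrows your link.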

How to Handle Databases

The options to transport databases to the cloud are identical to transferring files, with the additional ability to use a VPN to secure the data while in transit to the target AWS database services: Amazon Relational Database Service (RDS), Amazon DynamoDB, and Amazon Redshift. These provide an endpoint for all your SQL, NoSQL and data warehousing needs, and I will go into more detail in a following article where I cover Big Data storage options on AWS.

The task of moving files is straightforward; the task of moving a database is more complex. AWS offers the AWS Database Migration Service (DMS) to help with this task. With DMS you do not need to take your database offline, and the service provides error handling as well as the ability to map your existing database schemas to different database engines.
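DMS drives that mapping with a JSON table-mapping document. As a rough illustration, here is a minimal document with a single selection rule that includes every table in a hypothetical "sales" schema (the schema name and rule name are made up for the example):

```python
import json

# Minimal DMS table-mapping document: one selection rule that includes
# every table in the hypothetical "sales" schema.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-sales-schema",
            "object-locator": {"schema-name": "sales", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

# DMS accepts this document as a JSON string when you create a
# replication task (for example, via the console or an SDK).
print(json.dumps(table_mappings))
```

Transformation rules in the same document can rename schemas or tables on the way to the target engine.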

How to Handle Data Streams

Data streams, as the name implies, are continuous flows of data. The data could come from another cloud service providing real-time stock prices or from thousands of IoT sensors distributed in facilities around the world; anything with the ability to collect or generate data and an Internet connection can be the source of a data stream.

Amazon Kinesis Streams can serve as the big data stream that all your smaller data streams merge and flow into. Kinesis is designed to support thousands of writes per second, and increasing read/write bandwidth is as easy as adding shards. The Kinesis services include features for performing data analytics and can feed other AWS analytics services, which I will cover in more depth in a later article.

For persistent data streams like a web socket, the data can be written directly to a Kinesis Stream.
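As a sketch of what one such write looks like, the snippet below shapes a sensor reading into the parameters a Kinesis `put_record` call takes (the stream name "sensor-stream" and the record fields are illustrative, not from the article):

```python
import json

def build_sensor_record(sensor_id: str, reading: dict) -> dict:
    """Shape one sensor reading as put_record parameters.

    The stream name is a placeholder. Using the sensor id as the
    partition key spreads different sensors across the stream's shards.
    """
    return {
        "StreamName": "sensor-stream",
        "Data": json.dumps(reading).encode("utf-8"),
        "PartitionKey": sensor_id,
    }

record = build_sensor_record("sensor-042", {"temp_c": 21.5})
# In a real producer: boto3.client("kinesis").put_record(**record)
print(record["PartitionKey"])  # sensor-042
```

Choosing a good partition key matters: records with the same key land on the same shard, so a key with high cardinality (like a sensor id) keeps write load balanced as you add shards.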

For devices or services that require a REST API, the interface can be built using Amazon API Gateway.

If you are looking for a managed solution, consider AWS IoT, which provides all the components to build an end-to-end IoT solution. AWS IoT is a platform that enables you to connect devices to AWS services and to other devices, secure data and interactions, and process and act upon device data.

Serving as the gatekeeper, AWS Identity & Access Management (IAM) service provides various methods to broker authorization and access to Amazon Kinesis, Amazon API Gateway or AWS IoT.

As you can see, with AWS it's not a matter of can you get your data to the cloud, it's a matter of which method fits your needs from a functionality, security, reliability, and cost standpoint.

If you would like recommendations on the best approach to get your data to the cloud feel free to reach out.
