Moving Large Datasets to the AWS Platform Using Snowball

Selvakumar Palanisamy
4 min read · Apr 4, 2021

It is difficult to move large volumes of data to a cloud platform over conventional internet links; the available bandwidth is simply not enough.

Snowball helps address the problem businesses face when attempting to transfer large amounts of data.

AWS Snowball Edge is an appliance offering 100 TB of data transfer capacity, complete with on-board storage and compute. Snowball Edge can be used in three data use case scenarios:

  1. Move large chunks of data into and out of AWS
  2. Support independent local workloads in remote locations
  3. Act as temporary storage for local data sets.

Data Transfer Phases


1) First, discover the data to be transferred. Capture details about:

Average file sizes

File system

Network topology

2) After data discovery, design a plan to migrate the data:

Define timelines

Assessment of the existing data platform infrastructure

Tools, automated interfaces, and resources

Post-migration processing

3) Run a POC to evaluate the end-to-end process. The recommendation is to use a single Snowball Edge to gain insights about the data transfer, identify issues, and optimize the tools and migration scripts.

4) Once the data is copied/migrated to the Snowball Edge device, perform additional validation of data integrity with an S3 integrity check to ensure the data was transferred without modification.

For large migrations, it is recommended that you engage the storage, network, and AWS cloud teams during planning. This helps align the right information and resources for a successful migration.
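The integrity check in phase 4 can be approximated with local checksum manifests. Below is a minimal sketch, assuming GNU coreutils `md5sum` and placeholder paths; note that for multipart uploads the S3 ETag is not a plain MD5, so comparing manifests taken before and after the copy is the safer approach:

```shell
# Write a sorted MD5 manifest for every file under a directory, using paths
# relative to that directory so manifests from different trees are comparable.
manifest() {
  dir="$1"; out="$2"
  (cd "$dir" && find . -type f -print0 | xargs -0 md5sum | sort) > "$out"
}

# Typical flow (paths are placeholders): manifest the NAS share before the
# copy, manifest the copied tree afterwards, then diff the two files.
#   manifest /mnt/nas/share  /tmp/manifest.src.md5
#   manifest /mnt/copy/share /tmp/manifest.dst.md5
#   diff /tmp/manifest.src.md5 /tmp/manifest.dst.md5 && echo "checksums match"
```

An empty `diff` means every file arrived byte-for-byte identical; any difference pinpoints the exact files to re-copy.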

AWS Snowball Data transfer Architecture

Below are the high-level steps to migrate data from an on-premises NAS device to an AWS Snowball Edge device.

1) Create a data transfer job and provide your data center address (to receive the Snowball Edge device), the S3 bucket to copy the data to, and the role and encryption key details.

2) Once you receive the device, power it on and connect it to the local network. Work with the network team to get an available static IP for the Snowball device (internal private subnet).

3) Connect the workstation and the Snowball device to the internal network, preferably in the same subnet.

4) Download the AWS OpsHub client to the workstation, then enter the unlock code and the manifest file retrieved from the Snowball data transfer job page (AWS console) to unlock the device.

5) Once the device is unlocked, the OpsHub dashboard shows the storage and compute details.

6) Configure the AWS CLI on the workstation to connect to the Snowball device.

7) Fetch the access key ID and secret access key from the Snowball Edge using the Snowball Edge command line client, and use them to configure the AWS CLI.

8) Write scripts to compress small files into larger archives for optimal batch data transfer.
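Steps 6 and 7 can be sketched as follows. The Snowball Edge client exposes the device credentials (via its `list-access-keys` and `get-secret-access-key` commands); the snippet below then records them as a named AWS CLI profile. The profile name, key values, and device IP are placeholders for illustration, not real credentials:

```shell
# On the workstation, retrieve the device credentials with the Snowball Edge
# client (after the device is unlocked), e.g.:
#   snowballEdge list-access-keys
#   snowballEdge get-secret-access-key --access-key-id <key-id>
#
# Then store them as a named AWS CLI profile. The key values below are
# placeholders only.
mkdir -p "$HOME/.aws"
cat >> "$HOME/.aws/credentials" <<'EOF'
[snowball]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = EXAMPLESECRETKEY
EOF

# The device's local S3 endpoint is then passed per command, e.g.:
#   aws s3 ls --profile snowball --endpoint-url http://192.168.1.100:8080
```

Keeping the device credentials in a dedicated profile avoids mixing them with the credentials used against the public AWS endpoints.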

The AWS CLI has two primary commands for transferring data from the local file system to the Snowball Edge: cp and sync.

aws s3 cp source-file s3://bucket-name --endpoint-url end-point-name --profile data-trans-profile --metadata "owner=dmig,group=data"

aws s3 sync . s3://<bucket-name> --profile <profile-name> --endpoint-url <endpoint details> --region snow
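The batching recommended in step 8 can be sketched as a small script that groups small files into tar archives before the copy. The batch size, directory names, and upload command are placeholder assumptions to be tuned against the average file sizes gathered during the discovery phase:

```shell
# Bundle small files into tar archives of N files each, so the transfer moves
# a few large objects instead of many tiny ones. Requires GNU tar (-T reads
# the member list from a file).
batch_small_files() {
  src="$1"; out="$2"; per_batch="$3"
  mkdir -p "$out"
  i=0; n=0
  : > /tmp/batch.list
  for f in "$src"/*; do
    [ -f "$f" ] || continue
    printf '%s\n' "$f" >> /tmp/batch.list
    n=$((n + 1))
    if [ "$n" -eq "$per_batch" ]; then
      i=$((i + 1))
      tar -cf "$out/batch-$i.tar" -T /tmp/batch.list
      : > /tmp/batch.list
      n=0
    fi
  done
  # Flush the final partial batch, if any.
  if [ "$n" -gt 0 ]; then
    i=$((i + 1))
    tar -cf "$out/batch-$i.tar" -T /tmp/batch.list
  fi
}

# Each archive would then be copied to the device, e.g.:
#   aws s3 cp batches/batch-1.tar s3://<bucket-name> --profile <profile-name> --endpoint-url <endpoint>
```

Counting files per batch is the simplest policy; a size-based cutoff is an equally valid alternative when file sizes vary widely.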

Validation

1) Check the logs of the Amazon S3 Adapter for Snowball client:

· Windows — C:/Users/<username>/.aws/snowball/logs/snowball_adapter_<year_month_date_hour>

· Linux — /home/.aws/snowball/logs/snowball_adapter_<year_month_date_hour>

· Mac — /Users/<username>/.aws/snowball/logs/snowball_adapter_<year_month_date_hour>

2) Validate Command for the Snowball Client

snowball -v validate

This command validates all the metadata and transfer statuses for the objects on the Snowball. It might take some time to complete and might appear to be stuck from time to time.

3) Manual Data Validation for Snowball After Import into Amazon S3

Whenever data is imported into or exported out of Amazon S3, you can download a PDF job report.

The job report provides insight into the state of your Amazon S3 data transfer. The report includes details about your job or job part for your records. The job report also includes a table that provides a high-level overview of the total number of objects and bytes transferred between the Snowball and Amazon S3.

AWS Snowball Edge data transfer cost: there is a $300 service fee per data transfer job, which includes up to 10 days of on-site usage (not counting shipping days). Additional daily fees apply when the appliance is kept for more than 10 days.
