Short Description
Bulk transfer of objects into S3, for example from Akamai or another HTTP-accessible data source.
Full Description

https://github.com/ReutersMedia/s3-bulk-transfer

This tool was designed to help migrate from Akamai NetStorage to S3 in cases where the Akamai objects are publicly accessible. It preserves the Content-Type header from the source and handles multi-GB files efficiently. Under the hood it builds shell commands that pipe curl into gof3r, avoiding any disk operations. gof3r is the only S3 command-line tool that performs multipart, multithreaded uploads from a streamed input.
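
For example, one way the source Content-Type could be carried through to S3 (a minimal sketch for illustration, not the tool's actual code; source_domain, my-path and my-bucket are placeholders):

# grab the final Content-Type from a HEAD request, then stream the body into gof3r
CT=$(curl -sIL http://source_domain/my-path | tr -d '\r' | awk -F': ' 'tolower($1)=="content-type" {ct=$2} END {print ct}')
curl -sL http://source_domain/my-path | gof3r put -m "Content-Type: $CT" -b my-bucket -k my-path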

Provide a list of URLs or paths for objects that are publicly accessible via an HTTP GET, in a file named by INPUT_FILE. Typically you will mount a data folder to /input in the container. You also specify a PART_NUMBER, either via the environment or on the command line.
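
For example, an env file and input list might look like the following (INPUT_FILE, TOTAL_PARTS and PART_NUMBER are the variables described here; the file name urls.txt is illustrative, and other settings such as the destination bucket go in the same file but are not listed in this description):

# my.env
INPUT_FILE=/input/urls.txt
TOTAL_PARTS=20
PART_NUMBER=1

# /home/me/input/urls.txt -- one object per line, either a full URL or a path
http://source_domain/videos/clip-0001.mp4
videos/clip-0002.mp4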

The application randomly partitions the list of files into TOTAL_PARTS lists, and each container works only on its PART_NUMBER. You can, for example, launch 20 containers (TOTAL_PARTS=20) with PART_NUMBER=1,2,3,...,20. Before uploading an object it tests for its existence in S3; it does not check hash values.
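
To fan the work out, the same image can be launched once per partition, overriding PART_NUMBER for each container (a sketch assuming the env file above; -d simply runs the containers in the background, and -e takes precedence over the value in the env file):

for i in $(seq 1 20); do
  docker run -d --env-file=my.env -e PART_NUMBER=$i \
    -v /home/me/input:/input reutersmedia/s3-bulk-transfer:latest
done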

The number of upload threads for each container is also configurable. Under the hood, the application uses a shell command of the form below:

curl -L http://source_domain/my-path | gof3r put -m "Content-Type: video/mp4" -b my-bucket -k my-path

You can supply either paths or full URLs. Upon completion, the application writes three files to the input directory: bad_files, good_files, and existing_files.
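
Once a run completes, the result files can be inspected from the host, assuming the container was started with /home/me/input mounted as above and that the files are written under those exact names:

wc -l /home/me/input/good_files /home/me/input/bad_files /home/me/input/existing_files
# if bad_files uses the same one-entry-per-line format as the input (an assumption),
# it can be reused directly as the INPUT_FILE for a retry run:
cp /home/me/input/bad_files /home/me/input/retry.txt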

To run:
docker run --env-file=my.env -v /home/me/input:/input reutersmedia/s3-bulk-transfer:latest

Docker Pull Command
docker pull reutersmedia/s3-bulk-transfer

Owner: reutersmedia
