Wanted to follow up on the issue.
I found and tried a suggestion by this thoughtful soul:
https://stackoverflow.com/users/622140/james
On CentOS 7: yum install awscli
Quote:
. . . I ran across documentation on how to get the AWS S3 CLI sync command to synchronize buckets with massive parallelization. The following commands will tell the AWS CLI to use 1,000 threads to execute jobs (each a small file or one part of a multipart copy) and look ahead 100,000 jobs:
Code:
aws configure set default.s3.max_concurrent_requests 1000
aws configure set default.s3.max_queue_size 100000
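For reference, those two aws configure set commands should be equivalent to putting the following into ~/.aws/config by hand (this is the AWS CLI's documented S3 configuration format; worth double-checking against the version you have installed):

Code:
[default]
s3 =
  max_concurrent_requests = 1000
  max_queue_size = 100000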
After executing the aws configure set commands above, use the sync command as follows:
Code:
aws s3 sync s3://source-bucket/source-path s3://destination-bucket/destination-path
Result: a 1 TB copy from bucket A to bucket B in 6 hours, or roughly 47.5 MB/s.
6 hours vs. 10-14 days using s3cmd - an enormous improvement.
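A quick sanity check on that rate (assuming 1 TB means 10^12 bytes; the small gap to the quoted 47.5 MB/s suggests the actual transfer was a bit over 1 TB):

```python
# Back-of-the-envelope check: 1 TB moved in 6 hours.
total_bytes = 1_000_000_000_000   # 1 TB, decimal units (assumption)
seconds = 6 * 3600                # 6 hours
rate_mb_s = total_bytes / seconds / 1_000_000
print(round(rate_mb_s, 1))        # ~46.3 MB/s, in line with the quoted 47.5
```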
I observed that the aws s3 sync command created more processes than I could count, used 10% - 32% CPU across all 8 cores of a Xeon CPU, and grabbed roughly 8 GB of RAM.
James from Stack Overflow achieved much higher transfer rates, up to 700 MiB/s, transferring files between 3 and 50 GB in size, whereas I was transferring hundreds of thousands of much smaller files.
That may explain our speed difference.