Copying files between buckets in S3
It happens sometimes that data that you need are in a different bucket than you really need them. You cannot rename the bucket sometimes data are in a different account etc. Moving files to your drive and then uploading back is not only very slow but you also pay for the band with. Much more efficient is to use Amazon’s APIs to copy files around. I think there are some commercial tools that help you with this but you can easily roll your own tool.
This copies all the files in several batches. Each batch will move 20 files in parallel. My files were all of the similar size so I did not need to worry about creating some smarter mechanism than just wait for the whole batch.
Just remember that if you are intending to use EMR later name your buckets only with alphanumeric, dashes and dots (like in the example) or you will be copying again soon because EMR does not like it one bit.