Set up the s5cmd CLI object storage client
Installation
Download the latest release for your platform from the GitHub repository. Unpack the archive and move the executable to /usr/local/bin.
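A minimal sketch for Linux x86_64, assuming release 2.3.0; the exact archive name varies by platform, so verify it on the releases page first:

# Asset name assumed; check the GitHub releases page for your platform
curl -LO https://github.com/peak/s5cmd/releases/download/v2.3.0/s5cmd_2.3.0_Linux-64bit.tar.gz
tar -xzf s5cmd_2.3.0_Linux-64bit.tar.gz
# Move the binary into the system path
sudo mv s5cmd /usr/local/bin/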
For Debian-based distributions:
Install the .deb package with dpkg:

dpkg -i s5cmd_2.3.0_linux_amd64.deb

Install with Homebrew:
brew install peak/tap/s5cmd

Windows
Unpack the archive. Place the .exe file in a directory in your system path.

Configuration
s5cmd reads the Object Storage endpoint and credentials from environment variables.

For Linux and macOS:
Section titled “For Linux and macOS:”export S3_ENDPOINT_URL="https://object.storage.eu01.onstackit.cloud" export AWS_ACCESS_KEY_ID="xxxxxxxxxxxx" export AWS_SECRET_ACCESS_KEY="yyyyyyyyyyyyyyyyyyyyyy" export AWS_REGION="eu01"For Windows Command prompt:
setx S3_ENDPOINT_URL "https://object.storage.eu01.onstackit.cloud"
setx AWS_ACCESS_KEY_ID "xxxxxxxxxxxx"
setx AWS_SECRET_ACCESS_KEY "yyyyyyyyyyyyyyyyyyyyyy"
setx AWS_REGION "eu01"
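If you prefer not to set a variable for the endpoint, s5cmd also accepts it per invocation via the --endpoint-url global flag (credentials are still read from the environment):

s5cmd --endpoint-url https://object.storage.eu01.onstackit.cloud ls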
Examples

List all buckets
s5cmd ls

List all objects in a bucket
s5cmd ls s3://bucket

Print a remote object’s metadata
s5cmd head s3://bucket/object.gz

Download a single Object Storage object
s5cmd cp s3://bucket/object.gz .

Download multiple Object Storage objects
Suppose we have the following objects:
s3://bucket/logs/2020/03/18/file1.gz
s3://bucket/logs/2020/03/19/file2.gz
s3://bucket/logs/2020/03/19/originals/file3.gz

s5cmd cp 's3://bucket/logs/2020/03/*' logs/

s5cmd will match the given wildcards and arguments by doing an efficient search against the given prefixes. All matching objects will be downloaded in parallel, and s5cmd will create the destination directory if it is missing.
The logs/ directory content will look like:
$ tree
.
└── logs
    ├── 18
    │   └── file1.gz
    └── 19
        ├── file2.gz
        └── originals
            └── file3.gz

4 directories, 3 files

ℹ️ s5cmd preserves the source directory structure by default. If you want to flatten the source directory structure, use the --flatten flag.
s5cmd cp --flatten 's3://bucket/logs/2020/03/*' logs/

The logs/ directory content will look like:
$ tree
.
└── logs
    ├── file1.gz
    ├── file2.gz
    └── file3.gz

1 directory, 3 files

Upload a file to the Object Storage
s5cmd cp object.gz s3://bucket/

Upload multiple files to the Object Storage
s5cmd cp directory/ s3://bucket/

This will upload all files in the given directory to the Object Storage while keeping the folder hierarchy of the source.
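Wildcards work on the local side too if only a subset of files should be uploaded; a small sketch (paths and prefix are illustrative):

s5cmd cp 'directory/*.gz' s3://bucket/archives/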
Delete an Object Storage object
s5cmd rm s3://bucket/logs/2020/03/18/file1.gz

Delete multiple Object Storage objects
s5cmd rm 's3://bucket/logs/2020/03/19/*'

This will remove all matching objects:
s3://bucket/logs/2020/03/19/file2.gz
s3://bucket/logs/2020/03/19/originals/file3.gz

s5cmd utilizes the S3 delete batch API: up to 1000 matching objects are deleted in a single request. However, it should be noted that commands such as
s5cmd rm s3://bucket-foo/object s3://bucket-bar/object

are not supported by s5cmd and result in an error, since two different buckets are involved, which is at odds with the benefit of performing batch delete requests. If needed, you can use the s5cmd run mode for this case instead:
$ s5cmd run
rm s3://bucket-foo/object
rm s3://bucket-bar/object

Copy objects from one Object Storage to another Object Storage
s5cmd supports copying objects on the server side as well.
s5cmd cp 's3://bucket/logs/2020/*' s3://bucket/logs/backup/

This will copy all the matching objects to the given prefix, respecting the source folder hierarchy.
⚠️ Copying objects (from one Object Storage to another Object Storage) larger than 5 GB is not supported yet.
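Server-side copies also work across buckets reachable through the same endpoint; a sketch with illustrative bucket names:

s5cmd cp 's3://source-bucket/logs/*' s3://backup-bucket/logs/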
Run multiple commands in parallel
The most powerful feature of s5cmd is the commands file: thousands of S3 and filesystem commands are declared in a file (or simply piped in from another process) and executed by multiple parallel workers. Since only one program is launched, thousands of unnecessary fork-exec calls are avoided. This way, S3 execution times can reach a few thousand operations per second.
s5cmd run commands.txt

The content of commands.txt could look like:
cp 's3://bucket/2020/03/*' logs/2020/03/
# line comments are supported
rm s3://bucket/2020/03/19/file2.gz

# empty lines are OK too like above

# rename an S3 object
mv s3://bucket/2020/03/18/file1.gz s3://bucket/2020/03/18/original/file.gz
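As mentioned above, commands can also be piped in from another process instead of being read from a file; run then consumes them from standard input. A minimal sketch, reusing the object names from the delete example:

printf 'rm s3://bucket/2020/03/19/file2.gz\nrm s3://bucket/2020/03/19/originals/file3.gz\n' | s5cmd run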
Section titled “Configuring Concurrency”numworkers is a global option that sets the size of the global worker pool. Default value of numworkers is256.
Commands such as cp, select, and run, which can benefit from parallelism, use this worker pool to execute tasks. A task can be an upload, a download, or anything in a run file.
For example, if you are uploading 100 files to an S3 bucket and --numworkers is set to 10, then s5cmd will limit the number of files concurrently uploaded to 10.
s5cmd --numworkers 10 cp '/Users/foo/bar/*' s3://mybucket/foo/bar/

concurrency is a cp command option. It sets the number of parts that will be uploaded or downloaded in parallel for a single file. The default value of concurrency is 5.
The numworkers and concurrency options can be used together:
s5cmd --numworkers 10 cp --concurrency 10 '/Users/foo/bar/*' s3://mybucket/foo/bar/

If you have a few large files to download, setting --numworkers to a very high value will not affect download speed. In this scenario, setting --concurrency to a higher value may have a better impact on the download speed.
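For instance, when downloading a single large object, raising --concurrency splits the transfer into more parallel parts (the object name below is illustrative):

s5cmd cp --concurrency 20 s3://bucket/large-archive.tar .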