
Setup s5cmd CLI object storage client


  1. Download the latest release for your platform from the GitHub repository.
  2. Unpack the archive and move the executable to /usr/local/bin.

Alternatively, install the .deb package with dpkg:

dpkg -i s5cmd_2.3.0_linux_amd64.deb

Install with Homebrew:

brew install peak/tap/s5cmd
On Windows, unpack the archive and place the .exe file in a directory on your system PATH.
export S3_ENDPOINT_URL="https://object.storage.eu01.onstackit.cloud"
export AWS_ACCESS_KEY_ID="xxxxxxxxxxxx"
export AWS_SECRET_ACCESS_KEY="yyyyyyyyyyyyyyyyyyyyyy"
export AWS_REGION="eu01"
setx S3_ENDPOINT_URL "https://object.storage.eu01.onstackit.cloud"
setx AWS_ACCESS_KEY_ID "xxxxxxxxxxxx"
setx AWS_SECRET_ACCESS_KEY "yyyyyyyyyyyyyyyyyyyyyy"
setx AWS_REGION "eu01"
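Since s5cmd is built on the official AWS SDK, it can also read credentials from the standard ~/.aws/credentials file instead of environment variables. A minimal sketch with placeholder values (the profile name is an assumption):

```ini
# ~/.aws/credentials — placeholder values, not real keys
[default]
aws_access_key_id     = xxxxxxxxxxxx
aws_secret_access_key = yyyyyyyyyyyyyyyyyyyyyy
```

The endpoint still has to be supplied separately, e.g. via the S3_ENDPOINT_URL environment variable shown above or s5cmd's --endpoint-url flag.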
s5cmd ls
s5cmd ls s3://bucket
s5cmd head s3://bucket/object.gz
s5cmd cp s3://bucket/object.gz .

Suppose we have the following objects:

s3://bucket/logs/2020/03/18/file1.gz
s3://bucket/logs/2020/03/19/file2.gz
s3://bucket/logs/2020/03/19/originals/file3.gz
s5cmd cp 's3://bucket/logs/2020/03/*' logs/

s5cmd will match the given wildcards and arguments by doing an efficient search against the given prefixes. All matching objects will be downloaded in parallel. s5cmd will create the destination directory if it is missing.

logs/ directory content will look like:

$ tree .
└── logs
    ├── 18
    │   └── file1.gz
    └── 19
        ├── file2.gz
        └── originals
            └── file3.gz

4 directories, 3 files

ℹ️ s5cmd preserves the source directory structure by default. If you want to flatten the source directory structure, use the --flatten flag.

s5cmd cp --flatten 's3://bucket/logs/2020/03/*' logs/

logs/ directory content will look like:

$ tree .
└── logs
    ├── file1.gz
    ├── file2.gz
    └── file3.gz

1 directory, 3 files
s5cmd cp object.gz s3://bucket/

Upload multiple files to the Object Storage

s5cmd cp directory/ s3://bucket/

Will upload all files in the given directory to the Object Storage while keeping the folder hierarchy of the source.
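As a sketch of how the hierarchy is preserved, the following creates a small local tree (file and bucket names are hypothetical); the commented-out command would upload both files, keeping the sub/ prefix under the bucket:

```shell
# Create a small local tree to upload (hypothetical file names).
mkdir -p directory/sub
printf 'hello\n' > directory/a.txt
printf 'world\n' > directory/sub/b.txt

# Uploading keeps the sub/ hierarchy under the bucket:
# s5cmd cp directory/ s3://bucket/

# Show what would be uploaded.
find directory -type f | sort
```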

s5cmd rm s3://bucket/logs/2020/03/18/file1.gz
s5cmd rm 's3://bucket/logs/2020/03/19/*'

Will remove all matching objects:

s3://bucket/logs/2020/03/19/file2.gz
s3://bucket/logs/2020/03/19/originals/file3.gz

s5cmd utilizes the S3 delete batch API. If up to 1000 objects match, they'll be deleted in a single request. However, it should be noted that commands such as

s5cmd rm s3://bucket-foo/object s3://bucket-bar/object

are not supported by s5cmd and result in an error (since there are 2 different buckets), as this is at odds with the benefit of performing batch delete requests. If needed, one can use s5cmd run mode for this case, i.e.,

$ s5cmd run
rm s3://bucket-foo/object
rm s3://bucket-bar/object
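The same two deletes can also be put in a file and handed to run; a minimal sketch, reusing the placeholder bucket and object names from above:

```shell
# Write one rm command per line; s5cmd run would execute them
# with its parallel workers.
cat > multi-bucket-rm.txt <<'EOF'
rm s3://bucket-foo/object
rm s3://bucket-bar/object
EOF

# s5cmd run multi-bucket-rm.txt   # executes both deletes

# Show the generated run file.
cat multi-bucket-rm.txt
```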

Copy objects from one Object Storage to another Object Storage


s5cmd supports copying objects on the server side as well.

s5cmd cp 's3://bucket/logs/2020/*' s3://bucket/logs/backup/

Will copy all the matching objects to the given prefix, respecting the source folder hierarchy.

⚠️ Copying objects (from one Object Storage to another Object Storage) larger than 5GB is not supported yet.

The most powerful feature of s5cmd is the commands file. Thousands of S3 and filesystem commands are declared in a file (or simply piped in from another process) and they are executed using multiple parallel workers. Since only one program is launched, thousands of unnecessary fork-exec calls are avoided. This way S3 execution times can reach a few thousand operations per second.

s5cmd run commands.txt

commands.txt content could look like:

cp 's3://bucket/2020/03/*' logs/2020/03/

# line comments are supported
rm s3://bucket/2020/03/19/file2.gz

# empty lines are OK too like above

# rename an S3 object
mv s3://bucket/2020/03/18/file1.gz s3://bucket/2020/03/18/original/file.gz
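Because run also accepts commands piped in from another process, the commands file can be generated on the fly; a sketch emitting one cp line per day (bucket name and dates are hypothetical):

```shell
# Emit one cp command per day. In practice the loop output would be
# piped straight into `s5cmd run`; here it is written to a file so
# the generated commands are visible.
for day in 18 19; do
  printf "cp 's3://bucket/2020/03/%s/*' logs/2020/03/%s/\n" "$day" "$day"
done > generated-commands.txt

# for day in 18 19; do ...; done | s5cmd run

# Show the generated commands.
cat generated-commands.txt
```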

numworkers is a global option that sets the size of the global worker pool. The default value of numworkers is 256.

Commands such as cp, select, and run, which can benefit from parallelism, use this worker pool to execute tasks. A task can be an upload, a download, or anything in a run file.

For example, if you are uploading 100 files to an S3 bucket and --numworkers is set to 10, then s5cmd will limit the number of files uploaded concurrently to 10.

s5cmd --numworkers 10 cp '/Users/foo/bar/*' s3://mybucket/foo/bar/

concurrency is a cp command option. It sets the number of parts that will be uploaded or downloaded in parallel for a single file. The default value of concurrency is 5.

numworkers and concurrency options can be used together:

s5cmd --numworkers 10 cp --concurrency 10 '/Users/foo/bar/*' s3://mybucket/foo/bar/

If you have a few large files to download, setting --numworkers to a very high value will not affect download speed. In this scenario, setting --concurrency to a higher value may have a greater impact on download speed.