
Set up the s5cmd CLI object storage client


s5cmd is a high-performance command-line tool designed for managing files in S3-compatible object storage services. Here are some key features and highlights:

  • Speed: s5cmd is optimized for speed, capable of handling large-scale operations efficiently.
  • Parallelism: It supports parallel execution, allowing multiple operations to run concurrently.
  • Versatility: You can perform a variety of tasks such as copying, moving, removing, versioning and listing files.
  • Scripting: s5cmd can be easily integrated into scripts for automated workflows.
Linux

  1. Download the latest release for your platform from the GitHub repository.
  2. Unpack the archive.
  3. Move the executable to /usr/local/bin.

Alternatively, install the .deb package with dpkg:

dpkg -i s5cmd_2.3.0_linux_amd64.deb

macOS

Install with Homebrew:

brew install peak/tap/s5cmd

Windows

  1. Download the latest release and unpack the archive.
  2. Place the .exe file in a directory in your system path.
Configure your Object Storage credentials as environment variables. On Linux and macOS:

export S3_ENDPOINT_URL="https://object.storage.eu01.onstackit.cloud"
export AWS_ACCESS_KEY_ID="xxxxxxxxxxxx"
export AWS_SECRET_ACCESS_KEY="yyyyyyyyyyyyyyyyyyyyyy"
export AWS_REGION="eu01"
On Windows:

setx S3_ENDPOINT_URL "https://object.storage.eu01.onstackit.cloud"
setx AWS_ACCESS_KEY_ID "xxxxxxxxxxxx"
setx AWS_SECRET_ACCESS_KEY "yyyyyyyyyyyyyyyyyyyyyy"
setx AWS_REGION "eu01"
List all buckets:

s5cmd ls

List the objects in a bucket:

s5cmd ls s3://bucket

Print the metadata of an object:

s5cmd head s3://bucket/object.gz

Download an object to the current directory:

s5cmd cp s3://bucket/object.gz .

Suppose we have the following objects:

s3://bucket/logs/2020/03/18/file1.gz
s3://bucket/logs/2020/03/19/file2.gz
s3://bucket/logs/2020/03/19/originals/file3.gz
s5cmd cp 's3://bucket/logs/2020/03/*' logs/

s5cmd will match the given wildcards and arguments by doing an efficient search against the given prefixes. All matching objects will be downloaded in parallel. s5cmd will create the destination directory if it is missing.
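To check which objects a wildcard will match before transferring anything, you can list with the same pattern, or use s5cmd's global --dry-run flag (the bucket and prefix here are the example ones above):

```shell
# Preview the objects the wildcard expands to; no data is transferred.
s5cmd ls 's3://bucket/logs/2020/03/*'

# --dry-run prints the operations a command would perform without executing them.
s5cmd --dry-run cp 's3://bucket/logs/2020/03/*' logs/
```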

logs/ directory content will look like:

$ tree
.
└── logs
    ├── 18
    │   └── file1.gz
    └── 19
        ├── file2.gz
        └── originals
            └── file3.gz

4 directories, 3 files

ℹ️ s5cmd preserves the source directory structure by default. If you want to flatten the source directory structure, use the --flatten flag.

s5cmd cp --flatten 's3://bucket/logs/2020/03/*' logs/

logs/ directory content will look like:

$ tree
.
└── logs
    ├── file1.gz
    ├── file2.gz
    └── file3.gz

1 directory, 3 files
Upload a file to the Object Storage:

s5cmd cp object.gz s3://bucket/

Upload multiple files to the Object Storage

s5cmd cp directory/ s3://bucket/

This will upload all files in the given directory to the Object Storage, preserving the folder hierarchy of the source.
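For recurring uploads, s5cmd also provides a sync command that, like cp, walks the source directory but skips objects that already exist unchanged at the destination. A brief sketch (the bucket name is a placeholder):

```shell
# Only new or modified files are transferred on subsequent runs.
s5cmd sync directory/ s3://bucket/
```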

Remove a single object:

s5cmd rm s3://bucket/logs/2020/03/18/file1.gz
Remove multiple objects with a wildcard:
s5cmd rm 's3://bucket/logs/2020/03/19/*'

This will remove all matching objects:

s3://bucket/logs/2020/03/19/file2.gz
s3://bucket/logs/2020/03/19/originals/file3.gz

s5cmd utilizes the S3 batch delete API. If there are up to 1000 matching objects, they'll be deleted in a single request. However, it should be noted that commands such as

s5cmd rm s3://bucket-foo/object s3://bucket-bar/object

are not supported by s5cmd and result in an error (since the two objects are in different buckets), as this is at odds with the benefit of performing batch delete requests. If needed, you can use s5cmd run mode for this case instead:

s5cmd run
rm s3://bucket-foo/object
rm s3://bucket-bar/object

Copy objects from one Object Storage to another Object Storage


s5cmd supports copying objects on the server side as well.

s5cmd cp 's3://bucket/logs/2020/*' s3://bucket/logs/backup/

This will copy all matching objects to the given prefix, respecting the source folder hierarchy.

⚠️ Copying objects (from one Object Storage to another Object Storage) larger than 5GB is not supported yet.

The most powerful feature of s5cmd is the commands file. Thousands of S3 and filesystem commands are declared in a file (or simply piped in from another process) and they are executed using multiple parallel workers. Since only one program is launched, thousands of unnecessary fork-exec calls are avoided. This way S3 execution times can reach a few thousand operations per second.
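Because run also reads commands from standard input, the commands file can be generated on the fly by another process. A minimal sketch of this pattern (the local file pattern and bucket are placeholders):

```shell
# Generate one `cp` command per local .gz file and pipe the batch to s5cmd,
# which executes the commands with its parallel worker pool.
for f in *.gz; do
  echo "cp ${f} s3://bucket/archive/${f}"
done | s5cmd run
```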

s5cmd run commands.txt

commands.txt content could look like:

cp 's3://bucket/2020/03/*' logs/2020/03/

# line comments are supported
rm s3://bucket/2020/03/19/file2.gz

# empty lines are OK too like above
# rename an S3 object
mv s3://bucket/2020/03/18/file1.gz s3://bucket/2020/03/18/original/file.

numworkers is a global option that sets the size of the global worker pool. The default value of numworkers is 256.

Commands such as cp, select, and run, which can benefit from parallelism, use this worker pool to execute tasks. A task can be an upload, a download, or anything in a run file.

For example, if you are uploading 100 files to an S3 bucket and --numworkers is set to 10, then s5cmd will limit the number of files concurrently uploaded to 10.

s5cmd --numworkers 10 cp '/Users/foo/bar/*' s3://mybucket/foo/bar/

concurrency is a cp command option. It sets the number of parts that will be uploaded or downloaded in parallel for a single file. The default value of concurrency is 5.

numworkers and concurrency options can be used together:

s5cmd --numworkers 10 cp --concurrency 10 '/Users/foo/bar/*' s3://mybucket/foo/bar/

If you have a few large files to download, setting --numworkers to a very high value will not affect download speed. In this scenario, setting --concurrency to a higher value may have a greater impact on download speed.
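For instance, when fetching a handful of multi-gigabyte objects, it may help to keep the worker pool small and raise the per-file part parallelism instead. The values and bucket below are illustrative, not tuned recommendations:

```shell
# Few files: a small worker pool suffices; more parts per file in flight.
s5cmd --numworkers 4 cp --concurrency 20 's3://bucket/big-files/*' ./
```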