Accelerate AWS S3 Uploads: How To Perform Multi-part Upload With AWS CLI – A Practical Guide

In this article, you will learn how to speed up file uploads to Amazon S3 by using multipart upload with the AWS CLI. Multipart upload lets you upload a single object as a set of parts, each a contiguous portion of the object’s data. You can upload these parts independently and in any order, and if the transmission of any part fails, you can retransmit just that part without affecting the others. Once all parts have been uploaded, Amazon S3 assembles them into the final object. As a rule of thumb, consider multipart upload instead of a single-operation upload whenever your object is larger than 100 MB.

The general process is as follows:

  1. Split the file into multiple smaller files.
  2. Initiate the multipart upload by sending a “Create Multipart Upload” request to the S3 API. This will return a unique upload ID, which you will use to identify the upload in subsequent requests.
  3. Upload the parts of the object. Each part must be at least 5MB in size, except for the last part. You can upload parts in any order, but they must be identified by their part number (1, 2, 3, etc.).
  4. After all the parts have been uploaded, send a “Complete Multipart Upload” request to the S3 API, providing the upload ID and the list of part numbers and ETags for all the parts.
  5. Once the multipart upload is complete, the object will be available for download from S3.

It is important to note that when performing a multipart upload, all parts must be uploaded before the complete upload request can be made.

The AWS CLI (Command Line Interface) provides the low-level aws s3api commands for interacting with the S3 API. You can use the following AWS CLI commands to perform a multipart upload:

AWS CLI commands to perform a multipart upload

Split the file to be uploaded:

Use the split command in the command line interface (CLI) to split a file into multiple smaller files. The basic syntax of the command is:

split [options] [input-file] [prefix]

Where:

  • ‘input-file’: the file you want to split.
  • ‘prefix’: the prefix to use for the names of the split files. The names will be in the format prefixaa, prefixab, prefixac, and so on.

Here are a few examples of how you can use the split command:

  1. To split a file into files of 1,000 lines each:

split -l 1000 myfile.txt myfile_split_

  2. To split a file into files of 1 MB each:

split -b 1M myfile.txt myfile_split_

  3. To split a file into files of 512 KB each:

split -b 512K myfile.txt myfile_split_

  4. To split a file at lines matching a pattern (available in BSD/macOS split only):

split -p 'pattern' myfile.txt myfile_split_

It’s important to note that the split command is a standard command and is available on most Unix-based systems, including Linux and macOS. If it’s not available on your system, you can install it through your system’s package manager.
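As a concrete sketch (the file names here are made up), the following splits a 12 MB test file into 5 MB chunks, the minimum part size S3 accepts for all but the last part, and verifies that the chunks reassemble byte-for-byte:

```shell
# Create a ~12 MB sample file (hypothetical name) from /dev/zero.
head -c $((12 * 1024 * 1024)) /dev/zero > sample.bin

# Split it into 5 MB parts: part_aa, part_ab, part_ac.
split -b 5M sample.bin part_

# Verify the parts reassemble into the original file exactly.
cat part_* > reassembled.bin
cmp sample.bin reassembled.bin && echo "parts verified"
```

Because split names the pieces in alphabetical order, a simple `cat part_*` is enough to reconstruct the original.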

Initiate the multipart upload:

aws s3api create-multipart-upload --bucket my-bucket --key my-object

This command will return a unique upload ID, which you will use in subsequent commands.
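The response is JSON. With the AWS CLI itself, the `--query UploadId --output text` options extract the ID for you; as a local sketch using a hypothetical sample response, the field can also be pulled out with sed:

```shell
# The real command would be:
#   aws s3api create-multipart-upload --bucket my-bucket --key my-object \
#       --query UploadId --output text
# Hypothetical sample response, for illustration only:
response='{"Bucket":"my-bucket","Key":"my-object","UploadId":"EXAMPLE_ID"}'

# Extract the UploadId field from the sample JSON with sed.
upload_id=$(printf '%s' "$response" | sed -n 's/.*"UploadId":"\([^"]*\)".*/\1/p')
echo "$upload_id"
```

Capturing the ID into a shell variable this way makes it easy to reuse in the subsequent upload-part and complete-multipart-upload commands.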

Upload the parts:

aws s3api upload-part --bucket my-bucket --key my-object --upload-id UPLOAD_ID --part-number PART_NUMBER --body PART_FILE

You can upload parts in any order, but they must be identified by their part number (1, 2, 3, etc.). PART_FILE is the path to the file you want to upload.

List the parts:

aws s3api list-parts --bucket my-bucket --key my-object --upload-id UPLOAD_ID

This command will return the list of parts, including the part number and ETag for each part.

Complete the multipart upload:

aws s3api complete-multipart-upload --bucket my-bucket --key my-object --upload-id UPLOAD_ID --multipart-upload '{"Parts":[{"ETag":"ETAG1","PartNumber":1},{"ETag":"ETAG2","PartNumber":2}]}'

You need to provide the upload ID and the list of part numbers and ETags for all the parts.

Abort the multipart upload:

aws s3api abort-multipart-upload --bucket my-bucket --key my-object --upload-id UPLOAD_ID

This command will abort the multipart upload for the provided upload ID.

These are the basic steps, but you can also customize the process by using other options available in the AWS CLI commands.
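Putting the steps together, here is a minimal sketch of the whole flow as a shell script. The bucket and file names are placeholders, and the multipart_upload function needs valid AWS credentials to actually run; the build_parts_json helper, which assembles the --multipart-upload argument from part-number/ETag pairs, runs locally:

```shell
#!/bin/sh
# Build the --multipart-upload JSON from "partnumber:etag" arguments,
# e.g. build_parts_json 1:abc 2:def
build_parts_json() {
  json='{"Parts":['
  sep=''
  for pair in "$@"; do
    num=${pair%%:*}
    etag=${pair#*:}
    json="$json$sep{\"ETag\":\"$etag\",\"PartNumber\":$num}"
    sep=','
  done
  printf '%s]}' "$json"
}

# The upload flow itself (requires AWS credentials; bucket/key are
# placeholders). Defined but not invoked here.
multipart_upload() {
  bucket=my-bucket; key=bigfile.bin
  split -b 100M bigfile.bin part_
  upload_id=$(aws s3api create-multipart-upload --bucket "$bucket" \
      --key "$key" --query UploadId --output text)
  n=1; pairs=''
  for f in part_*; do
    etag=$(aws s3api upload-part --bucket "$bucket" --key "$key" \
        --upload-id "$upload_id" --part-number "$n" --body "$f" \
        --query ETag --output text | tr -d '"')
    pairs="$pairs $n:$etag"
    n=$((n + 1))
  done
  aws s3api complete-multipart-upload --bucket "$bucket" --key "$key" \
      --upload-id "$upload_id" --multipart-upload "$(build_parts_json $pairs)"
}

# Demo of the JSON builder with sample ETags:
build_parts_json 1:etag1 2:etag2
```

The `tr -d '"'` strips the quotation marks that S3 embeds in returned ETag values, so they can be re-quoted cleanly when the Parts JSON is built.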

Let’s discuss some advantages of using the multipart upload feature in AWS S3.

There are several advantages of using multipart upload in AWS S3:

  1. Resumable uploads: If an upload is interrupted, you can resume the upload from where it left off, rather than starting the upload over from the beginning.
  2. Improved throughput: By uploading parts of an object in parallel, multipart uploads can improve the overall upload speed, especially for large objects.
  3. Reduced costs: if an upload fails partway through, only the failed parts need to be re-sent rather than the whole object, so you avoid transferring (and paying for) the same data twice.
  4. Improved reliability: because each part is sent independently, a transient network error affects only that part, which can be retried without restarting the whole upload.
  5. Better error handling: With multipart uploads, you can better handle errors during the upload process, as you can identify which parts have failed to upload and retry only those parts.
  6. Ability to upload large files: multipart upload allows you to upload files larger than 5 GB, which is the size limit for a single PUT operation.

  7. Better control over the upload process: with multipart uploads, you have more control over the upload process, as you can pause, resume, and even cancel the upload at any point.

AWS S3 has certain limits when it comes to multipart uploads:

  • Part size: each part must be between 5 MB and 5 GB, except for the last part, which can be any size greater than 0 bytes.
  • Total number of parts: a maximum of 10,000 parts can be uploaded for a single object.
  • Object size: the total size of all parts combined cannot exceed 5 TB.
  • Expiration: an incomplete multipart upload does not expire on its own; its parts remain stored (and billed) until you complete or abort the upload, or until a lifecycle rule aborts it for you.
  • Retries: S3 itself does not retry failed parts; the AWS CLI and SDKs automatically retry failed part uploads, up to a configurable number of attempts.
  • Upload part copy: the upload-part-copy operation copies data from an existing S3 object one part per request, subject to the same 5 GB maximum part size.
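To see how the part-size and part-count limits interact, here is a small sketch (the helper name is made up) that picks the smallest part size keeping a file of a given size within 10,000 parts while respecting the 5 MB minimum:

```shell
# Smallest part size (in bytes) that fits a file of the given size
# into at most 10,000 parts, but never below the 5 MiB minimum.
min_part_size() {
  size=$1
  min=$((5 * 1024 * 1024))               # 5 MiB floor
  per_part=$(( (size + 9999) / 10000 ))  # ceil(size / 10000)
  if [ "$per_part" -lt "$min" ]; then per_part=$min; fi
  echo "$per_part"
}

min_part_size $((100 * 1024 * 1024 * 1024))   # 100 GiB file
min_part_size $((1024 * 1024))                # 1 MiB file
```

For small files the 5 MiB floor dominates; only above roughly 50 GB does the 10,000-part ceiling start forcing parts larger than the minimum.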

It’s important to keep in mind that these limits are subject to change, and AWS may revise them at any time, so it’s always a good idea to check the current S3 quotas before designing a multipart upload workflow.

Configure a lifecycle policy for an Amazon S3 bucket to automatically abort incomplete multipart uploads

You can configure a lifecycle policy for an Amazon S3 bucket to automatically abort incomplete multipart uploads. The process is as follows:

  1. Open the Amazon S3 console at https://console.aws.amazon.com/s3/.
  2. Choose the bucket for which you want to configure a lifecycle policy.
  3. Click on the “Management” tab, and then choose “Create lifecycle rule” under “Lifecycle rules.”
  4. Give the rule a name and choose whether it applies to the whole bucket or to a prefix.
  5. Under “Lifecycle rule actions,” select “Delete expired object delete markers or incomplete multipart uploads.”
  6. Check “Delete incomplete multipart uploads” and enter the number of days after initiation at which S3 should abort them.
  7. Click “Create rule.”

You can also use the AWS CLI to create a lifecycle policy that aborts incomplete multipart uploads.

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration '{
  "Rules": [
    {
      "ID": "abort-incomplete-multipart-uploads",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}'

This command configures a lifecycle rule that aborts any multipart upload still incomplete 7 days after it was initiated.

It is important to keep in mind that once a multipart upload is aborted, it cannot be resumed or completed. So choose a DaysAfterInitiation value comfortably longer than your longest expected upload, and keep track of in-progress uploads so none are aborted before they finish.

Multipart upload API and permissions

When performing a multipart upload in Amazon S3, you must have the proper permissions to use the S3 API.

To initiate a multipart upload, upload its parts, and complete it, you will need the s3:PutObject permission on the target bucket and key. There are no separate IAM actions for CreateMultipartUpload, UploadPart, or CompleteMultipartUpload; all three API operations are authorized by s3:PutObject, since together they write a new object to the bucket.

To list the parts of a multipart upload, you will need the s3:ListMultipartUploadParts permission.

To abort a multipart upload, you will need the s3:AbortMultipartUpload permission. This permission allows you to abort the upload and delete all previously uploaded parts.

You can grant these permissions through an IAM policy attached to a user or role, or through an S3 bucket policy. Note that s3:PutObject is not exclusive to multipart uploads; it also authorizes simple PUT object operations.

It’s also important to be aware that permissions are scoped not only to API actions but also to a location (a bucket, and optionally a key prefix). It’s always good practice to write least-privileged IAM policies, granting only the permissions needed to perform the specific task.
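As a sketch of such a least-privileged policy for performing multipart uploads into a single bucket (the bucket name is a placeholder; s3:PutObject covers the create, upload-part, and complete operations):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```

Narrowing the Resource to a key prefix (for example arn:aws:s3:::my-bucket/uploads/*) restricts the grant further.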

Conclusion

In this article, we learned how to perform multipart uploads to AWS S3 using the AWS CLI, the advantages of this feature, how to configure a lifecycle rule for incomplete uploads, and the permissions required.

Check out this post to learn about Amazon VPC to Amazon VPC Connectivity Options.
