What is AWS S3?
AWS S3 stands for Simple Storage Service. It allows you to store, retrieve, and back up any amount of data at any time, from anywhere over the internet. S3 is a perfect storage solution for massive amounts of data, for example, audio and video files, large-scale photo storage, and big data sets. We can access S3 from the AWS Management Console, the AWS Command Line Interface (CLI), and the AWS Software Development Kits (SDKs).
How to store data in AWS S3?
To store any data in S3, we first need to create a bucket inside an AWS account. An S3 bucket is a logical container in which data is stored, and we can create multiple buckets to organize our data. A bucket is identified by its name, which must be globally unique across all regions and accounts. For example, if I create a bucket named ‘my-bucket’, no other AWS account can create an S3 bucket with the same name. A bucket simply works as storage in which we can store an unlimited amount of data, either directly inside the bucket or inside folders for easier management.
Folders are used for grouping and organizing files. Unlike a traditional file system, S3 does not use a hierarchy to organize its files; for the sake of organizational simplicity, the AWS S3 console supports the folder concept as a means of grouping data (under the hood, a folder is just a shared prefix in the object keys). So you can have multiple files or folders inside a single bucket.
Anything that we store in S3 is termed an object: a file, image, video, audio clip, and so on. When we store data on a hard drive we call it a file, but when we upload that file to S3 it is called an object. A bucket in S3 is similar to a file directory on your hard drive.
Let’s look at a few features of objects. The maximum size of a single object that you can upload to S3 is 5 terabytes (TB). Please note that 5 TB is not a limit on the total data you can store; it is the maximum size of a single object. For any object larger than 5 GB, a single upload is not enough: you have to split the object into parts and use something called a multipart upload, which supports objects of up to 5 TB. Data larger than 5 TB has to be split across multiple objects.
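As a rough sketch of the multipart limits described above, the function below computes how many parts an upload would need. It encodes S3's documented constraints: parts of 5 MB to 5 GB (the last part may be smaller), at most 10,000 parts, and a maximum object size of 5 TB.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TIB   # largest single S3 object
MAX_PARTS = 10_000          # multipart upload part-count limit
MIN_PART_SIZE = 5 * MIB     # smallest allowed part (except the last one)
MAX_PART_SIZE = 5 * GIB     # largest allowed part

def multipart_part_count(object_size: int, part_size: int) -> int:
    """Return the number of parts needed, validating against S3's limits."""
    if object_size > MAX_OBJECT_SIZE:
        raise ValueError("a single S3 object cannot exceed 5 TB")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5 MB and 5 GB")
    parts = math.ceil(object_size / part_size)
    if parts > MAX_PARTS:
        raise ValueError("an upload cannot use more than 10,000 parts")
    return parts

# A 1 TB object uploaded in 128 MB parts needs 8,192 parts.
print(multipart_part_count(1 * TIB, 128 * MIB))
```

In practice the SDKs pick a sensible part size for you; this sketch only illustrates why the limits interact (too small a part size can hit the 10,000-part cap before the 5 TB cap).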
d. S3 versioning
In S3 you can version objects to protect them from unintended actions or even accidental deletion. Versioning allows the retention of previous versions of an object.
e. Storage Class
AWS S3 offers different storage tiers with varying costs based on how frequently you access the data. S3 also provides a lifecycle policy feature that automatically transitions objects between tiers based on access patterns and conditions you define. S3 has the following main storage classes:
- S3 Intelligent-Tiering
- S3 Standard
- S3 Standard-Infrequent Access (S3 Standard-IA)
- S3 One Zone-Infrequent Access (S3 One Zone-IA)
- S3 Glacier Instant Retrieval
- S3 Glacier Flexible Retrieval (formerly S3 Glacier)
- S3 Glacier Deep Archive
- S3 Outposts
f. Bucket Policy and User Policy
S3 has two access policy features for granting permissions to S3 resources. You can create permissions that limit who can access or see your objects. A bucket policy is a resource-based policy used to grant permissions on a bucket and the objects in it. The IAM user policy feature is used to create and configure access policies for IAM users.
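As an illustration of a resource-based bucket policy (the bucket name `my-bucket` is a placeholder, not a real bucket), the following policy would allow anyone to read the objects in a bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```

The `Resource` ARN ends in `/*` so the statement applies to the objects inside the bucket rather than to the bucket itself; a real policy would usually restrict `Principal` to specific accounts or roles instead of `*`.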
S3 Features: Availability, durability, and data replication
S3 offers three important properties, namely availability, durability, and data replication. Let’s discuss them one by one.
a. Availability
Availability means system uptime: the duration of time for which the system is operational and able to deliver data upon request.
Availability is measured as a percentage and promised in a Service Level Agreement (SLA), a commitment that service providers make to their customers about the availability of their services or systems. For example, a supermarket that says it is open 24/7 is promising round-the-clock availability. If it keeps that promise and stays open throughout the day, week, month, and year, then its availability is 100% and it has fulfilled its SLA.
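To make the percentage concrete, the snippet below converts an availability figure into the maximum downtime it permits per year. The 99.99% value used here is just an example number, not a claim about any particular S3 SLA.

```python
# Translate an availability percentage into allowed downtime per year.
def max_yearly_downtime_minutes(availability_percent: float) -> float:
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return minutes_per_year * (1 - availability_percent / 100)

# 99.99% availability permits roughly 52.6 minutes of downtime per year.
print(round(max_yearly_downtime_minutes(99.99), 1))
```

This is why each extra "nine" matters so much: 99.9% allows about 8.8 hours of downtime a year, while 99.99% allows under an hour.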
b. Durability
Durability refers to long-term data protection. In simple words, it means how well your data is protected from any possible loss or corruption. In addition to being available, your data should remain intact in its original form.
Durability is also measured as a percentage. Let’s understand with an example: suppose you stored 1,000 kg of potatoes in cold storage for 6 months, and when you went back only 800 kg remained, because 200 kg had rotted or been eaten by rats. That cold storage service was not durable, so next time you would make sure to store the potatoes in a well-equipped facility to avoid the loss. That is durability.
It’s also important to know how data gets stored in AWS S3. It is based on a concept called data replication.
c. S3 data replication
AWS S3 offers a useful data replication feature. When you upload your data to S3 in one Availability Zone (AZ), AWS makes additional copies of your data and replicates it to other Availability Zones within the same region. You might wonder why AWS does this.
The answer is that AWS promises eleven nines of data durability, that is, a 99.999999999% durability guarantee. AWS does not promise 100% data durability, but it does say there is a 99.999999999% chance that your data won’t be lost. So how does AWS ensure that our data will be safe?
To ensure this, AWS maintains copies of our data across multiple AZs in the same region. In case of data loss in one AZ, the data can be recovered from the copies in the other AZs. Because our data is always present in multiple AZs of a region, AWS can promise 11 nines of durability, which is one of the significant benefits of using Amazon S3. Replication also holds for any subsequent updates made after the data is uploaded: data modified in one AZ is replicated to the other AZs, and the same applies when a delete action is performed on the uploaded data.
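The arithmetic behind eleven nines can be sketched in a few lines. With 99.999999999% annual durability, the chance of losing any given object in a year is about one in a hundred billion, so even a very large collection of objects sees vanishingly few losses.

```python
# What eleven nines of durability implies for a large object store.
durability = 0.99999999999          # 99.999999999%
loss_probability = 1 - durability   # ~1e-11 per object per year

objects_stored = 10_000_000
expected_losses_per_year = objects_stored * loss_probability

# On average, one object lost roughly every 10,000 years.
print(round(1 / expected_losses_per_year))
```

In other words, if you stored ten million objects, you could expect to lose a single object about once every ten thousand years on average.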
Amazon S3 Access
There are four ways to access Amazon S3.
a. AWS Management Console
The console is a web-based user interface for managing AWS resources, including Amazon S3. After creating your own AWS account, you can open the AWS Management Console and choose S3 to access the service.
b. AWS Command Line Interface
After creating your AWS account, install the AWS CLI. You can use it to run commands and write scripts that perform S3 tasks. The AWS CLI is supported on Windows, Linux, and macOS.
c. AWS Software Development Kits (SDKs)
AWS SDKs provide an easy way to access S3 and AWS programmatically. They consist of libraries and sample code for several programming languages and platforms, such as Python, Ruby, .NET, and Java. To access S3 via an AWS SDK, you send requests to Amazon S3 using the SDK’s libraries.
d. Amazon S3 REST API
S3 and AWS can also be accessed programmatically using the Amazon S3 REST API, an HTTP interface to S3.
Does Amazon S3 incur costs?
Yes, Amazon S3 is chargeable. With traditional storage systems, we have to purchase a predetermined amount of storage capacity and pay for all of it, even if we use less. With S3, you pay only for the amount of storage you actually use, which gives you the flexibility to grow your business with a pay-as-you-go model.
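The pay-as-you-go idea can be sketched in a couple of lines. The per-GB price below is a made-up example value, not a real S3 rate; actual pricing varies by storage class, region, and usage tier, and also covers requests and data transfer.

```python
# Sketch: with pay-as-you-go, cost scales with what you actually store.
# price_per_gb is a hypothetical example rate, not a real S3 price.
def monthly_storage_cost(gb_stored: float, price_per_gb: float) -> float:
    return gb_stored * price_per_gb

# Storing 500 GB at a hypothetical $0.023/GB-month:
print(f"${monthly_storage_cost(500, 0.023):.2f}")
```

Contrast this with a traditional provisioned system, where you would pay for the full pre-purchased capacity whether you filled it or not.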
PCI DSS compliance
Amazon S3 supports the storage, processing, and transmission of credit card data by a service provider or merchant and has been verified as compliant with the Payment Card Industry Data Security Standard (PCI DSS).
All right, folks, that’s all for this blog. Do explore other articles on S3 on our blog. Till then, happy learning.