Amazon Simple Storage Service (S3)


Overview

As the saying goes, "Data is the new oil." Before the twenty-first century, we stored most data in physical form, for instance as paper documents or files kept in a storage room. In the 21st century, modern technology has replaced much of this manual work.

For example, we started keeping a digital copy alongside each physical file, in formats such as PDF, XLS, JPEG, and ODS. If the physical files are lost or accidentally burnt, we can recreate the documents from the digital copies.

Hard disk -> pen drive -> online data storage services such as Dropbox and Google Drive.

Over time, the need to store and use data has grown exponentially. In 2006, AWS launched Amazon S3, a service for storing and retrieving data with virtually unlimited storage capacity. The service changed how data is stored and quickly became popular.

What is Amazon S3?

S3 stands for Simple Storage Service.

Amazon S3 is an object storage service provided by AWS. We can store all types of data files in an S3 bucket.

Types of data files :

  • Document : PDF, XLS, ODS, etc.
  • Image : JPEG, PNG, etc.
  • Audio : MP3, WAV, etc.
  • Video : MP4, MOV, AVI, etc.

S3 also accepts any other file type, because it treats every file as an object regardless of its format.

Amazon S3 was one of the first cloud services to offer virtually unlimited storage with scalable performance to its customers.

Amazon S3 First launch

In addition to storage, S3 also provides a lot of other options to reduce storage costs with better performance.

AWS S3 Terminology

Amazon S3 has its own terminology. This section covers six major terms.

  1. Bucket
  2. Object
  3. Key
  4. Versioning
  5. ACL
  6. Bucket Policy

AWS S3 Terminology reference

Bucket :

According to the AWS documentation, a bucket is a container for objects.

Before we can store objects, we must create a bucket with a unique name, because bucket names share a global namespace. A bucket resides in a single region. Once the bucket is created, we cannot change its name or region.

Bucket names must follow these naming rules :

  • Allowed length : 3 to 63 characters.
  • Allowed characters : lowercase letters, numbers, dots (.), and hyphens (-).
  • Starting character : letter or number.
  • Ending character : letter or number.
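The naming rules above can be checked before attempting to create a bucket. The following is a minimal sketch that encodes only the rules listed here; the function name `is_valid_bucket_name` is hypothetical, and AWS documents further restrictions that this check does not cover.

```python
import re

# First and last character: lowercase letter or number.
# Middle characters: lowercase letters, numbers, dots, and hyphens.
# 1 + (1..61) + 1 characters gives a total length of 3 to 63.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` satisfies the basic rules listed above."""
    return _BUCKET_NAME.fullmatch(name) is not None


print(is_valid_bucket_name("devops-project-details"))
print(is_valid_bucket_name("My_Bucket"))
```

The first call passes the check, while the second fails because of the uppercase letters and the underscore.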

For better compatibility, AWS recommends a few additional rules beyond the ones listed above.

Refer to the link below for the AWS recommendations on bucket naming.

Amazon S3 bucket naming convention


Object :

In Amazon S3, every data file is stored as an object, irrespective of its extension or format. An object consists of the file data and its metadata. A single object can be up to 5 TB in size. We can upload, download, and open the objects in a bucket.


Key :

A key is the unique name that identifies an object within a bucket. We assign the key when we upload the object, and each object has exactly one key in its bucket. Combined with the bucket name, the object key forms the path of the object's web service endpoint.
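As an illustration of how a bucket name and an object key combine into an endpoint path, here is a small sketch that builds a virtual-hosted-style URL. The function name `object_url`, the bucket and key names, and the region `us-east-1` are assumptions chosen for the example.

```python
from urllib.parse import quote


def object_url(bucket: str, key: str, region: str) -> str:
    """Build a virtual-hosted-style URL for an object.

    The key is percent-encoded, but "/" is kept so that keys with
    prefixes such as "reports/2024/file.pdf" stay readable.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


url = object_url("devops-project-details", "Solution-document.pdf", "us-east-1")
print(url)
```

Such a URL only returns the object if the request is authorized, for example when the object is publicly readable or the request is signed.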


Versioning :

Imagine that we uploaded an object to an S3 bucket and someone accidentally deleted it. How do we recover the data?

This is where versioning comes to the rescue. Versioning is an option that keeps multiple variants of an object in the same bucket.

With versioning enabled, we can recover an object after an accidental deletion or overwrite.

S3 Versioning is disabled on buckets by default, and we must explicitly enable it.
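To make the recovery idea concrete, the sketch below models a versioning-enabled bucket in memory. It is a toy model, not the S3 API: the class `VersionedBucket` and its methods are hypothetical, and a list index stands in for a real version ID. It shows that an overwrite adds a new version and that a delete only places a delete marker on top of the existing versions.

```python
from typing import Dict, List, Optional


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self) -> None:
        # key -> every stored version, oldest first; None is a delete marker
        self._versions: Dict[str, List[Optional[bytes]]] = {}

    def put(self, key: str, data: bytes) -> int:
        """Store a new version and return its index (stand-in for a version ID)."""
        versions = self._versions.setdefault(key, [])
        versions.append(data)
        return len(versions) - 1

    def delete(self, key: str) -> None:
        """A plain delete removes nothing: it stacks a delete marker on top."""
        self._versions.setdefault(key, []).append(None)

    def get(self, key: str, version: Optional[int] = None) -> Optional[bytes]:
        """Return the latest version, or a specific one when `version` is given."""
        versions = self._versions.get(key, [])
        if not versions:
            return None
        return versions[-1] if version is None else versions[version]


bucket = VersionedBucket()
first = bucket.put("Solution-document.pdf", b"draft")
bucket.put("Solution-document.pdf", b"final")   # overwrite keeps the old version
bucket.delete("Solution-document.pdf")          # accidental delete
recovered = bucket.get("Solution-document.pdf", version=first)
```

After the delete, a plain `get` finds only the delete marker, but the earlier versions remain retrievable by their version index.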


Access Control List :

Our buckets and objects can be public or private.

What if another user needs access to a bucket or object?

Using an ACL, we can grant basic read and write permissions on a bucket or its objects to other AWS accounts or to predefined groups.

AWS recommends using a bucket policy instead of ACLs unless a use case needs access control on each object individually.


Bucket Policy :

A bucket policy is a resource-based IAM policy attached to a bucket. We can create a bucket policy with the AWS Policy Generator or by writing the JSON document directly. Bucket policies are limited to 20 KB in size.

The bucket policy decides the following :

  • Who can access the bucket?
  • Who can read or write the objects contained within the bucket?

A sample case

Consider the following entities in an AWS account :

  • IAM User-1 : Antony

  • IAM User-2 : Basha

Bucket Name : devops-project-details

Object Name : Solution-document.pdf

We create a bucket policy that grants read-only access to IAM User-1, so "Antony" can only read the objects.

  • If IAM User-1 tries to download the file, the request succeeds, because downloading an object is a read operation (s3:GetObject).

  • If IAM User-1 tries to upload, overwrite, or delete the file, he will get an access-denied error, because these are write operations.

  • If IAM User-2 tries to read the file, he will get an access-denied error, assuming no other policy grants him access.

According to the bucket policy :

  • IAM user 1 only has read permission.

  • IAM user 2 doesn’t have any permissions over that bucket.
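A read-only bucket policy for this sample case could look like the sketch below. The account ID `123456789012` is a placeholder, and `is_allowed` is a hypothetical toy checker that only matches exact principals and actions on Allow statements. Real policy evaluation in AWS also considers identity-based policies, explicit denies, wildcards, and conditions.

```python
import json

ACCOUNT_ID = "123456789012"  # placeholder account ID (assumption)
BUCKET = "devops-project-details"
ANTONY = f"arn:aws:iam::{ACCOUNT_ID}:user/Antony"
BASHA = f"arn:aws:iam::{ACCOUNT_ID}:user/Basha"

read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAntonyReadOnly",
            "Effect": "Allow",
            "Principal": {"AWS": ANTONY},
            "Action": ["s3:GetObject"],               # read (download) only
            "Resource": f"arn:aws:s3:::{BUCKET}/*",   # every object in the bucket
        }
    ],
}


def is_allowed(policy: dict, principal_arn: str, action: str) -> bool:
    """Toy check: exact principal and action match on Allow statements."""
    for statement in policy["Statement"]:
        if (
            statement["Effect"] == "Allow"
            and statement["Principal"].get("AWS") == principal_arn
            and action in statement["Action"]
        ):
            return True
    return False  # nothing allows the request, so it is denied by default


policy_json = json.dumps(read_only_policy, indent=2)  # JSON form of the policy
```

In this toy model, Antony may call `s3:GetObject` but not `s3:PutObject`, and Basha is denied both because no statement mentions him.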

Features of S3

  • There are many features associated with S3 services.

  • Numerous AWS services can be integrated with S3 as per their use case

| Features | Description | AWS Service |
|---|---|---|
| Storage Monitoring | Manage and monitor the objects in the S3 bucket | S3 Batch Operations, S3 Replication |
| Storage Analytics | Analyze the stored objects in a bucket | S3 Storage Lens |
| Storage Query | Query the objects in a bucket | Amazon Athena, Amazon Redshift Spectrum |
| Data transfer | Data migration and transfer | AWS Storage Gateway, AWS DataSync, AWS Snow Family |

  • In addition to the above features, Amazon S3 also has :

    • Encryption by default
    • Static website hosting
    • Life cycle policy

How Does Amazon S3 Work?

What happens when we upload objects, files, or folders with files to an S3 bucket, or download them again?

  • First, AWS checks permissions to determine whether the user is authorized to perform the requested action on that bucket.

  • Second, when we request a download, AWS returns the object (for example, in the browser) as long as no policy restricts access.

  • Behind the scenes, Amazon runs internal systems that monitor S3 uploads and downloads, scale performance, and handle errors. These systems help customers get good performance at a low storage cost.

AWS re:Invent S3 deep dive

S3 Storage Classes

S3 offers several storage classes. The main ones are grouped below by how often the objects are accessed.

Frequently accessed objects :

  • Standard Storage Class :

    • This is a general-purpose storage class suitable for frequently accessed objects.

    • Objects in this storage class are automatically stored redundantly across at least three Availability Zones within a region.

Infrequently accessed objects :

  • Standard IA :

    • This storage class is suitable for infrequently accessed objects.

    • Compared to Standard, this storage class has a lower storage price but charges a fee per GB retrieved.

    • Objects in this storage class are automatically stored redundantly across at least three Availability Zones within a region.

| Storage class | S3 Standard | S3 Intelligent-Tiering | S3 Standard-IA |
|---|---|---|---|
| Designed for durability | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) |
| Designed for availability | 99.99% | 99.9% | 99.9% |
| Availability SLA | 99.9% | 99% | 99% |
| Availability Zones | >=3 | >=3 | >=3 |
| Minimum storage duration charge | N/A | N/A | 30 days |
| Retrieval fee | N/A | N/A | Per GB retrieved |
| First-byte latency | milliseconds | milliseconds | milliseconds |
| Lifecycle transitions | Yes | Yes | Yes |

  • One Zone IA :

    • This storage class is similar to standard IA, but instead of storing the data across availability zones, objects will be stored only in a single availability zone.

    • The one-zone IA is less costly when compared to the standard IA.

      | Storage class | S3 One Zone-IA |
      |---|---|
      | Designed for durability | 99.999999999% (11 9’s) |
      | Designed for availability | 99.5% |
      | Availability SLA | 99% |
      | Availability Zones | 1 |
      | Minimum storage duration charge | 30 days |
      | Retrieval fee | Per GB retrieved |
      | First-byte latency | milliseconds |
      | Lifecycle transitions | Yes |

  • S3 Intelligent tiering :

    • S3 Intelligent-Tiering monitors access patterns and automatically moves objects between access tiers as their access frequency changes.

    • It can reduce storage costs for data with unknown or changing access patterns.

    • S3 Intelligent-Tiering has no retrieval charges, but AWS charges a small monthly monitoring and automation fee per object.

  • S3 Glacier :

    • This storage class provides three options :

      • S3 Glacier Instant Retrieval
      • S3 Glacier Flexible Retrieval
      • S3 Glacier Deep Archive

      These storage classes are suitable for archiving files for long periods at a lower cost than the other storage classes.

      However, objects in S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive cannot be read in real time. They must be restored first, which takes minutes to hours.

    • Glacier offers 3 retrieval options for restoring archived objects.

    • Each option has its own cost and retrieval time.

      • Expedited
      • Standard
      • Bulk
    | Storage class | S3 Glacier Instant Retrieval | S3 Glacier Flexible Retrieval | S3 Glacier Deep Archive |
    |---|---|---|---|
    | Designed for durability | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) |
    | Designed for availability | 99.9% | 99.99% | 99.99% |
    | Availability SLA | 99% | 99.9% | 99.9% |
    | Availability Zones | >=3 | >=3 | >=3 |
    | Minimum storage duration charge | 90 days | 90 days | 180 days |
    | Retrieval fee | Per GB retrieved | Per GB retrieved | Per GB retrieved |
    | First-byte latency | milliseconds | minutes or hours | hours |
    | Lifecycle transitions | Yes | Yes | Yes |
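
The lifecycle transitions listed in the tables above are configured through lifecycle rules. The sketch below shows one possible rule as a Python dictionary in the shape that the boto3 `put_bucket_lifecycle_configuration` call expects for its `LifecycleConfiguration` argument. The rule ID, the `logs/` prefix, and the day thresholds are assumptions for illustration, and `storage_class_at` is a hypothetical helper that only mirrors the rule locally.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",        # hypothetical rule name
            "Filter": {"Prefix": "logs/"},   # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Return the storage class the rule targets for an object of this age.

    Assumes the object was uploaded to S3 Standard. Real transitions run
    asynchronously, so this only reflects the intent of the rule.
    """
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
print(storage_class_at(45, rule))
```

With these thresholds, objects move to cheaper classes as they age, which trades retrieval speed and retrieval fees for a lower storage price.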

Protecting Your S3 Data

How do we protect our files/objects in an S3 bucket?

We have three options to implement the best security practices.

Bucket policy :
We can create a bucket policy to restrict access to certain IAM users/roles and IP addresses.

IAM role :
Using an AWS IAM role, we can give compute resources such as Lambda or EC2 access to S3 without storing long-term credentials. When these resources reach S3 through a VPC endpoint, the data transfer stays within the AWS network.

IAM policy :
We can restrict IAM users to certain S3 buckets with limited permission by attaching an AWS Managed policy or custom policy.
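
As an example of the bucket policy option, the sketch below restricts a bucket to requests from one IP range by denying everything else with the `aws:SourceIp` condition key. The bucket name and the range `203.0.113.0/24` (a documentation-only address block) are assumptions, and `would_be_denied` is a hypothetical helper that only mirrors the IP condition locally.

```python
import ipaddress

ALLOWED_NETWORK = "203.0.113.0/24"   # documentation-only range (assumption)
BUCKET = "devops-project-details"    # example bucket name

ip_restricted_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRequestsOutsideAllowedNetwork",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # The Deny applies whenever the caller's IP is NOT in the range.
            "Condition": {"NotIpAddress": {"aws:SourceIp": ALLOWED_NETWORK}},
        }
    ],
}


def would_be_denied(source_ip: str) -> bool:
    """Local mirror of the condition: True if the IP is outside the range."""
    network = ipaddress.ip_network(ALLOWED_NETWORK)
    return ipaddress.ip_address(source_ip) not in network
```

A broad Deny like this also blocks administrators who connect from other addresses, so it is worth testing on a non-critical bucket first.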

AWS S3 Benefits

Flexible Data Management

Amazon S3 offers different storage classes. Customers can choose the storage class that fits their application or business needs. At an additional cost, Amazon S3 provides a built-in data replication option between regions and accounts.

Durability, Availability, and Scalability

S3 is designed for 99.999999999% (11 9’s) durability, and S3 Standard is designed for 99.99% availability. Storage capacity scales automatically as the amount of data grows. This design makes data loss extremely unlikely.
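
To put the 11 9’s durability figure from the storage class tables in perspective, the following sketch converts it into an expected number of lost objects per year. It assumes the figure can be read as an annual loss probability of 1 in 10^11 per object, and the function name `expected_annual_losses` is hypothetical.

```python
from fractions import Fraction


def expected_annual_losses(object_count: int, nines: int = 11) -> Fraction:
    """Expected number of objects lost per year at the given durability."""
    annual_loss_probability = Fraction(1, 10 ** nines)
    return object_count * annual_loss_probability


losses = expected_annual_losses(10_000_000)   # ten million stored objects
years_per_lost_object = 1 / losses
print(years_per_lost_object)
```

Under this reading, storing ten million objects leads to an average loss of one object every 10,000 years.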

Backup and Recovery

In S3, we can archive objects for a long time in the Glacier storage classes. When we need to restore backup data, we can choose the storage class and retrieval option that meet our Recovery Point Objective (RPO) and Recovery Time Objective (RTO). This helps to optimize cost and performance for backup and recovery.

Data Migration and Data Transfer

There are three ways to migrate data to the cloud.

They are :

  • Hybrid Data Transfer
  • Online Data Transfer
  • Offline Data Transfer

Hybrid data transfer :
We connect our on-premises file storage to AWS Storage Gateway. The gateway then transfers the data to Amazon S3.

Online Data Transfer :
AWS DataSync moves data from on-premises file storage to Amazon S3 over the network. It is designed for large transfers, such as hundreds of terabytes, so migrations finish faster than with manual copying.

Offline Data Transfer :
AWS Snowball is a physical storage device for petabyte-scale transfers, and AWS Snowmobile targets exabyte-scale transfers. We order the device from AWS. Once it reaches our location, we copy our data onto it and send it back. AWS then moves the data from the device to S3.
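
Whether online or offline transfer is the better fit depends largely on the data volume and the available network bandwidth. The sketch below gives a rough estimate of the online transfer time. The function name `transfer_days` and the example figures (100 TB over a 1 Gbps link) are assumptions, and protocol overhead is ignored unless it is expressed through the `utilization` factor.

```python
def transfer_days(data_terabytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Estimate the days needed to send the data over a network link.

    Uses decimal units: 1 TB = 10**12 bytes and 1 Gbps = 10**9 bits per second.
    `utilization` is the fraction of the link actually available for the transfer.
    """
    bits_to_send = data_terabytes * 1e12 * 8
    seconds = bits_to_send / (link_gbps * 1e9 * utilization)
    return seconds / 86_400   # seconds per day


days_at_1_gbps = transfer_days(100, 1)
days_at_10_gbps = transfer_days(100, 10)
print(round(days_at_1_gbps, 2), round(days_at_10_gbps, 2))
```

At a fully used 1 Gbps link, 100 TB takes a little over nine days, which is why very large migrations often use an offline device instead.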

Static Website Hosting

We can host a static website by enabling the static website hosting option on an S3 bucket. We can also map the bucket to a DNS name in Route 53 to serve the site from our own domain. S3 static websites scale automatically, so we don’t need to worry about the website's load.

Amazon S3 Use Cases

Build a data lake :
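We can use S3 as the storage layer of a data lake and run machine learning (ML), artificial intelligence (AI), and high-performance computing (HPC) applications on it to gain insights.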

Backup and restore of critical data :
With S3 Replication, we can meet the RPO and RTO targets set by our compliance requirements.

Data archiving at the lowest possible cost :
Using Amazon S3 Glacier, we can move data and object archives to lower-cost storage classes.

Utilize cloud-native applications :
We can configure S3 as a storage source for cloud-native web and mobile-based applications.

Note : Amazon S3 suits a wide range of workloads and use cases.

Competitor Services

Other cloud service providers (CSPs) offer alternative storage solutions as well.

They are :

| No | CSP | Storage solution |
|---|---|---|
| 1 | AWS | Amazon S3 |
| 2 | GCP | Google Cloud Storage |
| 3 | Azure | Azure Blob Storage |
| 4 | OCI | Oracle Cloud Storage |
| 5 | IBM | IBM Cloud Object Storage |

Conclusion

  • Amazon S3 is a widely used cloud storage solution that suits startups, enterprise customers, and individual AWS accounts.

  • Using S3, customers can easily build cloud-native, scalable web and mobile applications.

  • S3 can be used in a wide range of domains, including application development, data science, machine learning, and artificial intelligence, among others.

  • S3 can be integrated with a variety of AWS services, including Lambda, Glue, and others.