QubOps

Day 7 - S3 Object Archiving

AWS Cost Optimisation Advent Calendar 2024


#aws #s3

Welcome to day 7 of the unofficial AWS Cost Optimisation Advent Calendar 2024, where every day we will be sharing new tips or tricks to help you optimise your cloud costs before Christmas 2024.

For many organisations deleting data is not an option, but that doesn't mean you should simply leave objects in an S3 bucket indefinitely. Despite S3's relatively low pricing, the default storage class is far from the most cost-effective option.

In this post we will look at how you can archive objects in S3 to reduce costs.

S3 Object Archiving

Archiving objects in S3 buckets can result in up to 80% cost savings compared to storing the objects in the standard storage class.
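
As a rough illustration of where that figure comes from, compare per-GB monthly list prices for S3 Standard and S3 Glacier Instant Retrieval. The prices below are us-east-1 figures at the time of writing and are assumptions; always check the current S3 pricing page for your region.

```python
# Rough storage-cost comparison (us-east-1 list prices at the time of
# writing; these are assumptions -- always check the current S3 pricing
# page). Retrieval and request charges are ignored here.
standard_per_gb = 0.023      # S3 Standard, first 50 TB / month
glacier_ir_per_gb = 0.004    # S3 Glacier Instant Retrieval / month

saving = 1 - glacier_ir_per_gb / standard_per_gb
print(f"Saving vs Standard: {saving:.0%}")  # prints "Saving vs Standard: 83%"
```

The deeper archive tiers are cheaper still, which is where larger savings come from.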

The main challenge in taking advantage of these cost savings is identifying which storage class is most suitable for the data in a bucket.

By default your data sits in S3 Standard, the most expensive storage class, so let's take a look at the alternatives.

It is important to note that if you need constant access to your data and for it to be available immediately on-demand, then archiving is not the right option.

The rest of the data can mostly be split into two categories:

  1. Data that is rarely accessed - for example log files, backups or historical data that is not needed for day-to-day operations.
  2. Data that is accessed infrequently - for example reports or batch-job inputs that are read once a month or once a year.

Rarely Accessed Data

For data that is rarely accessed, you can use the S3 Glacier storage classes. There are trade-offs in how quickly and how cheaply you can retrieve the data, but the cost savings are significant. Each Glacier storage class has its own nuances so you should read the documentation carefully. In summary:

  • Instant Retrieval - Your data is still available in milliseconds, but pricing is designed for data accessed roughly once a quarter, and retrievals incur a per-GB fee.
  • Flexible Retrieval - Even cheaper than Instant Retrieval, intended for data accessed 1-2 times per year; retrievals take from minutes (expedited) up to 12 hours (bulk).
  • Deep Archive - The cheapest option, intended for data accessed around once a year; retrievals take up to 12 hours (standard) or 48 hours (bulk).
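
Objects in Flexible Retrieval or Deep Archive are not directly readable: you must first issue a restore request and wait for the job to finish. A minimal boto3-shaped sketch, with illustrative bucket and key names; the live call needs real AWS credentials, so it is shown commented out.

```python
def restore_request(days: int, tier: str) -> dict:
    """Build the RestoreRequest payload for S3's restore_object API.

    tier: "Expedited", "Standard" or "Bulk" (speed vs cost trade-off;
    Deep Archive supports "Standard" and "Bulk" only).
    """
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}

# With boto3 and valid credentials (names below are illustrative):
# import boto3
# boto3.client("s3").restore_object(
#     Bucket="example-archive-bucket",
#     Key="backups/db-2023-01.dump",
#     RestoreRequest=restore_request(days=7, tier="Standard"),
# )
```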

Items like recent database backups would suit Instant Retrieval, where you may need the data quickly but not often, whereas old database backups kept only for the occasional investigation could benefit from Flexible Retrieval or even Deep Archive.

The same goes for log files, particularly if you only keep them in case an investigation requires you to go back and look at them.
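
Transitions like these are usually automated with S3 lifecycle rules. The sketch below builds one rule that moves objects under a hypothetical logs/ prefix to Deep Archive after 90 days; the prefix, threshold and bucket name are assumptions for illustration.

```python
def glacier_rule(prefix: str, days: int, storage_class: str) -> dict:
    """Build one S3 lifecycle rule in the shape expected by the
    put_bucket_lifecycle_configuration API."""
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }

rule = glacier_rule("logs/", 90, "DEEP_ARCHIVE")

# With boto3 and real credentials you would apply it like this:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-log-bucket",          # illustrative bucket name
#     LifecycleConfiguration={"Rules": [rule]},
# )
```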

Infrequently Accessed Data

For data that is accessed infrequently, you can use the S3 Standard-IA or One Zone-IA storage classes. These are cheaper than S3 Standard and still offer millisecond access, though both charge a per-GB retrieval fee and have a 30-day minimum storage duration.

The main difference between the two is that S3 Standard-IA stores data in multiple Availability Zones whereas S3 One Zone-IA stores data in a single Availability Zone.

One Zone-IA suits data that you could reproduce or that is backed up elsewhere, so you don't need the extra redundancy provided by Standard-IA.
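
Lifecycle rules can chain transitions, which suits data that cools over time. The rule below is an illustration with an assumed prefix and thresholds; note that S3 requires objects to spend at least 30 days in Standard before a transition to Standard-IA.

```python
# One lifecycle rule in the shape expected by S3's
# put_bucket_lifecycle_configuration: reports move to Standard-IA after
# 30 days, then on to Deep Archive after a year.
rule = {
    "ID": "tier-reports",
    "Filter": {"Prefix": "reports/"},   # illustrative prefix
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
    ],
}
# Apply via boto3's put_bucket_lifecycle_configuration with real credentials.
```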

Intelligent Tiering

Another option is the S3 Intelligent-Tiering storage class, which automatically moves objects between frequent and infrequent access tiers based on your access patterns. This is the lowest-effort way to gain a potential cost saving.

The downside is that objects start in the frequent access tier, which is priced the same as Standard, and only move to the infrequent access tier after 30 consecutive days without access, so the savings are not immediate.

In addition, Intelligent-Tiering charges a small monthly monitoring and automation fee per object. There are no retrieval or tier-transition fees, but for buckets with very many small objects the monitoring charge can add up and negate some of the cost savings.
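
If Intelligent-Tiering looks like a fit, a lifecycle rule can move objects into it as soon as the rule takes effect ("Days": 0). A minimal sketch; the empty filter (covering the whole bucket) is an assumption.

```python
# One lifecycle rule, in the shape expected by S3's
# put_bucket_lifecycle_configuration API.
rule = {
    "ID": "enable-intelligent-tiering",
    "Filter": {},       # empty filter = apply to every object (assumption)
    "Status": "Enabled",
    "Transitions": [
        {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"},
    ],
}
# Apply with boto3's put_bucket_lifecycle_configuration using real
# credentials, passing LifecycleConfiguration={"Rules": [rule]}.
```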

Important Notes

As there is a cost associated with each lifecycle transition request, you should consider how often your objects are likely to switch storage classes.

You can also upload data straight to a Glacier storage class, bypassing S3 Standard, by setting the storage class at upload time. However, you should be aware of the restrictions on retrieving the data in a timely manner.
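
For example, with boto3 you can set StorageClass when the object is written. The helper below just assembles the keyword arguments; the bucket and key are illustrative, and the actual put_object call needs credentials, so it is commented out.

```python
def archive_upload_args(bucket: str, key: str,
                        storage_class: str = "DEEP_ARCHIVE") -> dict:
    """Keyword arguments for S3 put_object that write an object straight
    into an archive storage class, skipping S3 Standard."""
    return {"Bucket": bucket, "Key": key, "StorageClass": storage_class}

args = archive_upload_args("example-archive-bucket", "backups/db.dump")
# import boto3
# with open("db.dump", "rb") as body:
#     boto3.client("s3").put_object(Body=body, **args)
```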

It is also important to take into account any regulatory requirements your business might have in terms of data retention and access.

Conclusion

As AWS defaults to the most expensive storage class, it is worth taking a bit of time to understand the data in your buckets and whether there is a noticeable cost saving to be made.

A small bit of configuration can result in significant cost savings over time.

That said, cost is of course not the only factor; you should also carefully consider the durability, availability and retrieval requirements of each option.

That's it for today, see you tomorrow for the next tip!

To be one of the first to know when the next edition is published please follow us on LinkedIn, X or subscribe to the RSS feed.
