#aws #rds
Welcome to day 5 of the unofficial AWS Cost Optimisation Advent Calendar 2024, where every day we share a new tip or trick to help you optimise your cloud costs before Christmas 2024.
Today we are talking about rightsizing RDS.
This topic has many layers of complexity that we're excited to dive into with you. There are far too many variables to cover in a single blog post, so for the advent we are focusing on the quick wins, and the most common improvements.
For RDS we have already covered switching to Graviton instances, which is the quickest cost-saving win, so today we are looking at rightsizing.
Simply put, rightsizing is ensuring your databases are neither too big nor too small for the workload they are running. This is more challenging for databases than for application servers, as they are slower to bring online. Zero downtime is typically a requirement, since the database is usually the most mission-critical part of your application, and the cost of getting it wrong can be very high.
Without metrics you can only rely on feel and feedback from users, so getting some insight into how your database is performing is essential. AWS provides CloudWatch metrics for RDS which can be used to monitor CPU, memory, disk and network usage.
One very important monitoring tool is the slow query log. This can be enabled in the RDS console and will log any queries that take longer than a configurable threshold to complete. It is a good indicator of where you might be able to make improvements to reduce the load on the database. We have encountered scenarios where a single slow query on an important part of an application (e.g. the dashboard) resulted in oversizing the database by an order of magnitude, at a significant long-term cost, just to ensure that query succeeds.
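As a sketch, enabling the slow query log on a MySQL-engine instance comes down to a few settings in a custom DB parameter group. The group name below is a placeholder and the threshold is something you would tune for your workload:

```python
# Sketch: MySQL parameter changes to enable the slow query log on RDS
# via a custom DB parameter group. Parameter names are standard MySQL
# ones; the group name "my-app-params" below is a placeholder.

def slow_query_log_parameters(threshold_seconds: int = 2) -> list[dict]:
    """Build the parameter list for ModifyDBParameterGroup."""
    return [
        {"ParameterName": "slow_query_log", "ParameterValue": "1",
         "ApplyMethod": "immediate"},
        {"ParameterName": "long_query_time",
         "ParameterValue": str(threshold_seconds),
         "ApplyMethod": "immediate"},
        # FILE output lets you publish the log to CloudWatch Logs.
        {"ParameterName": "log_output", "ParameterValue": "FILE",
         "ApplyMethod": "immediate"},
    ]

# To apply (requires boto3 and AWS credentials; not run here):
#   boto3.client("rds").modify_db_parameter_group(
#       DBParameterGroupName="my-app-params",  # placeholder
#       Parameters=slow_query_log_parameters(threshold_seconds=2))
```

Remember to associate the parameter group with the instance; the change itself requires no restart for these dynamic parameters.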
You will also want to make use of Performance Insights to get more detailed information on what queries are running and how they are performing.
The key things you want to try and understand are the peak usage of your database and the average usage. In addition, you will want to get an understanding of the data growth rate and how that might impact your database in the future.
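One way to get those numbers is to pull `CPUUtilization` datapoints (for example with `aws cloudwatch get-metric-statistics --namespace AWS/RDS`) and summarise them. A minimal sketch of the summary step:

```python
from statistics import mean

def summarise_cpu(datapoints: list[float]) -> dict:
    """Summarise CloudWatch CPUUtilization datapoints (percentages)
    into the figures that matter for rightsizing: average, p95, peak."""
    ordered = sorted(datapoints)
    p95 = ordered[max(0, int(len(ordered) * 0.95) - 1)]
    return {
        "average": round(mean(datapoints), 1),
        "p95": p95,
        "peak": ordered[-1],
    }

# Example: a mostly idle instance with one spike.
# summarise_cpu([10, 12, 11, 90, 13, 12, 11, 10, 12, 11])
# -> {"average": 19.2, "p95": 13, "peak": 90}
```

A big gap between average and peak, as in the example, is the signature of a workload worth investigating before simply sizing for the peak.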
Some common scenarios you might see are:
If CPU/RAM usage is consistently low or consistently high, you can make a good estimate of the instance size you need. Compare the available instances at https://instances.vantage.sh to find one better suited to your workload.
If you have intermittent spikes then you might want to consider using a burstable instance type like t4g. These instances are cheaper than the equivalent fixed performance instances but can burst to higher performance levels when needed.
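A quick sanity check before moving to a burstable class is whether your average CPU sits comfortably under the instance's baseline, so credits accrue between spikes rather than drain away. The baseline percentages below are approximate figures; confirm the exact values for your instance class in the AWS documentation:

```python
# Approximate per-instance CPU baselines for the t4g family; check the
# AWS docs for current figures before relying on them.
T4G_BASELINE = {
    "db.t4g.micro": 10, "db.t4g.small": 20, "db.t4g.medium": 20,
    "db.t4g.large": 30, "db.t4g.xlarge": 40, "db.t4g.2xlarge": 40,
}

def fits_burstable(instance: str, avg_cpu_pct: float,
                   headroom: float = 0.8) -> bool:
    """True if average CPU stays comfortably below the baseline,
    leaving some headroom for credits to accrue between spikes."""
    return avg_cpu_pct <= T4G_BASELINE[instance] * headroom

# fits_burstable("db.t4g.medium", 12)  # average well under the 20% baseline
# fits_burstable("db.t4g.medium", 25)  # spends credits faster than it earns
```

If the average sits above the baseline, the instance will steadily exhaust its credits and either throttle or (in unlimited mode) incur surplus-credit charges, so a fixed-performance class is the safer choice.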
It is also good to build a solid understanding of what is causing the spikes. As mentioned earlier, sometimes a handful of queries just need optimisation to remove those spikes and let you move to a smaller instance. If you run read replicas, consider offloading some read queries to the replicas to reduce load on the primary, if you are not doing so already. And if read queries are what is causing the spikes, you can even use larger instances for the replicas and a smaller one for the primary. It might sound strange, but it works well in many read-heavy environments.
If your spikes only occur on certain days of the month, you might consider running a larger instance for just those days by scheduling a monthly resize during your maintenance window. With read replicas or a Multi-AZ deployment this can typically be done with little to no downtime.
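As a sketch, the schedule can be as simple as a function that picks the class by day of month, triggered from a cron job or EventBridge Scheduler. The instance classes and the month-end spike window here are hypothetical:

```python
from datetime import date

# Hypothetical schedule: month-end reporting (days 28-31 plus the 1st)
# needs the larger class; the rest of the month runs on the smaller one.
# The actual resize would be triggered from cron / EventBridge Scheduler:
#   aws rds modify-db-instance --db-instance-identifier mydb \
#       --db-instance-class db.r6g.2xlarge --apply-immediately

def instance_class_for(day: date,
                       small: str = "db.r6g.xlarge",
                       large: str = "db.r6g.2xlarge") -> str:
    """Pick the instance class for a month-end spike window."""
    return large if day.day >= 28 or day.day == 1 else small
```

The same pattern works for any predictable window, such as weekly batch jobs; only the date test changes.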
Another option is switching to Aurora Serverless. Aurora Serverless lets you pay only for the resources you use, which can be a good fit for workloads with unpredictable usage patterns or frequent spikes, as it scales up and down fairly quickly to help manage your costs.
Aurora (both standard and Serverless) handles disk growth dynamically making it a non-issue in most cases. For other RDS instances you will need to monitor the disk usage and ensure you have enough space to handle the growth.
A good rule of thumb is to keep disk usage in the 60-70% range: resize so you land at roughly 61% used, set an alert at 70% to schedule the next resize, and treat 80% as the point where action becomes urgent. If your database is relatively small and the cost of storage is not significant, you may prefer a wider 50%-70% band to avoid the maintenance burden of frequent resizing.
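That rule of thumb translates into a small calculation. A sketch of the sizing and the two alert levels:

```python
import math

def resize_plan(used_gb: float) -> dict:
    """Target allocation so current usage sits near 61%, with alert
    levels at 70% (schedule the next resize) and 80% (act now)."""
    target_gb = math.ceil(used_gb / 0.61)
    return {
        "allocate_gb": target_gb,
        "alert_at_gb": round(target_gb * 0.70),
        "urgent_at_gb": round(target_gb * 0.80),
    }

# Example: 100 GB of data -> allocate 164 GB, alert at 115 GB used,
# urgent at 131 GB used.
```

Note that RDS only allows storage to grow, and imposes a cooldown (typically six hours) between storage modifications, which is another reason not to cut the headroom too fine.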
Amazon claims that Aurora provides up to five times the performance of MySQL and three times the performance of PostgreSQL, which could reduce the number of database instances needed and thereby offset the added cost.
That said, Amazon Aurora can sometimes be more expensive than standard RDS for MySQL/Postgres. The cost of Aurora depends on the workload and whether the features of Aurora provide enough advantages to justify the potential added cost.
If you need a simple, small database without Aurora's enhanced performance and features, a standard RDS instance will likely be less expensive. Always consider your specific requirements and use case before deciding.
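A back-of-envelope comparison can help with that decision. The prices below are placeholders, not real figures; look up current pricing for your region and engine before drawing conclusions:

```python
# Rough monthly cost comparison: provisioned RDS vs Aurora Serverless v2.
# Both prices are HYPOTHETICAL placeholders for illustration only.
ACU_PER_HOUR = 0.12          # $/ACU-hour (placeholder)
PROVISIONED_PER_HOUR = 0.52  # $/hour for a fixed instance (placeholder)

def monthly_cost_serverless(avg_acus: float, hours: int = 730) -> float:
    """Serverless cost scales with the average capacity actually used."""
    return round(avg_acus * ACU_PER_HOUR * hours, 2)

def monthly_cost_provisioned(hours: int = 730) -> float:
    """Provisioned cost is flat regardless of utilisation."""
    return round(PROVISIONED_PER_HOUR * hours, 2)

# The break-even average capacity is simply
#   PROVISIONED_PER_HOUR / ACU_PER_HOUR  ACUs;
# below it, serverless wins; above it, provisioned wins.
```

The general shape holds whatever the real prices are: spiky workloads with a low average favour serverless, while steady high utilisation favours provisioned capacity (especially with reserved instances).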
Rightsizing is one of the more complex tasks to tackle in cost optimisation as it takes additional, sometimes intimate, knowledge of the business and how the application has been written. We have only scratched the surface here on a few scenarios and some ideas of how to handle them. The key is to monitor, analyse and then act on the data you have collected. This is an iterative process and you will likely need to revisit this multiple times as your application grows and changes.
To be one of the first to know when the next edition is published please follow us on LinkedIn, X or subscribe to the RSS feed.