#aws #vpc
Amazon VPC Flow Logs are a powerful tool for monitoring and analyzing network traffic in your AWS environment. However, they can also lead to significant costs if not managed properly. Here are some strategies to help you optimize your VPC Flow Logs costs.
The three main components of VPC Flow Logs that contribute to costs are data ingestion, data storage, and querying the data.
Data Ingestion and Storage go hand in hand so we will discuss them together. Many companies opt for CloudWatch Logs for VPC Flow Logs because this makes them feel more accessible - you can browse them line by line.
In practice, however, VPC Flow Logs are rarely useful without queries and analysis, for example searching for specific IP addresses and ports, or filtering by protocol.
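To make "searching for specific IP addresses and ports, or filtering by protocol" concrete, here is a minimal Python sketch that parses flow log records in the default (version 2) space-separated format and filters them. The sample records, the `FlowRecord` type, and the `search` helper are all made up for illustration; real logs with a custom field list would need a different `FIELDS` order.

```python
from typing import Iterable, Iterator, NamedTuple, Optional

# Field order of the default (version 2) flow log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


class FlowRecord(NamedTuple):
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int  # IANA protocol number, e.g. 6 = TCP, 17 = UDP
    action: str


def parse_line(line: str) -> Optional[FlowRecord]:
    """Parse one default-format record; return None for anything unusable."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    row = dict(zip(FIELDS, parts))
    # NODATA / SKIPDATA records carry '-' placeholders instead of values.
    if row["log_status"] != "OK":
        return None
    return FlowRecord(
        row["srcaddr"], row["dstaddr"],
        int(row["srcport"]), int(row["dstport"]),
        int(row["protocol"]), row["action"],
    )


def search(lines: Iterable[str], *, ip: Optional[str] = None,
           port: Optional[int] = None,
           protocol: Optional[int] = None) -> Iterator[FlowRecord]:
    """Yield records matching every filter that was supplied."""
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        if ip is not None and ip not in (rec.srcaddr, rec.dstaddr):
            continue
        if port is not None and port not in (rec.srcport, rec.dstport):
            continue
        if protocol is not None and rec.protocol != protocol:
            continue
        yield rec


# Hypothetical sample records, not taken from a real account.
SAMPLE = [
    "2 123456789012 eni-0abc 10.0.1.5 10.0.2.9 49152 443 6 10 840 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 10.0.1.7 10.0.2.9 53001 53 17 1 76 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc - - - - - - - 1700000000 1700000060 - NODATA",
]

if __name__ == "__main__":
    for hit in search(SAMPLE, ip="10.0.2.9", port=443, protocol=6):
        print(hit)
```

This works for a handful of files, but at any real volume the same filters are better expressed as a query in a managed tool, which is why the choice of storage backend matters.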
Let's look at the costs of storing VPC Flow Logs in CloudWatch Logs versus Amazon S3 (using the S3 Standard storage class):
Service | Cost per GB Ingested | Cost per GB Stored (pm) |
---|---|---|
CloudWatch Logs | $0.50 | $0.03 |
Amazon S3 (Standard) | $0.25 | $0.023 |
So for 10TB of VPC Flow Logs, the costs would be:
Service | Ingestion Cost | Storage Cost (pm) | Total Cost (first month) |
---|---|---|---|
CloudWatch Logs | $5,120 | $307 | $5,427 |
Amazon S3 (Standard) | $2,560 | $235 | $2,795 |
At a basic level, storing VPC Flow Logs in Amazon S3 is cheaper on both ingestion and storage if you keep the logs for a month and ingest no more than 10TB.
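The arithmetic behind the 10TB comparison can be sketched in a few lines of Python. This is a minimal illustration, assuming the flat per-GB rates from the first table and 1TB = 1,024GB (which is what the table's figures imply); the `RATES` dictionary and the function name are made up for this example.

```python
GB_PER_TB = 1024  # assumption: the tables use binary terabytes

# Per-GB rates copied from the first table (subject to change).
RATES = {
    "CloudWatch Logs": {"ingest": 0.50, "store": 0.03},
    "Amazon S3 (Standard)": {"ingest": 0.25, "store": 0.023},
}


def first_month_cost(service: str, terabytes: float) -> tuple:
    """Return (ingestion, one month of storage, total) in USD."""
    gb = terabytes * GB_PER_TB
    rate = RATES[service]
    ingestion = gb * rate["ingest"]
    storage = gb * rate["store"]
    return ingestion, storage, ingestion + storage


if __name__ == "__main__":
    for service in RATES:
        ingestion, storage, total = first_month_cost(service, 10)
        print(f"{service}: ingest ${ingestion:,.2f}, "
              f"store ${storage:,.2f}, total ${total:,.2f}")
```

The results agree with the table to within a dollar; the table appears to drop the cents.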
This typically holds for the first 50TB of data in a single month. Beyond 50TB the per-GB ingestion costs are equal and only the storage costs differ, by roughly 23% in favor of S3, and that gap can be widened further with S3 Intelligent-Tiering.
If your VPC Flow Logs are just there for a "what if" scenario or for very infrequent investigations then you can set the S3 storage class straight to Glacier Instant Retrieval.
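One way to get flow logs into Glacier Instant Retrieval is an S3 lifecycle rule on the destination bucket. The sketch below builds such a rule as a plain dictionary and, separately, shows how it might be applied with boto3. The rule ID and function names are hypothetical, the `AWSLogs/` prefix assumes the default delivery path, and whether a zero-day transition suits your bucket should be checked against the current S3 documentation. Glacier Instant Retrieval also carries minimum storage duration charges, so check its pricing before using it for logs you plan to delete quickly.

```python
def glacier_ir_lifecycle(prefix: str = "AWSLogs/", days: int = 0) -> dict:
    """Build a lifecycle configuration that moves objects to Glacier IR.

    `days` is the object age at which the transition happens; 0 is an
    assumption meaning "as soon as the lifecycle rule next runs".
    """
    if days < 0:
        raise ValueError("days must be zero or positive")
    return {
        "Rules": [
            {
                "ID": "flow-logs-to-glacier-ir",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER_IR"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Apply the configuration. Requires boto3 and AWS credentials.

    Note: this call replaces the bucket's entire lifecycle configuration,
    so any existing rules must be included in `config`.
    """
    import boto3  # imported here so the builder above has no dependencies

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    print(glacier_ir_lifecycle())
```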
Here are two example comparisons with 10TB and 100TB of VPC Flow Logs with the different tiering options in us-east-1. Please note pricing is subject to change.
10TB of VPC Flow Logs:
Service | Ingestion Cost | Storage Cost (pm) | Total Cost (first month) | Total Cost (after 1 year) |
---|---|---|---|---|
CloudWatch Logs | $5,120 | $307 | $5,427 | $8,806 |
CloudWatch Logs (Infrequent Access) | $2,560 | $307 | $2,867 | $6,246 |
Amazon S3 (Standard) | $2,560 | $235 | $2,795 | $5,386 |
Amazon S3 (Infrequent Access) | $2,560 | $128 | $2,688 | $4,096 |
100TB of VPC Flow Logs:
Service | Ingestion Cost | Storage Cost (pm) | Total Cost (first month) | Total Cost (after 1 year) |
---|---|---|---|---|
CloudWatch Logs | $14,848 | $3,072 | $17,920 | $51,712 |
CloudWatch Logs (Infrequent Access) | $9,728 | $3,072 | $12,800 | $46,592 |
Amazon S3 (Standard) | $9,728 | $2,304 | $12,032 | $37,376 |
Amazon S3 (Infrequent Access) | $9,728 | $1,280 | $11,008 | $25,088 |
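
The 100TB ingestion figures only make sense with volume-tiered pricing. The sketch below is a rough reconstruction under assumptions: the tier boundaries (10TB, 20TB, 20TB, then everything above 50TB) and the per-GB rates are not stated in this article, and were chosen because they reproduce the ingestion figures in the tables above. Treat them as illustrative and check the current AWS pricing pages before relying on them.

```python
GB_PER_TB = 1024  # assumption: the tables use binary terabytes

# (tier size in TB, USD per GB); None means "everything remaining".
# Assumed tiers, chosen to match the ingestion figures in the tables.
CLOUDWATCH_TIERS = [(10, 0.50), (20, 0.25), (20, 0.10), (None, 0.05)]
S3_DELIVERY_TIERS = [(10, 0.25), (20, 0.15), (20, 0.075), (None, 0.05)]


def tiered_cost(terabytes: float, tiers: list) -> float:
    """Charge each slice of the volume at the rate of the tier it falls in."""
    remaining_gb = terabytes * GB_PER_TB
    total = 0.0
    for size_tb, rate in tiers:
        if remaining_gb <= 0:
            break
        if size_tb is None:
            tier_gb = remaining_gb
        else:
            tier_gb = min(remaining_gb, size_tb * GB_PER_TB)
        total += tier_gb * rate
        remaining_gb -= tier_gb
    return total


def total_after(months: int, ingestion: float, storage_per_month: float) -> float:
    """One-off ingestion plus a flat storage charge for each month."""
    return ingestion + months * storage_per_month


if __name__ == "__main__":
    cw = tiered_cost(100, CLOUDWATCH_TIERS)
    s3 = tiered_cost(100, S3_DELIVERY_TIERS)
    print(f"CloudWatch ingestion: ${cw:,.0f}")
    print(f"S3 ingestion:         ${s3:,.0f}")
    # Storage figures taken from the 100TB table.
    print(f"CloudWatch, 1 year:   ${total_after(12, cw, 3072):,.0f}")
    print(f"S3 Standard, 1 year:  ${total_after(12, s3, 2304):,.0f}")
```

Like the tables, the one-year totals assume the data is ingested once and then stored unchanged for twelve months.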
Infrequent Access is preferred over Intelligent-Tiering for VPC Flow Logs because Intelligent-Tiering costs more, particularly in the first month.
It is worth considering how different options affect your ability to access and query the data in the VPC Flow Logs.
It is rarely useful to simply browse through VPC Flow Logs line by line, and you are more likely to find yourself using tools or queries to analyze and look for useful information.
In CloudWatch you would typically use CloudWatch Logs Insights, and in S3 you would typically use Amazon Athena, to query the data.
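To give a feel for how the two tools compare, here is a small Python sketch that builds the same "who talked to this IP on this port" question as a CloudWatch Logs Insights query and as an Athena SQL query. The Insights field names are the ones CloudWatch discovers for default-format flow logs; the Athena table name `vpc_flow_logs` and its lowercase column names are assumptions that depend on how your table was defined.

```python
import ipaddress


def _validate(ip: str, port: int, limit: int) -> None:
    """Reject values that should never be interpolated into a query."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    if not 0 <= int(port) <= 65535:
        raise ValueError(f"port out of range: {port}")
    if int(limit) < 1:
        raise ValueError(f"limit must be positive: {limit}")


def insights_query(ip: str, port: int, limit: int = 50) -> str:
    """CloudWatch Logs Insights query for traffic to ip:port."""
    _validate(ip, port, limit)
    return (
        "fields @timestamp, srcAddr, dstAddr, dstPort, protocol, action\n"
        f'| filter dstAddr = "{ip}" and dstPort = {int(port)}\n'
        "| sort @timestamp desc\n"
        f"| limit {int(limit)}"
    )


def athena_query(ip: str, port: int, table: str = "vpc_flow_logs",
                 limit: int = 50) -> str:
    """Athena SQL for the same question.

    `table` is a hypothetical name and is not validated here; only pass
    trusted values for it.
    """
    _validate(ip, port, limit)
    return (
        "SELECT srcaddr, dstaddr, dstport, protocol, action\n"
        f"FROM {table}\n"
        f"WHERE dstaddr = '{ip}' AND dstport = {int(port)}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(insights_query("10.0.2.9", 443))
    print()
    print(athena_query("10.0.2.9", 443))
```

With Athena, adding a filter on the table's partition columns (for example the date) is usually what keeps the amount of data scanned, and therefore the query cost, under control.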
We will cover the finer details of using Athena as opposed to CloudWatch Logs for VPC Flow Logs queries in a dedicated post.
But the key points to note in the context of usability and costs are:
One of the key factors in managing VPC Flow Logs costs is the retention policy. By default AWS retains VPC Flow Logs indefinitely, which can lead to high storage costs over time. It is rarely useful to keep VPC Flow Logs for a long time unless you have specific compliance or auditing requirements.
Even then, there are usually better ways to retain only the relevant logs.
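As a rough sketch of what a retention policy looks like in practice, the Python below builds the request for a CloudWatch Logs retention setting and an S3 lifecycle expiration rule, then shows how they might be applied with boto3. The log group name, bucket name, rule ID, and 30-day figure are placeholders, not recommendations. CloudWatch Logs only accepts certain retention values (30, 60, and 90 days are among them), so check the documentation for the full list.

```python
def cloudwatch_retention_kwargs(log_group: str, days: int) -> dict:
    """Arguments for the CloudWatch Logs put_retention_policy call."""
    if days < 1:
        raise ValueError("retention must be at least one day")
    return {"logGroupName": log_group, "retentionInDays": days}


def s3_expiration_config(prefix: str, days: int) -> dict:
    """Lifecycle configuration that deletes objects `days` after creation."""
    if days < 1:
        raise ValueError("expiration must be at least one day")
    return {
        "Rules": [
            {
                "ID": f"expire-flow-logs-after-{days}d",  # hypothetical name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_retention(log_group: str, bucket: str, days: int) -> None:
    """Apply both settings. Requires boto3 and AWS credentials.

    The S3 call replaces the bucket's whole lifecycle configuration, so
    merge in any existing rules (such as storage class transitions) first.
    """
    import boto3  # imported here so the builders have no dependencies

    boto3.client("logs").put_retention_policy(
        **cloudwatch_retention_kwargs(log_group, days)
    )
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=s3_expiration_config("AWSLogs/", days),
    )


if __name__ == "__main__":
    print(cloudwatch_retention_kwargs("/example/vpc-flow-logs", 30))
    print(s3_expiration_config("AWSLogs/", 30))
```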
For more information on retention policies for both CloudWatch Logs and S3:
Using the Parquet format should reduce your storage costs by roughly 20% compared to the default plain-text format used by VPC Flow Logs. This is, however, offset by a one-off Parquet conversion fee on initial ingestion (roughly 14%), so it brings negligible benefit for short-lived logs (kept for less than 1 month).
As it is a columnar format, Parquet also reduces the amount of data scanned when using Athena to query the data, which can further reduce costs.
Moving from the default 10-minute aggregation interval to the 1-minute option produces more records and therefore increases costs, so keep the default unless you have a specific reason to need more granularity.
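Both settings are chosen when the flow log is created. The sketch below builds the arguments for the EC2 `create_flow_logs` call with Parquet output to S3 and the default 10-minute aggregation interval. The VPC ID and bucket ARN are placeholders, the helper names are invented for this example, and the partitioning options shown are one reasonable choice, not a recommendation from this article.

```python
def flow_log_request(vpc_id: str, bucket_arn: str, *, parquet: bool = True,
                     aggregation_seconds: int = 600) -> dict:
    """Build keyword arguments for the EC2 create_flow_logs call."""
    # Flow logs support a 60-second or a 600-second (default) interval.
    if aggregation_seconds not in (60, 600):
        raise ValueError("aggregation interval must be 60 or 600 seconds")
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": "ALL",
        "LogDestinationType": "s3",
        "LogDestination": bucket_arn,
        "MaxAggregationInterval": aggregation_seconds,
        "DestinationOptions": {
            "FileFormat": "parquet" if parquet else "plain-text",
            # Hive-style prefixes make partitions easier to load in Athena.
            "HiveCompatiblePartitions": True,
            "PerHourPartition": False,
        },
    }


def create_flow_log(vpc_id: str, bucket_arn: str) -> dict:
    """Create the flow log. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder has no dependencies

    return boto3.client("ec2").create_flow_logs(
        **flow_log_request(vpc_id, bucket_arn)
    )


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    print(flow_log_request("vpc-0123456789abcdef0",
                           "arn:aws:s3:::example-flow-log-bucket"))
```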
If you are using Amazon Macie to scan your S3 buckets, make sure to exclude the bucket that holds your VPC Flow Logs to avoid unnecessary costs.
If you need real-time troubleshooting capabilities for a particular issue, you may consider temporarily creating a CloudWatch Logs group for VPC Flow Logs. This allows you to tail and scan in real-time, but be aware that this will incur additional costs.
In summary, to optimize your VPC Flow Logs costs in AWS, in most cases you will benefit by using Amazon S3 with Parquet enabled.
There are nuances in certain use cases, but for the majority of use cases this is likely to be the optimal setup.