Introduction on Sizing and Architecting for Object Storage
In this blog series, I’m going to cover sizing and architecting for Object Storage with Veeam Backup & Replication. Each part of the series covers a different set of topics.
The first part of Sizing and Architecting for Object Storage with Veeam covers the following topics:
- Calculating your real block size on disk, which helps you with sizing object storage for Capacity or Performance Tier in Veeam
- General Best Practices when using Veeam & Object Storage
How to get the REAL block size on disk (and calculate the average block size)
After explaining this topic in various user group sessions and presentations, it’s finally time to make it public: how to calculate the block size, why we need it and which tools to use.
Firstly, let’s start with the well-known block sizes we can select within our backup jobs. The options discussed here are 8 MB, 4 MB and 1 MB. Keep in mind that the 8 MB block size is only visible after you add a registry key on the VBR server. 8 MB is not recommended, but here is the registry key information anyway.
Registry Key Type: REG_DWORD
Key Name: UIShowLegacyBlockSize
Key Value: 1
Here are some general points regarding the block sizes:
- The higher the block size, the larger the incremental backup sizes (especially with 8MB)
- Still, all these options apply BEFORE deduplication and compression take place!
- Depending on deduplication & compression, the actual size of an object on disk can be very different!
- Reducing the number of blocks (by using a higher block size) reduces the number of API calls
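To make the last point concrete, here is a minimal Python sketch that estimates how many blocks a given amount of source data produces at each block size. The 1 TiB data size is a made-up example. The sketch also assumes one object per block, which ignores deduplication, compression and any metadata objects, so treat the numbers as a rough upper bound rather than what Veeam will actually write.

```python
MIB = 1024 * 1024

def block_count(source_bytes, block_size_mib):
    """Number of blocks needed to cover source_bytes at the given block size."""
    block_bytes = block_size_mib * MIB
    # Integer ceiling division, so a partially filled last block still counts.
    return -(-source_bytes // block_bytes)

source = 1024 * 1024 * MIB  # hypothetical 1 TiB of source data

for size in (1, 4, 8):
    print(f"{size} MB blocks: {block_count(source, size):,} blocks")
```

Going from 1 MB to 4 MB cuts the number of blocks to a quarter, which is where the reduction in API calls comes from.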
Why do I need to get the real block size and calculate an average block size?
Answering this question is very simple but still very complex: METADATA!
Block sizes heavily influence how metadata operations behave on an object storage system (especially on-premises), so they are needed for a proper sizing. The better you know which average object size to expect for your workload or your customer’s workload, the better the sizing you can provide for the object storage system.
In addition, most vendors need, or at least appreciate, an average block / object size to size their object storage solution. They size nodes, clusters and Cassandra (or other) databases around this piece of information.
So how can we collect all the values we need to get the real block sizes on disk and calculate the average object size? Thankfully, Matthias Mehrtens, a German Veeam SA, created a wonderful script that gathers the restore point statistics of your backup jobs. With that data collected, we can calculate the real block size per restore point and then derive an average block size across all restore points.
Here is the link to the script on GitHub: Get-RPStatistics
This script retrieves a summary of all existing backup restore points. It also displays the compression and deduplication factors and the sizes we need to work back to our blocks.
Let’s do the maths on how we properly calculate the average block size
- Convert the Read/Changed Size reported by the script into KB
- Backup File Size = Read/Changed Size * Dedupe Factor * Compression Factor
- Amount of Read Blocks = Read / Changed Size divided by the block size of your backup job settings
- Amount of Written Blocks = Amount of Read Blocks divided by the dedupe factor (not necessary here but still interesting)
- REAL Object Size = Backup File Size divided by the amount of Read Blocks
Finally, calculate the mean value of all calculated REAL Object Sizes.
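The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`read_kb`, `dedupe`, `compression`) and the sample values are made up and are not the column names of the script's output. It also assumes the dedupe and compression factors are fractions of the data that remains (for example 0.9 means 90 % is left), because the formula above multiplies by them. Check how your export expresses these factors before reusing it.

```python
from statistics import mean

def real_object_size_kb(read_kb, dedupe, compression, block_kb):
    """Apply the steps above to one restore point; all sizes are in KB."""
    backup_file_kb = read_kb * dedupe * compression  # Backup File Size
    read_blocks = read_kb / block_kb                 # Amount of Read Blocks
    return backup_file_kb / read_blocks              # REAL Object Size

def average_object_size_kb(restore_points, block_kb):
    """Mean REAL object size over all restore points that read any data."""
    sizes = [
        real_object_size_kb(rp["read_kb"], rp["dedupe"], rp["compression"], block_kb)
        for rp in restore_points
        if rp["read_kb"] > 0  # skip empty restore points to avoid dividing by zero
    ]
    return mean(sizes)

# Hypothetical restore points, assuming a job with a 4 MB (4096 KB) block size.
sample = [
    {"read_kb": 8192, "dedupe": 0.9, "compression": 0.7},
    {"read_kb": 4096, "dedupe": 1.0, "compression": 0.5},
]

print(f"{average_object_size_kb(sample, 4096):.2f} KB")
```

Note that under these assumptions the Read/Changed Size cancels out, so the real object size of a restore point is simply the job block size multiplied by both factors.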
Once you have run the script and understood the calculation path, there is obviously only one tool to process the data further: Excel
This small excerpt from a backup chain with Veeam shows you the following:
- You can see that the real average block size is not the 4 MB set in the backup job, as that setting always applies before deduplication and compression
- The rule of thumb of roughly 50 % data reduction does not apply in this example either. We have an average block size of 2.658 MB, which is 66.45 % of the configured 4 MB, i.e. roughly 33.55 % data reduction (good in this case)
- Depending on the average block/object size, the object storage vendor has to calculate its sizing
- Some object storage vendors do not have those problems, as they don’t care about block sizes or metadata operations (but most do, which is why Veeam writes KB articles and BP guides about it)
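As a quick check of the percentages in this example, the following Python sketch relates the average object size to the block size configured in the job. The function name is hypothetical; the two input values are the ones from the example above.

```python
def size_ratio_percent(real_mb, configured_mb):
    """Average object size as a percentage of the configured block size."""
    return real_mb / configured_mb * 100

ratio = size_ratio_percent(2.658, 4.0)
print(f"object size is {ratio:.2f} % of the configured block size")
print(f"data reduction is {100 - ratio:.2f} %")
```

The higher the ratio, the larger the objects that land on the object storage, and the fewer objects and metadata entries the system has to track for the same amount of data.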
You can find the Excel file I used to conveniently calculate the block sizes on my GitHub repository here. Feel free to use it.
Conclusion around calculating the average object size
As mentioned throughout this post, it’s good to know your average object size. It tells you how your workload behaves in terms of the deduplication and compression done by Veeam, and it gives you more control over your sizing, as some object storage vendors need those values.
Lastly, make sure to align with the best practices offered by the object storage vendor. This helps you get a properly sized object storage solution.
General Best Practices when Sizing and Architecting for Object Storage
After calculating block sizes, let’s move on to some general best practices and things to consider when using Object Storage with Veeam. The collection here applies to Cloud, On-Premises, Capacity Tier and Performance Tier and should help you avoid specific pitfalls. Feel free to comment on this post, so I can extend this list with more insights and credit you for them.
Let’s start with this: “Veeam backups put a high load on Object storage, so a proper design is required” (Source: Veeam Best Practice Guide – Object Repository)
- Size of the bucket: Some object storage vendors require you to stay below a specific limit of TBs stored per bucket. This is related to the architecture of the object storage solution, as it has to deal with a high number of stored objects and thus metadata, the number of API calls on that bucket and the throughput in general.
- Amount of Objects per bucket: Some object storage vendors require a limit on how many objects are stored per bucket, to maintain performance and avoid overloading the system.
- Amount of VMs per bucket: Furthermore, some vendors give you a rule of thumb to not store more than X VMs per S3 bucket, to reduce everything stated here: metadata operations, API calls and throughput.
- Multiple Object Storage buckets: For object storage systems where you are not confident that they can handle the load and metadata operations, consider multiple smaller buckets instead of a single bigger one. This is also very helpful when leveraging the public cloud, as there are per-bucket limitations which you can overcome by leveraging the SOBR mechanisms.
- NEVER CONFIGURE TIERING OR LIFECYCLE RULES: Do not configure these, as they are simply unsupported and can lead to various difficult and weird problems.
- Think twice about your block sizes: Increasing the block size is a tradeoff. It lowers the number of API calls, meaning fewer objects to look at on rescan and with regard to block generation. However, it raises storage consumption, so think twice when you choose between 4 MB and 1 MB block sizes. There are object storage systems out there that handle 1 MB block sizes like a charm!
- GFS Points in a Direct to Object Storage scenario: If you are using Immutability, try to use GFS points as well, as they are now considered a “true member of the backup chain”. The majority of the blocks will be put into the longest possible GFS restore point, which results in fewer API calls, and due to the compression and deduplication done by Veeam this will most likely “zero-out.”
- Immutability & Block Generation: Calculate the overhead when using Immutability on S3 / object storage, as block generation adds days to the retention (+10 days in most cases). You cannot simply lift and shift a block-based ReFS / XFS repository to object storage, as you will need to plan for more space on the object storage solution. Generally, 30 days are added with AWS S3 and IBM Cloud Object Storage. For all other types of object storage solution, it is 10 days. Make sure to read this article and understand what block generation does to your retention.
- DO NOT MANUALLY DELETE OR MODIFY OBJECTS IN AN OBJECT STORAGE BUCKET MANAGED BY VEEAM: I think this is pretty clear. Veeam takes care of deleting old objects based on the retention in your backup and backup copy jobs. Deleting objects manually could lead to data loss or weird problems.
- SECURITY CONSIDERATIONS: Consider encryption when pointing to the public cloud. Additional permissions based on IAM rules or STS are also worth considering when using AWS S3 or a compatible object storage system. There is a brilliant white paper out there discussing “Secure Direct Mode”.
- Say Goodbye to your Synthetic Full Backups: There is no synthetic full backup option with a Direct to Object (D2O) backup job any more. However, synthetic full backups were generally just incremental backups synthesized into a “full backup”, so this is no major loss.
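Two of the points above (per-bucket limits and the block generation overhead) come down to simple arithmetic, sketched here in Python. Everything in it is an assumption for illustration: the limits are placeholders for whatever your object storage vendor documents, the workload totals are made up, and the storage type labels are invented keys. The 30 and 10 day values are the ones quoted in the list above, treated here as a worst case.

```python
import math

def buckets_needed(total_tb, total_objects, total_vms, max_tb, max_objects, max_vms):
    """Smallest bucket count that respects all three per-bucket limits."""
    return max(
        1,
        math.ceil(total_tb / max_tb),
        math.ceil(total_objects / max_objects),
        math.ceil(total_vms / max_vms),
    )

# Days added by block generation, as quoted in the list above.
BLOCK_GENERATION_DAYS = {"aws-s3": 30, "ibm-cos": 30}
DEFAULT_BLOCK_GENERATION_DAYS = 10

def worst_case_locked_days(immutability_days, storage_type):
    """Configured immutability plus the block generation days to budget for."""
    extra = BLOCK_GENERATION_DAYS.get(storage_type, DEFAULT_BLOCK_GENERATION_DAYS)
    return immutability_days + extra

# Hypothetical workload and vendor limits.
print(buckets_needed(500, 300_000_000, 1200, 250, 100_000_000, 500))
print(worst_case_locked_days(30, "aws-s3"))
print(worst_case_locked_days(30, "s3-compatible"))
```

The bucket count is driven by whichever limit is hit first, which is why the sketch takes the maximum across all three limits instead of sizing on capacity alone.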
Conclusion
I really hope this collection of tips and general best practices helps you with Sizing and Architecting for Object Storage in a Veeam environment. There are more articles to come in this series, and I’m going to list them here.
Check out all Veeam related posts on my blog here: Veeam Blog Posts