Everything you need to know to architect and size for Object Storage with Veeam — Part 3

In this blog series, I cover the many considerations around sizing and architecting for using Object Storage with Veeam Backup & Replication. Each part of the series covers a different set of topics.

The third part of Sizing and Architecting for Object Storage with Veeam covers everything you need to know to architect, size, and design Object Storage for an on-premises deployment. It is not intended to cover sizing and architecting for cloud object storage.

  • Sizing and Architecting Object Storage for an On-Premises Deployment
  • Gateway Server, Backup Proxies and Data Movers, Physical or virtual Gateway Server, How Many Gateway Servers, Static or Automatic Gateway Server Selection
  • Object Storage Limits, Objects per Bucket, VMs per Bucket, TB per Bucket, Backup Job Settings (1 MB, 4 MB, 8 MB).
  • What are the caveats and advantages of bigger or smaller block sizes?

Before we dive into sizing, numbers, and architecting, let’s first look at which components are involved when pairing Veeam with Object Storage. This gives us an overview of everything involved in the process of backing up to (and restoring from) an object storage system.

  • Veeam Backup & Replication Console
  • Object Storage Repository
  • Backup Proxy
  • Gateway Server
  • Veeam Data Mover
  • Veeam Mount Server
  • Veeam vPower NFS Service

How do backups work within Veeam?

Basically, looking at “how backups work” with Veeam gives us the background information on which components are involved.

How Backup Works

As you can see, the first important step is at number 3, when the Veeam Backup Manager establishes a connection to both data movers. The source and the target data movers are the components initiating the data transfer from the source to the target backup repository.

However, when it comes to backing up directly to an on-premises object storage solution, the Veeam Data Mover cannot be installed persistently on that object storage system. In these D2O (Direct to Object) use cases, we need to utilize a Gateway Server.

Veeam Gateway Server Architecture

A Gateway Server is an auxiliary backup infrastructure component that “bridges” the backup server and the backup repository. In a D2O use case, the Gateway Server hosts and acts as the target Veeam Data Mover. Veeam establishes a connection between the source and target Veeam Data Movers, and the data transfer once again takes place between the two Data Movers. If you want to know more about the Veeam Gateway Server and its requirements and limitations, check out this Help Center page.

So let’s dive into the first component here: the Gateway Server.

Veeam Gateway Server Sizing and Placement

One sentence sums up Veeam Gateway Server sizing: treat a Gateway Server the same as a Backup Repository server when it comes to compute sizing. If you look at the Veeam Sizer, for example, you will also see the following:

Veeam calculator repository gateway

That means the original “repository” server, which has local disk or direct-attached / SAN-based storage, gets replaced by the “Gateway” server in a Direct to S3 use case. The recommendation from the best practice guide is to apply a 3:1 ratio against the proxy compute calculation, based on the number of parallel tasks required to fulfil the backup window. Think twice here, as a gateway server could also be used for backup copy jobs, which would need additional CPU / memory for those operations.

Simply think of taking the compute part of the repository server and placing it on the gateway server. It is important that you size the gateway properly, because it does all of the following (a quick sizing sketch follows the list):

  • acting as a Veeam Data Mover
  • setting object lock on the target
  • calculating the block generation
  • copying data from and to object storage
  • distributing traffic round-robin when multiple gateway servers are used
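
To make the sizing tangible, here is a minimal sketch of the 3:1 rule mentioned above. The per-task numbers (roughly 1 core and 4 GB RAM per gateway task, just like a repository server) are assumptions to verify against the current best practice guide:

```python
import math

def size_gateways(proxy_tasks: int, cores_per_server: int = 8) -> dict:
    """Rough gateway compute estimate derived from the proxy task count."""
    gateway_tasks = math.ceil(proxy_tasks / 3)   # 3:1 ratio from the BP guide
    cores = gateway_tasks                        # assumption: ~1 core per task
    ram_gb = gateway_tasks * 4                   # assumption: ~4 GB RAM per task
    servers = max(2, math.ceil(cores / cores_per_server))  # min. 2 for redundancy
    return {"tasks": gateway_tasks, "cores": cores, "ram_gb": ram_gb, "servers": servers}

# Example: 48 parallel proxy tasks sized for the backup window
print(size_gateways(48))
# -> {'tasks': 16, 'cores': 16, 'ram_gb': 64, 'servers': 2}
```

Add headroom on top of this if the same servers also handle backup copy jobs, as mentioned above.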

Direct Mode or not?

Whether to use “Direct” mode or “by a gateway” server depends on your network architecture and your desired span of control. In “Direct” mode, each source data mover connects directly to the object storage over HTTPS. If you use “by a gateway”, only the gateway server needs to be authorized to connect to the object storage system, which usually gives you better control. When you have several virtual machines acting as NBD / HotAdd proxies, it often makes sense to use the same machines as gateway servers, so the traffic flow takes place on only one machine that is both proxy and gateway server. One significant thing about parallelism in a D2O sizing case: Veeam uses one gateway server to process all disks of a single VM when “per-machine” backups are used.

This means that to get more parallel throughput, you should have many gateway servers in place. Not only will you gain some sort of HA and round-robin functionality, you will also gain more parallel throughput for your data.

Automatic or Static Gateway Selection?

There is a very big section in the best practice guide on that specific topic which I 100% recommend reading.

On top of everything written in that section, I’m giving you my personal opinion based on my experience from recent projects with D2O use cases.

I tend to stick to static gateway selection and avoid automatic gateway selection, for two reasons that are very important (at least to me):

  • Complete control and knowledge of the whole network traffic and flow by explicitly selecting the gateway servers I want to have in place
  • As soon as several repositories or large-scale multi-site environments are in place, there is no way around static selection, because deeply understanding the network flow is crucial.

In my experience, these two reasons are very relatable: almost everyone has seen a proxy / gateway server from a completely different country, network, or site suddenly take part in a backup job somewhere else. This is avoidable by statically selecting the gateway, applying the knowledge you have of your architecture.

Physical or Virtual Gateway Servers ?

You may think of all the obvious reasons to choose between virtual machines or physical servers as your Gateway server. Let me put them down here again, as this is very important to consider.

Virtual Machine (VM) Gateway Servers

Pros:

  • Flexible and scalable, plus easier management through the virtualization layer (snapshots, cloning, templates, and so on)
  • Cost-effective by reducing dedicated hardware
  • Highly available by leveraging HA mechanisms like vSphere HA/FT, DRS, or Hyper-V Clustering

Cons:

  • Performance limitations because of virtualization overhead
  • Potential bottlenecks, as a VM runs on shared resources
  • Additional latency, because reads / writes pass through the hypervisor
  • Full dependency on the virtualization stack for compute, network, and storage

Physical Machine Gateway Servers

Pros:

  • Best Performance because of direct access to storage and network interfaces without a virtualization layer / overhead
  • It’s a dedicated resource and can be combined with Direct SAN / NFS or Storage Snapshot Backup Proxy resources on a physical box
  • More resilient to host failures
  • Optimal for large-scale deployments, as you get high throughput and low latency over a direct connection

Cons:

  • Higher costs because of the dedicated hardware
  • Less flexible and scalable than a virtual machine
  • Unless you have several physical gateway servers in place, you are limiting your redundancy

My take on it is always “it depends”. I tend to select virtual gateway servers whenever the customer is heavily invested in their virtual environment and prioritizes flexibility and cost efficiency, paired with easy management through the virtualization layer.

If peak performance, network throughput, and direct storage access are critical to the customer, I always push for physical gateway servers, especially in large-scale environments with high backup and restore workloads.

How many Gateway Servers do I need?

The number of gateway servers you need depends on several factors:

  • Total amount of data to back up, because this defines your backup window and the required architecture
  • Concurrency in general, meaning how many backup and restore jobs run at the same time
  • Available network bandwidth from the source and to the object storage solution
  • S3 object storage performance: vendors have different limits on API calls, TB per bucket, and objects per bucket. A highly parallel workload, with multiple gateway servers and backup and restore jobs hammering those systems, can lead to problems here.
  • The customer’s RTPO requirements and the available backup window

Taken together, this information usually results in an initial deployment of a set of data movers, followed by careful monitoring and analysis of bottlenecks during backup & restore activities.

Think of a 4-hour backup window, which definitely needs a lot of CPU and RAM plus a very fast object storage solution. Such a window probably cannot be satisfied by a single 8-core / 16 GB RAM virtual machine, but rather by several bigger gateway servers; a rough calculation follows below. Remember again: sizing a gateway server for an on-premises D2O use case requires the same compute resources as sizing a repository server.
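
To illustrate with purely hypothetical numbers (200 TB of source data, roughly 50% data reduction, a 4-hour window), a quick back-of-the-envelope calculation shows why a single small VM will not do:

```python
# Hypothetical inputs -- replace them with your environment's real numbers.
source_tb = 200        # data to move within the backup window
reduction = 0.5        # assumption: ~50% via Veeam compression / deduplication
window_hours = 4

data_gb = source_tb * 1024 * reduction
rate_gbps = data_gb / (window_hours * 3600)   # sustained GB/s required

print(f"Sustained write rate needed: {rate_gbps:.1f} GB/s")
# -> ~7.1 GB/s, which calls for several well-sized gateway servers,
#    not a single 8-core / 16 GB RAM virtual machine.
```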

In conclusion, and in general, I would always recommend at least two gateway servers to have a minimum of redundancy in place, scaling in pairs of two up to the amount that satisfies all the points outlined above.

Proxy Servers and Data Movers

As per the “How Backup Works” article in the Veeam Help Center, the Veeam Backup Manager establishes a connection with the Veeam Data Movers on the backup proxy and on the target repository (which, if you remember, is a gateway server in a D2O use case). Skipping some steps here, but after everything has been established and connected, the source Veeam Data Mover reads the VM data from the VM disks and transfers it to the backup repository via the target Veeam Data Mover.

Combine all the information outlined above around gateway servers (treat them as a backup repository) with the correct proxy server sizing to determine how to size properly.

There is plenty of content out there already covering proxy server sizing, but I especially want to highlight the best practice guide here: Proxy Server Sizing

Furthermore, the Veeam Calculator and Scenario Builder help you determine how many proxies you need and what the sizing should look like.

Veeam Calculators

Object Storage Limitations relevant for sizing

The first two things to read and address when it comes to any sizing topic are the best practice guide and the “Veeam Ready program” website. Veeam backup and restore processes put a very high load on object storage systems, which is part of why I wrote this blog series. Here is some important general advice when sizing for a D2O scenario:

  • Some vendors have problems with big bucket sizes, so consult the vendor to determine what’s important
  • Some vendors need multiple smaller buckets to distribute load, bucket size, and object count across many buckets to function properly
  • Some vendors are sensitive to how many API calls Veeam issues against their object storage system, so again, consult the vendor to determine what’s appropriate and feasible
  • Never configure lifecycle rules or tiering mechanisms on object storage buckets used with Veeam Backup & Replication; it is not supported
  • Do not use “native S3 replication” or geo-redundant erasure-coded buckets to create your “HA” or failover object storage architecture. This is a very common (and unlucky) approach, but it is simply and strictly not supported. At first glance, a transparent failover with replicated erasure coding on an on-premises object storage system might sound good, but if anything breaks, Veeam will point to their documentation and tell you it is not supported. Either way, you do not want to build an unsupported architecture. Use Backup Copy Jobs or leverage a SOBR to create copies across your datacenters!

native s3 replication

Veeam Ready program and whitepapers

Another thing to consult is the “Veeam Ready program”, as outlined above. Depending on the vendor you choose, you will get additional testing information, documents, and whitepapers with which you can architect the object storage solution for your needs.

You will find lots of information on how to treat that object storage vendor / product within your environment, and it mostly comes down to the following things:

  • Required block size of the backup jobs for that specific vendor (1 MB, 4 MB, and so on). I wrote a whole blog post on that topic, so make sure to check it out, as it covers block sizes in depth!
  • Knowing the block size is crucial because some vendors size the number of object storage nodes based on the expected block size arriving on the system.
  • Recommended size of the S3 bucket for that specific vendor. Some vendors give you limits like 100 TB or 250 TB per bucket; some vendors have no limits on bucket size at all.
  • Recommended number of objects per bucket. Some vendors give you limits on how many million or billion objects a system, an account, or a bucket can handle.
  • Recommended network setup in general, meaning MTU size, LAG and MLAG configurations, and possibly how many API calls an object storage system, account, or bucket can handle.
  • Especially on API calls: there is a reason why Amazon AWS, for example, limits you to 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix. There is a good chance the on-premises object storage system has similar limits. Make sure to know them and work around them by carefully troubleshooting and analyzing if errors occur; a small probing sketch follows this list.
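
Veeam manages its S3 sessions internally, so you cannot tune its client this way. When troubleshooting throttling, however, it can help to probe the system independently with a small client and a defensive retry policy. Here is a minimal sketch using boto3; the endpoint and bucket names are hypothetical:

```python
import boto3
from botocore.config import Config

# "adaptive" retry mode adds client-side rate limiting on top of exponential
# backoff, so throttling responses (e.g. 503 SlowDown) are retried gracefully.
cfg = Config(
    retries={"max_attempts": 10, "mode": "adaptive"},
    max_pool_connections=50,   # cap the parallel connections of this probe
)

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.lab.local",  # hypothetical on-prem endpoint
    config=cfg,
)

# List a few objects; retries happen automatically per the policy above.
resp = s3.list_objects_v2(Bucket="veeam-backups", MaxKeys=10)
print(resp.get("KeyCount", 0))
```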

Advantages and caveats of smaller and larger block sizes

Block size is a big topic for object storage in general, and I have tried to cover it in depth in this post and the previous ones. Make sure to check out the other parts, which I’ve linked throughout.

Advantages of a larger block size

  • For Cassandra- or Elasticsearch-based object storage systems (“metadata database per node” architectures), a larger block size will most of the time result in better performance. This lies in the nature of the object storage architecture and needs careful calculation, as described in my blog post.
  • Larger blocks automatically result in fewer API calls to the object storage system.
  • If the object storage solution is HDD-based, or anything slower than direct flash / NAND, it can make sense to use a larger block size, as larger objects are better to read and especially to write on hard disks.
  • Depending on the size of the anticipated solution, larger block sizes can make petabyte-scale architectures easier to manage. This still depends on the object storage vendor and platform in general, but in theory this is the case.

Advantages of a smaller block size 

  • Enhanced compression because of the smaller size of the objects
  • Veeam’s default setting is 1 MB, which results in roughly 50% reduction through Veeam’s deduplication and compression, so roughly 512 KB arrives on the object storage system (see the sketch after this list)
  • A smaller block size results in more granular updates, as only small blocks / objects need to be updated during incremental runs, for example
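
As a back-of-the-envelope example of what these block sizes mean for object counts, assume roughly 2:1 data reduction (so a 1 MB source block lands as a ~512 KB object) and 100 TB of source data in a single bucket; both numbers are assumptions:

```python
source_tb = 100    # assumption: source data landing in one bucket
reduction = 0.5    # assumption: ~2:1 data reduction

for block_mb in (1, 4, 8):
    object_mb = block_mb * reduction
    objects = source_tb * 1024 * 1024 / object_mb
    print(f"{block_mb} MB blocks -> ~{objects / 1e6:.0f} million objects")
# 1 MB -> ~210 million, 4 MB -> ~52 million, 8 MB -> ~26 million objects;
# exactly the kind of numbers to check against per-bucket limits.
```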

Caveats and Considerations

  • Think twice about your block sizes: increasing the block size lowers the number of API calls, meaning fewer objects to handle on rescan and for block generation. However, it raises storage consumption, so weigh this up when choosing between 4 MB and 1 MB block sizes.
  • Performance vs. storage trade-off: while larger blocks may improve performance, they definitely increase the required storage, especially in the long run!
  • Immutability and Object Lock: this is probably one of the most important things to consider, because a larger block size reduces the number of “PutObjectRetention” API calls, which is faster. There is a wonderful forum thread on this topic; a rough illustration follows below.
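
To put rough numbers on the immutability point, take the object counts from the sketch above and assume one PutObjectRetention call per object when the retention period is extended (an assumption to verify against your version and the forum thread):

```python
seconds_per_day = 86400

# Object counts taken from the block-size sketch above (assumptions).
for block_mb, objects in ((1, 210e6), (4, 52e6)):
    rate = objects / seconds_per_day   # calls/s to extend retention within 24 h
    print(f"{block_mb} MB blocks: ~{rate:,.0f} PutObjectRetention calls/s")
# 1 MB -> ~2,431/s vs. 4 MB -> ~602/s; compare that with per-prefix limits
# such as AWS's 3,500 mutating requests per second mentioned earlier.
```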

As you can see, it is a complex decision based on the object storage system and the architecture around it. While an HDD-based solution built on a Cassandra DB benefits from larger block sizes, a direct-flash scale-out solution generally doesn’t care about the block size at all; it could even be 1 byte.

Ultimately, make sure to consult your partner / vendor on how that specific object storage solution works, as there are major differences, and not every tip written here applies to every object storage solution. This blog post is meant as a “know this, then check it” guide, after which you should consult your object storage vendor.

Let me know in the comments if you have further things which I need to add to this blog post or need to cover!

Conclusion on everything you need to know to architect and size for Object Storage with Veeam — Part 3

I really hope this third part of the blog series around Sizing and Architecting for Object Storage was interesting. Feel free to comment and give some hints and tips which I can add here. There are more articles to come in this series, and I’m going to list them here.

Check out all Veeam related posts on my blog here: Veeam Blog Posts

About Falko Banaszak

Falko is a Consulting Field Solutions Architect for Modern Data Protection based in Germany working at Pure Storage Inc. In the last 12 years he has built a strong focus on virtualization, BCDR and has a passion for Microsoft 365 & storage systems. He's a Veeam Vanguard, a Veeam Certified Engineer, Veeam Certified Architect and a Veeam User Group founder and leader in Germany.
