I just got out of my first 2009 Citrix Summit session in Las Vegas, and the information gleaned was well worth writing up:
In a basic sense, we got some very specific and concise configuration recommendations for XenDesktop deployments. I have combined my personal knowledge with the session material to come up with the current and applicable recommendations below:
We typically look at the following four areas with regard to scaling and usage in a fully provisioned virtual desktop environment. This allows us to manage storage sprawl and images across both virtual and physical machines.
Delivery Controllers: Used as the proxy to deliver a specific desktop with appropriate customizations to the end user.
Virtual Infrastructure: Used as the repository for virtual machines and also to abstract the hardware and drivers from the machines.
Provisioning Infrastructure: This element is used to deliver "golden images" and a boot environment to virtual and physical hardware and occasionally to provide the user level cache for machine settings.
Central SAN Infrastructure: Used to provide local disk cache, VM host disks, and a storage repository for the provisioning infrastructure.
There is inherently some level of complexity in sizing all of these infrastructure elements because each customer has a slightly different use case for each element. As an example, you could need a few or many power-user profiles for virtual machines, or you may need some number of users to have large amounts of specialized applications. Some users also may need dedicated hardware with advanced video capabilities, and so on. Many of these complexities have not been accounted for in this simplified overview of the scalable elements mentioned above. However, the basic approach in this post will give you a good place to start when trying to understand this sizing challenge.
Here is a basic overview from the depths of the lab regarding each element of the infrastructure:
Delivery Controller Baseline Stats: Based on a relatively beefy dual-socket, quad-core machine in the lab, you should be able to accommodate 5-10 connections per second per DDC. This equates to up to 10,000 users connecting in a 20-minute period.
Scaling and availability are always questions beyond this. Traditional approaches with load balancers and vertical scaling can be used to accommodate more concurrent connections than mentioned above.
Heartbeat connections to the farm are negligible and not considered a bottleneck with DDCs.
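For a quick sanity check of the DDC numbers, here is a minimal back-of-the-envelope sketch assuming the lab figure of 5-10 connections per second per DDC and a 20-minute logon window; the function names and the 25,000-user example are mine, not from the session:

```python
import math

# Illustrative DDC logon-storm math, assuming 5-10 connections/sec per DDC
# and a 20-minute logon window (figures from the lab session above).

def users_per_window(conn_per_sec, window_minutes=20):
    """Users a single DDC can broker in one logon window at a steady rate."""
    return conn_per_sec * window_minutes * 60

print(users_per_window(5), users_per_window(10))   # 6000 .. 12000 users per DDC

def ddcs_needed(total_users, conn_per_sec=5, window_minutes=20):
    """Minimum DDC count to log total_users on within the window (no HA margin)."""
    return math.ceil(total_users / users_per_window(conn_per_sec, window_minutes))

print(ddcs_needed(25000))   # e.g. 5 DDCs for a hypothetical 25,000-user morning storm
```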
Virtual Infrastructure Baseline Stats:
This is one of the most interesting sections of the scaling infrastructure because it has to take into account the number of virtual machines per host and per pool. All of this infrastructure is based on the use of XenServer 5.0 HF3. Currently, XenServer limits you to 48 hosts per pool, and the hardware tested spanned two types of dual-socket, quad-core machines. Machines deployed in the infrastructure were created with host cache on a NetApp SAN (not on the Provisioning Server). Virtual machines were configured with a single vCPU and 512 MB RAM. A testing profile was created to simulate "aggressive" usage of typical MS Office apps.
With this in mind, the two pieces of hardware got the following results:
| | Dual Quad Core, 1.9 GHz, 16 GB RAM | Dual Quad Core, 2.4 GHz, 32 GB RAM |
| --- | --- | --- |
| VMs per host | 29 | 58 |
| Physical hosts per pool (XenServer limit) | 48 | 48 |
| VMs in pool (verified) | ~1400 | ~2500 |
| ICA bandwidth per VM (average) | ~15 Kbps | ~15 Kbps |
| Storage IOPS per VM (average) | ~6 | ~5 |
| Storage IOPS per VM (peak) | ~20 | ~25 |
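To translate the table into pool-level numbers, here is a rough roll-up in Python; the per-VM figures come straight from the table, while the 90% headroom factor and function names are my own assumptions and should be tuned for your use case:

```python
# Rough pool-level roll-up of the lab table above. Per-VM figures are from the
# table; the 0.9 headroom factor is an assumption, not a session recommendation.

def pool_capacity(vms_per_host, hosts_per_pool=48, headroom=0.9):
    """Usable VMs per XenServer pool, leaving some headroom under the raw maximum."""
    return int(vms_per_host * hosts_per_pool * headroom)

def pool_iops(vm_count, iops_per_vm):
    """Aggregate storage IOPS the back-end SAN must absorb for the pool."""
    return vm_count * iops_per_vm

# 2.4 GHz / 32 GB host profile: 58 VMs per host, ~5 avg / ~25 peak IOPS per VM
vms = pool_capacity(58)                        # ~2500 VMs, in line with the verified figure
print(vms, pool_iops(vms, 5), pool_iops(vms, 25))
# -> roughly 12,500 IOPS average, with bursts toward 62,500 IOPS at peak
```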
Obviously, there is a lot of wiggle room in the above statistics depending on the actual use case. An environment heavy in power users will result in far less scalability, while an environment with less activity than the "aggressive" test case might see slightly more scalability and less demand on the back-end storage infrastructure.
Provisioning Infrastructure Stats:
Provisioning servers are doing a lot of work in a "best practice" XenDesktop infrastructure. They must stream the entire OS image to the VM hosts and could also be required to provide host cache to the VM hosts if those hosts are not configured to use so-called local disk cache. In light of this, the scalability of the Provisioning Server is highly dependent on the usage of host caching with the XenServer or workstation clients. With that in mind, the following table reflects the basic difference between using local host caching and not:
| | Local caching used | Local caching NOT used |
| --- | --- | --- |
| Provisioned hosts per PVS server | 500-750 machines | 250-450 machines |
The hardware used for the above case was a single dual-socket, quad-core machine with 8 GB RAM and a NetApp solution on the back end. Dual GbE NICs are used on the machine to accommodate host and storage connectivity, and 64-bit Windows Server 2008 is used to address all of that RAM. This is all done with PVS 5.0 SP2.
The vDisk allocation is assumed to be 20-40 GB per vDisk. Write-back cache is assumed to be 1-2 GB per active user in both cases.
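To see how those assumptions turn into raw storage, here is a minimal sizing sketch; the sample inputs (four golden images, 500 active users) are hypothetical and only meant to show the arithmetic:

```python
# Minimal PVS storage math under the assumptions above: 20-40 GB per vDisk and
# 1-2 GB of write-back cache per active user. Sample inputs are hypothetical.

def pvs_storage_gb(num_vdisks, gb_per_vdisk, active_users, cache_gb_per_user):
    """Gross storage (GB) for the vDisk store plus per-user write-back cache."""
    return num_vdisks * gb_per_vdisk + active_users * cache_gb_per_user

print(pvs_storage_gb(num_vdisks=4, gb_per_vdisk=30,
                     active_users=500, cache_gb_per_user=2))
# -> 1120 GB before any RAID or snapshot overhead on the NetApp back end
```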
Some open questions:
There are still a bunch of things that are unknown and should come with experience with this product across a myriad of customer environments:
How much I/O is required if there are various vDisks and larger OS instances in the environment (like Windows 7 and Vista)?
How do you accommodate or script failure of the pool master in a XenServer environment with no provisioning impact? (I have heard that this is being worked on.)
How can you size this appropriately in very mixed environments that use provisioning for XenApp workloads, regular desktops, and virtual desktops that may or may not use local caching?
When does the environment scale to the point of hitting issues with back-end storage infrastructure? What are the recommendations for accommodating this growth even if you start out small?
How do you apply security policies to this and leverage other application deployment methods?
Thankfully, our experience has given us the ability to answer these without having done the work in the lab. If you have questions, you know what to do. :)