Wednesday, May 6, 2009

Best Practices for XenServer Deployments

This is based on current XenServer 5.x capabilities and modern (2006 and later) hardware.

Sizing assumptions: assuming low to medium-low utilization workloads.
A 10:1 consolidation ratio is more typical than anything higher.
This assumes newer hardware with Intel VT or equivalent virtualization features on the chip. Intel is more common than AMD, and bottlenecks tend to move around: quad-socket, quad-core boxes will be challenged by I/O, while smaller boxes tend to be CPU or memory limited.
Always assume the XenServer management domain (Dom 0) uses a single CPU core by itself.
The rest of the cores are used for VMs.
Normal consolidation is approximately 4-6 VMs per core (modern CPUs, with RAM assumed to accommodate them).
For example, on an 8-core machine, 1 core is used by Dom 0 and the other 7 are available for VMs, so you might be able to get between 28 and 42 VMs on this hardware. This is VERY workload dependent.
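As a quick back-of-the-envelope check of that math, here is a small Python sketch. The function name and the per-core defaults are just my assumptions taken from the rules of thumb above, not anything XenServer enforces:

```python
# Rough VM-capacity sketch based on the rules of thumb above:
# one core reserved for Dom 0, 4-6 VMs per remaining core.
# The numbers here are assumptions from the text, not hard limits.

def estimate_vm_capacity(total_cores, vms_per_core_low=4, vms_per_core_high=6):
    """Return a (low, high) estimate of VMs per host."""
    usable_cores = max(total_cores - 1, 0)  # reserve one core for Dom 0
    return usable_cores * vms_per_core_low, usable_cores * vms_per_core_high

if __name__ == "__main__":
    low, high = estimate_vm_capacity(8)
    print(f"8-core host: roughly {low}-{high} VMs (workload dependent)")
```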

Templates should usually have one VCPU!
ONLY add VCPUs if the existing VCPUs are highly utilized and the workload is heavily threaded.
This seems counter-intuitive, but most VMs will run better and faster with only one VCPU.

Assume the XenServer control domain (Dom 0, which XenCenter manages) will use 328-880 MB of RAM, with an average of around 700 MB.
Memory is statically allocated to Dom 0, and you should not over-allocate RAM to your VMs.
Don't under-allocate VM memory either: you will end up with a lot of swap activity and therefore poor performance.
Leave some free memory on your servers: keep some extra for XenMotion and for growth of your VMs.
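Here is a minimal sketch of that host memory budget, assuming roughly 700 MB for Dom 0 and a couple of GB of headroom for XenMotion and growth. The helper and its defaults are illustrative, not official figures:

```python
# Memory-budget sketch: how many VMs of a given size fit on a host once
# Dom 0 and some XenMotion/growth headroom are set aside.
# All figures are rules of thumb from the text, not measured values.

def vms_that_fit(host_ram_mb, vm_ram_mb, dom0_ram_mb=700, headroom_mb=2048):
    usable = host_ram_mb - dom0_ram_mb - headroom_mb
    return max(usable // vm_ram_mb, 0)

if __name__ == "__main__":
    # e.g. a 32 GB host running 1 GB VMs
    print(vms_that_fit(host_ram_mb=32 * 1024, vm_ram_mb=1024), "VMs fit by RAM")
```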

Always use a dedicated network for storage.
The storage network should use 2 bonded NICs for availability if you are using NAS or iSCSI; FC runs on its own fabric. Local storage is not typically recommended because it will not allow for XenMotion and other features that require shared central storage.

For NAS storage:
A purpose-built NAS appliance is a better choice than a general-purpose server acting as NAS; you definitely want a write cache backed by battery-backed NVRAM.
For iSCSI storage:
iSCSI multipathing is typically best; use active-active load balancing only on arrays that actually support active-active pathing, and failover-only (active-passive) otherwise.
For FC storage:
Use an array with active-active multipathing and balance I/O across paths, typically round robin.
FC over IP is new and will be a factor in the future, but it is not typical for now.

If you want true integration with the "StorageLink" technology from Citrix Essentials, check the features page on the Citrix site, since this is a big factor for many folks in the storage decision. This technology makes a big operational and management difference!

Network sizing:
Typically, 6-10 VMs will saturate a physical Gigabit Ethernet port. Running VM interfaces in promiscuous mode reduces a host's effective traffic capacity, because all traffic passes out of a physical interface even when VMs on the same host are talking to each other. Normally, inter-VM traffic can exceed the actual outbound physical limits because that traffic never crosses the wire. A quick sketch of the port-count math follows.
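Assuming the 6-10 VMs per Gigabit port rule of thumb above, here is a minimal sketch of the port count (the function and its default are illustrative only):

```python
import math

# NIC-count sketch: physical GigE ports needed for VM traffic, using the
# 6-10 VMs-per-port rule of thumb from the text (an assumption, not a spec).

def gige_ports_needed(vm_count, vms_per_port=8):
    return math.ceil(vm_count / vms_per_port)

if __name__ == "__main__":
    print(gige_ports_needed(40), "GigE port(s) for 40 VMs")
```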

Use a dedicated VLAN and a pair of ports for management:
Use a NIC pair bonded for failover for management traffic. There are idiosyncrasies with management traffic and with changing this network once bonds have been created, so be cautious here.
Switch port mode: Access
Use a dedicated network that does not route for NAS or iSCSI based storage:
Routing this traffic adds latency and will hurt performance.
Switch port mode: Access
Use a dedicated network for VM traffic:
This is typically multiple interfaces with the same access to multiple VLANs. Once you have all of these interfaces up, you bond them, with the switch ports in trunk mode. You then tag the VLAN traffic to specific machines (see the sketch after this list).
Switch port mode: Trunk
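As a rough illustration of that last bullet, here is a hedged Python sketch that shells out to the XenServer xe CLI to bond two PIFs and add a tagged VLAN network on top of the bond. The PIF UUIDs, network names, and VLAN tag are hypothetical, and exact xe behavior can vary between XenServer releases, so treat this as a sketch of the flow rather than a copy-paste recipe:

```python
import subprocess

def xe(*args):
    """Run an xe CLI command on the XenServer host and return its output."""
    return subprocess.run(["xe", *args], check=True,
                          capture_output=True, text=True).stdout.strip()

# Hypothetical PIF UUIDs for the two physical NICs carrying VM traffic.
PIF_A = "uuid-of-eth2"
PIF_B = "uuid-of-eth3"

# 1. Create a network to hold the bond, then bond the two PIFs onto it.
bond_net = xe("network-create", "name-label=VM-bond-network")
xe("bond-create", f"network-uuid={bond_net}", f"pif-uuids={PIF_A},{PIF_B}")

# 2. Create a tagged VLAN (e.g. VLAN 100) on top of the bonded PIF.
bond_pif = xe("pif-list", f"network-uuid={bond_net}", "--minimal")
vlan_net = xe("network-create", "name-label=VM-traffic-VLAN100")
xe("vlan-create", f"pif-uuid={bond_pif}", "vlan=100",
   f"network-uuid={vlan_net}")
```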




Playing with the Cloud

This post is a work in progress and will be updated periodically to reflect industry trends and new methodologies . . .

In general, a cloud is the abstraction of a variety of networked services to be used in a generic fashion by many different applications, desktops, etc.

Citrix has done some interesting things to package and extend their cloud (C3) product for use by customers, including labs and other scenarios. Right now, the cloud is a generic hypervisor layer teamed with some other resources to allow for more robust demonstrations with minimal effort. Amazon is using the Xen hypervisor in its EC2 offering, but has wrapped its proprietary tools around the infrastructure in order to monetize and control its generic infrastructure for its customers. I have not yet seen nice tools to push VM formats in and out of this infrastructure for seamless migration between public and private clouds, but I am sure I am not the only one thinking about this automation.

Hierarchically, there are several layers to this approach (Citrix and Amazon):

Infrastructure as a service  (eg. Citrix C3)
Platform as a Service (eg. Citrix C3 labs)
Software as a Service  (eg. Citrix online --and I assume Dazzle) 

C3 is basically a branded bundle of Citrix XenServer, Essentials, and Workflow management used for creating a private cloud. There is also a cloud "bridge," which is the name for a bundle of NetScaler products with WANScaler, used for acceleration between the corporate datacenter and the cloud itself (public or private).

Now, the trick to setting up your own cloud and using it for real applications is understanding the limitations and soft factors that will impact application performance and end-user experience for applications deployed within the cloud infrastructure. Some applications are "easy" to move onto cloud infrastructure, and some are difficult.

In general, the deciding factors between "easy" and "hard" come down to I/O and workload predictability. Easy workloads are stateless services, such as web services and applications. Difficult items to deliver in a commoditized cloud are data services with transactional requirements that are very state or time sensitive (like a big ole Oracle DB).

So, there are a number of tricks employed to make cloud management and workload management simpler as the challenges and demands on the cloud become more dynamic.
Let's take a nice example: I have a lab environment that I can test in the Amazon cloud. It involves a few machines running several Windows hosts with SharePoint and other services. I can test a proof of concept of the interoperability and other items in the cloud, but as we near a production or "live" deployment of the architecture, I will need a different set of architectural elements in order to meet the end-user experience requirements. With 10,000 users on the system, my I/O and back-end storage requirements may explode in an unpredictable manner, which may require that I deploy in my own cloud, where I can scale the back-end components that drive end-user experience quality. The reality really comes down to economics: I may be able to make production versions of the above example work in the cloud, but will it be cost effective versus the granular approach that I can take on my own?

Now, the realities of the industry are that the easy workloads can now be deployed in commoditized clouds, but difficult workloads like my example require more granular control.

The market is maturing however!   
Some interesting technologies that will enable this sort of difficult workload in commodity datacenters include the following:
  1. Virtual switching technologies in multi-tenant hardware environments (XenServer vSwitching and the Cisco Nexus 1000V, as examples)
  2. I/O shaping and QoS for VMs
  3. Application security and workflow management for provisioning services and migrating workloads based on application load, across all the factors that influence application delivery
  4. The ability to migrate workloads seamlessly between personal hypervisors, lab environments, and private clouds
Knowledge comes from experience, so you should start playing with this stuff now and see what the economics mean for you and your applications.

So, how the heck do I get started? Well, there are a couple of things you can do:
Go to the Amazon EC2 cloud and use your credit card to stand up some nice simple services that can be used for simple demos. Geek out!

If you happen to have some hardware sitting around, you can also build things yourself.
In the simplest sense, this might involve creating a local pool of server resources with XenServer (a one-line example follows).
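Assuming the pool master already exists, joining another host to the pool is a single xe command. The sketch below just wraps it in Python; the master address and credentials are placeholders for your own environment:

```python
import subprocess

# Join this XenServer host to an existing pool. The master address and
# credentials below are placeholders, not real values.
subprocess.run(
    ["xe", "pool-join",
     "master-address=192.0.2.10",      # hypothetical pool master IP
     "master-username=root",
     "master-password=CHANGE_ME"],
    check=True,
)
```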

If you really want to make things work in an interesting fashion, Workflow Studio along with some automated provisioning would allow for some interesting use cases, like watching server load on particular infrastructure elements and then automatically starting more capacity as requirements increase (a rough sketch of that control loop follows).
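Just to make that idea concrete, here is a minimal Python sketch of such a control loop. The load check and the provisioning call are hypothetical stubs standing in for whatever monitoring and Workflow Studio or xe automation you actually wire up; only the loop structure is the point.

```python
import time

CPU_THRESHOLD = 0.80   # assumed scale-out trigger (80% average CPU)
CHECK_INTERVAL = 60    # seconds between checks

def average_pool_cpu_load():
    """Hypothetical stub: return average CPU utilization (0.0-1.0) of the pool."""
    return 0.42  # replace with real monitoring (RRDs, SNMP, etc.)

def start_additional_capacity():
    """Hypothetical stub: start another VM or host, e.g. via xe or Workflow Studio."""
    print("Scaling out: starting additional capacity...")

def watch_and_scale():
    while True:
        if average_pool_cpu_load() > CPU_THRESHOLD:
            start_additional_capacity()
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    watch_and_scale()
```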

A truly automated environment could become quite complex, but the end result is automated simplicity once the market is mature and ready for your particular scenario. I will be posting some details on how to set up application-specific infrastructures in different clouds as we gain more practical experience in this area . . . .

Sunday, May 3, 2009

Citrix Xendesktop Scaling and Recommendations

I just got out of my first session at Citrix Summit 2009 in Las Vegas, and the information gleaned was very informative.
In a basic sense, we got some very specific and concise recommended configuration notes for XenDesktop deployments. I have combined my personal knowledge with the session content to come up with the current, applicable recommendations:

We typically look at the following four areas with regard to scaling and usage in a fully provisioned virtual desktop environment. This allows us to manage storage sprawl and images across both virtual and physical machines.

Delivery Controllers:  Used as the proxy to deliver a specific desktop with appropriate customizations to the end user.
Virtual Infrastructure:  Used as the repository for virtual machines and also to abstract the hardware and drivers from the machines.
Provisioning Infrastructure:  This element is used to deliver "golden images" and a boot environment to virtual and physical hardware and occasionally to provide the user level cache for machine settings.
Central SAN Infrastructure:  Used to provide local Disk Cache, VM host disks and a storage repository for the provisioning infrastructure.

There is inherently some complexity in sizing all of these infrastructure elements because each customer has a slightly different use case for each one. For example, you could need a few or many power-user profiles for virtual machines; you may also need some number of users with large sets of specialized applications; and some users may need dedicated hardware with advanced video capabilities, etc. Many of these complexities are not accounted for in this simplified overview of the scalable elements mentioned above. However, the basic approach in this post will give you a good place to start when trying to understand this sizing challenge.

Here is a basic overview from the depths of the lab regarding each element of the infrastructure:

Delivery Controller baseline stats: Based on a relatively beefy dual-socket, quad-core machine in the lab, you should be able to accommodate 5-10 connections per second per DDC. That equates to up to roughly 10,000 users connecting in a 20-minute period (a quick check of that arithmetic is sketched below).
Scaling and availability are always questions beyond this; traditional approaches with load balancers and vertical scaling can be used to accommodate more concurrent connections than mentioned above.
Heartbeat connections to the farm are negligible and are not considered a bottleneck for DDCs.
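Here is a minimal sketch of that logon-storm arithmetic, treating the 5-10 connections per second figure from the session as the only input (the function itself is purely illustrative):

```python
# Logon-storm sketch: users a single DDC can admit in a given window,
# using the 5-10 connections/second figure quoted above.

def users_in_window(conn_per_sec, window_minutes=20):
    return conn_per_sec * window_minutes * 60

if __name__ == "__main__":
    low, high = users_in_window(5), users_in_window(10)
    print(f"One DDC over 20 minutes: roughly {low}-{high} user connections")
```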

Virtual infrastructure baseline stats:
This is one of the most interesting parts of the scaling picture because it has to account for the number of virtual machines per host and per pool. All of this infrastructure is based on XenServer 5.0 HF3. Currently, XenServer limits you to 48 hosts per pool, and the hardware tested covered two types of dual-socket, quad-core machines. Machines deployed in the infrastructure were created with host cache on a NetApp SAN (not on the Provisioning Server). Virtual hosts were also configured with a single VCPU and 512 MB RAM. A testing profile was created to simulate "aggressive" usage of typical MS Office apps.
With this in mind, the two pieces of hardware got the following results:

 

                                           Dual quad-core,       Dual quad-core,
                                           1.9 GHz, 16 GB RAM    2.4 GHz, 32 GB RAM

VMs per host                               29                    58
Physical hosts per pool (XenServer limit)  48                    48
VMs in pool (verified)                     ~1400                 ~2500
ICA bandwidth per VM (average)             ~15 Kbps              ~15 Kbps
Storage IOPS per VM (average)              ~6                    ~5
Storage IOPS per VM (peak)                 ~20                   ~25


Obviously, there is a lot of wiggle room in the above statistics depending on the actual use case. An environment heavy in power users will see far less scalability, while an environment with less activity than the "aggressive" test case might see slightly more scalability and less demand on the back-end storage infrastructure.
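To see why the back-end storage matters, here is a small sketch that turns the per-VM IOPS figures from the table into an aggregate pool-level requirement (the numbers plugged in are the table's averages and peaks, nothing more):

```python
# Aggregate-IOPS sketch: multiply per-VM IOPS from the table by pool size
# to get a feel for what the back-end storage array must sustain.

def pool_iops(vm_count, iops_per_vm):
    return vm_count * iops_per_vm

if __name__ == "__main__":
    # ~2500 VMs in the larger pool, ~5 IOPS average / ~25 IOPS peak per VM
    print("Average:", pool_iops(2500, 5), "IOPS")
    print("Peak:   ", pool_iops(2500, 25), "IOPS")
```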

Provisioning infrastructure stats:
Provisioning Servers do a lot of work in a "best practice" XenDesktop infrastructure. They must stream the entire OS image to the VM hosts, and they can also be required to hold the write cache for the VMs if the hosts are not configured to use so-called local disk cache. In light of this, the scalability of a Provisioning Server depends heavily on whether host caching is used on the XenServer or workstation clients. With that in mind, the following table reflects the basic premise of using host caching versus not:

 

                                      Local caching used    Local caching NOT used

Provisioned hosts per PVS server      500-750 machines      250-450 machines


The hardware used for the above case was a single dual-socket, quad-core machine with 8 GB RAM and a NetApp solution on the back end. Dual GigE NICs are used on the machine to accommodate host and storage connectivity. 64-bit Windows Server 2008 is used to address all of that RAM. This was all done with PVS 5.0 SP2.

Each vDisk is assumed to be 20-40 GB. Write-back cache is assumed to be 1-2 GB per active user in both cases.
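A quick sketch of what those assumptions mean for PVS storage, using the vDisk and write-cache figures above (the image count and user count fed in are just illustrative inputs):

```python
# PVS storage sketch: rough space needed for vDisks plus per-user write cache,
# using the 20-40 GB per vDisk and 1-2 GB per active user assumptions above.

def pvs_storage_gb(vdisk_count, active_users,
                   gb_per_vdisk=40, gb_write_cache_per_user=2):
    return vdisk_count * gb_per_vdisk + active_users * gb_write_cache_per_user

if __name__ == "__main__":
    # e.g. 3 golden images and 500 active users at the high end of the estimates
    print(pvs_storage_gb(3, 500), "GB of storage (worst-case estimate)")
```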


Some open questions:
There are still a bunch of things that are unknown and should become clearer with experience with this product across a myriad of customer deployments:
How much I/O is required if there are various vDisks and larger OS instances in the environment (like Windows 7 and Vista)?
How do you accommodate or script failure of the pool master in a XenServer environment with no provisioning impact? (I have heard that this is being worked on.)
How can you size this appropriately in very mixed environments that use provisioning for XenApp workloads, regular desktops, and virtual desktops that may or may not use local caching?
When does the environment hit issues with back-end storage infrastructure as it scales? What are the recommendations for accommodating this growth even if you start out small?
How do you apply security policies to this and leverage other application deployment methods?

Thankfully, our experience has given us the ability to answer these without having done the work in the lab.  If you have questions, you know what to do.  :)