This blog was originally started to better help me understand the technologies in the CCIE R&S blueprint; after completing the R&S track I have decided to transition the blog into a technology blog.

CCIE #29033

This blog will continue to include questions, troubleshooting scenarios, and references to existing and new technologies, but will grow to include a variety of different platforms and technologies. So far I have created over 185 questions/answers on the CCIE R&S track. Note: answers are in the comment field or within the "Read More" section.

You can also follow me on Twitter @FE80CC1E


Monday, August 22, 2011

Virtualization - VMware Clusters and Blade Servers

Blade chassis help consolidate the data-center footprint and provide an excellent platform for virtualization. On a variety of installations, however, I have noticed a failure to ensure proper placement of the primary nodes when a VMware cluster is installed on blade chassis. Primary placement is critical to the availability of VMs in the event of a blade chassis failure.

There are 5 primary nodes per cluster, and they are selected as the nodes are added to the cluster. Primary nodes hold all cluster settings and node states, and this information is replicated to all primaries. Heartbeats are sent from primary to primary nodes and from secondary to primary nodes. A secondary node is not promoted automatically when a primary node fails: if a failed primary is not removed from the cluster, no secondary becomes a primary. If the failed primary is removed from the cluster, a secondary node becomes a primary, but which secondary is chosen is random, which further complicates balancing the primary nodes.

The diagram below shows 3 blade chassis in RACK A running VMware and the problems that occur when the primary nodes are placed improperly. In this case the blade chassis are HP C7000 series. ESX was installed in the order of the blade server slots assigned by the blade chassis (the installation includes adding the nodes to the cluster), so all 5 primary nodes ended up on one blade chassis. The issues are identified in the diagram.
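The placement problem described above can be modeled in a few lines of Python. This is only a rough sketch, not VMware tooling: it assumes the first 5 hosts added to the cluster become the primaries, and the host names, the `chassis_of` mapping, and the 16-blades-per-chassis layout are hypothetical values chosen for illustration.

```python
from collections import Counter

PRIMARY_COUNT = 5  # primaries per cluster, as described in this post


def primaries(add_order, count=PRIMARY_COUNT):
    """Return the hosts assumed to become primaries: the first `count` added."""
    return add_order[:count]


def chassis_holding_all_primaries(add_order, chassis_of):
    """Return the one chassis that holds every primary, or None if they are spread out."""
    counts = Counter(chassis_of[host] for host in primaries(add_order))
    if len(counts) == 1:
        return next(iter(counts))
    return None


# Hypothetical layout: 3 chassis with 16 blades each, named c<chassis>-b<slot>.
chassis_of = {
    f"c{c}-b{b:02d}": f"chassis{c}"
    for c in (1, 2, 3)
    for b in range(1, 17)
}

# Installing in slot order fills chassis 1 first, then chassis 2, then chassis 3.
slot_order = sorted(chassis_of)

print(primaries(slot_order))
print(chassis_holding_all_primaries(slot_order, chassis_of))
```

With a slot-order installation, the check reports a single chassis holding all five primaries, which is the failure scenario this post warns about.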



To ensure that a blade chassis failure does not impact your ESX cluster, stagger the installation (which includes adding each node to the cluster) across the blade chassis. (HP has a split back-plane, which adds additional levels of resiliency.) The installation process is shown in the diagram below.
As you can see, this ensures that a blade chassis failure will not impact your ESX cluster. Blade technology and virtualization provide a lot of opportunities for the business, and with proper planning and execution you can ensure that the business realizes these benefits. There is no way to change the primary node placement without removing nodes from the cluster and strategically adding them back. This applies to VMware 4.x and earlier.
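One way to plan a staggered installation is to interleave the blades round-robin across the chassis before installing. The sketch below is an illustration only: the chassis and blade names are hypothetical, and it assumes, as this post describes, that the first 5 hosts added to the cluster become the primaries.

```python
from itertools import zip_longest


def staggered_order(blades_by_chassis):
    """Interleave hosts round-robin across chassis: one from each, then repeat."""
    order = []
    for group in zip_longest(*blades_by_chassis.values()):
        # zip_longest pads shorter chassis with None; skip the padding.
        order.extend(host for host in group if host is not None)
    return order


# Hypothetical inventory: 3 chassis, 4 blades each.
blades_by_chassis = {
    "chassis1": ["c1-b01", "c1-b02", "c1-b03", "c1-b04"],
    "chassis2": ["c2-b01", "c2-b02", "c2-b03", "c2-b04"],
    "chassis3": ["c3-b01", "c3-b02", "c3-b03", "c3-b04"],
}

order = staggered_order(blades_by_chassis)
first_five = order[:5]  # the hosts assumed to become primaries
print(first_five)
```

In this example the first five hosts land as two on chassis 1, two on chassis 2, and one on chassis 3, so losing any single chassis leaves at least three primaries running.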

Update: vSphere 5.0 uses the concept of master/slave and you can read more about it here

1 comment:

blade server said...

As a Dell employee I think your blog about blade server is quite impressive. I think a blade server is a server chassis housing multiple thin, modular electronic circuit boards, known as server blades. Each blade is a server in its own right, often dedicated to a single application.
