Stairway to AlwaysOn HA Level 3

Welcome to level 3 of the Stairway to AlwaysOn. In this article we’re going to look closer at the key components required to form a Windows Server Failover Cluster. An AlwaysOn Availability group and a Failover Clustered Instance both require a functional WSFC. Given this, we need to understand a little more about what happens at the Windows cluster level. This will help not only the design and deployment of the Highly Available SQL Server groups\instances but also the ongoing support, too.

With this in mind let’s look deeper into the following required objects:

  • Windows Server Active Directory Domain
  • DNS server
  • DHCP server
  • TCP/IP network
  • Computer nodes with a minimum of Windows Server Enterprise edition (Windows 2008) or Standard Edition (Windows 2012).

But what is a WSFC, and how does it work?

Let’s look at a bare cluster without applications installed to see what comprises the base configuration. In this scenario we’ll have a 5 node cluster with no shared storage as we plan to deploy an AlwaysOn Availability group configuration at a later date with no Failover Cluster Instances included.

Windows Server Failover Clustering uses a specific Windows Server feature. This feature must be enabled before clustering can be achieved. It is commonly enabled via the “Roles and Features” wizard in the Windows Operating system, but you may also use PowerShell.

Once the feature has been enabled, Failover Clustering uses a Windows service on each node that is a member of the failover cluster. The service is aptly named the “Cluster service”. The service forms a communication forum with partner nodes that is used to both synchronise the nodes and monitor the health of clustered applications.

In the event an application does not respond for some reason, a series of steps are taken based on the resource policies. If the failure is a software failure, for instance, the clustered SQL Server service terminated, and a restart is usually attempted on the same computer. If the failure is a hardware failure, a failover of the clustered application would be imminent.

If a failover results in a restart of the service for the clustered application, the restart of the service will usually occur on a partner node. This means that any connections to the service, in this case the database engine, are lost. The SQL Server database engine will restart and recover each database when the service restarts, rolling forward committed transactions or rolling back uncommitted transactions appropriately from the transaction logs.

As mentioned previously, the cluster service also monitors the general state\health of the server Operating System and the clustered applications. For the SQL Server database engine there is a periodic check to ensure that the service “IsAlive”. For the Operating System the isAlive check occurs roughly every 5 seconds.

The IsAlive check occurs every 60 seconds and is measured over a 5 second period. A lack of reply within a 5 second timeframe indicates to the cluster service that the node is unavailable. A failure to respond within a set timeline will initiate a failover of the application\resources.

Cluster Common Property

Default for All Clustered Roles Except Hyper-V

Default for Hyper-V Clustered Role

SameSubnetThreshold

5 seconds

10 seconds

CrossSubnetThreshold

5 seconds

20 seconds

More information may be found on this at this link: http://technet.microsoft.com/en-gb/library/dn265972.aspx#BKMK_NodeHealth

So, how does this all sit together and what would a typical configuration look like? Looking at our diagram below, we can see the typical network view of a base WSFC. We can see our Active Directory domain controller (which is also providing securely replicated DNS zones), our TCPIP networks and our cluster nodes.

 

 

Let’s also look at our infrastructure dependencies; to do this we’ll build a tree. The overall dependency tree (root on the bottom) for the required components would be as follows:

 

 

The blue dotted line indicates an optional dependency on an FCI, this would represent a configuration where your AlwaysOn Availability group contains a replica that is a Failover Cluster Instance of SQL Server.

Breaking the dependencies down, we can see the following:

TCP\IP Network

Starting with the basics, we require a solid TCP\IP network. This takes the typical form of Ethernet switches and cabling, traditionally copper based CAT5e (1Gbps). Computer objects plug into the various network switches via a patch\CAT5 cable.

Due to changes and improvements in the Windows Server Cluster model, it is no longer required to have a dedicated heartbeat adapter. Since Windows 2003, the changes have been extensive, with the greatest changes occurring between Windows 2003 and Windows 2008. The cluster model now utilises a special virtual adapter, which binds itself to an available physical NIC for inter node heartbeat traffic. Details of this virtual adapter would look similar to the screenshot below

 

Note: It should be understood that although you no longer need a dedicated heartbeat NIC you do still require multiple redundant TCP\IP networks.

The 5 nodes are connected through the public\client network; this network is required to be able to service clients connecting to the SQL Server instance. This network is also monitored by the cluster, and a failure here will initiate a failover of the SQL Server resources. There is also a redundant TCP\IP network to provide resilience. This is not strictly required, but best practice dictates that separate redundant networks are available. Separate networks may also be used for database mirroring endpoint traffic or for a SQL Server backup network. The cluster virtual adapter uses an address in the Automatic Private IP (APIP) range to ensure segregated communication between the cluster nodes.

Domain Name System

The next requirement for a reliable Microsoft client network is the presence of a Domain Name System (DNS) server. In a Microsoft Client network, computer objects (servers and workstations) are given a unique computer-name, and they are also assigned a unique IP address. The name and IP address are registered locally on the computer’s Network Interface Card (NIC). However, when you refer to another computer object by name on the TCP\IP network, you need to be able to resolve the name to a valid TCP\IP address; this is where the DNS server comes in.

The DNS server can be a network computer that performs this role specifically, or it can be a Windows Server Domain Controller that has the DNS role enabled. DNS uses zones to resolve groups of computer names. A zone may be either a forward lookup zone (mandatory) or a reverse lookup zone (optional). The 2 zone types operate as follows;

  • Forward lookup zone: resolve a computer name to an IP address
  • Reverse lookup zone: resolve a TCP\IP address to a computer name

Dynamic Host Configuration Protocol (DHCP)

Although not shown in the tree and not strictly necessary, we’ll look at DHCP.

Should you wish to automatically manage the assignment of TCP\IP addresses, you will require a DHCP server. This capability could, in a typical deployment, be placed on the DNS server. DHCP is the protocol that assigns a new computer an IP address upon boot from a reserved pool specified by the DHCP administrator. A typical view of the network configuration bound to your Network Interface Card via DHCP might look something like the image below (for the most part this is controlled by your system administrator).

 

 

Active Directory Domain

At the minimum you will have one Windows Server computer with the Active Directory Domain Services role installed. Once this role is installed a Windows Domain is created by executing DCPROMO.EXE. This is a wizard driven deployment and once complete, provides the network clients with Active Directory authentication and management of network resources. Examples of such resources are:

  • Windows Server Failover Clusters
  • File Shares
  • Printers
  • Users

 

Note that during the promotion of a Windows Server computer to a Domain Controller, if no DNS server is bound to the local network adapter you will be offered the option to install the DNS role. You also have the option to store the DNS zones within the Active Directory catalog. This makes replication of the DNS zones extremely efficient and secure.

The Active Directory domain provides central administration for users, computers, printers, etc. This enables a user account, which is used to run a Windows service, to access network resources unchallenged provided the necessary permissions (Access Control Lists) have been applied. Remember a running service cannot simply provide logon details; the authentication is performed dynamically to a resource (for instance a Windows Share) and must be unchallenged.

Windows Server Failover Cluster

Before you can deploy a Windows Server Failover Cluster you must first have at minimum:

  • a TCP\IP network
  • an Active Directory domain
  • a DNS server

These components represent the minimum requirement. The domain and the cluster cannot function without the above resources. As a DBA it helps us if we understand these basics. This provides you with a clearer picture of the required infrastructure.

The Windows Server Failover Cluster groups a defined set of nodes into one central management entity. The first action is to create a Client Access Point for the Windows cluster itself. This requires a unique virtual network-name and a unique virtual IP address. The term “Virtual” simply means that the object is not bound solely to a physical computer. The resource is virtual because it may be passed from one computer node to another.

When the resources (name and IP) are failed over between nodes, they are first unregistered from the current owner node and then re registered on the new partner node. The action of unregistering and re registering simply means they are unbound from the NIC on one server and bound to the NIC on the new computer.

With the Windows Server Failover Cluster created, the cluster nodes are managed via a single access point. This was defined during the cluster installation. Assuming no requirement for shared storage, the cluster is now ready to receive clustered applications such as Microsoft Exchange, SQL Server or File\Print server services.

 

SQL Server Failover Cluster Instance

A failover cluster instance of SQL Server provides a way of mitigating a serious hardware or software failure which would otherwise render the SQL Server instance permanently offline. Failover Cluster Instances have, for sometime, been a popular way of providing Highly Available SQL Server Instances, the only drawback is the requirement for shared storage mediums. Shared storage introduces the storage Single Point of Failure. A loss of the critical shared storage will render the SQL Server instance offline with no ability to restart until the storage issue is resolved.

It’s important to understand that each new instance deployed requires its own unique Virtual Network name and Virtual IP address. Instances cannot share disk file paths, Virtual IP addresses or Virtual Network names, they must all be unique.

The SQL Server Failover Cluster Instance is completely dependent upon the Windows Server Failover Cluster to provide resource management.

SQL Server AlwaysOn Availability Group

SQL Server Availability Groups are the new High Availability technology in SQL Server 2012. The AlwaysOn configuration utilises a WSFC to provide management for the group resources. AlwaysOn groups generally contain a set of primary databases. The database copies, combined into Availability Groups, are created across a set of “Replicas” and can provide multiple read only copies of a Primary database. A “Replica” is the term used for an instance of SQL Server (clustered or non-clustered) that is part of an AlwaysOn group.

New in SQL Server Availability groups is the option to create a Listener service. This provides a central Client Access Point for databases within the Availability group to which the Listener is bound. After the successful configuration of read only routing, the Listener will direct read only intent traffic to secondary read only databases, easing the load on the Primary database.

We’ll see more of this in a later stairway level, specifically for AlwaysOn.

That completes the overall view of the infrastructure requirements\dependencies; you should hopefully have a solid understanding of the basic components that a Highly Available SQL Server Instance\AlwaysOn Group requires.

In the next Stairway level we will look at the deployment a Windows Server Failover Cluster and gain a deeper insight into the steps required to provide a solid\stable cluster environment.