Windows 2003 Server Network Load Balancing (NLB) for IIS based SMTP services

Published June 17, 2008

In this post, I will explain how to set up 2 or more Windows 2003 servers to use the built-in Windows Server NLB functionality in order to provide high availability and network load balancing for IIS-based SMTP services.

Requirements

You need at least 2 servers that communicate with each other over 2 physically separate cable/switch connections. This means that if you run these 2 servers in a virtual environment, and both virtual machines are hosted on the same virtual host, then traffic between these two machines most likely never leaves the internal virtual switch, which will cause issues. Furthermore, you may encounter problems with 2 virtual machines if they have the same network interface GUID.

All servers that will be used in the NLB cluster must have a fixed IP address. Furthermore, in Windows 2003, they need to be on the same IP subnet.
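As a quick sanity check of the subnet requirement, the following Python sketch tests whether a set of addresses falls within one subnet. The addresses and the mask are made-up examples, not values from my setup.

```python
import ipaddress

def same_subnet(addresses, netmask):
    """Return True if every address falls in the same subnet for the given mask."""
    networks = {
        ipaddress.ip_network(f"{addr}/{netmask}", strict=False)
        for addr in addresses
    }
    return len(networks) == 1

# Hypothetical addresses: two node IPs plus the cluster (virtual) IP
nodes = ["192.168.0.11", "192.168.0.12", "192.168.0.10"]
print(same_subnet(nodes, "255.255.255.0"))
```

Including the cluster IP in the list is a convenient way to check the nodes and the virtual IP in one go.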

The minimum number of servers in the NLB cluster is 2, the maximum is 32.

In a perfect world, you would have multiple network interface cards in each server, so you can separate traffic that is targeting the cluster from traffic that is targeting the server itself. However, it is perfectly possible to install and use NLB on servers that have only one network interface card.

Preparation

You will need one additional, dedicated IP address that will be used for the cluster. The idea behind an NLB cluster is that all of the cluster nodes are aware of the state of each other, and all of the cluster nodes are configured with one virtual IP, which is used by clients to connect to the services that are provided by the NLB. On all of the configured nodes, the NLB process will detect incoming requests to the virtual IP and will distribute the requests over the cluster nodes, based upon some parameters.
You will also need a DNS hostname (FQDN) for the cluster.
Before setting up the NLB, add the hostname in DNS and make sure to create a PTR record (reverse lookup) for the new hostname.

Verify that DNS is set up properly by running an nslookup against the FQDN of the cluster. You should get the cluster IP address.
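If you prefer to script this check, here is a minimal Python sketch that verifies both the forward lookup and the PTR record. The function name and the lookup parameters are my own; the two lookup functions are passed in as arguments so the logic can also be tried without touching a real DNS server.

```python
import socket

def check_cluster_dns(fqdn, expected_ip,
                      forward=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Check the forward and reverse records of the cluster name.

    Returns a list of problems; an empty list means both lookups matched.
    """
    problems = []
    try:
        ip = forward(fqdn)
    except OSError as exc:
        return [f"forward lookup failed: {exc}"]
    if ip != expected_ip:
        problems.append(f"{fqdn} resolves to {ip}, expected {expected_ip}")
    try:
        name = reverse(expected_ip)[0]
    except OSError as exc:
        problems.append(f"reverse lookup failed: {exc}")
    else:
        # Compare names case-insensitively and ignore a trailing dot
        if name.rstrip(".").lower() != fqdn.rstrip(".").lower():
            problems.append(f"{expected_ip} points back to {name}, expected {fqdn}")
    return problems
```

A call such as check_cluster_dns("smtpcluster.example.com", "192.168.0.10") (both values are placeholders) should return an empty list when DNS is set up correctly.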

Since this post is about creating an NLB for SMTP services, you need to make sure that the IIS SMTP service is installed on each server that will be part of the NLB. Also, configure each SMTP server with the same settings (access control, relay restrictions, size & connection limits, domains, authentication, smart host settings, etc).

Procedure

On each server, run the following procedure :

Open Network Connections, select the network interface card that will be used to service NLB/Clustering traffic, Edit Properties and enable "Network Load Balancing". Make sure "Network Load Balancing" is still selected and click "Properties".

Cluster Parameters / Cluster IP configuration :
– IP address : enter the (newly assigned virtual) IP address of the cluster. This is the new IP address that will be used by the clients to connect to the NLB SMTP service
– Subnet mask : enter the subnet mask
– Full internet name : enter the fqdn for the cluster
Cluster Parameters / Cluster Operation Mode :
– Enable multicast, don’t enable IGMP multicast (unless you know what you are doing)
– Do not enable remote control
Host Parameters
– Set a priority. The values should be unique. This value is referred to as the Host Priority ID
– Dedicated IP configuration : IP address : use the IP address that is assigned to the physical server, not to the cluster. Set the subnet mask accordingly
– Initial host state : set to "started"
Port Rules
– Edit the existing port rule. Since we are only interested in hosting NLB for SMTP, we can change the port to 25, TCP only. Leave the filtering mode set to Multiple hosts, Single affinity, and Equal Load weight. (I will discuss these values later on)
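Because these settings are entered by hand on every server, it can help to write the plan down and check it for consistency first. The Python sketch below is only an illustration: the dictionary layout and the function name are my own invention, not something NLB reads, and the 1 to 32 range for the host priority is an assumption based on the maximum cluster size.

```python
def validate_nlb_plan(nodes):
    """Check a planned NLB configuration; return a list of problems found.

    nodes: list of dicts with the (hypothetical) keys
           cluster_ip, mode, priority and dedicated_ip.
    """
    problems = []
    if not 2 <= len(nodes) <= 32:
        problems.append("an NLB cluster needs between 2 and 32 nodes")
    if len({n["cluster_ip"] for n in nodes}) > 1:
        problems.append("cluster IP differs between nodes")
    if len({n["mode"] for n in nodes}) > 1:
        problems.append("cluster operation mode differs between nodes")
    priorities = [n["priority"] for n in nodes]
    if len(set(priorities)) != len(priorities):
        problems.append("host priority IDs are not unique")
    if any(not 1 <= p <= 32 for p in priorities):
        problems.append("host priority IDs must be between 1 and 32")
    if any(n["dedicated_ip"] == n["cluster_ip"] for n in nodes):
        problems.append("dedicated IP must not be the cluster IP")
    return problems

# Hypothetical two-node plan
plan = [
    {"cluster_ip": "192.168.0.10", "mode": "multicast", "priority": 1, "dedicated_ip": "192.168.0.11"},
    {"cluster_ip": "192.168.0.10", "mode": "multicast", "priority": 2, "dedicated_ip": "192.168.0.12"},
]
print(validate_nlb_plan(plan))
```

An empty list means the plan passes these particular checks; it does not prove that the cluster will converge.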

Click "OK" to save the Network Load Balancing settings. You will receive a warning about the TCP/IP properties not being configured correctly yet.
Select "Internet Protocol (TCP/IP)" and click "Properties".
Click "Advanced"
Under IP addresses, add a new IP address. Provide the IP address that is assigned to the cluster (the virtual IP address). Make sure the physical IP address of the server is the first one in the list (which should be the case already, but it’s always a good idea to double-check)

Save all changes. At this point, connectivity with the server will be lost. You will need to reboot the server before you can access it again remotely, so if you are configuring the server remotely, you may lose your connection and you may not be able to reboot it (which is required to apply the new configuration). Pay attention to this, as you may isolate the server this way.

Reboot the server (if you still can :-) )

Once all servers have been rebooted, open Network Load Balancing Manager on one of the servers. Make sure to run this with an account that has admin permissions on all servers.

Right click "Network Load Balancing Clusters" and choose "Connect to existing"

Enter the hostname of one of the servers that are part of the cluster and click "connect". In the "Clusters" view, you should now see the cluster that it is part of. Select the cluster and choose ‘Finish’

The cluster is now added to the window. You should see all nodes that are part of the cluster. If that is not the case, have a look at the "Known problems" section.

Select one of the nodes. If you have just set up the NLB, the hosts may still be "Converging", which means that they are exchanging configurations. If this process has been completed, the nodes should turn Green and should indicate "Converged"

From this point on, the NLB should work.

Try to connect to the SMTP server using the FQDN of the cluster. You should be connected to one of the nodes in the cluster. If one of the nodes in the cluster fails, the clients should be connected to one of the other nodes in the cluster.
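A small Python sketch for this connection test is shown below. It only reads the greeting line that an SMTP server sends after the connection is accepted (reply code 220); the function names are my own and the cluster name in the usage note is a placeholder.

```python
import socket

def read_banner(stream):
    """Read the first line from a binary stream and return it as text."""
    return stream.readline().decode("ascii", errors="replace").strip()

def is_smtp_ready(banner):
    """An SMTP server that accepts the connection greets with reply code 220."""
    return banner.startswith("220")

def smtp_banner(host, port=25, timeout=5.0):
    """Connect to host:port and return the SMTP greeting line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rb") as stream:
            return read_banner(stream)
```

Calling smtp_banner("smtpcluster.example.com") should return a line starting with 220. The greeting usually contains the name of the server, so it may also tell you which node answered.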

Configuration options

In the procedure above, I have mentioned a couple of configuration options, which may require a little bit of explanation

Unicast vs Multicast

Unicast : one IP address has one MAC address. All cluster nodes are assigned the same MAC address. This technique is used in older versions of NLB. This MAC address is a combination of the cluster MAC and the host priority. In fact, clients never learn the real MAC address of the cluster, which sometimes results in problems. If you really want to use unicast, you may have to set the following registry key on all members in order to use the real MAC address on outgoing packets :

HKLM\SYSTEM\CurrentControlSet\Services\WLBS\Parameters\Interface\{Interface_GUID}
set MaskSourceMAC to 0 (REG_DWORD) (Default value is 1)
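If you need to apply this on several servers, you could generate the matching reg.exe command line instead of editing the registry by hand. The Python sketch below only builds the command string and does not touch the registry; the function name is my own, and {Interface_GUID} remains a placeholder that you must replace with the GUID of the NLB-enabled adapter on each server.

```python
NLB_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\WLBS\Parameters\Interface"

def mask_source_mac_command(interface_guid, masked=False):
    """Build a 'reg add' command that sets MaskSourceMAC for one adapter.

    masked=False writes 0 (use the real MAC address on outgoing packets),
    masked=True writes 1 (the default).
    """
    key = NLB_KEY + "\\" + interface_guid
    value = 1 if masked else 0
    return f'reg add "{key}" /v MaskSourceMAC /t REG_DWORD /d {value} /f'

# The GUID below is a placeholder, not a real interface GUID
print(mask_source_mac_command("{Interface_GUID}"))
```

Review the printed command before running it on a server.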

Another caveat of using unicast is the fact that the NLB driver will attempt to change the MAC address of the network interface card. If the adapter does not support this, you will need to replace the adapter. Furthermore, if you only have one network adapter, you will not be able to communicate with the other servers in the cluster anymore. So if you are forced to use unicast, make sure to use multiple network interface cards, and only apply NLB to one of the cards.
The benefit of using unicast is the fact that it works out of the box, with all routers and switches. After all, every network card has only one MAC address.
Also check the following MS Knowledge base article : Unicast NLB nodes cannot communicate over an NLB-enabled network

Multicast : this means that one IP address can be represented by multiple MAC addresses. This is the recommended technique unless you have indications that your network performance has decreased due to NLB multicast traffic. If your switches are experiencing "flooding", you can enable IGMP, which should bring down the load on the switches, however your switches/routers need to support IGMP as well. If your servers have only one network card, you should use multicast. You can find more information on IGMP on http://technet2.microsoft.com/windowsserver/en/library/482d4ca4-27c6-4673-99cf-57da485b2b2b1033.mspx?mfr=true

Note : the cluster operation mode must be the same on all cluster nodes.

You can find more information about unicast and multicast on the following MS Technet page : http://technet2.microsoft.com/windowsserver/en/library/aa15cdd3-7ac5-4846-904e-4ff282f8e7f11033.mspx?mfr=true
Alternatively, you can find some info on http://support.microsoft.com/kb/323437 and on http://support.microsoft.com/?id=197864 as well.

Host priority ID

This number identifies the node in the cluster and determines the order in which traffic is delivered to a specific server / the priority of the host during the cluster selection process. The lower the number, the higher the preference. Just make sure to set a unique value on each host. Priority IDs range from 1 to 32.

Port rules

Filtering mode :
Multiple hosts : use this to perform load balancing. You can set affinity parameters and load weight parameters, which are explained below.
Single host : in this scenario, traffic will only be handled by the cluster node with the highest handling priority, which is the node with the lowest handling priority number. This is not the same as the host priority ID, which is only used for traffic that is not covered by port rules. The handling priority number must be unique on each node.

Disable mode : in this scenario, clustering acts as a basic firewall, and will block connections to the ports defined in the rule.

Affinity :
Choose this setting based on the ability of an application to maintain state across servers. If the application is not capable of maintaining state across servers, you would want the client to keep connecting to the same server. HTTP servers in particular have issues with this.

None : this is true load balancing. If your application cannot deal with state information that is spread over multiple sessions and servers, then this will break your application. For SMTP/FTP/other applications that bundle an entire communication into a single session, this will work fine.
Single : use this to force clients to use the same server as the previous connection. While traffic from multiple clients will be distributed over multiple cluster nodes, the load may not be spread perfectly, because one client could generate more traffic and sessions than another client.
Class C : works in a similar way as "single", but it assigns a cluster node based on an entire class C network. So if one of the IP addresses in a class C network has connected, the other addresses in that network will connect to the same cluster node.
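To illustrate the difference between the three settings, here is a toy Python model. It is not the algorithm that NLB really uses; it only assumes that the node is picked from a hash, and it varies which part of the client address goes into that hash. The CRC32 hash and the function name are my own choices for the illustration.

```python
import zlib

def pick_node(client_ip, client_port, affinity, node_count):
    """Toy model: map a client connection to a node index (0-based)."""
    if affinity == "none":
        key = f"{client_ip}:{client_port}"          # every connection hashed separately
    elif affinity == "single":
        key = client_ip                             # same client IP, same node
    elif affinity == "class_c":
        key = ".".join(client_ip.split(".")[:3])    # same /24 network, same node
    else:
        raise ValueError(f"unknown affinity: {affinity}")
    return zlib.crc32(key.encode("ascii")) % node_count

# With "single", the source port does not influence the choice
print(pick_node("10.1.2.3", 40001, "single", 3) == pick_node("10.1.2.3", 40002, "single", 3))
# With "class_c", two clients in the same class C network land on the same node
print(pick_node("10.1.2.3", 40001, "class_c", 3) == pick_node("10.1.2.200", 40001, "class_c", 3))
```

Both lines print True. With "none", two connections from the same client may or may not end up on the same node.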

You must set the affinity parameters to the same values for each cluster node.

"Single" or even "Class C" affinity would work great for Terminal Services, since clients will then be able to reconnect to their disconnected sessions, on the same server.

Load weight :

Using this parameter, you can define the accepted load on the cluster nodes. If you have 3 NLB servers, and 2 of them are capable of handling more clients, you could set the load weight to e.g. 60-60-40 (note : these values do not indicate a percentage, but rather a load handling capability comparison between the individual nodes)
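As a rough illustration of what such values mean, the Python sketch below turns the weights into an approximate share of the traffic per node. It assumes that each node receives traffic in proportion to its weight relative to the sum of all weights, which is a simplification; the node names are made up.

```python
def traffic_shares(weights):
    """Return the approximate fraction of traffic per node for the given weights."""
    total = sum(weights.values())
    return {node: weight / total for node, weight in weights.items()}

shares = traffic_shares({"node1": 60, "node2": 60, "node3": 40})
for node, share in shares.items():
    print(f"{node}: {share:.1%}")
# node1: 37.5%
# node2: 37.5%
# node3: 25.0%
```

So with 60-60-40, the two stronger nodes would each take about 37.5% of the load and the third node about 25%.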

Heartbeat

When you change something in the cluster node configurations, the changes are picked up dynamically by the other nodes. The timing is based on the unicast or multicast heartbeat messages that are exchanged at regular intervals.
If one of the nodes misses five consecutive heartbeats, a "Convergence" action is initiated. This is the process that exchanges node availability, cluster parameters, host parameters, and port rules between the NLB cluster nodes. This process should take only a few seconds.

The number of missed heartbeats and the interval between heartbeats can be defined in the registry :
HKLM\SYSTEM\CurrentControlSet\Services\WLBS\Parameters\Interface\{Interface_GUID}
AliveMsgPeriod (REG_DWORD) : Heartbeat interval (in milliseconds). Default : 1000 (decimal). Possible values : between 100 and 10000.
AliveMsgTolerance (REG_DWORD): Defines number of heartbeat messages that can be missed before the NLB node becomes "unreachable" and the convergence process initiates. Default : 5 (decimal). Possible values : between 5 and 100.
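The two values together determine roughly how long it takes before a failed node is noticed. The Python sketch below multiplies them, using the defaults and limits listed above; treat the result as an approximation, because the convergence process itself takes some extra time. The function name is my own.

```python
def failure_detection_ms(alive_msg_period=1000, alive_msg_tolerance=5):
    """Approximate time (in ms) before a silent node triggers convergence."""
    if not 100 <= alive_msg_period <= 10000:
        raise ValueError("AliveMsgPeriod must be between 100 and 10000")
    if not 5 <= alive_msg_tolerance <= 100:
        raise ValueError("AliveMsgTolerance must be between 5 and 100")
    return alive_msg_period * alive_msg_tolerance

print(failure_detection_ms())          # defaults: 5000 ms
print(failure_detection_ms(500, 5))    # faster heartbeat: 2500 ms
```

Lowering AliveMsgPeriod makes failures show up sooner, at the cost of more heartbeat traffic.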

When a node has become unreachable, and if you open the nlbmgr.exe (Network Load Balancing Manager tool), the node won’t be visible anymore. If the node comes back online, you need to close the tool and reopen it again. Refreshing the view won’t detect when the node is online again.

Drainstop

If, for any reason, you need to perform a controlled reboot of one of the cluster nodes (perhaps because of the installation of patches or software upgrades), you can use the drainstop functionality. This function will set the server in a mode so existing connections are still serviced, but new connections are no longer accepted. If you want to see whether all connections are gone, you can use the "nlb query" command. If you want to take a server out of drainstop mode, you need to stop the NLB and start the NLB again.

Known problems

If you open the Network Load Balancing Manager console and connect to the cluster, you should see all the members of the cluster. If that is not the case, and if you cannot add the other hosts/nodes to the cluster ("No interfaces are available for installing a new cluster"), please verify the following settings :

1. If your cluster nodes only have one network card, make sure the NLB is set to multicast. Otherwise, the cluster nodes won’t be able to talk to each other.

2. Verify that the network cards have unique GUIDs. If that is not the case, remove the adapter on one of the servers (Device Manager – Uninstall), then use "Scan for hardware changes" to add it back, which will create a new GUID for the adapter.

3. If you are running in a virtual environment, make sure the nodes are not running on the same physical host.

IP Conflict warnings

If you remove NLB from one of the servers, you may receive an "IP conflict" warning on all of your servers. In order to prevent or fix this, you must remove the cluster IP from the list of IP addresses in the TCP/IP properties on the server where you have disabled NLB.

What if nodes are not converging when you expect them to ?

Turn on logging and look in the log files for
– conflicting IP configurations
– conflicting port rule settings
– other conflicting configuration settings
– frequent converging attempts

You can get a lot of information by using the "nlb display all" command.

