Notes from the field on ColdFusion (or related) technical issues.

Wednesday, June 22, 2005

JRun Clustering with Windows 2003 NLB

There are a couple of tricky points to clustering ColdFusion or JRun with Windows Network Load Balancing enabled.

First, see Brandon Purcell's article for general information on JRun clustering with ColdFusion 6.1:
http://www.bpurcell.org/viewContent.cfm?ContentID=121

For ColdFusion 7 in Multi-Instance mode, use the ColdFusion Administrator to create the instances to cluster. Remember, each instance in a cluster has to have a unique name; I like to use the last part of the IP address in the name, e.g. "cfusion66".

Once all instances are created, choose one ColdFusion 7 server to be your "cluster manager". Use the Administrator on that machine to register the remote instances running on the other machines, then create the actual cluster. Once that's done, restart all instances involved in the cluster, then use the WSCONFIG utility to re-install the Web server connector on every Web server involved.

To reach the Administrator on the "cfusion" instance and manage instances and clusters after that, edit jrun4/servers/cfusion/cfusion-ear/cfusion-war/WEB-INF/jrun-web.xml and add an entry that points to your CFIDE folder, which may look like this:

<virtual-mapping>
  <resource-path>/CFIDE</resource-path>
  <system-path>C:/inetpub/wwwroot/CFIDE</system-path>
</virtual-mapping>

Then, you can access the cluster-manager ColdFusion 7 Administrator by pointing your browser to port 8300 on the server you want to manage.
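If you have more than a couple of machines, it can help to generate the instance names and Administrator URLs rather than typing them. The following is only a rough sketch of the naming habit described above: the helper names, the "cfusion" prefix default, and the sample IP address are my own placeholders, and the URL assumes the standard /CFIDE/administrator/index.cfm path is reachable through the built-in web server on port 8300.

```python
import ipaddress


def instance_name(ip: str, prefix: str = "cfusion") -> str:
    """Build an instance name from the last octet of an IPv4 address.

    Hypothetical helper: it only mirrors the "cfusion66" naming habit.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return f"{prefix}{addr.packed[-1]}"


def admin_url(host: str, port: int = 8300) -> str:
    """URL of the ColdFusion Administrator on the built-in web server.

    Assumes the CFIDE virtual mapping has been added and that the
    instance listens on the given port.
    """
    return f"http://{host}:{port}/CFIDE/administrator/index.cfm"


if __name__ == "__main__":
    ip = "192.168.1.66"  # placeholder address
    print(instance_name(ip))
    print(admin_url(ip))
```

Keeping the last octet in the name makes it easy to tell which machine an instance lives on when you look at the cluster list.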

On the NLB side, I found the easiest and most reliable setup is a single network interface with NLB in Multicast mode, plus, if necessary, a static ARP entry on the router:
"Some routers require a static ARP entry because they do not support the resolution of unicast IP addresses to multicast media access control addresses. For example, Cisco routers require an ARP (address resolution protocol) entry for every virtual IP address. While Network Load Balancing uses Level 2 Multicast for the delivery of packets, Cisco's interpretation of the RFCs is that Multicast is for IP Multicast. So, when the router doesn't see a Multicast IP address, it does not automatically create an ARP entry, and one has to manually have to add it on the router."

(Source: http://www.microsoft.com/technet/prodtechnol/windowsserver2003/technologies/clustering/nlbbp.mspx)
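To write that static ARP entry you need the cluster's multicast MAC address. As far as I know, in plain (non-IGMP) Multicast mode NLB derives it as 03-BF followed by the four octets of the cluster IP address. The sketch below works on that assumption and prints the entry in the Cisco IOS "arp <ip> <mac> ARPA" form. The function names and the sample virtual IP are placeholders of mine; always compare the result against the network address shown in the NLB cluster properties before touching the router.

```python
import ipaddress


def nlb_multicast_mac(cluster_ip: str) -> str:
    """Return the assumed NLB multicast MAC in Cisco dotted notation.

    Assumption: non-IGMP multicast mode, where the MAC is 03-BF
    followed by the four octets of the cluster (virtual) IP address.
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed
    hex_digits = (bytes([0x03, 0xBF]) + octets).hex()  # 12 hex characters
    # Cisco IOS writes MAC addresses as three dot-separated groups of four.
    return ".".join(hex_digits[i:i + 4] for i in range(0, 12, 4))


def cisco_static_arp(cluster_ip: str) -> str:
    """Format a global-configuration static ARP line for the virtual IP."""
    return f"arp {cluster_ip} {nlb_multicast_mac(cluster_ip)} ARPA"


if __name__ == "__main__":
    print(cisco_static_arp("10.0.0.100"))  # placeholder virtual IP
```

One such entry is needed for every virtual IP address, which matches the Microsoft note quoted above.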

If you DO have two interfaces, make sure both are attached to the same LAN segment, and that NLB is bound to the secondary interface on each machine, not the primary one.

If they're not on the same LAN segment, then the cluster will usually fail to synchronize with the other members, and you'll have severe problems getting session failover working reliably.

If the NLB interface is left as the primary interface, then connections to the JNDI port fail (the TCP connections are closed right after they are opened, with no actual data transferred), and the instance manager shows "network error" for remote instances.
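A quick way to see this symptom from another cluster member is to open a TCP connection to the remote instance's JNDI port and watch what happens next. The sketch below is a rough diagnostic, not something JRun ships: the function name and the three result labels are my own, and you have to supply the host and the JNDI port from your own JRun configuration. It tells apart a port that cannot be reached, a connection the peer drops immediately without sending anything, and a connection that stays up.

```python
import socket


def probe_port(host: str, port: int, wait: float = 2.0) -> str:
    """Classify a TCP port as 'unreachable', 'closed' or 'open'.

    'closed' means the connect succeeded but the peer dropped the
    connection within `wait` seconds without sending any data.
    """
    try:
        sock = socket.create_connection((host, port), timeout=wait)
    except OSError:
        return "unreachable"  # refused, timed out, or no route

    with sock:
        sock.settimeout(wait)
        try:
            data = sock.recv(1)
        except socket.timeout:
            return "open"  # peer kept the connection and is waiting for us
        except OSError:
            return "closed"  # connection reset by the peer
        # An empty read means the peer closed the connection cleanly.
        return "open" if data else "closed"
```

If the probe reports "closed" against the NLB-bound address but "open" against the other interface, that points at the interface order problem rather than at a firewall.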

With multiple interfaces active, it may also be necessary to add one or more UnicastPeer attributes to the ClusterManager section of jrun4\servers\{instance}\SERVER-INF\jrun.xml.

That being said, I found that disabling the secondary NIC and using Multicast mode worked well, each and every time, without having to do anything beyond registering the remote instance and adding it to the cluster.

Thanks to Matt Stevanus for finding the network order dependency.
