Two-Node Server Cluster Topology

Server Clusters:

The middleware application servers should be set up according to a fault-tolerant architecture.

In the two-node separate-cluster topology, there is a two-node cluster at the primary site and another two-node cluster at the disaster recovery (DR) site.

The primary site nodes are active and carry live traffic. The DR site nodes are passive: they take no workload while the primary site is operational. In the event of a disaster, the DR site cluster becomes active and takes over the workload.

During production deployment, processes should be deployed to both the primary and the DR site server clusters. Externalized variables and process properties should also be configured on both clusters accordingly.
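Because both clusters must carry the same externalized variables, a simple pre-deployment check can compare the property keys defined at each site. This is a minimal sketch, assuming the properties are available as plain `key=value` text (Java-style `.properties` content without line continuations); the property names and URLs are hypothetical.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments.
    Assumption: no line continuations or escaped separators."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(primary: dict[str, str], dr: dict[str, str]) -> dict[str, set[str]]:
    """Report property keys defined at one site but not at the other."""
    return {
        "missing_in_dr": set(primary) - set(dr),
        "missing_in_primary": set(dr) - set(primary),
    }


# Hypothetical property files for the same process at each site.
PRIMARY = """
# primary site
backend.url=https://primary.example.com/service
backend.timeout.ms=5000
"""
DR = """
# DR site
backend.url=https://dr.example.com/service
"""

report = missing_keys(parse_properties(PRIMARY), parse_properties(DR))
print(report)
```

Only keys are compared, not values: a URL is expected to differ between sites, but a key missing at one site means that process would run with incomplete configuration after failover.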

Fault-Tolerant Architecture:

The architecture provides failure protection at multiple levels: server, load balancer (LB), and storage. If either server in a cluster goes down, the load balancer forwards traffic to the other server.
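Server-level protection can be illustrated with a toy round-robin balancer that skips nodes marked as down. This is a minimal sketch, assuming node health is already known (a real LB would derive it from periodic health checks); the class and node names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Round-robin over cluster nodes, skipping nodes marked as down."""

    def __init__(self, nodes: list[str]):
        self.nodes = list(nodes)
        self.healthy = set(nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node: str) -> None:
        self.healthy.discard(node)

    def mark_up(self, node: str) -> None:
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self) -> str:
        # Visit each node at most once per call; return the first healthy one.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy node in cluster")


lb = RoundRobinBalancer(["node-a", "node-b"])
before = [lb.pick() for _ in range(4)]  # both nodes share the traffic
lb.mark_down("node-a")
after = [lb.pick() for _ in range(3)]   # all traffic now goes to node-b
print(before, after)
```

When both nodes of the cluster are down, `pick` raises; that is the point at which protection has to come from the next level up (the WAF and the DR site).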

Web traffic from the internet arrives through a web application firewall (WAF). If the primary site LB is down, the WAF can forward traffic to the DR site LB. However, unless it is absolutely necessary, do not route traffic to the DR site, because any data written there must be synchronized back from the DR site to the primary site during failback.
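The WAF routing rule amounts to a strict priority order between the two sites. A minimal sketch, assuming the WAF can see a health flag for each site's LB; the function and flag names are hypothetical and do not correspond to any particular WAF product's configuration.

```python
from enum import Enum


class Site(Enum):
    PRIMARY = "primary"
    DR = "dr"


def choose_site(primary_lb_up: bool, dr_lb_up: bool) -> Site:
    """Prefer the primary site; use DR only when the primary LB is down."""
    if primary_lb_up:
        return Site.PRIMARY  # normal case: DR stays passive even if healthy
    if dr_lb_up:
        return Site.DR       # failover: data written here must be synced back later
    raise RuntimeError("no site available")


print(choose_site(primary_lb_up=True, dr_lb_up=True))
print(choose_site(primary_lb_up=False, dr_lb_up=True))
```

The DR site is never selected while the primary LB is up, which keeps it passive and avoids creating data that would later have to be synchronized back.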

In case of a storage or database failure at the primary site, this setup can use the DR site's data, since data is expected to be replicated to the DR site in real time by storage replication tools.

Unlike a Java web application, a web service call is stateless: the server does not need to maintain state between calls, and each call is independent. Session replication is therefore not required, although the servers may still replicate other information, such as cache contents if distributed caching is used.

If the server stores operational data on a SAN or in a database, that data may be replicated to the DR site. If a process does not need to store any information on disk or in a database, nothing needs to be replicated.

Failover:

In case of a disaster, the source (requester) application can point directly to the DR site URL. In that case, source code may need to be changed wherever URLs are hard-coded. Normally, however, URLs are externalized variables kept in process properties, and these properties can be updated directly from the server console without redeployment.

Another option is to keep the URL unchanged and change the IP address it resolves to. In this case, the requester application is not affected by the disaster; the Domain Name System (DNS) administrator is asked to point the URL's hostname to the DR site IP address. The drawback is that clients may take some time to pick up the new IP address, because a new DNS lookup is not initiated until the cached DNS record expires according to its TTL (time-to-live) value. For web service endpoints, TTL values are typically set below one hour, so most systems should resolve the new IP address within an hour; allow up to a couple of hours for resolvers or client runtimes that cache records longer than the TTL.

Maintain a checklist of failover activities: not only the URL but other variables may also need to change. Database and storage administrator support may be required as well, because replication must now run in the opposite direction (from the DR site to the primary site). Modern middleware systems can achieve an RPO (recovery point objective) of 0 and an RTO (recovery time objective) of less than one hour.
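The delay in the DNS-based option comes from caching: a resolver keeps returning the old record until its TTL expires. The toy cache below models only that behaviour, assuming a single cache layer that honours the TTL exactly; the hostname, the IP addresses (taken from documentation ranges), and the one-hour TTL are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    ip: str
    fetched_at: float  # seconds since an arbitrary reference time
    ttl: float         # seconds the record may be cached


class TtlCache:
    """Caches A records and re-queries the authoritative data only after TTL expiry."""

    def __init__(self, authoritative: dict[str, tuple[str, float]]):
        self.authoritative = authoritative  # hostname -> (ip, ttl)
        self.cache: dict[str, CachedRecord] = {}

    def resolve(self, host: str, now: float) -> str:
        rec = self.cache.get(host)
        if rec is None or now >= rec.fetched_at + rec.ttl:
            ip, ttl = self.authoritative[host]
            rec = CachedRecord(ip, now, ttl)
            self.cache[host] = rec
        return rec.ip


HOST = "service.example.com"
zone = {HOST: ("192.0.2.10", 3600.0)}  # primary site IP, 1-hour TTL
resolver = TtlCache(zone)

at_start = resolver.resolve(HOST, now=0)       # caches the primary IP
zone[HOST] = ("198.51.100.10", 3600.0)         # DNS admin repoints the name to DR
mid_ttl = resolver.resolve(HOST, now=1800)     # still the cached primary IP
after_ttl = resolver.resolve(HOST, now=3600)   # TTL expired: fresh lookup returns DR IP
print(at_start, mid_ttl, after_ttl)
```

In the worst case, a client that cached the record just before the change keeps using the old IP for a full TTL, which is why lowering the TTL ahead of a planned switchover shortens the window.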

Failback:

When the primary site comes back up, applications should return to the primary site infrastructure. All applications using the DR site URL or IP address should switch back to the primary site URL or IP address, and requester application configuration and properties should be changed again to point to primary site resources. Data written to the DR site's SAN and database while it was active should be synchronized back to the primary site as required.
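The ordering matters: traffic should move back only after data written at the DR site has reached the primary site. A minimal sketch of that guard, assuming a replication-lag figure is available from the storage or database replication tool; the class, field, and step names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    primary_healthy: bool       # primary servers, LB and storage are back up
    dr_to_primary_lag_s: float  # replication lag from DR back to primary (hypothetical metric)


def failback_plan(status: SiteStatus, max_lag_s: float = 0.0) -> list[str]:
    """Return the ordered failback steps, refusing to switch traffic too early."""
    if not status.primary_healthy:
        raise RuntimeError("primary site not healthy; stay on DR")
    if status.dr_to_primary_lag_s > max_lag_s:
        raise RuntimeError("DR data not yet synchronized back to primary")
    return [
        "point requester properties or DNS records at primary site URLs/IPs",
        "confirm traffic is reaching the primary site",
        "return the DR site to passive mode",
    ]


plan = failback_plan(SiteStatus(primary_healthy=True, dr_to_primary_lag_s=0.0))
for step in plan:
    print(step)
```

Encoding the checklist this way makes the precondition explicit: a non-zero lag blocks the switch instead of relying on someone remembering to check it.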