Some of the most common health checks I see on load balancers are TCP handshakes and TCP half-opens. In a TCP 3-way handshake health check, the load balancer sends a SYN, gets a SYN, ACK from the server, and then sends an ACK back. At this point, it considers the resource up. In a TCP half-open health check, the load balancer sends a SYN, gets a SYN, ACK from the server, and considers it up right there. It then sends a RST back to the server so the connection doesn’t stay open, but that’s neither here nor there.
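
To make the handshake check concrete, here is a minimal sketch in Python. The function name `tcp_handshake_check` and its timeout default are my own for illustration, not any load balancer's API. It only covers the full 3-way handshake variant. A true half-open check needs hand-crafted packets (raw sockets), which the ordinary socket API doesn't expose, so I've left that out.

```python
import socket


def tcp_handshake_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a full TCP 3-way handshake with host:port completes.

    Hypothetical helper for illustration; real load balancers do this
    in their own health monitor code.
    """
    try:
        # create_connection() returns only once SYN / SYN, ACK / ACK is done.
        with socket.create_connection((host, port), timeout=timeout):
            # Leaving the with-block closes the connection again.
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the resource as down.
        return False
```

Calling something like `tcp_handshake_check("192.0.2.10", 80)` (a placeholder address) tells you that something completed a handshake on that port, and nothing more.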

We all know that a much better health check is one that validates content on the end systems, like an HTTP GET for a specific page that looks for an HTTP 200 response, so we know the content exists. That isn’t always necessary, though. Sometimes a TCP half-open or a TCP handshake might be the best way to go.
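
For comparison, a content-aware check might look roughly like the sketch below. Again, `http_get_check`, the default path `/index.html`, and the timeout are assumptions for illustration. It uses only Python's standard `http.client` module.

```python
import http.client


def http_get_check(host: str, port: int = 80, path: str = "/index.html",
                   expected_status: int = 200, timeout: float = 2.0) -> bool:
    """Return True only if GET <path> comes back with the expected status.

    Hypothetical helper for illustration, not a vendor's health monitor.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status == expected_status
    except (OSError, http.client.HTTPException):
        # No listener, a dropped connection, or a garbled reply all count as down.
        return False
    finally:
        conn.close()
```

Unlike the plain TCP check, this one fails when the page is missing (a 404, for example) even though the port is answering.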

With either TCP health check method, you’re simply checking whether something is answering on the specified port of your system. If you’re load balancing HTTP traffic to a box that runs Apache on port 80, a TCP health check to port 80 will usually tell you whether Apache is running, but it won’t necessarily tell you that your content is valid. Of course, that’s OK if you trust your ability to validate that on your own. An interesting problem with a TCP check is that you need to know whose health you’re actually checking!

Let’s assume for a moment that the servers to which you’re load balancing traffic are behind a firewall instead of being local to your load balancer. If the firewall is acting as a full proxy (as an F5 load balancer does) and you simply send a TCP half-open or TCP handshake check, all you’re doing is checking the health of the firewall. A full proxy completes a 3-way handshake with the client (in this case the load balancer) before it starts a 3-way handshake with the server. By doing this, the box can, up to a point, keep a client’s SYN flood from reaching the server. The server only sees the traffic if the client’s 3-way handshake actually completed.

Here’s the traffic flow for sending a tcp 3-way handshake from the load balancer to a system behind a firewall:

1. The load balancer sends a SYN packet to the server.

2. Since the firewall is a full proxy, it is the one that actually gets the SYN, and it sends a SYN, ACK to the load balancer.

3. The load balancer sends an ACK to what it assumes is the system it’s load balancing, but is actually the firewall.

4. Now that the handshake is complete, the firewall completes a 3-way handshake with the server.

5. Now, if the load balancer were to send an HTTP GET for /index.html, it would send it to the firewall and the firewall would send it to the server.

If we use our above flow for a TCP-Half-Open check, here’s what we get.

1. The load balancer sends a SYN to the destination server.

2. The firewall responds with a SYN, ACK.

3. The load balancer has no idea that the firewall, rather than the server, sent the SYN, ACK, so it considers the server up and sends a RST to kill the connection.

Another problem is that the firewall will complete a 3-way handshake with the load balancer even if the server isn’t online. While some devices (F5 load balancers, for example) can be configured so they don’t complete a handshake when the systems behind them are down, this is far from the norm. So, with a TCP check through a firewall like this, we aren’t actually checking the destination server’s health at all.
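
The whole situation can be reproduced on one machine with a toy stand-in for the firewall. The sketch below is only a simulation under my own assumptions: `start_toy_full_proxy` accepts the client's handshake first and only then tries to reach the server, and it never relays any data. The "offline server" is just a port that is bound but never listening, so connections to it are refused. All names here are made up for the example.

```python
import http.client
import socket
import threading


def tcp_check(addr, timeout=2.0):
    """TCP handshake check: True if anything completes the handshake at addr."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def http_check(addr, path="/index.html", timeout=2.0):
    """HTTP check: True only if a GET for path comes back with status 200."""
    conn = http.client.HTTPConnection(addr[0], addr[1], timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def reserve_offline_server():
    """Bind a local port but never listen(), so connections to it are refused."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    return sock


def start_toy_full_proxy(backend_addr):
    """Toy full proxy: finish the client handshake first, contact the server second."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()

    def serve():
        while True:
            try:
                client, _ = listener.accept()  # client-side handshake already done
            except OSError:
                return  # listener was closed
            with client:
                try:
                    # Only now does the proxy attempt its own handshake with the server.
                    socket.create_connection(backend_addr, timeout=1.0).close()
                except OSError:
                    pass  # server is down; the client connection is simply dropped

    threading.Thread(target=serve, daemon=True).start()
    return listener


if __name__ == "__main__":
    offline = reserve_offline_server()
    proxy = start_toy_full_proxy(offline.getsockname())
    print("TCP check through proxy: ", tcp_check(proxy.getsockname()))
    print("HTTP check through proxy:", http_check(proxy.getsockname()))
```

With the server offline, the TCP check against the proxy should still pass, because the proxy answers the handshake itself, while the HTTP check should fail, because no 200 ever comes back.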

In short, it’s important to understand what systems sit between your load balancer and the systems to which you want to send traffic. If you encounter a proxy on the way, you’ll likely want to use a more intelligent health check than simply seeing whether a service is listening on a certain port. Using HTTP traffic as an example, send an HTTP GET request for a certain page and look for a specific response code. Doing so will confirm that your destination server, and not just a firewall, is responding to your health checks. As cloud computing continues to ramp up, load balancers will more and more often be sending traffic to systems in the cloud, and will often encounter firewalls and full proxies on the way.