
As I discussed in a previous post, simply redirecting a user to a “friendly” 404 page isn’t the best option. First, the user might not remember what they clicked or typed to get there, and simply clicking the back button might not be an option, especially if they submitted form data. Fortunately, as F5 LTMs are “Strategic Points of Control,” we can use them to better handle the situation.

 

First off, let’s determine the desired behavior when a user request induces an error code. In our case, let’s choose 404 as the error code to watch for. If we detect this error code being sent to the user, let’s redirect them to our Home Page (www.sample.com) rather than simply keeping them at an error page. To make their experience better, let’s also hold the user at a custom page for a few seconds while explaining their issue as well as the request that caused the problem.

 

Since we’re looking for error codes sent by the servers, we’ll need to run our commands from within the “HTTP_RESPONSE” event. As you’ll see from the DevCentral wiki page for HTTP_RESPONSE, there are examples for using the “HTTP::status” command to detect error codes and redirect users. For some, the rule below is perfectly acceptable.

 

when HTTP_RESPONSE {
   if { [HTTP::status] eq "404" } {
      HTTP::redirect "http://www.sample.com"
   }
}

 

Unfortunately, that rule would result in the user being sent to the redirect page without any explanation as to what they did wrong. So, we’re going to beef the rule up a bit. As you’ll recall from this post, we can set variables from the HTTP_REQUEST event and then reference them from our HTTP_RESPONSE event in order to show the user the link that caused the error.

Here’s a nice sample rule I just whipped up. We’re using “HTTP::respond” so the response comes directly from LTM. Also, I’m setting a variable “delay” to the number of seconds to keep the user at the hold page.

 

when HTTP_REQUEST {
   set hostvar [HTTP::host]
   set urivar [HTTP::uri]
   set delay 4
}

when HTTP_RESPONSE {
   if { [HTTP::status] eq "404" } {
      HTTP::respond 200 content "<html><head><title>Custom Error Page</title><meta http-equiv=\"REFRESH\" content=\"$delay;url=http://www.sample.com/\"></head><body><p><h2>Unfortunately, your request for $hostvar$urivar caused a 404 error. After $delay seconds, you'll automatically be redirected to our home page. If you feel you've tried a valid link, please contact webmaster@sample.com. Sorry for this inconvenience.</h2></p></body></html>" "Content-Type" "text/html"
   }
}

 

So, with that rule, the user requests a page that causes a 404 error. LTM will detect the 404 error, and instead of sending it to the user, it will respond with a 200 status code and HTML showing the user the link they requested, as well as apologizing and telling them to contact your webmaster if there’s an issue. I was too lazy to use the HTML to make the e-mail address clickable, maybe next time. Also, by using “Meta Refresh,” we’re holding the user at the page for 4 seconds and then sending them to our home page. As you can see, HTTP::respond is a very powerful command. It’s pretty cool being able to use LTM to send HTML to a user.

 

 

Conserving public IP addresses has always been a good idea. Naturally, it’s become more important lately but that’s neither here nor there as far as this post goes.

Let’s assume you’re managing a website powered by an F5 BIG-IP LTM. You’ve got the following setup:

1. Virtual Server with IP Address 1.1.1.1 and listening on port 80.

2. A Pool called “pool_webservers” containing web servers 10.1.1.1:80, 10.1.1.2:80, 10.1.1.3:80, 10.1.1.4:80, and 10.1.1.5:80.

3. A DNS record “www.sample.com” pointing to the Virtual Server’s IP Address of 1.1.1.1

While the site is working fine, you’d like to be able to access individual web servers from an external network. This way, if a customer tells you your site isn’t working, you can test each server individually to try and narrow it down. Also, perhaps you’re releasing code to individual servers and would like to make sure it looks good.

This is a very common requirement for sites. Unfortunately, since your servers are using non-internet routable addresses from the 10.1.1.0 network, you can’t hit them externally.

People frequently deal with such an issue by doing one of the following:

1. Assign public IP addresses to each server and create DNS records accordingly.

In this case, DNS might look like this: www1.sample.com=1.1.1.2, www2.sample.com=1.1.1.3, etc.

2. Create NATs on a public-facing router or Load Balancer to translate the public IPs to the servers’ private ones.

In this case, DNS would look the same as above.

Unless you’re using port translation (1.1.1.2:80 = server 1, 1.1.1.2:1080 = server 2, etc.), you’re using a Public IP address for each server you’d like to access. Since larger sites typically have far more than 5 servers, it’s easy to chew up Public Addresses quickly.

Fortunately, we can use iRules to “route” requests to the proper web servers without using a single additional Public IP Address. From the DevCentral iRules Commands page, you’ll notice a command called “HTTP::host.” When a user types “www.sample.com” into their browser, their HTTP request contains a Host header with the name they requested (www.sample.com).

If you’ll remember our layout, we have a Virtual Server at 1.1.1.1:80 served by “pool_webservers” with members 10.1.1.1:80, 10.1.1.2:80, 10.1.1.3:80, 10.1.1.4:80, and 10.1.1.5:80. http://www.sample.com points to 1.1.1.1 and is how users access the site. Now, we’d like the ability to target individual pool members from outside the network. Typically, this would require a public IP address for each web server but with iRules, we’re all set.

First, we’re going to create additional DNS records. Fortunately, they’re all going to point at the same 1.1.1.1 address as the other ones. Our DNS zone for “sample.com” now looks like this:

www IN A 1.1.1.1
www1 IN A 1.1.1.1
www2 IN A 1.1.1.1
www3 IN A 1.1.1.1
www4 IN A 1.1.1.1
www5 IN A 1.1.1.1

Now, it’s time to put together our iRule. As I was extremely inspired by Joe Pruitt’s recent post comparing iRule Control Statements, I thought I’d give multiple examples of how to accomplish our goal.

First, we’ll go with a simple “if/elseif” rule.

when HTTP_REQUEST {
   if { [string tolower [HTTP::host]] eq "www1.sample.com" } {
      pool pool_webservers member 10.1.1.1 80
   } elseif { [string tolower [HTTP::host]] eq "www2.sample.com" } {
      pool pool_webservers member 10.1.1.2 80
   } elseif { [string tolower [HTTP::host]] eq "www3.sample.com" } {
      pool pool_webservers member 10.1.1.3 80
   } elseif { [string tolower [HTTP::host]] eq "www4.sample.com" } {
      pool pool_webservers member 10.1.1.4 80
   } elseif { [string tolower [HTTP::host]] eq "www5.sample.com" } {
      pool pool_webservers member 10.1.1.5 80
   }
}

Well, that was painless enough. If a user’s host header is “www1.sample.com,” we’re sending them to 10.1.1.1:80. Simply bind that iRule to our 1.1.1.1:80 Virtual Server and we’re set. You might also notice I’m using “string tolower.” That just converts the value to lowercase so I don’t have to account for users typing combinations of upper- and lower-case characters. Most browsers automatically convert the host header to lowercase, but not all. If you read either “control statement” post above, you’ll notice that if/elses are hardly the most efficient method for doing something like this.

Now, we’ll try a “switch statement.”

when HTTP_REQUEST {
   switch -glob [string tolower [HTTP::host]] {
      "www1.sample.com" { pool pool_webservers member 10.1.1.1 80 }
      "www2.sample.com" { pool pool_webservers member 10.1.1.2 80 }
      "www3.sample.com" { pool pool_webservers member 10.1.1.3 80 }
      "www4.sample.com" { pool pool_webservers member 10.1.1.4 80 }
      "www5.sample.com" { pool pool_webservers member 10.1.1.5 80 }
      default { pool pool_webservers }
   }
}

This is a much cleaner, more efficient option. As you’ll notice, I used “-glob” with switch. Glob allows you to use wildcards and also look for patterns. If you read the above post comparing control statements, you’ll notice -glob isn’t as efficient as just using switch. Since we aren’t doing any pattern/wildcard matching here, you could easily leave off the -glob. I like to use it just in case I decide to add such enhancements later. I also used a “default” statement so requests not matching the other statements would go to our normal pool.
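Just to illustrate what -glob buys you, here’s a rough, untested sketch of mine (not part of the original examples) that collapses all five host names into one wildcard case by pulling the digit out of the host name:

when HTTP_REQUEST {
   switch -glob [string tolower [HTTP::host]] {
      "www[1-5].sample.com" {
         # Grab the single digit after "www" (www1 -> 1) and map it to the 10.1.1.<digit> member
         set n [string index [string tolower [HTTP::host]] 3]
         pool pool_webservers member 10.1.1.$n 80
      }
      default { pool pool_webservers }
   }
}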

My personal preference is to use “classes/data groups.” A class is essentially a list that can be searched or matched. Typically, each entry has the field you’re matching on and an optional value that can be returned when a match occurs. In version 10, the class features were greatly enhanced.

For our sample rule, our class could look like this:

class host_headers {
   {
      "www1.sample.com" { "10.1.1.1" }
      "www2.sample.com" { "10.1.1.2" }
      "www3.sample.com" { "10.1.1.3" }
      "www4.sample.com" { "10.1.1.4" }
      "www5.sample.com" { "10.1.1.5" }
   }
}

In this case, “www1.sample.com” is what we’re matching against and “10.1.1.1” is the value we’d like to return. If simply using “class match,” we can ignore/omit the value on the right. If using “class search -value,” then we’re trying to return it. Here’s an example:

when HTTP_REQUEST {
   if { [class match [string tolower [HTTP::host]] equals host_headers] } {
      set hostvar [class search -value host_headers equals [string tolower [HTTP::host]]]
      pool pool_webservers member $hostvar 80
   }
}

The first thing we did was compare the host header to our class/datagroup called “host_headers.” If there’s a match, we set a variable called “hostvar” to the corresponding value. If the user requested “www1.sample.com,” for instance, the corresponding value in the class is “10.1.1.1.” Now that “hostvar” equals 10.1.1.1, we reference the variable in our pool command, which effectively becomes “pool pool_webservers member 10.1.1.1 80.”

Joe’s “Comparing iRule Control Statements” showed that using classes was ridiculously efficient. Using classes can make it a bit more difficult to understand what an iRule does as it requires reading the rule and then reading the class contents. With that said, it’s very efficient and minimizes the amount of text within the rule. The ability to extract a value is very nice too.
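One hedged side note: since version 10 the class command can also hand back the value as part of the match itself, so the match-and-then-search pair above could probably be collapsed into a single lookup. A rough, untested sketch using the same host_headers class:

when HTTP_REQUEST {
   # One lookup: returns the stored value on a match, or an empty string if the host isn't in the class
   set hostvar [class match -value [string tolower [HTTP::host]] equals host_headers]
   if { $hostvar ne "" } {
      pool pool_webservers member $hostvar 80
   }
}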

To “complicate” things a bit, let’s assume you don’t want people outside of your IP space to access individual servers. If you’re releasing new code or price updates, there’s a fair chance you don’t want people hitting the system being worked on. To accomplish this, let’s create an address-type data group/class containing the IP addresses or networks we’d like to allow access from. Let’s assume this class is called “allowed_access.”

when HTTP_REQUEST {
   if { [class match [string tolower [HTTP::host]] equals host_headers] } {
      if { ! [class match [IP::client_addr] equals allowed_access] } {
         HTTP::respond 403 content "You're not allowed!"
      } else {
         set hostvar [class search -value host_headers equals [string tolower [HTTP::host]]]
         pool pool_webservers member $hostvar 80
      }
   }
}

Now, if a user requests one of our “specific server host-headers,” but doesn’t match the allowed IP addresses class, we’re going to respond with an HTTP 403. If they do match both conditions, the rule should operate normally.

While my examples used iRules to target specific servers using host headers, it shouldn’t stop there. Let’s say you’re administering tons of different sites similar to the following:

http://www.sample.com = main company page

http://www.domain.com = a domain registrar site you’re hosting

http://www.social.com = you’ve jumped on the social networking bandwagon and are hosting a Facebook variant

http://www.dating.com = self explanatory

It’s fair to assume you’d have different web servers hosting these sites. Typically, you’d also have a different Virtual Server, along with a corresponding public IP, for each. That’s not always necessary though. Using our switch statement from above, we can change our pool commands a bit.

when HTTP_REQUEST {
   switch -glob [string tolower [HTTP::host]] {
      "www.sample.com" { pool pool_sample }
      "www.domain.com" { pool pool_domain }
      "www.social.com" { pool pool_social }
      "www.dating.com" { pool pool_dating }
      default { pool pool_default }
   }
}

One of the more popular e-mail/forum signatures I see is “with iRules, you can.” I think this is a great example. Since LTM is a “Strategic Point of Control,” it can extract information such as the Host Header, or a Requested URI, and react to it.

It shouldn’t surprise anyone that I enjoy new technical challenges. While I think I’ve become pretty decent at writing iRules, I’m constantly reminded of how much more I have to learn.

Yesterday, someone posted a question on DevCentral that I couldn’t initially answer. They were running an online forum and wanted to keep users from posting spam. Their idea was to search the post when it was submitted and, if it contained a “blocked word,” prevent the post from being made. Unfortunately, the vast majority of my experience with iRules has been around inspecting HTTP GET requests and responses. In order to accomplish what this user wanted, the iRule would have to search the payload of an HTTP POST, which was new to me.

 

Fortunately, there were plenty of examples on DevCentral where people did something similar.  One of the most popular examples is for Sanitizing Credit Card Numbers. That iRule searches the response payload for strings that match credit card patterns. In this case, we’re searching the request data instead.

 

While the vast majority of rules I’ve seen only care about requests and responses, this was such an awesome reason to look at the payload that I felt I had to learn it and share it. Thanks to DevCentral user Hoolio’s posts as well as the awesome wiki, I had a relatively easy time learning. Yet another great reason for leveraging your “Strategic Points of Control.” I’m curious to know what other uses for inspecting request/response data people can think of.

 

Here’s the code I ended up recommending.

 

when HTTP_REQUEST {

   # Only check POST requests
   if { [HTTP::method] eq "POST" } {

      # Default amount of request payload to collect (in bytes)
      set collect_length 2048

      # Check for a non-existent Content-Length header
      if {[HTTP::header Content-Length] eq ""}{

         # Use default collect length of 2k for POSTs without a Content-Length header
         set collect_length $collect_length

      } elseif {[HTTP::header Content-Length] == 0}{

         # Don't try to collect a payload if there isn't one
         unset collect_length

      } elseif {[HTTP::header Content-Length] > $collect_length}{

         # Use default collect length
         set collect_length $collect_length

      } else {

         # Collect the actual payload length
         set collect_length [HTTP::header Content-Length]

      }

      # If the POST Content-Length isn't 0, collect (a portion of) the payload
      if {[info exists collect_length]}{

         # Trigger collection of the request payload
         HTTP::collect $collect_length
      }
   }
}

when HTTP_REQUEST_DATA {
# Define a string-type datagroup called dg_blocked containing words to be blocked
   if { [matchclass [HTTP::payload] contains dg_blocked] }{
      HTTP::respond 403 "Blocked"
   }
}


 

I’ve only recently started to look at the blog statistics provided by WordPress. One of my favorite data points is the “searches” through which users find my blog. One of the most popular searches pertains to having your F5 LTM use an iRule to send users to a maintenance page if all servers in a pool are down. Since I hate the idea of F5 customers being unable to leverage their device for a very common scenario like this, I thought I’d write a post.

 

As a reminder, there are plenty of examples of this exact scenario on devcentral.f5.com.

 

First, the easy way…simply utilize a “fallback host” in an HTTP Profile attached to your Virtual Server. If LTM is unable to connect to a pool member to serve a request, it’ll send the customer a redirect.

 

http://support.f5.com/kb/en-us/solutions/public/6000/500/sol6510.html?sr=11563781

 

As you’ll notice, the post above also illustrates how to use an iRule for a similar task.

when LB_FAILED {
   if { [active_members [LB::server pool]] < 1 } {
      HTTP::fallback "http://www.sample.com/redirect.html"
   }
}

 

I rewrote the example rule a bit, but the point is there. If the pool to which the user was attached has less than 1 active member, then utilize a fallback. It’s important to note that the pool members are only inactive if they’ve failed their health checks. So, if you’re using a tcp port check and a pool member is throwing 500s for every request, it’ll remain up. In order to combat this, you can either use a better health check, or build additional logic into your rule.

 

when HTTP_RESPONSE {
   if { [HTTP::status] eq "500" } {
      HTTP::redirect "http://www.sample.com/redirect.html"
   }
}

 

Now, rather than looking at the number of active pool members, we’re redirecting users if their pool member sent a 500. The negative to this method is that the pool might have other members that aren’t serving 500s, which is why reselecting a pool member might be the better option. I’ll touch on that in another post.
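As a quick preview of that post, a minimal sketch of the reselect idea uses “HTTP::retry” to replay the original request so load balancing can pick a different member. This assumes the request headers were saved in HTTP_REQUEST, and in production you’d want to add a retry limit:

when HTTP_REQUEST {
   # Save the request headers so they can be replayed if the chosen member returns a 500
   set request [HTTP::request]
}

when HTTP_RESPONSE {
   if { [HTTP::status] == 500 } {
      # Re-queue the original request; LTM load balances again and can pick another pool member
      HTTP::retry $request
   }
}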

 

As I discussed in this post, sending an HTTP GET for a page on a server to which you load balance traffic is one of the better health checks available. If you use the right page, it can be an extremely light-weight, yet highly reliable check.

In order to properly utilize these health checks, you need to know enough about the application you’re supporting to understand how it behaves when it fails.

In my case, I send traffic to a pool of Apache Servers running mod_weblogic. From there, the traffic is sent to application instances.

 

Using an F5 BIG-IP LTM as an example, there are several configurable parameters when defining a health check.

 

1. Interval (how often the check is sent)

2. Timeout (how long the resource has to respond)

3. Send String (the request you’re sending to the resource)

4. Receive String (what response causes the health check to pass)

5. Receive Disable String (what response causes the health check to fail)

 

There are several more, but let’s concentrate on the typical ones.

The default interval is 5 seconds while the timeout is 16. I’ve always been ok with that.

For our send string, let’s do “GET /login.jsp HTTP/1.1\r\nHost: \r\nConnection: Close\r\n\r\n”

So, we’re sending an HTTP GET for /login.jsp using HTTP/1.1 and an empty host header. We’re also closing out the connection so it doesn’t have to sit idle on the server.

For our receive string, let’s do “HTTP/1\.(0|1) (2)”

So, we’re considering a response starting with 2 using HTTP 1.0 or 1.1 as a success. Typically, a server will respond with a 200 when all is well so this is pretty typical.

 

Unfortunately for me, our resource actually sends an HTTP 301 (Permanent Redirect) when a user tries loading the login page. This happens fairly often, especially if you’re sending a health check for “/” and the resource redirects you to a different directory. Since we consider this permanent redirect to be normal behavior, we’ll modify our receive string to “HTTP/1\.(0|1) (2|3)”. Now, we’re including all 3xx responses as well. Since a failed resource will usually time out or send a 404/500 when it fails, this should work well.

 

As I mentioned before, my LTM sends traffic to Apache which then sends it to our App instances via mod_weblogic. So, what happens when the app instances are down? I’d expect a 404 or 500 from Apache, right? Sure, as long as your application folks haven’t configured it to send an HTTP 302 (Temporary Redirect) so users go to a custom error page when the App Instances are down.

 

So, here’s what we’ve seen:

 

1. During normal conditions, the resource returns a 301 for its health check.

2. If application instances are down, the resource returns a 302 for its health check.

 

Naturally, we need to modify our Receive String

 

HTTP/1\.(0|1) (2|3)

to

HTTP/1\.(0|1) (2|301)

 

We’re still allowing any 2xx response, but among the 3xx responses we’re now only allowing 301s.

 

We’ve done what we wanted to. We’ve configured a health check that accurately determines the system’s health. As you’ve noticed though, it required trial and error, and a lot of testing. When determining a health check strategy, it’s critical that either you or an application owner understands their application’s behavior while it’s working, and even more importantly, when it’s not. Also, it’s not always wise to “set and forget” these checks. If, for instance, our application folks changed the “/login.jsp” redirect from a 301 to a 302, the check would fail, and we’d have to come up with a new strategy.

 

 

If you have any familiarity with performance monitoring in a large environment, you’ve likely heard of Gomez. In a similar fashion, if you have experience with application delivery or load balancing, you’ve likely heard of F5. While F5 helps you deliver applications as efficiently as possible, Gomez typically helps you measure and monitor them.

Like most hosted monitoring services, Gomez provides the ability to test a website from multiple locations, multiple browsers, and multiple networks. While these capabilities give a site owner a view into when and where issues occur, they don’t 100% show what users are seeing. Obviously if DNS or routing isn’t working, Gomez will see it, just like your customers would. Unfortunately though, Gomez can’t replicate every single browser, network connection, and machine from which a client might hit your site.

To solve this problem, Gomez recommends “Real-User” monitoring. In order to leverage this technology, site owners must insert client-side JavaScript into their web pages. Unfortunately, if you’re using Gomez, you’re likely monitoring a fairly large site, so having to integrate this JavaScript could get very complicated.

Luckily for F5 users, Gomez is a Technology Alliance Partner which makes this problem quite a bit easier to solve. Since F5s are “Strategic Points of Control” that see the client requests and application responses, it’s easy enough to leverage them for the Real-User monitoring.

Joe Pruitt wrote a series of articles on how to leverage iRules to obtain real-user monitoring without having to make application changes.

Part 1 is here.

Part 2 is here.

Part 3 is here.

Throughout the series, Joe discusses how to link client requests to a Gomez account and allows site owners to view stats on a Page, Data Center, or Account basis. While it’s a fairly “complex” iRule, it’s a great example of using “network scripting” to leverage an amazing monitoring technology.

While a typical Gomez implementation gives you visibility into how your site is performing for their probes, real-user monitoring shows you how it’s performing for your actual customers. This is a huge win for both designers and troubleshooters. Imagine being able to see that 10% of your users are having issues with a particular page and only in a particular Data Center. Talk about expediting the troubleshooting process. Also, if you can see that your page load times are exceeding SLAs but only for mobile users, you’ve quickly identified a page that might be a candidate for mobile optimization.

Performance monitoring has obviously come a long way in the last few years. Once upon a time, it was adequate to test individual page loads; now, transactional monitoring is typically a requirement. Again, a simple Gomez implementation does allow you to monitor that your systems are handling transactions, but it doesn’t tell you that your users are really completing them.

With most monitoring vendors, you pay extra to have a site tested from multiple locations. By utilizing real-user monitoring, you’ve turned every one of your visitors into a monitoring probe and are able to gather and act upon the data they’re generating for you. In my opinion, the biggest win with real-user monitoring is that you’re 100% seeing issues before your customers report them…as long as the users can reach your F5s, anyway.

For a while now, F5 has been referring to their BIG-IP products as “Strategic Points of Control.” When I first heard that phrase, I didn’t really understand what they were trying to say and assumed it was “marketing speak.” As I’ve gotten better at leveraging F5 technologies to solve my very complicated requirements, I’ve begun understanding what they meant.

I was going to write a blog post about “Strategic Points of Control” a couple months ago, but Lori MacVittie had already beaten me to it.

She defines Strategic Points of Control as “Locations within the data center architecture at which traffic (data) is aggregated, forcing all data to traverse the point of control.”

I think that’s a great definition so I’ll happily use it here.  For our example, let’s assume we’re hosting an E-Commerce site. Naturally, traffic traverses our F5 LTMs on its way to our application instances. This means the F5s are not only a point of failure, but also a point of control. They see all inbound and outgoing content for this application. Since F5 does a wonderful job of building L7 visibility into their devices, LTM becomes a great candidate for altering or reporting on the traffic flowing through it. Of course, just because it can, doesn’t mean it should.

Someone posted a question on DevCentral (F5’s User Community) wondering when it was prudent to use iRules. Naturally, most of us answered “it depends.”

While almost everyone appreciates the flexibility of iRules, some fear they might be used when they shouldn’t be.

I recently worked on a project that required us to ensure an HTTP application only used HTTPS. Since this application was being fronted by an F5 LTM pair, it made sense to terminate the SSL there and send cleartext between the F5 and application.  While sometimes, it’s as easy as making an HTTPS Virtual Server and applying an SSL profile containing the proper cert, I wasn’t that lucky. This particular application sent redirects to the user based on how it was being accessed. If it was being hit over HTTP, it sent redirects specifying http. If it was being hit over HTTPS, it sent redirects specifying https. In this case, even though we were using HTTPS between the client and LTM, the application would still see traffic over HTTP since we weren’t re-encrypting the data between LTM and the application. Naturally, this would cause a user to stop using SSL as soon as they clicked a link.

Fortunately, since LTM sees the traffic between itself and the application, it can see these redirects and rewrite them. By using “redirect rewrite,” I was able to rewrite the redirects sent by the application to use https. Unfortunately, this application also had javascript buttons that, when clicked, would cause the user to send a GET request specifying HTTP. Again, since LTM is a “strategic point of control” and sees the traffic, I simply wrote an iRule to redirect all HTTP requests for this Virtual Server to HTTPS.
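For reference, the redirect iRule itself is about as simple as they come; applied to the port 80 Virtual Server, it looks something like this:

when HTTP_REQUEST {
   # Send the client back to the same host and URI, but over HTTPS
   HTTP::redirect "https://[HTTP::host][HTTP::uri]"
}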

After creating the iRule for the redirect, I let the application team know that we were ready for them to start testing. They were somewhat surprised that I was able to make the application use HTTPS without them making any changes. One of them actually said, “awesome, I like when it’s easy like this and we don’t have to hack crap together.” With a huge smile on my face, I said, “that’s pretty much exactly what I just did.”

It only took me about 10 minutes to brush up on “redirect rewrites” and since I had written plenty of “http-to-https” iRules, this was extremely easy. At the end of the day though, I used iRules to fix an application “issue.” While this is one of the best features of iRules, it demonstrates their potential use as a mitigation tool. What if I was the only person with a good understanding of iRules, or of how we were using LTM to handle the redirects for this application? If someone accidentally altered or removed that iRule, the application would start having issues. If the application code was rewritten to only use HTTPS, there really wouldn’t be any concerns. Of course, there are a ton of application instances; by making the change on the F5s, we keep traffic from having to reach the apps just to be redirected, and we only have to make the change in one place.

One of the most enjoyable posts I’ve made dealt with using iRules to generate Heatmaps to illustrate site visitors. Even though I tested this iRule and got it working well, I ended up choosing not to use it. Because my site leverages Akamai’s DSA product, we have access to very similar information through their portals. By using their site to track this info, I essentially traded one Strategic Point of Control for another. Obviously I saved myself a performance hit on our F5s, but it really came down to whether tracking users like this was a proper use of my LTMs.  The answer, as always, is that “it depends.” For sites that don’t have Akamai or some other product that also has visibility into information like this, F5 might be your best option.

Assuming you’re using Akamai and have an F5 deployment, you’ll run into several areas of overlapping technologies:

1. Using Context to handle different users…differently.

2. Protecting application resources by throttling users based on whether cookies exist.

3. Web Application Firewalling

4. Redirects

5. Limiting access to a site to certain geographic areas/types of users

6. Compression, Caching, Acceleration

The list could easily go on, but it demonstrates some potential challenges an architect might face. Since both Akamai and F5s are strategic points of control, which should you use? I think the most accepted rule is “the closer to the user, the better.” In reality, it comes down to a cost/benefit comparison. While making these decisions in Akamai-land both limits traffic to your infrastructure and accelerates the user experience, there’s a price for that. Assuming you already have capacity on your LTM, it would be free (save labor) to use it instead, whereas Akamai would likely charge for each feature.

I’ve often spoken about how valuable learning from failures is to an IT professional’s development. The challenge is how best to limit these failures to a controlled environment in which business impact is minimized. Due to the complexities of IT environments, it’s not always easy to notice a “mis-configuration” when it happens, thus exposing businesses to potentially prolonged issues.

Fortunately, a lot of systems provide logging capabilities. Pretty much every network vendor allows SNMP trapping and syslogging from their devices. The challenge is configuring these properly and making sure you’re always watching them.

Here’s an example iRule that limits access to certain domains:

when HTTP_REQUEST {
   if { ! [class match [HTTP::host] equals dg_host] } {
      reject
      log local0. "[IP::client_addr] went to [HTTP::host][HTTP::uri] and was rejected."
   }
}

The “log local0.” line is only executed if a user hits the Virtual Server with a host header that doesn’t exist in our data group of allowed hosts. This is similar to logging blocks on a router ACL.

While viewing my log entries, I quickly noticed I was blocking people trying to go to “Domain.com”, “DOMAIN.COM”, and “domain.com.”. Since the users were still trying to go to the proper domain, I modified my statement from

if { ! [class match [HTTP::host] equals dg_host] } {

to

if { ! [class match [string tolower [HTTP::host]] equals dg_host] } {

“string tolower” converts the specified string to lowercase. The reason I hadn’t initially done this was that most browsers automatically lowercase the host header when they submit a request. By logging the blocks for my rule, I was able to see exactly what was getting blocked so I could make a change.

Since LTMs are typically placed at “strategic points of control” within a network, they can control and report on traffic. In this case, we’re logging the user’s IP address, Host header, and requested URI.

A typical log entry might look like “1.1.1.1 went to www.domain.com/index.html and was rejected.”

This is actually a relatively simple logging statement. While having a recent issue where certain users weren’t accepting cookies from my LTM, I decided to add [HTTP::header "User-Agent"] to my logging, which quickly pointed out that the users having issues were Google Droids, which told me I needed to check our mobile-adaptive logic. If I added the User-Agent logic to my iRule above, I’d quickly discover which browsers don’t convert their host headers to lowercase.
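Applied to the rule above, that just means extending the log line, roughly like so:

log local0. "[IP::client_addr] ([HTTP::header User-Agent]) went to [HTTP::host][HTTP::uri] and was rejected."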

You can easily comment out logging commands from an iRule until you need them. By logging at different points of your rule, you can quickly see at which steps you’re having issues.

One of F5’s best resources is its DevCentral community. On DevCentral, users can find tutorials, code samples, podcasts, forums, and many additional resources to help them leverage their investment in F5’s technologies. As an active contributor and reader of DevCentral, I was very pleased to see a tutorial on combining F5’s new built-in Geolocation database with Google’s charting API to make heatmaps to illustrate traffic patterns.

One of F5’s DevCentral employees, Colin Walker, first wrote a tutorial for using iRules to show domestic traffic patterns and then added the ability to illustrate world-wide patterns. By using these iRules, users are able to see a visual representation of how often their site is accessed from different areas of the country.

First, there’s a US view:

Then, there’s a world-view.


In both cases, the logic is relatively straightforward. A user hits your site, which triggers an iRule that increments a table on your F5 LTM. Based on the source address of the client, the F5 can determine which state they originated from and, using the Google Charts API, can overlay that data onto a map, using different colors to represent different hit counts.
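Colin’s articles have the real code, but stripped down to its counting half, the idea looks roughly like this. This is a sketch only: “whereis” relies on the newer geolocation-enabled versions, and the “hits_<state>” key naming is just something I made up for illustration.

when HTTP_REQUEST {
   # Resolve the client address to a US state using the built-in geolocation database
   set state [whereis [IP::client_addr] state]
   if { $state ne "" } {
      # Bump a per-state hit counter in the session table; the chart-building side
      # of the rule would later read these counters to color the map
      table incr "hits_$state"
   }
}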

While this is great data, we still have to find a tangible use for it. Here are some thoughts I’ve had so far:

1. For companies using Akamai, the client IP this iRule uses to determine the source of traffic will actually be Akamai’s server. If you want the true source, you need to change [IP::client_addr] to [HTTP::header "True-Client-IP"] (see the sketch after this list). What’s even cooler is doing one heatmap with client_addr and one heatmap with True-Client-IP. The maps should actually look the same since Akamai has such a distributed computing model. Far more often than not, a user will hit an Akamai resource in their own state. If the maps aren’t the same, you have a problem.

2. Rather than simply using colors to illustrate access, keep a table of HTTP requests per state, look at the count every 60 seconds, and divide by 60 to get HTTP Reqs/Sec for each state.

3. For E-Commerce sites that use promotions to target certain areas of the country, look at the heatmap before and after a promotion to see whether or not access from that area increased and if so, by how much.

4. If you don’t have any legitimate international customers, using the world view map can help you determine with which frequency your site is being accessed from outside the US. If often enough, it might be worthwhile using the built-in Geolocation services to block access for users outside the US.

5. Rather than looking at every single HTTP request, have the rule only look at certain ones – for instance a checkout page so you can compare conversion rate between states.

6. Same concept as number 5, but if you release a new product page, have your rule look at that page so you can determine where it’s most popular.

7. Watch the heatmap throughout the day to see during which hours different locations most frequently hit your site. In an elastic computing situation, this might allow you to borrow resources from systems that might not get hit until later in the day.

8. If you release a new mobile site, look at mobile browser user-agents as well as client ip address to see if mobile users in certain areas of the country are hitting your site more often than others. If you have bandwidth intensive applications, this might help determine where you’d derive the most benefit with another DC, or using a CDN.
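Expanding on the first idea above, the swap is just a matter of preferring Akamai’s header when it’s present and falling back to the connection source when it isn’t. A rough sketch:

when HTTP_REQUEST {
   # Prefer the real client address Akamai passes along; fall back to the connection source
   if { [HTTP::header exists True-Client-IP] } {
      set src [HTTP::header True-Client-IP]
   } else {
      set src [IP::client_addr]
   }
   # $src would then feed the same state lookup and table logic the heatmap rule uses
}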

These are just a few thoughts. I’m sure there are many, many more opportunities to leverage these great technologies. It’s nice to see that F5 recognizes the value of including a Geolocation database with its product, but it’s even more impressive that they’re giving tangible examples of how to use this information to make a site better.

Another challenge is demonstrating these capabilities to the folks who make decisions based on them. In the past, IT has been criticized for finding solutions to problems that didn’t exist yet. New capabilities are being added so frequently that architects really need to look at every solution, determine whether there’s an opportunity, and then send such opportunities to decision-makers.

Some of the most common health checks I see with load balancers include tcp handshakes and tcp half-opens. In a TCP 3-way handshake healthcheck, the load balancer sends a SYN, gets a SYN, ACK from the server, and then sends an ACK back. At this point, it considers the resource up. In a TCP-Half-Open healthcheck, the load balancer sends a SYN, gets a SYN-ACK from the server, and then considers it up. It also sends a RST back to the server so the connection doesn’t stay open, but that’s neither here nor there.

We all know that a much better healthcheck would be something that validates content on the end-systems, like an HTTP GET for a specific page, looking for an HTTP 200 response so we know that the content exists, but that isn’t always necessary. Sometimes, a tcp-half-open or a tcp-handshake might be the best way to go.

If going with either tcp health check method, you’re simply checking whether something is answering at the specified port on your system. If you’re load balancing HTTP traffic to an Apache box that runs Apache on port 80, doing a tcp healthcheck to port 80 will usually tell you whether Apache is running, but won’t necessarily tell you that your content is valid. Of course, that’s ok if you trust your ability to validate that on your own. An interesting problem with doing a tcp-check is that you need to know whose health you’re actually checking!

Let’s assume for a moment that the servers to which you’re load balancing traffic are behind a firewall instead of being local to your load balancer. If the firewall is acting as a full-proxy (like an F5 load balancer does) and you simply send a tcp-half-open or tcp-handshake, all you’re doing is checking the health of the firewall. A full proxy will complete a 3 way handshake with the client (in this case the load balancer) before completing a 3-way handshake with the server. By doing this, the box can, to a certain point, keep the client from starting a SYN-Flood. The only way the server sees the traffic is if the 3-way handshake actually completed.

Here’s the traffic flow for sending a tcp 3-way handshake from the load balancer to a system behind a firewall:

1. The load balancer sends a SYN packet to the server.

2. Since the Firewall is a full-proxy, it actually gets the SYN, and sends a SYN, ACK to the load balancer.

3. The load balancer sends an ACK to what it assumes is the system it’s load balancing, but is actually the firewall.

4. Now that the handshake is complete, the firewall completes a 3-way handshake with the server.

5. Now, if the load balancer were to send an HTTP GET for /index.html, it would send it to the firewall and the firewall would send it to the server.

If we use our above flow for a TCP-Half-Open check, here’s what we get.

1. The load balancer sends a SYN to the destination server.

2. The firewall responds with a SYN, ACK.

3. The load balancer has no idea that the firewall, rather than the server, sent the SYN, ACK and therefore considers the connection up and sends a RST to kill the connection.

Another problem is that the firewall will complete a 3-way handshake with the load balancer even if the server isn’t online. While some devices, F5 load balancers for example, allow you to configure them so they don’t even complete a handshake if the systems behind them are down, this is far from the norm.  So, by doing a tcp-check, we aren’t actually checking the destination server’s health at all.

In short, it’s important to understand what systems are between your load balancer and the systems to which you want to send traffic. If you encounter a proxy on the way, you’ll likely want to use a more intelligent healthcheck than simply seeing whether a service is listening on a certain port. Using HTTP traffic as an example, send an HTTP-GET request for a certain page and look for a specific response code. Doing so will ensure your destination server, and not a firewall, is responding to your health checks.  As cloud computing continues to ramp up, it’ll become more frequent that load balancers are sending traffic to systems in the cloud, thus often encountering firewalls and full-proxies on the way.