
I've only recently started to look at the blog statistics provided by WordPress. One of my favorite data points is the "searches" through which users find my blog. One of the most popular searches pertains to having an F5 LTM use an iRule to send users to a maintenance page when all of the servers in a pool are down. Since I hate the idea of F5 customers being unable to leverage their device for a very common scenario like this, I thought I'd write a post.

 

As a reminder, there are plenty of examples of this exact scenario on devcentral.f5.com.

 

First, the easy way…simply utilize a “fallback host” in an HTTP Profile attached to your Virtual Server. If LTM is unable to connect to a pool member to serve a request, it’ll send the customer a redirect.

 

http://support.f5.com/kb/en-us/solutions/public/6000/500/sol6510.html?sr=11563781

 

As you'll notice, the article above also illustrates how to use an iRule for a similar task.

when LB_FAILED {
    if { [active_members [LB::server pool]] < 1 } {
        HTTP::fallback "http://www.sample.com/redirect.html"
    }
}

 

I rewrote the example rule a bit, but the point is there. If the pool to which the user was directed has fewer than one active member, then utilize a fallback. It's important to note that pool members are only marked inactive if they've failed their health checks. So, if you're using a TCP port check and a pool member is throwing 500s for every request, it'll remain up. To combat this, you can either use a better health check or build additional logic into your rule.

 

when HTTP_RESPONSE {
    if { [HTTP::status] eq "500" } {
        HTTP::redirect "http://www.sample.com/redirect.html"
    }
}

 

Now, rather than looking at the number of active pool members, we're redirecting users if their pool member sent a 500. The downside to this method is that the pool might have other members that aren't serving 500s, which is why reselecting a pool member might be the better option. I'll touch on that in another post.
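In the meantime, here's a rough sketch of that reselect idea, assuming a standard virtual server with an HTTP profile. It saves the request in HTTP_REQUEST and uses HTTP::retry to replay it once if a 500 comes back, letting LTM load balance the retry. The single-retry flag is my own guard against looping when every member is unhealthy, and note that the retry can still land on the same member; a fuller version would steer the reselect away from it.

when HTTP_REQUEST {
    # Save the request headers so they can be replayed later
    set req [HTTP::request]
    set retried 0
}

when HTTP_RESPONSE {
    # On a 500, replay the saved request once so LTM picks a member again
    if { [HTTP::status] == 500 && !$retried } {
        set retried 1
        HTTP::retry $req
    }
}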

 

As everyone knows, retail is an extremely seasonal industry. Retail e-commerce is no different, so when building an environment to support a retail site, architects and engineers have to plan for the highest demand. Let's pretend cloud computing doesn't exist or isn't feasible in this case.

 

You've got a site that has an average daily peak of 50Mbps, but on Black Friday the peak is 1.2Gbps. Besides Black Friday, no other day of the year exceeds 200Mbps. Naturally, ISPs can provide burstable Ethernet so you're only paying for what you use, but switches, load balancers, etc. might not provide the same capability. So, you might have to build (and buy) an infrastructure that supports 10 Gbps to provide for your "peak" growth, as that 1.2Gbps number might grow at 40% a year or more (at 40% a year, 1.2Gbps grows to roughly 9Gbps in six years).

 

Before building out this environment though, it might be beneficial to learn more about your “peak” demand. For instance, let’s say the peak happens at midnight on Black Friday and that it’s sustained from 12:00 – 12:50 AM. High demand continues the rest of the day, but never exceeds 500Mbps. Why are so many people hitting your site from 12:00 – 12:50 AM? Let’s assume the marketing people tell us that they release some sort of promotion allowing shoppers huge discounts starting at 12:00 AM and going throughout the day. Unfortunately, there’s only enough inventory for 100 of each discounted item, so shoppers hit the site as soon as they’re available.

 

Before this conversation, we were planning on building an infrastructure to support that 1.2Gbps (and beyond) number that’s only hit once per year, and for only an hour. Now that we know more about why that time period is so popular, it’s time to determine whether it’s “cost-effective.” Let’s say we’re spending $1M extra to support demand that exceeds 1Gbps. If we want to avoid that spend, what options do we have to keep our traffic spikes under 1Gbps? What if the promotions are released the night before Thanksgiving? What if different promotions were released each hour during the day? What if there was enough inventory to assure all customers the items they want? What if promotions were e-mailed to different customers at different times? Obviously a marketing group would be better able to answer these questions than I, but there’s a decent chance that such methods could eliminate the short (duration), large (size) spike. Perhaps rather than a 1.2Gbps spike from 12:00 – 12:50 AM, we see a 500Mbps spike from 11:00 PM – 3:00 AM. Assuming profitability isn’t tied to when folks are buying goods, such a change in traffic spikes would allow us to delay a large expense for at least another year.

 

Naturally, retail is a great arena for public cloud. What happens, though, when all retailers are on public cloud? Wouldn’t the cloud provider have to have a huge hardware footprint to support Black Friday for all of its retail customers? At any rate, supporting seasonal demand is definitely a challenge, but it poses some interesting opportunities.

As I discussed in my post about “Strategic Points of Control,” F5 LTMs are in a great position to capture and report on information. I’ve recently encountered several issues where I needed to log the systems sending HTTP 404/500 responses and the URLs for which they were triggered. While this information can be obtained from a packet capture, I find it much easier to simply leverage iRules to log the information.

 

If you don't know too much about iRules, I'd encourage you to head over to DevCentral and do some reading. One of the first things you'll learn is that there are several "events" in which an iRule can inspect and react to traffic. Each event has different commands that can be used: while some commands can be used in multiple events, others cannot.

 

As an example, HTTP::host and HTTP::uri can be used in the HTTP_REQUEST event, but not in the HTTP_RESPONSE event. Since an HTTP error response sent by a server is seen in the HTTP_RESPONSE event (between server and LTM), we can't simply log the value of HTTP::host or HTTP::uri there, as those commands aren't usable in the HTTP_RESPONSE context. Fortunately, variables can be set in one event and referenced in another, which allows us to still access the proper information.

 

Here’s an overview of what we’re trying to accomplish:

 

1. A client makes a request to a Virtual Server on the LTM.

2. The LTM sends this request to a pool member.

3. If the pool member (server) responds with an HTTP Status code of 500, we want to log the Pool Member’s IP, the requested HTTP Host and URI, and the Client’s IP address.

 

We'll be using the "HTTP::status" command to check for 500s. Since this command needs to be executed within the HTTP_RESPONSE event, which doesn't have access to HTTP::host or HTTP::uri, we'll need to use variables.

In the HTTP_REQUEST event, we'll set those variables to the values of HTTP::host, HTTP::uri, and IP::client_addr.

The HTTP_REQUEST event in our iRule will look something like this:

when HTTP_REQUEST {
    set hostvar [HTTP::host]
    set urivar [HTTP::uri]
    set ipvar [IP::client_addr]
}

Now, we’ll check the HTTP status code from within the HTTP_RESPONSE event and if it’s a 500, we’ll log the value of the variables above.

when HTTP_RESPONSE {
    if { [HTTP::status] eq 500 } {
        log local0. "$ipvar requested $hostvar $urivar and received a 500 from [IP::server_addr]"
    }
}

 

Now, whenever a 500 is sent, you can simply check your LTM logs and you'll see the client who received it, the server that sent it, and the URL that caused it. This is a fairly vanilla implementation. I've had several situations in which I also needed to report on the value of a JSESSIONID cookie so our app folks could check their logs as well. In a situation like that, you'd simply set and reference another variable.

From HTTP_REQUEST:

set appvar [HTTP::cookie JSESSIONID]

From HTTP_RESPONSE:

log local0. "session id was $appvar"
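Putting the pieces together, a consolidated version of the rule might look something like this (the JSESSIONID line assumes your application actually sets that cookie):

when HTTP_REQUEST {
    # Capture request details while they're still available
    set hostvar [HTTP::host]
    set urivar [HTTP::uri]
    set ipvar [IP::client_addr]
    set appvar [HTTP::cookie JSESSIONID]
}

when HTTP_RESPONSE {
    # Log the saved details if the pool member answers with a 500
    if { [HTTP::status] eq 500 } {
        log local0. "$ipvar requested $hostvar $urivar and received a 500 from [IP::server_addr], session id was $appvar"
    }
}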

 

This was a good example of how easily iRules can be leveraged to report on issues. Unfortunately though, this isn’t always a scalable option which is why I thought I’d talk about a product I’ve really enjoyed using.

The folks behind Extrahop call it an “Application Delivery Assurance” product. Since both co-founders came from F5, they have a great handle on Application Delivery and the challenges involved. Since I’m typically only concerned with HTTP traffic nowadays, I use Extrahop to track response times, alert on error responses, and also to baseline our environment. As an F5 user, I’m very pleased to see the product’s help section making recommendations on BIG-IP settings to tune if certain issues are seen.

I’d definitely encourage you to go check out some product literature. Since it’s not always fun to arrange a demo and talk to sales folks, they offer free analysis via www.networktimeout.com. Simply upload a packet capture, it’ll be run through an Extrahop unit, and you can see the technology in action.

 

 

If you have any familiarity with performance monitoring in a large environment, you’ve likely heard of Gomez. In a similar fashion, if you have experience with application delivery or load balancing, you’ve likely heard of F5. While F5 helps you deliver applications as efficiently as possible, Gomez typically helps you measure and monitor them.

Like most hosted monitoring services, Gomez provides the ability to test a website from multiple locations, multiple browsers, and multiple networks. While these capabilities give a site owner a view into when and where issues occur, they don't show exactly what real users are seeing. Obviously, if DNS or routing isn't working, Gomez will see it, just like your customers would. Unfortunately though, Gomez can't replicate every single browser, network connection, and machine from which a client might hit your site.

To solve this problem, Gomez recommends "Real-User" monitoring. To leverage this technology, users must insert client-side JavaScript into their web pages. Unfortunately, if you're using Gomez, you're likely monitoring a fairly large site, so integrating this JavaScript could get very complicated.

Luckily for F5 users, Gomez is a Technology Alliance Partner which makes this problem quite a bit easier to solve. Since F5s are “Strategic Points of Control” that see the client requests and application responses, it’s easy enough to leverage them for the Real-User monitoring.

Joe Pruitt wrote a series of articles on how to leverage iRules to obtain real-user monitoring without having to make application changes.

Part 1 is here.

Part 2 is here.

Part 3 is here.

Throughout the series, Joe discusses how to link client requests to a Gomez account and allows site owners to view stats on a Page, Data Center, or Account basis. While it's a fairly complex iRule, it's a great example of using "network scripting" to leverage a powerful monitoring technology.

While a typical Gomez implementation gives you visibility into how your site is performing for its probes, real-user monitoring shows you how it's performing for your actual customers. This is a huge win for both designers and troubleshooters. Imagine being able to see that 10% of your users are having issues with a particular page, and only in a particular Data Center. Talk about expediting the troubleshooting process. Also, if you can see that your page load times are exceeding SLAs, but only for mobile users, you've quickly identified a page that might be a candidate for mobile optimization.

Performance monitoring has obviously come a long way in the last few years. Once upon a time, it was adequate to load separate pages. Now, transactional monitoring is typically a requirement. Again, a simple Gomez implementation does allow you to monitor that your systems are handling transactions but it doesn’t tell you that your users are really completing them.

With most monitoring vendors, you pay extra to have a site tested from multiple locations. By utilizing real-user monitoring, you've turned every one of your visitors into a monitoring probe and are able to gather and act upon the data they're generating for you. In my opinion, the biggest win with real-user monitoring is that you're seeing issues before your customers report them…as long as the users can still reach your F5s, anyway.

One of F5’s best resources is its DevCentral community. On DevCentral, users can find tutorials, code samples, podcasts, forums, and many additional resources to help them leverage their investment in F5’s technologies. As an active contributor and reader of DevCentral, I was very pleased to see a tutorial on combining F5’s new built-in Geolocation database with Google’s charting API to make heatmaps to illustrate traffic patterns.

One of F5’s DevCentral employees, Colin Walker, first wrote a tutorial for using iRules to show domestic traffic patterns and then added the ability to illustrate world-wide patterns. By using these iRules, users are able to see a visual representation of how often their site is accessed from different areas of the country.

Colin's posts include screenshots of both a US view and a world view of the resulting heatmaps.

In both cases, the logic is relatively straightforward. A user hits your site, which triggers an iRule that increments a table on your F5 LTM. Based on the source address of the client, the F5 can determine which state they originated from and, using the Google Charts API, can overlay that data onto a map, using different colors to represent different hit counts.
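Colin's rules also handle the charting, but the counting piece alone looks roughly like the sketch below. The subtable name is my own, and the whereis lookup assumes a BIG-IP version (10.1 or later) with the geolocation database available.

when HTTP_REQUEST {
    # Look up the US state for the client address and bump its counter;
    # the charting logic would later read these subtable entries
    set state [whereis [IP::client_addr] state]
    if { $state ne "" } {
        table incr -subtable state_hit_counts $state
    }
}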

While this is great data, we still have to find a tangible use for it. Here are some thoughts I’ve had so far:

1. For companies using Akamai, the client IP this iRule uses to determine the source of traffic will actually be Akamai's server. If you want the true source, you need to change [IP::client_addr] to [HTTP::header "True-Client-IP"] (see the sketch after this list). What's even cooler is doing one heatmap with client_addr and one heatmap with True-Client-IP. The maps should actually look the same since Akamai has such a distributed computing model. Far more often than not, a user will hit an Akamai resource in their own state. If the maps aren't the same, you have a problem.

2. Rather than simply using colors to illustrate access, keep a table of HTTP requests per state, look at the amount every 60 seconds, and divide by 60 to get HTTP Reqs/Sec for each state.

3. For E-Commerce sites that use promotions to target certain areas of the country, look at the heatmap before and after a promotion to see whether or not access from that area increased and if so, by how much.

4. If you don't have any legitimate international customers, the world view map can help you determine how frequently your site is being accessed from outside the US. If it happens often enough, it might be worthwhile to use the built-in Geolocation services to block access from outside the US.

5. Rather than looking at every single HTTP request, have the rule only look at certain ones – for instance a checkout page so you can compare conversion rate between states.

6. Same concept as number 5, but if you release a new product page, have your rule look at that page so you can determine where it’s most popular.

7. Watch the heatmap throughout the day to see during which hours different locations most frequently hit your site. In an elastic computing situation, this might allow you to borrow resources from systems that might not get hit until later in the day.

8. If you release a new mobile site, look at mobile browser user-agents as well as client IP addresses to see if mobile users in certain areas of the country are hitting your site more often than others. If you have bandwidth-intensive applications, this might help determine where you'd derive the most benefit from another DC or from using a CDN.
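For item 1, a minimal sketch of the header swap might look like this. It prefers Akamai's True-Client-IP header when present and falls back to the connection's source address otherwise; the subtable name is again my own.

when HTTP_REQUEST {
    # Use the original client address Akamai passes along, if it's there
    if { [HTTP::header exists "True-Client-IP"] } {
        set src [HTTP::header "True-Client-IP"]
    } else {
        set src [IP::client_addr]
    }
    set state [whereis $src state]
    if { $state ne "" } {
        table incr -subtable true_client_hits $state
    }
}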

These are just a few thoughts. I'm sure there are many more opportunities to leverage these great technologies. It's nice to see that F5 recognizes the value of including a Geolocation database with its product, but it's even more impressive that they're giving tangible examples of how to use this information to make a site better.

Another challenge is demonstrating these capabilities to the folks who make decisions based on them. In the past, IT has been criticized for finding solutions to problems that didn't exist yet. New capabilities are being added so frequently that architects really need to look at every solution, determine whether there's an opportunity, and then send such opportunities to decision-makers.

One of my great "ah-ha!" moments in Application Delivery came when I was reading a post about compression by F5's Lori MacVittie. At the time, I was with my previous employer and was considering starting a project to implement compression. When I began discussing it with others, I was told that certain versions of IE had issues with compressed data, even though they sent headers saying they accepted gzip. Since our customers were long term care facilities and could feasibly have older technologies, it wasn't crazy to think they'd be browsing with pre-IE6 browsers and might have problems. Since our guiding principle in IT was to "First Do No Harm," I didn't want to cause a negative experience for some users simply to speed things up for others.

My mistake at that time was that I made a blanket assumption about all users. I decided that because I shouldn't compress content for older browsers, I couldn't compress at all. In Lori's article, she talks about how compression isn't always advantageous, especially over a LAN. Prior to reading the article, I really hadn't considered all the information users were giving me and, even better, that I could make decisions based on that information.

When a user visits a website, their browser sends a number of "HTTP headers" (an incomplete list). A great example of an HTTP header is "User-Agent." In this header, the web browser informs the site it's visiting what type of browser it is.

Google Chrome, for example, sends "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.1 (KHTML, like Gecko) Chrome/6.0.428.0 Safari/534.1".

With this information, a website owner can decide to treat certain browsers differently than others.

An iPhone, for instance, might send something like "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1C25 Safari/419.3".

Some sites do an excellent job of leveraging this information to provide a better user experience. If you go to certain sites with an iPhone, you might notice yourself being redirected to m.example.com, mobile.example.com, or another "mobile-specific" site specially designed for a mobile device. Users obviously appreciate this since it keeps them from constantly having to zoom in and out and scroll just to see the pages. While many companies create iPhone apps for situations like these, that doesn't help people who have other mobile devices, hence the requirement for a mobile-specific site. One thing you'll likely notice when visiting mobile-specific sites is that it's not simply the same content at a different resolution; it's typically different menus, different buttons, and fewer images. Since most iPhone browsing is done via a cellular network, it's good to consider latency as an experience inhibitor.

Using Lori's example of providing compression only when it improves the user experience, we can apply the same logic to users with the iPhone user-agent header. On an F5 device, for instance, we'd create an iRule to be executed on HTTP_REQUEST events that would look at the User-Agent header, and if it contained "iPhone," we'd either send a redirect so the user would go to our mobile site, or compress data more aggressively, or even both. Using my own example of trying to compress data without causing issues for older browsers, I wouldn't want to compress simply because a browser sent an "Accept-Encoding: gzip" header; I'd really want to make sure I'm only compressing for user-agents I know can handle compressed data, so it'd be a combination of both the User-Agent and Accept-Encoding headers (a sketch of that combined check follows the example below).

I often run into sites that, while being smart enough to detect my user-agent and make decisions based on it, provide a negative experience. For example, here's the text I see when I navigate to a certain site using Google Chrome:

Please Upgrade Your Browser
To provide you with the most efficient experience, (removed)
utilizes advanced browser features of Internet Explorer 5.0 and greater.

Your Internet browser does not meet the minimum criteria.
Please download the latest version of Internet Explorer.

I’m obviously using one of the most capable browsers on the market, and this particular site not only says it won’t support me, but it also says I’d be better off with IE5. The only “saving grace” is that they provide a link through which I can download a browser they do support. Unfortunately, I’m stuck at this page and am not seeing any of their site’s content. Better behavior would be that I make it to their home page, am informed of the features the site doesn’t think I support, and can make a decision on whether I’d like to move forward. This site is obviously looking at the user-agent header, but is unfortunately making a blanket decision that because mine doesn’t contain IE, I’m not compatible. When this logic was written, Chrome didn’t exist. This behavior requires the logic to adapt constantly to new browsers. In this case, the site might be better off looking for headers that determine whether the browser supports the specific features required by IE.
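To make the earlier idea concrete, here's a minimal sketch of an iRule that combines the User-Agent and Accept-Encoding headers. The mobile hostname and the "suspect browser" strings are hypothetical, and the compression commands assume an HTTP compression profile is attached to the virtual server.

when HTTP_REQUEST {
    set ua [string tolower [HTTP::header "User-Agent"]]

    # Send iPhone users to the mobile-specific site (hostname is made up)
    if { $ua contains "iphone" } {
        HTTP::redirect "http://m.example.com[HTTP::uri]"
        return
    }

    # Only compress for browsers that both advertise gzip support and
    # aren't on our (hypothetical) list of browsers known to mishandle it
    if { [HTTP::header "Accept-Encoding"] contains "gzip" && !($ua contains "msie 4" || $ua contains "msie 5") } {
        COMPRESS::enable
    } else {
        COMPRESS::disable
    }
}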

Another interesting thought that popped into my head on the drive home yesterday was the type of inferences you can make about the person behind specific user-agents. If, for instance, I’m using Chrome to visit your site, I’m likely an advanced user who cares about new technology – do you really want to tell me that your site doesn’t support me and that I’d be better off with IE5? How about if I’m visiting your site with an iPhone – what does that say about me?

I’d love to see some of the analytics data comparing something like “conversion rate” for retail sites among different browsers. I imagine very few people purchase from their phone but I expect that quite a few of them are comparing prices – if that’s the case, it might make sense to have pricing readily available on your “mobile-specific” site.

I had an interesting discussion with a coworker about which systems certain application logic should live on. In this case, our dialog revolved around whether an HTTP redirect should live on a web server or on an F5 Application Delivery Controller. Naturally, being the ADC guy, I would want it on my system. Even with that said, I think it's pretty obvious that logic like this should live on the F5 device (a quick sketch follows the list below).

1. The F5 is closer to the user than the web server. If the F5 handles the redirect, the Web server doesn’t have to see the initial request, just the post-redirect one.

2. Instead of the redirect existing on multiple servers, it only has to exist on 1 (or 2) F5 devices.
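As a simple illustration (the path and target are hypothetical), a redirect handled at the LTM can be as small as this, and the web servers never see the old URL at all:

when HTTP_REQUEST {
    # Answer a retired path with a redirect at the edge,
    # so no pool member ever has to handle the request
    if { [HTTP::uri] starts_with "/old-store" } {
        HTTP::redirect "http://www.sample.com/store"
    }
}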

Today, when most people discuss serving content on the "edge" or "closer to the customer," they're likely saying so because of the performance implications. The initial motivation for companies to utilize CDNs like Akamai was to reduce dependence on their own infrastructure. By offloading static content to a CDN, companies could reduce their bandwidth costs and potentially even their server footprint. As demand for content-rich applications has increased, the main motivation for utilizing a CDN has changed. The price of bandwidth has dropped dramatically, while server consolidation technologies like virtualization and blades have made server resources cheaper than ever. Now, when a company chooses to utilize a CDN, it's likely so its content can be even closer to its clients/users. Using technologies like geographic delivery, a user requesting a page from California can be sent to a CDN resource in California. This helps deliver the rapid response times users have come to expect from modern web applications.

There's really no disputing that compression, caching, security, and redirects should be done as close to the user as possible. The only potentially valid argument I see for not utilizing such services is a financial one. In retail, customers demand fast response times. In some environments, that isn't the case. If users are indifferent to load time, then the most cost-effective solution would likely be one that doesn't require a CDN at all…it's all about finding out which solution fits best for your environment.

Someone recently asked me what "application delivery" meant. For those who have read my blog, you'll notice many of the topics touch on the subject of application delivery but really don't offer a simple definition of what it means. From my perspective, it's the effort of getting content from the web server to the client.

It's such a simple concept, and yet there's so much involved. An infrastructure must be designed that spreads requests across multiple servers, monitors the availability of those servers to ensure they can service requests, offloads tasks from the servers where it can do so efficiently, monitors the performance of transactions, and possibly even optimizes delivery speed using WAN acceleration, compression, or caching.

The more requests a site handles, the more magnified each of those components becomes. While our infrastructure is small enough that we don't notice much of an impact by altering our session dispatch method from "round-robin" to "least amount of traffic" or "fewest connections," someone like Amazon obviously notices. For Amazon, optimizing their dispatch method can mean requiring hundreds fewer servers and, in turn, realizing hundreds of thousands of dollars in cost savings. Add the ability to offload SSL termination and Layer 7 manipulation from the servers to an Application Delivery Controller, and the savings are even greater, not to mention the increased revenue brought about by clients completing their transactions more quickly.

One thing I have definitely touched on before is how most companies should have a single person manage their "application delivery design." The manager of these technologies, which we've called the Application Delivery Architect, is a position that should pay for itself in reduced infrastructure costs and increased customer spending. Unfortunately, as companies have become complacent and stuck in their designs, an App Delivery Architect is rarely required because a re-design is often impossible. I've noticed most places simply re-design one component of their infrastructure at a time. It's a very "year-to-year" form of thinking. For instance, if customer demand has caused stability issues, a company might simply add more servers to a farm, rather than looking at how scalable the front-end application is or whether offloading tasks to an ADC might be a more effective solution. When there isn't anyone tasked and empowered with creating an actual vision for application delivery, a company will likely struggle to reach a truly efficient solution.