How not to name your brand

In honor of 80%20 Shoes, signs you’ve chosen a bad brand name for your shoes, in the age of Google:

  1. Your name contains a URL escape sequence.  In a URL, %20 is the percent-encoding of a space character.  Oops.  If your name doesn’t URL-encode to itself, you’re doing something wrong.
  2. When someone enters your name in Google, it thinks they’re doing math.  Type 80%20 into Google, and it thinks you’re doing 80 modulo 20.  The result is zero!
  3. Nobody knows how to pronounce it so you can’t tell your friends, and when you do they can’t search for it.
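
For the curious, here is a minimal Java sketch of the first two failure modes. The class and method names are made up for illustration, and it assumes Java 10 or later (for the `Charset` overloads of `URLDecoder.decode` and `URLEncoder.encode`). Note that `URLEncoder` does form-style encoding, so treat `urlSafe` as a rough check rather than a definitive test.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any real codebase.
class BrandNameCheck {
    // What a URL decoder makes of the name.
    static String decoded(final String name) {
        return URLDecoder.decode(name, StandardCharsets.UTF_8);
    }

    // What the name turns into when it is URL-encoded.
    static String encoded(final String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    // A name is "URL safe" here if encoding leaves it unchanged.
    static boolean urlSafe(final String name) {
        return encoded(name).equals(name);
    }

    public static void main(final String[] args) {
        final String name = "80%20";
        System.out.println("[" + decoded(name) + "]"); // the %20 decodes to a space
        System.out.println(encoded(name));             // the % itself gets escaped
        System.out.println(urlSafe(name));             // false
        System.out.println(80 % 20);                   // the "math" reading: 0
    }
}
```

A name made only of letters and digits, such as "StyleFeeder", passes the `urlSafe` check unchanged.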

We’re a finalist for the MITX Awards, 2009

I just got an email that informed me we’re a finalist for the MITX Awards this year in the “applied technology” category, which is great news.  You may recall that we won at last year’s award ceremony.  The award nomination this year is for our new product browser and some of the underlying geotargeting technology, which you should check out if you have not seen it already.

CDN Evaluation Criteria

A friend asked me about StyleFeeder’s experience using CDNs, so I sent him the list of criteria that we use to evaluate the various content delivery networks that we have tried.  We’re currently using Akamai, CloudFront and Panther for various types of content.  I’ve talked to pretty much everybody in the CDN space over the years and I think that this list of questions is pretty solid.  If you think otherwise, I’d be happy to update this list with any new ideas you have.

Note that if you’re streaming large audio or video files, this list may not be a good one for you.  The questions are biased towards StyleFeeder’s needs, namely the fact that we have tens of millions of small product images floating around.

CDN Evaluation Criteria:

  1. Do you support HTTP compression?  Does content have to exceed a minimum size in order to be compressed?  (e.g., content smaller than 2KB is not compressed)
  2. Do you allow us to override the Expires headers being sent by the origin?
  3. Will you obey long Expires headers (5 years) and cache content accordingly?
  4. Once a piece of content has been cached on your network, under what conditions would your CDN re-request that content from the origin?
  5. Do you allow full and partial (based on regexes) cache flushes?
  6. Can you demonstrate that you’re better than one of the “value” CDNs?
  7. How many nodes do you have?
  8. Do you have a lazy-loading caching proxy scheme (like Akamai, Panther, Voxel) to allow for easy deployment?
  9. Is there an API we can use to add or flush content?

Let me know if you have other questions that should be on the list!

Simple clustering with AWS and (free) RightScale

Here at StyleFeeder, we do a lot of things for the sake of performance. We recently decided to take a set of processes that we had running on a few large EC2 instances over at Amazon Web Services, and consolidate them into a couple of clusters.

First, you may ask, why use AWS at all? We don’t use it for everything, but this particular set of processes handles image storage, resizing, and delivery, through a content delivery network. We have many millions of images. If we wanted to stick them on a big file system, we would run out of inodes, so that’s out. We could put them in a sharded MySQL database, and we have a big sharded MySQL infrastructure for a lot of our data anyway, but that’s not how we started out, and it’s not exactly using the database for database-y things. To do this ourselves, we would have had to install a distributed file system of some kind, which seemed like a lot of work, so we decided to use AWS’s S3 for storage.

Once your images are in S3, there are certain things it makes sense to do in EC2, since an EC2 instance can network to an S3 bucket pretty fast. Vendor lock-in? You bet, but hey, it’s giving us a pretty good value. To get the images out of S3 and on their way to our users’ computers and other shopping-enabled gadgets, we have some EC2 instances that resize them, store the resized versions for future use, and serve them up.

The resizer/cacher logs indicate that although on average we don’t serve a given image up to too many users, we’re serving it for the Nth time, where N>1, about 96% of the time. If the CDNs could just keep them all around, forever, our servers wouldn’t be working as hard as they do. Our actual origin hit rates at the CDNs are something like 50%. What’s up with that? They can’t handle sparse sets of content? I don’t remember a disclaimer about that. Can you hear me, CDN people? I’m talking to you!!

But I digress. Back to AWS. AWS is a good value, of course, only as long as we’re efficiently using EC2 resources. That’s where the clusters come in. We’ll talk about this in terms of what you do in RightScale, rather than AWS alone. RightScale used to offer a lot of features that AWS just plain didn’t have. AWS has been filling those gaps, but we’re not planning to ditch RightScale any time soon, because they still make AWS resource management a lot easier than the raw services in the Amazon interface. If you’re a hard-core command-liner, this blog post and this manual tell you everything you need to know. We’ve gotten into the habit of doing these kinds of things with the free features of RightScale, until we get to the point where scripts save us real time or money. Here’s what we did:

  • Built up a medium-sized resizer/cacher, to do any and all of the miscellaneous things our various big boxes were doing.
  • ‘Bundled’ it into an AMI.
  • Created a ‘Server Template’ based on that image. Huh? Why do I need a ‘Server Template’? What’s wrong with the image? You can’t use raw images as the basis for servers in a ‘Deployment’ (see below). You need to have a server template. Does it have to be that way? I can’t see why, but hey, this is a free service (RightScale, that is), so who’s complaining?
  • Wrote some ‘Right Scripts’ so our server can start up and add itself into the mix in a fully functional state (starting Apache and whatnot).
  • Created a ‘Deployment’ and added the server template to it.
  • Created more server templates, based on the same image, and added them to the deployment.
  • Created a ‘Load Balancer’, and registered the servers in the deployment with it ‘on boot’, and, for the ones that were already running, ‘now’.
  • Put a CNAME in our DNS for the load balancer. Huh? Couldn’t we just take an elastic IP address (one of those ‘permanent’ IPs Amazon gives you) and assign the thing to that, so it takes over right away for the big old instance that was handling this? No, no we couldn’t. This is a pay service, and that sucks, but with a short time-to-live, it only sucks for a few minutes, so we’re going to overlook this. AWS seems to want to be able to scale the load balancer, or move it around, whenever they want, which I guess you might need in some circumstances. Note to people running Java: watch out for the infamous Java DNS caching problem if you have JVMs that talk to one of these load balancers. If Amazon switches the IP underneath you, your JVMs will be talking to the void, unless you’ve configured the JRE properly.
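
On that last Java point, here is a minimal sketch of one way to keep a JVM from caching DNS lookups forever. It uses the standard `networkaddress.cache.ttl` security property. The 60-second and 10-second values and the class name are our assumptions for illustration, not settings from our production setup, and the property needs to be set before the JVM does its first lookup for it to take effect reliably.

```java
import java.security.Security;

// Hypothetical startup helper; call it before any network code runs.
class DnsCacheConfig {
    static void limitDnsCaching(final int ttlSeconds) {
        // How long successful lookups are cached, in seconds.
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(ttlSeconds));
        // How long failed lookups are cached, in seconds (10 is an assumed value).
        Security.setProperty("networkaddress.cache.negative.ttl", "10");
    }

    public static void main(final String[] args) {
        limitDnsCaching(60); // assumed TTL; pick something near your CNAME's TTL
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

The same property can also be set in the JRE’s `java.security` file if you would rather not touch application code.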

Now we can start all the nodes in our clusters, or just some of them, or whatever. If we’re feeling really ambitious, we can set these to auto-scale, but we’re already saving quite a bit of money and serving things faster than we were, so that will be for another day.

How is it deciding which node gets the traffic? The documentation seems to say that it’s round robin between availability zones, and then based on load within them. On this page Amazon says “Elastic Load Balancing metrics such as request count and request latency are reported by Amazon CloudWatch.” So, based on load, but measured in a black-box-y way. “Elastic Load Balancing automatically checks the health of your load balancing Amazon EC2 instances. You can optionally customize the health checks by using the elb-configure-healthcheck command.” You can do this in the RightScale interface as well. You can either accept the default that it checks for the presence of a TCP listener on port 80 (target="TCP:80"), or you can give it something like “HTTP:80/path/to/my/image.jpg” that returns a “200 OK” when all is well. The default seems to work surprisingly well for these particular CPU-intensive activities that we’re clustering. We don’t see one server with a load of .3 while another is at 4. We do see some occasional differences, but they seem to even out pretty fast. We’ll be more precise if the differences start to get out of hand.

FoxyProxy Cloudera Config

When you have smashed your head into the table trying to get the included .pac file to work for Cloudera’s EC2 Hadoop setup and want something that works properly in FoxyProxy, simply use the following URL patterns (available in text below the graphic for your cut/paste pleasure):

foxyproxy-cloudera

As promised,

*://10*
*ec2*.amazonaws.com*
*ec2.internal*
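
If you want to sanity-check which URLs those patterns will catch, here is a rough Java sketch. It assumes the patterns are plain wildcards where `*` matches any run of characters and the whole URL must match; the class and method names are invented for illustration, and FoxyProxy’s own matcher may differ in the details.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical checker for the three wildcard patterns listed above.
class WildcardPatterns {
    static final List<String> CLOUDERA_PATTERNS =
            List.of("*://10*", "*ec2*.amazonaws.com*", "*ec2.internal*");

    // Translate a wildcard into a regex: '*' becomes ".*", everything else is literal.
    static Pattern toRegex(final String wildcard) {
        final StringBuilder regex = new StringBuilder();
        for (final char c : wildcard.toCharArray()) {
            regex.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    // True if the full URL matches at least one of the patterns.
    static boolean usesProxy(final String url) {
        return CLOUDERA_PATTERNS.stream()
                .anyMatch(p -> toRegex(p).matcher(url).matches());
    }
}
```

Under those assumptions, a made-up internal address like `http://10.251.27.3:50030/` goes through the proxy and `http://www.example.com/` does not. Keep in mind that `*://10*` is broad: it also matches any public host that happens to start with “10”.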

Moving to another cloud

We are in the process of migrating one of our backend data-processing servers from a legacy hosting company in NYC to Contegix.  What’s unusual about this transition is that we’re moving the machine onto Contegix’s new cloud platform rather than to a traditional server.  We’ve noticed a few things already.  When we were copying over a huge backup of our databases, we noticed that they were transferring across the network from NYC to St Louis at 93Mbps, which is not frigging bad!  As I write this, we’re loading over 100GB of data into a MySQL server on our new Contegix cloud machine at ~30K blocks/second (as measured by vmstat), which means that this thing has lightning-fast I/O… not surprising since the storage is on an EqualLogic SAN (Update: we later saw this increase to ~70K blocks/second).

The differences between this cloud platform and EC2 (which we still use for some other needs) are striking.  The application that we will host on this new VM sometimes needs a lot of memory.  With Contegix, we can grow that all the way up to 128GB with 32 cores.  Amazon doesn’t even come close to that; their max is 15GB.  Or you can figure out how to distribute your application over a bunch of hosts.  But sometimes you just need 20GB of memory and all the problems go away.  Plus we don’t have to compete for these resources: they’re guaranteed to us.

I also like the fact that the machine doesn’t disappear into oblivion when it reboots, which is a feature (?) of EC2 instances.  We can grow our storage past the point that I care to think about on this platform as well.  Plus, we get all the Contegix support that we want if we choose to do crazy things with this host.

The virtualization technology is VMware ESX, which is darn cool stuff (having just set it up on an integration server here a week or so ago, I have to say that I like what I have seen so far).  We’ve already seen our VM get hot-migrated to another physical box in order to maximize the resources available to us.  Things got slow for a little bit, but then they got lightning fast.  I think we were copying data into the machine at that point and saw no impact on open connections, etc.  Don’t ask me why, but I’m still surprised that this works reliably.

So far so good.  We’ll report back with more later.

Warning of the Day

I would like to nominate this stack trace for the Warning of the Day award:

java.lang.NumberFormatException: For input string: "Fuck"
 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
 at java.lang.Long.parseLong(Long.java:403)
 at java.lang.Long.valueOf(Long.java:491)
 at java.lang.Long.decode(Long.java:634)

It was caused by the following IP address from the request header of an actual StyleFeeder user:

Fuck.You.Iran.Government, 10.94.117.124

The latter IP address (masked here to protect the innocent) geolocated to the Netherlands, in case you’re wondering. Some kind of anonymizing proxy, maybe?

This user proceeded to view one item…

OiOi Sophisticated Baby Bags Giraffe Print Messenger

…then started the signup process, but unfortunately didn’t go through with it.

More about Iranian shoppers in a moment.

[Edit: The registration service is in the Netherlands, but the address was actually registered to a company somewhere else in the world (and not in Iran either). The challenge of accurate geolocation looms.]
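
For what it’s worth, here is a sketch of how a parser could shrug off this kind of input instead of throwing. It assumes the header is a comma-separated list of addresses (presumably something like X-Forwarded-For, though that is a guess) and that the goal is to turn the first real IPv4 address into a long. The class and method names are hypothetical; this is not the code that produced the stack trace above.

```java
import java.util.Optional;

// Hypothetical defensive parser for a comma-separated list of client addresses.
class ForwardedIpParser {
    // Converts a dotted-quad IPv4 string to a long, or empty if it is not one.
    static Optional<Long> ipv4ToLong(final String candidate) {
        final String[] parts = candidate.trim().split("\\.", -1);
        if (parts.length != 4) {
            return Optional.empty();
        }
        long value = 0;
        for (final String part : parts) {
            if (part.isEmpty() || part.length() > 3) {
                return Optional.empty();
            }
            for (int i = 0; i < part.length(); i++) {
                final char c = part.charAt(i);
                if (c < '0' || c > '9') {
                    return Optional.empty(); // rejects "Fuck" without an exception
                }
            }
            final int octet = Integer.parseInt(part);
            if (octet > 255) {
                return Optional.empty();
            }
            value = (value << 8) | octet;
        }
        return Optional.of(value);
    }

    // Returns the first entry in the header that parses as an IPv4 address.
    static Optional<Long> firstValidIp(final String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        for (final String entry : headerValue.split(",")) {
            final Optional<Long> ip = ipv4ToLong(entry);
            if (ip.isPresent()) {
                return ip;
            }
        }
        return Optional.empty();
    }
}
```

Fed the header above, `firstValidIp` skips the first entry and returns the numeric form of 10.94.117.124. Whether the first valid entry is the one you should trust is a separate question, since clients can put anything they like in that header.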

final

I saw this blog post referenced on Hacker News today and thought I’d pounce on one of Stephan’s opinions, which is that the final keyword in Java shouldn’t be used except on fields. I can’t disagree strongly enough with this. I first came across this advice in the O’Reilly Hardcore Java book about five years ago. The book dedicates a whole chapter to the final keyword, which initially surprised me – I didn’t think that the author could possibly have that much to say about it! However, the main idea is “enforced documentation,” which is an argument that really sold me on the concept. I find that whenever I see final on a method parameter, a local variable or whatnot, I have one less question about how that object reference will be used, especially when I’m reading someone else’s code. I know that it was the developer’s intent that the variable in question be assignable or not. I find this to be immensely helpful.

Stephan’s argument is that it impacts readability, but I don’t find that at all. In fact, we make final part of our cleanup process for our default formatter in our Eclipse setups at StyleFeeder. Does final help prevent bugs? Occasionally, yes, but the cases for this are rare enough that I hesitate to put this forth as the main argument in favor of it.

The Perl-ish argument against this is to “avoid putting bars on the windows”, since you never know how things may need to be used in the future. However, that simply doesn’t hold up as a solid argument 99% of the time. The two common use cases for sprinkling final into your Java code are on method parameters (which you should normally not be assigning to) and local variables. In the case of local variables, they are local and the impact of putting final on them is entirely contained within that scope. I have yet to see a reason why this would yield any unwanted side effects. Of course, making classes and methods final is a Big Decision and not what I’m talking about here.
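
To make the two common cases concrete, here is a small illustrative Java example (the class and method are invented for this post, and it assumes Java 9 or later if you call it with `List.of`). It puts final on a method parameter and on local variables, and leaves it off the one variable that really is reassigned.

```java
import java.util.List;

// Hypothetical example class showing final on parameters and locals.
class FinalDemo {
    static int totalLength(final List<String> names) {
        // names = List.of();  // would not compile: the parameter is final

        final int separatorWidth = 1; // final local: a reader knows it never changes
        int total = 0;                // not final: it is reassigned in the loop below

        for (final String name : names) {
            total += name.length() + separatorWidth;
        }
        return total;
    }
}
```

The absence of final on `total` is as informative as its presence elsewhere. One caveat: final only stops the reference from being reassigned, it does not make the object itself immutable.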

doing a tail on rotating log files

We learned something today, not something new, but new to us. If you do a tail -f to view the end of a log file, it all goes great until your logging system rotates out the file. Then you’re stuck wondering if the program has halted or if your ssh connection died or if you hit Ctrl-S and froze your terminal.

But if you use tail -F (note the big F) it will check to see if the file has been rotated out and another put in its place, and will resume tailing on the new file. Happy tails to you!

The wasteland between Harvard and MIT

While sitting at my desk, if I turn my head to the right, I can see the famous MIT dome and part of the roof of the Stata Center.  If I turn my head to the left, I can see parts of the Harvard empire.  StyleFeeder is located between two of the most famous universities in the world and within a stone’s throw of one of the birthplaces of the Internet (the other two being CERN and UIUC).  The amount of bandwidth running through the fiber cables buried beneath the sidewalk under my feet as I walk around Central Square makes this one of the most wired places on the planet.  Indeed, Forbes ranks Boston fifth in the US for “wiredness.”

One would think that the options for getting blazingly fast Internet access at the StyleFeeder offices would be plentiful and cheap, right?

I live one town over from Cambridge and I get very reliable, very fast (20Mbps down / 2Mbps up) Internet access via the monopoly cable provider available to me as part of a bundle that probably breaks out to ~$40-50/month.  It’s actually great.  I don’t have major complaints.

Zip back over to our StyleFeeder office in Cambridge and the best we have available to us is a crappy Verizon DSL connection (ostensibly-but-not-really 7Mbps down and something stupidly slow upstream) for $200/month.  Frigging wonderful.

Occasionally, Comcast drops off flyers advertising service in our building.  Our entire company is under strict orders not to let any Comcast employee seen on our premises leave until they can get us a Comcast rep on the phone who is both able and willing to sell us Internet service.  As much as we’d like to take up Comcast’s offer to pay them for reliable, fast Internet service, they historically have not returned our phone calls and generally ignore us. One of my friends who lives literally a block away from our office has good Internet service from Comcast in her house, so this must be possible.

In the meantime, I’m left staring at MIT wondering why on earth I can’t tap into the wires that are running underneath our building.