10 Steps to a Better Data Feed

Here at StyleFeeder we work with a staggering number of merchant data feeds from affiliate networks and other partners.  The data quality of these feeds varies quite a bit, sometimes by sins of commission (data where it doesn’t belong), sometimes by sins of omission (leaving out important information).  In an effort to get the word out, we’ve produced a top 10 list for retail merchants creating product data feeds.  This is not a comprehensive list but a quick overview.

1.  STOP YELLING!
It’s easy for us to capitalize your product names and categories for emphasis on the page, but very hard to do the opposite and get the original data back.

2.  Item names should say what they are
If they’re pants, put that in the name.  If they’re wedge sandals, put that in the name.  If it’s a notebook computer, put that in the name.  Especially if there are multiple items in the shot, like a belt displayed with the top and the pants.

3.  Keywords should be descriptive and product-specific
Not a repeat of the item name, not all the words from the long description with delimiters between, not keywords about the store that don’t apply to the product.

4.  The longer the description, the better
People browsing your affiliates’ sites want information, and the more you give them the more will click through and the more will buy.

5.  Categories should be set, and contain the relevant information
If things have gender relevance, include that in the category names.  If you sell different types of items, the category should reflect what type each one is.

6. Brand fields should be filled, and consumer friendly
Customers like to search and browse by brand, and we can’t do this if the field is blank.  Also, customers don’t know your brand with corp or inc or things like that tacked on the end, so fill your feed with the name they know.

7.  Use pricing fields in a standard way
If all of your items have a sale price filled in, it’s probably not a sale and it should probably be in the regular price column.  Save the sale price column for specials.

8.  If an item doesn’t have a working image, leave the image URL blank
That means no 404 errors, no “image not available” images, and no store logo images.  If you can’t leave them blank, use the same “noimage.gif” type URL for all the broken ones so we can code around it.

9.  Use identifier columns
UPC when available, ISBN for books, Manufacturer Part Numbers that you’d use to order the item from the manufacturer.  And consistent SKUs for your own store that stay the same over time for the same item.

10.  Talk with your affiliates!

This goes without saying, but just like shoppers are the customers of your products, your affiliates are the customers of your feed.   We may have good ideas, we may have terrible ideas, but either way we may tell you something you haven’t thought of.
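
Several of the points above are easy to check mechanically before a feed goes out.  Here is a minimal sketch of that idea, not our actual import code; the field names (name, brand, sku, sale_price) and the list of corporate suffixes are assumptions made up for illustration, since real feeds vary by network and merchant.

```python
import re

# Hypothetical list of suffixes that shoppers don't think of as part of a brand.
CORPORATE_SUFFIX = re.compile(r"\b(inc|corp|llc|ltd)\.?$", re.IGNORECASE)


def lint_item(item):
    """Return a list of warnings for one product row (a dict of strings)."""
    warnings = []

    # Tip 1: a name whose letters are all uppercase is probably yelling.
    name = item.get("name", "")
    letters = [c for c in name if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        warnings.append("name is all caps")

    # Tip 6: brand should be filled in and consumer friendly.
    brand = item.get("brand", "").strip()
    if not brand:
        warnings.append("brand is blank")
    elif CORPORATE_SUFFIX.search(brand):
        warnings.append("brand has a corporate suffix")

    # Tip 9: every item should carry a stable identifier.
    if not item.get("sku", "").strip():
        warnings.append("sku is blank")

    return warnings


def lint_feed(items):
    """Feed-level checks that only make sense across all rows."""
    warnings = []

    # Tip 7: if everything is on sale, it is probably not a sale.
    if items and all(i.get("sale_price", "").strip() for i in items):
        warnings.append("every item has a sale price; is it really a sale?")

    return warnings
```

Running lint_item over each row and lint_feed over the whole file gives a quick list of things to clean up before an affiliate ever sees the data.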

Attatched Please, to Seek My Document of the Job

I’ve seen a lot of resumes over the years.  There are many ways that one can write one, in terms of content and formatting.  I’ve seen talented people send in horrible looking resumes, and forgiven them.  I’ve seen people list too many skills and forgiven them (sometimes).  I’ve seen resumes that were too long, or too short, and forgiven them.  There are two unforgivable sins when it comes to reading a resume for a software developer: bad spelling and bad grammar.

Let me address the HR department’s concern first: this is not an ethnocentric policy. I do get these resumes from people for whom English is not a first language, but I also get these piles of verbal garbage from people who grew up near where I grew up, and who probably look like me and talk like me.  I don’t really care, because there is no excuse for a problem so easily remedied.

The real reason is that by submitting a barely understandable resume to me, you’re showing me that you are not a detail-oriented person, and if there is one thing that all talented developers are, it’s that they sweat the details.  It also says that you aren’t willing to ask for help, another quality I require for someone wishing to join my team.

I do not expect perfection; a true grammarian could probably cover this blog post in squiggles of red ink.  I do not expect you to sound like a native English speaker.  It’s pretty well known that diversity of language leads to diversity of thought, an extremely valuable attribute of a team.   I only expect you to prove that you can at least make an effort.

So, how easy is it to solve this problem?  This easy:

1. Show a friend who writes well.

2. Use an online editing service.  They are cheap, probably $50-100.  You’re applying for a job that will net you thousands of dollars for even a short freelance job, potentially millions for a career position.  Make the investment.

3. If you don’t have the money for an editing service or a credit card to pay them, go to your nearest college and put a sign up at the library that says “Wanted: English major with excellent spelling and grammar skills to edit my resume.  $100”.  You’ll have a dozen responses by the next day.  Use craigslist and you’ll have a dozen responses in 15 minutes.

Updating Your Web Store? Don’t Do This!

The backstory

We work with a lot of web stores here at StyleFeeder, both in terms of user-added links and also by importing affiliate data feeds.  There are literally thousands of these, of varying quality, and we import a good number of them and load them regularly to keep the product listings up to date.

When dealing with lots of feeds, occasionally something odd happens.  A couple of months back we noticed that our conversions of clicks to sales at a certain retailer had dropped to basically zero, which naturally was a concern.  Looking further into it, all the product links that we had were going to 404 pages all of a sudden.  A quick comparison of the URLs we were using to the ones currently in use on the site revealed a change in link structure for the same items, which were still for sale on the site.   A look back through the affiliate messages from that merchant showed a notice that the ecommerce software would be changing but that the links in the affiliate interface would be updated to work.  Looking back, that probably meant the simple banner/text ads that they provide and not the data feed links.

The point

Changing ecommerce software to get new features or performance is a fine thing.  It’s always good to make a site run better and offer a better customer experience.  Having said this, unless you have tremendous brand recognition and customer loyalty there are few things you can do to harm your web site more than introducing an upgrade that breaks all of your inbound links.

  • Assuming your home page is still your home page, links to it are the only ones that will still work in the eyes of search engines, so you’ll lose all SEO benefits to your deep-linked pages.
  • In the modern web world of user-generated content and URL shorteners, you’ve likely just broken the bulk of the product links to your site that users have added to sites such as StyleFeeder, Facebook, or TinyURL.
  • Probably minor compared to the other two: affiliates who have loaded your product feed and do refreshes based on product content changes (or don’t do refreshes) will be tripped up, since your changes will hide behind an affiliate URL and possibly not be noticed as changes.

How to Avoid This

If this is something that absolutely has to be done, take an inventory of your inbound links and your affiliates.  Then notify them early and often that  a change is coming.  It’s likely that they’ll react as if you’re announcing a jump off the local bridge, but through the wonder of email you don’t have to watch or listen to this.

Preferably, something like mod_rewrite in Apache could be used to remap the old URL structure to the new one.  Or, if the product identifiers in the URL change then some other sort of page or script or program could be written to map the existing IDs to the new URLs.
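
To make the second option concrete, here is a minimal sketch of an ID-mapping lookup.  Everything specific in it is an assumption for illustration: the old URL format (/item.php?id=1234), the new paths, and the mapping table are all invented, and in practice the table would come from your old and new product databases.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical mapping from old product IDs to new URL paths.
OLD_ID_TO_NEW_PATH = {
    "1234": "/products/blue-wedge-sandals",
    "5678": "/products/leather-belt",
}


def redirect_for(old_url):
    """Return (status, location) for a request to an old-style URL.

    Returns None when the URL is not an old-style product link, so the
    caller can handle the request normally.
    """
    parts = urlsplit(old_url)
    if parts.path != "/item.php":
        return None

    # parse_qs returns a list of values per key; take the first id if present.
    ids = parse_qs(parts.query).get("id", [])
    new_path = OLD_ID_TO_NEW_PATH.get(ids[0]) if ids else None
    if new_path is None:
        return (404, None)

    # A permanent redirect tells search engines and affiliates where the page moved.
    return (301, new_path)
```

The point of answering with a 301 rather than serving the new page directly is that anything following the old link, whether a crawler or an affiliate's feed loader, learns the new address.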

Congrats to Contegix

I’ve been accused by a few people of being a Contegix fanboi, which is a label that I completely accept and agree with.  We’ve been hosting with them for around two and a half years after I decided that I didn’t want to handle a growing cluster of machines myself.  Since I ran a small hosting operation on the side for many years, I have some opinions about how things should be done.  When I found out that Contegix had not only institutionalized many of my beliefs as part of their support process, but also shared my opinions about how customer service should be run, I was thrilled.

We’ve grown with them over the years and they’ve kept up and have gone overboard for us on more than a few occasions, including dedicating a pair of Foundry hardware load balancers to us after our required configuration wasn’t available in a shared configuration.  Their support team is legendary and these are not isolated incidents. The normal mode of interaction with them basically involves:

  1. Send email to Contegix support requesting some arcane version of a custom built source package running in some non-standard way.
  2. Wait 30 seconds.
  3. Check email and read the “we are reviewing this and will reply shortly” message, which, as far as I can tell is actually written by a human each time.  It means that someone with a name is looking into things for you.
  4. Wait two to five minutes.
  5. Check email and read the “we have completed this request” message.

It’s like that every time.  And it’s been like that ever since we started working with Contegix.

When I came home yesterday to see that they had won the “Best Linux Friendly Hosting Provider” category in the readers’ choice awards in the latest issue of Linux Journal, it came as absolutely no surprise at all.  With service and support like this, it’s well deserved.

I’d link to the LJ page on their site… if it existed, which does not appear to be the case as of this writing.  It’s an odd twist that my paper magazine is more current than the LJ website.

Hiring a Product Designer

StyleFeeder is looking for a Product Designer to work on some upcoming features for our Personal Shopping Engine during a three month contract with us (possibly longer, but we’re focusing on 3 months for now).

We’re a small, fun-loving team that works hard and gets things done without any bureaucracy.   StyleFeeder is based in Central Square in Cambridge, MA, mere steps away from the Red Line T stop.

Who We Are

StyleFeeder is trying to improve how people shop.  Online shopping is currently focused on impulse buying, price seeking, and moving inventory. We think it should be about connecting people to products that are the best choice for them, based entirely on individual preferences. This is a big, exciting challenge, and we’re addressing it.

Take a look at our site and ask two questions, “Could I have built this?” and “Can I make this better?”. If the answer to both questions is yes, we should talk.

Here’s what we are looking for:

You are a designer

Photoshop and the other standard tools are second nature to you.  We’re very interested in “flow” as part of the design process.  You should have good sensibilities in this regard.  Your toolkit includes wireframes, personas, prototyping techniques.  You are creative and can re-think your approach based on user feedback.  Form following function is fine, but you still see a place for delighting users with human-centric design.  You consider yourself to be a practical person and view things holistically.  You have experience doing iterative user testing.

You can implement all (or most of!) your designs

Specifically, you can handle HTML, CSS, JavaScript and the client side stuff.  We think libraries like YUI and jQuery are great for several reasons.  If you are philosophically against re-using other people’s code, it’s probably best if you move along to the next job posting.

Talk To Us

Send us your resume and a brief writeup about a project that excited you, and why it did so. If you have an online portfolio or blog you can share with us, we’d love to see those too. Email us at jobs/at/stylefeeder/com.

The Irony of Facebook (aka Verified Apps Program FAIL)

As you may know, StyleFeeder has a rather large Facebook application that we launched in the summer of 2007, just after the Facebook platform was announced. We grew quickly, mainly because the application is actually useful (rather than the apps that let you throw electric sheep at people, which are fun but also tiresome) and lets you share your shopping activity with your friends in a non-beacony-big-brother kind of way.

Since then, Facebook has gone through several redesigns, each of which successively depresses the visibility of applications on Facebook. It’s hard to find them, it’s hard to see them, they change the API willy-nilly and break all kinds of stuff and generally make app developers feel like we’re being slapped around. Am I being harsh? I’m not making this stuff up and we’ve alluded to it before. Check out the developer forums and you’ll see what I mean.

It was with much amusement this week that we received not one but two emails from Facebook.

I feel rejected

The first was a rejection notice for our $375 application to the Facebook Application Verification Program. We submitted this application weeks (possibly months?) ago, which Facebook kindly sat on for an extended period of time (but this is normal, apparently). Basically, this program is supposed to give your app extra visibility and a good ol’ Facebook seal of approval because they’ve apparently checked to make sure you’re complying with their terms of service and making Facebook a better place. It sounds like a good thing. (I even have hope that they’ll make the big apps play by the rules and not let them be all spammy like they’ve been in the past.)

Why the rejection? Two reasons, according to the email that we received:

Policy Violations:
1. Please bring your application into compliance with Facebook Platform Policy section 2.4 (see http://wiki.developers.facebook.com/index.php/Platform_Policy#2._Platform_Policy_Overview:_What_Applications_Cannot_Do). Section 2.4 states applications cannot mislead, confuse, or defraud the user in any way.

Please make sure it is clear to the user that they are navigating away from Facebook. For example, using the clicking on a product within the app should alert the user before bringing them away from Facebook.

As for number one, well, I think that reasonable people could disagree about whether it’s unclear that clicking on the products in our application takes you off of Facebook, so I’ll simply disagree on the grounds that it’s not confusing at all. But I’ll leave that one aside because the next one is awesome:

2. Please bring your application into compliance with Facebook Platform Application Guidelines section I.1- I.3 (see http://developers.facebook.com/guidelines.php). These sections mention that applications cannot promote, or contain content (including any advertising content) referencing, facilitating, promoting or using, the following:

  • Adult content, including nudity, sexual terms and/or images of people in positions or activities that are excessively suggestive or sexual.
  • Obscene, defamatory, libelous, slanderous and/or unlawful content.
  • Hate speech, whether directed at an individual or a group, and whether based upon the race, sex, creed, national origin, religious affiliation, marital status, sexual orientation or language of such individual or group.
  • Content that is deceptive or fraudulent.
  • Content relating to the sale of liquor, beer, wine, tobacco products, ammunition and/or firearms.
  • Content relating to gambling, including without limitation, any online casino, sports books, bingo or poker.
  • Inflammatory religious content.
  • Politically religious agendas and/or any known associations with hate, criminal and/or terrorist activities.
  • Political content that exploits political agendas or uses “hot button” political issues for commercial use regardless of whether the Developer has a political agenda.
  • Illegal activity and/or illegal contests, pyramid schemes or chain letters.
  • Content from uncertified pharmacies.
  • Sale or use of web cams or surveillance equipment for non-legitimate use.
  • “Spam” or other advertising or marketing content that violates applicable laws, regulations or industry standards.

Your application has images of adult content being added and shared by users (ex: search “thong”). Please remove all instances of this content from your application.

We have about 14M products on StyleFeeder from ~2500 reputable retailers along with a truckload of cool products that our users have added over the years. It’s entirely conceivable that we have tens of thousands of thongs on our site. I have no idea what the real number is, but I’m sure it’s a lot. We also have our own Terms of Service and don’t allow adult content on the site, but that’s really not what we’re talking about here.

I have a better idea: if you’re easily offended by seeing pictures of models wearing thongs, a good idea would be to not search for thongs. If this is an effort to protect underage children from seeing skimpy underwear, I have to wonder if Facebook is going to then go around policing all of the photos and textual content on their site? Because I’m quite sure that they’ve got much racier stuff than a few thongs… like, perhaps, Going.com’s extremely popular Naughty Gifts app (Hello Natasha!).

This really is a most perplexing reason to reject an application. By way of comparison, check out what a search for ‘thong’ on Amazon yields. Guess what? Models wearing thongs. If Facebook thinks they’re going to become content police, they have a long way to go and they’d better stop allowing user-generated content on their own site.

The Ironic Part

This morning, we received our second communication from Facebook:

My name is [redacted] and I’m writing from the Facebook advertising team. Apps such as StyleFeeder have been particularly successful when advertised on Facebook, and our team would like to help you in developing a marketing strategy.

Nice. So they won’t give verified status to the number one shopping application on Facebook, but they will happily offer to take our advertising dollars.

The thing is that we’ve met a bunch of people at Facebook over the years and they’re all nice, helpful people. But the clumsiness with which these application verification efforts have been managed is vexing, to say the least.

DNS vendor performance

I came across this post about DNS performance on Hacker News yesterday, which was interesting because I’d been conducting similar experiments for StyleFeeder. Our site is fast and has scaled well, but I’m always on the lookout to shave off a few milliseconds here and there from our requests. I was looking at our Pingdom reports for DNS recently, and decided to run a few comparisons.

We currently use DynDNS for DNS service. We switched to them probably two years ago after I decided that UltraDNS was stupidly expensive for what they were giving us. In the end, I managed to cancel our UltraDNS service over the phone by providing publicly available information, which was awesome as their standard contract at the time ran on a yearly basis and they ended up screwing me for another year of service; the whole thing left a very bad taste in my mouth. Although, UltraDNS runs some TLDs and big sites, so they obviously have some redeeming qualities. Pricing isn’t one of them. DynDNS was costing us 60x less at the time.

I’ll use the same format as JohnPhilipGreen’s original post to present my numbers. One difference is that my figures were generated using 30 days’ worth of data, not 3 days as John used.

DNS Server            Test site             Response time   Standard deviation
ns1.mydyndns.org      www.stylefeeder.com   112ms           45ms
ns1.contegix.com      www.stylefeeder.net   112ms           55ms
ns1.dnsmadeeasy.com   www.hubspot.com       127ms           97ms

John basically disqualified DNSMadeEasy as part of his tests because the results were so erratic. I haven’t noticed anything that warrants disqualification like that, but they’re clearly the slowest of the three that I was monitoring.

Some notes: the HubSpot folks are friends of ours, so I figured they wouldn’t mind if I put monitors on their service. Contegix (by far the best managed hosting company I’ve ever worked with) runs our non-cloud infrastructure for us; they offer DNS service at no additional charge to hosting clients and we use it for one of our auxiliary domains, primarily for legacy reasons at this point.

Dynect was the big winner in John’s shootout, which is interesting, given that DynDNS and Dynect are part of the same overall parent company. DynDNS currently charges USD $27.50 per year, a far cry from what UltraDNS charges and - I’m guessing - substantially less than what Dynect charges. Is it worth it? For most small sites, probably not… for StyleFeeder, it may very well be worth switching to a faster provider, especially to gain a >50% reduction in response time depending on the cost (John was seeing 42ms as an average response time).
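
For the curious, the “>50%” figure is just arithmetic on two averages: our 112ms for DynDNS and John’s 42ms for Dynect.  A quick sketch follows, with the caveat that the two numbers come from different monitoring runs over different periods, so this is a rough comparison rather than a benchmark.

```python
def percent_reduction(current_ms, candidate_ms):
    """Relative improvement of candidate over current, as a percentage."""
    return (current_ms - candidate_ms) / current_ms * 100


# 112 ms is our DynDNS average; 42 ms is the Dynect average John reported.
print(f"{percent_reduction(112, 42):.1f}% reduction in response time")
```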

John, maybe you can republish your numbers after 30 days to see how things stand at that point?

The Most Trusted Name in … Search?

Lots of people find StyleFeeder via search engines, and the usual suspects top the list: Google, Yahoo, Ask, Live, etc. Making an appearance in the top 10 this month: CNN.

Appearing below CNN in number of search visits: Altavista, Lycos. My how times have changed.

That’s one big cloud

I just tried using a little utility called subcloud to mount an S3 bucket as a filesystem using fuse on a CentOS box. I did ‘df -h’ and I got this:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.9G  2.5G  7.0G  26% /
/dev/sda2             147G  6.9G  133G   5% /mnt
none                  851M     0  851M   0% /dev/shm
fuse                  256T     0  256T   0% /s3pd

Unfortunately, this bucket has so much data in it that I’m afraid my stupid attempt to see what happens when I type ‘ls’ will end up burning me. Curiosity killed the cat.


An easy VPN with DD-WRT

I’ve been using some ssh port forwards to get onto our office LAN, where we have our integration environment, on days when I’m working remotely (e.g. we got 12 inches of snow yesterday). But those are more complicated these days since I need to connect to four or five services (databases, web services, etc.), so I thought I’d see if I could somehow coax DD-WRT - the open source firmware on our router - to give me VPN functionality. No coaxing required. This couldn’t have been easier. In fact, it’s downright impressive how easy it was. It took me literally three minutes to get it working with my OS X laptop and the same again for Savage on a Windows box. Now, that’s cool.