Archive for the ‘Misc’ Category

Hello from Vegas: StyleFeeder hits KDD 2008

Tuesday, August 26th, 2008

I’m representing StyleFeeder at this year’s KDD conference, held in Las Vegas, Nevada. It might seem odd mixing seductive showgirls and stodgy statisticians, but I think it’s an excellent location choice. Gambling concepts such as probability, expected value and exploration vs. exploitation are core to much of Machine Learning, Data Mining and Statistics.

KDD played host to the 2nd Recommender System/Netflix Prize Workshop. Gavin Potter showed us that users, movies, and even ratings sessions (date) impart significant biases on ratings, so much so that a model which simply captures these biases and completely ignores user-movie affinity yields a lower error score than the original CineMatch algorithm. After some discussion of the fact that minimum-error recommenders tend to yield popular and somewhat uninteresting recommendations, Oscar Celma and Pedro Cano presented a study of this effect on music. They found that a collaborative filtering similarity metric was strongly biased toward popular music, whereas content-based and expert-based similarity metrics made it easier to explore “the long tail.” Next, a member of the Gravity Team, Gabor Takacs (who, I later learned, is the author of the best “big board” tic-tac-toe player in the world), provided a detailed description of their methods for the Netflix Prize. Their approach is an SVD-like matrix factorization, which incorporates incremental training, regularization, user/item bias, positivity constraints, and neighbor-based correction.
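To make the bias-only idea concrete, here is a minimal sketch of a predictor that uses nothing but a global mean, a per-movie offset and a per-user offset. It is not Gavin Potter’s actual model (it leaves out the session/date effect entirely), and the function names and the `reg` shrinkage parameter are made up for illustration.

```python
from collections import defaultdict


def fit_bias_model(ratings, reg=0.0):
    """Fit a bias-only predictor: global mean + movie offset + user offset.

    ratings: iterable of (user, movie, rating) tuples.
    reg: hypothetical shrinkage constant that pulls sparse offsets toward 0.
    """
    ratings = list(ratings)
    global_mean = sum(r for _, _, r in ratings) / len(ratings)

    # Movie offset: how far a movie's ratings sit from the global mean, on average.
    movie_sum, movie_n = defaultdict(float), defaultdict(int)
    for _, movie, r in ratings:
        movie_sum[movie] += r - global_mean
        movie_n[movie] += 1
    movie_bias = {m: movie_sum[m] / (movie_n[m] + reg) for m in movie_sum}

    # User offset: what is left after removing the global mean and movie offset.
    user_sum, user_n = defaultdict(float), defaultdict(int)
    for user, movie, r in ratings:
        user_sum[user] += r - global_mean - movie_bias[movie]
        user_n[user] += 1
    user_bias = {u: user_sum[u] / (user_n[u] + reg) for u in user_sum}

    return global_mean, user_bias, movie_bias


def predict(model, user, movie):
    """Predict a rating; unknown users or movies get a zero offset."""
    global_mean, user_bias, movie_bias = model
    return global_mean + user_bias.get(user, 0.0) + movie_bias.get(movie, 0.0)
```

Note that `predict` never looks at which user is paired with which movie beyond adding the two offsets, which is exactly the “ignores user-movie affinity” part.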

Based on discussions and other presentations, it sounds like a combination of matrix factorization and neighborhood-based methods is the most common approach among the Netflix Prize leaders. Everyone at the workshop seemed to agree that Netflix did a surprisingly good job of selecting a goal for the competition: Netflix requires a 10% improvement over their CineMatch algorithm and the current top team has a 9.15% improvement. The difference seems small enough that the 10% goal must be reachable, but progress has slowed considerably, with an improvement of only 0.72 percentage points since the first progress prize was awarded last October.
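For anyone wondering what an “improvement” means here: the contest scores entries by root mean squared error (RMSE), and the percentage is the relative reduction in RMSE compared with the baseline. A small sketch of that arithmetic follows; the function names are my own, and nothing here uses actual leaderboard numbers.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Percent reduction in RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse
```

With these definitions, a candidate hits the 10% goal when `improvement_pct(baseline, candidate)` reaches 10, i.e. when its RMSE is at most 90% of the baseline’s.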

As the main conference has started, it has become quite clear what the “hot” topic of the year is: social network modeling. Sessions on the topic have been packed and some top figures in the community have presented papers on the subject…

Startup Signage

Monday, August 25th, 2008

This isn’t technology-related, but every startup on the planet should do what we did for our signage. It’s cost-effective and looks great. Check out the photos.

Are there really 46 megabytes of underpants in the world?

Monday, August 25th, 2008

Getting data feeds from hundreds of vendors, it’s always surprising how big or small some of them can be.  Sometimes there is a rational explanation, like when they include a separate product entry for every size or color.  Sometimes it’s just astounding how much variety there is in certain product categories that I wouldn’t give two thoughts to.  I was pulling a new feed this morning, and the title of this post came to mind.

Amazon’s new elastic block service

Friday, August 22nd, 2008

I have a little piece on the wonderful Xconomy site about Amazon’s new EBS service that you should check out.

StyleFeeder’s Funny Video of the Week

Friday, August 22nd, 2008

If there’s one thing that we have here at StyleFeeder, it’s… well, a lot of data. If there’s another thing, I’d have to say that we all have a healthy sense of humor. Because we basically sit in front of teh intrawebs all day long, we are exposed to many humorous diversions, which we intend to share with you for your enjoyment. So, each Friday, you can come back here for our favorite video of the week.

Several of us are big fans of Flight of the Conchords. I actually discovered them from Erlack via the recommendation system on StyleFeeder… yes, really, that’s totally true. Anyway, enjoy and please put a note on your calendar to come back next Friday to enjoy the video that distracted us the most during the week.

Cool File Viewer

Wednesday, July 30th, 2008

As part of our effort around product data feeds, we get a lot of delimited text files of varying size and content delivered on a daily basis.  Previously, we primarily used OpenOffice.org Calc to view these files, since it has a fairly flexible text import facility and can deal with CSV files pretty well.  Some of the larger files could take minutes to open and, more alarmingly, minutes to close, which cost us a lot of time whenever a large number of new files came in and needed evaluation.

A survey around the web showed a few different approaches to dealing with delimited files.  Most involved configuring a desktop database like MS Access or a local install of MySQL to import the file, but that would involve creating a table and running a loader.  Really all we want to do is view the file and see that the important fields are being filled in and that the values are not abusive of the column semantics.  Other approaches mirrored our own, using OpenOffice.org or MS Excel as a viewer; there were also a few specialized tools that open delimited files but not spreadsheets.  We tried a couple of these, but they didn’t impress us with their speed or usability.

Our answer came from halfway around the world, in New Zealand.  Kiwi Log Viewer, a tool designed to view web server logs on Windows, could open up tab-delimited files in an incremental fashion.  This provided a grid view of the file in a really speedy manner.  The problem is that most of our files are pipe-delimited, and Windows isn’t so friendly about running ‘sed’ on a file before passing it into a program.  Seeing that we were 90% of the way to what we wanted, we wrote to their support line and sure enough they’ve added a configurable delimiter to the features of their next version, currently available in beta.  While we’re big Open Source fans here, a responsive software company that cares about their customers is the next best thing.  If you’re looking for a good program to view delimited files or log files on Windows, check them out.  There is a free (as in beer) version as well if you want to try it first.
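For anyone stuck without ‘sed’ in the meantime, the pipe-to-tab conversion itself is only a few lines. This is a rough sketch, not something from our pipeline: the function name is made up, and it assumes a feed with no quoting and no line breaks inside fields.

```python
import io


def pipe_to_tab(src, dst):
    """Stream pipe-delimited lines from src and write them tab-delimited to dst.

    Assumes one record per line with no quoted fields (an assumption about the
    feed, not a general CSV parser). Returns the number of rows written.
    """
    rows = 0
    for line in src:
        fields = line.rstrip("\r\n").split("|")
        # A literal tab inside a field would split a column, so swap it for a space.
        fields = [field.replace("\t", " ") for field in fields]
        dst.write("\t".join(fields) + "\n")
        rows += 1
    return rows


# Tiny in-memory demo; real use would pass open file handles instead.
sample = io.StringIO("sku|title|price\n123|Striped socks|4.99\n")
out = io.StringIO()
pipe_to_tab(sample, out)
```

Because it reads and writes one line at a time, memory use stays flat no matter how large the feed is.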

Happy 4th from StyleFeeder engineering

Saturday, July 5th, 2008

Boston was a key city in the War for Independence, and to celebrate that there is a big fireworks display over the Charles River every year.  This year the Erics of StyleFeeder (Savage and Kilby) were looking for a place to watch the show.  Down by the river was a mass of humanity, even compared to previous years, so we were worried about missing the show.  Then it occurred to us: doesn’t our office have a view down towards the river without many tall buildings in the way?  This proved to be the case, and we had a spectacular view of the show directly over the famous MIT Dome.

(photo by Eric Savage)

(photo by Eric Kilby)

The rest of the pictures can be found at these links via Flickr.  Remember, Creative Commons licensing is your friend, so feel free to use these as you wish.

Is that server running a bit slow?

Monday, June 30th, 2008
top - 12:59:09 up 6 days, 15:51,  4 users,  load average: 1050.04, 753.62, 356.
Tasks: 150 total,  24 running, 126 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.2%us, 87.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2059260k total,  2049260k used,    10000k free,     2412k buffers
Swap:  4208608k total,   523792k used,  3684816k free,   116432k cached

*shakes head in disbelief*

LinkShare Golden Link Awards and Symposium

Friday, June 27th, 2008

A few weeks ago we got a pointer to the LinkShare developer contest, open to any of their publishers who are using their web services in interesting ways. I wrote up an entry, mailed it in, and didn’t think too much about it for a while. Then a couple of weeks ago we got a message saying that we’re finalists for their Technology Genius Award, and that we should plan on having someone attend the festivities.

Fast forward to this week and I boarded the LimoLiner bus and headed off to New York. A few hours later I was checking into the Sheraton, and not long after that I was boarding another bus for the Plaza Hotel and the Golden Link Awards ceremony. After spending some time in a very dressy crowd, feeling like a fish out of water, I made my way to my seat at a table near the front of the room and after a nice lobster appetizer and steak dinner it was time for the show, hosted by Susie Essman (best known from Curb Your Enthusiasm).

Well into the program, it was time for the Technology Genius Award and the butterflies started going in my stomach. And the award goes to…. StyleFeeder! Went up, shook some hands, made a speech, and sat back down, all in a blur. After the show was over, got on another bus and returned to the hotel with trophy in tow. People kept asking if it was an Oscar, and I kept answering in the affirmative.

The next morning, up bright and early, I got on another bus to head down to Chelsea Piers by the Hudson River, site of this year’s LinkShare Symposium. The morning was filled with speakers, headlined by James Surowiecki, the author of The Wisdom of Crowds. After lunch and more presentations, it was networking time, in which I was somewhat out of my element. During that time I ran into Adam Weiss of LinkShare, who had helped me with some technical issues back in the Spring, and he introduced me to Jessica Kingman, who is our account manager over there. They both had good suggestions in terms of people I should meet and talk to, and I made my way around to several of the advertisers’ booths exchanging cards and collecting conference swag. I even won another contest, taking away a nice 9-bottle wine cellar courtesy of our fellow Bostonians at SmartBargains.com. The conference finished around 6, and I headed back to the hotel with the guys from Buzzillions.com to drop off our bags.

After a long New York style evening of after, after-after, and after-after-after parties and some much needed rest, I boarded the LimoLiner back to Boston, with a shiny trophy nestled in my bag. It was worth the trip, and thanks to LinkShare and all their friendly people for making it a great trip.

Facebook Gotchas

Wednesday, June 25th, 2008

We just did a little refresh of our Facebook profile box, and I learned a couple things along the way that would have been nice to know ahead of time:

  • I knew that Facebook caches all referenced images, but I didn’t know that they will resize any referenced image larger than 400px down to 400px.
  • If an image is inaccessible for any reason (404, timeout, etc) it will be replaced with a blank image and cached, so you will need to change the name of the image when doing iterations.
  • FBML includes <fb:narrow> and <fb:wide> for providing different content to the two available profile columns, and this will work even with the Ajax-y reloading of a user moving the box. However, Facebook rewrites your CSS so that you don’t mess up the page, and the drag-n-drop doesn’t affect this, so don’t put CSS in these two tags. Instead, put your HTML in the tags, with different classes/ids.
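One way to handle the image renaming from the second gotcha without thinking about it is to bake a short hash of the file’s contents into its name, so every changed image automatically gets a new URL. This is just a sketch of that idea, not what we actually deployed; the function name and the choice of SHA-1 with 8 hex characters are assumptions for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename, content, digest_chars=8):
    """Return filename with a short content hash inserted before the extension.

    filename: path-like string such as "img/box.png".
    content: the image bytes; any change in them yields a different name.
    """
    digest = hashlib.sha1(content).hexdigest()[:digest_chars]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

For example, `versioned_name("img/box.png", image_bytes)` gives something of the form `img/box.<hash>.png`, and the name only changes when the bytes do, so an unchanged image can keep using the cached copy.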