Wednesday, May 23, 2007

reCAPTCHA: A new way to fight spam

You've probably seen a CAPTCHA before: those funky letters you have to enter before you can sign up for an account on almost any website. I'm proud to announce a new type of CAPTCHA: reCAPTCHA (click to see a live demo!).

You might notice that reCAPTCHA has two words. Why? reCAPTCHA is more than a CAPTCHA: it also helps to digitize old books. One of the words in a reCAPTCHA is one the computer already knows, much like a normal CAPTCHA. The other is a word that the computer can't read. When you solve a reCAPTCHA, we not only check that you are a human, but also use your answer for the other word to help read the book!

Luis von Ahn and I estimated that about 60 million CAPTCHAs are solved every day. Assuming that each CAPTCHA takes 10 seconds to solve, this is over 160,000 human hours per day (that's about 19 years). Harnessing even a fraction of this time for reading books will greatly help efforts to digitize books.
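For anyone who wants to check the arithmetic, here is a quick sketch in shell. The variable names are my own, and it uses integer division, so the results are rounded down:

```shell
# Rough check of the estimate above (integer arithmetic, rounds down).
captchas_per_day=60000000   # about 60 million CAPTCHAs solved per day
seconds_each=10             # assumed time to solve one CAPTCHA

hours=$(( captchas_per_day * seconds_each / 3600 ))
years=$(( hours / 24 / 365 ))

echo "$hours human hours per day, about $years years"
```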

reCAPTCHA provides an easy-to-use API for putting CAPTCHAs on your site. Installing is as easy as adding a few lines of code to your HTML and then making an HTTP POST request to our servers to verify the solution. We also wrote plugins for WordPress, MediaWiki, and phpBB to make it very easy to integrate.

One other interesting service reCAPTCHA provides is a way to securely obfuscate email addresses. Many sites display addresses like bmaurer [at] foo [dot] com or use hacks with tables, JavaScript or encodings to get the same effect. Spammers are getting smarter and figuring out these tricks. They are especially diligent about working around the strategies of well-known open source software. Consider this warning on bugzilla.mozilla.org:

Although steps are taken to hide addresses from email harvesters, the spammers are continually getting better technology and it is almost guaranteed that the address you use with Bugzilla will get spam.

reCAPTCHA Mailhide provides a scalable solution to email obfuscation that can be widely deployed without being breakable. Mailhide provides a way to encrypt a user's email with a key only reCAPTCHA knows. reCAPTCHA will only display the email address when the user solves a CAPTCHA. With reCAPTCHA, I can display my email address as bmau...@andrew.cmu.edu. If you click on the three dots and solve a CAPTCHA, you can see my address. Mailhide provides a way for individual users to encode their email address as well as an API for services (like Bugzilla) to share an encryption key with reCAPTCHA.

If you're suffering problems with spam, take a look at reCAPTCHA. Not only can you solve your problems with spam, you can help preserve mankind's written history into the digital age!

Monday, May 14, 2007

LD_LIBRARY_PATH empty entries

Many of us developers have a bashrc that has lines like:

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/install/lib

I've always known that this isn't perfect, that one should check that $LD_LIBRARY_PATH isn't empty, but had always thought it was just a minor point. If the variable starts out empty, the line above produces a value that begins with a colon, which leaves an empty entry at the front of the path. It turns out that the loader treats an empty entry as the current working directory, so it looks there for libraries.
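One way to avoid the empty entry is to add the colon only when the variable already has a value. Here is a minimal sketch using standard shell parameter expansion; the helper name append_lib_path is my own invention for illustration, not part of any tool:

```shell
# Hypothetical helper: prints "$1" with "$2" appended as a new path entry,
# without leaving an empty entry when "$1" is empty.
append_lib_path() {
    # ${1:+$1:} expands to "$1:" only if $1 is non-empty, otherwise to nothing.
    printf '%s\n' "${1:+$1:}$2"
}

# ${LD_LIBRARY_PATH:-} also covers the case where the variable is unset.
LD_LIBRARY_PATH=$(append_lib_path "${LD_LIBRARY_PATH:-}" "$HOME/install/lib")
export LD_LIBRARY_PATH
```

In a bashrc the same expansion can be written inline, as LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$HOME/install/lib, without the helper.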

I noticed this because I was using sshfs to mount something on my workstation in Pittsburgh from my laptop in California. When I ran any command (for example "ls"), the loader would look for tons of libraries, executing a stat for each one. A round trip between Pittsburgh and California is 90ms... so you can imagine everything was quite slow.

Of course, there are security implications too. I'm not that worried about a rogue directory on my laptop, but on shared systems (such as some of the university ones), I can imagine this being a risk.

Wednesday, May 09, 2007

In the Bay Area...

Starting this Saturday I'll be in the Bay Area, specifically Mountain View, for my internship at Google. While there, I'll be working on Google Calendar.