Wednesday, February 28, 2007

Google's CAPTCHA Broken?

A few months ago, I found a nice trick that lets me read comments on my blog without polling for them. GData lets you get an Atom feed of the comments on your blog; for example, mine is http://bmaurer.blogspot.com/feeds/comments/default. I put this into Google Reader, and blog comments show up just like any other blog entry. Recently, I've noticed that, from time to time, I am getting spam comments. However, Google uses a CAPTCHA to protect its comments. This means one of two things:
  1. Google's CAPTCHAs have been broken
  2. Some spammers are willing to hire humans to break CAPTCHAs
The rate at which spammers post is very low, maybe one or two comments per month. I think this might support the theory that spammers are using humans (if they were using computers, it would be cheap to post on blogs much more often). However, Google may be using anti-spam filters in addition to the CAPTCHA (this would be easy enough for somebody to verify: just paste some blatant spam into Blogger and solve the CAPTCHAs). To be honest, I don't think blog spam would make enough of a profit to justify paying humans, especially since Google uses the nofollow attribute, so the links don't get any PageRank. I bet that spammers are able to break Google's CAPTCHA with a <1% … (… this paper from Microsoft Research on the importance of segmentation in CAPTCHAs: http://www.ceas.cc/papers-2005/160.pdf).
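For anyone who would rather scan the comment feed with a script than a feed reader, here's a minimal Python sketch that lists each comment's author and title. It assumes the URL above still serves a standard Atom feed (namespace http://www.w3.org/2005/Atom); `fetch_feed` and `list_comments` are hypothetical helper names, not part of any Google API.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace (assumed feed format)

def fetch_feed(url):
    """Download a feed as raw bytes (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def list_comments(feed_xml):
    """Return (author, title) pairs, one per <entry>, from Atom XML (str or bytes)."""
    root = ET.fromstring(feed_xml)
    comments = []
    for entry in root.findall(f"{ATOM}entry"):
        # Entries without an <author><name> get a placeholder instead of failing.
        author = entry.findtext(f"{ATOM}author/{ATOM}name", default="(unknown)")
        title = entry.findtext(f"{ATOM}title", default="")
        comments.append((author, title))
    return comments
```

Usage would be something like `list_comments(fetch_feed("http://bmaurer.blogspot.com/feeds/comments/default"))`.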

CMU dorm policy: Nerds gone wild?

Recently, Carnegie Mellon announced that it was going to test a gender-neutral housing program next semester. It's hard to see how this could be all that shocking (most university housing is co-ed by room). Of course, there's always somebody with a ridiculous point of view:
Unfair or not, my fear is that nerdy kids at Carnegie Mellon might put aside writing computer language for the space program and attempt to brush up their knowledge of biology in the privacy of their own dormitories. This is wrong. Nerds should not be having love affairs with other nerds. There is always the danger that in the throes of nerd passion, their thick glasses will collide or else they will drop heavy laptops onto vulnerable body parts. [CMU Dorm Policy: Nerds gone wild?]
I'm glad to hear that some folks think of students at CMU as nerds who need to be protected from distractions such as members of the opposite sex.

Wednesday, February 07, 2007

Big Media DMCA Notices: Guilty until proven innocent

It's no secret that media companies have started hiring firms such as BayTSP to automatically find file sharers and send letters to their ISPs. The goal is to use fear to persuade people to use legal methods of getting digital content.

Many ISPs, especially universities, trust the good faith of these companies and will automatically deactivate the Internet connection of anyone named in a notice. As a personal project, and with the help of Carnegie Mellon's Information Security Office (which employs me to work on various computing security tasks), I decided to investigate the reliability of notices from companies such as BayTSP. The answer: these companies do not actually gather the evidence they claim to. Their standards for sending DMCA notices are very low.

To understand the issues, you first need a basic understanding of BitTorrent. To download something via BitTorrent, you first get a ".torrent" file from one of the many sites that index content. This file contains a fingerprint (a SHA-1 hash) of every piece of the file you are downloading. It also contains a reference to a tracker, which is how peers (the people downloading the content) find each other. After contacting the tracker, your client connects to the peers the tracker lists (and other peers may connect to you), then begins swapping pieces of the file with them. What the media companies object to is that, while downloading, your client also offers pieces of their copyrighted content to other users, a violation of copyright law. To catch these violations, BayTSP announces fake clients to the BitTorrent tracker and uses the list of peers it gets back to find violations.
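In the original BitTorrent metainfo format, those fingerprints are SHA-1 hashes of fixed-size pieces, concatenated into the torrent's `pieces` field. A minimal Python sketch of computing them and checking a received piece (the function names are illustrative; the piece length is whatever the torrent specifies):

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenated 20-byte SHA-1 digests, one per piece (the last piece may be short)."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )

def verify_piece(index: int, piece: bytes, pieces_field: bytes) -> bool:
    """Check a downloaded piece against its fingerprint in the pieces field."""
    expected = pieces_field[20 * index:20 * (index + 1)]
    return hashlib.sha1(piece).digest() == expected
```

This is why a peer can't pass off garbage as part of the file: every piece is checked against the hash from the .torrent before it is accepted.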

For my investigation, I wrote a very simple BitTorrent client. My client sent a request to the tracker and generally acted like a normal BitTorrent client, right up to the point of sharing data: it refused to download or upload any copyrighted content. It obeyed the law.
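For the curious, the peer wire protocol opens with a fixed 68-byte handshake: a length byte (19), the string "BitTorrent protocol", 8 reserved bytes, the 20-byte info_hash, and a 20-byte peer_id. The Python sketch below builds and parses it; it is an illustration, not the client I used. Since connections start out choked and not interested, a client that completes the handshake and never sends "interested" or "unchoke" never requests or serves any file data.

```python
import struct

PSTR = b"BitTorrent protocol"  # protocol identifier string, 19 bytes

def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """68 bytes: pstrlen, pstr, 8 reserved bytes (all zero: no extensions), info_hash, peer_id."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must each be 20 bytes")
    return struct.pack("B", len(PSTR)) + PSTR + bytes(8) + info_hash + peer_id

def parse_handshake(msg: bytes):
    """Return (info_hash, peer_id) from a received handshake."""
    if len(msg) < 68 or msg[0] != len(PSTR) or msg[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return msg[28:48], msg[48:68]
```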

I placed this client on a number of torrents that I suspected were monitored by BayTSP. (For my own protection, I don't want to identify the torrents used for this research. I used the fact that NBC is a client of BayTSP to find trackers. If you want to check whether BayTSP is monitoring a torrent, look for peer IPs in the ranges listed at test.blocklist.org.) Because the university's Information Security Office is very diligent about processing DMCA notices, I would find out if BayTSP sent any notices based on this activity. With just this completely legal BitTorrent client, I was able to get notices from BayTSP.

To put this in to perspective, if BayTSP were trying to bust me for doing drugs, it'd be like getting arrested because I was hanging out with some dealers, but they never saw me using, buying, or selling any drugs.

Because BayTSP does not confirm that the client it accuses actually uploads illegal content, innocent users could be falsely identified. BitTorrent trackers work via a standard HTTP GET request, for example:

GET /announce?info_hash=579CC43E4D66D35AE22312985EA04275939AB477&peer_id=asdfasdfadfasdf&port=12434&compact=1
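The request above is simplified. Per the BitTorrent specification (BEP 3), `info_hash` is the raw 20-byte SHA-1 value percent-encoded rather than written in hex, `peer_id` is exactly 20 bytes, and `uploaded`, `downloaded`, and `left` are expected too. A minimal Python sketch of building a more faithful announce URL; the tracker address and peer_id are placeholders:

```python
from urllib.parse import quote, urlencode

def announce_url(tracker: str, info_hash_hex: str, peer_id: bytes,
                 port: int, left: int) -> str:
    """Build a tracker announce URL (sketch: BEP 3 parameters plus compact=1)."""
    info_hash = bytes.fromhex(info_hash_hex)  # on the wire: raw 20 bytes, not hex
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must each be 20 bytes")
    params = {
        "info_hash": info_hash,
        "peer_id": peer_id,
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,
        "compact": 1,  # ask for the compact 6-bytes-per-peer list
    }
    # quote (not quote_plus) percent-encodes every byte outside the unreserved set
    return tracker + "?" + urlencode(params, quote_via=quote)
```

Note that nothing in the URL proves who sent it; the tracker generally just records the address the request came from.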

One easy way to make somebody look like a BitTorrent user would be to get them to visit a web page containing the code <img src="http://tracker.com:12345/announce?info_hash=579CC43E4D66D35AE22312985EA04275939AB477&peer_id=asdfasdfadfasdf&port=12434&compact=1" />. Their browser would request that URL, so they'd be on the tracker, BayTSP would see their IP address, and BayTSP might send them an infringement notice. BayTSP might check that they are listening on the port they advertise (maybe even check for a BitTorrent handshake), but if the victim already uses BitTorrent for legal purposes, you could just advertise a port their client is listening on. More investigation is needed into exactly what triggers a notice.

An even easier trick: the BitTorrent clients BayTSP uses support Peer Exchange, so you can give them the address of another peer for them to rat out to the ISP.
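To see why a peer list learned through Peer Exchange proves nothing, look at what a PEX message carries. In the ut_pex extension (BEP 11; other clients may use different formats), the payload is just a bencoded dictionary of compact IPv4 addresses and ports that the sender asserts are peers. A minimal Python sketch, assuming the ut_pex format and omitting the extension handshake (BEP 10) that must precede it; the helper names are illustrative:

```python
import ipaddress
import struct

def bencode(obj) -> bytes:
    """Minimal bencoder for ints, byte strings, and dicts with byte-string keys."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, dict):
        # Bencoded dict keys must appear in sorted order.
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in sorted(obj.items())) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")

def compact_peer(ip: str, port: int) -> bytes:
    """6 bytes: IPv4 address followed by big-endian port."""
    return ipaddress.IPv4Address(ip).packed + struct.pack(">H", port)

def pex_payload(added, dropped=()) -> bytes:
    """ut_pex payload: peers the sender claims to know about, with no proof attached."""
    return bencode({
        b"added": b"".join(compact_peer(ip, port) for ip, port in added),
        b"added.f": bytes(len(added)),  # one flag byte per added peer; zero = no flags
        b"dropped": b"".join(compact_peer(ip, port) for ip, port in dropped),
    })
```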

At the end of the day, BayTSP (and probably other similar companies) are sending DMCA notices that claim they detected a user uploading and downloading copyrighted files. This is a lie: they never caught the user in the act. A lying tracker, a peer abusing Peer Exchange, a hostile web page, or a buggy BitTorrent client could all result in a false DMCA notice.

If your ISP forwards a DMCA notice from these guys, point them here. This research suggests that BayTSP has no evidence of wrongdoing. If ISPs learn that the folks sending them DMCA notices are not being completely honest, they may be willing to reconsider how they respond to those notices. The people I work with at Carnegie Mellon seemed willing to reevaluate their policies given this evidence. I believe ISPs should require every peer-to-peer-related DMCA notice to state exactly what evidence of sharing was found. Ideally, the notice should contain evidence that can be corroborated with log files (for example, "we found that the client at 123.1.2.3 uploaded 1 MB of file X to 4.3.2.1"; the ISP could then check that there was about 1 MB of traffic between those two hosts).
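Checking a claim like that could be as simple as summing flow records between the two addresses. A minimal Python sketch, assuming the ISP keeps flow records with a source, destination, and byte count; the `Flow` record and the 10% tolerance are hypothetical choices. Wire-level byte counts include protocol overhead, so this only checks that at least roughly the claimed amount was sent.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    """One flow-log record (hypothetical format)."""
    src: str
    dst: str
    nbytes: int

def bytes_sent(flows, src, dst):
    """Total bytes logged from src to dst."""
    return sum(f.nbytes for f in flows if f.src == src and f.dst == dst)

def corroborates(flows, src, dst, claimed_bytes, tolerance=0.1):
    """True if logs show at least (1 - tolerance) * claimed_bytes from src to dst."""
    return bytes_sent(flows, src, dst) >= claimed_bytes * (1 - tolerance)
```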

A piece of good news for anybody who has gotten a BitTorrent-related notice from BayTSP: it doesn't seem like a studio could do much in court with the evidence BayTSP gives them.

For the technically minded, I thought I'd share some observations about the behavior of BayTSP's clients:

  • BayTSP's clients don't accept incoming connections; they only make outgoing ones. I wonder what exactly this is for.
  • Some of the BayTSP clients claim to be using Azureus (and support Azureus extensions), while others run libtorrent. I'm not sure why they are doing this.
  • When BayTSP's clients connect to a BT user, they claim not to have downloaded any of the file, but they refuse uploads. Not only does this behavior make no sense for an actual user, but it seems like BayTSP would want to accept data, which might provide proof of infringement.
  • Some of the IP ranges I noticed coming from BayTSP were 154.37.66.xx, 63.216.76.xx, and 216.133.221.xx. Sometimes they make themselves really obvious on the tracker. For example, 154.37.66.xx and 63.216.76.xx will send 10 clients to the same tracker, all claiming to listen on port 12320. Maybe trackers should block these folks.
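A tracker or user could flag these patterns mechanically. A minimal Python sketch, assuming "xx" means any last octet (so each range is a /24) and that the peer list is IPv4 (ip, port) pairs, for example decoded from a compact tracker response. The threshold of 10 mirrors the observation above and is otherwise arbitrary.

```python
import ipaddress
from collections import Counter

# Assumption: "154.37.66.xx" etc. means the whole /24.
SUSPECT_NETS = [ipaddress.ip_network(n) for n in
                ("154.37.66.0/24", "63.216.76.0/24", "216.133.221.0/24")]

def in_suspect_range(ip: str) -> bool:
    """True if ip falls in one of the ranges observed above."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SUSPECT_NETS)

def same_port_clusters(peers, threshold=10):
    """Map (/24 network, port) -> count for groups of >= threshold IPv4 peers on one port."""
    counts = Counter(
        (ipaddress.ip_network(f"{ip}/24", strict=False), port) for ip, port in peers
    )
    return {key: n for key, n in counts.items() if n >= threshold}
```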