Tuesday, March 22, 2005

Greg Duffy Bakes Google's Cookies

Greg Duffy, a college student in Texas, has figured out how to read copyrighted Google Print books in their entirety by "baking" the cookies Google uses to restrict searches of protected material. Duffy hopes his bravery will result in a job with Google. I just hope it doesn't land him a lawsuit. During the commotion following his posts, his name mysteriously disappeared from Google's web search, reappearing only after Google Print fixed the vulnerability. Read his post on how he did it. Here's an excerpt:
So recently I wrote some software to grab and store up a bunch of cookies, keep them for more than 24 hours, and then automate searching for pages by this method. If I wanted to view page 100, the software would search for it and attempt to extract the image with a regular expression. If that doesn't work, it will search for page 99 and extract the "next page" link to get to page 100. It will continue doing this for page 101, 98, and 102 until it finds the correct page. Whenever a cookie would hit the hard limit, I'd replace it with a new cookie from the queue. By grabbing the "next" and "previous" links automatically in this "inductive" fashion and using the search for skipping, I could view an entire book on Google Print with one click every time. I later modified the software to spit out a PDF of the book. I used simple components like GoogleCookie (cookie with accessible properties), GoogleCookieOven (queue with "baking time", i.e. it only pops when the head of the queue is old enough to get the ability to search), and GoogleCookieBaker (thread that keeps the oven full of baking cookies by querying Google for new ones when the number drops below a certain threshold).
I love all the cookie-isms.
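For the curious, here is a rough idea of what the "oven" and the inductive page search might look like. This is a minimal Python sketch under my own assumptions, not Duffy's code: the names `Cookie`, `CookieOven`, and `probe_order` are hypothetical, a cookie is modeled as just a value plus a creation timestamp, and nothing here talks to Google. The only details taken from his post are the queue that pops only once its head is old enough (he kept cookies "more than 24 hours") and the 100, 99, 101, 98, 102 probing order.

```python
import collections
import time
from dataclasses import dataclass, field
from typing import Iterator, Optional, Tuple


@dataclass
class Cookie:
    """Hypothetical stand-in for a GoogleCookie: an opaque value plus its creation time."""
    value: str
    created: float = field(default_factory=time.time)


class CookieOven:
    """FIFO queue that releases its head only after it has 'baked' for bake_seconds.

    Illustrative sketch of the GoogleCookieOven idea, not the original implementation.
    """

    def __init__(self, bake_seconds: float, clock=time.time):
        self.bake_seconds = bake_seconds
        self.clock = clock  # injectable so the behaviour can be tested without waiting
        self._queue = collections.deque()

    def put(self, cookie: Cookie) -> None:
        # Cookies arrive in creation order, so the head is always the oldest one.
        self._queue.append(cookie)

    def __len__(self) -> int:
        return len(self._queue)

    def pop_ready(self) -> Optional[Cookie]:
        """Return the oldest cookie if it has baked long enough, otherwise None."""
        if self._queue and self.clock() - self._queue[0].created >= self.bake_seconds:
            return self._queue.popleft()
        return None


def probe_order(target: int, max_offset: int) -> Iterator[Tuple[int, Optional[str], int]]:
    """Yield (page_to_search, link_to_follow, hops) in the order the post describes.

    First try the target page directly; then search target-1 and follow its "next"
    link once, target+1 and follow its "previous" link once, then target-2 / target+2
    with two hops, and so on. Pages below 1 are skipped.
    """
    yield target, None, 0
    for offset in range(1, max_offset + 1):
        if target - offset >= 1:
            yield target - offset, "next", offset
        yield target + offset, "previous", offset
```

Because cookies go in oldest-first, the oven only ever has to check the head of the queue; if the head isn't ready, nothing behind it is either. A separate "baker" thread (not sketched here) would keep calling `put` whenever `len(oven)` dropped below some threshold.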

