Free Captchas, Google App Engine and OCR

Captchas are the distorted, almost unreadable strings you have to retype on a web page before you can perform certain actions. Their purpose is to make sure the form was filled in by a human and not by a “bot”. You will most often find captchas on login and comment pages.

When the time came to add captchas to my CryptoEditor website, I found a service provided by Google called reCaptcha. ReCaptcha takes advantage of the massive number of people using it every day to digitize printed books. How can that be? It’s very simple!

Instead of showing a single combination of letters and numbers, reCaptcha asks you to solve two words. For one of them, reCaptcha already knows the correct text. The other is a word scanned from a printed book that the OCR (optical character recognition) software was unable to read. Of course, you don’t know which is which.

The word reCaptcha already knows is used to grant or deny you access to the service. Your guess for the other word, the one the OCR could not read, is added to a database used to train the OCR that is reading this book.
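To make the two-word idea concrete, here is a minimal, hypothetical model of it in Python. It is not reCaptcha’s real implementation: the class name, the image ids and the rule that a guess is only recorded when the control word was typed correctly are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class TwoWordChallenge(object):
    """Hypothetical model of the two-word scheme (not reCaptcha's actual code)."""

    def __init__(self, known_words, unknown_words, rng=None):
        self.known = dict(known_words)      # image id -> text already verified
        self.unknown = list(unknown_words)  # image ids the OCR could not read
        self.guesses = defaultdict(list)    # image id -> transcriptions typed by humans
        self.rng = rng or random.Random()

    def new_challenge(self):
        control = self.rng.choice(sorted(self.known))
        unknown = self.rng.choice(self.unknown)
        pair = [control, unknown]
        self.rng.shuffle(pair)  # the user cannot tell which word is the control
        return control, unknown, pair

    def check(self, control, unknown, answers):
        """answers maps image id -> what the user typed. Returns True if access is granted."""
        typed = answers.get(control, "")
        passed = typed.strip().lower() == self.known[control].lower()
        if passed:
            # Assumption: the guess for the unreadable word is only stored
            # when the control word proves the user is (probably) human.
            self.guesses[unknown].append(answers.get(unknown, "").strip())
        return passed
```

Only the control word decides whether access is granted; the unknown word never affects the outcome, it just accumulates human transcriptions that can later be used to fix the OCR.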

Integrating reCaptcha into a Google App Engine application is quite straightforward. First, download the reCaptcha Python API. Then copy the module into your project and adapt it to use the GAE urlfetch module instead of urllib2 to call the reCaptcha server over HTTP.
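The adaptation boils down to replacing the HTTP call in the verification function. The sketch below assumes the legacy reCaptcha v1 protocol (a form-encoded POST of `privatekey`, `remoteip`, `challenge` and `response` to a verify URL, answered with `true` or `false` on the first line and an error code on the second); that endpoint has since been retired, so treat the URL as an assumption. The default transport uses App Engine’s `urlfetch.fetch`, and the `fetch` parameter is a hypothetical hook added so the logic can be exercised outside App Engine.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, as on the original GAE runtime

# Assumed legacy reCaptcha v1 verification endpoint (no longer in service).
VERIFY_URL = "http://api-verify.recaptcha.net/verify"


class RecaptchaResponse(object):
    def __init__(self, is_valid, error_code=None):
        self.is_valid = is_valid
        self.error_code = error_code


def _gae_fetch(url, payload, headers):
    """Default transport: App Engine's urlfetch service instead of urllib2."""
    from google.appengine.api import urlfetch
    result = urlfetch.fetch(url=url, payload=payload,
                            method=urlfetch.POST, headers=headers)
    return result.content


def submit(challenge, response, private_key, remote_ip, fetch=_gae_fetch):
    # Reject empty answers locally, without a network round trip.
    if not (challenge and response):
        return RecaptchaResponse(False, "incorrect-captcha-sol")
    payload = urlencode({
        "privatekey": private_key,
        "remoteip": remote_ip,
        "challenge": challenge,
        "response": response,
    })
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    body = fetch(VERIFY_URL, payload, headers)
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    lines = body.splitlines()
    if lines and lines[0].strip() == "true":
        return RecaptchaResponse(True)
    # Second line (if any) carries the error code, e.g. "incorrect-captcha-sol".
    return RecaptchaResponse(False, lines[1].strip() if len(lines) > 1 else None)
```

In a request handler you would pass the posted `recaptcha_challenge_field` and `recaptcha_response_field` values together with the client’s IP address, and only act on the form when `is_valid` is true.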

If you are lazy like me, you can download an already adapted version of the module from Joscha Feth’s blog. His post explains in detail how to implement it in your Google App Engine application.

To use the service, you first need to register. Since it is a Google service, you can use your Google login to register your websites. You will be given a private key and a public key. NEVER publish or share your private key.
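One way to honour that rule is to keep both keys out of the source code and make sure only the public key ever reaches the browser; the private key stays on the server and is only sent in the verification request. The sketch below is a minimal illustration: the environment variable names are hypothetical, and the widget URL is assumed to be the legacy reCaptcha v1 challenge script, which is no longer in service.

```python
import os

try:
    from html import escape  # Python 3
except ImportError:
    from cgi import escape  # Python 2

# Hypothetical variable names: keep the keys wherever your deployment stores secrets.
PUBLIC_KEY_VAR = "RECAPTCHA_PUBLIC_KEY"
PRIVATE_KEY_VAR = "RECAPTCHA_PRIVATE_KEY"

# Assumed legacy reCaptcha v1 challenge script URL.
CHALLENGE_URL = "http://api.recaptcha.net/challenge?k="


def load_keys(environ=os.environ):
    """Read both keys at startup; fail loudly if either one is missing."""
    public = environ.get(PUBLIC_KEY_VAR)
    private = environ.get(PRIVATE_KEY_VAR)
    if not public or not private:
        raise RuntimeError("reCaptcha keys are not configured")
    return public, private


def widget_html(public_key):
    """Markup sent to the browser: it embeds only the PUBLIC key."""
    return '<script type="text/javascript" src="%s%s"></script>' % (
        CHALLENGE_URL, escape(public_key, True))
```

Keeping the two keys in separate variables makes it hard to paste the wrong one into a template by accident.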

Visit the reCaptcha wiki to find out how to customize the look and feel of the captcha form with CSS.
