Thursday, September 9, 2010

Breaking visual captcha: introduction

What is Captcha?
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. Although there are several types of captcha, the visual captcha is the most widely used because it is easy to implement in any activity that involves a screen.


Do we need to do that? What is it used for?
We need captchas to make sure that certain processes are really carried out by a human, not by a computer or robot. These processes usually involve form input, such as registration, commenting, and so on.


Visual Captcha, can you show?
Absolutely. But first I want to sort visual captchas into 2 categories by how difficult they are to break:
  1. Weak Captcha.
  2. Strong Captcha.
The images below show samples of captchas that I consider weak and strong. I will explain in full why some captcha images are easy to break and others are hard in later posts on this blog.


Weak captchas
Strong captchas - hard to break
Most of the characters in the images above are easy for us humans to read. But for computers it is a whole different thing. Our representation of characters is different from a computer's.
Computers recognize characters as integers only: in an image, nothing more than a grid of pixel values.
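
To make that concrete, here is a minimal sketch of what a computer "sees". The 5x5 letter below is a made-up example, and the names LETTER_T, render and ink_count are my own for illustration; a real captcha image would be much larger and usually in grayscale or color.

```python
# A hypothetical 5x5 black-and-white image of the letter "T".
# 1 = dark (ink) pixel, 0 = light (background) pixel.
LETTER_T = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]


def render(grid):
    """Turn the integer grid back into something a human can read."""
    return "\n".join("".join("#" if px else "." for px in row) for row in grid)


def ink_count(grid):
    """Count the dark pixels in the grid."""
    return sum(sum(row) for row in grid)


if __name__ == "__main__":
    print(ord("T"))            # "T" as text is just a character code
    print(LETTER_T[0])         # "T" as an image is just rows of numbers
    print(render(LETTER_T))    # the same numbers, drawn for human eyes
    print(ink_count(LETTER_T))
```

Nothing in those numbers says "T" by itself. Recognizing the letter means finding a pattern in the integers.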

So how are captchas read, then?
Experimental Garden shows one example of defeating a captcha using a man-in-the-middle attack.
Though many have succeeded in breaking visual captchas (some can be found here), I want to find my own way, starting from this post :)

So you've never broken any captchas?
No, sir. I have not.
I see this captcha-breaking thing not as a result but as a journey. The journey starts here.

Starting with these captcha 'features' I have seen so far:

  1. Visual captchas consist of at most 10 characters, usually fewer than 10.
  2. Some conditions disturb the normal appearance of the characters, such as stretched and skewed characters, the presence of dust pixels, etc.
Therefore I assume there are 3 processes involved in breaking a visual captcha:
  1. Slice captcha image into single character images.
  2. Remove dust and background pixels.
  3. Read the cleaned characters, in order of appearance. This can be done using an NN (Artificial Neural Network).
The first 2 processes need not be done in any particular order. The third process will always be the last. Many (programming) classes are available to help complete the character-reading process. For instance, we could use GOCR (java).
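
The first two processes can be sketched roughly as below. This is only a toy under strong assumptions, not a working captcha breaker: the image is already black-and-white, characters are separated by at least one empty column, and "dust" means an ink pixel with no ink neighbours. The function names (parse, remove_dust, slice_characters) and the tiny "TH" image are made up for illustration. The third process, reading the characters, is left out.

```python
def parse(art):
    """Convert rows of '#'/'.' text into a grid of 1/0 integers."""
    return [[1 if ch == "#" else 0 for ch in row] for row in art]


def remove_dust(grid, min_neighbours=1):
    """Clear ink pixels with fewer than min_neighbours ink pixels around them."""
    h, w = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]
    for y in range(h):
        for x in range(w):
            if not grid[y][x]:
                continue
            # Count ink among the (up to) 8 surrounding pixels.
            neighbours = sum(
                grid[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours < min_neighbours:
                cleaned[y][x] = 0
    return cleaned


def slice_characters(grid):
    """Split the grid at empty columns; one sub-grid per character, left to right."""
    w = len(grid[0])
    has_ink = [any(row[x] for row in grid) for x in range(w)]
    pieces, start = [], None
    # The extra False at the end closes a character that touches the right edge.
    for x, ink in enumerate(has_ink + [False]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            pieces.append([row[start:x] for row in grid])
            start = None
    return pieces


# Hypothetical captcha "TH" with one dust pixel at row 3, column 0.
CAPTCHA = parse([
    "#####.#.#.",
    "..#...#.#.",
    "..#...###.",
    "#.#...#.#.",
    "..#...#.#.",
])

if __name__ == "__main__":
    for piece in slice_characters(remove_dust(CAPTCHA)):
        print("\n".join("".join("#" if px else "." for px in row) for row in piece))
        print()
```

In this toy image, cleaning first and slicing second gives the same pieces as slicing first and cleaning each piece. Real captchas are harder: characters may touch each other, and dust is not always a lonely pixel.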

I will write up my attempts at breaking captchas later on this blog.
So, stay tuned.
