Captcha Hacking(2) - Collecting Data and Analyzing
Collecting Data and Analyzing for Captcha Hacking (2)
Captcha Hacking Series
Captcha Hacking(1) - Defining the Problem
Captcha Hacking(2) - Collecting Data and Analyzing
Captcha Hacking(3) - Data Cleaning
Captcha Hacking(4) - Training with KNN Algorithm
Captcha Hacking(5) - Automating to solve the Captcha problem
Topic
How Much Data to Collect
How much data should we collect to solve this Captcha problem?
- Since each character looks almost the same every time it appears in a problem, we don’t need much data
- From the questions, you can see that the digit characters are about the same size as the operator characters
- Possible characters: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, -, *
Data Collecting
We can collect data simply by solving the problems shown at localhost:10000
- As discussed previously, the questions get harder and harder to solve
- You can right-click each image and save it to a separate file
- Try to cover every possible character with the saved images (in my case, 4 image files were enough to collect all the characters)
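To know when to stop collecting, it helps to track which characters the saved images already cover. The snippet below is a minimal sketch, assuming you note down the expression shown in each saved image by hand; the `missing_characters` helper and the sample expressions are hypothetical and are not the actual images from this series.

```python
# The 13 characters that can appear in a captcha, as listed above.
EXPECTED_CHARS = set("0123456789+-*")


def missing_characters(expressions):
    """Return the expected characters that never appear in the given expressions."""
    seen = set()
    for expression in expressions:
        # Ignore anything outside the expected set (spaces, '=', and so on).
        seen.update(ch for ch in expression if ch in EXPECTED_CHARS)
    return sorted(EXPECTED_CHARS - seen)


# Made-up transcriptions of four saved images (assumption, not real data).
samples = ["12+34", "56-78", "9*0", "3+5"]
print(missing_characters(samples))  # [] means every character is covered
```

If the returned list is not empty, keep solving problems and saving images until it is.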
Data Analyzing
We then analyze the images by reading the color of each character with any tool. As you can see from the image below, one of the colors is rgb(170, 255, 170), with hex code #aaffaa. You can read the hex code two letters at a time as R, G, and B, so ‘aa’ = R, ‘ff’ = G, ‘aa’ = B.
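The hex-to-RGB reading described here can be sketched in a few lines of Python. The function name `hex_to_rgb` is just an illustrative choice, not something from the captcha server or a library.

```python
def hex_to_rgb(hex_code):
    """Convert a hex color such as '#aaffaa' into an (r, g, b) tuple of ints."""
    value = hex_code.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a 6-digit hex color, got %r" % hex_code)
    # Each pair of hex digits is one channel: digits 0-1 = R, 2-3 = G, 4-5 = B.
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#aaffaa"))  # (170, 255, 170)
```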
Findings from analyzing the image
- Blue: in the RGB value, B is always FF
- Green: in the RGB value, G is always FF
- Red: in the RGB value, R is always FF
- Blue & Green: in the RGB value, R is always AA or lower
- Blue & Red: in the RGB value, G is always AA or lower
- Green & Red: in the RGB value, B is always AA or lower
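These findings can be turned into a small color classifier. This is only a sketch under some assumptions that the list above does not state outright: a channel counts as "present" when it is above AA, a single-color character has exactly one present channel and it is FF (as in the rgb(170, 255, 170) example), and a two-color character has exactly two present channels. The name `classify_color` is hypothetical.

```python
LOW_MAX = 0xAA  # an "absent" channel is AA or lower, per the findings
FULL = 0xFF     # a single-color character has its own channel at FF


def classify_color(rgb):
    """Map an (r, g, b) tuple to a color group name, or None if no rule fits."""
    r, g, b = rgb
    channels = {"Red": r, "Green": g, "Blue": b}
    # Assumption: a channel is "present" only when it is strictly above AA.
    present = [name for name, value in channels.items() if value > LOW_MAX]
    if len(present) == 1 and channels[present[0]] == FULL:
        return present[0]
    if len(present) == 2:
        # Sorting gives the same names as the list: "Blue & Green", etc.
        return " & ".join(sorted(present))
    return None


print(classify_color((170, 255, 170)))  # Green
```

Pixels that match none of the rules, such as a white pixel with all three channels at FF, return `None` here. The sketch does not try to decide what such pixels are.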
This post is licensed under CC BY 4.0 by the author.