João Soutto - Data scientist (umgrauemeio)
13/12/2022
Have you ever had to prove to a website that you are human? This kind of test, known as a CAPTCHA, is a form of validation that protects sites and applications from spam and abusive activity. But it goes well beyond that...
The goal is not only to prove to Google that you are not a robot. These associations between words and images also serve as a free source of labeled data for training artificial intelligence models.
And what does fire have to do with it? Hold on, let's break it down.
In a world full of AI-based solutions, data has become the gold of the current era. In the test above, by clicking on the images associated with taxis, the user is indicating which images contain a taxi. Having this data available at scale makes it possible to train a wide range of algorithms.
To see why, it helps to understand the concept of supervised learning. Supervised learning is a machine learning technique in which an algorithm learns to classify data based on the patterns, or features, that characterize each class.
A class is a group of objects that share similar characteristics, patterns, or features. In the example used throughout this article, the class is the taxi.
To achieve this, a supervised model is given a large amount of labeled data, called training data. Labeled data is data that has already been assigned a class. This is precisely why the approach is called supervised: a human has to label the data first. The algorithm then learns the features associated with each class so that it can classify new, unseen data.
Returning to taxis to make this more concrete: the labeled data would be the images paired with the classes they contain, and a shared pattern would be, for example, the yellow color of most taxis.
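To make the idea concrete, here is a minimal sketch of supervised learning. It assumes each image has already been reduced to two hypothetical numeric features (the fraction of yellow pixels and the fraction of dark pixels); the feature values and labels are invented for illustration. It uses a nearest-centroid classifier only because it is short; real image recognition models are usually neural networks trained on raw pixels.

```python
import math

# Hypothetical training data: each image is summarized by two made-up features,
# (fraction of yellow pixels, fraction of dark pixels), plus a human-provided label.
training_data = [
    ((0.62, 0.10), "taxi"),
    ((0.55, 0.15), "taxi"),
    ((0.70, 0.08), "taxi"),
    ((0.05, 0.40), "no_taxi"),
    ((0.10, 0.35), "no_taxi"),
    ((0.02, 0.50), "no_taxi"),
]


def fit_centroids(data):
    """Learn one centroid (mean feature vector) per class from labeled examples."""
    sums = {}
    counts = {}
    for features, label in data:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


centroids = fit_centroids(training_data)
print(predict(centroids, (0.58, 0.12)))  # prints taxi
print(predict(centroids, (0.04, 0.45)))  # prints no_taxi
```

The "learning" here is just averaging the labeled examples of each class; a new image is then labeled by whichever class it resembles most. Without the human-provided labels there would be nothing to average, which is why labeled data is so valuable.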
With that in mind, and knowing that a good model usually needs thousands or even millions of carefully labeled examples, who is going to do the grunt work of labeling all that data? YOU.
Yes, only a human being could do that. In many cases, free labeling tools are used to generate these labels, and this is where Google comes in with its reCAPTCHA tool.
The test shows images whose labels are already known, as in the figure above (selection 1, images with taxis, and selection 3, a group of images without taxis).
However, it also includes at least one image that has not yet been confirmed to contain the object (selection 2 in the figure above).
The images with known labels are used to determine whether a human or a bot programmed to solve the test is controlling the machine.
The answers for the remaining images are free labels provided by users.
These newly labeled images can then serve as training data for a taxi recognition algorithm.
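The mechanism described above can be sketched in a few lines. This is a hypothetical model, not Google's implementation (reCAPTCHA's internal logic is not public): the image IDs, the rule that every known image must be answered correctly, and the majority-vote thresholds `min_votes` and `min_agreement` are all invented for illustration. Aggregating many users' answers by majority vote is one common way to turn noisy crowd answers into a single label.

```python
from collections import Counter, defaultdict

# Hypothetical challenge: image IDs with known labels (True = contains a taxi)
# and image IDs whose label is still unknown.
KNOWN_LABELS = {"img_1": True, "img_3": False}
UNKNOWN_IMAGES = {"img_2"}

# Answers collected from users who passed the challenge: image ID -> Counter of answers.
votes = defaultdict(Counter)


def submit(selected):
    """Grade one user's answer and harvest labels for the unknown images.

    `selected` is the set of image IDs the user clicked as containing a taxi.
    Returns True if the user is treated as human.
    """
    # Verification step: every image with a known label must be answered correctly.
    passed = all((img in selected) == has_taxi for img, has_taxi in KNOWN_LABELS.items())
    if passed:
        # Labeling step: the answers on unconfirmed images become free labels.
        for img in UNKNOWN_IMAGES:
            votes[img][img in selected] += 1
    return passed


def consensus(img, min_votes=3, min_agreement=0.7):
    """Return the majority label once enough users agree, otherwise None."""
    counter = votes[img]
    total = sum(counter.values())
    if total < min_votes:
        return None
    label, count = counter.most_common(1)[0]
    return label if count / total >= min_agreement else None


# Simulated users: three mark img_2 as a taxi, one does not,
# and a bot that clicks everything fails the known-label check.
submit({"img_1", "img_2"})
submit({"img_1", "img_2"})
submit({"img_1"})
submit({"img_1", "img_2"})
submit({"img_1", "img_2", "img_3"})  # bot: fails because img_3 has no taxi
print(consensus("img_2"))  # prints True
```

Only users who pass the known-label check contribute votes, so the verification step also acts as a quality filter on the labels being collected.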
Applied to forest fire images, for example, the same approach could help train algorithms for early wildfire detection, making firefighting brigades more efficient and saving resources, since fires could be fought at an earlier stage. The reCAPTCHA would look something like this:
With a purpose like this, you certainly wouldn't mind answering, every now and then, whether you're a robot or not, right?
This is the first in a series of articles that discuss firefighting, forest monitoring, and artificial intelligence applied to images.