This interactive game teaches you about Active Learning, a machine learning technique where the algorithm actively selects the most informative data points for labeling.
You'll see a 2D plot with points and two important lines:
Purple line: the current model fitted on the labeled points
Gray dashed line: The ground truth decision boundary
Your goal: Make the purple line match the gray line by labeling as few points as possible.
Notice how the game often selects points close to the current decision boundary. These points have the highest uncertainty and provide the most information to the model.
In the early stages, you'll see significant shifts in the decision boundary with each new label. This demonstrates how Active Learning can quickly improve model performance with minimal labeled data.
As you label more points, you'll notice smaller changes in the boundary. This illustrates the efficiency of Active Learning - it front-loads the most impactful data points.
Active Learning is a subfield of machine learning where the algorithm can interactively query a user (or some other information source) to label new data points.
Unlike traditional supervised learning, where the algorithm learns from a static, pre-labeled dataset, active learning algorithms are designed to be more efficient with their training data.
They aim to achieve high accuracy using as few labeled training instances as possible, thereby minimizing the cost of obtaining labeled data.
The key idea behind active learning is to reduce the labeling effort by selecting the most informative instances for labeling, rather than relying on random sampling or pre-defined sampling strategies.
There are several strategies used in active learning to select the next point for labeling:
While active learning can be very effective, it also has some limitations:
In practice, I like to do some mixing of 80% active learning and 20% random sampling of unlabeled points to ensure that the model is exposed to a diverse set of data points while still benefiting from the efficiency of active learning.
Despite these limitations, active learning remains a powerful tool in many machine learning applications, particularly when labeling costs are high and unlabeled data is plentiful.
Created by Mario Filho with the help of Claude 3.5 Sonnet.