Implementing Active Learning

Using Active Learning to quickly improve your models on Roboflow.

Written by Mohamed Traore

Last published at: May 18th, 2022

"Active learning is a machine learning training strategy that enables an algorithm to proactively identify subsets of training data that may most efficiently improve performance. More simply – active learning is a strategy for identifying which specific examples in our training data can best improve model performance. It's a subset of human-in-the-loop machine learning strategies for improving our models by getting the most from our training data." - What is Active Learning?

The beauty of Active Learning is that it can help you create a more robust model with fewer images. The number of training rounds required to keep your model working well, and to improve it, decreases quickly over time.

There are 3 key ways to implement Active Learning:

1. Continuously Collect New Images at Random

You can code random sampling into your deployment solution.

Random sampling will help to bring more images into your project that are representative of the true deployment environment.

Advantages:

  1. The model will quickly learn what to detect, and what not to detect, within the environment it is present in.
  2. You will collect more images that contain the objects of interest (what your model is trained to detect) - after labeling and re-training, your model will do much better within its deployment environment!

Example:

You are running inference (deploying your model) for a period of 6 hours. In this 6-hour window, you programmatically set your device (e.g., an NVIDIA Jetson or OAK device) to save a set number of images every 15 minutes (e.g., "save 3 images every 15 minutes"). Return the collected images to Roboflow, or store them on your device/system for future upload to Roboflow.

  • Over time, you will see fewer false detections.
    • Additionally, you will get some images containing the objects of interest (what your model is trained to "see"), regardless of the confidence level at which they were detected.
    • The advantage of this is that your model is reinforced to keep doing a "good job" on the objects of interest, and to improve where it did a "bad job" (a low-confidence detection, or no detection where there should have been one).
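The timed sampling described above can be sketched in a few lines of Python. This is an illustrative helper, not part of the Roboflow package; the function name, window size, and per-window count are our own choices:

```python
from collections import defaultdict
import random

def sample_frames(frame_timestamps, interval_s=900, per_interval=3, seed=0):
    # randomly keep up to `per_interval` frames from each `interval_s`-second
    # window (e.g., "save 3 images every 15 minutes" -> interval_s=900)
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for ts in frame_timestamps:
        buckets[int(ts // interval_s)].append(ts)
    chosen = []
    for window in sorted(buckets):
        frames = buckets[window]
        chosen.extend(rng.sample(frames, min(per_interval, len(frames))))
    return sorted(chosen)

# e.g., one frame per second over one hour -> four 15-minute windows
picked = sample_frames(range(3600))
print(len(picked))  # 12 (3 frames from each of 4 windows)
```

In a real deployment you would save the selected frames to disk, then upload them with the Roboflow Python package (see the sample code further down this page).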

2. Collect New Images Below a Given Confidence Threshold

To employ this method, select a confidence threshold for detections and require (within your own code) that any detection of an object (that the model is trained to recognize) with a confidence below ___%, or 0.___, is sampled and sent back to your workspace on Roboflow for further examination - i.e., testing in the Hosted Web App or Inference API to find the issue, plus labeling and re-training.

Advantages:

  1. The model will quickly learn what to detect, and what not to detect, within the environment it is present in.
  2. The confidence level for detections will quickly begin to increase since you're essentially telling your model (after labeling and re-training), "hey, you did a bad job here - this is what you were supposed to do."
  3. You will collect more images that contain the objects of interest (what your model is trained to detect) - after labeling and re-training, your model will do much better within its deployment environment!

Examples:

  • Begin with a set confidence threshold of 40% - any object (or every __ number of times an object is) detected at a confidence level below 40%, or 0.40, should programmatically be returned to Roboflow or stored on your device for future upload to Roboflow.
    • Next: After re-training and receiving better training metrics and/or more data from the production environment, raise the confidence threshold to 50% - any object detected at a confidence level below 50%, or 0.50, should programmatically be returned to Roboflow or stored on your device/system for future upload to Roboflow.
    • Later: Over time, you will be able to set a higher benchmark for confidence levels, such as 60%, 75%, or 80%. The reason you don't want to start at these levels is that newer models will produce a very high number of false detections, or detections at low confidence.
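The filtering step above can be sketched as follows, assuming results shaped like the JSON the Hosted Inference API returns (a "predictions" list whose items include a "confidence" value between 0 and 1); the helper names are our own:

```python
def low_confidence_detections(result, threshold=0.40):
    # return detections whose confidence falls below the threshold;
    # `result` is assumed to look like the Inference API's JSON:
    # {"predictions": [{"class": ..., "confidence": ...}, ...]}
    return [p for p in result.get("predictions", []) if p["confidence"] < threshold]

def should_upload(result, threshold=0.40):
    # flag the frame for active learning if any detection is below threshold
    return len(low_confidence_detections(result, threshold)) > 0

result = {"predictions": [
    {"class": "helmet", "confidence": 0.91},
    {"class": "helmet", "confidence": 0.32},
]}
print(should_upload(result, threshold=0.40))  # True
```

Frames flagged this way can be stored locally and uploaded back to Roboflow in a batch, or uploaded immediately with the sample code below.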

3. Solicit Your Application's Users to Verify Model Predictions

Advantages:

  1. User feedback is a great tool. You essentially get more sets of eyes to help with the quality control of your model.
  2. You can also add bug reports, in case the model isn't working at all (e.g., alerts for when the system or app crashes, or the model runs but doesn't detect anything).

Example:

  • Add "bug reports" within your dashboard, mobile app, or however you have chosen to create your computer vision product. Use these bug reports to quickly identify ways to improve the functionality of your computer vision product, and most importantly, ensure that users have a model that is not only working well but primed to quickly improve.
    • Take any images from bug reports and programmatically set them to return to Roboflow, or store them on your system for future upload to Roboflow.
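One lightweight way to implement such a queue is to append bug-report entries to a local JSONL file and batch-upload the referenced images later. This sketch is not part of the Roboflow SDK; the file name and helper names are hypothetical:

```python
import json
import time
from pathlib import Path

def log_bug_report(image_path, note, queue_file="upload_queue.jsonl"):
    # append one bug-report entry (image path, user note, timestamp)
    # to a local JSONL queue for later upload to Roboflow
    entry = {"image": str(image_path), "note": note, "ts": time.time()}
    with open(queue_file, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

def pending_uploads(queue_file="upload_queue.jsonl"):
    # read back every queued entry (empty list if nothing is queued yet)
    path = Path(queue_file)
    if not path.exists():
        return []
    return [json.loads(line) for line in path.read_text().splitlines()]
```

A batch job can then walk `pending_uploads()` and push each image with `project.upload(...)` as shown in the sample code below.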

Resources for Implementing Active Learning

Sample Code

First, install the Roboflow Python package from PyPI:

# installing the Roboflow Python pip package
pip install roboflow
  1. Copy/paste the code block below into VS Code, Xcode, PyCharm, Spyder (or another code editor).
  2. Update the values for private_api_key, model_path, and img_path in the code.
  3. Save the Python file to a directory - be sure to note the directory name and file name, as we'll need them later for the deployment to work.
from roboflow import Roboflow
# private api key found in Roboflow > YOURWORKSPACE > Roboflow API
# NOTE: this is your private key, not publishable key!
# https://docs.roboflow.com/rest-api#obtaining-your-api-key
private_api_key = "INSERT API KEY HERE"

# gain access to your workspace
rf = Roboflow(api_key=private_api_key)
workspace = rf.workspace()

# you can obtain your model path from your project URL, it is located
# after the name of the workspace within the URL - you can also find your
# model path on the Example Web App for any dataset version trained
# with Roboflow Train
# https://docs.roboflow.com/inference/hosted-api#obtaining-your-model-endpoint
model_path = "INSERT MODEL PATH HERE"
project = workspace.project(f"{model_path}")

# be sure to replace with a path to your file
# if you run this in Colab, be sure to upload the file to Colab, hover over
# the file name, and select the 3 dots to the right of the file name to copy
# the file path, and paste it as the value for "img_path"
img_path = "INSERT IMAGE PATH HERE"

# establish number of retries to use in case of upload failure
project.upload(f"{img_path}", num_retry_uploads=3)

Roboflow's Upload API

The Upload API can be used to upload images to a new project.

  • example: Project 1: Currently Deployed Model | Project 2: Active Learning Images - randomly collected images | Project 3: Active Learning Images - images below a confidence threshold of 30% | Project 4: Active Learning Images - images below a confidence threshold of 50%
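The project routing in that example can be written as a small helper that picks a destination project from the highest detection confidence in a frame. The project IDs below are hypothetical placeholders - substitute the project names from your own workspace:

```python
def target_project(max_confidence):
    # route a frame to an Active Learning project based on the highest
    # detection confidence it contains (project IDs are hypothetical)
    if max_confidence < 0.30:
        return "active-learning-below-30"
    if max_confidence < 0.50:
        return "active-learning-below-50"
    return None  # confident enough - no active-learning upload needed

print(target_project(0.25))  # "active-learning-below-30"
```

The returned project ID can then be passed to `workspace.project(...)` before calling `upload`, so each image lands in the right Active Learning project.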

Roboflow's Inference API

The "InferenceHosted" and "UploadHosted" scripts are to be used for images that are hosted on a server. The "InferenceLocal" and "UploadLocal" scripts are to be used for images that are stored on your device (e.g., a computer hard drive or edge device).

Roboflow's Python Package