
This is the fourth episode of our series that dives into training and deploying computer vision models at the edge. And I have some good news: we're already past the halfway point. Have a look at the series outline for the remaining installments.

Introduction

Training artificial intelligence (AI) models is a pivotal process in the development of AI systems, demanding significant time and resources. The importance of this training phase cannot be overstated, as it is during this stage that the model learns to recognize patterns, make predictions, and perform tasks based on the provided data. 

Red Hat OpenShift AI provides a robust platform for conducting AI model training. This platform empowers us to efficiently perform model development iterations, fine-tune parameters, and validate performance, ultimately facilitating the creation of high-quality AI solutions. We will use this platform to train our YOLO model on previously processed data.

Note:

Red Hat OpenShift AI deployment on single node OpenShift is currently not officially supported. Refer to the OpenShift AI official documentation to get more information about supported platforms and configurations.

For this demo, we have decided to develop a real-time detection system that provides efficient and accurate animal locations, plus some statistics, within video sequences. In this article, we are going to show you the complete process for training the model on a previously labeled dataset of animal images. It is worth mentioning that the original set of images and labels has been prepared in advance and converted to YOLO format to simplify the explanation.

However, if you want to build a demo to detect a custom set of objects, don't worry. We show you how to label your own image dataset in the article Prepare and label custom datasets with Label Studio. Then you will only have to replace the animals dataset with your custom one.

Project setup

Once you are logged in to the OpenShift AI dashboard, the first step is to create the project where all of the resources related to this demo will live. It is considered good practice to create a separate project for each effort to ensure component isolation and better access control. Here are the steps to create the new project:

  1. On the left menu in the dashboard, navigate to the Data Science Projects tab.
  2. Click the Create data science project button.
  3. There you can type your preferred project name. In my case, it will be safari.
  4. Finally, click Create.

That’s how easy it is to create the project namespace. This is where all the resources tailored to this demo will be deployed.
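
If you prefer the command line, a data science project is essentially an OpenShift namespace that the dashboard discovers through a label. Here is a rough equivalent using the oc CLI (a sketch; the opendatahub.io/dashboard label is what makes the namespace show up as a data science project in the OpenShift AI dashboard):

oc new-project safari
oc label namespace safari opendatahub.io/dashboard=true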

Create workbench

Now that we have our newly created safari project, we can configure our workbench:

  1. Click Create workbench.
  2. Once on the workbench configuration page, complete the fields to match the following specifications:
  • Name: safari (insert any preferred name).
  • Image selection: PyTorch.
  • Version selection: 2023.2 (recommended).
  • Container size: Medium (this will depend on your node resources; the more, the better).
  • Accelerator: NVIDIA GPU.
  • Number of accelerators: 2. (In this case, our node has 2 NVIDIA GPU cards, but you will need to select the number that applies to your environment.)
  • Check the Create new persistent storage box.
    • Name: safari (insert any preferred name).
    • Persistent storage size: 80 GiB (we can always extend it later if needed).
  3. Once you have completed the form, click Create workbench. You will be redirected to the project dashboard, where the workbench is starting. It could take a couple of minutes to pull the image before the status changes to Running. Your project should look similar to Figure 1.
Project dashboard.
Figure 1: In the project dashboard, the workbench status has switched to Running. 
  4. Now you can access the workbench interface by clicking Open and logging in using your kubeadmin credentials.

That's it! Our workbench is ready, and the next step will be to open the notebook containing the training instructions. Now, jump to the model training section.

Model training

The moment has arrived to work directly with the AI model. When you open your workbench, you will be directed to a Jupyter environment, a versatile computational platform used for interactive computing, data analysis, and scientific research. It provides a web-based interface that allows users to create files with code in multiple programming languages, as shown in Figure 2.

Jupyter environment.
Figure 2: Working with Jupyter Notebooks, you can create files containing live code, multimedia, and text in multiple programming languages. 

Notebooks in Jupyter serve as interactive computing documents that combine live code, visualizations, explanatory text, and multimedia elements in a single interface. Users can execute code blocks individually, visualize results immediately, and document their processes in real time. We could always spin up a new Python notebook and start programming from scratch, but OpenShift AI also makes it possible to import Git repositories and browse the dataset and notebooks directly from the Jupyter interface.

On the left side of the screen, you should see the Git icon (shown in Figure 3).

Git icon.
Figure 3: Look for this icon to access Git options in Jupyter.

Click Clone a Repository and paste the Safari GitHub repository URL: https://github.com/OpenShiftDemos/safari-demo

After a few moments, you should see the safari-demo directory cloned into your Jupyter environment. Let me briefly explain the repository's folder structure:

  • notebooks: Stores the different notebooks. We will use Safari_YOLOv8.ipynb.
  • dataset: Contains the animal images and their annotations.
  • weights: Stores the weights resulting from the training. You can use them if you don't want to train the model yourself.

Note:

If you are trying to build your own model, you can use the notebook in the safari-demo as a reference. Remember to adapt the following steps to point the model to your custom dataset.

Navigate to safari-demo > notebooks > Safari_YOLOv8.ipynb to open the notebook. The file contains code cells that you can run by clicking the Play button at the top. At this point, you can proceed with the training by reading through the notebook or by following this article, as we will review some of the most important code cells.

First of all, we are going to install some package dependencies and clone the official YOLO repository:

# Install the package dependencies
!pip install --upgrade pip
!pip install pickleshare
!pip install seaborn
!pip install opencv-python-headless
!pip install py-cpuinfo

# Clone the official Ultralytics YOLO repository and move into it
!git clone https://github.com/ultralytics/ultralytics
%cd ultralytics

# Import the YOLO model class and PIL for displaying images
from ultralytics import YOLO
from PIL import Image

Next, verify that the images and labels for the training are in the right path. If you are using your own dataset, from now on, you will have to replace this information with the path where your dataset images are stored.

!ls /opt/app-root/src/safari-demo/dataset/*

The output will show us the training, test, and validation folders with the images and labels subfolders. Also, the data.yaml file will be listed. Let me show you the information this file contains:

train: /opt/app-root/src/safari-demo/dataset/train/images
val: /opt/app-root/src/safari-demo/dataset/test/images

nc: 80
names: ['Hippopotamus', 'Sparrow', 'Magpie', 'Rhinoceros', 'Seahorse', 'Butterfly', 'Ladybug', 'Raccoon', 'Crab', 'Pig', 'Bull', 'Snail', 'Lynx', 'Turtle', 'Canary', 'Moths and butterflies', 'Fox', 'Cattle', 'Turkey', 'Scorpion', 'Goldfish', 'Giraffe', 'Bear', 'Penguin', 'Squid', 'Zebra', 'Brown bear', 'Leopard', 'Sheep', 'Hamster', 'Panda', 'Duck', 'Camel', 'Owl', 'Tiger', 'Whale', 'Crocodile', 'Eagle', 'Otter', 'Starfish', 'Goat', 'Jellyfish', 'Mule', 'Red panda', 'Raven', 'Mouse', 'Centipede', 'Lizard', 'Cheetah', 'Woodpecker', 'Sea lion', 'Shrimp', 'Polar bear', 'Parrot', 'Kangaroo', 'Worm', 'Caterpillar', 'Spider', 'Chicken', 'Monkey', 'Rabbit', 'Koala', 'Jaguar', 'Swan', 'Frog', 'Hedgehog', 'Sea turtle', 'Horse', 'Ostrich', 'Harbor seal', 'Fish', 'Squirrel', 'Deer', 'Lion', 'Goose', 'Shark', 'Tortoise', 'Snake', 'Elephant', 'Tick']

Note:

You will need to create a similar file if you are using a custom dataset, modifying the number of classes, the list of labels, and the path to your images.
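
For example, a data.yaml for a hypothetical two-class dataset could look like the following sketch (the paths and class names are placeholders for your own):

train: /opt/app-root/src/my-dataset/train/images
val: /opt/app-root/src/my-dataset/val/images

nc: 2
names: ['cat', 'dog']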

As you can see, this is the file that YOLO uses as a reference to know where the training and validation folders are located. We also need to let it know how many classes we have. In our case, there are 80 different animals. Next comes the list of the class names in order. This is important when labeling the dataset images. Figure 4 shows an example.

Labels format.
Figure 4: Each line in the text file corresponds to a bounding box in the image.

Each line in the text file corresponds to a bounding box. The first number on each line corresponds to the class index. In this example, 0 means Zidane, but in our model, 0 = Hippopotamus, as shown in the data.yaml.
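
For reference, each line in a YOLO label file holds five values: the class index, followed by the bounding box center coordinates and its width and height, all normalized to the 0-1 range relative to the image size. A hypothetical line describing a Hippopotamus (class 0) roughly centered in the image would look like this:

0 0.512 0.487 0.340 0.295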

Now that we know the basics, it's time to train the model. As you can see below, the code is quite simple. First, we load a pretrained model provided by the Ultralytics YOLO team. These weights will be used as a starting point for the training with the new animal data. Next, we just need to call the train function and fill in a few parameters:

  • data: the path to our data.yaml file.
  • epochs: the maximum number of iterations during the training.
  • imgsz: the size of the images used for the training.
  • batch: the number of images used during each training iteration.

model = YOLO("yolov8m.pt")
model.train(data='/opt/app-root/src/safari-demo/dataset/data.yaml', epochs=100, imgsz=640, batch=16)

This starts the training of the YOLOv8 model using our dataset. In the first line of the output shown when you run the cell, you should spot your GPU card, which is used to speed up the process. In my case, it's a Tesla M60 GPU card:

Ultralytics YOLOv8.0.221 🚀 Python-3.9.16 torch-1.13.1+cu117 CUDA:0 (Tesla M60, 15102MiB)
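
If no GPU appears in that line, you can run a quick sanity check with PyTorch, which is already part of the workbench image (a minimal sketch):

import torch

# Confirm that CUDA is available and inspect the visible GPU(s)
print(torch.cuda.is_available())      # should print True
print(torch.cuda.device_count())      # number of GPUs the container can see
print(torch.cuda.get_device_name(0))  # e.g., Tesla M60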

Wait until the training process finishes. This happens automatically, either when the function reaches the number of epochs specified in the epochs parameter or when, at some point, there is no significant accuracy improvement between iterations. The training time will depend on different factors, including the size of the images and the GPU used. When finished, the weights file will be saved automatically to the following folder:

Results saved to runs/detect/train
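
Incidentally, the early-stopping behavior described above is controlled by the patience argument of the train function: training stops when the validation metrics show no improvement for that many consecutive epochs. If your runs stop too early or run too long, you can tune it (a sketch; 30 is an arbitrary example value):

# Stop early if there is no improvement for 30 consecutive epochs
model.train(data='/opt/app-root/src/safari-demo/dataset/data.yaml', epochs=100, imgsz=640, batch=16, patience=30)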

At this point, our recently trained model should be able to detect animals in images. Let's try it out by passing it a sample image. We just need to load our weights file into the model and specify the path to the image used as an example.

Note:

If you want to save some time and skip the training process, you can use the weights file provided in the Git repository (safari-demo > weights > best.pt). Modify the paths to point to the file if needed.

model = YOLO('/opt/app-root/src/safari-demo/notebooks/ultralytics/runs/detect/train/weights/best.pt')
results = model('/opt/app-root/src/safari-demo/dataset/validation/sample.png', save=True)

Here you have the results (Figure 5):

Image.open('/opt/app-root/src/safari-demo/notebooks/ultralytics/runs/detect/predict/sample.png')

Brown bear image.
Figure 5: The trained model correctly identifies a brown bear in the image, which means that the model is working. 

Our brown bear is detected correctly. Now that we know that our model is working, we just need to save it in ONNX format so that we can use it in a container image later:

model.export(format='onnx')

The file is saved to the following folder. Navigate to the directory and download the file to your computer; we will use it later as part of our Safari application:

Results saved to
 /opt/app-root/src/safari-demo/notebooks/ultralytics/runs/detect/train/weights/best.onnx
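
Before wiring the file into the application, you can optionally verify that the exported model loads correctly using ONNX Runtime (a sketch; it assumes the onnxruntime package, which is not installed by default). A YOLOv8 export typically expects a 1x3x640x640 float32 input:

!pip install onnxruntime

import onnxruntime as ort

# Load the exported model and print the expected input name and shape
session = ort.InferenceSession('/opt/app-root/src/safari-demo/notebooks/ultralytics/runs/detect/train/weights/best.onnx')
for model_input in session.get_inputs():
    print(model_input.name, model_input.shape)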

That's all we need for the training. We are ready to jump to the final episode: the model deployment on the Red Hat build of MicroShift.

Next steps

In this tutorial, you used Red Hat OpenShift AI to train a YOLOv8 model. Our exploration has not only delved into the intricacies of object detection but also showcased the integration of cutting-edge computer vision technology with the robust OpenShift platform.

As we bid farewell to this series, our final destination awaits in the next article, where we will witness the deployment of our trained model onto MicroShift. Join us in the grand finale: Deploy computer vision applications at the edge with MicroShift.

Last updated: May 3, 2024