How to train YOLACT: Real-time instance segmentation with a custom dataset

Lahiru Rathnayake
5 min read · May 30, 2021
Image by Author.

Introduction

In object detection, an object is highlighted by a bounding box, which is represented by four coordinates. Instance segmentation takes image annotation to the next level: it highlights an object at the pixel level. An instance mask consists of many coordinate points that trace the outline of the object. YOLACT (You Only Look At CoefficienTs) is a real-time, one-stage instance segmentation model that predicts object instances together with pixel-level segmentation masks. (Paper link: YOLACT: Real-time Instance Segmentation.) The authors report that the model achieves 29.8 mAP on MS COCO at 33.5 fps on a single Titan Xp. In this post, we’ll walk through how to prepare a custom dataset for instance segmentation and train YOLACT on it.

Speed-performance trade-off for various instance segmentation methods on COCO (Source — YOLACT paper)

Data Preparation

Data Collection

For this work, the method was applied to identify defective leaves. The custom dataset was built by collecting images of two classes of leaves: defective and non-defective.

defective and non-defective leaves

Image Annotation

I used the labelme image annotation tool for the data labeling process; download and install labelme from its GitHub repository. I prefer labelme because it is easy to use and the authors provide a good instance segmentation example (labelme — Instance Segmentation Example). Before annotating, create a labels.txt file with the class names, adding “__ignore__” and “_background_” as the first two rows of the file. Then execute the following command to run labelme.

labels.txt file
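For this two-class leaf dataset, the file contents would look like this (the class names match the dataset config used later):

__ignore__
_background_
defective
non_defective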
labelme <image_folder> --labels labels.txt --autosave

Use the Create Polygons tool to annotate the images; labelme will generate a JSON file for each annotated image in the image folder.

image annotation with labelme
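Each generated JSON file stores the polygon points for every labeled shape. An abridged example (all field values are illustrative) looks roughly like this:

{
  "version": "4.5.6",
  "flags": {},
  "shapes": [
    {
      "label": "defective",
      "points": [[120.0, 85.5], [134.2, 90.1], [128.7, 110.3]],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {}
    }
  ],
  "imagePath": "leaf_001.jpg",
  "imageData": null,
  "imageHeight": 416,
  "imageWidth": 416
}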

I created separate datasets for training and validation. After the annotation process, the dataset must be converted to COCO JSON format. To do that, run the labelme2coco.py script from the labelme Instance Segmentation Example.

./labelme2coco.py <input_folder> <output_folder> --labels labels.txt

This will create a COCO-format JSON file, together with the annotated images, in the output folder.
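Before moving on to training, it is worth sanity-checking the converted annotations. Here is a minimal sketch using pycocotools; the annotation file path is an assumption based on the folder layout used in the config below, so adjust it to wherever your converted file lives.

from pycocotools.coco import COCO

# Load the COCO-format file produced by labelme2coco.py
coco = COCO('Input_data/train_annotations.json')

# Categories should list the classes from labels.txt
print(coco.loadCats(coco.getCatIds()))

# Count the annotated images and instance masks
print(len(coco.getImgIds()), 'images,', len(coco.getAnnIds()), 'annotations')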

Training Process

For the training process, clone the YOLACT GitHub repository from https://github.com/dbolya/yolact and follow the installation steps in the repository. I set up an Anaconda environment with PyTorch 1.4.0 and torchvision 0.5.0 on Ubuntu 20.04.2 LTS.

Dataset Configuration

For the dataset configuration, open the yolact/data/config.py file and go to the “DATASETS” section. Create a copy of dataset_base and configure it for the custom dataset. I configured it according to my dataset: set ‘train_images’ to your training images folder and ‘valid_images’ to the validation images folder. Likewise, set ‘train_info’ to the training annotation file and ‘valid_info’ to the validation annotation file. I used two classes for training, so I set ‘class_names’ with those class labels.

# ----------------------- DATASETS ----------------------- #
...
leaves_dataset = dataset_base.copy({
    'name': 'Leaves_Dataset',
    'train_images': 'Input_data/Train_Images',
    'train_info': 'Input_data/train_annotations.json',
    'valid_images': 'Input_data/Valid_Images',
    'valid_info': 'Input_data/valid_annotations.json',
    'class_names': ('defective', 'non_defective'),
})

Model Configuration

After the dataset configuration, we have to configure the model. ImageNet pre-trained backbone weights are provided for training; check the training section of the YOLACT repository for the download links. ResNet101, ResNet50, and DarkNet53 backbones are available. I used the pre-trained DarkNet53 model for this training process. After downloading, place the weights file in the weights folder. Then go to the “YOLACT v1.0 CONFIGS” section in yolact/data/config.py and set up the model configuration. Since I used DarkNet53 as the backbone architecture, I copied yolact_darknet53_config. Set ‘dataset’ to the dataset name from the previous step. If you prefer ResNet50 or ResNet101 as the backbone, change the model configuration accordingly. Note that ‘num_classes’ is the number of classes plus one for the background, and ‘max_size’ is the input image size.

# ----------------------- YOLACT v1.0 CONFIGS -------------------- #
...
yolact_darknet53_leaves_config = yolact_darknet53_config.copy({
    'name': 'leaves_detection',
    'dataset': leaves_dataset,
    'num_classes': len(leaves_dataset.class_names) + 1,
    'max_size': 416,
})

After the model configuration, open the yolact/train.py file. Here you can change the training parameters, or run python train.py --help to view the descriptions of the arguments; they are well explained. If you are happy with the default parameters, execute the following command, passing the model configuration name as the config argument.

python train.py --config=yolact_darknet53_leaves_config
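For example, if you run out of GPU memory, you can lower the batch size via train.py’s --batch_size argument (8 here is just an illustrative value):

python train.py --config=yolact_darknet53_leaves_config --batch_size=8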

During training, the script periodically computes the mAP of the masks and bounding boxes at different IoU thresholds on the validation set. You can get an idea of the model’s performance from that table.

mAP table

Training will take some time. After several epochs, I interrupted the training process because the results were already good enough on the validation dataset. A weight file is saved to the weights folder when training is interrupted, and you can use that weight file for evaluation. Execute the following command for the evaluation process.

python eval.py --trained_model=weights/leaves_detection_185_1782_interrupt.pth --config=yolact_darknet53_leaves_config --score_threshold=0.60 --images=test_images:output_images
detection results
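eval.py can also process a single image instead of a whole folder via the --image argument (the input and output file names here are placeholders):

python eval.py --trained_model=weights/leaves_detection_185_1782_interrupt.pth --config=yolact_darknet53_leaves_config --score_threshold=0.60 --image=my_leaf.jpg:my_leaf_output.jpg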

If you want to get better results, or to continue training after an interruption, you can resume the training from the previous weights. Check out the YOLACT GitHub repository for more information.
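To resume from the interrupted checkpoint, pass the weights file with --resume; --start_iter=-1 tells the script to pick up the iteration count from the file name (this mirrors the resume example in the YOLACT README):

python train.py --config=yolact_darknet53_leaves_config --resume=weights/leaves_detection_185_1782_interrupt.pth --start_iter=-1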

Lahiru Rathnayake

AI researcher with a passion for Machine Learning and Computer Vision.