0. Introduction

In this tutorial I will explain how to use the Darknet framework to develop an object detector that can find a chosen object in an image or a video. I trained it to find my AmazonBasics mouse.

1. Installing Darknet

1.1. On Windows

I found this GitHub page where the author explains how to install Darknet on Windows using Visual Studio and how to use CUDA for GPU acceleration. In my opinion it’s easy and clear. If you need more information, please write me an email.

1.2. On Linux

I used pjreddie’s “official” repo and cloned it using git:

git clone https://github.com/pjreddie/darknet.git
cd darknet

When you compile the source, you can choose whether to use CPU or CUDA computing. If you want to use Nvidia’s toolkit, you have to edit the first line of the Makefile:

GPU=0 # if you don't want to use CUDA
GPU=1 # if you do

(I explain how to install the CUDA drivers and cuDNN on both Linux and Windows in this.) If you have OpenCV installed, or you want to be able to see the images processed by the detector, you’ll need to enable it in the Makefile like this:

OPENCV=1
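
After editing the Makefile, compile Darknet from inside its folder; a plain make is all the project needs:

make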

2. Preparing the dataset

In my experience, the most difficult (and sometimes most tedious) part of training the detector is creating the dataset and labelling all the images. I used 200 images, and I recommend not using fewer than 150-175.

2.1. BBox Label Tool

I found this Python 2.7 script through this tutorial. The tool can only label images with the “.JPEG” extension, so you’ll have to convert all your images before processing them with it (a conversion sketch follows the clone command below). You have to put all the images in the “Images” folder, in sub-directories named with sequential integers like “000”, “001”, and so on. First we have to clone the repo:

git clone https://github.com/puzzledqs/BBox-Label-Tool.git
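
Since the tool only accepts “.JPEG” files, you may want to batch-convert your photos first. Here is a minimal sketch using PIL (the same library the tool needs); the folder names raw_images and Images/001 are placeholders for wherever your originals are and whichever sub-directory you are labelling:

import os
from PIL import Image

src_dir = "raw_images"    # hypothetical folder with the original photos
dst_dir = "Images/001"    # BBox-Label-Tool reads from Images/<nnn>/
if not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)

for name in os.listdir(src_dir):
    base = os.path.splitext(name)[0]
    img = Image.open(os.path.join(src_dir, name))
    # JPEG has no alpha channel, so force RGB before saving
    img.convert("RGB").save(os.path.join(dst_dir, base + ".JPEG"), "JPEG")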

When we have everything, we can run it from the terminal if we are using Linux:

cd BBox-Label-Tool-master
python2 main.py

If you are getting errors, you may need to install some Python libraries:

sudo apt-get install python-pil

If you are using Windows, you can use IDLE to execute it, and the libraries can be downloaded from here. You have to insert the image folder in the text box at the top of the app, then click “LOAD”. Then draw a box around the object you want to detect, and do this for all the images. The result should be something like this:

Once all the images are labelled, we can move on to preparing the annotations for training.

2.2. Converting the annotations

BBox-Label-Tool generates files like this, where the first line is the number of boxes and each following line is one box as “xmin ymin xmax ymax”:

1
492 304 607 375

But YOLOv2 needs a different format, so we have to use another script. The script was written by Guanghan Ning, but we have to change it a bit. First, change line 15:

classes = ["stopsign"]

Change it to the object that the detector should find. In my case, a mouse:

classes = ["mouse"]

Then lines 34, 35 and 37:

mypath = "labels/stopsign_original/"
outpath = "labels/stopsign/"

cls = "stopsign"

Set the first variable to the path of the .txt files with the label information, the second one to the destination folder, and cls to the object name. In my case:

mypath = "BBox-Label-Tool-master/Labels/001/"
outpath = "label/"

cls = "mouse"

Then, you can run the converter:

python2 convert.py

In the end, the annotations should look like this:

0 0.508796296296 0.419135802469 0.106481481481 0.0876543209877
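
For reference, each line of the YOLO format is: class index, box centre x, box centre y, box width, box height, with all four coordinates normalised to the image size. Here is a sketch of the arithmetic, assuming the example image is 1080x810 (the size that reproduces the numbers above):

# YOLO annotation from the BBox-Label-Tool box above, assuming a 1080x810 image
w, h = 1080.0, 810.0
xmin, ymin, xmax, ymax = 492.0, 304.0, 607.0, 375.0

x_center = (xmin + xmax) / 2.0 / w   # ~0.5088
y_center = (ymin + ymax) / 2.0 / h   # ~0.4191
box_w = (xmax - xmin) / w            # ~0.1065
box_h = (ymax - ymin) / h            # ~0.0877
print 0, x_center, y_center, box_w, box_h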

2.3. Creating the config files

I used the script from here to generate train.txt and test.txt, the files that list the locations of the training and test images. So download process.py and execute it:

python2 process.py

If everything went well, you should have a train.txt like this:

data/obj/00165.JPEG
data/obj/00195.JPEG
data/obj/00177.JPEG
data/obj/00175.JPEG
data/obj/00151.JPEG
...
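
In case the link goes stale, this is roughly what such a script boils down to (my own sketch, not the original process.py; the 90/10 split ratio is an assumption):

import glob

paths = glob.glob("data/obj/*.JPEG")
split = int(len(paths) * 0.9)    # assumed 90% train / 10% test

with open("train.txt", "w") as f:
    f.write("\n".join(paths[:split]) + "\n")
with open("test.txt", "w") as f:
    f.write("\n".join(paths[split:]) + "\n")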

Then we have to create the obj.data and obj.names files. The first one tells Darknet the number of classes, where the train and test lists are, the file with the class names, and the folder where the weights will be saved:

classes= 1 
train  = train.txt 
valid  = test.txt 
names = obj.names 
backup = backup 

The second one should have one line per object class; in my case there is only one, so it’s simply:

MOUSE

The last file is the neural network configuration. I just modified the default yolo-voc.cfg file to suit my 920m graphics card. You can find my configuration here.
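
As a rough guide (my summary of the usual YOLOv2 edits, not an excerpt from my file): besides lowering batch and subdivisions for a small GPU, a single-class yolo-voc.cfg needs classes=1 in the [region] section and filters=30 in the last [convolutional] layer, because filters = num * (classes + 5):

[convolutional]
# num * (classes + 5) = 5 * (1 + 5) = 30
filters=30

[region]
classes=1
num=5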

3. Training

Darknet uses the same syntax on both Windows and Linux, so the commands are very similar. First of all, we have to go inside the darknet folder:

cd darknet
mkdir backup

Then, if you are using Windows:

cd build/darknet/x64

This folder should contain a file called “darknet.exe”, and this is where you have to create the “backup” folder. If you don’t create it, you’ll get an error while training because the software won’t know where to save the weights. Now you can start the training itself. Type:

darknet.exe detector train cfg/obj.data cfg/yolo-obj.cfg darknet19_448.conv.23 #if you are on windows 
./darknet detector train cfg/obj.data cfg/yolo-obj.cfg darknet19_448.conv.23 #if you are on linux
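
Note that the command expects darknet19_448.conv.23, the pre-trained convolutional weights of the Darknet19 network. If you don’t have the file yet, it can be downloaded from pjreddie’s site (URL valid at the time of writing):

wget https://pjreddie.com/media/files/darknet19_448.conv.23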

Training automatically saves the weights every 100 iterations up to 1000, and every 1000 iterations after that. After 20 hours my 920m had completed 2000 iterations, and the result is pretty good.

4. Evaluation

To see whether the detector is precise enough, you can use the following command:

./darknet detector test cfg/obj.data cfg/yolo-obj.cfg backup/yolo-obj_xxxx.weights data/obj/xxxxx.JPEG

You have to insert the name of the saved weights file and the image you want to test. This is my test:
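
The intro mentioned video input as well: if you compiled with OPENCV=1, Darknet can run the detector on a video with its demo mode (myvideo.mp4 is just a placeholder here):

./darknet detector demo cfg/obj.data cfg/yolo-obj.cfg backup/yolo-obj_xxxx.weights myvideo.mp4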

5. Notes

  • If you get output like this, you aren’t using enough images:
 Avg IOU: -nan, Pos Cat: -nan, All Cat: -nan, Pos Obj: -nan, Any Obj: 0.001763, count: 0
  • Using the GPU instead of the CPU sped up training a lot: with the CPU (Intel(R) Core(TM) i7-6500U @ 2.50GHz) each iteration took 2400 s, while with the GPU (an Nvidia 920m with 384 CUDA cores) it took only 78 s.