0. Introduction
In this tutorial I will explain how to use the Darknet framework to build an object detector that can find a chosen object in an image or a video. I trained mine to find my AmazonBasics mouse.
1. Installing Darknet
1.1. On Windows
I found this GitHub page where the author explains how to install Darknet on Windows using Visual Studio and how to use CUDA for GPU acceleration. In my opinion it's easy and clear. If you need more info, please write me an email.
1.2. On Linux
I used the "official" repo from pjreddie and cloned it with git:
git clone https://github.com/pjreddie/darknet.git
cd darknet
When you compile the source, you can choose between CPU and CUDA computing. If you want to use Nvidia's toolkit, edit the first line of the Makefile:
GPU=0   # if you don't want to use CUDA
GPU=1   # to enable CUDA instead
(I explain how to install the CUDA drivers and cuDNN on both Linux and Windows in this post.) If you have OpenCV installed and want to see the images processed by the detector, you'll also need to enable it in the Makefile:
OPENCV=1
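After editing the Makefile, compile Darknet from inside its folder:

make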
3. Preparing the dataset
In my experience, the most difficult (and sometimes most tedious) part of training a detector is creating the dataset and labelling all the images. I used 200 images, and I recommend using no fewer than 150-175.
3.1. BBox Label Tool
I found this Python 2.7 script through this tutorial. The tool can only label images with the ".JPEG" extension, so you'll have to convert all your images before processing them with it (see the sketch after the clone command below). Put all the images in the "Images" folder, in a sub-directory named with sequential integer numbers like "000", "001", and so on. First, clone the repo:
git clone https://github.com/puzzledqs/BBox-Label-Tool.git
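If your images aren't .JPEG yet, here is a minimal batch-conversion sketch using Pillow (the "raw" and "Images/001" folder names are just examples; adjust them to your own layout):

# convert_images.py - convert everything in src_dir to .JPEG files
import os
from PIL import Image

src_dir = "raw"          # wherever your original images live
dst_dir = "Images/001"   # BBox-Label-Tool expects numbered sub-folders
if not os.path.exists(dst_dir):
    os.makedirs(dst_dir)

for i, name in enumerate(sorted(os.listdir(src_dir))):
    img = Image.open(os.path.join(src_dir, name)).convert("RGB")
    # zero-padded sequential names keep the dataset tidy
    img.save(os.path.join(dst_dir, "%05d.JPEG" % i), "JPEG")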
Once everything is in place, run it from the terminal on Linux:
cd BBox-Label-Tool-master
python2 main.py
If you get errors, you may need to install some Python libraries first (python-pil via apt, or Pillow via pip):
sudo apt-get install python-pil
If you are using Windows, you can run it with IDLE, and the libraries can be downloaded from here. Enter the image folder in the text box at the top of the app, then click "LOAD". Then draw a box around the object you want to detect, and do this for all images. The result should be something like this
Once all images are labeled, we can start the training process.
3.2. Converting the annotation
BBox-Label-Tool generates annotation files where the first line is the number of boxes and each following line holds a box's corner coordinates (x_min y_min x_max y_max):
1
492 304 607 375
But YOLOv2 expects one line per object in the form "<class> <x_center> <y_center> <width> <height>", with all coordinates normalized to the image size, so we have to convert the annotations with another script. The script was written by Guanghan Ning, but we have to change it a bit. First, change line 15:
classes = ["stopsign"]
Change it to the object that the detector should find. In my case, a mouse:
classes = ["mouse"]
Then lines 34, 35, and 37:
mypath = "labels/stopsign_original/"
outpath = "labels/stopsign/"
cls = "stopsign"
Set mypath to the folder containing the .txt label files, outpath to the destination folder, and cls to the object class. In my case:
mypath = "BBox-Label-Tool-master/Labels/001/"
outpath = "label/"
cls = "mouse"
Then, you can run the converter:
python2 convert.py
At the end, the annotations should look like:
0 0.508796296296 0.419135802469 0.106481481481 0.0876543209877
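For reference, this is the math the converter applies; the example below reproduces the annotation above, which comes from a 1080x810 image:

def bbox_to_yolo(x1, y1, x2, y2, img_w, img_h):
    # corner coordinates -> normalized (x_center, y_center, width, height)
    x_center = (x1 + x2) / 2.0 / img_w
    y_center = (y1 + y2) / 2.0 / img_h
    box_w = (x2 - x1) / float(img_w)
    box_h = (y2 - y1) / float(img_h)
    return x_center, y_center, box_w, box_h

print(bbox_to_yolo(492, 304, 607, 375, 1080, 810))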
3.3. Creating the config files
I used the script from here to generate train.txt and test.txt, the files that list the paths of the training and validation images. So download process.py and execute it:
python2 process.py
If everything went well, you should have a train.txt like this:
data/obj/00165.JPEG
data/obj/00195.JPEG
data/obj/00177.JPEG
data/obj/00175.JPEG
data/obj/00151.JPEG
...
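If you prefer not to download process.py, here is a minimal sketch that produces an equivalent split (the data/obj/ path matches the layout above; the 10% test fraction is an assumption, adjust it to taste):

import glob

images = sorted(glob.glob("data/obj/*.JPEG"))
train = open("train.txt", "w")
test = open("test.txt", "w")
for i, path in enumerate(images):
    # send every 10th image to the test set, the rest to training
    (test if i % 10 == 0 else train).write(path + "\n")
train.close()
test.close()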
Then we have to create the obj.data and obj.names files. The first one tells Darknet how many classes there are, where the train/validation lists and the names file live, and where to save the weights:
classes= 1
train = train.txt
valid = test.txt
names = obj.names
backup = backup
The second one should have one line per object class; in my case there is only one, so it's simply:
MOUSE
The last file is the neural network configuration. I only modified the default yolo-voc.cfg file to suit my 920M graphics card. You can find my configuration here.
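For reference, these are the usual single-class edits to yolo-voc.cfg (a sketch of the standard YOLOv2 changes, not necessarily my exact file): the last convolutional layer's filters must equal num * (classes + 5), and the [region] layer needs the class count.

[net]
batch=64
subdivisions=8      # increase this if your GPU runs out of memory
...
[convolutional]     # the last convolutional layer, right before [region]
filters=30          # num * (classes + 5) = 5 * (1 + 5)
...
[region]
classes=1
num=5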
4. Training
Darknet uses the same syntax on both Windows and Linux, so the commands are very similar. First of all, go into the darknet folder and create the backup directory:
cd darknet
mkdir backup
Then, if you are using Windows:
cd build/darknet/x64
This folder should contain a file called "darknet.exe", and you have to create the "backup" folder here as well. If you don't create it, you'll get errors while training because the software won't know where to save the weights. The darknet19_448.conv.23 file below contains the pre-trained convolutional weights used to initialise the network. Now you can start the training itself. Type:
darknet.exe detector train cfg/obj.data cfg/yolo-obj.cfg darknet19_448.conv.23 #if you are on windows
./darknet detector train cfg/obj.data cfg/yolo-obj.cfg darknet19_448.conv.23 #if you are on linux
The training automatically saves the trained weights every 100 iterations up to 1000, and every 1000 iterations after that. After 20 hours my 920M had completed 2000 iterations, and the result is pretty good.
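If the training gets interrupted, you can resume from the last checkpoint by passing the saved weights instead of the pre-trained file (the filename depends on when it stopped; 1000 here is just an example):

./darknet detector train cfg/obj.data cfg/yolo-obj.cfg backup/yolo-obj_1000.weights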
5. Evaluation
To see if the detector is precise enough, you can use the following command:
./darknet detector test cfg/obj.data cfg/yolo-obj.cfg backup/yolo-obj_xxxx.weights data/obj/xxxxx.JPEG
Replace the placeholders with the name of the saved weights file and the image you want to test. This is my test:
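If the detector draws too many or too few boxes, you can tune the confidence threshold with Darknet's -thresh option (the weights and image names here are examples):

./darknet detector test cfg/obj.data cfg/yolo-obj.cfg backup/yolo-obj_2000.weights data/obj/00151.JPEG -thresh 0.25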
6. Notes
- If you get output like the following, you aren't using enough images:
Avg IOU: -nan, Pos Cat: -nan, All Cat: -nan, Pos Obj: -nan, Any Obj: 0.001763, count: 0
- Using the GPU instead of the CPU sped up training enormously: with the CPU (Intel(R) Core(TM) i7-6500U @ 2.50GHz) each iteration took 2400s, while with the GPU (Nvidia 920M with 382 CUDA cores) it took only 78s, roughly a 30x speedup.
November 21, 2017 at 2:20 pm
Hi, thank you for your tutorial. However, currently there is nothing in your GitHub repo. Can I know how you modified the cfg file? I have a 720M graphics card with me. Thanks!
November 21, 2017 at 3:22 pm
Hi Kyle. I had some trouble uploading my files due to their size. I hope I'll find some way to upload them to the cloud.
You can download my cfg file here. I think my configuration should be suitable for your 720M. If not, change these lines:
[net]
# Testing
batch=64
subdivisions=64
at the beginning of the file, decreasing the batch value and increasing the subdivisions value until it works. batch is the number of images used per training step; subdivisions controls how the batch is split into smaller chunks, so a higher value needs less VRAM.
November 21, 2017 at 3:35 pm
Thanks for the super quick reply Simone,
I am a bit confused about which directory I should store all the images, their respective .txt annotation files, train.txt, and test.txt in. I believe they should go somewhere inside the darknet dir, but I'm not sure where. Can you please help me?
November 21, 2017 at 3:51 pm
Thank you Kyle. This is my folder tree:
YOLOv2_main_folder
├── backup
├── cfg
│   ├── obj.data
│   ├── obj.names
│   └── yolo-obj.cfg
├── data
│   └── obj
│       ├── 00000.JPEG
│       ├── 00000.txt
│       └── ...
├── train.txt
├── test.txt
└── darknet
November 22, 2017 at 6:42 am
Hey Thanks again,
I am running the training on Ubuntu (in a VM); it runs really slowly, so I suspect it isn't linked to my GPU?
November 22, 2017 at 10:14 am
Hi Kyle,
I’m not totally sure, but I think that the VM is the bottleneck. Are you using the CUDA drivers?
March 26, 2018 at 11:58 am
You really make it seem easy with your presentation, but I find this matter to be something I think I would never understand. It seems too complex and very broad for me. I am looking forward to your next post; I'll try to get the hang of it!
March 28, 2018 at 8:37 pm
Thank you so much for this comment. If you need any further help, I'll be glad to answer as soon as possible.