[Docker] YOLOv4 training


Prerequisites


docker, nvidia-docker

Dataset folder


Make directories to mount into the Docker container


The directory location is up to you:
$ mkdir data
$ cd data
$ mkdir train test backup cfg

Train/Test data (image, annotation file)


Move your custom data pairs (image + annotation file) into the train and test directories:
/data/train
/data/test
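
Each image needs a YOLO-format annotation .txt with the same base name (0001.jpg pairs with 0001.txt), one object per line: the class ID followed by the box center x, center y, width, and height, all normalized to [0, 1] relative to the image size. A hypothetical two-object example:

0 0.512 0.430 0.120 0.350
2 0.250 0.660 0.300 0.220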

Make a names file (.names)


The file name is up to you (e.g. esens.names), but it must match the names path you put in the .data file.
$ vi <custom>.names

List your dataset's classes; the number on the left (the class ID, starting from 0) is what the annotation files use:
0 : person
1 : bicycle
2 : car
...
5 : truck
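
The file itself contains only the names, one per line; darknet derives the class IDs from the line order (classes 3 and 4 are omitted here, matching the list above):

person
bicycle
car
...
truck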

Make a data file (.data)


The file name is up to you (e.g. esens.data); just use the same name consistently in the training command later.
$ vi <custom>.data

classes = number of classes in your dataset
train = location of the train list file (train.txt)
valid = location of the test list file (test.txt)
names = location of the .names file
backup = folder where trained weights are stored
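
A minimal example matching the 6-class setup in this guide (esens is just the placeholder name used above):

classes = 6
train = data/train.txt
valid = data/test.txt
names = data/esens.names
backup = data/backup/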

Make train/test list files (.txt)


Use the script below to generate the list files (train.txt, test.txt), one image path per line.
Run it from /data, once for each split.
import os

# After splitting your dataset 8:2 into data/train and data/test,
# run this script once for each split (adjust the three variables
# below) to generate train.txt / test.txt. Darknet needs these
# list files at training time.

# dir_path : absolute path of the split folder to scan
# rel_path : path prefix written into the list file; it must be
#            valid inside the container (relative to /workspace/darknet)
dir_path = "/home/jay/DataSets/coco/data/test"   # or .../data/train
rel_path = "data/test"                           # or "data/train"
out_file = "test.txt"                            # or "train.txt"

data = []
for root, directories, files in os.walk(dir_path):
    for file in files:
        if file.endswith('.jpg'):
            file_path = os.path.join(rel_path, file)
            print(file_path)
            data.append(file_path)

# write one image path per line
with open(out_file, 'w') as data_list:
    for path in data:
        data_list.write(path + '\n')
Resulting test.txt:
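
With hypothetical image names, the generated file looks like:

data/test/000001.jpg
data/test/000002.jpg
data/test/000003.jpg
...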

cfg file (.cfg)


If you use a custom cfg file, put it in the cfg folder.
Copy yolov4.cfg (or yolov4-tiny.cfg, etc.) from the AlexeyAB/darknet GitHub repository and change these parameters:
  • batch = 64
  • subdivisions = 16 (raise to 32 or 64 if GPU memory runs out)
  • max_batches = classes * 2000 (here 6 * 2000 = 12000)
  • steps = 80% and 90% of max_batches (here 9600, 10800)
  • width, height : multiples of 32; larger values give better AP (here 416, 416)
  • classes = your class count in each [yolo] layer (here 6)
  • Important! In the [convolutional] layer right before each [yolo] layer, set filters = (classes + 5) * <number of anchors in that layer's mask>; with 3 masks and one class that is (1 + 5) * 3 = 18. This cfg uses 2 masks on the first two heads, (6 + 5) * 2 = 22, and 3 on the last, (6 + 5) * 3 = 33.
  • random=1 gives better AP across different resolutions; change it to 0 when it causes an out-of-memory error
  • My custom cfg file:
    [net]
    batch=64
    subdivisions=32
    # Training
    #width=512
    #height=512
    width=416
    height=416
    channels=3
    momentum=0.949
    decay=0.0005
    angle=0
    saturation = 1.5
    exposure = 1.5
    hue=.1
    
    learning_rate=0.0013
    burn_in=1000
    # class num * 2000
    max_batches = 12000
    policy=steps
    # max_batches *0.8 and *0.9
    steps=9600,10800
    scales=.1,.1
    
    #cutmix=1
    mosaic=1
    
    # 23:104x104 54:52x52 85:26x26 104:13x13 for 416
    
    [convolutional]
    batch_normalize=1
    filters=32
    size=3
    stride=1
    pad=1
    activation=mish
    
    # Downsample
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=3
    stride=2
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=1
    stride=1
    pad=1
    activation=mish
    
    [route]
    layers = -2
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=32
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=1
    stride=1
    pad=1
    activation=mish
    
    [route]
    layers = -1,-7
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=1
    stride=1
    pad=1
    activation=mish
    
    # Downsample
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=3
    stride=2
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=1
    stride=1
    pad=1
    activation=mish
    
    [route]
    layers = -2
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=64
    size=1
    stride=1
    pad=1
    activation=mish
    
    [route]
    layers = -1,-10
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    # Downsample
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=3
    stride=2
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [route]
    layers = -2
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=mish
    
    [route]
    layers = -1,-28
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    # Downsample
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=3
    stride=2
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [route]
    layers = -2
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=mish
    
    [route]
    layers = -1,-28
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=mish
    
    # Downsample
    
    [convolutional]
    batch_normalize=1
    filters=1024
    size=3
    stride=2
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=mish
    
    [route]
    layers = -2
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=mish
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=3
    stride=1
    pad=1
    activation=mish
    
    [shortcut]
    from=-3
    activation=linear
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=mish
    
    [route]
    layers = -1,-16
    
    [convolutional]
    batch_normalize=1
    filters=1024
    size=1
    stride=1
    pad=1
    activation=mish
    
    ##########################
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=1024
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=leaky
    
    ### SPP ###
    [maxpool]
    stride=1
    size=5
    
    [route]
    layers=-2
    
    [maxpool]
    stride=1
    size=9
    
    [route]
    layers=-4
    
    [maxpool]
    stride=1
    size=13
    
    [route]
    layers=-1,-3,-5,-6
    ### End SPP ###
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=1024
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [upsample]
    stride=2
    
    [route]
    layers = 85
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [route]
    layers = -1, -3
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=512
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=512
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [upsample]
    stride=2
    
    [route]
    layers = 54
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [route]
    layers = -1, -3
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=256
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=256
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=128
    size=1
    stride=1
    pad=1
    activation=leaky
    
    ##########################
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=256
    activation=leaky
    
    [convolutional]
    size=1
    stride=1
    pad=1
    filters=22
    activation=linear
    
    
    [yolo]
    #mask = 0,1,2
    #anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
    mask = 0,1
    anchors = 13, 25,  42, 76,  73,184, 154,254, 313,302
    classes=6
    num=9
    jitter=.3
    ignore_thresh = .7
    truth_thresh = 1
    scale_x_y = 1.2
    iou_thresh=0.213
    cls_normalizer=1.0
    iou_normalizer=0.07
    iou_loss=ciou
    nms_kind=greedynms
    beta_nms=0.6
    max_delta=5
    
    
    [route]
    layers = -4
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=2
    pad=1
    filters=256
    activation=leaky
    
    [route]
    layers = -1, -16
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=512
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=512
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=256
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=512
    activation=leaky
    
    [convolutional]
    size=1
    stride=1
    pad=1
    filters=22
    activation=linear
    
    
    [yolo]
    #mask = 3,4,5
    #anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
    mask = 2,3 
    anchors = 13, 25,  42, 76,  73,184, 154,254, 313,302
    classes=6
    num=9
    jitter=.3
    ignore_thresh = .7
    truth_thresh = 1
    scale_x_y = 1.1
    iou_thresh=0.213
    cls_normalizer=1.0
    iou_normalizer=0.07
    iou_loss=ciou
    nms_kind=greedynms
    beta_nms=0.6
    max_delta=5
    
    
    [route]
    layers = -4
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=2
    pad=1
    filters=512
    activation=leaky
    
    [route]
    layers = -1, -37
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=1024
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=1024
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    filters=512
    size=1
    stride=1
    pad=1
    activation=leaky
    
    [convolutional]
    batch_normalize=1
    size=3
    stride=1
    pad=1
    filters=1024
    activation=leaky
    
    [convolutional]
    size=1
    stride=1
    pad=1
    filters=33
    activation=linear
    
    
    [yolo]
    mask = 3,4,5 
    #anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
    anchors = 13, 25,  42, 76,  73,184, 154,254, 313,302
    classes=6
    num=9
    jitter=.3
    ignore_thresh = .7
    truth_thresh = 1
    random=0
    scale_x_y = 1.05
    iou_thresh=0.213
    cls_normalizer=1.0
    iou_normalizer=0.07
    iou_loss=ciou
    nms_kind=greedynms
    beta_nms=0.6
    max_delta=5

Docker image and train


Pull the Docker image:

$ sudo docker pull cjh2626002/yolo-ros:train

Run the image with your data directory mounted:

$ xhost +
$ sudo docker run -it \
    -e DISPLAY=unix$DISPLAY \
    --device /dev/video0:/dev/video0 \
    --privileged \
    -v /tmp/.X11-unix/:/tmp/.X11-unix/ \
    -v <your-data-directory>/data/:/workspace/darknet/data \
    --gpus all \
    cjh2626002/yolo-ros:train

Inside the container (darknet is pre-installed under /workspace/darknet):

# cd darknet/data

You can check that your local data directory is mounted into the container.
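
If the mount worked, listing the directory shows the folders and files created earlier (assuming you kept them all under data/ and used the placeholder names from this guide):

# ls /workspace/darknet/data
backup  cfg  esens.data  esens.names  test  test.txt  train  train.txt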

Training


Execute darknet from the root of the repository (/workspace/darknet).
You also need the pre-trained weights file:
yolov4 : https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137
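
For example, download it inside the container (assuming wget is available in the image):

# wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137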
Then start training:
# ./darknet detector train data/<custom>.data data/cfg/<custom>.cfg yolov4.conv.137 -map
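
darknet saves weights into the backup folder from your .data file: <custom>_last.weights every 100 iterations, plus numbered snapshots every 1000 iterations. To resume an interrupted run, pass the latest snapshot instead of the pre-trained weights:

# ./darknet detector train data/<custom>.data data/cfg/<custom>.cfg data/backup/<custom>_last.weights -map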

enjoy!