【厨丁解牛】RetinaNet from Scratch (6): Implementing RetinaNet Training and Testing
In parts (1) through (5) of this from-scratch series, we fully reproduced RetinaNet. The core idea of this reproduction is to split the object detector into three independent parts: the forward network, the loss computation, and the decode step. If you look closely, there is in fact some duplicated code between the loss part and the decode part, but I deliberately did not factor it out for reuse. This keeps the three parts highly cohesive and loosely coupled, so that the latest object detection methods can be applied to modify each part independently, letting you assemble an improved detector like building blocks. We can now move on to training and testing RetinaNet.
Training RetinaNet
In the RetinaNet paper (https://arxiv.org/pdf/1708.02002.pdf), the standard training recipe is an SGD optimizer with momentum=0.9 and weight_decay=0.0001, batch_size=16, and cross-GPU synchronized BN. Training runs for 90000 iterations in total, with an initial learning rate of 0.01 that is divided by 10 at iteration 60000 and again at iteration 80000. My training procedure differs slightly from the above, but not by much. Multiplying 90000 iterations by a batch size of 16 and dividing by 118287 (the number of images in COCO2017_train) gives about 12.17 epochs, so we train for 12 epochs. For simplicity, I use the Adam optimizer, which can decay the learning rate automatically. Experience suggests that Adam converges faster than SGD early in training, but its final result is slightly worse (the local optimum it converges to is not as good as SGD's); the gap is small, though, and for RetinaNet the mAP difference is generally no more than 0.5 points. In the Detectron and Detectron2 frameworks, the standard training recipe from the RetinaNet paper described above is called 1x_training; likewise, multiplying the total iterations and the learning-rate decay iteration indices by 2 and 3 gives 2x_training and 3x_training.
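For reference, here is a minimal PyTorch sketch of the paper's 1x_training schedule (the model below is a stand-in, not part of this series; in our code it would be resnet50_retinanet, and the MultiStepLR milestones are iteration counts, so the scheduler would be stepped once per iteration rather than per epoch):
import torch

# Stand-in model; in this series it would be resnet50_retinanet(...).
model = torch.nn.Linear(8, 8)

# 1x_training recipe from the RetinaNet paper: SGD with momentum=0.9,
# weight_decay=0.0001, initial lr=0.01, divided by 10 at iterations 60000/80000.
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01,
                            momentum=0.9,
                            weight_decay=0.0001)
# Milestones are iteration indices here, so call scheduler.step() once per
# training iteration, not once per epoch.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[60000, 80000],
                                                 gamma=0.1)

# Why 12 epochs: batch_size 16 x 90000 iterations / 118287 train images.
print(16 * 90000 / 118287)  # ~12.17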
Testing RetinaNet on the COCO dataset
To test RetinaNet's performance on COCO we can directly use the API provided by the COCOeval class in pycocotools.cocoeval. We feed the forward outputs of the RetinaNet class (including the anchors) into the RetinaDecoder class for decoding, then scale the decoded bboxes back to their size on the original image according to scale (the decoded bbox sizes correspond to the resized image). Next, for each image we filter out invalid detections (those whose class_index is -1), write the remaining detections to a json file in the required format, and call COCOeval to compute the metrics.
The COCOeval class provides 12 performance metrics:
self.maxDets = [1, 10, 100]  # decoder max_detection_num
stats[0] = _summarize(1)
stats[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[2])
stats[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[2])
stats[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[2])
stats[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[2])
stats[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[2])
stats[6] = _summarize(0, maxDets=self.params.maxDets[0])
stats[7] = _summarize(0, maxDets=self.params.maxDets[1])
stats[8] = _summarize(0, maxDets=self.params.maxDets[2])
stats[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[2])
stats[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[2])
stats[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[2])
The meaning of each result is as follows:
# Note: stats[0] is usually taken as the headline COCO metric; mAP on coco2017_val and coco2017_test typically differs by only about 0.2~0.5 points.
stats[0] : IoU=0.5:0.95,area=all,maxDets=100,mAP
stats[1] : IoU=0.5,area=all,maxDets=100,mAP
stats[2] : IoU=0.75,area=all,maxDets=100,mAP
stats[3] : IoU=0.5:0.95,area=small,maxDets=100,mAP
stats[4] : IoU=0.5:0.95,area=medium,maxDets=100,mAP
stats[5] : IoU=0.5:0.95,area=large,maxDets=100,mAP
stats[6] : IoU=0.5:0.95,area=all,maxDets=1,mAR
stats[7] : IoU=0.5:0.95,area=all,maxDets=10,mAR
stats[8] : IoU=0.5:0.95,area=all,maxDets=100,mAR
stats[9] : IoU=0.5:0.95,area=small,maxDets=100,mAR
stats[10]:IoU=0.5:0.95,area=medium,maxDets=100,mAR
stats[11]:IoU=0.5:0.95,area=large,maxDets=100,mAR
The code for testing on the COCO dataset is implemented as follows.
def validate(val_dataset, model, decoder):
    model = model.module
    # switch to evaluate mode
    model.eval()
    with torch.no_grad():
        all_eval_result = evaluate_coco(val_dataset, model, decoder)
    return all_eval_result

def evaluate_coco(val_dataset, model, decoder):
    results, image_ids = [], []
    for index in range(len(val_dataset)):
        data = val_dataset[index]
        scale = data['scale']
        cls_heads, reg_heads, batch_anchors = model(data['img'].cuda().permute(
            2, 0, 1).float().unsqueeze(dim=0))
        scores, classes, boxes = decoder(cls_heads, reg_heads, batch_anchors)
        scores, classes, boxes = scores.cpu(), classes.cpu(), boxes.cpu()
        boxes /= scale
        # make sure decode batch_size=1
        # scores shape:[1,max_detection_num]
        # classes shape:[1,max_detection_num]
        # bboxes shape[1,max_detection_num,4]
        assert scores.shape[0] == 1
        scores = scores.squeeze(0)
        classes = classes.squeeze(0)
        boxes = boxes.squeeze(0)
        # for coco_eval, we need [x_min,y_min,w,h] format pred boxes
        boxes[:, 2:] -= boxes[:, :2]
        for object_score, object_class, object_box in zip(
                scores, classes, boxes):
            object_score = float(object_score)
            object_class = int(object_class)
            object_box = object_box.tolist()
            if object_class == -1:
                break
            image_result = {
                'image_id':
                val_dataset.image_ids[index],
                'category_id':
                val_dataset.find_category_id_from_coco_label(object_class),
                'score':
                object_score,
                'bbox':
                object_box,
            }
            results.append(image_result)
        image_ids.append(val_dataset.image_ids[index])
        print('{}/{}'.format(index, len(val_dataset)), end='\r')
    if not len(results):
        print("No target detected in test set images")
        return
    json.dump(results,
              open('{}_bbox_results.json'.format(val_dataset.set_name), 'w'),
              indent=4)
    # load results in COCO evaluation tool
    coco_true = val_dataset.coco
    coco_pred = coco_true.loadRes('{}_bbox_results.json'.format(
        val_dataset.set_name))
    coco_eval = COCOeval(coco_true, coco_pred, 'bbox')
    coco_eval.params.imgIds = image_ids
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()
    all_eval_result = coco_eval.stats
    return all_eval_result
For training and testing on the COCO dataset we follow the dataset setup in the RetinaNet paper: the coco_2017_train dataset is used to train the model and the coco_2017_val dataset is used to test it. We report the mAP at IoU=0.5:0.95 with up to 100 detections kept per image, over targets of all sizes (the stats[0] value from the _summarizeDets function of the COCOeval class in pycocotools.cocoeval) as the model's performance figure.
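Concretely, once evaluate_coco above has run COCOeval, the single number we report is just stats[0]:
# coco_eval is the COCOeval object built inside evaluate_coco above; after
# summarize(), stats[0] is mAP at IoU=0.5:0.95, area=all, maxDets=100.
mAP = coco_eval.stats[0]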
Testing RetinaNet on the VOC dataset
For training and testing on the VOC dataset, we follow the way detectron2 trains and tests faster rcnn on VOC (https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md): the model is trained on the VOC2007 trainval + VOC2012 trainval datasets and tested on the VOC2007 test dataset. At test time, mAP is computed with the VOC2007 11 point metric.
The test code is the classic VOC evaluation code, only adapted for our inputs and outputs.
def compute_voc_ap(recall, precision, use_07_metric=True):
    if use_07_metric:
        # use voc 2007 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(recall >= t) == 0:
                p = 0
            else:
                # get max precision for recall >= t
                p = np.max(precision[recall >= t])
            # average 11 recall point precision
            ap = ap + p / 11.
    else:
        # use voc>=2010 metric, average all different recall precision as ap
        # recall add first value 0. and last value 1.
        mrecall = np.concatenate(([0.], recall, [1.]))
        # precision add first value 0. and last value 0.
        mprecision = np.concatenate(([0.], precision, [0.]))
        # compute the precision envelope
        for i in range(mprecision.size - 1, 0, -1):
            mprecision[i - 1] = np.maximum(mprecision[i - 1], mprecision[i])
        # to calculate area under PR curve, look for points where X axis (recall) changes value
        i = np.where(mrecall[1:] != mrecall[:-1])[0]
        # sum (\Delta recall) * prec
        ap = np.sum((mrecall[i + 1] - mrecall[i]) * mprecision[i + 1])
    return ap

def compute_ious(a, b):
    """
    :param a: [N,(x1,y1,x2,y2)]
    :param b: [M,(x1,y1,x2,y2)]
    :return: IoU [N,M]
    """
    a = np.expand_dims(a, axis=1)  # [N,1,4]
    b = np.expand_dims(b, axis=0)  # [1,M,4]
    overlap = np.maximum(0.0,
                         np.minimum(a[..., 2:], b[..., 2:]) -
                         np.maximum(a[..., :2], b[..., :2]))  # [N,M,(w,h)]
    overlap = np.prod(overlap, axis=-1)  # [N,M]
    area_a = np.prod(a[..., 2:] - a[..., :2], axis=-1)
    area_b = np.prod(b[..., 2:] - b[..., :2], axis=-1)
    iou = overlap / (area_a + area_b - overlap)
    return iou

def validate(val_dataset, model, decoder):
    model = model.module
    # switch to evaluate mode
    model.eval()
    with torch.no_grad():
        all_ap, mAP = evaluate_voc(val_dataset,
                                   model,
                                   decoder,
                                   num_classes=20,
                                   iou_thread=0.5)
    return all_ap, mAP

def evaluate_voc(val_dataset, model, decoder, num_classes=20, iou_thread=0.5):
    preds, gts = [], []
    for index in tqdm(range(len(val_dataset))):
        data = val_dataset[index]
        img, gt_annot, scale = data['img'], data['annot'], data['scale']
        gt_bboxes, gt_classes = gt_annot[:, 0:4], gt_annot[:, 4]
        gt_bboxes /= scale
        gts.append([gt_bboxes, gt_classes])
        cls_heads, reg_heads, batch_anchors = model(img.cuda().permute(
            2, 0, 1).float().unsqueeze(dim=0))
        preds_scores, preds_classes, preds_boxes = decoder(
            cls_heads, reg_heads, batch_anchors)
        preds_scores, preds_classes, preds_boxes = preds_scores.cpu(
        ), preds_classes.cpu(), preds_boxes.cpu()
        preds_boxes /= scale
        # make sure decode batch_size=1
        # preds_scores shape:[1,max_detection_num]
        # preds_classes shape:[1,max_detection_num]
        # preds_bboxes shape[1,max_detection_num,4]
        assert preds_scores.shape[0] == 1
        preds_scores = preds_scores.squeeze(0)
        preds_classes = preds_classes.squeeze(0)
        preds_boxes = preds_boxes.squeeze(0)
        preds_scores = preds_scores[preds_classes > -1]
        preds_boxes = preds_boxes[preds_classes > -1]
        preds_classes = preds_classes[preds_classes > -1]
        preds.append([preds_boxes, preds_classes, preds_scores])
    print("all val sample decode done.")
    all_ap = {}
    for class_index in tqdm(range(num_classes)):
        per_class_gt_boxes = [
            image[0][image[1] == class_index] for image in gts
        ]
        per_class_pred_boxes = [
            image[0][image[1] == class_index] for image in preds
        ]
        per_class_pred_scores = [
            image[2][image[1] == class_index] for image in preds
        ]
        fp = np.zeros((0, ))
        tp = np.zeros((0, ))
        scores = np.zeros((0, ))
        total_gts = 0
        # loop for each sample
        for per_image_gt_boxes, per_image_pred_boxes, per_image_pred_scores in zip(
                per_class_gt_boxes, per_class_pred_boxes,
                per_class_pred_scores):
            total_gts = total_gts + len(per_image_gt_boxes)
            # one gt can only be assigned to one predicted bbox
            assigned_gt = []
            # loop for each predicted bbox
            for index in range(len(per_image_pred_boxes)):
                scores = np.append(scores, per_image_pred_scores[index])
                if per_image_gt_boxes.shape[0] == 0:
                    # if no gts found for the predicted bbox, assign the bbox to fp
                    fp = np.append(fp, 1)
                    tp = np.append(tp, 0)
                    continue
                pred_box = np.expand_dims(per_image_pred_boxes[index], axis=0)
                iou = compute_ious(per_image_gt_boxes, pred_box)
                gt_for_box = np.argmax(iou, axis=0)
                max_overlap = iou[gt_for_box, 0]
                if max_overlap >= iou_thread and gt_for_box not in assigned_gt:
                    fp = np.append(fp, 0)
                    tp = np.append(tp, 1)
                    assigned_gt.append(gt_for_box)
                else:
                    fp = np.append(fp, 1)
                    tp = np.append(tp, 0)
        # sort by score
        indices = np.argsort(-scores)
        fp = fp[indices]
        tp = tp[indices]
        # compute cumulative false positives and true positives
        fp = np.cumsum(fp)
        tp = np.cumsum(tp)
        # compute recall and precision
        recall = tp / total_gts
        precision = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
        ap = compute_voc_ap(recall, precision)
        all_ap[class_index] = ap
    mAP = 0.
    for _, class_mAP in all_ap.items():
        mAP += float(class_mAP)
    mAP /= num_classes
    return all_ap, mAP
Note that in the compute_voc_ap function, use_07_metric=True means mAP is computed with the VOC2007 11 point metric, while use_07_metric=False means the newer mAP computation used from VOC2010 onward.
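As a quick sanity check on the two modes, here is a toy example with made-up recall/precision values (not from any real model):
import numpy as np

# Toy monotonically increasing recall curve with its precision values.
recall = np.array([0.1, 0.4, 0.7, 1.0])
precision = np.array([1.0, 0.8, 0.6, 0.5])

ap_07 = compute_voc_ap(recall, precision, use_07_metric=True)   # -> 0.7000
ap_10 = compute_voc_ap(recall, precision, use_07_metric=False)  # -> 0.6700
print(ap_07, ap_10)
The two values differ because the 11 point metric samples precision at fixed recall thresholds, while the newer metric integrates the full precision envelope over recall.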
Complete training and testing code
We train for 12 epochs, test the model's performance every 5 epochs, and test again when training finishes. The complete training and testing code is implemented as follows (this is the code for training and testing on the COCO dataset; training and testing on the VOC dataset only requires minor modifications).
config.py file:
import os
import sys
BASE_DIR = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)
from public.path import COCO2017_path
from public.detection.dataset.cocodataset import CocoDetection, Resize, RandomFlip, RandomCrop, RandomTranslate
import torchvision.transforms as transforms
import torchvision.datasets as datasets
class Config(object):
    log = './log'  # Path to save log
    checkpoint_path = './checkpoints'  # Path to store checkpoint model
    resume = './checkpoints/latest.pth'  # load checkpoint model
    evaluate = None  # evaluate model path
    train_dataset_path = os.path.join(COCO2017_path, 'images/train2017')
    val_dataset_path = os.path.join(COCO2017_path, 'images/val2017')
    dataset_annotations_path = os.path.join(COCO2017_path, 'annotations')
    network = "resnet50_retinanet"
    pretrained = False
    num_classes = 80
    seed = 0
    input_image_size = 600
    train_dataset = CocoDetection(image_root_dir=train_dataset_path,
                                  annotation_root_dir=dataset_annotations_path,
                                  set="train2017",
                                  transform=transforms.Compose([
                                      RandomFlip(flip_prob=0.5),
                                      RandomCrop(crop_prob=0.5),
                                      RandomTranslate(translate_prob=0.5),
                                      Resize(resize=input_image_size),
                                  ]))
    val_dataset = CocoDetection(image_root_dir=val_dataset_path,
                                annotation_root_dir=dataset_annotations_path,
                                set="val2017",
                                transform=transforms.Compose([
                                    Resize(resize=input_image_size),
                                ]))
    epochs = 12
    batch_size = 15
    lr = 1e-4
    num_workers = 4
    print_interval = 100
    apex = True
train.py file:
import sys
import os
import argparse
import random
import shutil
import time
import warnings
import json
BASE_DIR = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)
warnings.filterwarnings('ignore')
import numpy as np
from thop import profile
from thop import clever_format
from apex import amp
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader
from torchvision import transforms
from config import Config
from public.detection.dataset.cocodataset import COCODataPrefetcher, collater
from public.detection.models.loss import RetinaLoss
from public.detection.models.decode import RetinaDecoder
from public.detection.models.retinanet import resnet50_retinanet
from public.imagenet.utils import get_logger
from pycocotools.cocoeval import COCOeval
def parse_args():
    parser = argparse.ArgumentParser(
        description='PyTorch COCO Detection Training')
    parser.add_argument('--network',
                        type=str,
                        default=Config.network,
                        help='name of network')
    parser.add_argument('--lr',
                        type=float,
                        default=Config.lr,
                        help='learning rate')
    parser.add_argument('--epochs',
                        type=int,
                        default=Config.epochs,
                        help='num of training epochs')
    parser.add_argument('--batch_size',
                        type=int,
                        default=Config.batch_size,
                        help='batch size')
    parser.add_argument('--pretrained',
                        type=bool,
                        default=Config.pretrained,
                        help='load pretrained model params or not')
    parser.add_argument('--num_classes',
                        type=int,
                        default=Config.num_classes,
                        help='model classification num')
    parser.add_argument('--input_image_size',
                        type=int,
                        default=Config.input_image_size,
                        help='input image size')
    parser.add_argument('--num_workers',
                        type=int,
                        default=Config.num_workers,
                        help='number of worker to load data')
    parser.add_argument('--resume',
                        type=str,
                        default=Config.resume,
                        help='put the path to resuming file if needed')
    parser.add_argument('--checkpoints',
                        type=str,
                        default=Config.checkpoint_path,
                        help='path for saving trained models')
    parser.add_argument('--log',
                        type=str,
                        default=Config.log,
                        help='path to save log')
    parser.add_argument('--evaluate',
                        type=str,
                        default=Config.evaluate,
                        help='path for evaluate model')
    parser.add_argument('--seed', type=int, default=Config.seed, help='seed')
    # note: type was bool in the original, but bool() on any non-empty
    # command-line string is True, so int is the intended type here
    parser.add_argument('--print_interval',
                        type=int,
                        default=Config.print_interval,
                        help='print interval')
    parser.add_argument('--apex',
                        type=bool,
                        default=Config.apex,
                        help='use apex or not')
    return parser.parse_args()
def validate(val_dataset, model, decoder):
    model = model.module
    # switch to evaluate mode
    model.eval()
    with torch.no_grad():
        all_eval_result = evaluate_coco(val_dataset, model, decoder)
    return all_eval_result
def evaluate_coco(val_dataset, model, decoder):
    results, image_ids = [], []
    for index in range(len(val_dataset)):
        data = val_dataset[index]
        scale = data['scale']
        cls_heads, reg_heads, batch_anchors = model(data['img'].cuda().permute(
            2, 0, 1).float().unsqueeze(dim=0))
        scores, classes, boxes = decoder(cls_heads, reg_heads, batch_anchors)
        scores, classes, boxes = scores.cpu(), classes.cpu(), boxes.cpu()
        boxes /= scale
        # make sure decode batch_size=1
        # scores shape:[1,max_detection_num]
        # classes shape:[1,max_detection_num]
        # bboxes shape[1,max_detection_num,4]
        assert scores.shape[0] == 1
        scores = scores.squeeze(0)
        classes = classes.squeeze(0)
        boxes = boxes.squeeze(0)
        # for coco_eval, we need [x_min,y_min,w,h] format pred boxes
        boxes[:, 2:] -= boxes[:, :2]
        for object_score, object_class, object_box in zip(
                scores, classes, boxes):
            object_score = float(object_score)
            object_class = int(object_class)
            object_box = object_box.tolist()
            if object_class == -1:
                break
            image_result = {
                'image_id':
                val_dataset.image_ids[index],
                'category_id':
                val_dataset.find_category_id_from_coco_label(object_class),
                'score':
                object_score,
                'bbox':
                object_box,
            }
            results.append(image_result)
        image_ids.append(val_dataset.image_ids[index])
        print('{}/{}'.format(index, len(val_dataset)), end='\r')
    if not len(results):
        print("No target detected in test set images")
        return
    json.dump(results,
              open('{}_bbox_results.json'.format(val_dataset.set_name), 'w'),
              indent=4)
    # load results in COCO evaluation tool
    coco_true = val_dataset.coco
    coco_pred = coco_true.loadRes('{}_bbox_results.json'.format(
        val_dataset.set_name))
    coco_eval = COCOeval(coco_true, coco_pred, 'bbox')
    coco_eval.params.imgIds = image_ids
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()
    all_eval_result = coco_eval.stats
    return all_eval_result
def train(train_loader, model, criterion, optimizer, scheduler, epoch, logger,
          args):
    cls_losses, reg_losses, losses = [], [], []
    # switch to train mode
    model.train()
    iters = len(train_loader.dataset) // args.batch_size
    prefetcher = COCODataPrefetcher(train_loader)
    images, annotations = prefetcher.next()
    iter_index = 1
    while images is not None:
        images, annotations = images.cuda().float(), annotations.cuda()
        cls_heads, reg_heads, batch_anchors = model(images)
        cls_loss, reg_loss = criterion(cls_heads, reg_heads, batch_anchors,
                                       annotations)
        loss = cls_loss + reg_loss
        if cls_loss == 0.0 or reg_loss == 0.0:
            optimizer.zero_grad()
            # fetch the next batch before skipping, otherwise the loop would
            # spin forever on the same batch
            images, annotations = prefetcher.next()
            continue
        if args.apex:
            with amp.scale_loss(loss, optimizer) as scaled_loss:
                scaled_loss.backward()
        else:
            loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
        optimizer.step()
        optimizer.zero_grad()
        cls_losses.append(cls_loss.item())
        reg_losses.append(reg_loss.item())
        losses.append(loss.item())
        images, annotations = prefetcher.next()
        if iter_index % args.print_interval == 0:
            logger.info(
                f"train: epoch {epoch:0>3d}, iter [{iter_index:0>5d}, {iters:0>5d}], cls_loss: {cls_loss.item():.2f}, reg_loss: {reg_loss.item():.2f}, loss_total: {loss.item():.2f}"
            )
        iter_index += 1
    scheduler.step(np.mean(losses))
    return np.mean(cls_losses), np.mean(reg_losses), np.mean(losses)
def main(logger, args):
    if not torch.cuda.is_available():
        raise Exception("need gpu to train network!")
    torch.cuda.empty_cache()
    if args.seed is not None:
        random.seed(args.seed)
        torch.cuda.manual_seed_all(args.seed)
        cudnn.deterministic = True
    gpus = torch.cuda.device_count()
    logger.info(f'use {gpus} gpus')
    logger.info(f"args: {args}")
    cudnn.benchmark = True
    cudnn.enabled = True
    start_time = time.time()
    # dataset and dataloader
    logger.info('start loading data')
    train_loader = DataLoader(Config.train_dataset,
                              batch_size=args.batch_size,
                              shuffle=True,
                              num_workers=args.num_workers,
                              collate_fn=collater)
    logger.info('finish loading data')
    model = resnet50_retinanet(**{
        "pretrained": args.pretrained,
        "num_classes": args.num_classes,
    })
    for name, param in model.named_parameters():
        logger.info(f"{name},{param.requires_grad}")
    flops_input = torch.randn(1, 3, args.input_image_size,
                              args.input_image_size)
    flops, params = profile(model, inputs=(flops_input, ))
    flops, params = clever_format([flops, params], "%.3f")
    logger.info(f"model: '{args.network}', flops: {flops}, params: {params}")
    criterion = RetinaLoss(image_w=args.input_image_size,
                           image_h=args.input_image_size).cuda()
    decoder = RetinaDecoder(image_w=args.input_image_size,
                            image_h=args.input_image_size).cuda()
    model = model.cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                           patience=3,
                                                           verbose=True)
    if args.apex:
        amp.register_float_function(torch, 'sigmoid')
        amp.register_float_function(torch, 'softmax')
        model, optimizer = amp.initialize(model, optimizer, opt_level='O1')
    model = nn.DataParallel(model)
    if args.evaluate:
        if not os.path.isfile(args.evaluate):
            # the original message referenced args.resume here by mistake
            raise Exception(
                f"{args.evaluate} is not a file, please check it again")
        logger.info('start only evaluating')
        logger.info(f"start resuming model from {args.evaluate}")
        checkpoint = torch.load(args.evaluate,
                                map_location=torch.device('cpu'))
        model.load_state_dict(checkpoint['model_state_dict'])
        all_eval_result = validate(Config.val_dataset, model, decoder)
        if all_eval_result is not None:
            logger.info(
                f"val: epoch: {checkpoint['epoch']:0>5d}, IoU=0.5:0.95,area=all,maxDets=100,mAP:{all_eval_result[0]:.3f}, IoU=0.5,area=all,maxDets=100,mAP:{all_eval_result[1]:.3f}, IoU=0.75,area=all,maxDets=100,mAP:{all_eval_result[2]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAP:{all_eval_result[3]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAP:{all_eval_result[4]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAP:{all_eval_result[5]:.3f}, IoU=0.5:0.95,area=all,maxDets=1,mAR:{all_eval_result[6]:.3f}, IoU=0.5:0.95,area=all,maxDets=10,mAR:{all_eval_result[7]:.3f}, IoU=0.5:0.95,area=all,maxDets=100,mAR:{all_eval_result[8]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAR:{all_eval_result[9]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAR:{all_eval_result[10]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAR:{all_eval_result[11]:.3f}"
            )
        return
    best_map = 0.0
    start_epoch = 1
    # resume training
    if os.path.exists(args.resume):
        logger.info(f"start resuming model from {args.resume}")
        checkpoint = torch.load(args.resume, map_location=torch.device('cpu'))
        start_epoch += checkpoint['epoch']
        best_map = checkpoint['best_map']
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
        logger.info(
            f"finish resuming model from {args.resume}, epoch {checkpoint['epoch']}, best_map: {checkpoint['best_map']}, "
            f"loss: {checkpoint['loss']:3f}, cls_loss: {checkpoint['cls_loss']:2f}, reg_loss: {checkpoint['reg_loss']:2f}"
        )
    if not os.path.exists(args.checkpoints):
        os.makedirs(args.checkpoints)
    logger.info('start training')
    for epoch in range(start_epoch, args.epochs + 1):
        cls_losses, reg_losses, losses = train(train_loader, model, criterion,
                                               optimizer, scheduler, epoch,
                                               logger, args)
        logger.info(
            f"train: epoch {epoch:0>3d}, cls_loss: {cls_losses:.2f}, reg_loss: {reg_losses:.2f}, loss: {losses:.2f}"
        )
        if epoch % 5 == 0 or epoch == args.epochs:
            all_eval_result = validate(Config.val_dataset, model, decoder)
            logger.info(f"eval done.")
            if all_eval_result is not None:
                logger.info(
                    f"val: epoch: {epoch:0>5d}, IoU=0.5:0.95,area=all,maxDets=100,mAP:{all_eval_result[0]:.3f}, IoU=0.5,area=all,maxDets=100,mAP:{all_eval_result[1]:.3f}, IoU=0.75,area=all,maxDets=100,mAP:{all_eval_result[2]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAP:{all_eval_result[3]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAP:{all_eval_result[4]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAP:{all_eval_result[5]:.3f}, IoU=0.5:0.95,area=all,maxDets=1,mAR:{all_eval_result[6]:.3f}, IoU=0.5:0.95,area=all,maxDets=10,mAR:{all_eval_result[7]:.3f}, IoU=0.5:0.95,area=all,maxDets=100,mAR:{all_eval_result[8]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAR:{all_eval_result[9]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAR:{all_eval_result[10]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAR:{all_eval_result[11]:.3f}"
                )
                if all_eval_result[0] > best_map:
                    torch.save(model.module.state_dict(),
                               os.path.join(args.checkpoints, "best.pth"))
                    best_map = all_eval_result[0]
        torch.save(
            {
                'epoch': epoch,
                'best_map': best_map,
                'cls_loss': cls_losses,
                'reg_loss': reg_losses,
                'loss': losses,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'scheduler_state_dict': scheduler.state_dict(),
            }, os.path.join(args.checkpoints, 'latest.pth'))
    logger.info(f"finish training, best_map: {best_map:.3f}")
    training_time = (time.time() - start_time) / 3600
    logger.info(
        f"finish training, total training time: {training_time:.2f} hours")

if __name__ == '__main__':
    args = parse_args()
    logger = get_logger(__name__, args.log)
    main(logger, args)
The above implements training in nn.DataParallel mode. The hyperparameters in the config.py and train.py files correspond to the hyperparameter settings of the ResNet50-RetinaNet-apex-aug entry in the model evaluation below. Distributed training will be implemented in the next article. To start training, simply run python train.py.
Evaluating the reproduction
Based on the reproduction approach covered across these six articles, there are currently three issues relative to the scores of the RetinaNet model reported in the paper.
COCOでRetinaNetの性能をテストするにはpycocotoolsを直接使用することができます.cocoevalのCOevalクラスが提供するAPI.RetinaNetクラスの順方向計算結果(anchorを含む)をRetinaDecoderクラスに送り込んで復号し、復号後のbboxをscaleに従って元の画像上のサイズに拡大すればよい(復号後のbboxサイズサイズはresize後の画像より大きいため).次に、各画像で検出されたターゲットの無効なターゲット(class_indexは-1)をフィルタリングし、josnファイルに一定のフォーマットで書き込み、COCOevalを呼び出して計算すればよい.
COCOevalクラスは12個の性能指標を提供する:
self.maxDets = [1, 10, 100] # decoder max_detection_num
stats[0] = _summarize(1)
stats[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[2])
stats[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[2])
stats[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[2])
stats[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[2])
stats[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[2])
stats[6] = _summarize(0, maxDets=self.params.maxDets[0])
stats[7] = _summarize(0, maxDets=self.params.maxDets[1])
stats[8] = _summarize(0, maxDets=self.params.maxDets[2])
stats[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[2])
stats[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[2])
stats[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[2])
各結果の意味は次のとおりです.
# , COCO stats[0], coco2017_val coco2017_test , 0.2~0.5
stats[0] : IoU=0.5:0.95,area=all,maxDets=100,mAP
stats[1] : IoU=0.5,area=all,maxDets=100,mAP
stats[2] : IoU=0.75,area=all,maxDets=100,mAP
stats[3] : IoU=0.5:0.95,area=small,maxDets=100,mAP
stats[4] : IoU=0.5:0.95,area=medium,maxDets=100,mAP
stats[5] : IoU=0.5:0.95,area=large,maxDets=100,mAP
stats[6] : IoU=0.5:0.95,area=all,maxDets=1,mAR
stats[7] : IoU=0.5:0.95,area=all,maxDets=10,mAR
stats[8] : IoU=0.5:0.95,area=all,maxDets=100,mAR
stats[9] : IoU=0.5:0.95,area=small,maxDets=100,mAR
stats[10]:IoU=0.5:0.95,area=medium,maxDets=100,mAR
stats[11]:IoU=0.5:0.95,area=large,maxDets=100,mAR
COCOデータセットでテストされたコードは以下のように実現される.
def validate(val_dataset, model, decoder):
model = model.module
# switch to evaluate mode
model.eval()
with torch.no_grad():
all_eval_result = evaluate_coco(val_dataset, model, decoder)
return all_eval_result
def evaluate_coco(val_dataset, model, decoder):
results, image_ids = [], []
for index in range(len(val_dataset)):
data = val_dataset[index]
scale = data['scale']
cls_heads, reg_heads, batch_anchors = model(data['img'].cuda().permute(
2, 0, 1).float().unsqueeze(dim=0))
scores, classes, boxes = decoder(cls_heads, reg_heads, batch_anchors)
scores, classes, boxes = scores.cpu(), classes.cpu(), boxes.cpu()
boxes /= scale
# make sure decode batch_size=1
# scores shape:[1,max_detection_num]
# classes shape:[1,max_detection_num]
# bboxes shape[1,max_detection_num,4]
assert scores.shape[0] == 1
scores = scores.squeeze(0)
classes = classes.squeeze(0)
boxes = boxes.squeeze(0)
# for coco_eval,we need [x_min,y_min,w,h] format pred boxes
boxes[:, 2:] -= boxes[:, :2]
for object_score, object_class, object_box in zip(
scores, classes, boxes):
object_score = float(object_score)
object_class = int(object_class)
object_box = object_box.tolist()
if object_class == -1:
break
image_result = {
'image_id':
val_dataset.image_ids[index],
'category_id':
val_dataset.find_category_id_from_coco_label(object_class),
'score':
object_score,
'bbox':
object_box,
}
results.append(image_result)
image_ids.append(val_dataset.image_ids[index])
print('{}/{}'.format(index, len(val_dataset)), end='\r')
if not len(results):
print("No target detected in test set images")
return
json.dump(results,
open('{}_bbox_results.json'.format(val_dataset.set_name), 'w'),
indent=4)
# load results in COCO evaluation tool
coco_true = val_dataset.coco
coco_pred = coco_true.loadRes('{}_bbox_results.json'.format(
val_dataset.set_name))
coco_eval = COCOeval(coco_true, coco_pred, 'bbox')
coco_eval.params.imgIds = image_ids
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
all_eval_result = coco_eval.stats
return all_eval_result
COCOデータセットでのトレーニングとテストでは、RetinaNet論文のデータセット設定に従い、coco_を使用します.2017_trainデータセットトレーニングモデル、coco_を使用2017_valデータセットテストモデル.IoU=0.5:0.95の場合、最大100個のdetectターゲットを保持し、すべてのサイズのターゲットのmAP(pycocools.coevalのCOCOevalクラスの_summarizeDets関数のstats[0]値)をモデルのパフォーマンス表現として保持します.
VOCデータセットでRetinaNetをテストする
VOCデータセットでトレーニングとテストを行う場合、detectron 2でfaster rcnnを使用してVOCデータセットでテストをトレーニングする方法(https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md)を使用して、VOC 2007 trainval+VOC 2012 trainvalデータセットトレーニングモデルを使用し、VOC 2007 testデータセットテストモデルを使用します.試験時にVOC 2007の11 point metric方式を用いてmAPを計算する.
テストコードは古典的なVOCテストコードを使用して、入力と出力を適切にしただけです.def compute_voc_ap(recall, precision, use_07_metric=True):
if use_07_metric:
# use voc 2007 11 point metric
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(recall >= t) == 0:
p = 0
else:
# get max precision for recall >= t
p = np.max(precision[recall >= t])
# average 11 recall point precision
ap = ap + p / 11.
else:
# use voc>=2010 metric,average all different recall precision as ap
# recall add first value 0. and last value 1.
mrecall = np.concatenate(([0.], recall, [1.]))
# precision add first value 0. and last value 0.
mprecision = np.concatenate(([0.], precision, [0.]))
# compute the precision envelope
for i in range(mprecision.size - 1, 0, -1):
mprecision[i - 1] = np.maximum(mprecision[i - 1], mprecision[i])
# to calculate area under PR curve, look for points where X axis (recall) changes value
i = np.where(mrecall[1:] != mrecall[:-1])[0]
# sum (\Delta recall) * prec
ap = np.sum((mrecall[i + 1] - mrecall[i]) * mprecision[i + 1])
return ap
def compute_ious(a, b):
"""
:param a: [N,(x1,y1,x2,y2)]
:param b: [M,(x1,y1,x2,y2)]
:return: IoU [N,M]
"""
a = np.expand_dims(a, axis=1) # [N,1,4]
b = np.expand_dims(b, axis=0) # [1,M,4]
overlap = np.maximum(0.0,
np.minimum(a[..., 2:], b[..., 2:]) -
np.maximum(a[..., :2], b[..., :2])) # [N,M,(w,h)]
overlap = np.prod(overlap, axis=-1) # [N,M]
area_a = np.prod(a[..., 2:] - a[..., :2], axis=-1)
area_b = np.prod(b[..., 2:] - b[..., :2], axis=-1)
iou = overlap / (area_a + area_b - overlap)
return iou
def validate(val_dataset, model, decoder):
model = model.module
# switch to evaluate mode
model.eval()
with torch.no_grad():
all_ap, mAP = evaluate_voc(val_dataset,
model,
decoder,
num_classes=20,
iou_thread=0.5)
return all_ap, mAP
def evaluate_voc(val_dataset, model, decoder, num_classes=20, iou_thread=0.5):
preds, gts = [], []
for index in tqdm(range(len(val_dataset))):
data = val_dataset[index]
img, gt_annot, scale = data['img'], data['annot'], data['scale']
gt_bboxes, gt_classes = gt_annot[:, 0:4], gt_annot[:, 4]
gt_bboxes /= scale
gts.append([gt_bboxes, gt_classes])
cls_heads, reg_heads, batch_anchors = model(img.cuda().permute(
2, 0, 1).float().unsqueeze(dim=0))
preds_scores, preds_classes, preds_boxes = decoder(
cls_heads, reg_heads, batch_anchors)
preds_scores, preds_classes, preds_boxes = preds_scores.cpu(
), preds_classes.cpu(), preds_boxes.cpu()
preds_boxes /= scale
# make sure decode batch_size=1
# preds_scores shape:[1,max_detection_num]
# preds_classes shape:[1,max_detection_num]
# preds_bboxes shape[1,max_detection_num,4]
assert preds_scores.shape[0] == 1
preds_scores = preds_scores.squeeze(0)
preds_classes = preds_classes.squeeze(0)
preds_boxes = preds_boxes.squeeze(0)
preds_scores = preds_scores[preds_classes > -1]
preds_boxes = preds_boxes[preds_classes > -1]
preds_classes = preds_classes[preds_classes > -1]
preds.append([preds_boxes, preds_classes, preds_scores])
print("all val sample decode done.")
all_ap = {}
for class_index in tqdm(range(num_classes)):
per_class_gt_boxes = [
image[0][image[1] == class_index] for image in gts
]
per_class_pred_boxes = [
image[0][image[1] == class_index] for image in preds
]
per_class_pred_scores = [
image[2][image[1] == class_index] for image in preds
]
fp = np.zeros((0, ))
tp = np.zeros((0, ))
scores = np.zeros((0, ))
total_gts = 0
# loop for each sample
for per_image_gt_boxes, per_image_pred_boxes, per_image_pred_scores in zip(
per_class_gt_boxes, per_class_pred_boxes,
per_class_pred_scores):
total_gts = total_gts + len(per_image_gt_boxes)
# one gt can only be assigned to one predicted bbox
assigned_gt = []
# loop for each predicted bbox
for index in range(len(per_image_pred_boxes)):
scores = np.append(scores, per_image_pred_scores[index])
if per_image_gt_boxes.shape[0] == 0:
# if no gts found for the predicted bbox, assign the bbox to fp
fp = np.append(fp, 1)
tp = np.append(tp, 0)
continue
pred_box = np.expand_dims(per_image_pred_boxes[index], axis=0)
iou = compute_ious(per_image_gt_boxes, pred_box)
gt_for_box = np.argmax(iou, axis=0)
max_overlap = iou[gt_for_box, 0]
if max_overlap >= iou_thread and gt_for_box not in assigned_gt:
fp = np.append(fp, 0)
tp = np.append(tp, 1)
assigned_gt.append(gt_for_box)
else:
fp = np.append(fp, 1)
tp = np.append(tp, 0)
# sort by score
indices = np.argsort(-scores)
fp = fp[indices]
tp = tp[indices]
# compute cumulative false positives and true positives
fp = np.cumsum(fp)
tp = np.cumsum(tp)
# compute recall and precision
recall = tp / total_gts
precision = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = compute_voc_ap(recall, precision)
all_ap[class_index] = ap
mAP = 0.
for _, class_mAP in all_ap.items():
mAP += float(class_mAP)
mAP /= num_classes
return all_ap, mAP
compute_に注意してくださいvoc_ap関数のuse_07_metric=TrueはVOC 2007を用いた11 point metric方式でmAPを計算することを示し、use_07_metric=FalseはVOC 2010を使用した後の新しいmAP計算方式を表す.
完全なトレーニングとテストコード
我々は訓練中に12個のepochを訓練し,5個のepochごとにモデル性能表現を試験し,訓練完了時にもモデル性能表現を試験した.完全なトレーニングとテストコードは以下のように実現される(ここではCOCOデータセット上のトレーニングとテストコードであり、VOCデータセット上でトレーニングとテストを行うには少し修正すればよい).
config.pyファイル:import os
import sys
BASE_DIR = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)
from public.path import COCO2017_path
from public.detection.dataset.cocodataset import CocoDetection, Resize, RandomFlip, RandomCrop, RandomTranslate
import torchvision.transforms as transforms
import torchvision.datasets as datasets
class Config(object):
log = './log' # Path to save log
checkpoint_path = './checkpoints' # Path to store checkpoint model
resume = './checkpoints/latest.pth' # load checkpoint model
evaluate = None # evaluate model path
train_dataset_path = os.path.join(COCO2017_path, 'images/train2017')
val_dataset_path = os.path.join(COCO2017_path, 'images/val2017')
dataset_annotations_path = os.path.join(COCO2017_path, 'annotations')
network = "resnet50_retinanet"
pretrained = False
num_classes = 80
seed = 0
input_image_size = 600
train_dataset = CocoDetection(image_root_dir=train_dataset_path,
annotation_root_dir=dataset_annotations_path,
set="train2017",
transform=transforms.Compose([
RandomFlip(flip_prob=0.5),
RandomCrop(crop_prob=0.5),
RandomTranslate(translate_prob=0.5),
Resize(resize=input_image_size),
]))
val_dataset = CocoDetection(image_root_dir=val_dataset_path,
annotation_root_dir=dataset_annotations_path,
set="val2017",
transform=transforms.Compose([
Resize(resize=input_image_size),
]))
epochs = 12
batch_size = 15
lr = 1e-4
num_workers = 4
print_interval = 100
apex = True
train.pyファイル:import sys
import os
import argparse
import random
import shutil
import time
import warnings
import json
BASE_DIR = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)
warnings.filterwarnings('ignore')
import numpy as np
from thop import profile
from thop import clever_format
from apex import amp
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader
from torchvision import transforms
from config import Config
from public.detection.dataset.cocodataset import COCODataPrefetcher, collater
from public.detection.models.loss import RetinaLoss
from public.detection.models.decode import RetinaDecoder
from public.detection.models.retinanet import resnet50_retinanet
from public.imagenet.utils import get_logger
from pycocotools.cocoeval import COCOeval
def parse_args():
parser = argparse.ArgumentParser(
description='PyTorch COCO Detection Training')
parser.add_argument('--network',
type=str,
default=Config.network,
help='name of network')
parser.add_argument('--lr',
type=float,
default=Config.lr,
help='learning rate')
parser.add_argument('--epochs',
type=int,
default=Config.epochs,
help='num of training epochs')
parser.add_argument('--batch_size',
type=int,
default=Config.batch_size,
help='batch size')
parser.add_argument('--pretrained',
type=bool,
default=Config.pretrained,
help='load pretrained model params or not')
parser.add_argument('--num_classes',
type=int,
default=Config.num_classes,
help='model classification num')
parser.add_argument('--input_image_size',
type=int,
default=Config.input_image_size,
help='input image size')
parser.add_argument('--num_workers',
type=int,
default=Config.num_workers,
help='number of worker to load data')
parser.add_argument('--resume',
type=str,
default=Config.resume,
help='put the path to resuming file if needed')
parser.add_argument('--checkpoints',
type=str,
default=Config.checkpoint_path,
help='path for saving trained models')
parser.add_argument('--log',
type=str,
default=Config.log,
help='path to save log')
parser.add_argument('--evaluate',
type=str,
default=Config.evaluate,
help='path for evaluate model')
parser.add_argument('--seed', type=int, default=Config.seed, help='seed')
parser.add_argument('--print_interval',
type=bool,
default=Config.print_interval,
help='print interval')
parser.add_argument('--apex',
type=bool,
default=Config.apex,
help='use apex or not')
return parser.parse_args()
def validate(val_dataset, model, decoder):
model = model.module
# switch to evaluate mode
model.eval()
with torch.no_grad():
all_eval_result = evaluate_coco(val_dataset, model, decoder)
return all_eval_result
def evaluate_coco(val_dataset, model, decoder):
results, image_ids = [], []
for index in range(len(val_dataset)):
data = val_dataset[index]
scale = data['scale']
cls_heads, reg_heads, batch_anchors = model(data['img'].cuda().permute(
2, 0, 1).float().unsqueeze(dim=0))
scores, classes, boxes = decoder(cls_heads, reg_heads, batch_anchors)
scores, classes, boxes = scores.cpu(), classes.cpu(), boxes.cpu()
boxes /= scale
# make sure decode batch_size=1
# scores shape:[1,max_detection_num]
# classes shape:[1,max_detection_num]
# bboxes shape[1,max_detection_num,4]
assert scores.shape[0] == 1
scores = scores.squeeze(0)
classes = classes.squeeze(0)
boxes = boxes.squeeze(0)
# for coco_eval,we need [x_min,y_min,w,h] format pred boxes
boxes[:, 2:] -= boxes[:, :2]
for object_score, object_class, object_box in zip(
scores, classes, boxes):
object_score = float(object_score)
object_class = int(object_class)
object_box = object_box.tolist()
if object_class == -1:
break
image_result = {
'image_id':
val_dataset.image_ids[index],
'category_id':
val_dataset.find_category_id_from_coco_label(object_class),
'score':
object_score,
'bbox':
object_box,
}
results.append(image_result)
image_ids.append(val_dataset.image_ids[index])
print('{}/{}'.format(index, len(val_dataset)), end='\r')
if not len(results):
print("No target detected in test set images")
return
json.dump(results,
open('{}_bbox_results.json'.format(val_dataset.set_name), 'w'),
indent=4)
# load results in COCO evaluation tool
coco_true = val_dataset.coco
coco_pred = coco_true.loadRes('{}_bbox_results.json'.format(
val_dataset.set_name))
coco_eval = COCOeval(coco_true, coco_pred, 'bbox')
coco_eval.params.imgIds = image_ids
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
all_eval_result = coco_eval.stats
return all_eval_result
def train(train_loader, model, criterion, optimizer, scheduler, epoch, logger,
args):
cls_losses, reg_losses, losses = [], [], []
# switch to train mode
model.train()
iters = len(train_loader.dataset) // args.batch_size
prefetcher = COCODataPrefetcher(train_loader)
images, annotations = prefetcher.next()
iter_index = 1
while images is not None:
images, annotations = images.cuda().float(), annotations.cuda()
cls_heads, reg_heads, batch_anchors = model(images)
cls_loss, reg_loss = criterion(cls_heads, reg_heads, batch_anchors,
annotations)
loss = cls_loss + reg_loss
if cls_loss == 0.0 or reg_loss == 0.0:
optimizer.zero_grad()
continue
if args.apex:
with amp.scale_loss(loss, optimizer) as scaled_loss:
scaled_loss.backward()
else:
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
optimizer.step()
optimizer.zero_grad()
cls_losses.append(cls_loss.item())
reg_losses.append(reg_loss.item())
losses.append(loss.item())
images, annotations = prefetcher.next()
if iter_index % args.print_interval == 0:
logger.info(
f"train: epoch {epoch:0>3d}, iter [{iter_index:0>5d}, {iters:0>5d}], cls_loss: {cls_loss.item():.2f}, reg_loss: {reg_loss.item():.2f}, loss_total: {loss.item():.2f}"
)
iter_index += 1
scheduler.step(np.mean(losses))
return np.mean(cls_losses), np.mean(reg_losses), np.mean(losses)
def main(logger, args):
if not torch.cuda.is_available():
raise Exception("need gpu to train network!")
torch.cuda.empty_cache()
if args.seed is not None:
random.seed(args.seed)
torch.cuda.manual_seed_all(args.seed)
cudnn.deterministic = True
gpus = torch.cuda.device_count()
logger.info(f'use {gpus} gpus')
logger.info(f"args: {args}")
cudnn.benchmark = True
cudnn.enabled = True
start_time = time.time()
# dataset and dataloader
logger.info('start loading data')
train_loader = DataLoader(Config.train_dataset,
batch_size=args.batch_size,
shuffle=True,
num_workers=args.num_workers,
collate_fn=collater)
logger.info('finish loading data')
model = resnet50_retinanet(**{
"pretrained": args.pretrained,
"num_classes": args.num_classes,
})
for name, param in model.named_parameters():
logger.info(f"{name},{param.requires_grad}")
flops_input = torch.randn(1, 3, args.input_image_size,
args.input_image_size)
flops, params = profile(model, inputs=(flops_input, ))
flops, params = clever_format([flops, params], "%.3f")
logger.info(f"model: '{args.network}', flops: {flops}, params: {params}")
criterion = RetinaLoss(image_w=args.input_image_size,
image_h=args.input_image_size).cuda()
decoder = RetinaDecoder(image_w=args.input_image_size,
image_h=args.input_image_size).cuda()
model = model.cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
patience=3,
verbose=True)
if args.apex:
amp.register_float_function(torch, 'sigmoid')
amp.register_float_function(torch, 'softmax')
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')
model = nn.DataParallel(model)
if args.evaluate:
if not os.path.isfile(args.evaluate):
raise Exception(
f"{args.resume} is not a file, please check it again")
logger.info('start only evaluating')
logger.info(f"start resuming model from {args.evaluate}")
checkpoint = torch.load(args.evaluate,
map_location=torch.device('cpu'))
model.load_state_dict(checkpoint['model_state_dict'])
all_eval_result = validate(Config.val_dataset, model, decoder)
if all_eval_result is not None:
logger.info(
f"val: epoch: {checkpoint['epoch']:0>5d}, IoU=0.5:0.95,area=all,maxDets=100,mAP:{all_eval_result[0]:.3f}, IoU=0.5,area=all,maxDets=100,mAP:{all_eval_result[1]:.3f}, IoU=0.75,area=all,maxDets=100,mAP:{all_eval_result[2]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAP:{all_eval_result[3]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAP:{all_eval_result[4]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAP:{all_eval_result[5]:.3f}, IoU=0.5:0.95,area=all,maxDets=1,mAR:{all_eval_result[6]:.3f}, IoU=0.5:0.95,area=all,maxDets=10,mAR:{all_eval_result[7]:.3f}, IoU=0.5:0.95,area=all,maxDets=100,mAR:{all_eval_result[8]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAR:{all_eval_result[9]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAR:{all_eval_result[10]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAR:{all_eval_result[11]:.3f}"
)
return
best_map = 0.0
start_epoch = 1
# resume training
if os.path.exists(args.resume):
logger.info(f"start resuming model from {args.resume}")
checkpoint = torch.load(args.resume, map_location=torch.device('cpu'))
start_epoch += checkpoint['epoch']
best_map = checkpoint['best_map']
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
logger.info(
f"finish resuming model from {args.resume}, epoch {checkpoint['epoch']}, best_map: {checkpoint['best_map']}, "
f"loss: {checkpoint['loss']:3f}, cls_loss: {checkpoint['cls_loss']:2f}, reg_loss: {checkpoint['reg_loss']:2f}"
)
if not os.path.exists(args.checkpoints):
os.makedirs(args.checkpoints)
logger.info('start training')
for epoch in range(start_epoch, args.epochs + 1):
cls_losses, reg_losses, losses = train(train_loader, model, criterion,
optimizer, scheduler, epoch,
logger, args)
logger.info(
f"train: epoch {epoch:0>3d}, cls_loss: {cls_losses:.2f}, reg_loss: {reg_losses:.2f}, loss: {losses:.2f}"
)
if epoch % 5 == 0 or epoch == args.epochs:
all_eval_result = validate(Config.val_dataset, model, decoder)
logger.info(f"eval done.")
if all_eval_result is not None:
logger.info(
f"val: epoch: {epoch:0>5d}, IoU=0.5:0.95,area=all,maxDets=100,mAP:{all_eval_result[0]:.3f}, IoU=0.5,area=all,maxDets=100,mAP:{all_eval_result[1]:.3f}, IoU=0.75,area=all,maxDets=100,mAP:{all_eval_result[2]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAP:{all_eval_result[3]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAP:{all_eval_result[4]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAP:{all_eval_result[5]:.3f}, IoU=0.5:0.95,area=all,maxDets=1,mAR:{all_eval_result[6]:.3f}, IoU=0.5:0.95,area=all,maxDets=10,mAR:{all_eval_result[7]:.3f}, IoU=0.5:0.95,area=all,maxDets=100,mAR:{all_eval_result[8]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAR:{all_eval_result[9]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAR:{all_eval_result[10]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAR:{all_eval_result[11]:.3f}"
)
if all_eval_result[0] > best_map:
torch.save(model.module.state_dict(),
os.path.join(args.checkpoints, "best.pth"))
best_map = all_eval_result[0]
torch.save(
{
'epoch': epoch,
'best_map': best_map,
'cls_loss': cls_losses,
'reg_loss': reg_losses,
'loss': losses,
'model_state_dict': model.state_dict(),
'optimizer_state_dict': optimizer.state_dict(),
'scheduler_state_dict': scheduler.state_dict(),
}, os.path.join(args.checkpoints, 'latest.pth'))
logger.info(f"finish training, best_map: {best_map:.3f}")
training_time = (time.time() - start_time) / 3600
logger.info(
f"finish training, total training time: {training_time:.2f} hours")
if __name__ == '__main__':
args = parse_args()
logger = get_logger(__name__, args.log)
main(logger, args)
上で実現したのはnnです.DataParallelモードでのトレーニング、config.pyファイルとtrain.pyファイルの各スーパーパラメータは、以下のモデル評価におけるResNet 50-RetinNet-apex-aug項目のスーパーパラメータ設定に対応します.分布式訓練方法は次の文章で実現します.訓練を行うにはpython trainだけです.pyでいいです.
モデル再現状況評価
6つの文章の各方面のRetinaNetに対する再現方法によると、現在、論文のRetinaNetモデルの点数と3つの問題がある.
def compute_voc_ap(recall, precision, use_07_metric=True):
if use_07_metric:
# use voc 2007 11 point metric
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(recall >= t) == 0:
p = 0
else:
# get max precision for recall >= t
p = np.max(precision[recall >= t])
# average 11 recall point precision
ap = ap + p / 11.
else:
# use voc>=2010 metric,average all different recall precision as ap
# recall add first value 0. and last value 1.
mrecall = np.concatenate(([0.], recall, [1.]))
# precision add first value 0. and last value 0.
mprecision = np.concatenate(([0.], precision, [0.]))
# compute the precision envelope
for i in range(mprecision.size - 1, 0, -1):
mprecision[i - 1] = np.maximum(mprecision[i - 1], mprecision[i])
# to calculate area under PR curve, look for points where X axis (recall) changes value
i = np.where(mrecall[1:] != mrecall[:-1])[0]
# sum (\Delta recall) * prec
ap = np.sum((mrecall[i + 1] - mrecall[i]) * mprecision[i + 1])
return ap
def compute_ious(a, b):
"""
:param a: [N,(x1,y1,x2,y2)]
:param b: [M,(x1,y1,x2,y2)]
:return: IoU [N,M]
"""
a = np.expand_dims(a, axis=1) # [N,1,4]
b = np.expand_dims(b, axis=0) # [1,M,4]
overlap = np.maximum(0.0,
np.minimum(a[..., 2:], b[..., 2:]) -
np.maximum(a[..., :2], b[..., :2])) # [N,M,(w,h)]
overlap = np.prod(overlap, axis=-1) # [N,M]
area_a = np.prod(a[..., 2:] - a[..., :2], axis=-1)
area_b = np.prod(b[..., 2:] - b[..., :2], axis=-1)
iou = overlap / (area_a + area_b - overlap)
return iou
def validate(val_dataset, model, decoder):
model = model.module
# switch to evaluate mode
model.eval()
with torch.no_grad():
all_ap, mAP = evaluate_voc(val_dataset,
model,
decoder,
num_classes=20,
iou_thread=0.5)
return all_ap, mAP
def evaluate_voc(val_dataset, model, decoder, num_classes=20, iou_thread=0.5):
preds, gts = [], []
for index in tqdm(range(len(val_dataset))):
data = val_dataset[index]
img, gt_annot, scale = data['img'], data['annot'], data['scale']
gt_bboxes, gt_classes = gt_annot[:, 0:4], gt_annot[:, 4]
gt_bboxes /= scale
gts.append([gt_bboxes, gt_classes])
cls_heads, reg_heads, batch_anchors = model(img.cuda().permute(
2, 0, 1).float().unsqueeze(dim=0))
preds_scores, preds_classes, preds_boxes = decoder(
cls_heads, reg_heads, batch_anchors)
preds_scores, preds_classes, preds_boxes = preds_scores.cpu(
), preds_classes.cpu(), preds_boxes.cpu()
preds_boxes /= scale
# make sure decode batch_size=1
# preds_scores shape:[1,max_detection_num]
# preds_classes shape:[1,max_detection_num]
# preds_bboxes shape[1,max_detection_num,4]
assert preds_scores.shape[0] == 1
preds_scores = preds_scores.squeeze(0)
preds_classes = preds_classes.squeeze(0)
preds_boxes = preds_boxes.squeeze(0)
preds_scores = preds_scores[preds_classes > -1]
preds_boxes = preds_boxes[preds_classes > -1]
preds_classes = preds_classes[preds_classes > -1]
preds.append([preds_boxes, preds_classes, preds_scores])
print("all val sample decode done.")
all_ap = {}
for class_index in tqdm(range(num_classes)):
per_class_gt_boxes = [
image[0][image[1] == class_index] for image in gts
]
per_class_pred_boxes = [
image[0][image[1] == class_index] for image in preds
]
per_class_pred_scores = [
image[2][image[1] == class_index] for image in preds
]
fp = np.zeros((0, ))
tp = np.zeros((0, ))
scores = np.zeros((0, ))
total_gts = 0
# loop for each sample
for per_image_gt_boxes, per_image_pred_boxes, per_image_pred_scores in zip(
per_class_gt_boxes, per_class_pred_boxes,
per_class_pred_scores):
total_gts = total_gts + len(per_image_gt_boxes)
# one gt can only be assigned to one predicted bbox
assigned_gt = []
# loop for each predicted bbox
for index in range(len(per_image_pred_boxes)):
scores = np.append(scores, per_image_pred_scores[index])
if per_image_gt_boxes.shape[0] == 0:
# if no gts found for the predicted bbox, assign the bbox to fp
fp = np.append(fp, 1)
tp = np.append(tp, 0)
continue
pred_box = np.expand_dims(per_image_pred_boxes[index], axis=0)
iou = compute_ious(per_image_gt_boxes, pred_box)
gt_for_box = np.argmax(iou, axis=0)
max_overlap = iou[gt_for_box, 0]
if max_overlap >= iou_thread and gt_for_box not in assigned_gt:
fp = np.append(fp, 0)
tp = np.append(tp, 1)
assigned_gt.append(gt_for_box)
else:
fp = np.append(fp, 1)
tp = np.append(tp, 0)
# sort by score
indices = np.argsort(-scores)
fp = fp[indices]
tp = tp[indices]
# compute cumulative false positives and true positives
fp = np.cumsum(fp)
tp = np.cumsum(tp)
# compute recall and precision
recall = tp / total_gts
precision = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = compute_voc_ap(recall, precision)
all_ap[class_index] = ap
mAP = 0.
for _, class_mAP in all_ap.items():
mAP += float(class_mAP)
mAP /= num_classes
return all_ap, mAP
我々は訓練中に12個のepochを訓練し,5個のepochごとにモデル性能表現を試験し,訓練完了時にもモデル性能表現を試験した.完全なトレーニングとテストコードは以下のように実現される(ここではCOCOデータセット上のトレーニングとテストコードであり、VOCデータセット上でトレーニングとテストを行うには少し修正すればよい).
config.pyファイル:
import os
import sys
BASE_DIR = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)
from public.path import COCO2017_path
from public.detection.dataset.cocodataset import CocoDetection, Resize, RandomFlip, RandomCrop, RandomTranslate
import torchvision.transforms as transforms
import torchvision.datasets as datasets
class Config(object):
log = './log' # Path to save log
checkpoint_path = './checkpoints' # Path to store checkpoint model
resume = './checkpoints/latest.pth' # load checkpoint model
evaluate = None # evaluate model path
train_dataset_path = os.path.join(COCO2017_path, 'images/train2017')
val_dataset_path = os.path.join(COCO2017_path, 'images/val2017')
dataset_annotations_path = os.path.join(COCO2017_path, 'annotations')
network = "resnet50_retinanet"
pretrained = False
num_classes = 80
seed = 0
input_image_size = 600
train_dataset = CocoDetection(image_root_dir=train_dataset_path,
annotation_root_dir=dataset_annotations_path,
set="train2017",
transform=transforms.Compose([
RandomFlip(flip_prob=0.5),
RandomCrop(crop_prob=0.5),
RandomTranslate(translate_prob=0.5),
Resize(resize=input_image_size),
]))
val_dataset = CocoDetection(image_root_dir=val_dataset_path,
annotation_root_dir=dataset_annotations_path,
set="val2017",
transform=transforms.Compose([
Resize(resize=input_image_size),
]))
epochs = 12
batch_size = 15
lr = 1e-4
num_workers = 4
print_interval = 100
apex = True
train.pyファイル:
import sys
import os
import argparse
import random
import shutil
import time
import warnings
import json
BASE_DIR = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)
warnings.filterwarnings('ignore')
import numpy as np
from thop import profile
from thop import clever_format
from apex import amp
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader
from torchvision import transforms
from config import Config
from public.detection.dataset.cocodataset import COCODataPrefetcher, collater
from public.detection.models.loss import RetinaLoss
from public.detection.models.decode import RetinaDecoder
from public.detection.models.retinanet import resnet50_retinanet
from public.imagenet.utils import get_logger
from pycocotools.cocoeval import COCOeval
def parse_args():
parser = argparse.ArgumentParser(
description='PyTorch COCO Detection Training')
parser.add_argument('--network',
type=str,
default=Config.network,
help='name of network')
parser.add_argument('--lr',
type=float,
default=Config.lr,
help='learning rate')
parser.add_argument('--epochs',
type=int,
default=Config.epochs,
help='num of training epochs')
parser.add_argument('--batch_size',
type=int,
default=Config.batch_size,
help='batch size')
parser.add_argument('--pretrained',
type=bool,
default=Config.pretrained,
help='load pretrained model params or not')
parser.add_argument('--num_classes',
type=int,
default=Config.num_classes,
help='model classification num')
parser.add_argument('--input_image_size',
type=int,
default=Config.input_image_size,
help='input image size')
parser.add_argument('--num_workers',
type=int,
default=Config.num_workers,
help='number of worker to load data')
parser.add_argument('--resume',
type=str,
default=Config.resume,
help='put the path to resuming file if needed')
parser.add_argument('--checkpoints',
type=str,
default=Config.checkpoint_path,
help='path for saving trained models')
parser.add_argument('--log',
type=str,
default=Config.log,
help='path to save log')
parser.add_argument('--evaluate',
type=str,
default=Config.evaluate,
help='path for evaluate model')
parser.add_argument('--seed', type=int, default=Config.seed, help='seed')
parser.add_argument('--print_interval',
type=bool,
default=Config.print_interval,
help='print interval')
parser.add_argument('--apex',
type=bool,
default=Config.apex,
help='use apex or not')
return parser.parse_args()
def validate(val_dataset, model, decoder):
model = model.module
# switch to evaluate mode
model.eval()
with torch.no_grad():
all_eval_result = evaluate_coco(val_dataset, model, decoder)
return all_eval_result
def evaluate_coco(val_dataset, model, decoder):
results, image_ids = [], []
for index in range(len(val_dataset)):
data = val_dataset[index]
scale = data['scale']
cls_heads, reg_heads, batch_anchors = model(data['img'].cuda().permute(
2, 0, 1).float().unsqueeze(dim=0))
scores, classes, boxes = decoder(cls_heads, reg_heads, batch_anchors)
scores, classes, boxes = scores.cpu(), classes.cpu(), boxes.cpu()
boxes /= scale
# make sure decode batch_size=1
# scores shape:[1,max_detection_num]
# classes shape:[1,max_detection_num]
# bboxes shape[1,max_detection_num,4]
assert scores.shape[0] == 1
scores = scores.squeeze(0)
classes = classes.squeeze(0)
boxes = boxes.squeeze(0)
# for coco_eval,we need [x_min,y_min,w,h] format pred boxes
boxes[:, 2:] -= boxes[:, :2]
for object_score, object_class, object_box in zip(
scores, classes, boxes):
object_score = float(object_score)
object_class = int(object_class)
object_box = object_box.tolist()
if object_class == -1:
break
image_result = {
'image_id':
val_dataset.image_ids[index],
'category_id':
val_dataset.find_category_id_from_coco_label(object_class),
'score':
object_score,
'bbox':
object_box,
}
results.append(image_result)
image_ids.append(val_dataset.image_ids[index])
print('{}/{}'.format(index, len(val_dataset)), end='\r')
if not len(results):
print("No target detected in test set images")
return
json.dump(results,
open('{}_bbox_results.json'.format(val_dataset.set_name), 'w'),
indent=4)
# load results in COCO evaluation tool
coco_true = val_dataset.coco
coco_pred = coco_true.loadRes('{}_bbox_results.json'.format(
val_dataset.set_name))
coco_eval = COCOeval(coco_true, coco_pred, 'bbox')
coco_eval.params.imgIds = image_ids
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
all_eval_result = coco_eval.stats
return all_eval_result
def train(train_loader, model, criterion, optimizer, scheduler, epoch, logger,
args):
cls_losses, reg_losses, losses = [], [], []
# switch to train mode
model.train()
iters = len(train_loader.dataset) // args.batch_size
prefetcher = COCODataPrefetcher(train_loader)
images, annotations = prefetcher.next()
iter_index = 1
while images is not None:
images, annotations = images.cuda().float(), annotations.cuda()
cls_heads, reg_heads, batch_anchors = model(images)
cls_loss, reg_loss = criterion(cls_heads, reg_heads, batch_anchors,
annotations)
loss = cls_loss + reg_loss
        # if either loss is zero (e.g. a batch without positive anchors),
        # skip the update, but fetch the next batch before continuing;
        # otherwise the 'continue' would loop forever on the same batch
        if cls_loss == 0.0 or reg_loss == 0.0:
            optimizer.zero_grad()
            images, annotations = prefetcher.next()
            iter_index += 1
            continue
if args.apex:
with amp.scale_loss(loss, optimizer) as scaled_loss:
scaled_loss.backward()
else:
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
optimizer.step()
optimizer.zero_grad()
cls_losses.append(cls_loss.item())
reg_losses.append(reg_loss.item())
losses.append(loss.item())
images, annotations = prefetcher.next()
if iter_index % args.print_interval == 0:
logger.info(
f"train: epoch {epoch:0>3d}, iter [{iter_index:0>5d}, {iters:0>5d}], cls_loss: {cls_loss.item():.2f}, reg_loss: {reg_loss.item():.2f}, loss_total: {loss.item():.2f}"
)
iter_index += 1
scheduler.step(np.mean(losses))
return np.mean(cls_losses), np.mean(reg_losses), np.mean(losses)
def main(logger, args):
if not torch.cuda.is_available():
raise Exception("need gpu to train network!")
torch.cuda.empty_cache()
if args.seed is not None:
random.seed(args.seed)
torch.cuda.manual_seed_all(args.seed)
cudnn.deterministic = True
gpus = torch.cuda.device_count()
logger.info(f'use {gpus} gpus')
logger.info(f"args: {args}")
cudnn.benchmark = True
cudnn.enabled = True
start_time = time.time()
# dataset and dataloader
logger.info('start loading data')
train_loader = DataLoader(Config.train_dataset,
batch_size=args.batch_size,
shuffle=True,
num_workers=args.num_workers,
collate_fn=collater)
logger.info('finish loading data')
model = resnet50_retinanet(**{
"pretrained": args.pretrained,
"num_classes": args.num_classes,
})
for name, param in model.named_parameters():
logger.info(f"{name},{param.requires_grad}")
flops_input = torch.randn(1, 3, args.input_image_size,
args.input_image_size)
flops, params = profile(model, inputs=(flops_input, ))
flops, params = clever_format([flops, params], "%.3f")
logger.info(f"model: '{args.network}', flops: {flops}, params: {params}")
criterion = RetinaLoss(image_w=args.input_image_size,
image_h=args.input_image_size).cuda()
decoder = RetinaDecoder(image_w=args.input_image_size,
image_h=args.input_image_size).cuda()
model = model.cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
patience=3,
verbose=True)
if args.apex:
amp.register_float_function(torch, 'sigmoid')
amp.register_float_function(torch, 'softmax')
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')
model = nn.DataParallel(model)
if args.evaluate:
if not os.path.isfile(args.evaluate):
            raise Exception(
                f"{args.evaluate} is not a file, please check it again")
logger.info('start only evaluating')
logger.info(f"start resuming model from {args.evaluate}")
checkpoint = torch.load(args.evaluate,
map_location=torch.device('cpu'))
model.load_state_dict(checkpoint['model_state_dict'])
all_eval_result = validate(Config.val_dataset, model, decoder)
if all_eval_result is not None:
logger.info(
f"val: epoch: {checkpoint['epoch']:0>5d}, IoU=0.5:0.95,area=all,maxDets=100,mAP:{all_eval_result[0]:.3f}, IoU=0.5,area=all,maxDets=100,mAP:{all_eval_result[1]:.3f}, IoU=0.75,area=all,maxDets=100,mAP:{all_eval_result[2]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAP:{all_eval_result[3]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAP:{all_eval_result[4]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAP:{all_eval_result[5]:.3f}, IoU=0.5:0.95,area=all,maxDets=1,mAR:{all_eval_result[6]:.3f}, IoU=0.5:0.95,area=all,maxDets=10,mAR:{all_eval_result[7]:.3f}, IoU=0.5:0.95,area=all,maxDets=100,mAR:{all_eval_result[8]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAR:{all_eval_result[9]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAR:{all_eval_result[10]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAR:{all_eval_result[11]:.3f}"
)
return
best_map = 0.0
start_epoch = 1
# resume training
if os.path.exists(args.resume):
logger.info(f"start resuming model from {args.resume}")
checkpoint = torch.load(args.resume, map_location=torch.device('cpu'))
start_epoch += checkpoint['epoch']
best_map = checkpoint['best_map']
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
logger.info(
f"finish resuming model from {args.resume}, epoch {checkpoint['epoch']}, best_map: {checkpoint['best_map']}, "
f"loss: {checkpoint['loss']:3f}, cls_loss: {checkpoint['cls_loss']:2f}, reg_loss: {checkpoint['reg_loss']:2f}"
)
if not os.path.exists(args.checkpoints):
os.makedirs(args.checkpoints)
logger.info('start training')
for epoch in range(start_epoch, args.epochs + 1):
cls_losses, reg_losses, losses = train(train_loader, model, criterion,
optimizer, scheduler, epoch,
logger, args)
logger.info(
f"train: epoch {epoch:0>3d}, cls_loss: {cls_losses:.2f}, reg_loss: {reg_losses:.2f}, loss: {losses:.2f}"
)
if epoch % 5 == 0 or epoch == args.epochs:
all_eval_result = validate(Config.val_dataset, model, decoder)
logger.info(f"eval done.")
if all_eval_result is not None:
logger.info(
f"val: epoch: {epoch:0>5d}, IoU=0.5:0.95,area=all,maxDets=100,mAP:{all_eval_result[0]:.3f}, IoU=0.5,area=all,maxDets=100,mAP:{all_eval_result[1]:.3f}, IoU=0.75,area=all,maxDets=100,mAP:{all_eval_result[2]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAP:{all_eval_result[3]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAP:{all_eval_result[4]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAP:{all_eval_result[5]:.3f}, IoU=0.5:0.95,area=all,maxDets=1,mAR:{all_eval_result[6]:.3f}, IoU=0.5:0.95,area=all,maxDets=10,mAR:{all_eval_result[7]:.3f}, IoU=0.5:0.95,area=all,maxDets=100,mAR:{all_eval_result[8]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAR:{all_eval_result[9]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAR:{all_eval_result[10]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAR:{all_eval_result[11]:.3f}"
)
if all_eval_result[0] > best_map:
torch.save(model.module.state_dict(),
os.path.join(args.checkpoints, "best.pth"))
best_map = all_eval_result[0]
torch.save(
{
'epoch': epoch,
'best_map': best_map,
'cls_loss': cls_losses,
'reg_loss': reg_losses,
'loss': losses,
'model_state_dict': model.state_dict(),
'optimizer_state_dict': optimizer.state_dict(),
'scheduler_state_dict': scheduler.state_dict(),
}, os.path.join(args.checkpoints, 'latest.pth'))
logger.info(f"finish training, best_map: {best_map:.3f}")
training_time = (time.time() - start_time) / 3600
logger.info(
f"finish training, total training time: {training_time:.2f} hours")
if __name__ == '__main__':
args = parse_args()
logger = get_logger(__name__, args.log)
main(logger, args)
The training above runs in nn.DataParallel mode. The hyperparameters in config.py and train.py correspond to the hyperparameter settings of the ResNet50-RetinaNet-aug (apex enabled) entry in the model evaluation below. Distributed training will be implemented in the next article. To start training, simply run python train.py.
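For reference, here is a minimal sketch of what the Config class consumed by parse_args might look like. Only the attribute names are taken from the script above; all concrete values (and the dataset placeholders) are illustrative assumptions, not the actual config.py:
class Config(object):
    network = 'resnet50_retinanet'
    lr = 1e-4
    epochs = 12
    batch_size = 15
    pretrained = True            # load an ImageNet-pretrained backbone
    num_classes = 80             # COCO 2017 has 80 object classes
    input_image_size = 600
    num_workers = 4
    resume = 'checkpoints/latest.pth'
    checkpoint_path = 'checkpoints'
    log = 'log'                  # directory for log files
    evaluate = None              # set to a checkpoint path for eval-only runs
    seed = 0
    print_interval = 100         # log every 100 iterations
    apex = True                  # mixed-precision training via NVIDIA apex
    # train_dataset / val_dataset would be the CocoDetection-style dataset
    # objects (with transforms attached) built in the earlier articles
    train_dataset = None
    val_dataset = None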
Evaluating the model reproduction
Measured against the reproduction approach described across these six articles, the model's scores still fall short of the RetinaNet paper's, and three issues remain.
Issue 1 cannot be verified for now. Issue 2 will be addressed in the next article with distributed training plus cross-GPU synchronized BN. As for issue 3, I do not yet have the energy to read through all of the Detectron and Detectron2 code, so corrections from readers are welcome.
The model's performance on the COCO dataset is shown below (the input resolution is 600, roughly equivalent to a resolution of 450 in the RetinaNet paper).
| Network | batch | gpu-num | apex | epoch5-mAP-loss | epoch10-mAP-loss | epoch12-mAP-loss | one-epoch-training-times |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet50-RetinaNet | 16 | 2 | no | 0.251,0.60 | 0.266,0.49 | 0.272,0.46 | 2h38min |
| ResNet50-RetinaNet | 15 | 1 | yes | 0.251,0.59 | 0.272,0.48 | 0.276,0.45 | 2h31min |
| ResNet50-RetinaNet-aug | 15 | 1 | yes | 0.254,0.62 | 0.274,0.53 | 0.283,0.51 | 2h31min |
The results above were all obtained in nn.DataParallel mode, so none of them use cross-GPU synchronized BN (which is unavailable under nn.DataParallel). Every experiment uses RandomFlip + Resize data augmentation during training and a plain Resize at test time; the -aug suffix means RandomCrop and RandomTranslate were additionally applied during training. All GPUs are RTX 2080 Ti. An entry such as 0.251,0.60 means an mAP of 0.251 with a total loss of 0.60 at that point, and 2h38min means 2 hours 38 minutes per epoch.
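As a rough sketch of how the two training pipelines differ (RandomFlip, RandomCrop, RandomTranslate, and Resize are assumed to be the custom transform classes from the dataset code in the earlier articles; their argument names and the probabilities are illustrative):
import torchvision.transforms as transforms

# baseline pipeline: RandomFlip + Resize
train_transform = transforms.Compose([
    RandomFlip(flip_prob=0.5),
    Resize(resize=600),
])

# "-aug" pipeline: RandomCrop and RandomTranslate added on top
train_transform_aug = transforms.Compose([
    RandomFlip(flip_prob=0.5),
    RandomCrop(crop_prob=0.5),
    RandomTranslate(translate_prob=0.5),
    Resize(resize=600),
])

# test time: a plain Resize only
val_transform = transforms.Compose([
    Resize(resize=600),
])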
As the results show, with the same data augmentation the RetinaNet trained with my code (0.276) is about 3.5 points below the paper (at resolution 450 the paper's score should be around 0.311). The gap is presumably caused by using the Adam optimizer instead of SGD, together with issues 1, 2, and 3 raised above. In the next article we will train RetinaNet with distributed training, which should resolve these issues.
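For anyone who wants to reproduce the paper's standard schedule instead of Adam, here is a minimal sketch, assuming a 12-epoch run at batch size 16; the milestones 8 and 11 approximate the 60k- and 80k-iteration decay points of the 90k-iteration schedule, and the scheduler.step(np.mean(losses)) call inside train() would have to become a plain per-epoch scheduler.step():
import torch.optim as optim

# paper-style 1x schedule: SGD with momentum 0.9 and weight decay 1e-4,
# initial lr 0.01, divided by 10 twice during training
optimizer = optim.SGD(model.parameters(),
                      lr=0.01,
                      momentum=0.9,
                      weight_decay=1e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                           milestones=[8, 11],
                                           gamma=0.1)

for epoch in range(1, 12 + 1):
    # ... run one epoch of training here ...
    scheduler.step()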