【PyTorch】Transfer-learning sign-language digits with PyTorch and converting to tflite (Part 1)



Last time, I used mediapipe's hand_track to extract hand images for machine learning. This time, I trained a PyTorch model that classifies the sign-language digits 1-10. Since I want to use the model with mediapipe, it is converted pytorch → onnx → tensorflow → tflite (the article got long, so I split it into Part 1 and Part 2).

Environment

  • Colaboratory - Google Colab (versions below as of 2020/12/19)
  • torch : 1.7.0
  • torchvision : 0.8.1
  • matplotlib : 3.2.2
  • onnx : 1.8.0
  • onnx_tf : 1.7.0
  • tensorflow : 2.4.0
  • tf-nightly-gpu : 2.5.0

Preparation

Since the previous article set up mediapipe to extract hand images, I searched YouTube for videos of the signs 1-10 and used them as the data source. The class folders 1-10 were split into "train" and "val" sets for training: "train" holds 2,156 images (a little over 200 per folder) and "val" holds 250 images (25 per folder).
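
The directory layout follows the torchvision ImageFolder convention (images/train/1 ... images/train/10, and the same classes under images/val, matching the code in the next section). A minimal sketch to double-check the per-class counts:

import os

# Count the files in each class folder under images/train and images/val
for split in ['train', 'val']:
    split_dir = os.path.join('images', split)
    for cls in sorted(os.listdir(split_dir)):
        print(split, cls, len(os.listdir(os.path.join(split_dir, cls))))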

Code and results summary

※Run this in a Colab notebook, and select GPU under "Change runtime type"

!pip install onnx
!pip install onnx-tf
!pip install tf-nightly-gpu
from google.colab import drive
drive.mount('/content/drive')
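
To confirm that the installed packages match the versions in the environment list above, a quick check like this can be run (a minimal sketch covering the packages that expose __version__):

import torch, torchvision, matplotlib, onnx, tensorflow as tf

print('torch:', torch.__version__)
print('torchvision:', torchvision.__version__)
print('matplotlib:', matplotlib.__version__)
print('onnx:', onnx.__version__)
print('tensorflow:', tf.__version__)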

from __future__ import print_function, division
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os 
import copy

plt.ion()
os.chdir('/content/drive/My Drive/<replace with your own path>')
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

Output: cuda:0

data_transforms = {
    'train': transforms.Compose([
        #transforms.RandomResizedCrop(224),
        transforms.Resize(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
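
The mean/std values passed to Normalize are the standard ImageNet statistics that torchvision's pretrained models expect. As a minimal check of the val pipeline on a single image (the file path here is hypothetical):

from PIL import Image

img = Image.open('images/val/1/sample.jpg')  # hypothetical sample file
x = data_transforms['val'](img)
print(x.shape)                         # torch.Size([3, 224, 224])
print(x.min().item(), x.max().item())  # roughly within [-2.2, 2.7] after normalization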

data_dir = 'images/'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

※RandomResizedCrop is commented out because its aggressive cropping produces images where the hand shape is no longer recognizable

print('-'*10, 'image_datasets','-'*10,'\n', image_datasets)
print()
print('-'*10,'train dataset','-'*10,'\n', image_datasets['train'])
print()
print('-'*10,'label','-'*10,'\n', image_datasets['train'].classes)

Dataset and labels

---------- image_datasets ----------
{'train': Dataset ImageFolder
Number of datapoints: 2156
Root location: images/train
StandardTransform
Transform: Compose(
Resize(size=224, interpolation=PIL.Image.BILINEAR)
RandomHorizontalFlip(p=0.5)
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
), 'val': Dataset ImageFolder
Number of datapoints: 250
Root location: images/val
StandardTransform
Transform: Compose(
Resize(size=224, interpolation=PIL.Image.BILINEAR)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)}

---------- train dataset ----------
Dataset ImageFolder
Number of datapoints: 2156
Root location: images/train
StandardTransform
Transform: Compose(
Resize(size=224, interpolation=PIL.Image.BILINEAR)
RandomHorizontalFlip(p=0.5)
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)

---------- label ----------
['1', '10', '2', '3', '4', '5', '6', '7', '8', '9']
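
Note that ImageFolder sorts class folder names as strings, which is why '10' comes before '2' above. The folder-name-to-index mapping used for the labels can be checked directly:

# Mapping from folder name to the label index used by the DataLoader
print(image_datasets['train'].class_to_idx)
# {'1': 0, '10': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9}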

def imshow(inp, title=None):
  "imshow for Tensor"
  inp = inp.numpy().transpose((1,2,0))
  mean = np.array([0.485, 0.456, 0.406])
  std = np.array([0.229, 0.224, 0.225])
  inp = std * inp + mean
  inp = np.clip(inp, 0, 1)
  plt.imshow(inp)
  if title is not None:
    plt.title(title)
  plt.pause(0.001)

inputs, classes = next(iter(dataloaders['train']))

out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])

※The imshow output looks like the following; no BGR→RGB correction has been applied
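
If the extracted hand images were saved with OpenCV (which stores pixels in BGR order), a minimal correction, assuming that is how they were saved, is to reverse the channel axis of the HWC numpy image before plotting:

def bgr_to_rgb(inp):
  "Reverse the channel axis of an HWC numpy image (BGR -> RGB)"
  return inp[:, :, ::-1]

Applying bgr_to_rgb to inp inside imshow, just before plt.imshow(inp), would correct the colors.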

# Load ImageNet-pretrained ResNet18 and replace the final fully connected
# layer with a 10-class head for the digits 1-10
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 10)

model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
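
The setup above fine-tunes every parameter of the network. A cheaper alternative (the feature-extraction variant from the same PyTorch transfer learning tutorial, not used in this article) freezes the pretrained backbone and trains only the new fc layer:

# Feature-extraction variant: freeze the backbone, train only the new fc layer
model_conv = models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False  # freeze all pretrained weights
model_conv.fc = nn.Linear(model_conv.fc.in_features, 10)  # new layer defaults to requires_grad=True
model_conv = model_conv.to(device)
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)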


# Inspect the layer structure of the modified model
for x in list(model_ft.children()):
  print(x, '\n')

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    # torch.max returns (values, indices), so preds holds the predicted class indices
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            if phase == 'train':
                # PyTorch >= 1.1: step the scheduler after optimizer.step()
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

# training (the per-epoch log below is from a 25-epoch run)
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25)

Output for each epoch

----------Epoch 0/24----------
train Loss: 1.1866 Acc: 0.5914
val Loss: 0.5993 Acc: 0.8080

----------Epoch 1/24----------
train Loss: 0.4804 Acc: 0.8465
val Loss: 0.6612 Acc: 0.8240

----------Epoch 2/24----------
train Loss: 0.3129 Acc: 0.9109
val Loss: 0.5297 Acc: 0.8400

----------Epoch 3/24----------
train Loss: 0.1777 Acc: 0.9481
val Loss: 0.6338 Acc: 0.8240

----------Epoch 4/24----------
train Loss: 0.1711 Acc: 0.9494
val Loss: 0.4457 Acc: 0.8800

----------Epoch 5/24----------
train Loss: 0.1360 Acc: 0.9592
val Loss: 0.4951 Acc: 0.8560

----------Epoch 6/24----------
train Loss: 0.0726 Acc: 0.9787
val Loss: 0.4095 Acc: 0.9000

----------Epoch 7/24----------
train Loss: 0.0687 Acc: 0.9791
val Loss: 0.3880 Acc: 0.9040

----------Epoch 8/24----------
train Loss: 0.0545 Acc: 0.9856
val Loss: 0.3809 Acc: 0.9040

----------Epoch 9/24----------
train Loss: 0.0592 Acc: 0.9828
val Loss: 0.3716 Acc: 0.8960

----------Epoch 10/24----------
train Loss: 0.0507 Acc: 0.9870
val Loss: 0.3494 Acc: 0.9040

----------Epoch 11/24----------
train Loss: 0.0397 Acc: 0.9912
val Loss: 0.3674 Acc: 0.9000

----------Epoch 12/24----------
train Loss: 0.0404 Acc: 0.9889
val Loss: 0.3840 Acc: 0.8960

----------Epoch 13/24----------
train Loss: 0.0379 Acc: 0.9912
val Loss: 0.3682 Acc: 0.8960

----------Epoch 14/24----------
train Loss: 0.0290 Acc: 0.9921
val Loss: 0.3791 Acc: 0.9000

----------Epoch 15/24----------
train Loss: 0.0526 Acc: 0.9852
val Loss: 0.4144 Acc: 0.8920

----------Epoch 16/24----------
train Loss: 0.0492 Acc: 0.9865
val Loss: 0.4065 Acc: 0.8960

----------Epoch 17/24----------
train Loss: 0.0381 Acc: 0.9903
val Loss: 0.3675 Acc: 0.8920

----------Epoch 18/24----------
train Loss: 0.0452 Acc: 0.9893
val Loss: 0.3857 Acc: 0.9080

----------Epoch 19/24----------
train Loss: 0.0374 Acc: 0.9893
val Loss: 0.3788 Acc: 0.8920

----------Epoch 20/24----------
train Loss: 0.0453 Acc: 0.9879
val Loss: 0.3743 Acc: 0.9000

----------Epoch 21/24----------
train Loss: 0.0327 Acc: 0.9935
val Loss: 0.3516 Acc: 0.9040

----------Epoch 22/24----------
train Loss: 0.0431 Acc: 0.9903
val Loss: 0.3771 Acc: 0.9040

----------Epoch 23/24----------
train Loss: 0.0314 Acc: 0.9926
val Loss: 0.3856 Acc: 0.9040

----------Epoch 24/24----------
train Loss: 0.0412 Acc: 0.9889
val Loss: 0.4018 Acc: 0.8880

Training complete in 9m 22s
Best val Acc: 0.908000
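
The log shows validation accuracy plateauing around 0.89-0.91. To see which of the ten signs get confused, a minimal per-class accuracy check over the val set (a sketch that reuses model_ft, dataloaders, class_names, and device from above):

# Per-class validation accuracy
correct = torch.zeros(len(class_names))
total = torch.zeros(len(class_names))
model_ft.eval()
with torch.no_grad():
    for inputs, labels in dataloaders['val']:
        preds = model_ft(inputs.to(device)).argmax(1).cpu()
        for p, t in zip(preds, labels):
            total[t] += 1
            correct[t] += int(p == t)
for name, c, n in zip(class_names, correct, total):
    print('{}: {:.2f} ({}/{})'.format(name, (c / n).item(), int(c), int(n)))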

def tensor_to_np(inp):
  "imshow for Tensor"
  inp = inp.numpy().transpose((1,2,0))
  mean = np.array([0.485, 0.456, 0.406])
  std = np.array([0.229, 0.224, 0.225])
  inp = std * inp + mean
  inp = np.clip(inp, 0, 1)
  return inp

def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = fig.add_subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}  label: {}'
                             .format(class_names[preds[j]], class_names[labels[j]]))
                ax.imshow(tensor_to_np(inputs.cpu().data[j]))

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)

visualize_model(model_ft)

※The final accuracy was around 85-90%; the imshow output looks like the following

# Save the torch model (GPU)
torch_model = 'model/jsl_one_to_ten_gpu.pth'
torch.save(model_ft.state_dict(), torch_model)

# Save the torch model (CPU)
torch_model = 'model/jsl_one_to_ten_cpu.pth'
torch.save(model_ft.to('cpu').state_dict(), torch_model)
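
Since only the state_dict is saved, loading it back requires rebuilding the same architecture first. A minimal reload sketch for the CPU checkpoint (for the GPU file, torch.load(..., map_location='cpu') can be used on a machine without a GPU):

# Rebuild the architecture, then load the saved weights
model = models.resnet18(pretrained=False)
model.fc = nn.Linear(model.fc.in_features, 10)
model.load_state_dict(torch.load('model/jsl_one_to_ten_cpu.pth'))
model.eval()  # switch to inference mode before prediction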

Part 2

Part 2 covers converting and saving the model, then verifying that the loaded model works.