Building a Satomi Ishihara Detector, Part 2

Overview

Using deep learning, I build a Satomi Ishihara detector (more precisely, a detector for Satomi Ishihara's face).

Environment

OS: Windows 10
GPU: GeForce GTX 950M
Python: 3.5.4
TensorFlow-GPU: 1.5.0

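Lessons from the previous attempt

About two weeks ago I tried to build a Satomi Ishihara detector, ended up with a plain face detector, and despaired. The result probably came down to the following two causes.

Satomi Ishihara was the only class to detect
Almost all of the training data consisted of solo photos of Satomi Ishihara

Because the model was trained with nothing to compare against (no negative examples), it could not learn the features that distinguish Satomi Ishihara from anyone else.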

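Collecting new training data

Based on those lessons, I added women other than Satomi Ishihara to the training data as negative examples and gave them the label not_satomi.

1. Search Google Image Search for 「日本女優 -石原さとみ」 ("Japanese actress -Satomi Ishihara") and scrape a large number of images of actresses other than Satomi Ishihara
2. Combine them with images of Satomi Ishihara
3. Annotate the combined images and generate XML files containing the bounding box coordinates

Repeating steps 1 to 3, I produced 300 sets, each consisting of an image composed of "Satomi Ishihara + other women" and its corresponding XML file, like the one below. It was backbreaking work...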

sample.xml

<annotation>
  <folder>original</folder>
  <filename>0fe95ac3db4b4a33897d178f0fce828ec0dfa9525e9d4e5eaf9b42c43569a849.jpg</filename>
  <src>https://contents.oricon.co.jp/upimg/news/20150123/2047614_201501230361924001421981909c.jpg</src>
  <segmented>0</segmented>
  <size>
    <width>597</width>
    <height>400</height>
    <depth>3</depth>
  </size>
  <object>
    <name>satomi_ishihara</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>405</xmin>
      <ymin>93</ymin>
      <xmax>508</xmax>
      <ymax>216</ymax>
    </bndbox>
  </object>
  <object>
    <name>not_satomi</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>150</xmin>
      <ymin>75</ymin>
      <xmax>290</xmax>
      <ymax>247</ymax>
    </bndbox>
  </object>
</annotation>
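
To make the annotation layout concrete, here is a minimal sketch of reading the labels and bounding boxes back out of such a file with the standard library. The function name read_boxes and the trimmed inline sample are my own assumptions for illustration; the article does not show how the XML files were consumed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the annotation layout, kept inline so the sketch runs as-is.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>597</width><height>400</height><depth>3</depth></size>
  <object>
    <name>satomi_ishihara</name>
    <bndbox><xmin>405</xmin><ymin>93</ymin><xmax>508</xmax><ymax>216</ymax></bndbox>
  </object>
  <object>
    <name>not_satomi</name>
    <bndbox><xmin>150</xmin><ymin>75</ymin><xmax>290</xmax><ymax>247</ymax></bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text):
    """Return (height, width) and a list of (label, (xmin, ymin, xmax, ymax))."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    img_size = (int(size.findtext("height")), int(size.findtext("width")))
    boxes = []
    # iter() finds every <object> element regardless of the root tag name.
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), coords))
    return img_size, boxes


img_size, boxes = read_boxes(SAMPLE_XML)
for label, coords in boxes:
    print(label, coords)
```

Returning the size as (height, width) matches the img_size order that the bounding box helpers in transform_bbox.py unpack.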

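Augmenting the training data

To augment the training data, I applied contrast adjustment, added noise, horizontal flipping, and rotation.

For contrast adjustment and noise, I relied on the following article. Many thanks to its author.
機械学習のデータセット画像枚数を増やす方法 (How to increase the number of images in a machine learning dataset)

Horizontal flipping and rotation require transforming not only the image but also the bounding box coordinates, so I wrote my own functions.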
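
As a rough illustration of what contrast adjustment and noise do to pixel values, here is a dependency-free sketch on a tiny grayscale image stored as nested lists. It is not the code from the referenced article: the function names, the thresholds 50 and 205, and sigma=15 are all assumptions, and in practice the same idea would be applied to NumPy arrays.

```python
import random


def contrast_table(low=50, high=205):
    """Build a 256-entry lookup table that stretches [low, high] to [0, 255]."""
    table = []
    for value in range(256):
        if value < low:
            table.append(0)
        elif value > high:
            table.append(255)
        else:
            table.append(round(255 * (value - low) / (high - low)))
    return table


def add_gaussian_noise(pixels, sigma=15, seed=None):
    """Add zero-mean Gaussian noise to each pixel and clip to 0..255."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in pixels
    ]


# A tiny 2x3 grayscale "image" (assumed values, for illustration only).
image = [[10, 60, 128], [200, 205, 250]]
table = contrast_table()
stretched = [[table[p] for p in row] for row in image]
noisy = add_gaussian_noise(image, sigma=15, seed=0)
print(stretched)
```

Neither operation moves any pixels, which is why the bounding boxes can be reused unchanged for these two augmentations.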

transform_bbox.py

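import cv2
import numpy as np


def rotate_bbox(bbox, img_size, rot_angle):
    """Rotate an (xmin, ymin, xmax, ymax) box about the image center."""
    img_h, img_w = img_size
    # OpenCV takes the center as (x, y), so the width comes first.
    center = (int(img_w / 2), int(img_h / 2))
    rot_matrix = cv2.getRotationMatrix2D(center, rot_angle, scale=1)
    bbox_w = bbox[2] - bbox[0]
    bbox_h = bbox[3] - bbox[1]
    # Express the box as a rotated rect (center, size, angle) to get its corners.
    rect = ((bbox[0] + bbox_w / 2, bbox[1] + bbox_h / 2), (bbox_w, bbox_h), 0)
    box = cv2.boxPoints(rect)
    # Rotate the four corners (np.int0 was removed in NumPy 2.0, so cast explicitly).
    pts = cv2.transform(np.array([box]), rot_matrix)[0].astype(np.int32)
    # Clip corners that ended up outside the image.
    pts[pts < 0] = 0
    pts[pts[:, 0] > img_w, 0] = img_w
    pts[pts[:, 1] > img_h, 1] = img_h
    # Axis-aligned box that encloses the rotated corners.
    x, y, w, h = cv2.boundingRect(pts)

    return x, y, x + w, y + h


def hflip_bbox(bbox, img_size):
    """Mirror an (xmin, ymin, xmax, ymax) box left to right."""
    _, w = img_size
    xmin, ymin, xmax, ymax = bbox

    return (w - xmax, ymin, w - xmin, ymax)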
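
The geometry behind the rotation helper can be checked without OpenCV. The sketch below is my own dependency-free variant, not the author's code: it rotates the four corners about the image center with the same sign convention as cv2.getRotationMatrix2D (positive angles turn counter-clockwise on screen) and takes the enclosing axis-aligned box. The names rotate_bbox_pure and _clip are hypothetical, and rounding to the nearest pixel is an assumption.

```python
import math


def _clip(value, upper):
    """Round to the nearest pixel and clamp into [0, upper]."""
    return int(min(max(round(value), 0), upper))


def rotate_bbox_pure(bbox, img_size, rot_angle):
    """Rotate an (xmin, ymin, xmax, ymax) box about the image center.

    rot_angle is in degrees; positive values rotate counter-clockwise
    on screen (image coordinates, y pointing down).
    """
    img_h, img_w = img_size
    cx, cy = img_w // 2, img_h // 2
    cos_a = math.cos(math.radians(rot_angle))
    sin_a = math.sin(math.radians(rot_angle))
    xmin, ymin, xmax, ymax = bbox
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    xs, ys = [], []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        # Same affine map that cv2.getRotationMatrix2D encodes for scale=1.
        xs.append(cos_a * dx + sin_a * dy + cx)
        ys.append(-sin_a * dx + cos_a * dy + cy)
    return (
        _clip(min(xs), img_w),
        _clip(min(ys), img_h),
        _clip(max(xs), img_w),
        _clip(max(ys), img_h),
    )


print(rotate_bbox_pure((10, 20, 30, 40), (100, 100), 90))
```

For rotations that are not multiples of 90 degrees, the enclosing box is larger than the face itself, so heavily rotated samples carry looser boxes than the hand-made annotations.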

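Left: before the transformation. Right: after the transformation (horizontal flip + rotation + Gaussian noise).

After augmentation, the training data amounts to 2,400 sets.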

Training

As in the previous attempt, the model is faster_rcnn_inception_v2_coco from the Tensorflow detection model zoo. I am using it because the tutorial I followed does, but I would also like to try YOLO and SSD. I stopped training after about 50k steps.
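
With two classes instead of one, the TensorFlow Object Detection API also needs a label map that lists both of them. The article does not show its label map, so the following is only a sketch of how one could be generated; the helper name build_label_map and the id order (satomi_ishihara as 1, not_satomi as 2) are assumptions. Ids start at 1 because 0 is reserved for the background class.

```python
def build_label_map(class_names):
    """Render class names as label-map text, assigning ids from 1 upward."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)


label_map_text = build_label_map(["satomi_ishihara", "not_satomi"])
print(label_map_text)
```

The names must match the name elements in the annotation XML exactly, including case.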

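Test

Last time things were in a chaotic state where "human" = "Satomi Ishihara". How about this time?
The output is below (green boxes are Satomi Ishihara, light blue boxes are everyone else).

The model distinguishes Satomi Ishihara from other people with relatively high confidence. However, there were also several false detections, like the ones below.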
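
The color coding described above can be sketched as a small post-processing step. Everything here is an assumption for illustration: the tuple layout of a detection, the 0.5 score threshold, the exact BGR color values, and the scores themselves, which are made-up placeholders and not results from this experiment.

```python
# Drawing colors in OpenCV's BGR order (assumed values).
BOX_COLORS = {
    "satomi_ishihara": (0, 255, 0),   # green
    "not_satomi": (255, 255, 0),      # light blue
}


def select_boxes(detections, min_score=0.5):
    """Keep detections at or above min_score and attach a drawing color."""
    selected = []
    for name, score, box in detections:
        if score >= min_score:
            color = BOX_COLORS.get(name, (255, 255, 0))
            selected.append((name, score, box, color))
    return selected


# Hypothetical detections: (class name, confidence, (xmin, ymin, xmax, ymax)).
detections = [
    ("satomi_ishihara", 0.97, (405, 93, 508, 216)),
    ("not_satomi", 0.91, (150, 75, 290, 247)),
    ("not_satomi", 0.32, (10, 10, 50, 60)),
]
selected = select_boxes(detections)
for name, score, box, color in selected:
    print(name, score, box, color)
```

Raising min_score trades missed faces for fewer false detections, which is one knob to try alongside collecting more training data.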

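Next steps

This is clear progress over the previous attempt, but the number of false detections needs to come down further.

Increase the amount of training data even more
Try models other than Faster R-CNN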

Idle talk

I want a high-performance GPU machine

In closing

Thank you for reading this post.

Author And Source

More information on this topic (石原さとみ検出器を作る Part2) can be found at https://qiita.com/harupy/items/89c6acd81658399a014e

Attribution: information about the original author is given at the original URL. Copyright belongs to the original author.
