MXNet Learning (1): image functions


Reference: https://mxnet.incubator.apache.org/api/python/image/image.html#mxnet.image.imread
Note: these are not Gluon functions.
1. Difference between mxnet.image.imdecode and mxnet.image.imread
Both process images with the C++ OpenCV library under the hood. imdecode decodes an in-memory image buffer into an NDArray, so you have to read the file bytes yourself first; imread reads and decodes the image file directly. For both, flag=0 loads the image as grayscale, and to_rgb=0 keeps the original BGR channel order (the OpenCV convention).
Either way the loaded image has values in 0-255 and shape (H, W, C), while a Gluon network expects values in 0-1 and shape (N, C, H, W), so the image must be transformed before it is fed to the network.
import mxnet
from mxnet import nd

img = mxnet.image.imdecode(open("dog.jpg", "rb").read())  # decode raw bytes read from disk
img = mxnet.image.imread("dog.jpg")                       # read and decode in one call

def transform(data):
    # preprocessing for an ImageNet-pretrained model:
    # (H, W, C) -> (1, C, H, W), scale to [0, 1], then normalize with ImageNet mean/std
    data = data.transpose((2, 0, 1)).expand_dims(axis=0)
    rgb_mean = nd.array([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1)
    rgb_std = nd.array([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1)
    return (data.astype('float32') / 255 - rgb_mean) / rgb_std

input_image = transform(img)  # now in the layout a Gluon network expects

2. Difference between cv2.imread and mxnet.image.imread
The former uses cv2 (the Python OpenCV bindings); the latter uses the C++ OpenCV. The former returns a numpy array, the latter an NDArray. The former's channels are in BGR order, while the latter defaults to RGB.
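A minimal sketch of the contrast, assuming both cv2 and mxnet are installed and a local "dog.jpg" exists:

import cv2
import mxnet as mx

img_cv = cv2.imread("dog.jpg")        # numpy.ndarray, channels in BGR order
img_mx = mx.image.imread("dog.jpg")   # mxnet NDArray, channels in RGB order
print(type(img_cv), img_cv.shape)     # both have shape (H, W, 3)
print(type(img_mx), img_mx.shape)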
3. mxnet.image.imresize, mxnet.image.resize_short
The former forces a resize to the exact target size; the latter scales the image so that its shorter edge becomes the given size, keeping the aspect ratio.
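A small sketch of the difference, assuming a local "dog.jpg"; imresize takes an explicit width and height, resize_short only the target length of the shorter edge:

import mxnet as mx

img = mx.image.imread("dog.jpg")            # (H, W, 3)
forced = mx.image.imresize(img, 224, 224)   # exactly 224 x 224, aspect ratio not preserved
short = mx.image.resize_short(img, 256)     # shorter edge becomes 256, aspect ratio preserved
print(forced.shape, short.shape)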
4. mxnet.image.scale_down
Used when cropping: if the requested crop width/height is larger than the image's width/height, the crop size is scaled down proportionally so that it fits inside the image.
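A small sketch (sizes are illustrative); scale_down takes the image size and the desired crop size, both as (w, h) tuples, and returns the adjusted crop size:

import mxnet as mx

print(mx.image.scale_down((640, 480), (720, 120)))   # crop wider than image -> scaled to (640, 106)
print(mx.image.scale_down((640, 480), (300, 120)))   # crop already fits -> unchanged (300, 120)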
5. mxnet.image.color_normalize(src, mean, std=None)
Normalizes the image with mean and std; the NDArray is expected in RGB channel order.
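A minimal sketch, assuming the image has already been scaled to [0, 1] so that the ImageNet statistics from the transform above apply:

import mxnet as mx
from mxnet import nd

img = mx.image.imread("dog.jpg").astype('float32') / 255   # (H, W, 3), RGB, in [0, 1]
mean = nd.array([0.485, 0.456, 0.406])
std = nd.array([0.229, 0.224, 0.225])
normalized = mx.image.color_normalize(img, mean, std)       # broadcasts over H and W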
6. class mxnet.image.ImageIter
class mxnet.image.ImageIter(
                            batch_size,
                            data_shape,  # channels-first; the first dimension is 3 for RGB
                            label_width=1, 
                            path_imgrec=None,
                            path_imglist=None, 
                            path_root=None, 
                            path_imgidx=None, 
                            shuffle=False, 
                            part_index=0, 
                            num_parts=1, 
                            aug_list=None, 
                            imglist=None, 
                            data_name ='data', 
                            label_name ='softmax_label', 
                            dtype='float32', 
                            last_batch_handle='pad', 
                            **kwargs
                            )

This is a data iterator with a rich set of built-in augmentation operations. It supports reading data either from .rec files or from raw images: use the path_imgrec parameter to load a .rec file, and the path_imglist parameter to load raw image data. Specify the path_imgidx parameter to enable distributed training or shuffling.
References
http://mxnet.incubator.apache.org/versions/master/api/python/image/image.html#mxnet.image.ImageIter
https://blog.csdn.net/u014380165/article/details/74906061
Usage example
import mxnet as mx
import numpy as np
import matplotlib.pyplot as plt

data_iter = mx.image.ImageIter(batch_size=4, data_shape=(3, 227, 227),
                               path_imgrec="./data/caltech.rec",
                               path_imgidx="./data/caltech.idx")

# data_iter is an instance of mxnet.image.ImageIter
# reset() resets the iterator to the beginning of the data
data_iter.reset()

# batch is a mxnet.io.DataBatch; each call to next() returns the next DataBatch
batch = data_iter.next()

# data is an NDArray holding one batch of images; with batch_size=4 its shape is 4*3*227*227
data = batch.data[0]

# loop over the batch and display each image
for i in range(4):
    plt.subplot(1, 4, i + 1)
    plt.imshow(data[i].asnumpy().astype(np.uint8).transpose((1, 2, 0)))
plt.show()
Use mx.image.CreateAugmenter() to perform image augmentation.
# args and rank are assumed to come from the surrounding training script
# (parsed command-line arguments and the worker index for distributed training)
train = mx.image.ImageIter(
    batch_size=args.batch_size,
    data_shape=(3, 224, 224),
    label_width=1,
    path_imglist=args.data_train,
    path_root=args.image_train,
    part_index=rank,
    shuffle=True,
    data_name='data',
    label_name='softmax_label',
    aug_list=mx.image.CreateAugmenter((3, 224, 224), resize=224, rand_crop=True,
                                      rand_mirror=True, mean=True))

Settings and parameters of image.CreateAugmenter; a standalone usage sketch follows the parameter list below.
image.CreateAugmenter(
                data_shape,
                resize=0,
                rand_crop=False,
                rand_resize=False,
                rand_mirror=False,
                mean=None,  # can be True, an np.ndarray, or None; True uses the default ImageNet mean
                std=None,   # same convention for the standard deviation
                brightness=0,
                contrast=0,
                saturation=0,
                hue=0,
                pca_noise=0,
                rand_gray=0,
                inter_method=2
                )
# Creates an augmenter list.

Parameters:
  • data_shape (tuple of int) – Shape for output data
  • resize (int) – Resize shorter edge if larger than 0 at the beginning
  • rand_crop (bool) – Whether to enable random cropping other than center crop
  • rand_resize (bool) – Whether to enable random sized cropping, require rand_crop to be enabled
  • rand_gray (float) – [0, 1], probability to convert to grayscale for all channels, the number of channels will not be reduced to 1
  • rand_mirror (bool) – Whether to apply horizontal flip to image with probability 0.5
  • mean (np.ndarray or None) – Mean pixel values for [r, g, b]
  • std (np.ndarray or None) – Standard deviations for [r, g, b]
  • brightness (float) – Brightness jittering range (percent)
  • contrast (float) – Contrast jittering range (percent)
  • saturation (float) – Saturation jittering range (percent)
  • hue (float) – Hue jittering range (percent)
  • pca_noise (float) – Pca noise level (percent)
  • inter_method (int, default=2 (Area-based)) – Interpolation method for all resizing operations. Possible values:
    0: Nearest Neighbors interpolation.
    1: Bilinear interpolation.
    2: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the Nearest Neighbors method (used by default).
    3: Bicubic interpolation over a 4x4 pixel neighborhood.
    4: Lanczos interpolation over an 8x8 pixel neighborhood.
    9: Cubic for enlarging, area for shrinking, bilinear for others.
    10: Random selection from the interpolation methods mentioned above.
    Note: When shrinking an image, it will generally look best with area-based interpolation, whereas when enlarging an image it will generally look best with Bicubic (slow) or Bilinear (faster but still looks OK).
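A small sketch, assuming a local "dog.jpg" and illustrative parameter values, showing that the returned augmenter list can also be applied to a single image outside of ImageIter (in recent MXNet versions each augmenter takes and returns an NDArray):

import mxnet as mx

augs = mx.image.CreateAugmenter(data_shape=(3, 224, 224),
                                resize=256, rand_crop=True,
                                rand_mirror=True, brightness=0.1)
img = mx.image.imread("dog.jpg").astype('float32')
for aug in augs:
    img = aug(img)      # resize_short, random crop, random flip, cast, color jitter in turn
print(img.shape)        # (224, 224, 3) -- still HWC; transpose before feeding a network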