Deep Learning Study - CS231n (Lecture 5)


One last review before studying! A miracle morning with my friend Sanghyun, in a place with no temperature control.
The sunlight is bright, our skin is pale, and we have dark circles under our eyes......
The Miracle morning has begun, all for the sake of staying healthy. Oh dear, oh dear...

Lecture 5 - Convolutional Neural Networks


(1) History of Neural Networks


1. Perceptron (1957)


2. Adaline/Madaline (1960)


3. Backpropagation (1986)


4. Reinvigorated research in Deep Learning (2006)


5. First Strong Results of NNs (2012)

  • Speech Recognition, Image Recognition, ... - these introduced the first strong convolutional NNs and dramatically reduced errors & loss
(2) History of Image Processing


    1. Hubel & Wiesel


    Topographical mapping in the cortex : nearby cells in the cortex represent nearby regions in the visual field
    Discovery of Hierarchical Organization : simple cells respond to oriented edges, and complex cells higher up (which pool over simple cells) respond to orientation and movement

    2. Neocognitron - Fukushima (1980)


    first example of a network architecture/model that used the idea of simple and complex cells (Hubel & Wiesel).
    Alternating layers of simple and complex cells - simple cells (with modifiable parameters) and complex cells on top (pooling, making the response invariant to minor variations from the simple cells)

    3. Gradient-based learning applied to Document Recognition - LeCun et al., (1998)


    applied backpropagation and gradient-based learning to train convolutional NNs, which did well at recognizing documents, especially zip codes (it was actually used by postal services!)
    However, it could not yet handle more complex data.

    4. "Alexnet" ImageNet Classification with Deep Concolutional NNs (2012)


    5. Today

  • ConvNets are used everywhere! Detecting images, and segmentation
    (labeling and outlining every pixel)
  • ex) face recognition, video classification, pose recognition (joints, ...), street sign recognition, aerial maps (segmenting streets, buildings), image captioning (writing a sentence description of an image), artwork created by NNs

    (3) Convolutional Layer


    1. Fully Connected Layer



    input - a 32x32x3 image stretched out into a 3072x1 vector
    weight matrix - 10x3072 weights
    activation/output layer - 1x10 (see the sketch below)
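
    A minimal NumPy sketch of this fully connected layer; the image, weights, and bias below are random placeholders, used only to show the shapes:

        import numpy as np

        # hypothetical 32x32x3 input image (CIFAR-10 sized); random values as a stand-in
        x_image = np.random.rand(32, 32, 3)

        # stretch the image out into a 3072x1 column vector (32*32*3 = 3072)
        x = x_image.reshape(3072, 1)

        # 10x3072 weight matrix and one bias per output neuron (random placeholders)
        W = np.random.randn(10, 3072)
        b = np.random.randn(10, 1)

        # fully connected layer: each of the 10 outputs is a dot product with the whole image
        activation = W.dot(x) + b
        print(activation.shape)   # (10, 1) -> the 10 output activations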

    2. Convolutional Layer


    1. instead of stretching the image out into one long vector, we keep the spatial dimensions.


    Which one is better? Isn't the difference just structural?
  • more efficient
  • input - dimensions unchanged, a 32x32x3 image (is the 3 for RGB?)
  • filter - filters always extend the full depth of the input volume, e.g. 5x5x3
  • 2. Dot product of this filter and a chunk of the image



  • first - overlay the filter on top of a spatial location of the image

  • second - do the dot product: multiply each element of the filter with the corresponding element at that spatial location of the image

  • number of multiplications : 5 x 5 x 3

  • (W transpose * X) + bias, i.e. w^T x + b (see the small NumPy sketch after the questions below)
    		Questions
    1. When we do the dot product, do we turn the 5x5x3 filter into a vector?
    Yes, you can think of it as plopping the filter on and doing the element-wise multiplication at each location, but stretching out the filter and stretching out the chunk of the input volume gives you the same result.

      2. Any intuition for why this is a W transpose?
      No deep intuition. It is just notation that makes the math work out as a dot product between 1D vectors.

      3. Is W not 5x5x3 but 75x1?
      Yes, the filter is stretched out before the dot-product multiplication.
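
    A small NumPy sketch of one such dot product, with a made-up 5x5x3 filter, bias, and image chunk; it checks that the element-wise multiply-and-sum equals the stretched-out w^T x + b form discussed above:

        import numpy as np

        # hypothetical 5x5x3 filter, bias, and 5x5x3 chunk of the image (random placeholders)
        w_filter = np.random.randn(5, 5, 3)
        chunk = np.random.rand(5, 5, 3)
        bias = 0.1  # made-up bias value

        # option 1: element-wise multiply and sum over the 5*5*3 = 75 positions
        out1 = np.sum(w_filter * chunk) + bias

        # option 2: stretch both into 75-dim vectors and take a dot product (w^T x + b)
        w = w_filter.reshape(-1)   # shape (75,)
        x = chunk.reshape(-1)      # shape (75,)
        out2 = w.dot(x) + bias

        print(np.isclose(out1, out2))  # True - both give the same single number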
      
  • 3. Overlaying the filter on top of the image - convolve (slide) it over all spatial locations


  • start from the upper-left corner and center the filter on top of every pixel in this input volume
  • each filter is looking for a certain type of template or concept in the input volume
  • Is this filter something like an activation function? Or otherwise just a linear layer? It is not a layer by itself - one layer is made up of multiple filters.
    Then is there a filter per class to be classified? ex) a filter for the probability the image is one class, a filter for the probability it is another class, ...
    Is the activation map 28x28x1 because the filter only slides where it fits inside the image, so 32 - 5 + 1 = 28? Is that right? Is that okay? Every element has a weight, but are some elements covered fewer times? Or is my understanding wrong... haha
    Do this for multiple filters! (see the sketch after this list)
  • number of activation maps = number of filters !
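
    A naive (slow but readable) sketch of this sliding, assuming stride 1 and no padding; with a 32x32x3 input and 5x5x3 filters each activation map comes out 28x28, and stacking 6 filters gives 28x28x6 (all values are random placeholders, and convolve_naive is just my own helper name):

        import numpy as np

        def convolve_naive(image, filters, biases):
            """Slide each filter over every spatial location (stride 1, no padding)."""
            H, W, _ = image.shape
            num_filters, F, _, _ = filters.shape
            out_h, out_w = H - F + 1, W - F + 1           # e.g. 32 - 5 + 1 = 28
            maps = np.zeros((out_h, out_w, num_filters))  # one activation map per filter
            for k in range(num_filters):
                for i in range(out_h):
                    for j in range(out_w):
                        chunk = image[i:i+F, j:j+F, :]
                        maps[i, j, k] = np.sum(filters[k] * chunk) + biases[k]
            return maps

        image = np.random.rand(32, 32, 3)      # placeholder input volume
        filters = np.random.randn(6, 5, 5, 3)  # six 5x5x3 filters
        biases = np.random.randn(6)
        activation_maps = convolve_naive(image, filters, biases)
        print(activation_maps.shape)           # (28, 28, 6): number of maps = number of filters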
  • 4. Preview of Convnet



    Example of Activation Maps


  • top : a row of 5x5 filters
  • the filter in the red box is an oriented-edge template. Sliding it over an image gives white (high) values at the locations of edges (where this template is more strongly present in the image)
  • the filter slides over the image and computes a dot product at each location
  • Example of car image processing



    3. Spatial Dimensions - A Closer Look at Sliding


    7x7 image with 3x3 filter



    stride n - maybe easiest to think of as sliding the filter n cells at a time
    stride 1 : 5x5 output size
    stride 2 : 3x3 output size
    stride 3 : doesn't fit! asymmetrical outputs - not all designs are possible

    Output Sizes
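
    The content under this heading is presumably the standard output-size formula, output = (N - F) / stride + 1 (plus 2 x padding inside the parentheses when padding is used); a tiny helper (my own naming) that reproduces the stride examples above:

        def conv_output_size(N, F, stride, pad=0):
            """(N - F + 2*pad) / stride + 1; None if the filter doesn't fit evenly."""
            size = (N - F + 2 * pad) / stride + 1
            return int(size) if size == int(size) else None

        print(conv_output_size(7, 3, 1))  # 5  -> 5x5 output
        print(conv_output_size(7, 3, 2))  # 3  -> 3x3 output
        print(conv_output_size(7, 3, 3))  # None -> doesn't fit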



    Zero Padding


    if you pad your pixels - draw a border around the image to adjust the output! (I was curious about this earlier!)
    Maintains the output size and lets the filter be applied at the edges

    After padding it becomes 9x9,
    and applying a 3x3 filter to it outputs 7x7.
    	Question
        1. What's the actual number of outputs?
        In this case, 7x7x(number of filters)

        (wait, where did the depth go..)

        2. How does this connect to an input with depth?
        This example was drawn in 2D to keep it easy to see; with depth you just multiply in the depth as well.

        3. Do other ways than zero-padding exist?

        4. What about non-square images?
        
  • Shrinking the size quickly means we are losing information. That's not good :( (it's a bit like lowering the image quality).
    Padding is what solves this - it acts like a protective border that keeps the image from shrinking (there's a small np.pad check after these bullets).
  • padding size? whatever fits your model / image size & filter size
  • with padding - the filter can even be bigger than the image
  • Can the activation map be bigger than the image? Yes, that works - think of it like a puzzle!!!
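
    A quick NumPy check of the 7x7 example from above: padding a border of one zero pixel makes it 9x9, so a 3x3 filter at stride 1 keeps the output at 7x7 (the image values are placeholders):

        import numpy as np

        image = np.random.rand(7, 7)          # 2D example, as in the notes
        padded = np.pad(image, pad_width=1)   # zero-pad 1 pixel on each side -> 9x9
        print(padded.shape)                   # (9, 9)

        F, stride = 3, 1
        out = (padded.shape[0] - F) // stride + 1   # (9 - 3)/1 + 1 = 7
        print(out)                                  # 7 -> the 7x7 output size is maintained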
  • Calculating Output Volume Size



    Output size : 32x32 (spatial size maintained by zero-padding) x 10 (filters)
    Number of Parameters : ((5x5x3) + 1) x 10 = 760
    (5x5x3 weights + 1 bias) x 10 filters
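
    The same parameter count spelled out in code (just the arithmetic from the numbers above):

        filter_h, filter_w, in_depth = 5, 5, 3
        num_filters = 10

        params_per_filter = filter_h * filter_w * in_depth + 1   # 75 weights + 1 bias = 76
        total_params = params_per_filter * num_filters           # 76 * 10
        print(total_params)                                      # 760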

    1 x 1 Convolution Layers? YES!



    Does a 1x1 convolution still mean extracting features from multiple pixels? In 2D alone it would be meaningless.
    But it is meaningful in 3D, because of the depth!
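
    A sketch of why a 1x1 convolution is still meaningful: at every spatial position it takes a dot product across the full depth, mixing channels without looking at neighboring pixels. The 56x56x64 input and the 32 filters below are made-up placeholders:

        import numpy as np

        x = np.random.rand(56, 56, 64)     # placeholder input volume
        filters = np.random.randn(32, 64)  # 32 filters, each 1x1x64 (just a 64-vector)

        # at every spatial location, each 1x1 filter dots the 64-dim depth column
        out = np.einsum('hwc,kc->hwk', x, filters)
        print(out.shape)                   # (56, 56, 32): spatial size unchanged, depth 64 -> 32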

    (4) ConvNet


    Pooling = downsampling (e.g. max pooling)
  • using only a pooling filter, the output becomes smaller but still effectively contains the meaningful information.
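
    A minimal max-pooling sketch (2x2 window, stride 2) on a small example activation map, showing how the output shrinks while keeping the strongest responses (the values are my own example):

        import numpy as np

        amap = np.array([[1, 1, 2, 4],
                         [5, 6, 7, 8],
                         [3, 2, 1, 0],
                         [1, 2, 3, 4]], dtype=float)   # example activation map

        # 2x2 max pooling with stride 2: max of each non-overlapping 2x2 block
        pooled = amap.reshape(2, 2, 2, 2).max(axis=(1, 3))
        print(pooled)
        # [[6. 8.]
        #  [3. 4.]]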
    Advantage of convolutional NNs - they go from 2D to 2D, keeping the spatial structure, instead of flattening everything into a 1D context
    Things to think about before Lecture 6:
    Why do we use activation functions?