U-stage day 7

1. Lecture Content


[DL Basic] Optimization


  • Bagging vs Boosting
    Bagging: multiple models are trained on bootstrapped subsets of the data and their predictions are aggregated, e.g. in an ensemble.
    Boosting: models are trained sequentially, each focusing on the training samples that earlier models found hard to classify.

  • Gradient Descent

    First-order iterative optimization algorithm for finding a local minimum of a differentiable function.

  • Gradient Descent Methods
    Stochastic gradient descent, Momentum, Nesterov accelerated gradient, Adagrad, Adadelta, RMSprop, Adam
    I would like to learn more about Adam.
    Adam (Adaptive Moment Estimation) leverages running averages of both past gradients and past squared gradients.
    Adam effectively combines momentum with an adaptive learning-rate approach (see the update-rule sketch at the end of this list).

  • Regularization
    1) Early stopping
    2) Parameter norm penalty
    3) Data augmentation
    4) Noise robustness: add random noise to the inputs or weights.
    5) Label smoothing
       Mixup constructs augmented training examples by blending both the inputs and the labels of two randomly selected training examples (see the Mixup sketch at the end of this list).
       CutMix constructs augmented training examples by cutting a patch from one input and pasting it into another, with the labels of the two randomly selected examples mixed into soft labels.
    6) Dropout: in each forward pass, randomly set some neurons to zero.
    7) Batch normalization: compute the empirical mean and variance independently for each dimension (layer) and normalize with them. There are several other variants of normalization as well.
  • Practice
    Content of Required Assignment 2
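
  • Adam update sketch
    A minimal sketch of the Adam update referenced above, assuming the commonly used default hyperparameters (lr, beta1, beta2, eps); these values are not from the lecture notes.

        import numpy as np

        def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
            # running average of gradients (the momentum part)
            m = beta1 * m + (1 - beta1) * grad
            # running average of squared gradients (the adaptive scaling part)
            v = beta2 * v + (1 - beta2) * grad ** 2
            # bias-corrected estimates; t is the 1-based step count
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
            # step in the momentum direction, scaled per parameter
            param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
            return param, m, v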
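
  • Mixup sketch
    A minimal sketch of Mixup as referenced under label smoothing; the Beta-distribution parameter alpha is an assumed hyperparameter, and y1/y2 are taken to be one-hot label vectors. CutMix differs in that a rectangular patch of one input is pasted into the other instead of blending the whole input.

        import numpy as np

        def mixup(x1, y1, x2, y2, alpha=0.2):
            # mixing coefficient sampled from a Beta distribution
            lam = np.random.beta(alpha, alpha)
            # convex combination of the two inputs
            x = lam * x1 + (1.0 - lam) * x2
            # soft label mixed with the same weight
            y = lam * y1 + (1.0 - lam) * y2
            return x, y
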
2. Assignment Process / Results Summary

  • Define MLP model
  • import torch.nn as nn

    class Model(nn.Module):
        def __init__(self,name='mlp',xdim=1,hdims=[16,16],ydim=1):
            super(Model, self).__init__()
            self.name = name
            self.xdim = xdim
            self.hdims = hdims
            self.ydim = ydim
    
            self.layers = []
            prev_hdim = self.xdim
            for hdim in self.hdims:
                self.layers.append(nn.Linear(
                    prev_hdim, hdim, bias = True
                ))
                self.layers.append(nn.Tanh())  # activation
                prev_hdim = hdim
            # Final layer (without activation)
            self.layers.append(nn.Linear(prev_hdim,self.ydim,bias=True))
    
            # Concatenate all layers 
            self.net = nn.Sequential()
            for l_idx,layer in enumerate(self.layers):
                layer_name = "%s_%02d"%(type(layer).__name__.lower(),l_idx)
                self.net.add_module(layer_name,layer)
    
            self.init_param() # initialize parameters
        
        def init_param(self):
            for m in self.modules():
                if isinstance(m,nn.Conv2d): # init conv
                    nn.init.kaiming_normal_(m.weight)
                    nn.init.zeros_(m.bias)
                elif isinstance(m,nn.Linear): # init dense
                    nn.init.kaiming_normal_(m.weight)
                    nn.init.zeros_(m.bias)
        
        def forward(self,x):
            return self.net(x)
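
  • Training sketch
    A minimal sketch of how this Model could be trained with the Adam optimizer covered in the lecture; the synthetic data, input/output dimensions, batch size, and learning rate are placeholder assumptions, not values from the assignment.

        import torch
        import torch.nn as nn
        from torch.utils.data import DataLoader, TensorDataset

        # synthetic stand-in data (shapes chosen for illustration only)
        x_train = torch.randn(512, 784)
        y_train = torch.randint(0, 10, (512,))
        loader = DataLoader(TensorDataset(x_train, y_train), batch_size=64, shuffle=True)

        model = Model(name='mlp', xdim=784, hdims=[256, 256], ydim=10)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        criterion = nn.CrossEntropyLoss()

        model.train()
        for epoch in range(3):
            for x, y in loader:
                optimizer.zero_grad()
                loss = criterion(model(x), y)  # forward pass + cross-entropy loss
                loss.backward()                # backpropagation
                optimizer.step()               # Adam parameter update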

3. Peer Session


    Sharing what we learned


    1. Comments on assignment code

  • Skipped this time, given the nature of Required Assignment 2

    2. Discussion of the lecture content and advanced topics


    [DL Basic] Optimization

    3. Comments on papers


    1. VGG
    2. Batch Normalization

4. Learning Retrospective


    I listened to 李ゴヨン's talk about Git.
    Git is certainly hard to understand, but I could see why it is an essential tool for developers.
    Through the paper reviews, I came to understand how one model leads to the next.
    I am scheduled to present my own paper on day 9; since it is my first time, I will take my time going over it.