Deeper Look at GD


Learning Objectives


Understand gradient descent in more detail.

Key Concepts


Hypothesis function
Mean squared error (MSE)
Gradient descent

The Optimal Model


  • H(x) = x

  • W = 1 is the best value for this data.
  • ! pip install  torchvision
    
    import numpy as np
    import torch
    
    
    # Dummy data : Input = Output
    
    x_train = torch.FloatTensor([[1], [2], [3]])
    y_train = torch.FloatTensor([[1], [2], [3]])
    
    # Simpler hypothesis function: H(x) = W * x
    
    W = torch.zeros(1, requires_grad = True)
    
    # b = torch.zeros(1, requires_grad = True)  # the bias b is omitted in this simpler model
    
    hypothesis = x_train * W
    
    
    print(hypothesis)
    

    Cost function: Intuition
    The cost function lets us check how well the model performs by quantifying the difference between the model's predictions and the actual training data.

  • W = 1, cost = 0

  • The farther W is from 1, the higher the cost.

  • The lower the cost, the better the model has learned.

  • Cost function: MSE
    cost = torch.mean((hypothesis - y_train) ** 2)
    
    print(cost)
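
    To see this numerically, here is a minimal sketch (reusing the x_train and y_train defined above; w_value, pred, and cost_w are just illustrative names) that evaluates the MSE cost for a few values of W. The cost is 0 at W = 1 and grows as W moves away from 1 in either direction.

    for w_value in [0.0, 0.5, 1.0, 1.5, 2.0]:
        pred = x_train * w_value                    # H(x) for this candidate W
        cost_w = torch.mean((pred - y_train) ** 2)  # MSE against the training data
        print('W = {:.1f}, cost = {:.4f}'.format(w_value, cost_w.item()))
    # Costs: 4.6667, 1.1667, 0.0000, 1.1667, 4.6667 -> smallest exactly at W = 1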

    Gradient Descent: Intuition


  • Descend along the cost curve.

  • The steeper the slope, the farther we are from the minimum.

  • Gradient computation

  • This is how the cost is driven down to its minimum.

  • If the gradient is negative, increase W.

  • If the gradient is positive, decrease W.

  • A steep slope means W is far from the optimum and the cost is high, so take a large step.

  • The gentler the slope, the closer the cost is to 0, so change W only a little.
  • Gradient Descent: The Math
    Simple differentiation of the MSE cost, worked out below.
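
    For this simple model the differentiation is just the chain rule on the squared error. A short worked version in the notation used above (m training pairs (x_i, y_i), hypothesis H(x) = W·x, learning rate α):

    cost(W)   = (1/m) · Σ_i (W·x_i − y_i)²
    ∂cost/∂W  = (2/m) · Σ_i (W·x_i − y_i) · x_i
    W        := W − α · ∂cost/∂W

    For the data above (x_i = y_i), at W = 2 the gradient is 2·mean((2x − x)·x) = 2·(14/3) > 0, so W is decreased; at W = 0 it is −2·(14/3) < 0, so W is increased, matching the intuition bullets. This is exactly what gradient = 2 * torch.mean((W * x_train - y_train) * x_train) in the next code block computes.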

    Gradient Descent: Code
  • α is the learning rate (lr).
    lr = 0.1
    W = torch.tensor([1.0, 2.0, 3.0])  # plain leaf tensor (no requires_grad)
    gradient = 2 * torch.mean((W * x_train - y_train) * x_train)
    W -= lr * gradient
    
    # Error when W is created with requires_grad=True: the in-place update W -= ... fails with
    # RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
    # Fix: define W as a plain tensor (as above), or see the torch.no_grad() sketch below.
    
    print(W)
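
    The RuntimeError above comes from doing an in-place update on a leaf tensor that requires grad. A minimal sketch of the autograd route instead (reusing x_train, y_train and lr from above): let cost.backward() fill W.grad, check that it matches the manual formula, and perform the update inside torch.no_grad(), which is the standard way to modify such a tensor by hand.

    W = torch.zeros(1, requires_grad=True)
    cost = torch.mean((x_train * W - y_train) ** 2)
    cost.backward()                 # autograd fills W.grad with d(cost)/dW

    manual_gradient = 2 * torch.mean((W * x_train - y_train) * x_train)
    print(W.grad, manual_gradient)  # both equal the analytic gradient (about -9.3333 here)

    # W -= lr * W.grad              # outside no_grad() this raises the RuntimeError above
    with torch.no_grad():           # gradient tracking is paused, so the in-place update is allowed
        W -= lr * W.grad
    print(W)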

    GD with torch.optim


  • Gradient descent with torch.optim:
  • Define the optimizer.
  • optimizer.zero_grad() resets the gradients to 0.
  • cost.backward() computes the gradient of the cost for each variable.
  • optimizer.step() performs the gradient descent update.
    # Set up the optimizer (it needs the learnable variables and the learning rate)
    W = torch.zeros(1, requires_grad=True)  # must be a leaf tensor with requires_grad=True
    optimizer = torch.optim.SGD([W], lr=0.15)
    
    # Compute H(x) and the cost from it
    hypothesis = x_train * W
    cost = torch.mean((hypothesis - y_train) ** 2)
    
    # backward() stores W's gradient; step() then updates W's value according to that gradient
    
    optimizer.zero_grad() # reset the gradients of every learnable variable registered in the optimizer to 0
    
    cost.backward()  # differentiate the cost function and fill in each variable's gradient
    optimizer.step() # run gradient descent with the stored gradients
    
    print(W)

    In PyTorch, if W is not a leaf tensor with requires_grad=True, the optimizer cannot update it either.
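
    Why zero_grad() is needed: PyTorch accumulates gradients in .grad across successive backward() calls, so they must be cleared before every update. A minimal sketch (reusing x_train and y_train from above):

    W = torch.zeros(1, requires_grad=True)

    cost = torch.mean((x_train * W - y_train) ** 2)
    cost.backward()
    print(W.grad)    # gradient from the first backward pass

    cost = torch.mean((x_train * W - y_train) ** 2)
    cost.backward()
    print(W.grad)    # without zeroing, the second gradient is added on top of the first

    W.grad.zero_()   # clear the accumulated gradient, as optimizer.zero_grad() does for every registered tensor
    print(W.grad)    # back to zero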

    Deeper Look at GD (Full code)

    ! pip install torchvision
    
    ! pip install --upgrade pip
    
    import numpy as np
    import torch
    
    # Data
    
    x_train = torch.FloatTensor([[1], [2], [3]])
    y_train = torch.FloatTensor([[1], [2], [3]])
    
    # Initialize the model
    
    W = torch.zeros(1)
    
    # Set the learning rate
    
    lr = 0.1
    
    nb_epochs = 10  # number of passes over the training data
    
    for epoch in range(nb_epochs + 1):
        # While training, W converges to 1 and the cost shrinks
        
        # Compute H(x)
        
        hypothesis = x_train * W
        
        # Compute the cost and the gradient
        
        cost = torch.mean((hypothesis - y_train) ** 2)
        gradient = torch.sum((W * x_train - y_train) * x_train)  # constant factors are absorbed into lr
        
        print('Epoch {:4d}/{} W: {:.3f}, Cost: {:.6f}'.format(
            epoch, nb_epochs, W.item(), cost.item()
        ))
        
        # Improve H(x) by updating W with the gradient
        
        W -= lr * gradient
        
        
    Full Code with torch.optim
    
    # Data
    
    a_train = torch.FloatTensor([[1], [2], [3]])
    b_train = torch.FloatTensor([[1], [2], [3]])
    
    # Initialize the model (requires_grad=True so the optimizer can update w)
    
    w = torch.zeros(1, requires_grad=True)
    
    # Set up the optimizer with the learnable variable and the learning rate
    
    optimizer = torch.optim.SGD([w], lr=0.15)
    
    nb_epochs = 20  # number of passes over the training data
    
    for epoch in range(nb_epochs + 1):
        # While training, w converges to 1 and the cost shrinks
        
        # Compute H(x)
        
        hypothesis = a_train * w
        
        # Compute the cost
        
        cost = torch.mean((hypothesis - b_train) ** 2)
        
        print('Epoch {:4d}/{} w: {:.3f}, Cost: {:.6f}'.format(
            epoch, nb_epochs, w.item(), cost.item()
        ))
        
        # Improve H(x): reset the gradients, backprop the cost, take a gradient descent step
        
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()