Building a Neural Network



Understanding the structure of deep learning by coding it directly

First, load the MNIST dataset that Keras provides.

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

# Load the MNIST data. If it has not been downloaded yet, the download happens automatically.
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train_norm, x_test_norm = x_train / 255.0, x_test / 255.0
x_train_reshaped = x_train_norm.reshape(-1, x_train_norm.shape[1]*x_train_norm.shape[2])
x_test_reshaped = x_test_norm.reshape(-1, x_test_norm.shape[1]*x_test_norm.shape[2])
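The preprocessing above can be tried without downloading anything. The following is a minimal sketch on made-up data: `fake_images` is a hypothetical stand-in for `x_train` (same dtype and 28x28 shape), not actual MNIST content.

```python
import numpy as np

# Hypothetical stand-in for x_train: 3 grayscale images of 28x28 uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Scale pixel values from [0, 255] to [0.0, 1.0].
fake_norm = fake_images / 255.0

# Flatten each 28x28 image into one row of 784 values.
fake_flat = fake_norm.reshape(-1, fake_norm.shape[1] * fake_norm.shape[2])

print(fake_flat.shape)  # (3, 784)
print(fake_flat.min() >= 0.0, fake_flat.max() <= 1.0)  # True True
```

Passing `-1` as the first dimension lets NumPy infer the number of samples, so the same line works for both the training and the test split.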

In a neural network, each input value is multiplied by its weight. We use matrix operations to try this out in a simple way.

y = WX + B

# For testing, take the first 5 samples of x_train_reshaped.
X = x_train_reshaped[:5]
print(X.shape)

(5, 784)


weight_init_std = 0.1
input_size = 784
hidden_size=50

# Initialize the weights with random values.
W1 = weight_init_std * np.random.randn(input_size, hidden_size)  
# Set the bias b to 0 for now.
b1 = np.zeros(hidden_size)

a1 = np.dot(X, W1) + b1   # hidden-layer output (before activation)

print(W1.shape)
print(b1.shape)
print(a1.shape)

(784, 50)
(50,)
(5, 50)
The code above computes y = WX + b (written as np.dot(X, W1) + b1, with one sample per row). Let's look at the result a1.

a1[0]

array([ 0.48653043, 1.21548756, 0.52309409, 0.93320544, -0.19513847,
1.28226922, 0.21049477, 0.24713435, 1.50274588, 0.31687977,
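Up to this point the computation is the same as a perceptron. What distinguishes a neural network from a perceptron is that an activation function is applied to the result. The activation function used here is sigmoid.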

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  


z1 = sigmoid(a1)
print(z1[0])

[0.61928875 0.77126847 0.62787098 0.71772515 0.4513696 0.7828358
0.55243025 0.56147104 0.81798366 0.57856364 0.78485329 0.48944915
0.34330287 0.3729736 0.40342849 0.38269088 0.46182855 0.72459828

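The sigmoid function squashes the values computed above into the range 0 to 1. Applying an activation function adds non-linearity, which appears to give the network more expressive power.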
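One way to see why the non-linearity matters: without an activation between them, two affine layers collapse into a single affine layer. The sketch below illustrates this on small random matrices. All names (`X_toy`, `Wa`, `Wb`, and so on) are made up for the illustration and are not part of the MNIST code.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
X_toy = rng.standard_normal((5, 4))
Wa, ba = rng.standard_normal((4, 3)), rng.standard_normal(3)
Wb, bb = rng.standard_normal((3, 2)), rng.standard_normal(2)

# Two affine layers with nothing in between ...
two_linear = np.dot(np.dot(X_toy, Wa) + ba, Wb) + bb

# ... are equivalent to one affine layer with merged parameters.
W_merged = np.dot(Wa, Wb)
b_merged = np.dot(ba, Wb) + bb
one_linear = np.dot(X_toy, W_merged) + b_merged
print(np.allclose(two_linear, one_linear))  # True

# With a sigmoid between the layers, the merged layer no longer reproduces the output.
with_sigmoid = np.dot(sigmoid(np.dot(X_toy, Wa) + ba), Wb) + bb
print(np.allclose(with_sigmoid, one_linear))
```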

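We now stack the forward computation built so far into two layers and write the code to run it.

First, write a function that performs the forward pass for a single layer.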


def affine_layer_forward(X, W, b):
    y = np.dot(X, W) + b
    cache = (X, W, b)
    return y, cache

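The following code initializes the weights W randomly and the biases b to 0.

Note that for matrix multiplication the shapes must be compatible: the number of columns of the left matrix must equal the number of rows of the right matrix.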
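The shape rule can be checked directly with placeholder arrays. This is a small sketch using zero-filled arrays with the same shapes as the layers in this post. The names `X_demo`, `W_demo`, and `b_demo` are illustrative only.

```python
import numpy as np

X_demo = np.zeros((5, 784))    # 5 samples, 784 features each
W_demo = np.zeros((784, 50))   # 784 inputs -> 50 hidden units
b_demo = np.zeros(50)          # one bias per hidden unit

# (5, 784) x (784, 50) -> (5, 50); the bias is broadcast across the 5 rows.
out = np.dot(X_demo, W_demo) + b_demo
print(out.shape)  # (5, 50)

# Swapping the weight dimensions breaks the inner-dimension match.
try:
    np.dot(X_demo, np.zeros((50, 784)))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```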

input_size = 784
hidden_size = 50
output_size = 10

W1 = weight_init_std * np.random.randn(input_size, hidden_size)
b1 = np.zeros(hidden_size)
W2 = weight_init_std * np.random.randn(hidden_size, output_size)
b2 = np.zeros(output_size)

a1, cache1 = affine_layer_forward(X, W1, b1)
z1 = sigmoid(a1)
a2, cache2 = affine_layer_forward(z1, W2, b2)    # z1 becomes the input to the second layer

print(a2[0])

[ 0.00249933 0.01344503 -0.42532385 0.05259049 -0.70140225 0.58711998
-0.49198317 -0.30481471 0.30129486 0.56424935]
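Here we apply softmax. Softmax returns values between 0 and 1, and all returned values add up to 1, so the output can be read as a probability distribution over the classes.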

def softmax(x):
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T 

    x = x - np.max(x)  # guard against overflow
    return np.exp(x) / np.sum(np.exp(x))

y_hat = softmax(a2)
y_hat[0]

array([0.09568883, 0.09674197, 0.0623821 , 0.10060408, 0.04733264,
0.17169545, 0.05835932, 0.07037145, 0.12901093, 0.16781323])
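The two properties of softmax mentioned here (each row sums to 1, and subtracting the maximum avoids overflow without changing the result) can be verified on a tiny input. The function `softmax_rows` below is a simplified variant written for this sketch and assumes a 2-D input. It is not the `softmax` defined in this post.

```python
import numpy as np

def softmax_rows(x):
    # Subtract the row-wise maximum so np.exp never sees large positive values.
    shifted = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])  # second row would overflow a naive exp
probs = softmax_rows(logits)

print(probs.sum(axis=1))                 # each row sums to 1
print(np.allclose(probs[0], probs[1]))   # True: adding a constant to a row does not change softmax
```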
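Once softmax has produced its values, the class with the largest value is taken as the prediction. During training, however, a neural network needs a loss value to correct the weights computed earlier, and a loss function provides it. Here we use cross-entropy.
Cross-entropy becomes smaller the more similar two probability distributions are. Feeding the softmax output into the cross-entropy function measures how far the prediction is from the correct answer. It can be viewed as a single value expressing how uncertain the model is about the correct answer.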

To use cross-entropy, the labels must be one-hot encoded, that is, represented with 0s and 1s.

def _change_one_hot_label(X, num_category):
    T = np.zeros((X.size, num_category))
    for idx, row in enumerate(T):
        row[X[idx]] = 1
        
    return T

Y_digit = y_train[:5]
t = _change_one_hot_label(Y_digit, 10)
t

array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
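As an aside, the same one-hot matrix can be built without an explicit loop by indexing an identity matrix. This is an alternative sketch, not the approach used in this post. The function name `one_hot` is made up, and the labels are the five classes visible in the output above.

```python
import numpy as np

def one_hot(labels, num_category):
    # Row i of the identity matrix is the one-hot vector for class i,
    # so fancy indexing with the label array picks one row per sample.
    return np.eye(num_category)[labels]

labels_demo = np.array([5, 0, 4, 1, 9])
t_demo = one_hot(labels_demo, 10)

print(t_demo.shape)             # (5, 10)
print(t_demo.argmax(axis=1))    # [5 0 4 1 9]
```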
Define the loss according to the cross-entropy formula and compute it.


def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
        
    # If the targets are one-hot vectors, convert them to class indices
    if t.size == y.size:
        t = t.argmax(axis=1)
             
    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), t])) / batch_size

Loss = cross_entropy_error(y_hat, t)
Loss

2.2142990931042563
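A useful reference point for reading this number: a model that assigns the same probability to all 10 classes has a cross-entropy of ln(10), about 2.30, so a loss in this range is what one would expect from untrained weights. The sketch below works this out by hand. The probability vectors are invented for the illustration.

```python
import numpy as np

num_classes = 10
true_class = 5  # arbitrary choice for the illustration

# Uniform guess: every class gets probability 0.1.
uniform = np.full(num_classes, 1.0 / num_classes)
loss_uniform = -np.log(uniform[true_class])
print(loss_uniform, np.log(num_classes))  # both are ln(10)

# Confident and correct guess: 0.91 on the true class, 0.01 elsewhere.
confident = np.full(num_classes, 0.01)
confident[true_class] = 0.91
loss_confident = -np.log(confident[true_class])
print(loss_confident < loss_uniform)  # True
```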

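Next, compute how the error L changes as the parameters W change. For softmax combined with cross-entropy, the gradient of the loss with respect to the second-layer output a2 is (y_hat - t) divided by the batch size.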


batch_num = y_hat.shape[0]
dy = (y_hat - t) / batch_num
dy

array([[ 0.01913777, 0.01934839, 0.01247642, 0.02012082, 0.00946653,
-0.16566091, 0.01167186, 0.01407429, 0.02580219, 0.03356265],
[https://deepnotes.io/softmax-crossentropy]
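The expression (y_hat - t) / batch size can be sanity-checked against a finite-difference gradient. The following is a rough sketch on a tiny random example. The helper names `softmax_rows` and `mean_cross_entropy` and the toy shapes are assumptions made for this check, not code from the post.

```python
import numpy as np

def softmax_rows(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mean_cross_entropy(a, t):
    # Cross-entropy of softmax(a) against one-hot targets t, averaged over the batch.
    y = softmax_rows(a)
    return -np.sum(t * np.log(y)) / a.shape[0]

rng = np.random.default_rng(0)
a_toy = rng.standard_normal((4, 3))      # 4 samples, 3 classes
t_toy = np.eye(3)[[0, 2, 1, 2]]          # one-hot targets

# Analytic gradient with respect to the logits.
analytic = (softmax_rows(a_toy) - t_toy) / a_toy.shape[0]

# Central finite differences, one logit at a time.
eps = 1e-5
numeric = np.zeros_like(a_toy)
for i in range(a_toy.shape[0]):
    for j in range(a_toy.shape[1]):
        a_plus, a_minus = a_toy.copy(), a_toy.copy()
        a_plus[i, j] += eps
        a_minus[i, j] -= eps
        numeric[i, j] = (mean_cross_entropy(a_plus, t_toy)
                         - mean_cross_entropy(a_minus, t_toy)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-7))  # True
```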
Multiplying the sigmoid output z1 (transposed) by the gradient dy gives the gradient of the loss with respect to W2.


dW2 = np.dot(z1.T, dy)    
dW2

Compute the gradients for all of the parameters W1, b1, W2, and b2.

dW2 = np.dot(z1.T, dy)
db2 = np.sum(dy, axis=0)

Because a sigmoid is applied in the hidden layer, the gradient of the activation function must also be taken into account.

def sigmoid_grad(x):
    return (1.0 - sigmoid(x)) * sigmoid(x)

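dz1 = np.dot(dy, W2.T)           # gradient flowing back into the sigmoid output z1
da1 = sigmoid_grad(a1) * dz1     # gradient with respect to the pre-activation a1
dW1 = np.dot(X.T, da1)
db1 = np.sum(da1, axis=0)        # the bias gradient sums da1 (not dz1) over the batch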
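The derivative formula sigmoid(x) * (1 - sigmoid(x)) can be checked numerically. This is a small self-contained sketch that redefines both functions so it runs on its own. The sample points are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    return (1.0 - sigmoid(x)) * sigmoid(x)

x_pts = np.linspace(-4.0, 4.0, 9)
eps = 1e-5

# Central finite difference approximation of d(sigmoid)/dx.
numeric = (sigmoid(x_pts + eps) - sigmoid(x_pts - eps)) / (2 * eps)

print(np.allclose(sigmoid_grad(x_pts), numeric))  # True
print(sigmoid_grad(0.0))  # 0.25
```

The derivative peaks at 0.25 (at x = 0) and shrinks toward 0 for large positive or negative inputs, so gradients passing through a sigmoid are always scaled down by at least a factor of four.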

Write a function that updates the parameters, taking the learning rate into account.

learning_rate = 0.1

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 = W1 - learning_rate*dW1
    b1 = b1 - learning_rate*db1
    W2 = W2 - learning_rate*dW2
    b2 = b2 - learning_rate*db2
    return W1, b1, W2, b2
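The update rule `param = param - learning_rate * grad` is plain gradient descent. To see it converge in isolation, here is a toy sketch that minimizes the one-dimensional function f(w) = (w - 3)^2. The function and its target value 3 are invented for the illustration.

```python
# Minimize f(w) = (w - 3)^2 with the same update rule as update_params.
learning_rate = 0.1
w = 0.0

for step in range(100):
    grad = 2 * (w - 3.0)             # derivative of (w - 3)^2
    w = w - learning_rate * grad     # gradient descent step

print(abs(w - 3.0) < 1e-6)  # True
```

Each step multiplies the distance to the minimum by 0.8 here. A learning rate that is too large would make that factor exceed 1 in magnitude and the iteration would diverge instead.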

Let's write the backward-pass function and run one full forward and backward cycle.

def affine_layer_backward(dy, cache):
    X, W, b = cache
    dX = np.dot(dy, W.T)
    dW = np.dot(X.T, dy)
    db = np.sum(dy, axis=0)
    return dX, dW, db

# Initialize the parameters
W1 = weight_init_std * np.random.randn(input_size, hidden_size)
b1 = np.zeros(hidden_size)
W2 = weight_init_std * np.random.randn(hidden_size, output_size)
b2 = np.zeros(output_size)

# Forward Propagation
a1, cache1 = affine_layer_forward(X, W1, b1)
z1 = sigmoid(a1)
a2, cache2 = affine_layer_forward(z1, W2, b2)

# Inference and loss computation
y_hat = softmax(a2)
t = _change_one_hot_label(Y_digit, 10)   # one-hot encode the correct labels
Loss = cross_entropy_error(y_hat, t)

print(y_hat)
print(t)
print('Loss: ', Loss)
        
dy = (y_hat - t) / X.shape[0]
dz1, dW2, db2 = affine_layer_backward(dy, cache2)
da1 = sigmoid_grad(a1) * dz1
dX, dW1, db1 = affine_layer_backward(da1, cache1)

# Update the parameters by gradient descent
learning_rate = 0.1
W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)

[[0.07850856 0.05053692 0.22863595 0.08858019 0.10341642 0.07413686
  0.15236807 0.08707045 0.06696473 0.06978184]
 [0.07063963 0.05589779 0.22244935 0.09845158 0.09501542 0.07559829
  0.15462916 0.08653334 0.07542828 0.06535716]
 [0.06562364 0.07036418 0.1837287 0.09718998 0.12400164 0.07974449
  0.13354298 0.09022088 0.09394088 0.06164263]
 [0.06374149 0.06032105 0.19419075 0.1038898 0.10028428 0.06944635
  0.18646018 0.08004788 0.08655625 0.05506198]
 [0.0527867 0.055662 0.20487245 0.1102281 0.12172459 0.06421484
  0.18759931 0.07394098 0.0826122 0.04635883]]
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
Loss: 2.6437769325086995

Ideally, the softmax output should resemble the one-hot distribution of the correct label.

We now wrap the whole process above into a single function.

W1 = weight_init_std * np.random.randn(input_size, hidden_size)
b1 = np.zeros(hidden_size)
W2 = weight_init_std * np.random.randn(hidden_size, output_size)
b2 = np.zeros(output_size)

def train_step(X, Y, W1, b1, W2, b2, learning_rate=0.1, verbose=False):
    a1, cache1 = affine_layer_forward(X, W1, b1)
    z1 = sigmoid(a1)
    a2, cache2 = affine_layer_forward(z1, W2, b2)
    y_hat = softmax(a2)
    t = _change_one_hot_label(Y, 10)
    Loss = cross_entropy_error(y_hat, t)

    if verbose:
        print('---------')
        print(y_hat)
        print(t)
        print('Loss: ', Loss)
        
    dy = (y_hat - t) / X.shape[0]
    dz1, dW2, db2 = affine_layer_backward(dy, cache2)
    da1 = sigmoid_grad(a1) * dz1
    dX, dW1, db1 = affine_layer_backward(da1, cache1)
    
    W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)
    
    return W1, b1, W2, b2, Loss

X = x_train_reshaped[:5]
Y = y_train[:5]

# Run train_step five times.
for i in range(5):
    W1, b1, W2, b2, _ = train_step(X, Y, W1, b1, W2, b2, learning_rate=0.1, verbose=True)

Measure the accuracy using the parameters obtained after these 5 training steps.

def predict(W1, b1, W2, b2, X):
    a1 = np.dot(X, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    y = softmax(a2)

    return y

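# Run inference on the first 100 training samples.
X = x_train_reshaped[:100]
# Labels must come from the same split as X. The original post used
# y_test[:100] here, and the outputs printed below reflect that mismatch.
Y = y_train[:100]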
result = predict(W1, b1, W2, b2, X)
result[0]

array([0.15564588, 0.1326148 , 0.03664334, 0.05631935, 0.1105738 ,
0.19383691, 0.04018136, 0.0572725 , 0.03751335, 0.17939871])

def accuracy(W1, b1, W2, b2, x, y):
    y_hat = predict(W1, b1, W2, b2, x)
    y_hat = np.argmax(y_hat, axis=1)

    accuracy = np.sum(y_hat == y) / float(x.shape[0])
    return accuracy
acc = accuracy(W1, b1, W2, b2, X, Y)

t = _change_one_hot_label(Y, 10)
print(result[0])
print(t[0])
print(acc)

[0.15564588 0.1326148 0.03664334 0.05631935 0.1105738 0.19383691
0.04018136 0.0572725 0.03751335 0.17939871]
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
0.06
The accuracy is currently below 10%, so the training step clearly needs to be repeated many more times.
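For reference, the accuracy computation boils down to taking the argmax of each probability row and comparing it with the label. The sketch below does this on four hand-written predictions. The numbers are made up and have nothing to do with MNIST.

```python
import numpy as np

# Hypothetical predicted probabilities for 4 samples and 3 classes.
probs_demo = np.array([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.2, 0.5, 0.3]])
labels_demo = np.array([1, 0, 2, 2])

pred = np.argmax(probs_demo, axis=1)   # predicted classes: 1, 0, 2, 1
acc_demo = np.mean(pred == labels_demo)
print(acc_demo)  # 0.75
```

With 10 roughly balanced classes, guessing at random gives an accuracy of about 10%, which is a handy baseline when judging a barely trained model.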


def init_params(input_size, hidden_size, output_size, weight_init_std=0.01):

    W1 = weight_init_std * np.random.randn(input_size, hidden_size)
    b1 = np.zeros(hidden_size)
    W2 = weight_init_std * np.random.randn(hidden_size, output_size)
    b2 = np.zeros(output_size)

    print(W1.shape)
    print(b1.shape)
    print(W2.shape)
    print(b2.shape)
    
    return W1, b1, W2, b2

Instead of 5 iterations, we now repeat the training step 10,000 times.

# Hyperparameters
iters_num = 10000  # number of iterations; set as appropriate
train_size = x_train.shape[0]
batch_size = 100   # mini-batch size
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

# Iterations per epoch (integer division keeps this an int)
iter_per_epoch = max(train_size // batch_size, 1)

W1, b1, W2, b2 = init_params(784, 50, 10)

for i in range(iters_num):
    # Draw a mini-batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train_reshaped[batch_mask]
    y_batch = y_train[batch_mask]
    
    W1, b1, W2, b2, Loss = train_step(x_batch, y_batch, W1, b1, W2, b2, learning_rate=learning_rate, verbose=False)

    # Record training progress
    train_loss_list.append(Loss)
    
    # Compute accuracy once per epoch
    if i % iter_per_epoch == 0:
        print('Loss: ', Loss)
        train_acc = accuracy(W1, b1, W2, b2, x_train_reshaped, y_train)
        test_acc = accuracy(W1, b1, W2, b2, x_test_reshaped, y_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print("train acc, test acc | " + str(train_acc) + ", " + str(test_acc))
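One detail of the loop above: `np.random.choice(train_size, batch_size)` draws indices with replacement by default, so an "epoch" of 600 iterations does not visit every sample exactly once. A common alternative, sketched below on toy sizes, is to shuffle the indices once per epoch and slice consecutive batches. This is a different sampling scheme from the one in this post, shown only for comparison, and the sizes and the `seen` counter are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
train_size_demo, batch_size_demo = 1000, 100
seen = np.zeros(train_size_demo, dtype=int)   # how often each sample was used

# One epoch: shuffle all indices, then walk through them batch by batch.
perm = rng.permutation(train_size_demo)
for start in range(0, train_size_demo, batch_size_demo):
    batch_idx = perm[start:start + batch_size_demo]
    seen[batch_idx] += 1

print(seen.min(), seen.max())  # 1 1
```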

Finally, a graph makes the training process easier to follow. We visualize how the loss changes during training.


# Plot the loss curve (one point per training iteration)
x = np.arange(len(train_loss_list))
plt.plot(x, train_loss_list, label='train loss')
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.ylim(0, 3.0)
plt.legend(loc='best')
plt.show()
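Per-iteration mini-batch losses tend to be noisy, so the curve can be hard to read. One option is to smooth it with a moving average before plotting. The helper `moving_average` and the sample values below are assumptions made for this sketch.

```python
import numpy as np

def moving_average(values, window):
    # Average each run of `window` consecutive values; the output is
    # shorter than the input by window - 1 elements.
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode='valid')

noisy_loss = np.array([3.0, 1.0, 2.0, 0.0, 1.0, 0.0])  # made-up loss values
smooth_loss = moving_average(noisy_loss, 3)
print(smooth_loss)
```

The smoothed array could then be passed to `plt.plot` in place of the raw loss list, for example with a window of a few hundred iterations.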

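We have verified a simple neural network structure using nothing but matrix computations. MNIST appears to be well suited to neural networks, so even a simple network structure achieves good accuracy.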

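This post is based on educational material produced by Woozicheol of the AIFFEL research institute. (No email address is available, so I cannot share contact details separately.)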

Reference

More information on this topic (Building a Neural Network) can be found at https://velog.io/@hwanython/신경망-넘파이로-만들어-보기

This text may be freely shared or copied, provided the URL of this document is kept as a reference URL.

Collection and Share based on the CC Protocol
