Deep Learning Study - CS231n (Lecture 4)


Sunday morning, and the pandemic has crashed into my daily life again.
Things had been quiet for a while, and then a coworker on the night shift at the cafe where I work part-time got infected.
I had only come off self-isolation last week, and we ate without our KF94 masks on.
They said they never catch colds, and we even sat a little apart while eating, but they still got infected.
I don't know whether I passed it on, or whether they caught it somewhere else.
So I suddenly ended up working the night shift.
My life has finally settled into a healthy rhythm lately, and I want to keep it that way.
I got up early in the morning and went to a cafe to study.
Luckily I had already finished the assignments due today, so before my own cafe opened I sat in another cafe nearby and slowly reviewed the lecture material.

Review of Lectures 1 - 3


(1) Defining a Classifier

s = f(x;W) = Wx  
a function f
parameterized by weights W
that takes the data x as input
and outputs a vector of scores s, one score per class (in classification)
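
To ground this, a minimal numpy sketch of the linear score function; the class count and input dimension are just illustrative assumptions of mine (CIFAR-10-like shapes):

    import numpy as np

    num_classes, input_dim = 10, 3072                     # assumed toy shapes
    W = np.random.randn(num_classes, input_dim) * 0.001   # weights
    x = np.random.randn(input_dim)                        # one flattened input example

    s = W.dot(x)    # s = f(x; W) = Wx -> one score per class
    print(s.shape)  # (10,)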

(2) Loss Function (SVM)


Loss function : quantifies how happy/unhappy we are with the scores (output)
Li = Σ_{j ≠ yi} max(0, sj − syi + 1)   (multiclass SVM / hinge loss for one example)


combination of a data term and a regularization term : L = (1/N) Σ_i Li + λR(W)
the regularization term expresses how simple our model is (a preference for simpler models, for better generalization)
parameters
we want the parameters W that correspond to the lowest loss (minimizing the loss function)
for this, we must find the gradient of L with respect to W :

What we are after here is basically the slope of L with respect to W.
Roughly, if you picture an x-y plot with different values of W along the x-axis and L on the y-axis, we want the W where L is at its minimum.
It's not a difficult concept, but thinking about it in English trips me up a little.
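
To make the loss concrete, a minimal numpy sketch of the per-example SVM loss plus L2 regularization; the function names and the regularization strength lam are placeholders of my own:

    import numpy as np

    def svm_loss_single(scores, y):
        # scores: (C,) class scores for one example, y: index of the correct class
        margins = np.maximum(0, scores - scores[y] + 1.0)  # hinge: max(0, sj - syi + 1)
        margins[y] = 0                                      # don't count the correct class
        return margins.sum()

    def full_loss(scores_batch, ys, W, lam=1e-3):
        # data term averaged over N examples, plus L2 regularization lam * sum(W^2)
        data = np.mean([svm_loss_single(s, y) for s, y in zip(scores_batch, ys)])
        reg = lam * np.sum(W * W)
        return data + reg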

(3) Optimization


iteratively taking steps in the direction of steepest descent (the negative of the gradient),
to find the point with the lowest loss.
Why does the negative gradient lead toward the lowest loss? Because the gradient points in the direction of steepest increase, so stepping the opposite way decreases the loss the fastest.
gradient! it tells us how much each particular element affects the final output.
Gradient descent

Numerical Gradient : slow, approximate, easy to write
Analytic Gradient : fast, exact, error-prone :(
In practice -> Derive analytic gradient, check your implementation with numerical gradient
Weighing the strengths and weaknesses of each, we move forward in the best possible way!
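
A minimal sketch of this recipe on a toy quadratic loss; all names here (numerical_gradient, loss_fn) are placeholders of my own, just to illustrate "derive the analytic gradient, check it numerically, then descend":

    import numpy as np

    def numerical_gradient(loss_fn, W, h=1e-5):
        # slow but easy: centered finite differences, one weight at a time
        grad = np.zeros_like(W)
        it = np.nditer(W, flags=['multi_index'], op_flags=['readwrite'])
        for _ in it:
            idx = it.multi_index
            old = W[idx]
            W[idx] = old + h
            fp = loss_fn(W)
            W[idx] = old - h
            fm = loss_fn(W)
            W[idx] = old
            grad[idx] = (fp - fm) / (2 * h)
        return grad

    # toy quadratic loss so the example is self-contained
    loss_fn = lambda W: np.sum(W ** 2)
    analytic_grad = lambda W: 2 * W

    W = np.random.randn(3, 4)
    # gradient check: analytic vs numerical should agree almost exactly
    print(np.max(np.abs(numerical_gradient(loss_fn, W) - analytic_grad(W))))

    # vanilla gradient descent: step opposite the gradient, loss shrinks toward its minimum at W = 0
    lr = 0.1
    for _ in range(100):
        W -= lr * analytic_grad(W)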

Lecture 4 - Back Propagation & Neural Networks


(1) Back Propagation


1. Computational Graph



Using a graph to represent any function, where the nodes are the steps of computation we go through
  • score node : f = Wx
    hinge loss : computes the data loss Li
    Regularization : R(W)
    L : sum of the regularization term & the data term
  • with this, back propagation is possible!
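
As a warm-up for the example worked in the next section, a tiny sketch of the same idea on a toy function f(x, y, z) = (x + y) * z: the forward pass stores each node's intermediate value, and the backward pass applies the chain rule node by node. z = -4 matches the gradients quoted below; x and y are values I picked for illustration.

    # toy computational graph for f(x, y, z) = (x + y) * z
    x, y, z = -2.0, 5.0, -4.0

    # forward pass: one line per node, keeping every intermediate value
    q = x + y        # add node (intermediate variable q)
    f = q * z        # multiply node (final output)

    # backward pass: chain rule, starting from the output and going backwards
    df_dq = z             # local gradient of the multiply node w.r.t. q
    df_dz = q             # ... and w.r.t. z
    df_dx = df_dq * 1.0   # dq/dx = 1, so df/dx = df/dq * dq/dx
    df_dy = df_dq * 1.0   # dq/dy = 1, so df/dy = df/dq * dq/dy

    print(f, df_dx, df_dy, df_dz)   # -12.0 -4.0 -4.0 3.0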

2. Back Propagation


recursively using the chain rule in order to compute the gradient with respect to every variable in the computational graph


the input has to go through many transformations inside a deep learning model before it reaches the final output.

how it works - example (1)


represent the function f with a computational graph
first node : x + y
second node : (first node's output / intermediate value x + y) * z
given values for the variables, the computational graph gives every intermediate variable a name
q : intermediate variable after the plus node = x + y
f = qz
to the right of q : the gradients of q with respect to x and y
to the right of f : the gradients of f with respect to z and q
what we want - the gradients of f (the entire computation) with respect to x, y, z
Backprop - recursive application of the chain rule
start from the back, and compute all the gradients along the way backwards.
second node backprop : df/dz = q, df/dq = z
first node backprop : dq/dx = 1, dq/dy = 1
df/dq = -4
df/dy = ? y is not connected directly to f. to find the effect of y on f, we leverage the chain rule.
df/dy = df/dq * dq/dy = df/dq * 1 = z * 1 = z
what do we notice? a change in y has an effect of 1 on q (the same), and an effect of -4 on f!
df/dy = -4
df/dx = ? same procedure - df/dx = df/dq * dq/dx = df/dq * 1 = z * 1 = z
df/dx = -4

basic idea of back propagation

each node is only aware of its immediate surroundings at the initial state :
the local inputs connected to the node, and the direct output from the node
from these, the "local gradients" are found.
In the example above, the intermediate value q of the first node is directly connected to x, y, and the second node,
so the gradients dq/dx and dq/dy are local gradients ... it's just differentiation, of course. Later, once the formulas get complicated, working the derivatives out by hand becomes impractical, so we have to let the computer do it.
during backprop - at each node we have the upstream gradient coming back :
df/dq was computed first, and passed back
the gradient of the final loss L with respect to the value just before the node is computed at every node's own back propagation step
in other words, we just keep riding the chain rule all the way back ~~ so EZ

how it works - example (2)

write it out as a computational graph
do the forward pass, filling in the values at every stage of the computation
then back propagate using calculus & the chain rule

3. Patterns in Backward Flow

add gate : gradient distributor - takes the upstream gradient and passes it on unchanged. ex) the addition node
max gate : gradient router - one input takes the full upstream gradient (× 1), the other takes a gradient of zero (× 0)
why? in the forward pass, only the maximum value gets passed down to the rest of the computation and thus affects the output - in the backward pass, we want the gradient to flow back through that same branch of the computation
mul gate : gradient switcher - takes the upstream gradient and switches/scales it by the value of the other branch
the local gradient is just the value of the other variable

4. Back Propagation and Optimization

by being able to compute gradients, we can apply them in optimization to update our parameters (weights, biases ...)!
if a variable feeds several nodes, the gradients add up : df/dx = sum of df/dq * dq/dx (the q being the intermediate nodes' local output values)

5. Back Propagation with Vectors

everything is the same - except the gradients are now Jacobian matrices
how it works - example (1)
x : n-dimensional
W : n × n
computational graph : q = intermediate value after the first node (q = W·x)
backprop - calculus & local derivatives
(1) intermediate node - derivative of f with respect to qi
(2) W - derivative of qk with respect to W, then chain rule! derivative of f with respect to W
(3) x - derivative of q with respect to xi, then chain rule! derivative of f with respect to xi

6. Modularized Implementation

implement the forward pass, caching the values
implement the backward pass
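
A minimal sketch of what such a modularized gate might look like; the MultiplyGate / AddGate classes are my own illustration (not the course's actual API): forward caches its inputs, backward multiplies the upstream gradient by each local gradient. The values reuse the toy example f(x, y, z) = (x + y) * z from above.

    class MultiplyGate:
        def forward(self, a, b):
            # cache the inputs: the backward pass needs them as local gradients
            self.a, self.b = a, b
            return a * b

        def backward(self, dout):
            # mul gate = "gradient switcher": each input gets the upstream
            # gradient scaled by the value of the other input
            return dout * self.b, dout * self.a

    class AddGate:
        def forward(self, a, b):
            return a + b

        def backward(self, dout):
            # add gate = "gradient distributor": passes the upstream gradient on unchanged
            return dout, dout

    add, mul = AddGate(), MultiplyGate()
    q = add.forward(-2.0, 5.0)      # q = x + y = 3
    f = mul.forward(q, -4.0)        # f = q * z = -12
    dq, dz = mul.backward(1.0)      # df/dq = z = -4, df/dz = q = 3
    dx, dy = add.backward(dq)       # df/dx = df/dy = -4
    print(dx, dy, dz)               # -4.0 -4.0 3.0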

(2) Neural Networks


1. 2-Layer Neural Network

linear score function → multiple-layer neural network - here, a 2-layer NN
a neural network - two linear score functions stacked together, with a non-linearity in between (code sketch at the end of this post)
input - x
first layer - linear : x·W1, producing the intermediate values h
the non-linearity is applied directly before h : h = max(0, x·W1)  ex) max with zero
second layer - (h → s), also linear : s = h·W2
output - the score function, linear

Classifying Horses example
W1 - high score for left-facing horses ?
W2 - ... I'll explain what this is saying.

3-layer NN
like this ... deep NNs can be built ~ yay

2. Biological Inspiration of Deep NNs - Neurons

neuron
neurons connected together
Dendrites : impulses are received (inputs)
Cell Body : integrates the impulses/signals (computation)
Axon : the integrated impulses are carried away from the cell body to downstream neurons

neural networks
Synapses connect multiple neurons - the dendrites bring all the information together, the cell body integrates it - and the output impulses are carried on to the output layer ...
Neurons act most similarly to ReLUs ... (the ReLU non-linearity)
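
The promised sketch of the 2-layer network's forward pass, using max(0, ·) (ReLU) as the non-linearity; the layer sizes and random weights are arbitrary assumptions of mine, not values from the course:

    import numpy as np

    # assumed sizes: 3072-d input, 100 hidden units, 10 classes
    D, H, C = 3072, 100, 10
    W1 = np.random.randn(D, H) * 0.01
    W2 = np.random.randn(H, C) * 0.01

    x = np.random.randn(D)            # one input example
    h = np.maximum(0, x.dot(W1))      # first layer + ReLU non-linearity: h = max(0, x·W1)
    s = h.dot(W2)                     # second layer, linear: class scores s = h·W2
    print(s.shape)                    # (10,)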