I helped by StatQuest Youtube Playlist

Purpose
1. Assumption of Backprop
In calculus, Taylor's theorem gives an approximation of a k-times differentiable function around a given point by a polynomial of degree k
f(x+Δx)=k=0∑Nk!f(k)(x)Δxk
The first-order Taylor polynomial is the linear approximation of the function, and the second-order Taylor polynomial is often referred to as the quadratic approximation.
Motivation : Linear approximation
If a real-valued function
f(x) is differentiable at the point
x=a, then it has a linear approximation near this point. This means that there exists a function h1(x) such that
f(x)=f(a)+f′(a)(x−a)+h1(x)(x−a),x→alimh1(x)=0
Taylor's theorem
Let k ≥ 1 be an integer
and let the function f : R → R be k times differentiable at the point a ∈ R.
Then there exists a function hk : R → R such that
f(x)=f(a)+f′(a)(x−a)+2!f′′(a)(x−a)2+⋯+k!f(k)(a)(x−a)k+hk(x)(x−a)k
and
x→alimhk(x)=0
2. how to use?
backprop 알고리즘은 다음의 조건에서 쉽게 사용 가능하다.
- 임의의 feed-forward 신경망
- 임의의 미분 가능한 비선형 활성 함수
예를 들어 i.i.d 데이터를 이용한 MLE 함수를 사용할 수 있다.
이 때 에러 함수는 모든 관찰 데이터의 에러의 합으로 표현 가능하다.
Etotal(w)=n=1∑NEn(w)