Day 29

AI Engineering Course Log·2023년 6월 19일

Descriptive Feature ID3 Information Gain ai engineering course cart decision tree entropy impurity metrics

0

road to AI Engineering

목록 보기

27/83

Decision Tree
root node에서 선택되는 feature가 뭔지에 따라 깊이가 달라짐(tree 길이). 어떤 피쳐를 가장 먼저 쓰느냐가 depth를 결정하는 큰 요소.
가장 중요한것: root node에서 어떤 descriptive feature 사용하느냐.
Entropy:
불확실성을 판단하는 척도
엔트로피가 클수록 불확실성이 높은것.
*정보량이 크다: 일어나는 빈도수가 작은 정보
of surprise??

I(x) = log21/p(x) = -log2p(x)

Impurity Metrics
impurity, heterogineity의 측정
decision tree의 분기는 impurity가 작은 방향으로 진행된다.

가지치고 나간 데이터셋에 가중치(weighting) 부여

information gain
클수록 인포메이션 게인이 크게 작용했다는 것이니까.
탑다운으로 적용이 됨
-> Entropy of Original dataset is 1.
-> a's E = Entropy after dataset classified with "Suspicious Words" as Root Node = -0.0
b's E = Entropy after dataset classified with "Unknown Sender" as Root Node = 0.9182958340544896
c's E = Entropy after dataset classified with "Contains Images" as Root Node = 1.0

-> a's I = Entropy of Original Dataset - a's E
b's I = Entropy of Original Dataset - b's E
c's I = Entropy of Original Dataset - c's E

ID3 Algorithm
*Iterative Dichotomizer3
CART
Entropy based I.

<Deciding which desriptive feature should be used as the Root Node>

-> Choosing Elevation since it has smallest information gain

<Deciding which desriptive feature should be used as the First Interior Node>

-> Choosing "Slope" as the first interior node for "High" since it has smallest information gain, and "Stream" as the first interior node for "Medium".

<Final Decision Tree for the vegetation dataset>

AI Engineering Course Log

이전 포스트

Day 28

다음 포스트

Day 30

0개의 댓글