[ML] XGBoost

yozzum · February 20, 2025

Overview

  • Parallel processing and data binning for efficient computation (see the sketch below)
  • Overfitting prevention: built-in regularization settings
  • Constructs trees level-wise
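
These points map directly onto XGBoost's Python API. A minimal sketch, assuming the xgboost and scikit-learn packages are installed; the parameter values are illustrative, not tuned:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import xgboost as xgb

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = xgb.XGBClassifier(
    tree_method="hist",   # histogram-based split finding (data binning)
    max_bin=256,          # number of bins used to discretize each feature
    n_jobs=-1,            # parallel processing across all cores
    reg_lambda=1.0,       # L2 regularization (lambda)
    gamma=0.0,            # minimum Gain required to keep a split
    learning_rate=0.3,    # eta
    n_estimators=100,
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```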

Splitting Thresholds

  • By default, XGBoost considers all the features for each split. However, hyperparameters such as colsample_bytree, colsample_bylevel, and colsample_bynode make it possible to sample only a subset of features.
  • When searching for a split point (threshold), XGBoost does not examine every possible value individually; instead, the data is divided into several bins and the optimal split point is chosen among the bin boundaries. This method is called the histogram-based split finding algorithm.
  • The split point is decided based on the Gain, which is calculated from the Similarity Scores of the nodes involved (formulas below).
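
For reference, the Similarity Score and Gain are usually written as follows (StatQuest-style notation; in the XGBoost paper these are special cases of $G^2/(H + \lambda)$ for squared-error and logistic loss, with $\lambda$ the L2 regularization term):

$$
\text{Similarity} = \frac{\left(\sum_i r_i\right)^2}{n + \lambda}\ \ (\text{regression}),\qquad
\text{Similarity} = \frac{\left(\sum_i r_i\right)^2}{\sum_i p_i\,(1 - p_i) + \lambda}\ \ (\text{classification})
$$

$$
\text{Gain} = \text{Similarity}_{\text{left}} + \text{Similarity}_{\text{right}} - \text{Similarity}_{\text{root}}
$$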

Steps

  1. Start with an initial predicted value (default = 0.5)
  2. Compute the residuals (observed − predicted)
  3. Find candidate split points
  4. Compute the Similarity Score for each node
  5. Compute the Gain for each split using the Similarity Scores
  6. Pick the feature and threshold with the largest Gain
  7. Prune with Gamma (keep a branch only if Gain − Gamma is positive)
  8. Update the predicted probability for each observation (see the sketch after this list)
    • Compute the output value for each leaf
    • Convert the previous predicted probability into log(odds) and add the learning rate times the leaf's output value
    • Convert the new log(odds) back into a predicted probability using the logistic function
    • Build the second tree using the new predicted probabilities
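
A toy, NumPy-only sketch of one boosting round for binary classification, following the steps above; the single split and the feature x are hypothetical and hard-coded for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # one feature
y = np.array([0.0, 0.0, 1.0, 1.0])   # binary labels

lam, gamma_, eta = 1.0, 0.0, 0.3     # lambda, gamma, learning rate

# Steps 1-2: initial prediction 0.5 for every observation, then residuals
p = np.full_like(y, 0.5)
residuals = y - p

def similarity(res, prob):
    # classification: (sum residuals)^2 / (sum p(1-p) + lambda)
    return res.sum() ** 2 / ((prob * (1 - prob)).sum() + lam)

# Steps 3-6: try a split at x < 2.5 and compute its Gain
left, right = x < 2.5, x >= 2.5
gain = (similarity(residuals[left], p[left])
        + similarity(residuals[right], p[right])
        - similarity(residuals, p))

# Step 7: keep the branch only if Gain - gamma is positive
keep_split = gain - gamma_ > 0

# Step 8: leaf output values, then update the predicted probabilities
def output_value(res, prob):
    return res.sum() / ((prob * (1 - prob)).sum() + lam)

log_odds = np.log(p / (1 - p))       # previous probability -> log(odds)
if keep_split:
    log_odds[left] += eta * output_value(residuals[left], p[left])
    log_odds[right] += eta * output_value(residuals[right], p[right])
p_new = 1 / (1 + np.exp(-log_odds))  # logistic function -> probability

print(gain, p_new)  # residuals from p_new feed the second tree
```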

Hyperparameters

Lambda (regularization parameter)

  • Reduces the Similarity Score → lower Gain → branches become more prone to pruning with Gamma
  • Reduces the output value → smaller update to the predicted probability (worked example below)
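
A small worked example with illustrative numbers: a regression leaf holding residuals 0.5, 0.7, and 0.8 (sum = 2.0):

$$
\lambda = 0:\ \text{Similarity} = \frac{2.0^2}{3 + 0} \approx 1.33,\qquad
\lambda = 1:\ \frac{2.0^2}{3 + 1} = 1.0
$$

$$
\lambda = 0:\ \text{Output value} = \frac{2.0}{3 + 0} \approx 0.67,\qquad
\lambda = 1:\ \frac{2.0}{3 + 1} = 0.5
$$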

Eta (learning rate; in XGBoost, alpha denotes L1 regularization instead)

  • Scales the output value when updating the prediction (update rule below)
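
The update rule that the learning rate enters (η, XGBoost's eta parameter, defaults to 0.3):

$$
\log(\text{odds})_{\text{new}} = \log(\text{odds})_{\text{prev}} + \eta \times \text{output value}
$$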

Cover (minimum amount of "weight" required in a leaf)

  • Used to prune a tree: leaves whose Cover is too small are not allowed (details below)
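
As a summary, Cover is the denominator of the Similarity Score without λ (the sum of hessians), exposed in XGBoost as the min_child_weight parameter (default 1):

$$
\text{Cover}_{\text{regression}} = n\ \ (\text{number of residuals in the leaf}),\qquad
\text{Cover}_{\text{classification}} = \sum_i p_i\,(1 - p_i)
$$

A leaf whose Cover falls below min_child_weight is not allowed, which removes the corresponding split.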

Gamma

  • If Gain − Gamma is negative for a branch, that branch is pruned (pruning starts from the lowest branch and moves up)
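
With illustrative numbers: Gain = 120 and Gamma = 130 give 120 − 130 = −10 < 0, so the branch is pruned; Gamma = 100 gives +20 and the branch stays. Note that Gamma = 0 does not guarantee no pruning, since a positive λ can already push the Gain below zero.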

