Gradient Boosting:
AdaBoost was further developed into a numerical optimization problem where the objective is to minimize the loss of the model by adding weak learners using a gradient-descent-like procedure. This class of algorithms is described as a stage-wise additive model, because one new weak learner is added at a time while the existing weak learners in the model are frozen and left unchanged. Gradient boosting involves three elements:
1. A loss function to be optimized. The loss function used depends on the type of problem being solved. It must be differentiable, but many standard loss functions are supported and you can define your own.
2. A weak learner to make predictions. Decision trees are used as the weak learner in gradient boosting. Trees are constructed in a greedy manner, choosing the best split points either by purity scores such as Gini or by directly minimizing the loss.
3. An additive model to add weak learners to minimize the loss function. Trees are added one at a time, and existing trees in the model are not changed. A gradient descent procedure is used to minimize the loss when adding trees: after calculating the loss, performing the gradient descent step means adding a tree to the model that reduces the loss (i.e. one that follows the gradient). A minimal from-scratch sketch of this procedure is shown below.
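To make the stage-wise additive idea concrete, here is a minimal from-scratch sketch of least-squares gradient boosting, assuming scikit-learn's DecisionTreeRegressor as the weak learner (the function names are just illustrative). For squared-error loss the negative gradient is simply the residual, so each new tree is fit to the residuals of the current model and its shrunken prediction is added on top while all earlier trees stay frozen.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=4):
    """Stage-wise additive fitting: each new tree is trained on the
    negative gradient of the squared-error loss (the residuals)."""
    # Initial prediction: the mean of the targets minimizes squared error.
    f0 = float(np.mean(y))
    prediction = np.full(len(y), f0)
    trees = []
    for _ in range(n_estimators):
        # Negative gradient of 0.5 * (y - f)^2 with respect to f is (y - f).
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Existing trees are left unchanged; only the new tree's shrunken
        # contribution is added to the running prediction.
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(X, f0, trees, learning_rate=0.1):
    prediction = np.full(X.shape[0], f0)
    for tree in trees:
        prediction += learning_rate * tree.predict(X)
    return prediction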
Tuned Parameters:
- n_estimators=100
- learning_rate=0.1
- max_depth=4
- loss='ls'
What do these parameters mean?
loss: {‘ls’, ‘lad’, ‘huber’, ‘quantile’}, optional (default=’ls’). The loss function to be optimized. ‘ls’ refers to least squares regression. ‘lad’ (least absolute deviation) is a highly robust loss function solely based on order information of the input variables. ‘huber’ is a combination of the two. ‘quantile’ allows quantile regression (use alpha to specify the quantile).
learning_rate: float, optional (default=0.1). The learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
n_estimators: int (default=100). The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.
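As a rough usage sketch, the tuned parameters above map directly onto scikit-learn's GradientBoostingRegressor. This assumes an older scikit-learn release where loss='ls' is still accepted (recent versions renamed it to 'squared_error'); the dataset here is synthetic and only for illustration.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data, just to exercise the model.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=100,   # number of boosting stages (trees)
    learning_rate=0.1,  # shrinks each tree's contribution
    max_depth=4,        # depth of each individual regression tree
    loss='ls',          # least squares loss ('squared_error' in newer releases)
)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))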