Gradient-Boosted Trees (GBTs) is a learning algorithm for regression. It supports both continuous and categorical features.
This operation is ported from Spark ML.
For a comprehensive introduction, see the Spark documentation.
For Scaladoc details, see the org.apache.spark.ml.regression.GBTRegressor documentation.
Since: Seahorse 1.0.0
This operation does not take any input.
Port | Type Qualifier | Description |
---|---|---|
0 | Estimator | An Estimator that can be used in a Fit operation. |

Name | Type | Description |
---|---|---|
regression impurity | SingleChoice | The criterion used for information gain calculation. Possible values: ["variance"] |
loss function | SingleChoice | The loss function which GBT tries to minimize. Possible values: ["squared", "absolute"] |
max bins | Numeric | The maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. Must be >= 2 and >= number of categories in any categorical feature. |
max depth | Numeric | The maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. |
max iterations | Numeric | The maximum number of iterations. |
min information gain | Numeric | The minimum information gain for a split to be considered at a tree node. |
min instances per node | Numeric | The minimum number of instances each child must have after split. If a split causes the left or right child to have fewer instances than the parameter's value, the split will be discarded as invalid. |
seed | Numeric | The random seed. |
step size | Numeric | The step size to be used for each iteration of optimization. |
subsampling rate | Numeric | The fraction of the training data used for learning each decision tree. |
label column | SingleColumnSelector | The label column for model fitting. |
features column | SingleColumnSelector | The features column for model fitting. |
prediction column | String | The prediction column created during model scoring. |
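
To make the roles of max iterations, step size, and min instances per node more concrete, the following is a minimal plain-Python sketch of textbook gradient boosting with the "squared" loss. It is not the Spark ML or Seahorse implementation: it assumes a single numeric feature, trees of max depth 1, no subsampling, and no binning, and the names `fit_stump` and `fit_gbt` are hypothetical helpers introduced only for illustration.

```python
from statistics import mean


def fit_stump(xs, targets, min_instances_per_node=1):
    """Fit a depth-1 regression tree (1 internal node + 2 leaves) on one feature."""
    best = None  # (squared error, threshold, left mean, right mean)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        # A split that leaves either child with too few instances is discarded.
        if min(len(left), len(right)) < min_instances_per_node:
            continue
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # No valid split: fall back to a single leaf (depth 0).
        constant = mean(targets)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_gbt(xs, ys, max_iterations=10, step_size=0.1, min_instances_per_node=1):
    """Boost stumps on the residuals, which is what squared loss reduces to."""
    base = mean(ys)  # Initial constant prediction (an assumption of this sketch).
    trees = []
    predictions = [base] * len(xs)
    for _ in range(max_iterations):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals, min_instances_per_node)
        trees.append(tree)
        # Each tree's contribution is shrunk by the step size.
        predictions = [p + step_size * tree(x) for p, x in zip(predictions, xs)]

    def predict(x):
        return base + step_size * sum(tree(x) for tree in trees)

    return predict


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 1.0, 3.0, 3.0]
    for iterations in (1, 5, 20):
        model = fit_gbt(xs, ys, max_iterations=iterations, step_size=0.5)
        print(iterations, [round(model(x), 4) for x in xs])
```

In this sketch, a smaller step size needs more iterations to reach the same training fit, and a min instances per node value that no split can satisfy leaves the model at its initial constant prediction.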