GBT Classifier

Gradient-Boosted Trees (GBTs) are ensembles of decision trees used here for classification. The operation supports binary labels and both continuous and categorical features. Note: Multiclass labels are not currently supported.

This operation is ported from Spark ML.

For a comprehensive introduction, see the Spark documentation.

For Scala API details, see the Scaladoc documentation.

Since: Seahorse 1.0.0

Input

This operation does not take any input.

Output

| Port | Type Qualifier | Description |
| --- | --- | --- |
| 0 | Estimator | An Estimator that can be used in a Fit operation. |

Parameters

| Name | Type | Description |
| --- | --- | --- |
| classification impurity | SingleChoice | The criterion used for information gain calculation. Possible values: ["entropy", "gini"] |
| loss function | SingleChoice | The loss function which GBT tries to minimize. Possible values: ["logistic"] |
| max bins | Numeric | The maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. Must be >= 2 and >= the number of categories in any categorical feature. |
| max depth | Numeric | The maximum depth of each tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. |
| max iterations | Numeric | The maximum number of boosting iterations. Each iteration trains one tree. |
| min information gain | Numeric | The minimum information gain for a split to be considered at a tree node. |
| min instances per node | Numeric | The minimum number of instances each child must have after a split. If a split causes the left or right child to have fewer instances than the parameter's value, the split is discarded as invalid. |
| seed | Numeric | The random seed. |
| step size | Numeric | The step size to be used for each iteration of optimization. |
| subsampling rate | Numeric | The fraction of the training data used for learning each decision tree. |
| label column | SingleColumnSelector | The label column for model fitting. |
| features column | SingleColumnSelector | The features column for model fitting. |
| prediction column | String | The prediction column created during model scoring. |
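
The interaction between classification impurity, min information gain, and min instances per node can be illustrated with a small standalone sketch. It uses the conventional textbook definitions of entropy and Gini impurity and is not the Spark ML or Seahorse implementation. All function names, the default values, and the choice to accept a split when the gain is greater than or equal to the threshold are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, left, right, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


def is_valid_split(parent, left, right, impurity=gini,
                   min_info_gain=0.0, min_instances_per_node=1):
    """Hypothetical check mirroring the two 'min ...' parameters.

    Assumption: a split is kept when both children are large enough and
    the gain is at least min_info_gain.
    """
    if len(left) < min_instances_per_node or len(right) < min_instances_per_node:
        return False
    return information_gain(parent, left, right, impurity) >= min_info_gain


parent = [0, 0, 1, 1]

# A split that separates the two classes perfectly.
gain_gini = information_gain(parent, [0, 0], [1, 1], impurity=gini)
gain_entropy = information_gain(parent, [0, 0], [1, 1], impurity=entropy)

# Rejected: the left child has fewer than 2 instances.
too_small = is_valid_split(parent, [0], [0, 1, 1], min_instances_per_node=2)

# Rejected: both children are as mixed as the parent, so the gain is 0.
no_gain = is_valid_split(parent, [0, 1], [0, 1], min_info_gain=0.1)
```

In this sketch, raising either threshold makes the tree more conservative: fewer candidate splits pass the check, so trees stay shallower regardless of the max depth setting.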