Decision Tree Regression

Creates a decision tree regression model. It supports both continuous and categorical features.

This operation is ported from Spark ML.

For a comprehensive introduction, see the Spark documentation.

For Scala API details, see the Scaladoc documentation.

Since: Seahorse 1.1.0


Input

This operation does not take any input.


Output

Port Type Qualifier Description
0 Estimator An Estimator that can be used in a Fit operation.


Parameters

Name Type Description
max depth Numeric The maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.
max bins Numeric The maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. Must be >= 2 and >= number of categories in any categorical feature.
min instances per node Numeric The minimum number of instances each child must have after split. If a split causes the left or right child to have fewer instances than the parameter's value, the split will be discarded as invalid.
min information gain Numeric The minimum information gain for a split to be considered at a tree node.
max memory Numeric Maximum memory in MB allocated to histogram aggregation.
cache node ids Boolean Whether to cache node IDs for each instance. Caching can speed up training of deeper trees.
checkpoint interval Numeric The checkpoint interval. E.g. 10 means that the cache will get checkpointed every 10 iterations.
seed Numeric The random seed.
regression impurity SingleChoice The criterion used for information gain calculation. Possible values: ["variance"]
label column SingleColumnSelector The label column for model fitting.
features column SingleColumnSelector The features column for model fitting.
prediction column String The prediction column created during model scoring.
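
The interaction between regression impurity, min information gain, and min instances per node can be illustrated with a small sketch. It is a plain-Python illustration, not Seahorse or Spark code: the function names (`variance`, `information_gain`, `is_valid_split`) and the default argument values are hypothetical and chosen only for this example. It assumes that the "variance" impurity is the population variance of the labels and that information gain is the parent impurity minus the size-weighted impurities of the two children.

```python
from statistics import fmean


def variance(labels):
    """Population variance of the labels, used here as the impurity measure."""
    if not labels:
        return 0.0
    mean = fmean(labels)
    return fmean([(y - mean) ** 2 for y in labels])


def information_gain(parent, left, right):
    """Parent impurity minus the size-weighted impurity of both children."""
    n = len(parent)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted


def is_valid_split(parent, left, right, min_instances_per_node=1, min_info_gain=0.0):
    """Apply the two thresholds described in the parameter table (illustrative)."""
    # A split is discarded if either child has too few instances.
    if len(left) < min_instances_per_node or len(right) < min_instances_per_node:
        return False
    # A split is only considered if it gains at least the configured minimum.
    return information_gain(parent, left, right) >= min_info_gain


# Hypothetical labels: a split that separates the two clusters perfectly.
parent = [1.0, 1.0, 5.0, 5.0]
left, right = [1.0, 1.0], [5.0, 5.0]

gain = information_gain(parent, left, right)
accepted = is_valid_split(parent, left, right, min_instances_per_node=2)
rejected = is_valid_split(parent, left, right, min_instances_per_node=3)
```

In this example the split removes all variance from the children, so the gain equals the parent's variance. Raising min instances per node above the size of either child discards the same split regardless of its gain.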