Random Forest Classifier

Creates a random forest classification model.

This operation is ported from Spark ML.

For a comprehensive introduction, see the Spark documentation.

For Scaladoc details, see the org.apache.spark.ml.classification.RandomForestClassifier documentation.

Since: Seahorse 1.1.0

Input

This operation does not take any input.

Output

Port Type Qualifier Description
0 Estimator An Estimator that can be used in a Fit operation.

Parameters

Name Type Description
max depth Numeric The maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.
max bins Numeric The maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. Must be >= 2 and >= number of categories in any categorical feature.
min instances per node Numeric The minimum number of instances each child must have after split. If a split causes the left or right child to have fewer instances than the parameter's value, the split will be discarded as invalid.
min information gain Numeric The minimum information gain for a split to be considered at a tree node.
max memory Numeric Maximum memory in MB allocated to histogram aggregation.
cache node ids Boolean Whether to cache node IDs for each instance. Caching can speed up training of deeper trees.
checkpoint interval Numeric The checkpoint interval. E.g. 10 means that the cache will get checkpointed every 10 iterations.
classification impurity SingleChoice The criterion used for information gain calculation. Possible values: ["entropy", "gini"]
subsampling rate Numeric The fraction of the training data used for learning each decision tree.
seed Numeric The random seed.
num trees Numeric The number of trees to train.
feature subset strategy SingleChoice The number of features to consider for splits at each tree node. Possible values: ["auto", "onethird", "sqrt", "log2"]
label column SingleColumnSelector The label column for model fitting.
features column SingleColumnSelector The features column for model fitting.
probability column String The column for predicted class conditional probabilities.
raw prediction column String The raw prediction (confidence) column.
prediction column String The prediction column created during model scoring.
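
The interaction between classification impurity, min instances per node, and min information gain can be illustrated with a small, self-contained Python sketch. It is not Seahorse or Spark code: the function names (gini, entropy, information_gain, split_is_valid) are hypothetical, entropy is assumed to use base-2 logarithms, and a candidate split is represented simply as two lists of class labels.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy in bits (base-2 logarithm is an assumption of this sketch)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


# Maps the "classification impurity" choices to their formulas.
IMPURITY = {"gini": gini, "entropy": entropy}


def information_gain(parent, left, right, impurity="gini"):
    """Parent impurity minus the size-weighted impurity of the two children."""
    measure = IMPURITY[impurity]
    n = len(parent)
    weighted_children = (len(left) / n) * measure(left) \
        + (len(right) / n) * measure(right)
    return measure(parent) - weighted_children


def split_is_valid(parent, left, right, impurity="gini",
                   min_instances_per_node=1, min_info_gain=0.0):
    """Apply the two thresholds described in the parameter table."""
    # Discard the split if either child would be too small.
    if len(left) < min_instances_per_node or len(right) < min_instances_per_node:
        return False
    # Discard the split if it does not reduce impurity enough.
    return information_gain(parent, left, right, impurity) >= min_info_gain
```

For example, splitting the labels [0, 0, 1, 1] into [0, 0] and [1, 1] gives an information gain of 0.5 under "gini" and 1.0 under "entropy", so the same min information gain threshold can accept a split under one criterion and reject it under the other.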