Creates a random forest classification model.
This operation is ported from Spark ML.
For a comprehensive introduction, see the Spark documentation.
For Scala API details, see the org.apache.spark.ml.classification.RandomForestClassifier documentation.
Since: Seahorse 1.1.0
This operation does not take any input.

It produces a single output:

Port | Type Qualifier | Description |
---|---|---|
0 | Estimator | An Estimator that can be used in a Fit operation. |

It accepts the following parameters:

Name | Type | Description |
---|---|---|
max depth | Numeric | The maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. |
max bins | Numeric | The maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. Must be >= 2 and >= number of categories in any categorical feature. |
min instances per node | Numeric | The minimum number of instances each child must have after a split. If a split causes the left or right child to have fewer instances than the parameter's value, the split will be discarded as invalid. |
min information gain | Numeric | The minimum information gain for a split to be considered at a tree node. |
max memory | Numeric | Maximum memory in MB allocated to histogram aggregation. |
cache node ids | Boolean | Whether to cache node IDs for each instance. Caching can speed up training of deeper trees. |
checkpoint interval | Numeric | The checkpoint interval. E.g., 10 means that the cache will get checkpointed every 10 iterations. |
classification impurity | SingleChoice | The criterion used for information gain calculation. Possible values: ["entropy", "gini"] |
subsampling rate | Numeric | The fraction of the training data used for learning each decision tree. |
seed | Numeric | The random seed. |
num trees | Numeric | The number of trees to train. |
feature subset strategy | SingleChoice | The number of features to consider for splits at each tree node. Possible values: ["auto", "onethird", "sqrt", "log2"] |
label column | SingleColumnSelector | The label column for model fitting. |
features column | SingleColumnSelector | The features column for model fitting. |
probability column | String | The column for predicted class conditional probabilities. |
raw prediction column | String | The raw prediction (confidence) column. |
prediction column | String | The prediction column created during model scoring. |
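
Several of the parameters above interact when a tree decides whether to split a node. The following is a minimal, self-contained Python sketch of that logic, not the Seahorse or Spark implementation. All function names are hypothetical. The rounding rules in `features_per_split` (ceiling of the square root, base-2 logarithm, or one third of the feature count, with "auto" resolving to "sqrt" when more than one tree is trained) are an assumption about how Spark MLlib treats classification forests; verify them against the Spark documentation before relying on them.

```python
import math
from collections import Counter


def max_nodes(max_depth):
    """Upper bound on nodes in a binary tree: depth 0 -> 1, depth 1 -> 3."""
    return 2 ** (max_depth + 1) - 1


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class frequencies p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: -sum(p_k * log2(p_k)) over the class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


def split_is_valid(parent, left, right, min_instances_per_node=1,
                   min_info_gain=0.0, impurity=gini):
    """Apply the 'min instances per node' and 'min information gain' checks."""
    if len(left) < min_instances_per_node or len(right) < min_instances_per_node:
        return False  # one child would be too small
    return information_gain(parent, left, right, impurity) >= min_info_gain


def features_per_split(strategy, num_features, num_trees):
    """Number of features sampled at each node (assumed rounding rules)."""
    if strategy == "auto":
        # Assumption: a forest (more than one tree) uses sqrt, a single tree uses all.
        return features_per_split("sqrt", num_features, num_trees) \
            if num_trees > 1 else num_features
    if strategy == "sqrt":
        return math.ceil(math.sqrt(num_features))
    if strategy == "log2":
        return max(1, math.ceil(math.log2(num_features)))
    if strategy == "onethird":
        return math.ceil(num_features / 3)
    raise ValueError(f"unknown strategy: {strategy}")


# Usage: a perfectly separating split of eight labelled instances.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 0], [1, 1, 1, 1]
print(max_nodes(1))                                   # 3
print(gini(parent))                                   # 0.5
print(split_is_valid(parent, left, right, min_instances_per_node=5))  # False
print(features_per_split("sqrt", 16, 10))             # 4
```

With the Gini criterion the split above has an information gain of 0.5, so it is accepted under the default thresholds but rejected once `min_instances_per_node` exceeds 4 or `min_info_gain` exceeds 0.5.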