Grid Search is a powerful way to optimize the process of fitting Estimators.
Parameters that describe how the learning process should be performed are vital
to the quality of the resulting model. Unfortunately, it is often difficult to guess
their best values.
The Grid Search operation lets us specify a set of candidate values for the parameters
of the input Estimator. The operation then goes through every combination of values from
the specified sets. For each combination, the Estimator is fitted and the resulting
trained model is evaluated by means of cross validation.
The goal of Grid Search is to choose the best combination of parameters, where “best”
means having received the highest grade from the Evaluator.
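The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not Seahorse's implementation: `grid_search` and `grade` are hypothetical names, and `grade` stands in for "fit the Estimator with these parameters and grade it with the Evaluator via cross validation", assuming a higher grade is better.

```python
from itertools import product

def grid_search(param_grid, grade):
    """Grade every combination in param_grid and return (best, report).

    param_grid: dict mapping a parameter name to its list of candidate values.
    grade: hypothetical callable taking a dict of parameter values and
           returning a score where higher is better (a stand-in for fitting
           the Estimator and grading it with the Evaluator).
    """
    names = list(param_grid)
    report = []
    # product(...) enumerates every combination of the candidate values.
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        report.append((params, grade(params)))
    # "Best" = the combination with the highest grade.
    best_params, _ = max(report, key=lambda entry: entry[1])
    return best_params, report

if __name__ == "__main__":
    grid = {"max_depth": [10, 20, 30], "num_trees": [10, 50, 100]}
    # Toy grading function, purely illustrative: peaks at (20, 50).
    toy_grade = lambda p: -abs(p["max_depth"] - 20) - abs(p["num_trees"] - 50)
    best, report = grid_search(grid, toy_grade)
    print(len(report), best)  # 9 {'max_depth': 20, 'num_trees': 50}
```

The returned `report` plays the role of the Report output: every combination paired with its grade.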
To grade a particular combination of parameters, the Estimator is fitted as many times
as specified by the number of folds parameter. In each “round” of training, the input
dataset is divided into training and test parts. The model fitted on the training part is
used to score the test part, and these scores are graded by the Evaluator. The final grade
of the parameter combination is the average grade over all folds.
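Grading one combination by k-fold cross validation might look like the following sketch. Assumptions, not taken from Seahorse: the split is contiguous and unshuffled for simplicity, and `fit` and `evaluate` are hypothetical callables standing in for the Estimator and the Evaluator.

```python
from statistics import mean

def k_fold_indices(n_rows, number_of_folds):
    """Yield (train_idx, test_idx) pairs; each row is in exactly one test part."""
    if number_of_folds < 2:
        raise ValueError("number of folds has to be at least 2")
    # Spread rows as evenly as possible; the first folds get one extra row.
    fold_sizes = [n_rows // number_of_folds] * number_of_folds
    for i in range(n_rows % number_of_folds):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size

def cross_validate(rows, number_of_folds, fit, evaluate):
    """Average grade of one parameter combination over all folds.

    fit(train_rows) -> model and evaluate(model, test_rows) -> float are
    hypothetical stand-ins for the Estimator and the Evaluator.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), number_of_folds):
        model = fit([rows[i] for i in train_idx])
        scores.append(evaluate(model, [rows[i] for i in test_idx]))
    return mean(scores)

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    fit_mean = lambda train: mean(train)  # "model" = mean of training rows
    neg_mae = lambda model, test: -mean(abs(y - model) for y in test)
    print(cross_validate(data, 2, fit_mean, neg_mae))  # -2.0
```

The Estimator is fitted once per fold, which is why the cost of grading a combination grows linearly with the number of folds.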
The result of the Grid Search operation is a Report in which every combination of
parameters is graded by the Evaluator.
Parameters of the Grid Search operation mirror the parameters of its input Estimator,
but some of them accept multiple, comma-separated values. These special parameters are
marked with , as in the following example:
Note that Grid Search is an expensive operation. Selecting 5 values for each of 5
parameters results in 5^5 = 3125 parameter combinations, and each of them is cross
validated, that is, fitted number of folds times.
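The cost arithmetic can be made explicit. This sketch (the function name `grid_search_cost` is hypothetical) counts combinations as the product of the number of candidate values per parameter and multiplies by the number of folds to get the number of Estimator fits; the choice of 3 folds below is only an illustrative assumption.

```python
from math import prod

def grid_search_cost(values_per_parameter, number_of_folds):
    """Return (parameter combinations, Estimator fits during cross validation)."""
    combinations = prod(values_per_parameter)  # one factor per parameter
    return combinations, combinations * number_of_folds

if __name__ == "__main__":
    # 5 candidate values for each of 5 parameters, 3 folds (illustrative).
    print(grid_search_cost([5, 5, 5, 5, 5], number_of_folds=3))  # (3125, 9375)
```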
Since: Seahorse 1.0.0
In the following example, the Grid Search operation is used to determine the best
parameters for training a Random Forest Regression model.
The number of folds parameter has to be at least 2. Higher values make model evaluation
more reliable, at the cost of fitting the Estimator more times.
In the PARAMETERS OF INPUT ESTIMATOR section of the Grid Search parameters, we specify
the candidate parameter values.
Let’s set max depth to 10, 20, 30, max bins to 32, 40, 50 and num trees to 10, 50, 100.
This yields 27 distinct combinations of parameters.
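The count of 27 follows from 3 × 3 × 3. The sketch below reproduces it from the comma-separated fields of this example; `parse_values` is a hypothetical helper, and its parsing rules (split on commas, trim whitespace, convert to numbers) are an assumption rather than Seahorse's actual parser.

```python
from itertools import product

def parse_values(text):
    """Split a comma-separated parameter field such as "10, 20, 30" into numbers."""
    return [float(v) for v in text.split(",") if v.strip()]

grid = {
    "max depth": parse_values("10, 20, 30"),
    "max bins": parse_values("32, 40, 50"),
    "num trees": parse_values("10, 50, 100"),
}
combinations = list(product(*grid.values()))
print(len(combinations))  # 27
print(combinations[0])    # (10.0, 32.0, 10.0)
```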
In the resulting report, every parameter combination is listed along with its grade.
Input
Port | Type Qualifier | Description |
---|---|---|
0 | Estimator | The Estimator to be fitted and evaluated. |
1 | DataFrame | The DataFrame on which the estimator will be fitted and evaluated. |
2 | Evaluator | The Evaluator that evaluates the fitted model. |
Output
Port | Type Qualifier | Description |
---|---|---|
0 | Report | A Report about the search. Contains a score for every set of parameters that the search went through. |
Parameters
Name | Type | Description |
---|---|---|
number of folds | Numeric | A property of Grid Search's internal cross validator. Describes how many times the input dataset should be partitioned into training and test datasets. |