Grid Search

The Grid Search is a powerful way to optimize the process of fitting Estimators. Parameters that describe how the learning process should be performed are vital to the quality of the resulting model. Unfortunately, it’s often difficult to guess what the best values for them are.

The Grid Search operation lets us specify a set of values for each parameter of the input Estimator. The operation then goes through every combination of values from the specified sets. For each combination, the Estimator is fitted and the resulting trained model is evaluated by means of cross-validation.

The goal of Grid Search is to choose the best combination of parameters, where “best” is defined as having received the highest grade from the Evaluator.
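The selection logic described above can be sketched in a few lines of plain Python. This is only an illustration, not Seahorse's implementation: `grid_search` and `toy_grade` are hypothetical names, and `toy_grade` stands in for the cross-validated grade that the Evaluator would produce.

```python
from itertools import product


def grid_search(param_grid, grade):
    """Grade every combination in param_grid; return (best_params, report)."""
    names = list(param_grid)
    report = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        report.append((params, grade(params)))
    # "Best" means the highest grade, as in the text above.
    best_params, _ = max(report, key=lambda row: row[1])
    return best_params, report


def toy_grade(params):
    # Hypothetical stand-in for fitting and evaluating a real model.
    return -abs(params["max depth"] - 20) - abs(params["num trees"] - 50) / 10


grid = {"max depth": [10, 20, 30], "num trees": [10, 50, 100]}
best, report = grid_search(grid, toy_grade)
print(best)
print(len(report))
```

The `report` list plays the role of the Report: one row per parameter combination together with its grade.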

In order to grade a particular combination of parameters, the Estimator is fitted as many times as the number of folds parameter specifies. In each “round” of training, the input dataset is divided into training and test parts. The model fitted on the training part is used to score the test part. This score is evaluated, and the final grade of the parameter combination is the average score over all folds.
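A minimal sketch of this grading procedure is shown below. It makes several assumptions: the rows are split into folds by simple striding (the text does not say how Seahorse partitions the data), and `k_fold_grade`, `fit_mean`, and `negative_mae` are hypothetical helpers rather than part of any Seahorse API.

```python
def k_fold_grade(rows, number_of_folds, fit, score):
    """Average the test score over number_of_folds train/test rounds."""
    if number_of_folds < 2:
        raise ValueError("number of folds has to be at least 2")
    # Assumption: fold i takes every number_of_folds-th row, starting at i.
    folds = [rows[i::number_of_folds] for i in range(number_of_folds)]
    scores = []
    for i, test_part in enumerate(folds):
        # Every fold except the i-th one forms the training part.
        train_part = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit(train_part)
        scores.append(score(model, test_part))
    return sum(scores) / len(scores)


def fit_mean(train_part):
    # Toy "model": always predicts the mean of the training values.
    return sum(train_part) / len(train_part)


def negative_mae(model, test_part):
    # Negated mean absolute error, so that a higher score is better.
    return -sum(abs(value - model) for value in test_part) / len(test_part)


grade = k_fold_grade([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3, fit_mean, negative_mae)
print(grade)
```

Each row appears in the test part exactly once, so every observation contributes to the final grade.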

The result of the Grid Search operation is a Report in which every combination of parameters is graded by the Evaluator.

Parameters of the Grid Search operation mirror the parameters of its input Estimator, but some of them accept multiple, comma-separated values. These special parameters are visually marked in the operation's parameter panel.

Note that Grid Search is an expensive operation. Selecting 5 values for each of 5 parameters results in 5^5 = 3125 parameter combinations, each of which is cross-validated.

Since: Seahorse 1.0.0


In the following case, the Grid Search operation is used to determine the best parameters for training a Random Forest Regression model.

The number of folds parameter has to be at least 2. Higher values make model evaluation more accurate, but they also increase the number of times the Estimator is fitted.

In the PARAMETERS OF INPUT ESTIMATOR section of Grid Search’s parameters, we specify the parameter values.

Let’s set max depth to 10, 20, 30, max bins to 32, 40, 50 and num trees to 10, 50, 100. This yields 27 distinct combinations of parameters.
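The count can be checked by enumerating the grid directly. The sketch below uses only the Python standard library. The parameter values come from the example above, while `number_of_folds = 3` is an assumed value chosen purely to illustrate the total number of fits.

```python
from itertools import product

grid = {
    "max depth": [10, 20, 30],
    "max bins": [32, 40, 50],
    "num trees": [10, 50, 100],
}

# One dict per distinct combination of parameter values.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

number_of_folds = 3  # assumed value, not taken from the example
total_fits = len(combinations) * number_of_folds

print(len(combinations))  # 27
print(total_fits)         # 81
```

Adding a fourth parameter with three values would triple both numbers, which is why the grid should be kept as small as the problem allows.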

In the resulting report, every parameter combination is listed along with its grade.


| Port | Type | Qualifier | Description |
|------|------|-----------|-------------|
| 0 | Estimator | | The Estimator to be fitted and evaluated. |
| 1 | DataFrame | | The DataFrame on which the Estimator will be fitted and evaluated. |
| 2 | Evaluator | | The Evaluator that evaluates the fitted model. |


| Port | Type | Qualifier | Description |
|------|------|-----------|-------------|
| 0 | Report | | A Report about the search. Contains a score for every set of parameters that the search went through. |


| Name | Type | Description |
|------|------|-------------|
| number of folds | Numeric | A property of Grid Search's internal cross validator. Describes how many times the input dataset should be partitioned into training and test datasets. |