Hyperparameter tuning is the process of discovering a set of performant parameter values for our model. It can be a computationally involved process depending on the number of parameters, search space and model architectures. Hyperparameters don't just include the model's parameters but could also include parameters related to preprocessing, splitting, etc. When we look at all the different parameters that can be tuned, it quickly becomes a very large search space. However, just because something is a hyperparameter doesn't mean we need to tune it.
We want to optimize our hyperparameters so that we can understand how each of them affects our objective. By running many trials across a reasonable search space, we can determine near-ideal values for our different parameters.
There are many options for hyperparameter tuning (Ray Tune, Optuna, Hyperopt, etc.). We'll be using Ray Tune with its HyperOpt integration for its simplicity and general popularity. Ray Tune also supports a wide variety of other search algorithms (Optuna, Bayesian optimization, etc.).
There are many factors to consider when performing hyperparameter tuning. We'll be conducting a small study where we'll tune just a few key hyperparameters across a few trials. Feel free to include additional parameters and to increase the number of trials in the tuning experiment.
```python
# Number of trials (small sample)
num_runs = 2
```
We'll start with some of the setup, data and model prep as we've done in previous lessons.
```python
from ray import tune
from ray.tune import Tuner
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.search import ConcurrencyLimiter
from ray.tune.search.hyperopt import HyperOptSearch
```
```python
# Set up
set_seeds()
```
```python
# Dataset
ds = load_data()
train_ds, val_ds = stratify_split(ds, stratify="tag", test_size=test_size)
```
```python
# Preprocess
preprocessor = CustomPreprocessor()
train_ds = preprocessor.fit_transform(train_ds)
val_ds = preprocessor.transform(val_ds)
train_ds = train_ds.materialize()
val_ds = val_ds.materialize()
```
```python
# Trainer
trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    train_loop_config=train_loop_config,
    scaling_config=scaling_config,
    datasets={"train": train_ds, "val": val_ds},
    dataset_config=dataset_config,
    preprocessor=preprocessor,
)
```
```python
# MLflow callback
mlflow_callback = MLflowLoggerCallback(
    tracking_uri=MLFLOW_TRACKING_URI,
    experiment_name=experiment_name,
    save_artifact=True)
```
We can think of tuning as training across different combinations of parameters. For this, we'll need to define several configurations around when to stop tuning (stopping criteria), how to define the next set of parameters to train with (search algorithm) and even the different values that the parameters can take (search space).
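To make these three pieces concrete before wiring them into Ray Tune, here is a minimal sketch of the simplest possible tuner: a random search over a toy objective. Everything in it is hypothetical and for illustration only (`toy_val_loss` stands in for a real training run, and the ranges are borrowed from this lesson); it is not how Ray Tune or HyperOpt choose configurations.

```python
import random


def toy_val_loss(lr, dropout_p):
    """Hypothetical stand-in for a training run (lower is better)."""
    return (lr - 1e-4) ** 2 * 1e8 + (dropout_p - 0.5) ** 2


def random_search(num_trials, seed=0):
    """Sample configurations at random and return the best trial and all trials."""
    rng = random.Random(seed)
    trials = []
    for trial_id in range(num_trials):
        # Search space: the values each parameter is allowed to take.
        config = {
            "lr": rng.uniform(1e-5, 5e-4),
            "dropout_p": rng.uniform(0.3, 0.9),
        }
        # Search algorithm: here just independent random draws.
        val_loss = toy_val_loss(**config)
        trials.append({"trial_id": trial_id, "config": config, "val_loss": val_loss})
    # Stopping criterion: here simply the trial budget (num_trials).
    best = min(trials, key=lambda t: t["val_loss"])
    return best, trials


best, trials = random_search(num_trials=20)
print(best["config"], round(best["val_loss"], 4))
```

A smarter search algorithm replaces the random draws with suggestions informed by earlier trials, and a scheduler replaces the fixed budget with early stopping of weak trials.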
We'll start by defining our CheckpointConfig and RunConfig as we did for training:
```python
# Run configuration
checkpoint_config = CheckpointConfig(num_to_keep=1, checkpoint_score_attribute="val_loss", checkpoint_score_order="min")
run_config = RunConfig(
    callbacks=[mlflow_callback],
    checkpoint_config=checkpoint_config
)
```
Notice that we use the same mlflow_callback from our experiment tracking lesson so all of our runs will be tracked to MLflow automatically.
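As a rough mental model of what the checkpoint settings above ask for (`num_to_keep=1`, scored by `val_loss`, lower is better), the sketch below ranks a few made-up checkpoints and keeps only the best one. The helper `keep_best_checkpoints` and the scores are hypothetical; this is a conceptual illustration, not Ray's implementation.

```python
def keep_best_checkpoints(checkpoints, num_to_keep=1, score_attribute="val_loss", order="min"):
    """Return the `num_to_keep` best checkpoints according to `score_attribute`."""
    ranked = sorted(
        checkpoints,
        key=lambda ckpt: ckpt[score_attribute],
        reverse=(order == "max"),  # "min" keeps the lowest scores first
    )
    return ranked[:num_to_keep]


# Made-up validation losses for three epochs of one trial.
checkpoints = [
    {"epoch": 1, "val_loss": 0.92},
    {"epoch": 2, "val_loss": 0.61},
    {"epoch": 3, "val_loss": 0.67},
]
kept = keep_best_checkpoints(checkpoints)
print(kept)
```

Here only the epoch 2 checkpoint survives, because it has the lowest `val_loss` even though it is not the most recent one.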
Next, we're going to set the initial parameter values and the search algorithm (HyperOptSearch) for our tuning experiment. We're also going to set the maximum number of trials that can be run concurrently (ConcurrencyLimiter) based on the compute resources we have.
```python
# Hyperparameters to start with
initial_params = [{"train_loop_config": {"dropout_p": 0.5, "lr": 1e-4, "lr_factor": 0.8, "lr_patience": 3}}]
search_alg = HyperOptSearch(points_to_evaluate=initial_params)
search_alg = ConcurrencyLimiter(search_alg, max_concurrent=2)
```
Tip
It's a good idea to start with some initial parameter values that you think might be reasonable. This can help speed up the tuning process and also guarantee at least one experiment that will perform decently well.
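The idea behind seeding the search with initial points can be sketched as a generator that hands out our hand-picked configurations first and only then falls back to sampling. The names `suggest_configs` and `sample_config` are hypothetical, and the sketch assumes the initial points count toward the trial budget; it illustrates the concept rather than HyperOptSearch's internals.

```python
import random


def suggest_configs(points_to_evaluate, sample_fn, num_samples):
    """Yield the hand-picked configurations first, then sampled ones."""
    queue = list(points_to_evaluate)
    for _ in range(num_samples):
        yield queue.pop(0) if queue else sample_fn()


rng = random.Random(0)


def sample_config():
    """Hypothetical sampler over two of the lesson's hyperparameters."""
    return {"dropout_p": rng.uniform(0.3, 0.9), "lr_factor": rng.uniform(0.1, 0.9)}


initial_points = [{"dropout_p": 0.5, "lr_factor": 0.8}]
configs = list(suggest_configs(initial_points, sample_config, num_samples=3))
print(configs[0])
```

The first trial uses exactly the values we believe are reasonable, so the experiment has a sensible baseline even if the sampled trials turn out poorly.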
Next, we're going to define the parameter search space by choosing the parameters, their distribution and range of values. Depending on the parameter type, we have many different distributions to choose from.
```python
# Parameter space
param_space = {
    "train_loop_config": {
        "dropout_p": tune.uniform(0.3, 0.9),
        "lr": tune.loguniform(1e-5, 5e-4),
        "lr_factor": tune.uniform(0.1, 0.9),
        "lr_patience": tune.uniform(1, 10),
    }
}
```
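To see why the learning rate uses a log-uniform distribution while the other parameters use a uniform one, the sketch below samples both over the same range using only the standard library. The helpers `sample_uniform`, `sample_loguniform` and `frac_below` are hypothetical names for this illustration; the log-uniform draw is implemented as a uniform draw in log space, which is the usual definition.

```python
import math
import random


def sample_uniform(rng, low, high):
    """Every value in [low, high] is equally likely."""
    return rng.uniform(low, high)


def sample_loguniform(rng, low, high):
    """Every order of magnitude in [low, high] is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def frac_below(samples, threshold):
    """Fraction of samples that fall below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(1234)
low, high = 1e-5, 5e-4  # same range as "lr" in the parameter space above
lrs_uniform = [sample_uniform(rng, low, high) for _ in range(10_000)]
lrs_log = [sample_loguniform(rng, low, high) for _ in range(10_000)]

print(frac_below(lrs_uniform, 1e-4), frac_below(lrs_log, 1e-4))
```

Analytically, about 18% of uniform samples fall below 1e-4, compared with about 59% of log-uniform samples, so the log-uniform distribution spends far more of the trial budget on small learning rates. That is usually what we want for parameters that span multiple orders of magnitude.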
Next, we're going to define a scheduler to prune unpromising trials. We'll be using AsyncHyperBandScheduler (ASHA), which is a very popular and aggressive early-stopping algorithm. Due to our aggressive scheduler, we'll set a grace_period to allow the trials to run for at least a few epochs before pruning and a maximum of max_t epochs.
```python
# Scheduler
scheduler = AsyncHyperBandScheduler(
    max_t=train_loop_config["num_epochs"],  # max epoch (<time_attr>) per trial
    grace_period=5,  # min epoch (<time_attr>) per trial
)
```
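The interplay between `grace_period` and `max_t` is easier to see in a simplified, synchronous version of successive halving, sketched below. It assumes a reduction factor of 4 and a hypothetical `max_t` of 20 epochs, and it uses made-up loss curves; Ray's scheduler is asynchronous and differs in its details, so treat this as an illustration of the idea rather than its implementation.

```python
def rung_milestones(grace_period, max_t, reduction_factor=4):
    """Epochs at which trials are compared: grace_period, then x reduction_factor."""
    milestones, t = [], grace_period
    while t < max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


def successive_halving(curves, grace_period, max_t, reduction_factor=4):
    """Return the epoch at which each trial stops.

    curves maps trial id -> list of val_loss values, one per epoch (lower is better).
    """
    alive = set(curves)
    stopped_at = {}
    for milestone in rung_milestones(grace_period, max_t, reduction_factor):
        # Rank surviving trials by their loss at this milestone epoch.
        ranked = sorted(alive, key=lambda tid: curves[tid][milestone - 1])
        num_keep = max(1, len(ranked) // reduction_factor)
        for tid in ranked[num_keep:]:
            stopped_at[tid] = milestone  # pruned at this rung
            alive.discard(tid)
    for tid in alive:
        stopped_at[tid] = max_t  # survivors train to completion
    return stopped_at


max_t = 20  # hypothetical number of epochs
# Made-up loss curves: trial 0 is always best, trial 7 always worst.
curves = {i: [(i + 1) / epoch for epoch in range(1, max_t + 1)] for i in range(8)}
stopped_at = successive_halving(curves, grace_period=5, max_t=max_t)
print(sorted(stopped_at.items()))
```

In this toy run, no trial is judged before epoch 5, the best two of eight trials continue to epoch 20 and the remaining six stop at epoch 5, for a total of 70 epochs of training instead of 160.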
Finally, we're going to define a TuneConfig that will combine the search_alg and scheduler we've defined above.
```python
# Tune config
tune_config = tune.TuneConfig(
    metric="val_loss",
    mode="min",
    search_alg=search_alg,
    scheduler=scheduler,
    num_samples=num_runs,
)
```
And now, we'll pass in our trainer object with our configurations to create a Tuner object that we can run.
```python
# Tuner
tuner = Tuner(
    trainable=trainer,
    run_config=run_config,
    param_space=param_space,
    tune_config=tune_config,
)
```
```python
# Tune
results = tuner.fit()
```
```python
# All trials in experiment
results.get_dataframe()
```
And on our MLflow dashboard, we can create useful plots like a parallel coordinates plot to visualize the different hyperparameters and their values across the different trials.
And from these results, we can extract the best trial and its hyperparameters:
```python
# Best trial's epochs
best_trial = results.get_best_result(metric="val_loss", mode="min")
best_trial.metrics_dataframe
```
```python
# Best trial's hyperparameters
best_trial.config["train_loop_config"]
```
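When picking a "best" trial, it matters which value of the metric is compared. The sketch below assumes the comparison uses each trial's last reported `val_loss` (we believe this is the default scope for `get_best_result`, but it is worth verifying against the Ray version in use). The function `best_trial_by_last` and the loss histories are hypothetical.

```python
def best_trial_by_last(histories, mode="min"):
    """Pick the trial whose last reported metric value is best."""
    last_values = {trial_id: history[-1] for trial_id, history in histories.items()}
    pick = min if mode == "min" else max
    return pick(last_values, key=last_values.get)


# Made-up per-epoch validation losses for two trials.
histories = {
    "trial_a": [0.90, 0.60, 0.55],
    "trial_b": [0.80, 0.40, 0.58],
}
best_id = best_trial_by_last(histories)
print(best_id)
```

Under this assumption `trial_a` wins, even though `trial_b` reached a lower loss at an earlier epoch. This is one reason to also keep the best checkpoint per trial rather than only the final one.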
And now we'll load the best run from our experiment, which includes all the runs we've done so far (before and including the tuning runs).
```python
# Sorted runs
sorted_runs = mlflow.search_runs(experiment_names=[experiment_name], order_by=["metrics.val_loss ASC"])
sorted_runs
```
From this we can load the best checkpoint from the best run and evaluate it on the test split.
```python
# Evaluate on test split
run_id = sorted_runs.iloc[0].run_id
best_checkpoint = get_best_checkpoint(run_id=run_id)
predictor = TorchPredictor.from_checkpoint(best_checkpoint)
performance = evaluate(ds=test_ds, predictor=predictor)
print(json.dumps(performance, indent=2))
```
And, just as we did in previous lessons, we can use our model for inference.
```python
# Preprocessor
preprocessor = predictor.get_preprocessor()
```
```python
# Predict on sample
title = "Transfer learning with transformers"
description = "Using transformers for transfer learning on text classification tasks."
sample_df = pd.DataFrame([{"title": title, "description": description, "tag": "other"}])
predict_with_proba(df=sample_df, predictor=predictor)
```
Now that we've tuned our model, in the next lesson, we're going to perform a much more intensive evaluation of our model than just viewing its overall metrics on a test set.
To cite this content, please use:
```bibtex
@article{madewithml,
    author       = {Goku Mohandas},
    title        = { Tuning - Made With ML },
    howpublished = {\url{https://madewithml.com/}},
    year         = {2023}
}
```