
The minimalist’s guide to experiment tracking with DVC

Photo by Zarak Khan from pexels.com

The bare minimum guide to get you started with experiment tracking

Eryk Lewinson

This article is the third part of a series demonstrating how to utilize DVC and its VS Code extension for ML experimentation. In the first part, I illustrated the entire setup of an ML project and demonstrated how to track and evaluate experiments within VS Code. In the second part, I showed how to use different types of plots, including live-plots, for experiment evaluation.

After reading these articles, you may be interested in using DVC for your next project. However, you may have thought that setting it up would require a lot of work, for example, defining pipelines and versioning data. Perhaps for your next quick experiment this would be overkill, so you decided not to give it a try. That would be a pity!

And while there is a very good reason for having all of those steps (your project will be fully tracked and reproducible), I understand that sometimes we are under a lot of pressure and need to experiment and iterate quickly. That is why in this article I will show you the bare minimum that is required to start tracking your experiments with DVC.

Before we dive into coding, I wanted to provide a bit more context about the toy example we will be using. The goal is to build a model that will identify fraudulent credit card transactions. The dataset (available on Kaggle) can be considered highly imbalanced, with only 0.17% of the observations belonging to the positive class.

As I promised in the introduction, we will cover the bare minimum scenario in which you can almost immediately start tracking your experiments. Besides some standard libraries, we will be using the dvc and dvclive libraries, as well as the DVC VS Code extension. The last one is not a hard requirement. We can inspect the tracked experiments from the command line. However, I prefer to use the special tabs integrated into the IDE.
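By the way, everything the extension's Experiments panel shows can also be printed in the terminal. For example, once some experiments are tracked, the following command displays them in tabular form:

dvc exp show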

Let’s start by creating the bare-bones script. In this short script, we load the data, split it into training and test sets, fit the model, and evaluate its performance on the test set. You can see the entire script in the snippet below.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score

# set the params
train_params = {
    "n_estimators": 10,
    "max_depth": 10,
}

# load data
df = pd.read_csv("data/creditcard.csv")
X = df.drop(columns=["Time"]).copy()
y = X.pop("Class")

# train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# fit-predict
model = RandomForestClassifier(random_state=42, **train_params)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# evaluate
print("recall", recall_score(y_test, y_pred))
print("precision", precision_score(y_test, y_pred))
print("f1_score", f1_score(y_test, y_pred))

Running the script returns the following output:

recall 0.7755102040816326
precision 0.926829268292683
f1_score 0.8444444444444446

I don’t think I need to convince you that writing down those numbers on a piece of paper or in a spreadsheet is not the best way to track your experiments. This is especially true because we not only need to track the output; it is also crucial to know which code and which hyperparameters resulted in that score. Without knowing that, we can never reproduce the results of our experiments.

Having said that, let’s implement experiment tracking with DVC. First, we need to initialize DVC. We can do so by running the following code in the terminal (within our project’s root directory).

dvc init
git add -A
git commit -m "initialize DVC"

Then, we need to slightly modify our code using dvclive.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from dvclive import Live

# set the params
train_params = {
    "n_estimators": 10,
    "max_depth": 10,
}

# load data
df = pd.read_csv("data/creditcard.csv")
X = df.drop(columns=["Time"]).copy()
y = X.pop("Class")

# train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# fit-predict
model = RandomForestClassifier(random_state=42, **train_params)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# evaluate
with Live(save_dvc_exp=True) as live:
    for param_name, param_value in train_params.items():
        live.log_param(param_name, param_value)
    live.log_metric("recall", recall_score(y_test, y_pred))
    live.log_metric("precision", precision_score(y_test, y_pred))
    live.log_metric("f1_score", f1_score(y_test, y_pred))

The only part that has changed is the evaluation. Using the Live context, we are logging the parameters of the model (stored in the train_params dictionary) and the same scores that we have printed before. We can track other things as well, for example, plots or images. To help you get started even faster, you can find a lot of useful code snippets in the documentation of dvclive or on the Setup screen of the DVC extension.
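To give a flavor of what that could look like, below is a small sketch (not part of the original script) that logs a confusion matrix and a ROC curve via dvclive's log_sklearn_plot method. It assumes it is placed inside the same Live context as above and that our dvclive version supports these plot kinds.

# inside the same `with Live(save_dvc_exp=True) as live:` block
# confusion matrix based on the hard predictions
live.log_sklearn_plot("confusion_matrix", y_test, y_pred)

# ROC curve based on the predicted probabilities of the positive class
y_score = model.predict_proba(X_test)[:, 1]
live.log_sklearn_plot("roc", y_test, y_score)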

Before looking into the results, it makes sense to mention that dvclive expects each run to be tracked by Git. This means that it will save each run to the same path and overwrite the results each time. We specified save_dvc_exp=True to auto-track as a DVC experiment. Behind the scenes, DVC experiments are Git commits that DVC can identify, but at the same time, they do not clutter our Git history or create extra branches.

After running our modified script, we can inspect the results in the Experiments panel of the DVC extension. As we can see, the scores match the ones we have manually printed into the console.

To clearly see the benefits of setting up our tracking, we can quickly run another experiment. For example, let’s say we believe that we should decrease the max_depth hyperparameter to 5. To do this, we simply change the value of the hyperparameter in the train_params dictionary and run the script again. We can then immediately see the results of the new experiment in the summary table. Additionally, we can see which combination of hyperparameters resulted in that score.
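For clarity, the only change for this second experiment is the value in the dictionary; re-running the script then registers the new run:

# modified hyperparameters for the second experiment
train_params = {
    "n_estimators": 10,
    "max_depth": 5,
}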

Nice and simple! Naturally, the simplified example we have presented can be easily extended. For example, we could:

- Track plots and compare the experiments using, for example, their ROC curves.
- Add a DVC pipeline to ensure the reproducibility of each step of our project (loading data, processing, splitting, etc.).
- Use a params.yaml file to parameterize all steps in our pipeline, including the training of an ML model.
- Use DVC callbacks. In our example, we have manually stored information about the model’s hyperparameters and its performance. For frameworks such as XGBoost, LightGBM, or Keras, we could use callbacks that store all that information for us automatically (see the sketch below).
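As a rough sketch of the last point, this is what using dvclive's LightGBM callback could look like. The exact import path and callback arguments may differ between dvclive versions, so treat this as an illustration rather than the definitive API; X_train, y_train, X_test, and y_test are reused from the script above.

import lightgbm as lgb
from dvclive.lgbm import DVCLiveCallback

# the callback logs the evaluation metrics after each boosting round
model = lgb.LGBMClassifier(n_estimators=10, max_depth=10)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_test, y_test)],
    callbacks=[DVCLiveCallback()],
)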

In this article, we explored the simplest experimentation tracking setup using DVC. I know that it can be daunting to start a project and already think about data versioning, reproducible pipelines, and so on. However, using the approach described in this article, we can start tracking our experiments with as little overhead as possible. While for larger projects, I would still highly encourage using all the tools that ensure reproducibility, for smaller ad-hoc experiments, this approach is definitely more appealing.

As always, any constructive feedback is more than welcome. You can reach out to me on Twitter or in the comments. You can find all the code used for this article in this repository.

Liked the article? Become a Medium member to continue learning by reading without limits. If you use this link to become a member, you will support me at no extra cost to you. Thanks in advance and see you around!
