Exercise: Intro to MLflow - Part II
Now that we know how to log metrics, parameters, and artifacts, we can use MLflow to track our experiments and compare different models. In this exercise, we'll train a linear regression model on some synthetic data. We'll then use MLflow to record the performance of the model and some relevant information about the training process.
In this part we will cover the following steps:
- Create some fake data.
- Plot the data using Matplotlib.
- Split the data into training and testing sets.
- Train a linear regression model.
- Evaluate the model with the mean squared error (MSE).
- Log the model using MLflow.
- Log the MSE of the model using MLflow.
- Log the plotted data using MLflow.
First we need some data to work with. Let's generate some fake data.
import numpy as np
# Mocked data
X = np.random.rand(100, 1) # Independent variable
y = 2 * X + np.random.randn(100, 1) # Dependent variable with some noise
Exercise I: Plot the Data using Matplotlib
Do you remember how to plot data using Matplotlib? Let's do it!
- We have X (our input) and y (our output), so we can plot the data using plt.scatter.
- Then we can save the plot using plt.savefig.
import matplotlib.pyplot as plt
# Add the relevant code below to plot the data
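One possible way to approach this exercise is sketched below. The fixed seed, the axis labels, the `Agg` backend, and the file name `data.png` are assumptions made for illustration and are not part of the original exercise. In a notebook you would likely drop the `matplotlib.use("Agg")` line so the plot is displayed inline.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the plot is reproducible
X = np.random.rand(100, 1)               # independent variable
y = 2 * X + np.random.randn(100, 1)      # dependent variable with some noise

plot_path = Path("data.png")  # hypothetical file name

plt.scatter(X, y)        # one point per (X, y) pair
plt.xlabel("X")
plt.ylabel("y")
plt.savefig(plot_path)   # write the figure to disk so it can be logged later
plt.close()
```

Saving the figure to a file is what makes it possible to log it as an artifact in the final exercise.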
Exercise II: Split the Data into Train and Test Sets
Remember that we need to split our data into train and test sets. We can use the train_test_split function from sklearn.model_selection to do this. We should store the result in X_train, X_test, y_train, y_test, which is the order in which train_test_split returns the splits.
from sklearn.model_selection import train_test_split
# Add the relevant code below to split the data into training and testing sets
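A minimal sketch of the split, assuming an 80/20 ratio and a fixed `random_state`. Neither value is specified by the exercise, so feel free to choose others.

```python
import numpy as np
from sklearn.model_selection import train_test_split

np.random.seed(0)  # assumption: fixed seed for reproducible data
X = np.random.rand(100, 1)
y = 2 * X + np.random.randn(100, 1)

# train_test_split returns the splits in the order: X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed values, not from the exercise
)

print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)
```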
Exercise III: Train a Linear Regression Model
Then, train a linear regression model using the scikit-learn library.
- Initialize the model by calling the LinearRegression class.
- Train the model using the fit method.
from sklearn.linear_model import LinearRegression
# Add code to train the model
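The two steps above could look roughly like this. The data generation and split are repeated only to keep the sketch self-contained, and the seed and split ratio are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

np.random.seed(0)  # assumption: fixed seed for reproducible data
X = np.random.rand(100, 1)
y = 2 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed values
)

model = LinearRegression()    # initialize the model with default settings
model.fit(X_train, y_train)   # estimate slope and intercept from the training set

# The fitted slope should be in the neighborhood of 2, the value used to generate y
print(model.coef_, model.intercept_)
```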
Exercise IV: Compute the Accuracy of the Model
Next, evaluate the model using the mean_squared_error function from the sklearn.metrics module. Note that the mean squared error (MSE) measures error rather than accuracy: lower values mean a better fit.
- Compute the predictions by passing X_test to the predict method of the model.
- Compute the MSE by passing y_test and the predictions to the mean_squared_error function.
- Print the MSE.
from sklearn.metrics import mean_squared_error
# Add code to calculate the mean squared error
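As a sketch, the evaluation could be written as follows. The setup lines repeat the earlier steps under the same assumed seed and split so the snippet runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

np.random.seed(0)  # assumption: fixed seed for reproducible data
X = np.random.rand(100, 1)
y = 2 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed values
)
model = LinearRegression().fit(X_train, y_train)

predictions = model.predict(X_test)            # predictions on unseen data
mse = mean_squared_error(y_test, predictions)  # average squared residual
print(f"MSE: {mse:.4f}")
```

Because the noise added to y has unit variance, an MSE near 1 is about the best this model can be expected to achieve.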
Exercise V: Create a Run and Log the Model and Metrics
- Think: we've computed the MSE of the model. Would you log it as a parameter or as a metric?
- Think: we've created a plot. What kind of data is it? How would you log it?
- Log the model using the mlflow.sklearn.log_model function.
- Extra: log the signature of the model.
import mlflow
EXPERIMENT_NAME = "intro-to-mlflow"
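# get_experiment_by_name returns None if the experiment does not exist yet,
# so create it in that case
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)
if experiment is None:
    experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)
else:
    experiment_id = experiment.experiment_id
with mlflow.start_run(
    experiment_id=experiment_id,
) as run:
    # Add code to log the model, the mean squared error, and the model parameters
    pass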