Exercise: Intro to MLflow - Part II
Now that we've learned how to log metrics, parameters, and artifacts, we can use MLflow to track our experiments and compare different models. In this exercise, we'll use some fake data to train a linear regression model. We'll then use MLflow to track the model's performance and some relevant information about the training process.
In this part we will cover the following steps:
- Create some fake data.
- Plot the data using Matplotlib.
- Split the data into training and testing sets.
- Train a linear regression model.
- Compute the accuracy of the model (its mean squared error).
- Log the model using MLflow.
- Log the accuracy of the model using MLflow.
- Log the plotted data using MLflow.
First we need some data to work with. Let's generate some fake data.
import numpy as np
# Mocked data
X = np.random.rand(100, 1) # Independent variable
y = 2 * X + np.random.randn(100, 1) # Dependent variable with some noise
Exercise I: Plot the Data using Matplotlib
Do you remember how we can plot data using Matplotlib? Let's do it!
- We have X (our input) and y (our output), so we can simply plot the data using plt.scatter.
- Then we can save the plot using plt.savefig.
import matplotlib.pyplot as plt
# Add the relevant code below to plot the data
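One possible way to fill in this cell is sketched below. The fixed seed, the axis labels, and the file name data.png are assumptions added for illustration; the exercise itself only asks for plt.scatter and plt.savefig.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

# Same mocked data as above; the seed is an added assumption for reproducibility
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X + np.random.randn(100, 1)

PLOT_PATH = Path("data.png")  # hypothetical file name

plt.scatter(X, y)          # one point per (input, output) pair
plt.xlabel("X (input)")
plt.ylabel("y (output)")
plt.title("Mocked data")
plt.savefig(PLOT_PATH)     # write the current figure to disk
```

Saving the figure to a file is what makes it possible to log the plot later on.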
Exercise II: Split the Data into Train and Test Sets
Remember that we need to split our data into train and test sets. We can use the train_test_split function from sklearn.model_selection to do this. We should store the split in X_train, X_test, y_train, and y_test, which is the order in which train_test_split returns them.
from sklearn.model_selection import train_test_split
# Add the relevant code below to split the data into training and testing sets
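A minimal sketch of the split, assuming an 80/20 ratio and a fixed random_state (neither is prescribed by the exercise):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same mocked data as above; the seed is an added assumption for reproducibility
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X + np.random.randn(100, 1)

# train_test_split returns the pieces in this order: X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # illustrative values
)

print(X_train.shape, X_test.shape)
```

Fixing random_state keeps the split identical between runs, which helps when comparing runs against each other.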
Exercise III: Train a Linear Regression Model
Then, train a linear regression model using the scikit-learn library.
- Initialize the model by calling the LinearRegression class.
- Train the model using the fit method.
from sklearn.linear_model import LinearRegression
# Add code to train the model
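The two steps above could look roughly like this. The data and split are recreated so the snippet runs on its own; the seed and split settings are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Recreate the mocked data and the split (seed and split settings are assumptions)
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()   # initialize the model
model.fit(X_train, y_train)  # train it on the training set only

# The data was generated as y = 2 * X + noise, so the slope should be near 2,
# although the noise is large compared to the signal
print("slope:", model.coef_, "intercept:", model.intercept_)
```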
Exercise IV: Compute the Accuracy of the Model
Finally, measure how well the model performs using the mean_squared_error function from the sklearn.metrics module. Note that the mean squared error is an error measure, so lower values are better.
- Compute the predictions by passing X_test to the predict method of the model.
- Compute the mean squared error by passing y_test and the predictions as arguments to mean_squared_error.
- Print the result.
from sklearn.metrics import mean_squared_error
# Add code to calculate the mean squared error
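A sketch of the evaluation step follows. As before, the data, split, and model are recreated under assumed settings so that the snippet is self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Recreate the data, the split, and the trained model (settings are assumptions)
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

predictions = model.predict(X_test)            # predictions on unseen data
mse = mean_squared_error(y_test, predictions)  # true values first, then predictions

print(f"Mean squared error: {mse:.3f}")
```

Because the noise added to y has unit variance, an MSE in the neighborhood of 1 is what one would expect here rather than a value close to 0.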
Exercise V: Create a Run and Log the Model and Metrics
- Think: we've computed the MSE of the model. Would you log it as a parameter or as a metric?
- Think: we've created a plot. What kind of data is it? How would you log it?
- Log the model using the mlflow.sklearn.log_model function.
- Extra: Log the signature of the model.
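import mlflow
EXPERIMENT_NAME = "intro-to-mlflow"
# set_experiment creates the experiment if it does not exist yet and returns it,
# whereas get_experiment_by_name returns None for a missing experiment
experiment = mlflow.set_experiment(EXPERIMENT_NAME)
with mlflow.start_run(
    experiment_id=experiment.experiment_id,
) as run:
    # Add code to log the model, the mean squared error, and the model parameters
    pass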