✍️ Exercise: Intro to MLFlow - Part II

Now that we've learned how to log metrics, parameters, and artifacts, we can use MLFlow to track our experiments and compare different models. In this exercise, we'll train a linear regression model on some fake data, then use MLFlow to track the model's performance and some relevant information about the training process.

In this part we will cover the following topics:

Create some fake data.
Plot the data using Matplotlib.
Split the data into training and testing sets.
Train a linear regression model.
Compute the error of the model (mean squared error).
Log the model using MLFlow.
Log the error of the model using MLFlow.
Log the plotted data using MLFlow.

First we need some data to work with. Let's generate some fake data.

In [1]:

import numpy as np

# Mocked data
X = np.random.rand(100, 1)  # Independent variable
y = 2 * X + np.random.randn(100, 1)  # Dependent variable with some noise

Exercise I: Plot the Data using Matplotlib

Do you remember how to plot data using Matplotlib? Let's do it! 🚀

👉 We have X (our input) and y (our output). So we can simply plot the data using plt.scatter.
👉 Then we can save the plot using plt.savefig.
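
One possible way to follow the two hints above is sketched here. The seed, the axis labels, and the file name `data_plot.png` are assumptions made for illustration; they are not part of the exercise.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

# Recreate the mocked data; the fixed seed is an assumption for reproducibility
rng = np.random.default_rng(42)
X = rng.random((100, 1))  # Independent variable
y = 2 * X + rng.standard_normal((100, 1))  # Dependent variable with some noise

PLOT_PATH = Path("data_plot.png")  # hypothetical file name

# Scatter plot of the input against the output
plt.scatter(X, y)
plt.xlabel("X")
plt.ylabel("y")
plt.title("Mocked data")

# Save the figure to disk so it can be logged later, then release it
plt.savefig(PLOT_PATH)
plt.close()
```

Saving the figure to a file matters here because the plot is logged to MLFlow later from that file.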

In [4]:

import matplotlib.pyplot as plt

# 👇 Add the relevant code below to plot the data

Exercise II: Split the Data into Train and Test Sets

💡 Remember that we need to split our data into train and test sets. We can use the train_test_split function from sklearn.model_selection to do this. It returns the splits in the order X_train, X_test, y_train, y_test, so we should unpack them into variables with those names in that order.
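
A minimal sketch of the split, assuming an 80/20 ratio and a fixed `random_state` (neither is prescribed by the exercise):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Recreate the mocked data; the fixed seed is an assumption for reproducibility
rng = np.random.default_rng(42)
X = rng.random((100, 1))
y = 2 * X + rng.standard_normal((100, 1))

# Note the return order: both X splits first, then both y splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed ratio and seed
)

print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)
```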

In [ ]:

from sklearn.model_selection import train_test_split

# 👇 Add the relevant code below to split the data into training and testing sets

Exercise III: Train a Linear Regression Model

Then, train a linear regression model using the scikit-learn library.

👉 Initialize the model by instantiating the LinearRegression class.
👉 Train the model using the fit method.
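
The two steps above could look roughly like this. The data generation and the split are repeated so the sketch runs on its own, and the seed and split ratio are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Recreate the data and the split (assumed seed and 80/20 ratio)
rng = np.random.default_rng(42)
X = rng.random((100, 1))
y = 2 * X + rng.standard_normal((100, 1))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize the model and fit it on the training split only
model = LinearRegression()
model.fit(X_train, y_train)

# The data was generated with slope 2 and intercept 0, so the fitted
# values should land in that neighbourhood, blurred by the noise
print("slope:", model.coef_.ravel()[0])
print("intercept:", model.intercept_.ravel()[0])
```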

In [3]:

from sklearn.linear_model import LinearRegression

# Add code to train the model 👇

Exercise IV: Compute the Error of the Model

Finally, evaluate the model using the mean_squared_error function from the sklearn.metrics module. Note that the mean squared error (MSE) is an error metric rather than an accuracy: lower values mean a better fit.

👉 Compute the predictions by passing X_test to the predict method of the model.
👉 Compute the MSE by calling mean_squared_error with y_test and the predictions as arguments.
👉 Print the MSE.
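
As a sketch of these three steps, again self-contained and with an assumed seed and split ratio:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Recreate the data, the split, and the trained model (assumed seed and ratio)
rng = np.random.default_rng(42)
X = rng.random((100, 1))
y = 2 * X + rng.standard_normal((100, 1))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Predict on the held-out test split
predictions = model.predict(X_test)

# mean_squared_error takes the true values first, then the predictions
mse = mean_squared_error(y_test, predictions)
print(f"MSE: {mse:.3f}")
```

Because the noise was drawn from a standard normal distribution (variance 1), an MSE somewhere around 1 is what one would expect on this data.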

In [9]:

from sklearn.metrics import mean_squared_error

# Add code to calculate the mean squared error 👇

Exercise V: Create a Run and Log the Model and Metrics

👉 Think. We've computed the MSE of the model. Would you log it as a parameter or as a metric?
👉 Think. We've created a plot. What kind of data is it? How would you log it?
👉 Log the model using the mlflow.sklearn.log_model function.
👉 Extra: Log the signature of the model.

In [ ]:

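import mlflow

EXPERIMENT_NAME = "intro-to-mlflow"

# set_experiment returns the experiment, creating it first if it does not exist
# (get_experiment_by_name would return None for a missing experiment)
experiment = mlflow.set_experiment(EXPERIMENT_NAME)

with mlflow.start_run(
    experiment_id=experiment.experiment_id,
) as run:

    # Add code to log the model, the mean squared error, the plot, and the model parameters 👇

    pass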