{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Full Model Training" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This notebook walks you through using MLflow to track the training of a logistic regression model on the Titanic dataset and to register the resulting model. In this tutorial, we will explore how MLflow, an open-source platform for the machine learning lifecycle, supports experiment tracking, metric monitoring, and model management in a reproducible environment.\n", "\n", "The Titanic dataset is a classic in the data science community. It contains information about each passenger, such as age, sex, ticket class, and whether they survived the disaster. We will use this information to build a logistic regression model that predicts a passenger's probability of survival from their features.\n", "\n", "Throughout this notebook, we will not only train the logistic regression model but also learn how to use the capabilities of MLflow. We will start by exploring the Titanic dataset, using analysis and visualizations to better understand the distribution of the variables and the patterns in the data.\n", "\n", "Next, we will move on to the data preprocessing stage, where we will clean and transform the data to make it suitable for modeling. During this process, we will use MLflow to log the preprocessing steps, giving us a complete and organized record of the transformations applied to the data.\n", "\n", "Then, we will train the logistic regression model on the training dataset. Here, MLflow will log the training details, including the hyperparameters used, performance metrics, and other relevant aspects of the model. 
This will give us a comprehensive overview of the process and make it easier to compare different experiments and configurations.\n", "\n", "Once we have trained our model, we will use MLflow to register the model on the platform. This will give us a complete record of the trained models, including each model's hyperparameters, performance metrics, and associated source code. Additionally, MLflow will enable us to export the model in a standard format, making it easier to deploy in different environments.\n", "\n", "In summary, this notebook lets you work with MLflow and see how it can simplify the training process of a logistic regression model. Throughout the tutorial, you will learn to log experiments, metrics, and model details, giving you comprehensive and reproducible tracking of the entire process. So get ready to delve into Titanic data analysis while exploring the capabilities of MLflow in logistic regression training!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime\n", "\n", "from sklearn.preprocessing import LabelEncoder\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LogisticRegression\n", "import seaborn as sns\n", "import pandas as pd\n", "import mlflow\n", "\n", "from mlops_course.settings import config" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Get the Data\n", "\n", "We will load the dataset from a CSV file and display the first few rows to get an understanding of the data." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Read the data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |