📉 Scikit-learn Models¶
In this tutorial, we'll explore some of the most commonly used machine learning models in Scikit-learn. We'll cover:
- Linear Regression
- Decision Tree Classifier
- Random Forest Classifier
- Support Vector Machine (SVM) Classifier
Let's get started!
1. Load the data¶
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the Iris dataset
data = load_iris()
X = data.data # Features
y = data.target # Target variable
# Split the data into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
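Before training anything, it can help to sanity-check what the split produced. The snippet below is a small optional sketch that reuses the variables defined above to print the feature names, class names, and split sizes.

# Sanity-check the dataset and the 80/20 split (reuses the variables above)
print(data.feature_names)            # the four measurements per flower
print(data.target_names)             # the three Iris species
print(X_train.shape, X_test.shape)   # expect (120, 4) and (30, 4)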
2.1 Linear Regression¶
Linear Regression fits a linear relationship between the features and a continuous target. The Iris targets are actually class labels (0, 1, 2), so we use it here mainly to demonstrate the regression workflow on the same train/test split.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Create the Linear Regression model
linear_model = LinearRegression()
# Train the model on the training data
linear_model.fit(X_train, y_train)
# Make predictions on the test data
y_pred = linear_model.predict(X_test)
# Calculate the mean squared error to evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
Mean Squared Error: 0.04
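Mean squared error is one way to judge the fit. As an optional, hedged sketch, you can also look at the R² score and, because the targets happen to be integer labels, round the continuous predictions to the nearest label for a rough accuracy (the rounding step is purely illustrative).

import numpy as np

# R^2 score on the test set (1.0 would be a perfect fit)
print(f"R^2: {linear_model.score(X_test, y_test):.2f}")

# Illustrative only: round continuous predictions to the nearest class label
rounded = np.clip(np.round(y_pred), 0, 2).astype(int)
print(f"Rounded-label accuracy: {(rounded == y_test).mean():.2f}")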
2.2 Decision Tree Classifier¶
Decision Trees are widely used for classification tasks. They recursively split the data based on features to create a tree-like model for decision-making.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Create the Decision Tree Classifier
dt_classifier = DecisionTreeClassifier()
# Train the classifier on the training data
dt_classifier.fit(X_train, y_train)
# Make predictions on the test data
y_pred = dt_classifier.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Accuracy: 1.00
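If you want to inspect the splits the tree actually learned, scikit-learn's export_text prints the fitted tree as indented text rules. This short sketch assumes the dt_classifier fitted above.

from sklearn.tree import export_text

# Print the learned splits as human-readable rules
tree_rules = export_text(dt_classifier, feature_names=list(data.feature_names))
print(tree_rules)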
2.3 Random Forest Classifier¶
Random Forest is an ensemble method that trains many decision trees on random subsets of the samples and features, then combines their votes. Averaging over many trees usually generalizes better than a single tree.
from sklearn.ensemble import RandomForestClassifier
# Create the Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100)
# Train the classifier on the training data
rf_classifier.fit(X_train, y_train)
# Make predictions on the test data
y_pred = rf_classifier.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Accuracy: 1.00
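A handy by-product of the ensemble is an aggregated, impurity-based importance score for each feature. The sketch below (reusing the fitted rf_classifier) prints them.

# Impurity-based feature importances, averaged across the 100 trees
for name, importance in zip(data.feature_names, rf_classifier.feature_importances_):
    print(f"{name}: {importance:.2f}")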
2.4 Support Vector Machine (SVM) Classifier¶
SVM is a powerful algorithm for classification. It looks for the hyperplane that separates the classes with the largest possible margin in the feature space; non-linear boundaries are handled through kernels.
from sklearn.svm import SVC
# Create the SVM Classifier
svm_classifier = SVC(kernel='linear')
# Train the classifier on the training data
svm_classifier.fit(X_train, y_train)
# Make predictions on the test data
y_pred = svm_classifier.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Accuracy: 1.00
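The kernel is the main design choice for an SVM. As a small comparison sketch (not a tuned model), you can retrain the classifier with the default RBF kernel and score it on the same test set.

# Compare with a non-linear (RBF) kernel; 'rbf' is the SVC default
rbf_classifier = SVC(kernel='rbf')
rbf_classifier.fit(X_train, y_train)
print(f"RBF kernel accuracy: {accuracy_score(y_test, rbf_classifier.predict(X_test)):.2f}")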
Remember that these are just basic examples to introduce you to the models. In real-world scenarios, you should perform hyperparameter tuning and cross-validation to get the best results.
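As a concrete starting point, the hedged sketch below tunes the SVM's C and kernel with GridSearchCV and 5-fold cross-validation; the parameter grid is illustrative, not exhaustive.

from sklearn.model_selection import GridSearchCV

# Illustrative grid; in practice choose ranges that suit your data and model
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print(f"Best cross-validated accuracy: {grid.best_score_:.2f}")
print(f"Test accuracy with the best model: {grid.score(X_test, y_test):.2f}")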
Feel free to explore the many other models and algorithms available in Scikit-learn, such as Gradient Boosting, Neural Networks (via Scikit-learn's MLPClassifier or dedicated libraries like TensorFlow and PyTorch), and more. Each model has its strengths and weaknesses, so understanding the problem you're trying to solve and selecting the right model is crucial for successful machine learning projects.
Keep experimenting and practicing with various datasets and algorithms to sharpen your machine learning skills! Happy coding!