Call a Model API

Once your model is deployed, you can call it with the requests library. MLflow exposes the following API endpoints for your model:

POST /invocations: An inference endpoint that accepts POST requests with input data and returns predictions.
GET /ping: Used for health checks.
GET /health: Same as /ping.
GET /version: Returns the MLflow version.

Check the model status

First, we need to check if the model is running. We can do this by calling the /ping endpoint.

In [1]:

import requests

ENDPOINT = "http://localhost:5001/ping"

response = requests.get(ENDPOINT)
response

Out[1]:

<Response [200]>
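A single call to /ping can fail if the server is still starting. The sketch below polls the endpoint until it answers with status 200 or a timeout expires. The names wait_until_ready, timeout_s, interval_s and probe are hypothetical and not part of MLflow or requests; the probe argument exists only so the loop can be exercised without a running server.

```python
import time


def wait_until_ready(ping_url, timeout_s=30.0, interval_s=1.0, probe=None):
    """Poll ping_url until it returns HTTP 200 or timeout_s elapses.

    Returns True if the endpoint answered 200 in time, False otherwise.
    """
    if probe is None:
        # Default probe: a plain GET with the requests library.
        import requests

        def probe(url):
            return requests.get(url, timeout=5).status_code

    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe(ping_url) == 200:
                return True
        except OSError:
            # requests.RequestException derives from IOError (OSError),
            # so connection failures while the server boots land here.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Assuming the server from this section, wait_until_ready("http://localhost:5001/ping") would return True once the model is up.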

Check the MLflow version

We can also check which MLflow version is serving the model by calling the /version endpoint.

In [2]:

import requests

ENDPOINT = "http://127.0.0.1:5001/version"

response = requests.get(ENDPOINT)
response.text

Out[2]:

'2.17.0'

Call the model

The model is ready to receive requests. We can call the /invocations endpoint with the input data to get the predictions. Let's go step by step:

1. Define the endpoint URL.

In [3]:

# Define the URL of the inference endpoint
ENDPOINT = 'http://localhost:5001/invocations'

2. Prepare the data to be sent

Now we prepare the data to send to the model. A request has two parts:

Body: also called the payload, this is the data we want to send to the model. It must be in JSON format (in Python, we build it as a dictionary and serialize it later). The JSON has a single key, inputs, whose value is a list of vectors. Each vector has 8 elements, which are the model's features. The model returns a list of predictions, one for each input vector.
Headers: we need to specify the content type of the data we are sending. In this case, it is application/json.

In [4]:

# Build the headers and the body (payload) of the request
headers = {'Content-Type': 'application/json'}
features = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]]  # list of lists (vectors)
body = {'inputs': features}
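Because inputs is a list of vectors, several rows can go in one request. The following is a minimal sketch of a helper that checks each row has the 8 features described above before building the body. The names build_body and N_FEATURES are hypothetical, and the second row holds made-up illustrative values.

```python
N_FEATURES = 8  # each input vector has 8 elements, as stated in this section


def build_body(rows):
    """Return a {'inputs': [...]} payload after validating every row's length."""
    for index, row in enumerate(rows):
        if len(row) != N_FEATURES:
            raise ValueError(
                f"row {index} has {len(row)} features, expected {N_FEATURES}"
            )
    # Convert to plain floats so the payload serializes cleanly to JSON.
    return {"inputs": [[float(value) for value in row] for row in rows]}


rows = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # illustrative values only
]
batch_body = build_body(rows)
```

Catching a wrong row length on the client side gives a clearer error than waiting for the server to reject the request.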

3. Convert the data to JSON

The body is still a Python dictionary, so we need to serialize it to a JSON string. The json library does this with json.dumps.

In [5]:

import json

# Convert the payload to JSON format
body_json = json.dumps(body)
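As an alternative to calling json.dumps by hand, requests can serialize the dictionary itself through its json= argument, which also sets the Content-Type header. The sketch below prepares a request without sending it, so the result can be inspected even when no server is running. The endpoint and body values simply mirror the cells above.

```python
import json

import requests

# Values mirroring the earlier cells in this section.
ENDPOINT = "http://localhost:5001/invocations"
body = {"inputs": [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]]}

# Build the request without sending it; json= makes requests serialize
# the dictionary and add the Content-Type header.
prepared = requests.Request("POST", ENDPOINT, json=body).prepare()

content_type = prepared.headers["Content-Type"]
decoded_body = json.loads(prepared.body)  # parse the serialized payload back

print(content_type)
print(decoded_body == body)
```

With this option, the POST could be written as requests.post(ENDPOINT, json=body), with no explicit headers or json.dumps step. The explicit version used in this section makes each step visible, which is why it is kept here.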

4. Send the request

Now we send a POST request to the model using the requests library. The response is a JSON object containing the predictions.

In [6]:

# Make a POST request
response = requests.post(ENDPOINT, headers=headers, data=body_json)
response.json()

Out[6]:

{'predictions': [-37.34314240221075]}
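Since the model returns one prediction per input vector, it can be worth checking the shape of the response before using it. This is a minimal sketch; extract_predictions and n_inputs are hypothetical names, and the sample dictionary reuses the output shown above.

```python
def extract_predictions(response_json, n_inputs):
    """Return the predictions list, checking it has one entry per input vector."""
    if "predictions" not in response_json:
        raise KeyError(f"no 'predictions' key in response: {response_json!r}")
    predictions = response_json["predictions"]
    if len(predictions) != n_inputs:
        raise ValueError(
            f"expected {n_inputs} predictions, got {len(predictions)}"
        )
    return predictions


# Sample taken from the output of the previous cell.
sample = {"predictions": [-37.34314240221075]}
predictions = extract_predictions(sample, n_inputs=1)
```

In the notebook, this could be called as extract_predictions(response.json(), n_inputs=len(features)).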

We can check that the status code of the response is 200, which means the request was successful.

In [7]:

# Check the response
response.status_code

Out[7]: