Call a Model API¶
Once your model is deployed, you can call it using the requests
library. MLflow builds an API for your model with the following endpoints:
POST /invocations
: An inference endpoint that accepts POST requests with input data and returns predictions.
GET /ping
: Used for health checks.
GET /health
: Same as /ping.
GET /version
: Returns the MLflow version.
Check the model status¶
First, we need to check if the model is running. We can do this by calling the /ping
endpoint.
In [1]:
import requests
ENDPOINT = "http://localhost:5001/ping"
response = requests.get(ENDPOINT)
response
Out[1]:
<Response [200]>
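If the server is not up yet, the plain requests.get call above raises an exception rather than returning a response. A minimal sketch of a more forgiving check follows; the helper name is_alive, the BASE_URL constant, and the 2-second timeout are illustrative assumptions, not part of MLflow.

```python
import requests

# Assumption: same host and port as the cell above.
BASE_URL = "http://localhost:5001"


def is_alive(base_url=BASE_URL, timeout=2.0):
    """Return True if GET <base_url>/ping answers with HTTP 200, else False."""
    try:
        response = requests.get(f"{base_url}/ping", timeout=timeout)
    except requests.RequestException:
        # Covers refused connections, timeouts, malformed URLs, and similar errors.
        return False
    return response.status_code == 200
```

Calling is_alive() before sending predictions lets a script fail early with a clear message instead of a stack trace.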
Check the MLflow version¶
We can also check which MLflow version is serving the model by calling the /version
endpoint.
In [2]:
import requests
ENDPOINT = "http://127.0.0.1:5001/version"
response = requests.get(ENDPOINT)
response.text
Out[2]:
'2.17.0'
In [3]:
# Define the URL of the inference endpoint
ENDPOINT = 'http://localhost:5001/invocations'
2. Prepare the data to be sent¶
Now we prepare the data to be sent to the model. A request has two parts:
- Body: also called the payload, it is the data we want to send to the model. The data must be in JSON format (in Python, we build it as a dictionary). The JSON has a single key, inputs, whose value is a list of vectors. Each vector has 8 elements, which are the features of the model. The model returns a list of predictions, one for each input vector.
- Headers: we need to specify the content type of the data we are sending. In this case, it is application/json.
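Because the body has a fixed shape, it can help to check it before sending anything. The sketch below builds the inputs dictionary described above and rejects vectors of the wrong length; build_payload and N_FEATURES are hypothetical names introduced for illustration, and the value 8 comes from the description above.

```python
# Assumption: 8 features per vector, as stated in the text above.
N_FEATURES = 8


def build_payload(vectors, n_features=N_FEATURES):
    """Return the request body {'inputs': [...]} after checking vector lengths."""
    rows = [[float(x) for x in vector] for vector in vectors]
    for i, row in enumerate(rows):
        if len(row) != n_features:
            raise ValueError(
                f"vector {i} has {len(row)} features, expected {n_features}"
            )
    return {"inputs": rows}


body = build_payload([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
print(body)
```

Several vectors can be passed at once, in which case the model returns one prediction per vector.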
In [4]:
# Build the headers and the body (payload) of the request
headers = {'Content-Type': 'application/json'}
features = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]] # list of lists (vectors)
body = {'inputs': features}
3. Convert the data to JSON¶
We need to convert the data to JSON format. We can use the json
library to do this.
In [5]:
import json
# Convert the payload to JSON format
body_json = json.dumps(body)
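As an alternative to calling json.dumps yourself, requests accepts a json= argument that serializes the dictionary and sets the Content-Type header for you. The sketch below prepares a request without sending it, so the result can be inspected even when no server is running; the payload values are the same sample features used in this notebook.

```python
import json

import requests

ENDPOINT = "http://localhost:5001/invocations"
body = {"inputs": [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]]}

# Prepare (but do not send) the request to see what would go over the wire.
prepared = requests.Request("POST", ENDPOINT, json=body).prepare()

print(prepared.headers["Content-Type"])   # header added by requests
print(json.loads(prepared.body) == body)  # body round-trips to the same dict
```

In practice this means requests.post(ENDPOINT, json=body) can stand in for the explicit headers plus data=json.dumps(body) combination.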
4. Send the request¶
Now we send the request (POST) to the model using the requests
library. The response will be a JSON with the predictions.
In [6]:
# Make a POST request
response = requests.post(ENDPOINT, headers=headers, data=body_json)
response.json()
Out[6]:
{'predictions': [-37.34314240221075]}
We can check that the status code of the response is 200, which means the request was successful.
In [7]:
# Check the response
response.status_code
Out[7]:
200
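The steps above can be wrapped in a single function that also handles failures. This is only a sketch: predict and extract_predictions are hypothetical helper names, the 10-second timeout is an arbitrary choice, and the response is assumed to have the predictions key shown in the output above.

```python
import json

import requests

ENDPOINT = "http://localhost:5001/invocations"
HEADERS = {"Content-Type": "application/json"}


def extract_predictions(result):
    """Pull the list of predictions out of the decoded JSON response."""
    if not isinstance(result, dict) or "predictions" not in result:
        raise ValueError(f"unexpected response shape: {result!r}")
    return result["predictions"]


def predict(vectors, endpoint=ENDPOINT, timeout=10.0):
    """POST the vectors to the model and return its predictions."""
    body_json = json.dumps({"inputs": vectors})
    response = requests.post(
        endpoint, headers=HEADERS, data=body_json, timeout=timeout
    )
    # Raises requests.HTTPError if the status code is 4xx or 5xx.
    response.raise_for_status()
    return extract_predictions(response.json())
```

With the server running, predict([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]]) would return the list found under predictions, and raise_for_status replaces the manual status code check.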