Exporting models for serving using TorchServe


Overview

TorchServe is a framework developed by PyTorch to serve machine learning models to users in a seamless and performant manner. This article is a tutorial that teaches the creation and deployment of our own PyTorch models in production using TorchServe.

We will follow the structure laid out below.

  • Firstly, we will look at what TorchServe is and understand the features it provides.

  • After this, we will learn to install TorchServe and go through the Java dependency it requires.

  • We will use a pre-trained model from Hugging Face to demonstrate the various components involved in serving models using TorchServe.

  • We will learn to create all the required JSON files, along with the model handler class containing all the required methods, with explanations where necessary.

  • After this, we will learn to create the model archive file (.mar file), launch the server using TorchServe, check its status, and stop the running server.

Introduction

Model serving is one of the most crucial steps in developing an MLOps pipeline. It is the process of hosting machine learning models and making them available via an API, so that applications can incorporate them into their AI-based systems or end users can consume them directly through the API.

To this end, there has lately been a lot of interest in developing tools that facilitate the model serving part of a machine learning project pipeline. As we know, PyTorch is one of the most widely used libraries for developing deep learning models; hence, the PyTorch team has developed TorchServe, a tool for serving deep learning models built in PyTorch to users as an API.

Let us next look at what TorchServe is.

What is TorchServe?

TorchServe is an easy-to-use tool for serving deep learning models trained using PyTorch. It supports models trained using both the eager mode and the TorchScript mode. After deploying the models, TorchServe provides a seamless and performant way to make the models accessible through an API.

In a few simple steps, as listed below, TorchServe can serve models developed with PyTorch.

  • Develop a PyTorch model and export it
  • Create the necessary files for the model along with a model handler
  • Generate the model archive
  • Use TorchServe to serve the model
  • Monitor and manage the model

Installing TorchServe

First, we must install Java on our machine, as the TorchServe frontend is implemented in Java.

Use Oracle's official website to download and install the JDK (Java Development Kit).

To verify the correct installation on the machine, use the following code from the command line -
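    java -version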

Output

If you are using Google Colab, you can run the command by beginning it with a ! like so -
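    !java -version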

Since PyTorch is the library we are working with, ensure that a stable version of it is correctly installed on your system. Refer to the installation commands here, or read this short blog to learn how to install PyTorch on your system properly with stable versions.
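For reference, a typical pip-based installation (adjust for your platform and CUDA setup) looks like:

    pip install torch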

After properly installing PyTorch on the system, we will need to install TorchServe and the Torch Model Archiver using the following command -
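Both packages are available on PyPI and can be installed together:

    pip install torchserve torch-model-archiver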

Output

Since we are using a pre-trained model from Hugging Face, we will install their transformers library using the following command -
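    pip install transformers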

Output

Exporting the Model Locally

As we have discussed, we will use a pretrained model from the Hugging Face ecosystem to demonstrate the use of TorchServe.

Let us use this model from the Cardiff NLP group. The RoBERTa-base model is trained on ~124M tweets from January 2018 to December 2021 and fine-tuned for sentiment analysis with the TweetEval benchmark. The labels it is trained on are as follows -

0 -> Negative; 1 -> Neutral; 2 -> Positive

To be able to use this model, we will first instantiate the tokenizer associated with it and then create a model instance using the following code -
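The snippet below is a sketch using the transformers auto classes; the checkpoint name is the Cardiff NLP model that matches the description above (verify it against the model card you intend to use):

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Cardiff NLP RoBERTa-base checkpoint fine-tuned for sentiment on TweetEval
    MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment-latest"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)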

Let us now save the tokenizer and model instances we created like so -
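Assuming we save both into a local folder called model (any path works, as long as the same path is used consistently in the later steps):

    SAVE_DIR = "./model"

    # Write the tokenizer and model files to the local folder
    tokenizer.save_pretrained(SAVE_DIR)
    model.save_pretrained(SAVE_DIR)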

Save the Model

The directory structure, in Google Colab or otherwise, should now look like the following -
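Assuming the ./model folder used above (exact file names can vary slightly across transformers versions), the saved folder typically contains:

    model/
    ├── config.json
    ├── pytorch_model.bin
    ├── tokenizer_config.json
    ├── special_tokens_map.json
    ├── vocab.json
    └── merges.txt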

JSON Request & Label Mapper file

To tell TorchServe how to process the data sent to the model endpoint after deployment, we will need to create a script that does just that. First, let us create a sample JSON file of the kind that will be sent to the model endpoint.

An example of our case could be the following -
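The exact format of the request body is up to us; a simple example with a single text field could be saved as sample_input.json:

    {
        "text": "I love using TorchServe to deploy my models!"
    }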

Let us also create another JSON file called index_to_name.json, which will contain the mapping between the integer labels used during model training and their names. That is, this file maps the model's output to a human-readable description of that output.

For our model, the label mapper is as follows -
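Based on the mapping above (string keys for the model's integer outputs), the file could look like this:

    {
        "0": "Negative",
        "1": "Neutral",
        "2": "Positive"
    }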

Create the model handler

The model handler is responsible for using the input data received over the network via an HTTP request and converting it to output by passing it through our model.

There are many default handlers provided by TorchServe that can be used directly. However, it might be desirable to create our own custom handler according to the model and dataset.

Creating our custom model handler involves creating a class that inherits from the BaseHandler base class provided by TorchServe.

Let us first look at some code for our custom handler, after which we will get a detailed walkthrough of all the methods inside the class, along with their intended behaviors.

Note - all of this code should be placed inside a file called handler.py.
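A minimal sketch of this first version of handler.py is shown below; the class name SentimentHandler is our own choice, and some of the imports are only needed by the methods we add in later sections:

    # handler.py - custom TorchServe handler (initial version)
    import json
    import logging
    import os

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from ts.torch_handler.base_handler import BaseHandler

    logger = logging.getLogger(__name__)


    class SentimentHandler(BaseHandler):
        def __init__(self):
            super().__init__()
            self.initialized = False

        def initialize(self, context):
            # Incomplete for now - we only log the two attributes of the
            # context object that we want to inspect in the next section.
            logger.info("system_properties: %s", context.system_properties)
            logger.info("manifest: %s", context.manifest)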

First of all, we create a logger so that we can use it to print information to the logs later on, after the model is served.

The initialize function receives context, an object that contains information about the model artifacts and runtime parameters. Note that the code shown above for initialize is not yet complete: before moving further, we will inspect two important attributes of the context object, namely system_properties and manifest.

To do that, let us first serve our model using this code and inspect, through the logs, the information contained in the two attributes context.system_properties and context.manifest.

Serving the Model on Localhost

So far, the directory structure looks like the following -
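With the file names assumed in the sketches above, the working directory looks roughly like:

    .
    ├── handler.py
    ├── index_to_name.json
    ├── sample_input.json
    └── model/
        ├── config.json
        ├── pytorch_model.bin
        ├── tokenizer_config.json
        ├── special_tokens_map.json
        ├── vocab.json
        └── merges.txt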

To serve the model on our localhost, we need to create a .mar model archive file that packages the model artifacts and complementary assets into a single file.

This file is what is now used to register our model to TorchServe. This shareable file contains all the necessary components to serve the model.

To create the .mar file, create a folder called model_store in the current working directory and run the following command -
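A sketch of the command, assuming the model and tokenizer were saved to the model folder, the handler lives in handler.py, and we pick sentiment as the model name (adjust the file names to match your saved artifacts):

    torch-model-archiver --model-name sentiment \
        --version 1.0 \
        --serialized-file model/pytorch_model.bin \
        --handler handler.py \
        --extra-files "model/config.json,model/vocab.json,model/merges.txt,model/tokenizer_config.json,model/special_tokens_map.json,index_to_name.json" \
        --export-path model_store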

This command uses torch-model-archiver to create a .mar file (named after the --model-name argument) in the model_store folder. To get info on the required arguments, we can use the below command -
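    torch-model-archiver --help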

After generating the model archive file, we are now ready to register our model using TorchServe. We will first create a config.properties file with the following content -
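A minimal example, assuming we want the inference API on port 8000 (as used later in this article) and the management and metrics APIs on adjacent ports:

    inference_address=http://0.0.0.0:8000
    management_address=http://0.0.0.0:8001
    metrics_address=http://0.0.0.0:8002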

The following command can now be used to register the model and serve it using TorchServe -
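A sketch of the command, using the assumed model name sentiment and the config file created above (--ncs simply disables config snapshots):

    torchserve --start --ncs \
        --ts-config config.properties \
        --model-store model_store \
        --models sentiment=sentiment.mar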

This command starts the model server on your localhost. The --model-store argument sets the location from which the models will be loaded, and --models MODEL_NAME=<PATH_TO_MAR_FILE> registers the model.

Upon a successful run of this command, the following logs are printed on the terminal -

The Inference address, Management address, and Metrics address store the URLs used to generate predictions from the model for inferencing, managing the models, and accessing the model metrics, respectively.

Completing the Model Handler

As already discussed, the initialize function of the model handler class still needs to be completed; so far, we have served the model only to inspect what the two attributes context.system_properties and context.manifest of the context object (the argument passed to initialize) look like.

Upon investigating the logs, we can see that context.manifest provides the details about our model, while context.system_properties lists information such as the name of our model directory and can also be used to determine the compute platform.

The model directory contains all the additional files we passed as arguments when generating the model archive.

Initialization Function

Let us now complete the initialize function so that it utilizes any available GPU and loads the model, the tokenizer instance, and the label-mapping file.

The complete function looks like the following -
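A sketch of the completed method, consistent with the files we assumed were packed into the archive (notably index_to_name.json passed via --extra-files):

    def initialize(self, context):
        """Load the model, the tokenizer, and the label mapping."""
        self.manifest = context.manifest
        properties = context.system_properties
        model_dir = properties.get("model_dir")

        # Use the GPU assigned to this worker if one is available, else the CPU.
        if torch.cuda.is_available() and properties.get("gpu_id") is not None:
            self.device = torch.device("cuda:" + str(properties.get("gpu_id")))
        else:
            self.device = torch.device("cpu")

        # Files packed into the .mar archive are extracted into model_dir at runtime.
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.to(self.device)
        self.model.eval()

        # Label mapping created earlier in index_to_name.json.
        with open(os.path.join(model_dir, "index_to_name.json")) as f:
            self.mapping = json.load(f)

        self.initialized = True
        logger.info("Model loaded successfully from %s", model_dir)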

Preprocessing Function

Another function in the custom model handler class is the preprocess function. This function handles the incoming request objects and preprocesses and tokenizes the data using the tokenizer instance so that it can be sent to the model for generating outputs.

To specifically look at the contents of the request object, we can implement logging in our model handler and send a POST request containing the sample input file we created earlier called sample_input.json to localhost:8000/predictions/<MODEL_NAME>. The request object as seen in the logs is as follows -

The request object is a list containing a dictionary of the form {'body' (or 'data'): <contents of sample_input.json>}.
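For reference, such a request can be sent with curl, assuming the model was registered under the name sentiment as in the commands above:

    curl -X POST http://localhost:8000/predictions/sentiment \
        -H "Content-Type: application/json" \
        -d @sample_input.json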

We can now create the preprocessing function that will take care of unpacking the incoming input data and tokenize it, like so -
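A sketch of the method (added to the handler class), assuming the {"text": ...} request format of sample_input.json:

    def preprocess(self, requests):
        """Unpack the incoming request and tokenize the text."""
        # Each element of the list is a dict; the payload sits under 'data' or 'body'.
        data = requests[0].get("data")
        if data is None:
            data = requests[0].get("body")

        # The payload may arrive as raw bytes - decode it into a dict.
        if isinstance(data, (bytes, bytearray)):
            data = json.loads(data.decode("utf-8"))

        text = data["text"]  # matches the sample_input.json format assumed earlier
        logger.info("Received text: %s", text)

        # Tokenize into PyTorch tensors for the model.
        return self.tokenizer(text, padding=True, truncation=True, return_tensors="pt")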

Hence, the preprocess function prepares the input data for modeling. Next up, we have the inference function.

The Inference Function

The inference function feeds the tokenized tensors into the model and returns the model outputs. The function is defined as below -
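A sketch of the method:

    def inference(self, inputs):
        """Feed the tokenized tensors to the model and return predicted class ids."""
        inputs = inputs.to(self.device)
        with torch.no_grad():
            outputs = self.model(**inputs)
            predictions = torch.argmax(outputs.logits, dim=-1)
        return predictions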

Post Processing Function

This function maps the integer output generated by the model into string labels, like so -
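A sketch of the method, using the index_to_name.json mapping loaded in initialize:

    def postprocess(self, outputs):
        """Map the predicted integers to their string labels."""
        predictions = outputs.cpu().numpy().tolist()
        # TorchServe expects a list with one entry per request in the batch.
        return [self.mapping[str(pred)] for pred in predictions]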

Create .mar File

This completes our model handler class, and we can now serve our model exactly as we did in the earlier steps.

Let us create the model archive file again using the following command -
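The same torch-model-archiver invocation as before can be reused, with --force added to overwrite the previously generated archive (file and model names as assumed earlier):

    torch-model-archiver --model-name sentiment \
        --version 1.0 \
        --serialized-file model/pytorch_model.bin \
        --handler handler.py \
        --extra-files "model/config.json,model/vocab.json,model/merges.txt,model/tokenizer_config.json,model/special_tokens_map.json,index_to_name.json" \
        --export-path model_store \
        --force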

Launch Model Server

After this, we register the model and serve it using TorchServe with the command below -
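As before (stopping any server still running from the earlier steps with torchserve --stop first, if needed):

    torchserve --start --ncs \
        --ts-config config.properties \
        --model-store model_store \
        --models sentiment=sentiment.mar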

The config file is created as before.

Test & Stop the Server

Let us now check if the model was correctly served without any errors. We can use the following command to check the availability of the deployed TorchServe API.
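With the inference API listening on port 8000 as configured above:

    curl http://localhost:8000/ping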

Output:
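When the server is healthy, the ping endpoint returns a JSON status similar to:

    {
      "status": "Healthy"
    }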

This command sends an HTTP GET request to the Inference API, which in our setup listens on port 8000. Check the config.properties file, which specifies the inference_address including the port, in case a different address was used.

If the above command prints the status as healthy, the model was successfully served. Otherwise, you must check the printed logs to look for the errors produced.

Finally, the following command can be used to stop the server -
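    torchserve --stop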

Conclusion

With this, we are now done learning how to export models for serving using TorchServe. Let us review what we learned and implemented in this article -

  • We first learned what TorchServe is and how it addresses one of the most important steps in the MLOps pipeline: serving ML models.
  • After this, we loaded a pretrained model and its tokenizer from the Hugging Face Hub and saved them both for serving.
  • We learned to create the model handler class with the required functions - initialize, preprocess, inference, and postprocess. We also walked through serving a model by creating all the required files.
  • Finally, we learned to create the model archive (.mar) file and registered and served the model using TorchServe.
  • We also learned how to test and stop the server.