Inference with C# BERT NLP Deep Learning and ONNX Runtime
In this tutorial we will learn how to run inference in C# for BERT, a popular Natural Language Processing deep learning model.
To preprocess our text in C#, we will use the open source BERTTokenizers package, which includes tokenizers for most BERT models. The supported models are:
- BERT Base
- BERT Large
- BERT German
- BERT Multilingual
- BERT Base Uncased
- BERT Large Uncased
Many models (including the one used in this tutorial) have been fine-tuned from these base models. A fine-tuned model uses the same tokenizer as the base model it was fine-tuned from.
Contents
- Prerequisites
- Use Hugging Face to download the BERT model
- Understanding the model in Python
- Inference with C#
- Deploy with Azure Web App
- Next steps
Prerequisites
This tutorial can be run locally or by leveraging Azure Machine Learning compute.
To run locally:
To run in the cloud with Azure Machine Learning:
Use Hugging Face to download the BERT model
Hugging Face provides an API for downloading open source models, which we can then export to ONNX format using Python and PyTorch. This is a good option when the open source model you want to use is not already part of the ONNX Model Zoo.
Steps to download and export our model in Python
Use the transformers API to download the BertForQuestionAnswering model named bert-large-uncased-whole-word-masking-finetuned-squad
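A minimal sketch of this download step, assuming the `transformers` and `torch` packages are installed. The helper name `load_model` and the `MODEL_PATH` output location are illustrative choices rather than part of the tutorial, and the heavy import is deferred into the function so the constants can be inspected without it.

```python
# Model identifier on the Hugging Face Hub, as named in the step above.
MODEL_NAME = "bert-large-uncased-whole-word-masking-finetuned-squad"
# Assumed output location for the exported ONNX file (illustrative).
MODEL_PATH = f"./{MODEL_NAME}.onnx"


def load_model(model_name: str = MODEL_NAME):
    """Download the question-answering model, or load it from the local cache."""
    # Imported here so that defining this helper does not require transformers.
    from transformers import BertForQuestionAnswering

    model = BertForQuestionAnswering.from_pretrained(model_name)
    # Switch to inference mode: layers such as dropout behave differently
    # during training, and the export should capture the inference behavior.
    model.eval()
    return model
```

Calling `model = load_model()` downloads the weights on first use (the checkpoint is larger than a gigabyte); later calls read them from the local cache.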
Now that we have downloaded the model, we need to export it to ONNX format. PyTorch supports this directly through the torch.onnx.export function.
- The inputs variable indicates what the input shape will be. You can either create a dummy input or use a sample input from testing the model.
- Set the opset_version to the highest version that is compatible with the model. Learn more about the opset versions here.
- Set the input_names and output_names for the model.
- Set the dynamic_axes for the variable-length inputs, because the sentence and context will have different lengths for each question.
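The export settings listed above can be sketched as follows. This is a minimal example under stated assumptions, not a definitive implementation: the tensor names (`input_ids`, `input_mask`, `segment_ids`, `start_logits`, `end_logits`), the helper names, the dummy sequence length of 32 and the default `opset_version=14` are all illustrative choices. Whatever names you pick here are the names the inference code must use later.

```python
# Assumed tensor names for the exported graph (illustrative choices).
INPUT_NAMES = ["input_ids", "input_mask", "segment_ids"]
OUTPUT_NAMES = ["start_logits", "end_logits"]


def build_dynamic_axes(names):
    """Mark axis 0 (batch) and axis 1 (sequence length) as variable for each tensor."""
    return {name: {0: "batch_size", 1: "max_seq_len"} for name in names}


def export_to_onnx(model, model_path, opset_version=14, seq_len=32):
    """Export a loaded BertForQuestionAnswering model to an ONNX file."""
    # Imported here so the helpers above can be used without PyTorch installed.
    import torch

    # Dummy inputs: only their dtype and rank matter, because the batch and
    # sequence axes are declared dynamic below.
    dummy_inputs = (
        torch.randint(0, 1000, (1, seq_len), dtype=torch.long),  # token ids
        torch.ones((1, seq_len), dtype=torch.long),              # attention mask
        torch.zeros((1, seq_len), dtype=torch.long),             # token type ids
    )
    torch.onnx.export(
        model,
        dummy_inputs,
        model_path,
        opset_version=opset_version,  # assumption: adjust to what your versions support
        do_constant_folding=True,
        input_names=INPUT_NAMES,
        output_names=OUTPUT_NAMES,
        dynamic_axes=build_dynamic_axes(INPUT_NAMES + OUTPUT_NAMES),
    )
```

Declaring both axes dynamic on every tensor is what lets the exported model accept questions and contexts of any length, rather than only the length of the dummy input.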
Understanding the model in Python
When taking a prebuilt model and operationalizing it, it's useful to take a moment to understand the model's pre- and post-processing, and its input/output shapes and labels. Many models have sample code provided in Python. We will be running inference on our model with C#, but first let's test it and see how it's done in Python. This will help us with our C# logic in the next step.
- The code to test out the model is provided in this tutorial. Check out the source for testing and running inference with this model in Python. Below is a sample input sentence and a sample output from running the model.
- Sample input
- Here is what the output should look like for the above question. You can use the input_ids to validate the tokenization in C#.
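To make the pre-processing and the model's input/output shapes concrete, here is a minimal sketch of running the exported model in Python. It assumes `numpy`, `onnxruntime` and `transformers` are installed, and that the model was exported with the input names `input_ids`, `input_mask` and `segment_ids` and with the start logits as its first output; the function names are hypothetical. It is not the tutorial's own test script.

```python
# Assumed mapping from Hugging Face tokenizer keys to the ONNX input names
# chosen at export time; adjust it to match your exported model.
TOKENIZER_TO_ONNX = {
    "input_ids": "input_ids",
    "attention_mask": "input_mask",
    "token_type_ids": "segment_ids",
}

MODEL_NAME = "bert-large-uncased-whole-word-masking-finetuned-squad"


def to_onnx_feed(encoding):
    """Rename tokenizer outputs and add a batch dimension of size 1."""
    return {
        onnx_name: [list(encoding[key])]
        for key, onnx_name in TOKENIZER_TO_ONNX.items()
    }


def run_model(question, context, model_path):
    """Tokenize a question/context pair and return (input_ids, start_logits, end_logits)."""
    # Imported here so to_onnx_feed can be used without these packages.
    import numpy as np
    import onnxruntime as ort
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
    encoding = tokenizer(question, context)  # plain Python lists of ints

    # The model expects int64 tensors of shape (batch_size, sequence_length).
    feed = {
        name: np.array(values, dtype=np.int64)
        for name, values in to_onnx_feed(encoding).items()
    }
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    start_logits, end_logits = session.run(None, feed)  # each (1, sequence_length)
    return encoding["input_ids"], start_logits[0].tolist(), end_logits[0].tolist()
```

The returned `input_ids` are the values to compare against the C# tokenizer's output, and the two logit lists each hold one score per input token.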
Inference with C#
Now that we have tested the model in Python, it's time to build it out in C#. The first thing we need to do is create our project. For this example we will use a Console App; however, you could use this code in any C# application.
- Open Visual Studio and create a Console App
Install the NuGet Packages
- Install the NuGet packages BERTTokenizers, Microsoft.ML.OnnxRuntime, Microsoft.ML.OnnxRuntime.Managed, and Microsoft.ML:
  dotnet add package Microsoft.ML.OnnxRuntime --version 1.16.0
  dotnet add package Microsoft.ML.OnnxRuntime.Managed --version 1.16.0
  dotnet add package Microsoft.ML
  dotnet add package BERTTokenizers --version 1.1.0
Create the App
- Import the packages
- Add the namespace, class and Main function.
Create the BertInput struct for encoding
- Add the BertInput struct
Tokenize the sentence with the BertUncasedLargeTokenizer
- Create a sentence (question and context) and tokenize it with the BertUncasedLargeTokenizer. The base model is bert-large-uncased, so we use the BertUncasedLargeTokenizer from the library. Be sure to check which base model your BERT model was fine-tuned from to confirm you are using the correct tokenizer.
Create the inputs of name -> OrtValue pairs as required for inference
- Get the model, create three OrtValues on top of the input buffers, and wrap them in a Dictionary to pass to Run(). Be aware that almost all ONNX Runtime classes wrap native data structures and therefore must be disposed to prevent memory leaks.
Run Inference
- Create the InferenceSession, run the inference and print out the result.
Postprocess the output and print the result
- Here we take the index of the highest value in startLogits as the start position and the index of the highest value in endLogits as the end position. We then take the original tokens of the input sentence between those two positions and look up the vocabulary value for each predicted token ID.
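The span selection itself is independent of the language it is written in. As a minimal sketch in Python (the language used earlier to test the model), with hypothetical function names and made-up tokens and logits, the logic looks like this:

```python
def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def extract_answer_tokens(tokens, start_logits, end_logits):
    """Return the tokens between the most likely start and end positions, inclusive."""
    start_index = argmax(start_logits)
    end_index = argmax(end_logits)
    # If the predicted end comes before the start, the slice is empty.
    return tokens[start_index : end_index + 1]


# Made-up example: one logit per token, peaking on the answer span.
tokens = ["[CLS]", "where", "is", "bob", "from", "[SEP]",
          "bob", "is", "from", "duluth", "minnesota", "[SEP]"]
start_logits = [0.1, 0.0, 0.0, 0.2, 0.0, 0.0, 0.3, 0.0, 0.1, 6.5, 1.2, 0.0]
end_logits = [0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.2, 0.0, 0.1, 2.0, 7.1, 0.0]
answer = extract_answer_tokens(tokens, start_logits, end_logits)
# answer == ["duluth", "minnesota"]
```

Taking the two maxima independently is the simplest strategy; it does not guarantee that the end position follows the start, which is why the empty-slice case is worth handling in application code.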
Deploy with Azure Web App
In this example we created a simple console app; however, the same code could easily be used in something like a C# Web App. See the docs for Quickstart: Deploy an ASP.NET web app.
Next steps
There are many BERT models that have been fine-tuned for different tasks, as well as different base models that you could fine-tune for your own task. This code will work for most BERT models; just update the input, output, and pre/post-processing for your specific model.