How to convert a PyTorch model to TensorRT


Converting a PyTorch model to TensorRT

Overview of the conversion steps

Prepare the model definition file (a .py file).

Prepare the trained weights file (.pth or .pth.tar).

Install onnx and onnxruntime.

Convert the trained model to the .onnx format.

Install TensorRT.

Environment


ubuntu-18.04
PyTorch-1.8.1
onnx-1.9.0
onnxruntime-1.7.2
cuda-11.1
cudnn-8.2.0
TensorRT-7.2.3.4
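
Because the steps below are sensitive to version mismatches, it can help to record what is actually installed in the active interpreter before starting. The following is a minimal sketch using only the standard library; the distribution names passed to it (for example "nvidia-tensorrt") are assumptions and may differ depending on how each package was installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not visible to this interpreter
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust to match your installation.
    wanted = ["torch", "onnx", "onnxruntime", "nvidia-tensorrt", "pycuda"]
    for name, version in installed_versions(wanted).items():
        print("{}: {}".format(name, version or "not installed"))
```

Running this inside the Anaconda environment and again with the system Python makes differences between the two easy to spot.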

PyTorch to ONNX
Step 1: Install onnx and onnxruntime
The installation method usually suggested online is pip:


pip install onnx
pip install onnxruntime

If you use an Anaconda environment, you can install with conda instead:


conda install -c conda-forge onnx
conda install -c conda-forge onnxruntime

Step 2: Install netron
netron visualizes the network structure, which is convenient for debugging.


pip install netron

Step 3: Convert from PyTorch to ONNX
Once the installation is complete, the model can be converted with the code below.


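# -*- coding: utf-8 -*-
import onnx
# Note: import onnx before torch; otherwise a segmentation fault may occur.
import torch
import torchvision

from model import Net

# args and checkpoint_path are supplied by the user.
net = Net(args).cuda()  # build the model on the GPU
checkpoint = torch.load(checkpoint_path)
net.load_state_dict(checkpoint['state_dict'])  # load the trained weights
net.eval()  # export in inference mode
print("Model and weights LOADED successfully")

export_onnx_file = './net.onnx'
torch.onnx.export(net,
					torch.randn(1,1,224,224,device='cuda'),  # dummy input that defines the input shape
					export_onnx_file,
					verbose=False,  # whether to print a detailed description of the exported graph
					input_names = ["inputs"]+["params_%d"%i for i in range(120)],  # input node name, followed by names for the learnable parameters so they can be looked up later
					output_names = ["outputs"],  # output node name
					opset_version = 10,  # ONNX operator set version; the supported range depends on the PyTorch version
					do_constant_folding = True,
					dynamic_axes = {"inputs": {0: "batch_size", 2: "h", 3: "w"}, "outputs": {0: "batch_size"}})

onnx_model = onnx.load(export_onnx_file)  # load the exported ONNX model
onnx.checker.check_model(onnx_model)  # check that the model is well formed
print(onnx.helper.printable_graph(onnx_model.graph))  # print a readable form of the ONNX graph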

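dynamic_axes specifies which dimensions of the inputs and outputs are variable. Here the batch_size dimension of both the input and the output is dynamic, and dimensions 2 and 3 of the input (height and width) are dynamic as well.
Step 4: Verify the ONNX model
Next, visualize the ONNX model and test whether it runs correctly.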
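
A misplaced brace in dynamic_axes is easy to miss, so a small structural check before calling the exporter may save a failed run. The sketch below is a hypothetical helper (check_dynamic_axes is not part of PyTorch) and it only accepts the {tensor name: {axis index: axis label}} form used in this article, although the exporter also accepts a list of axis indices.

```python
def check_dynamic_axes(dynamic_axes, tensor_names):
    """Raise ValueError unless dynamic_axes is {tensor name: {axis index: label}}."""
    for tensor, axes in dynamic_axes.items():
        if tensor not in tensor_names:
            raise ValueError("unknown tensor name: {!r}".format(tensor))
        if not isinstance(axes, dict):
            raise ValueError("axes for {!r} must be a dict".format(tensor))
        for index, label in axes.items():
            if not isinstance(index, int) or not isinstance(label, str):
                raise ValueError(
                    "bad axis entry for {!r}: {!r}: {!r}".format(tensor, index, label)
                )
    return dynamic_axes


if __name__ == "__main__":
    axes = {"inputs": {0: "batch_size", 2: "h", 3: "w"},
            "outputs": {0: "batch_size"}}
    check_dynamic_axes(axes, ["inputs", "outputs"])
    print("dynamic_axes looks well formed")
```

The names passed as tensor_names should be the same strings given to input_names and output_names.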


import netron
import onnxruntime
import numpy as np
from PIL import Image
import cv2

netron.start('./net.onnx')
test_image = np.asarray(Image.open(test_image_path).convert('L'),dtype='float32') /255.
test_image = cv2.resize(np.array(test_image),(224,224),interpolation = cv2.INTER_CUBIC)
test_image = test_image[np.newaxis,np.newaxis,:,:]
session = onnxruntime.InferenceSession('./net.onnx')
outputs = session.run(None, {"inputs": test_image})
print(len(outputs))
print(outputs[0].shape)
# outputs[0] is the network output; compare it with the PyTorch result
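
To turn the check above into a pass/fail result, the ONNX Runtime output can be compared element by element with the PyTorch output for the same input. This is a minimal standard-library sketch working on flat sequences of floats; the tolerances are assumptions and should be tuned to the model.

```python
import math


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two flat sequences."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("length mismatch: {} vs {}".format(len(a), len(b)))
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(a, b, rel_tol=1e-3, abs_tol=1e-5):
    """True if both sequences have equal length and all elements are close."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(a, b))


if __name__ == "__main__":
    reference = [0.10, 0.90, 0.25]        # stand-in for the PyTorch output
    candidate = [0.10, 0.9000001, 0.25]   # stand-in for the ONNX Runtime output
    print(outputs_match(reference, candidate), max_abs_diff(reference, candidate))
```

With NumPy arrays, each side can be flattened first, for example outputs[0].ravel().tolist() for ONNX Runtime and the detached, CPU-side NumPy array of the PyTorch output flattened the same way.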

ONNX to TensorRT
Step 1: Download the TensorRT installation package from NVIDIA: https://developer.nvidia.com/tensorrt
I chose TensorRT 7.2.3 to match my CUDA version.


cd download_path
sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda11.1-trt7.2.3.4-ga-20210226_1-1_amd64.deb
sudo apt-get update
sudo apt-get install tensorrt

According to NVIDIA's official installation guide (https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#install), the TensorRT Python API is likely to be needed, which in turn requires PyCUDA. Install PyCUDA first:


pip install 'pycuda<2021.1'

If anything goes wrong, refer to the official instructions.
If you use Python 3.x, run the following installation:


sudo apt-get install python3-libnvinfer-dev

If you need ONNX GraphSurgeon, or want to use its Python module, run the following command:


sudo apt-get install onnx-graphsurgeon

Verify that the installation succeeded:


dpkg -l | grep TensorRT

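If the output lists the TensorRT packages, the installation succeeded.
Problem: running import tensorrt in Python fails with ModuleNotFoundError: No module named 'tensorrt'.
From what I found online, the tensorrt package installed through dpkg goes into the system Python, not into the Python of the Anaconda environment. The system default Python is 3.6, while my Anaconda environment uses 3.8.8, so pointing to the system package with export PYTHONPATH leads to a Python version mismatch.
I therefore searched again for how to install TensorRT inside an Anaconda environment: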


pip3 install --upgrade setuptools pip
pip install nvidia-pyindex
pip install nvidia-tensorrt

Make sure the commands above are run with the Python of the Anaconda environment, then check whether import tensorrt works:


import tensorrt
print(tensorrt.__version__)
# Output: 8.0.0.3
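
When the import fails, it is useful to confirm which interpreter is running and whether the module is visible to it. The sketch below uses only the standard library; module_visibility is a hypothetical helper name introduced here for illustration.

```python
import importlib.util
import sys


def module_visibility(name):
    """Report whether a top-level module can be found by the running interpreter."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "module": name,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in module_visibility("tensorrt").items():
        print("{}: {}".format(key, value))
```

If "interpreter" points at the system Python rather than the Anaconda environment, the package was probably installed for a different interpreter than the one in use.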

Step 2: Convert ONNX to TensorRT
A note before starting: at this step I ran into AttributeError: 'tensorrt.tensorrt.Builder' object has no attribute 'max_workspace_size'. According to what I found online, this is a problem with version 8.0.0.3, so I went back to 7.2.3.4.


pip uninstall nvidia-tensorrt  # uninstall version 8.0.0.3
pip install nvidia-tensorrt==7.2.* --index-url https://pypi.ngc.nvidia.com  # install version 7.2.3.4
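
Since the conversion script in this article was only run against TensorRT 7.2.3.4, a guard at the top of the script can fail early with a clear message when a different major version is installed. This is a minimal sketch; parse_version and require_major are hypothetical helper names, and the version string would normally come from tensorrt.__version__.

```python
def parse_version(version):
    """Turn a dotted version string such as '7.2.3.4' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def require_major(version, major):
    """Raise RuntimeError unless the version's major component equals `major`."""
    parsed = parse_version(version)
    if not parsed or parsed[0] != major:
        raise RuntimeError(
            "expected major version {}, found {!r}".format(major, version)
        )
    return parsed


if __name__ == "__main__":
    # In the real script, pass tensorrt.__version__ instead of a literal.
    print(require_major("7.2.3.4", 7))
```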

Conversion code


import pycuda.autoinit 
import pycuda.driver as cuda
import tensorrt as trt
import torch 
import time 
from PIL import Image
import cv2,os
import torchvision 
import numpy as np
from scipy.special import softmax

### get_img_np_nchw and postprocess_the_outputs should be adapted to your own model

TRT_LOGGER = trt.Logger()

def get_img_np_nchw(img_path):
	img = Image.open(img_path).convert('L')
	img = np.asarray(img, dtype='float32')
	img = cv2.resize(np.array(img),(224, 224), interpolation = cv2.INTER_CUBIC)
	img = img / 255.
	img = img[np.newaxis, np.newaxis]
	return img

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        """host_mem is CPU (host) memory, device_mem is GPU (device) memory.
        """
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

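def get_engine(max_batch_size=1, onnx_file_path="", engine_file_path="",fp16_mode=False, int8_mode=False,save_engine=False):
    """
    params max_batch_size: batch size fixed in advance so that memory can be allocated
    params onnx_file_path: path to the ONNX model file
    params engine_file_path: path of the serialized engine file to load or save
    params fp16_mode: whether to build with FP16 precision
    params int8_mode: whether to build with INT8 precision
    params save_engine: whether to save the built engine to disk
    returns: an ICudaEngine
    """
    # If a serialized engine file already exists, load it directly to get the cudaEngine
    if os.path.exists(engine_file_path):
        print("Reading engine from file: {}".format(engine_file_path))
        with open(engine_file_path, 'rb') as f, \
            trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())  # deserialize
    else:  # build the cudaEngine from the ONNX file

        # Create a builder from the logger;
        # the builder is then used to create the INetworkDefinition
        explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)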
        # In TensorRT 7.0, the ONNX parser only supports full-dimensions mode, meaning that your network definition must be created with the explicitBatch flag set. For more information, see Working With Dynamic Shapes.

        with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(explicit_batch) as network,  \
            trt.OnnxParser(network, TRT_LOGGER) as parser, \
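            builder.create_builder_config() as config:  # the parser fills the network from the ONNX model; the config controls the build
            profile = builder.create_optimization_profile()
            profile.set_shape("inputs", (1, 1, 224, 224),(1,1,224,224),(1,1,224,224))
            config.add_optimization_profile(profile)

            config.max_workspace_size = 1<<30  # maximum workspace size: the most GPU scratch memory the ICudaEngine may use
            builder.max_batch_size = max_batch_size  # maximum batch size used at inference time
            if fp16_mode:
                # build_engine(network, config) reads precision flags from the config
                config.set_flag(trt.BuilderFlag.FP16)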

            if int8_mode:
                # To be updated
                raise NotImplementedError

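            # Parse the ONNX file and populate the network
            if not os.path.exists(onnx_file_path):
                quit("ONNX file {} not found!".format(onnx_file_path))
            print('loading onnx file from path {} ...'.format(onnx_file_path))
            # with open(onnx_file_path, 'rb') as model:  # alternative: read the file and parse its bytes
            #     print("Beginning onnx file parsing")
            #     parser.parse(model.read())  # parse the ONNX model from memory
            if not parser.parse_from_file(onnx_file_path):  # the parser reads and parses the ONNX file itself
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                raise RuntimeError("Failed to parse ONNX file {}".format(onnx_file_path))

            print("Completed parsing of onnx file")
            # The network is now populated; use the builder to create the CudaEngine
            print("Building an engine from file {}; this may take a while...".format(onnx_file_path))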

            #################
            # import pdb;pdb.set_trace()
            print(network.get_layer(network.num_layers-1).get_output(0).shape)
            # network.mark_output(network.get_layer(network.num_layers -1).get_output(0))
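            engine = builder.build_engine(network, config)  # build the engine from the INetworkDefinition and the builder config
            if engine is None:
                raise RuntimeError("Engine build failed; check the TensorRT log output")
            print("Completed creating Engine")
            if save_engine:  # save the engine to a file so it can be reused later
                with open(engine_file_path, 'wb') as f:
                    f.write(engine.serialize())  # serialize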
            return engine

def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # batch_size is kept for call compatibility; an explicit-batch engine takes
    # the batch dimension from the binding shapes, so execute_async_v2 is used.
    # Transfer data from CPU to the GPU.
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]

def postprocess_the_outputs(outputs, shape_of_output):
    outputs = outputs.reshape(*shape_of_output)
    out = np.argmax(softmax(outputs,axis=1)[0,...],axis=0)
    # import pdb;pdb.set_trace()
    return out

# Run inference with TensorRT
onnx_model_path = './net.onnx'
max_batch_size = 1
# These two modes are dependent on hardwares
fp16_mode = False
int8_mode = False
trt_engine_path = './model_fp16_{}_int8_{}.trt'.format(fp16_mode, int8_mode)
# Build an engine
engine = get_engine(max_batch_size, onnx_model_path, trt_engine_path, fp16_mode, int8_mode , save_engine=True)
# Create the context for this engine
context = engine.create_execution_context()
# Allocate buffers for input and output
inputs, outputs, bindings, stream = allocate_buffers(engine)  # input, output: host # bindings

# Do inference
img_np_nchw = get_img_np_nchw(img_path)  # img_path: path to a test image, set by the user
inputs[0].host = img_np_nchw.reshape(-1)
shape_of_output = (max_batch_size, 2, 224, 224)

# inputs[1].host = ... for multiple input
t1 = time.time()
trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream) # numpy data
t2 = time.time()
feat = postprocess_the_outputs(trt_outputs[0], shape_of_output)

print('TensorRT ok')
print("Inference time with the TensorRT engine: {}".format(t2-t1))
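
The script above times a single call, which also includes one-time start-up costs. A more stable estimate usually comes from discarding a few warm-up runs and averaging over several timed runs. The sketch below is one way to do that with the standard library; the warm-up and run counts are arbitrary assumptions.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=20):
    """Call fn() `warmup` times untimed, then time `runs` calls in seconds."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "min": min(timings),
        "runs": runs,
    }


if __name__ == "__main__":
    # Stand-in workload; in the script above, fn would wrap do_inference(...).
    print(benchmark(lambda: sum(range(10000))))
```

Because do_inference synchronizes the stream before returning, wrapping it in a lambda and passing it as fn measures the full host-to-device copy, execution, and device-to-host copy.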

When I followed the conversion code from the article at https://wiki.tiker.net/PyCuda/Installation/Linux/#step-1-download-and-unpack-pycuda, the conversion reported an error.

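The code from that link failed during conversion, but it works once modified; the conversion code given in this article runs correctly.
The modified part is as follows: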


with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(explicit_batch) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser, \
        builder.create_builder_config() as config:  # the parser fills the network from the ONNX model; the config controls the build
    profile = builder.create_optimization_profile()
    profile.set_shape("inputs", (1, 1, 224, 224), (1, 1, 224, 224), (1, 1, 224, 224))
    config.add_optimization_profile(profile)

    config.max_workspace_size = 1 << 30  # maximum workspace size: the most GPU scratch memory the ICudaEngine may use
    engine = builder.build_engine(network, config)

Modifying or adding the corresponding code in the linked article makes the problem go away.
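
The three shapes passed to profile.set_shape are the minimum, optimum, and maximum input shapes, in that order. Here all three are identical, which pins the input to 1x1x224x224; to allow a range, the shapes must be ordered on every axis. The following sketch checks that ordering before the profile is created; check_profile is a hypothetical helper, not a TensorRT function.

```python
def check_profile(min_shape, opt_shape, max_shape):
    """Raise ValueError unless 0 <= min <= opt <= max holds on every axis."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        raise ValueError("min, opt and max shapes must have the same rank")
    for axis, (lo, opt, hi) in enumerate(zip(min_shape, opt_shape, max_shape)):
        if not (0 <= lo <= opt <= hi):
            raise ValueError(
                "axis {}: need min <= opt <= max, got {}, {}, {}".format(axis, lo, opt, hi)
            )
    return min_shape, opt_shape, max_shape


if __name__ == "__main__":
    # Fixed shape, as used in this article.
    print(check_profile((1, 1, 224, 224), (1, 1, 224, 224), (1, 1, 224, 224)))
    # A hypothetical dynamic range for batch size and resolution.
    print(check_profile((1, 1, 128, 128), (1, 1, 224, 224), (4, 1, 512, 512)))
```

The validated tuples can then be unpacked into profile.set_shape("inputs", ...).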