I can't save a Neuron model after compiling it into an AWS Neuron-optimized TorchScript
I compiled a model into an AWS Neuron-optimized TorchScript, but I can't save the resulting Neuron model.
My code:
import tensorflow # to workaround a protobuf version conflict issue
import torch
import torch.neuron
import torch.nn.functional as F
import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained('./data/tokenizer')
model = AutoModelForSequenceClassification.from_pretrained('data/model', return_dict=False, torchscript=True)
parsed_json_list = [
    {"premise": "I have a dog", "hypotheses": ["I love dogs", "I hate dogs"]}
]
model_inputs = tokenizer(
    [
        parsed_json["premise"]
        for parsed_json in parsed_json_list
        for _ in parsed_json["hypotheses"]
    ],
    [
        hypothesis
        for parsed_json in parsed_json_list
        for hypothesis in parsed_json["hypotheses"]
    ],
    return_tensors='pt',
    padding=True,
    truncation=True,
)
pred = model(**model_inputs)
example_inputs = model_inputs['input_ids'], model_inputs['attention_mask'], model_inputs['token_type_ids']
model_neuron = torch.neuron.trace(model, example_inputs, verbose=1)
Neuron log after torch.neuron.trace:
INFO:Neuron:Number of neuron graph operations 121 did not match traced graph 101 - using heuristic matching of hierarchical information
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 2131, compiled = 1992, percent compiled = 93.48%
INFO:Neuron:The neuron partitioner created 51 sub-graphs
INFO:Neuron:Neuron successfully compiled 50 sub-graphs, Total fused subgraphs = 51, Percent of model sub-graphs successfully compiled = 98.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 398
INFO:Neuron: => aten::ScalarImplicit: 2
INFO:Neuron: => aten::add: 82
INFO:Neuron: => aten::arange: 2
INFO:Neuron: => aten::bmm: 47
INFO:Neuron: => aten::clamp: 22
INFO:Neuron: => aten::contiguous: 71
INFO:Neuron: => aten::detach: 144
INFO:Neuron: => aten::div: 60
INFO:Neuron: => aten::expand: 22
INFO:Neuron: => aten::gelu: 13
INFO:Neuron: => aten::layer_norm: 26
INFO:Neuron: => aten::linear: 97
INFO:Neuron: => aten::mul: 38
INFO:Neuron: => aten::neg: 11
INFO:Neuron: => aten::permute: 71
INFO:Neuron: => aten::select: 1
INFO:Neuron: => aten::size: 460
INFO:Neuron: => aten::slice: 27
INFO:Neuron: => aten::sqrt: 36
INFO:Neuron: => aten::squeeze: 23
INFO:Neuron: => aten::sub: 1
INFO:Neuron: => aten::to: 96
INFO:Neuron: => aten::transpose: 47
INFO:Neuron: => aten::unsqueeze: 29
INFO:Neuron: => aten::view: 166
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 35 [supported]
INFO:Neuron: => aten::__and__: 1 [supported]
INFO:Neuron: => aten::abs: 1 [supported]
INFO:Neuron: => aten::add: 3 [supported]
INFO:Neuron: => aten::bmm: 1 [supported]
INFO:Neuron: => aten::ceil: 1 [supported]
INFO:Neuron: => aten::clamp: 2 [supported]
INFO:Neuron: => aten::contiguous: 1 [supported]
INFO:Neuron: => aten::detach: 2 [supported]
INFO:Neuron: => aten::div: 2 [supported]
INFO:Neuron: => aten::embedding: 1 [not supported]
INFO:Neuron: => aten::expand: 2 [supported]
INFO:Neuron: => aten::gather: 24 [not supported]
INFO:Neuron: => aten::gt: 1 [supported]
INFO:Neuron: => aten::le: 1 [supported]
INFO:Neuron: => aten::linear: 1 [supported]
INFO:Neuron: => aten::log: 2 [supported]
INFO:Neuron: => aten::lt: 1 [supported]
INFO:Neuron: => aten::mul: 2 [supported]
INFO:Neuron: => aten::neg: 1 [supported]
INFO:Neuron: => aten::permute: 1 [supported]
INFO:Neuron: => aten::repeat: 24 [not supported]
INFO:Neuron: => aten::sign: 1 [not supported]
INFO:Neuron: => aten::size: 10 [supported]
INFO:Neuron: => aten::slice: 2 [supported]
INFO:Neuron: => aten::squeeze: 2 [supported]
INFO:Neuron: => aten::to: 5 [supported]
INFO:Neuron: => aten::transpose: 1 [supported]
INFO:Neuron: => aten::type_as: 2 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 2 [supported]
INFO:Neuron: => aten::where: 2 [not supported]
INFO:Neuron:skip_inference_context for tensorboard symbols at /home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py:305 tb_parse
INFO:Neuron:Number of neuron graph operations 468 did not match traced graph 599 - using heuristic matching of hierarchical information
CPU times: user 1min 50s, sys: 13.1 s, total: 2min 3s
Wall time: 7min 37s
After that, I try to save my model:
model_neuron.save('test.pt')
Error log:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_17776/1444584018.py in <module>
----> 1 model_neuron.save('test.pt')
~/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_script.py in save(self, f, **kwargs)
691 See :func:`torch.jit.save <torch.jit.save>` for details.
692 """
--> 693 return self._c.save(str(f), **kwargs)
694
695 def _save_for_lite_interpreter(self, *args, **kwargs):
RuntimeError:
Could not export Python function call 'XSoftmax'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__:
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/native_ops/prim.py(46): PythonOp
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(330): __call__
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(207): run_op
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(196): __call__
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/runtime.py(69): forward
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(965): trace_module
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(750): trace
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(307): tb_parse
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(533): tb_graph
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(482): maybe_generate_tb_graph_def
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(513): maybe_determine_names_from_tensorboard
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(200): trace
<timed exec>(1): <module>
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/magics/execution.py(1335): time
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/magic.py(187): <lambda>
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/decorator.py(232): fun
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2473): run_cell_magic
/tmp/ipykernel_17776/2573155944.py(1): <module>
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3553): run_code
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3473): run_ast_nodes
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3258): run_cell_async
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/async_helpers.py(78): _pseudo_sync_runner
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3030): _run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2976): run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/zmqshell.py(528): run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/ipkernel.py(387): do_execute
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(730): execute_request
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(406): dispatch_shell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(499): process_one
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(510): dispatch_queue
/usr/lib64/python3.7/asyncio/events.py(88): _run
/usr/lib64/python3.7/asyncio/base_events.py(1786): _run_once
/usr/lib64/python3.7/asyncio/base_events.py(541): run_forever
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/tornado/platform/asyncio.py(215): start
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelapp.py(712): start
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/traitlets/config/application.py(982): launch_instance
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel_launcher.py(17): <module>
/usr/lib64/python3.7/runpy.py(85): _run_code
/usr/lib64/python3.7/runpy.py(193): _run_module_as_main
Thanks in advance.
We have not seen this exact serialization issue before, but this model is unlikely to perform well since it has unsupported operators (aten::gather and aten::repeat) evenly distributed throughout the model. When operators are unsupported, they are executed on CPU. For this model, this causes 51 Neuron subgraphs to be produced (with CPU operators in between), which will most likely cause performance issues due to excessive data transfer between the CPU and NeuronCores. We have previously seen an issue like this when using the DeBERTa model, which uses gathers/repeats in each model layer.
We are looking into the possibility of improving the performance of these models, but currently this will not work well on Inferentia.
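To see at a glance which operators are keeping a model off the NeuronCores, the "Not compiled operators" part of the trace log can be summarized with a few lines of standard-library Python. This is only a rough sketch: the `unsupported_ops` helper and the regular expression are hypothetical, written against the log format shown in the question rather than any documented Neuron interface, and the sample text is a subset of the lines from that log.

```python
import re
from collections import defaultdict

# Sample lines copied from the "Not compiled operators" section of the log
# in the question (a subset, kept short for illustration).
LOG = """\
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::embedding: 1 [not supported]
INFO:Neuron: => aten::gather: 24 [not supported]
INFO:Neuron: => aten::linear: 1 [supported]
INFO:Neuron: => aten::repeat: 24 [not supported]
INFO:Neuron: => aten::sign: 1 [not supported]
INFO:Neuron: => aten::where: 2 [not supported]
"""

# Assumed line shape: "=> aten::<op>: <count> [supported|not supported]".
OP_LINE = re.compile(r"=> (aten::\w+): (\d+) \[(supported|not supported)\]")


def unsupported_ops(log_text):
    """Return {operator: count} for log lines tagged [not supported]."""
    counts = defaultdict(int)
    for line in log_text.splitlines():
        match = OP_LINE.search(line)
        if match and match.group(3) == "not supported":
            counts[match.group(1)] += int(match.group(2))
    return dict(counts)


ops = unsupported_ops(LOG)
# Print the most frequent unsupported operators first.
for name, count in sorted(ops.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {count}")
```

Operators that appear many times, such as aten::gather and aten::repeat here, are the ones most likely to split the graph into many small Neuron subgraphs.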
Thanks for your help.