Questions tagged with AWS Neuron
Hello,
We are testing pipeline mode on Neuron/Inferentia but cannot get a model running on multiple cores. The single-core compiled model loads fine and runs inference on Inferentia without issue. However, after compiling a model for multi-core using `compiler_args=['--neuroncore-pipeline-cores', '4']` (which takes ~16 hours on an r6a.16xl), the model errors out while loading into memory on the Inferentia box. Here's the error message:
```
2022-Nov-22 22:29:25.0728 20764:22801 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 589824
2022-Nov-22 22:29:25.0728 20764:22801 ERROR TDRV:copy_and_stage_mr_one_channel Failed to allocate aligned (0) buffer in MLA DRAM for W10-t of size 589824 bytes, channel 0
2022-Nov-22 22:29:25.0728 20764:22801 ERROR TDRV:kbl_model_add copy_and_stage_mr() error
2022-Nov-22 22:29:26.0091 20764:22799 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 16777216
2022-Nov-22 22:29:26.0091 20764:22799 ERROR TDRV:dma_ring_alloc Failed to allocate RX ring
2022-Nov-22 22:29:26.0091 20764:22799 ERROR TDRV:drs_create_data_refill_rings Failed to allocate pring for data refill dma
2022-Nov-22 22:29:26.0091 20764:22799 ERROR TDRV:kbl_model_add create_data_refill_rings() error
2022-Nov-22 22:29:26.0116 20764:20764 ERROR TDRV:remove_model Unknown model: 1001
2022-Nov-22 22:29:26.0116 20764:20764 ERROR TDRV:kbl_model_remove Failed to find and remove model: 1001
2022-Nov-22 22:29:26.0117 20764:20764 ERROR TDRV:remove_model Unknown model: 1001
2022-Nov-22 22:29:26.0117 20764:20764 ERROR TDRV:kbl_model_remove Failed to find and remove model: 1001
2022-Nov-22 22:29:26.0117 20764:20764 ERROR NMGR:dlr_kelf_stage Failed to load subgraph
2022-Nov-22 22:29:26.0354 20764:20764 ERROR NMGR:stage_kelf_models Failed to stage graph: kelf-a.json to NeuronCore
2022-Nov-22 22:29:26.0364 20764:20764 ERROR NMGR:kmgr_load_nn_post_metrics Failed to load NN: 1.11.7.0+aec18907e-/tmp/tmpab7oth00, err: 4
Traceback (most recent call last):
File "infer_test.py", line 34, in <module>
model_neuron = torch.jit.load('model-4c.pt')
File "/root/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/jit_load_wrapper.py", line 13, in wrapper
script_module = jit_load(*args, **kwargs)
File "/root/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_serialization.py", line 162, in load
cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: Could not load the model status=4 message=Allocation Failure
```
Any help would be appreciated.
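To see at a glance how large the failed allocations in the log above are, the `TDRV:dmem_alloc` lines can be pulled out with a small script. This is only a sketch: the helper name `failed_device_allocations` is hypothetical, and the regular expression assumes the log format shown in this post.

```python
import re

# Matches the "Failed to alloc DEVICE memory: <bytes>" lines from the log above.
ALLOC_RE = re.compile(r"ERROR\s+TDRV:dmem_alloc\s+Failed to alloc DEVICE memory: (\d+)")


def failed_device_allocations(log_text):
    """Return the byte size of every failed DEVICE allocation found in the log."""
    return [int(match.group(1)) for match in ALLOC_RE.finditer(log_text)]


# Two lines copied from the error output in the question.
sample_log = """\
2022-Nov-22 22:29:25.0728 20764:22801 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 589824
2022-Nov-22 22:29:26.0091 20764:22799 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 16777216
"""

sizes = failed_device_allocations(sample_log)
for size in sizes:
    print(f"{size} bytes = {size / 2**20:.2f} MiB")
```

Both failed requests are small (about 0.56 MiB and 16 MiB), which may indicate that device memory was already nearly full when they were issued; the log alone does not confirm this.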
I am trying to load a Neuron-compiled model generated as described in https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/huggingface_bert/huggingface_bert.html . I am still a newbie, so please excuse my mistakes. This is my code for loading a Neuron-compiled model. It is almost entirely based on the code on the page referred to above.
```
from transformers import pipeline
import tensorflow as tf
import tensorflow.neuron as tfn

class TFBertForSequenceClassificationDictIO(tf.keras.Model):
    def __init__(self, model_wrapped):
        super().__init__()
        self.model_wrapped = model_wrapped
        self.aws_neuron_function = model_wrapped.aws_neuron_function
    def call(self, inputs):
        input_ids = inputs['input_ids']
        attention_mask = inputs['attention_mask']
        logits = self.model_wrapped([input_ids, attention_mask])
        return [logits]

class TFBertForSequenceClassificationFlatIO(tf.keras.Model):
    def __init__(self, model):
        super().__init__()
        self.model = model
    def call(self, inputs):
        input_ids, attention_mask = inputs
        output = self.model({'input_ids': input_ids, 'attention_mask': attention_mask})
        return output['logits']

string_inputs = [
    'I love to eat pizza!',
    'I am sorry. I really want to like it, but I just can not stand sushi.',
    'I really do not want to type out 128 strings to create batch 128 data.',
    'Ah! Multiplying this list by 32 would be a great solution!',
]
string_inputs = string_inputs * 32

model_name = 'distilbert-base-uncased-finetuned-sst-2-english'
neuron_pipe = pipeline('sentiment-analysis', model=model_name, framework='tf')
example_inputs = neuron_pipe.tokenizer(string_inputs)
pipe = pipeline('sentiment-analysis', model=model_name, framework='tf')
reloaded_model = tf.keras.models.load_model('./distilbert_b128_2')
model_wrapped = TFBertForSequenceClassificationFlatIO(pipe.model)
example_inputs_list = [example_inputs['input_ids'], example_inputs['attention_mask']]
model_wrapped_traced = tfn.trace(model_wrapped, example_inputs_list)
rewrapped_model = TFBertForSequenceClassificationDictIO(model_wrapped_traced)
```
This is the stacktrace
```
2022-11-05 02:46:55.553817: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.
All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized: ['dropout_39']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
Traceback (most recent call last):
  File "inferencesmall.py", line 40, in <module>
    model_wrapped_traced = tfn.trace(model_wrapped, example_inputs_list)
  File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow_neuron/python/_trace.py", line 167, in trace
    func = func.get_concrete_function(*example_inputs)
  File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1264, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1244, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 786, in _initialize
    *args, **kwds))
  File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2983, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3292, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3140, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1161, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1147, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
tensorflow.python.autograph.impl.api.StagingError: in user code:

    File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 987, in error_handler *
        return fn(*args, **kwargs)

    StagingError: Exception encountered when calling layer "tf_bert_for_sequence_classification_flat_io" (type TFBertForSequenceClassificationFlatIO).

    in user code:

    File "inferencesmall.py", line 22, in call *
        output = self.model({'input_ids': input_ids, 'attention_mask': attention_mask})
    File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler **
        raise e.with_traceback(filtered_tb) from None

    StagingError: Exception encountered when calling layer "tf_distil_bert_for_sequence_classification_1" (type TFDistilBertForSequenceClassification).

    in user code:

    File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/transformers/models/distilbert/modeling_tf_distilbert.py", line 798, in call *
        distilbert_output = self.distilbert(
    File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler **
        raise e.with_traceback(filtered_tb) from None

    StagingError: Exception encountered when calling layer "distilbert" (type TFDistilBertMainLayer).

    in user code:

    File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/transformers/models/distilbert/modeling_tf_distilbert.py", line 423, in call *
        inputs = input_processing(
    File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/transformers/modeling_tf_utils.py", line 372, in input_processing *
        output[parameter_names[i]] = input

    IndexError: list index out of range
```
Thanks in advance
Ajay
Hi,
This link https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo.html describes how to compile using TensorFlow 1. Can anyone let me know the steps to Neuron-compile a BERT large model for running inference on Inferentia using TensorFlow 2?
Thanks in advance Ajay
P.S. This is what my log looks like while compiling on TF1:
```
INFO:tensorflow:fusing subgraph {subgraph neuron_op_e76ab3d9bc74f09f with input tensors ["<tf.Tensor 'bert/encoder/ones0/_0:0' shape=(1, 512, 1) dtype=float32>", "<tf.Tensor 'bert/encoder/Cast0/_1:0' shape=(1, 1, 512) dtype=float32>", "<tf.Tensor 'bert/embeddings/LayerNorm/batchnorm/add_10/_2:0' shape=(1, 512, 1024) dtype=float32>"], output tensors ["<tf.Tensor 'bert/pooler/dense/Tanh:0' shape=(1, 1024) dtype=float32>", "<tf.Tensor 'bert/encoder/layer_23/output/LayerNorm/batchnorm/add_1:0' shape=(1, 512, 1024) dtype=float32>"]} with neuron-cc . Compiler status ERROR
WARNING:tensorflow:11/03/2022 04:28:48 AM ERROR 9932 [neuron-cc]: Failed to parse model /tmp/tmpbyvnmr6h/neuron_op_e76ab3d9bc74f09f/graph_def.pb: The following operators are not implemented: {'Einsum'} (NotImplementedError)
INFO:tensorflow:Number of operations in TensorFlow session: 7427
INFO:tensorflow:Number of operations after tf.neuron optimizations: 2901
INFO:tensorflow:Number of operations placed on Neuron runtime: 0
WARNING:tensorflow:Converted /home/ubuntu/bert_repo/patent_model/ to ./bert-saved-model-neuron_tf1.15 but no operator will be running on AWS machine learning accelerators. This is probably not what you want. Please refer to https://github.com/aws/aws-neuron-sdk for current limitations of the AWS Neuron SDK. We are actively improving (and hiring)!
{'OnNeuronRatio': 0.0}
```
I assume that OnNeuronRatio being 0 means that I won't be able to make use of Inferentia hardware acceleration. Is that correct?
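As a rough cross-check of the reported ratio, the two operation counts in the log can be divided directly. This sketch assumes (it is not confirmed in this post) that `OnNeuronRatio` is the number of operations placed on the Neuron runtime divided by the number of operations left after the tf.neuron optimizations; the helper name `on_neuron_ratio` is hypothetical.

```python
import re


def on_neuron_ratio(log_text):
    """Estimate the fraction of operations placed on Neuron from compile-log text.

    Assumption: ratio = (ops placed on Neuron runtime) / (ops after tf.neuron optimizations).
    """
    after = int(re.search(
        r"Number of operations after tf\.neuron optimizations: (\d+)", log_text).group(1))
    placed = int(re.search(
        r"Number of operations placed on Neuron runtime: (\d+)", log_text).group(1))
    return placed / after if after else 0.0


# Counts copied from the log in the question.
sample_log = (
    "INFO:tensorflow:Number of operations in TensorFlow session: 7427\n"
    "INFO:tensorflow:Number of operations after tf.neuron optimizations: 2901\n"
    "INFO:tensorflow:Number of operations placed on Neuron runtime: 0\n"
)

ratio = on_neuron_ratio(sample_log)
print(f"Estimated OnNeuronRatio: {ratio}")
```

With zero operations placed on the Neuron runtime, the estimate is 0.0, which matches the `{'OnNeuronRatio': 0.0}` printed at the end of the log.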
I just started using Neuron on Inf1 and I'm following the examples. I did the [resnet50](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/resnet50.html) example, no problems. Then I tried to follow the [BERT](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.html) example and I got the following error. I followed [these](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-troubleshoot.html) troubleshooting steps - Neuron is installed and I haven't seen any of the other errors.
```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_27071/3834338657.py in <module>
35
36 # Verify the TorchScript works on both example inputs
---> 37 paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)
38 not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)
39
~/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(372): forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(548): __call__
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(207): run_op
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(196): __call__
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/runtime.py(69): forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(965): trace_module
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(750): trace
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(307): tb_parse
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(533): tb_graph
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(482): maybe_generate_tb_graph_def
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(513): maybe_determine_names_from_tensorboard
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(200): trace
/tmp/ipykernel_27071/3834338657.py(34): <module>
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3553): run_code
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3473): run_ast_nodes
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3258): run_cell_async
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/async_helpers.py(78): _pseudo_sync_runner
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3030): _run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2976): run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/zmqshell.py(528): run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/ipkernel.py(387): do_execute
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(730): execute_request
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(406): dispatch_shell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(499): process_one
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(510): dispatch_queue
/usr/lib64/python3.7/asyncio/events.py(88): _run
/usr/lib64/python3.7/asyncio/base_events.py(1786): _run_once
/usr/lib64/python3.7/asyncio/base_events.py(541): run_forever
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/tornado/platform/asyncio.py(215): start
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelapp.py(712): start
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/traitlets/config/application.py(982): launch_instance
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel_launcher.py(17): <module>
/usr/lib64/python3.7/runpy.py(85): _run_code
/usr/lib64/python3.7/runpy.py(193): _run_module_as_main
RuntimeError: The PyTorch Neuron Runtime could not be initialized. Neuron Driver issues are logged
to your system logs. See the Neuron Runtime's troubleshooting guide for help on this
topic: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/
```
When I searched the system logs from **EC2 > Instances > ... > Get system log** for "neuron" I got these results:
```
[ 264.104973] neuron: loading out-of-tree module taints kernel.
[ 264.107388] neuron: module verification failed: signature and/or required key missing - tainting kernel
[ 264.113673] Neuron Driver Started with Version:2.3.26.0-67ad286904ed6cc43a8761d89c8477de0ba961e1
[ 264.119896] neuron:nr_reset_thread_fn: nd0: initiating reset
[ 264.139198] neuron:mpset_constructor: reserved 134217728 bytes of host memory
[ 271.253090] neuron:nr_reset_thread_fn: nd0: reset completed
[ 2010.629871] neuron:npid_attach: neuron:npid_attach: pid=25574, slot=0
[ 2069.531541] neuron:npid_detach: neuron:npid_detach: pid=25574, slot=0
[ 2093.038724] neuron:npid_attach: neuron:npid_attach: pid=25666, slot=0
[ 2266.011350] neuron:npid_detach: neuron:npid_detach: pid=25666, slot=0
[ 2271.008477] neuron:npid_attach: neuron:npid_attach: pid=25777, slot=0
[ 4139.513592] neuron:npid_detach: neuron:npid_detach: pid=25777, slot=0
```
I can't save a Neuron model after compiling the model into an AWS Neuron-optimized TorchScript.
My code:
```
import tensorflow # to workaround a protobuf version conflict issue
import torch
import torch.neuron
import torch.nn.functional as F
import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained('./data/tokenizer')
model = AutoModelForSequenceClassification.from_pretrained('data/model', return_dict=False, torchscript=True)
parsed_json_list = [
    {"premise": "I have a dog", "hypotheses": ["I love dogs", "I hate dogs"]}
]
model_inputs = tokenizer(
    [
        parsed_json["premise"]
        for parsed_json in parsed_json_list
        for _ in parsed_json["hypotheses"]
    ],
    [
        hypothesis
        for parsed_json in parsed_json_list
        for hypothesis in parsed_json["hypotheses"]
    ],
    return_tensors='pt',
    padding=True,
    truncation=True,
)
pred = model(**model_inputs)
example_inputs = model_inputs['input_ids'], model_inputs['attention_mask'], model_inputs['token_type_ids']
model_neuron = torch.neuron.trace(model, example_inputs, verbose=1)
```
Neuron log after torch.neuron.trace:
```
INFO:Neuron:Number of neuron graph operations 121 did not match traced graph 101 - using heuristic matching of hierarchical information
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 2131, compiled = 1992, percent compiled = 93.48%
INFO:Neuron:The neuron partitioner created 51 sub-graphs
INFO:Neuron:Neuron successfully compiled 50 sub-graphs, Total fused subgraphs = 51, Percent of model sub-graphs successfully compiled = 98.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 398
INFO:Neuron: => aten::ScalarImplicit: 2
INFO:Neuron: => aten::add: 82
INFO:Neuron: => aten::arange: 2
INFO:Neuron: => aten::bmm: 47
INFO:Neuron: => aten::clamp: 22
INFO:Neuron: => aten::contiguous: 71
INFO:Neuron: => aten::detach: 144
INFO:Neuron: => aten::div: 60
INFO:Neuron: => aten::expand: 22
INFO:Neuron: => aten::gelu: 13
INFO:Neuron: => aten::layer_norm: 26
INFO:Neuron: => aten::linear: 97
INFO:Neuron: => aten::mul: 38
INFO:Neuron: => aten::neg: 11
INFO:Neuron: => aten::permute: 71
INFO:Neuron: => aten::select: 1
INFO:Neuron: => aten::size: 460
INFO:Neuron: => aten::slice: 27
INFO:Neuron: => aten::sqrt: 36
INFO:Neuron: => aten::squeeze: 23
INFO:Neuron: => aten::sub: 1
INFO:Neuron: => aten::to: 96
INFO:Neuron: => aten::transpose: 47
INFO:Neuron: => aten::unsqueeze: 29
INFO:Neuron: => aten::view: 166
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 35 [supported]
INFO:Neuron: => aten::__and__: 1 [supported]
INFO:Neuron: => aten::abs: 1 [supported]
INFO:Neuron: => aten::add: 3 [supported]
INFO:Neuron: => aten::bmm: 1 [supported]
INFO:Neuron: => aten::ceil: 1 [supported]
INFO:Neuron: => aten::clamp: 2 [supported]
INFO:Neuron: => aten::contiguous: 1 [supported]
INFO:Neuron: => aten::detach: 2 [supported]
INFO:Neuron: => aten::div: 2 [supported]
INFO:Neuron: => aten::embedding: 1 [not supported]
INFO:Neuron: => aten::expand: 2 [supported]
INFO:Neuron: => aten::gather: 24 [not supported]
INFO:Neuron: => aten::gt: 1 [supported]
INFO:Neuron: => aten::le: 1 [supported]
INFO:Neuron: => aten::linear: 1 [supported]
INFO:Neuron: => aten::log: 2 [supported]
INFO:Neuron: => aten::lt: 1 [supported]
INFO:Neuron: => aten::mul: 2 [supported]
INFO:Neuron: => aten::neg: 1 [supported]
INFO:Neuron: => aten::permute: 1 [supported]
INFO:Neuron: => aten::repeat: 24 [not supported]
INFO:Neuron: => aten::sign: 1 [not supported]
INFO:Neuron: => aten::size: 10 [supported]
INFO:Neuron: => aten::slice: 2 [supported]
INFO:Neuron: => aten::squeeze: 2 [supported]
INFO:Neuron: => aten::to: 5 [supported]
INFO:Neuron: => aten::transpose: 1 [supported]
INFO:Neuron: => aten::type_as: 2 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 2 [supported]
INFO:Neuron: => aten::where: 2 [not supported]
INFO:Neuron:skip_inference_context for tensorboard symbols at /home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py:305 tb_parse
INFO:Neuron:Number of neuron graph operations 468 did not match traced graph 599 - using heuristic matching of hierarchical information
CPU times: user 1min 50s, sys: 13.1 s, total: 2min 3s
Wall time: 7min 37s
```
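The percentages in the trace log above follow directly from the reported counts, which makes it easy to see how much of the model stayed off the accelerator. A minimal sketch of that arithmetic (the helper name `percent` is hypothetical):

```python
def percent(part, whole, digits=2):
    """Return part/whole as a percentage rounded to the given number of digits."""
    return round(100.0 * part / whole, digits)


# Counts copied from the Neuron log above.
ops_compiled = percent(1992, 2131)              # arithmetic operators compiled
subgraphs_compiled = percent(50, 51, digits=1)  # sub-graphs compiled

ops_left_on_cpu = 2131 - 1992
subgraphs_not_compiled = 51 - 50

print(f"operators compiled: {ops_compiled}% ({ops_left_on_cpu} not compiled)")
print(f"sub-graphs compiled: {subgraphs_compiled}% ({subgraphs_not_compiled} not compiled)")
```

So 139 arithmetic operators and one of the 51 sub-graphs were not compiled to Neuron.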
Then I try to save my model:
```
model_neuron.save('test.pt')
```
Error log:
```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_17776/1444584018.py in <module>
----> 1 model_neuron.save('test.pt')
~/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_script.py in save(self, f, **kwargs)
691 See :func:`torch.jit.save <torch.jit.save>` for details.
692 """
--> 693 return self._c.save(str(f), **kwargs)
694
695 def _save_for_lite_interpreter(self, *args, **kwargs):
RuntimeError:
Could not export Python function call 'XSoftmax'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__:
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/native_ops/prim.py(46): PythonOp
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(330): __call__
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(207): run_op
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(196): __call__
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/runtime.py(69): forward
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(965): trace_module
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(750): trace
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(307): tb_parse
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(533): tb_graph
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(482): maybe_generate_tb_graph_def
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(513): maybe_determine_names_from_tensorboard
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(200): trace
<timed exec>(1): <module>
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/magics/execution.py(1335): time
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/magic.py(187): <lambda>
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/decorator.py(232): fun
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2473): run_cell_magic
/tmp/ipykernel_17776/2573155944.py(1): <module>
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3553): run_code
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3473): run_ast_nodes
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3258): run_cell_async
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/async_helpers.py(78): _pseudo_sync_runner
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3030): _run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2976): run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/zmqshell.py(528): run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/ipkernel.py(387): do_execute
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(730): execute_request
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(406): dispatch_shell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(499): process_one
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(510): dispatch_queue
/usr/lib64/python3.7/asyncio/events.py(88): _run
/usr/lib64/python3.7/asyncio/base_events.py(1786): _run_once
/usr/lib64/python3.7/asyncio/base_events.py(541): run_forever
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/tornado/platform/asyncio.py(215): start
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelapp.py(712): start
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/traitlets/config/application.py(982): launch_instance
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel_launcher.py(17): <module>
/usr/lib64/python3.7/runpy.py(85): _run_code
/usr/lib64/python3.7/runpy.py(193): _run_module_as_main
```
Thanks in advance.
I followed the user guide on updating torch-neuron and then started compiling the model for Neuron.
But I got an error, and I don't understand what's wrong.
In the Neuron SDK you claim that it should compile all operations, even unsupported ones, which should simply run on the CPU.
The error:
```
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 3345, fused = 3345, percent fused = 100.0%
INFO:Neuron:Number of neuron graph operations 8175 did not match traced graph 9652 - using heuristic matching of hierarchical information
INFO:Neuron:Compiling function _NeuronGraph$3362 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]} --verbose 35'
..............................................................................INFO:Neuron:Compile command returned: -9
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$3362; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
Traceback (most recent call last):
File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 382, in op_converter
item, inputs, compiler_workdir=sg_workdir, **kwargs)
File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/decorators.py", line 220, in trace
'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 3345, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 942 [supported]
INFO:Neuron: => aten::_convolution: 107 [supported]
INFO:Neuron: => aten::add: 104 [supported]
INFO:Neuron: => aten::batch_norm: 1 [supported]
INFO:Neuron: => aten::cat: 1 [supported]
INFO:Neuron: => aten::contiguous: 4 [supported]
INFO:Neuron: => aten::div: 104 [supported]
INFO:Neuron: => aten::dropout: 208 [supported]
INFO:Neuron: => aten::feature_dropout: 1 [supported]
INFO:Neuron: => aten::flatten: 60 [supported]
INFO:Neuron: => aten::gelu: 52 [supported]
INFO:Neuron: => aten::layer_norm: 161 [supported]
INFO:Neuron: => aten::linear: 264 [supported]
INFO:Neuron: => aten::matmul: 104 [supported]
INFO:Neuron: => aten::mul: 52 [supported]
INFO:Neuron: => aten::permute: 210 [supported]
INFO:Neuron: => aten::relu: 1 [supported]
INFO:Neuron: => aten::reshape: 262 [supported]
INFO:Neuron: => aten::select: 104 [supported]
INFO:Neuron: => aten::sigmoid: 1 [supported]
INFO:Neuron: => aten::size: 278 [supported]
INFO:Neuron: => aten::softmax: 52 [supported]
INFO:Neuron: => aten::transpose: 216 [supported]
INFO:Neuron: => aten::upsample_bilinear2d: 4 [supported]
INFO:Neuron: => aten::view: 52 [supported]
Traceback (most recent call last):
File "to_neuron.py", line 14, in <module>
model_neuron = torch.neuron.trace(model, example_inputs=[image.cuda()])
File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 184, in trace
cu.stats_post_compiler(neuron_graph)
File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 493, in stats_post_compiler
"No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```
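For triage, the operator summary at the end of a log like the one above can be tallied with a few lines of Python. This is a minimal sketch, not part of the original post: the function names are hypothetical and the regular expression assumes the `=> op: count [status]` line format shown in the log. In this log every listed operator is marked `[supported]`, so a tally like this would suggest the failure lies in the `neuron-cc` invocation itself rather than in operator coverage.

```python
import re

# Matches lines such as "INFO:Neuron: => aten::Int: 942 [supported]".
OP_LINE = re.compile(r"=>\s+(?P<op>[\w:]+):\s+(?P<count>\d+)\s+\[(?P<status>[^\]]+)\]")


def summarize_ops(log_text):
    """Return {op_name: (count, status)} for every operator line in the log."""
    ops = {}
    for line in log_text.splitlines():
        match = OP_LINE.search(line)
        if match:
            ops[match["op"]] = (int(match["count"]), match["status"])
    return ops


def unsupported_ops(ops):
    """List the operators the log marks as anything other than 'supported'."""
    return sorted(name for name, (_, status) in ops.items() if status != "supported")


# A few lines copied from the log above; paste the full log to check all of them.
sample = """\
INFO:Neuron: => aten::Int: 942 [supported]
INFO:Neuron: => aten::_convolution: 107 [supported]
INFO:Neuron: => aten::upsample_bilinear2d: 4 [supported]
"""
ops = summarize_ops(sample)
print(unsupported_ops(ops))
```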
I'm following some guides, and from my understanding this should be possible, but I've been trying for hours to compile a YOLOv5 model into a Neuron model with no success. Is it even possible to do this on my local machine, or do I have to be on an Inferentia instance?
This is what my environment looks like:
```
# packages in environment at /miniconda3/envs/neuron:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 1.2.0 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
attrs 22.1.0 pypi_0 pypi
backcall 0.2.0 pyhd3eb1b0_0
ca-certificates 2022.07.19 h06a4308_0
cachetools 5.2.0 pypi_0 pypi
certifi 2022.9.24 pypi_0 pypi
charset-normalizer 2.1.1 pypi_0 pypi
cycler 0.11.0 pypi_0 pypi
debugpy 1.5.1 py37h295c915_0
decorator 5.1.1 pyhd3eb1b0_0
dmlc-nnvm 1.11.1.0+0 pypi_0 pypi
dmlc-topi 1.11.1.0+0 pypi_0 pypi
dmlc-tvm 1.11.1.0+0 pypi_0 pypi
entrypoints 0.4 py37h06a4308_0
fonttools 4.37.3 pypi_0 pypi
gast 0.2.2 pypi_0 pypi
google-auth 2.12.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
gputil 1.4.0 pypi_0 pypi
grpcio 1.49.1 pypi_0 pypi
h5py 3.7.0 pypi_0 pypi
idna 3.4 pypi_0 pypi
importlib-metadata 4.12.0 pypi_0 pypi
inferentia-hwm 1.11.0.0+0 pypi_0 pypi
iniconfig 1.1.1 pypi_0 pypi
ipykernel 6.15.2 py37h06a4308_0
ipython 7.34.0 pypi_0 pypi
ipywidgets 8.0.2 pypi_0 pypi
islpy 2021.1+aws2021.x.16.0.bld0 pypi_0 pypi
jedi 0.18.1 py37h06a4308_1
jupyter_client 7.3.5 py37h06a4308_0
jupyter_core 4.10.0 py37h06a4308_0
jupyterlab-widgets 3.0.3 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libsodium 1.0.18 h7b6447c_0
libstdcxx-ng 11.2.0 h1234567_1
llvmlite 0.39.1 pypi_0 pypi
markdown 3.4.1 pypi_0 pypi
markupsafe 2.1.1 pypi_0 pypi
matplotlib 3.5.3 pypi_0 pypi
matplotlib-inline 0.1.6 py37h06a4308_0
ncurses 6.3 h5eee18b_3
nest-asyncio 1.5.5 py37h06a4308_0
networkx 2.4 pypi_0 pypi
neuron-cc 1.11.7.0+aec18907e pypi_0 pypi
numba 0.56.2 pypi_0 pypi
numpy 1.19.5 pypi_0 pypi
oauthlib 3.2.1 pypi_0 pypi
opencv-python 4.6.0.66 pypi_0 pypi
openssl 1.1.1q h7f8727e_0
opt-einsum 3.3.0 pypi_0 pypi
packaging 21.3 pyhd3eb1b0_0
pandas 1.3.5 pypi_0 pypi
parso 0.8.3 pyhd3eb1b0_0
pexpect 4.8.0 pyhd3eb1b0_3
pickleshare 0.7.5 pyhd3eb1b0_1003
pillow 9.2.0 pypi_0 pypi
pip 22.2.2 pypi_0 pypi
pluggy 1.0.0 pypi_0 pypi
prompt-toolkit 3.0.31 pypi_0 pypi
protobuf 3.20.3 pypi_0 pypi
psutil 5.9.2 pypi_0 pypi
ptyprocess 0.7.0 pyhd3eb1b0_2
py 1.11.0 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pygments 2.13.0 pypi_0 pypi
pyparsing 3.0.9 py37h06a4308_0
pytest 7.1.3 pypi_0 pypi
python 3.7.13 h12debd9_0
python-dateutil 2.8.2 pyhd3eb1b0_0
pytz 2022.2.1 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
pyzmq 23.2.0 py37h6a678d5_0
readline 8.1.2 h7f8727e_1
requests 2.28.1 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scipy 1.4.1 pypi_0 pypi
seaborn 0.12.0 pypi_0 pypi
setuptools 59.8.0 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_1
sqlite 3.39.3 h5082296_0
tensorboard 1.15.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow 1.15.0 pypi_0 pypi
tensorflow-estimator 1.15.1 pypi_0 pypi
termcolor 2.0.1 pypi_0 pypi
thop 0.1.1-2209072238 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tomli 2.0.1 pypi_0 pypi
torch 1.11.0 pypi_0 pypi
torch-neuron 1.11.0.2.3.0.0 pypi_0 pypi
torchvision 0.12.0 pypi_0 pypi
tornado 6.2 py37h5eee18b_0
tqdm 4.64.1 pypi_0 pypi
traitlets 5.4.0 pypi_0 pypi
typing-extensions 4.3.0 pypi_0 pypi
urllib3 1.26.12 pypi_0 pypi
wcwidth 0.2.5 pyhd3eb1b0_0
werkzeug 2.2.2 pypi_0 pypi
wheel 0.37.1 pypi_0 pypi
widgetsnbextension 4.0.3 pypi_0 pypi
wrapt 1.14.1 pypi_0 pypi
xz 5.2.6 h5eee18b_0
zeromq 4.3.4 h2531618_0
zipp 3.8.1 pypi_0 pypi
zlib 1.2.12 h5eee18b_3
```
Hi Team,
I want to compile a BERT model and run it on Inferentia. I trained my model using PyTorch and tried to convert it by following the steps in this [tutorial](https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb) on my Amazon Linux machine, but it keeps failing with this error:
```
09/22/2022 06:13:56 PM ERROR 23737 [neuron-cc]: Failed to parse model /tmp/tmp64l9ygmj/graph_def.pb: The following operators are not implemented: {'SelectV2'} (NotImplementedError)
```
I followed the installation steps [here](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/pytorch-setup/pytorch-install.html) for pytorch-1.11.0 and executed the code from the tutorial, but got the same error.
We want to explore using Inferentia for our large BERT model but are blocked by the failure to convert it to NEFF format. I also tried the same steps using TF and ran into other unsupported-op issues. Could you please help?
Below are the setup commands I ran on my Amazon Linux desktop:
```
sudo yum install -y python3.7-venv gcc-c++
python3.7 -m venv pytorch_venv
source pytorch_venv/bin/activate
pip install -U pip
# Set Pip repository to point to the Neuron repository
pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
# Install Neuron PyTorch
pip install torch-neuron neuron-cc[tensorflow] "protobuf<4" torchvision
pip install --upgrade "transformers==4.6.0"
pip install tensorflow==2.8.1
```
I then executed the script below (copied from the tutorial) on my Amazon Linux host:
```
import tensorflow # to workaround a protobuf version conflict issue
import torch
import torch.neuron
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
import transformers
import os
import warnings
# Setting up NeuronCore groups for inf1.6xlarge with 16 cores
num_cores = 16 # This value should be 4 on inf1.xlarge and inf1.2xlarge
nc_env = ','.join(['1'] * num_cores)
warnings.warn("NEURONCORE_GROUP_SIZES is being deprecated, if your application is using NEURONCORE_GROUP_SIZES please \
see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/deprecation.html#announcing-end-of-support-for-neuroncore-group-sizes \
for more details.", DeprecationWarning)
os.environ['NEURONCORE_GROUP_SIZES'] = nc_env
# Build tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False)
# Setup some example inputs
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"
max_length=128
paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")
not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")
# Run the original PyTorch model on compilation example
paraphrase_classification_logits = model(**paraphrase)[0]
# Convert example inputs to a format that is compatible with TorchScript tracing
example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']
example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']
# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron
model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)
```
This gave me the following error:
```
2022-09-22 18:13:12.145617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 18:13:12.145649: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
sample_pytorch_model.py:14: DeprecationWarning: NEURONCORE_GROUP_SIZES is being deprecated, if your application is using NEURONCORE_GROUP_SIZES please see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/deprecation.html#announcing-end-of-support-for-neuroncore-group-sizes for more details.
for more details.", DeprecationWarning)
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 433/433 [00:00<00:00, 641kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████| 213k/213k [00:00<00:00, 636kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████| 436k/436k [00:00<00:00, 731kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████| 29.0/29.0 [00:00<00:00, 35.2kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████| 433M/433M [00:09<00:00, 45.7MB/s]
/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/transformers/modeling_utils.py:1968: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
input_tensor.shape[chunk_dim] == tensor_shape for input_tensor in input_tensors
INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 565, fused = 548, percent fused = 96.99%
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
INFO:Neuron:Compiling function _NeuronGraph$662 with neuron-cc
INFO:Neuron:Compiling with command line: '/local/home/spareek/pytorch_venv/bin/neuron-cc compile /tmp/tmp64l9ygmj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp64l9ygmj/graph_def.neff --io-config {"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]} --verbose 35'
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
.2022-09-22 18:13:52.697717: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 18:13:52.697749: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
09/22/2022 06:13:56 PM ERROR 23737 [neuron-cc]: Failed to parse model /tmp/tmp64l9ygmj/graph_def.pb: The following operators are not implemented: {'SelectV2'} (NotImplementedError)
Compiler status ERROR
INFO:Neuron:Compile command returned: 1
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$662; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/local/home/spareek/pytorch_venv/bin/neuron-cc compile /tmp/tmp64l9ygmj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp64l9ygmj/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
Traceback (most recent call last):
File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py", line 382, in op_converter
item, inputs, compiler_workdir=sg_workdir, **kwargs)
File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py", line 220, in trace
'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/local/home/spareek/pytorch_venv/bin/neuron-cc compile /tmp/tmp64l9ygmj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp64l9ygmj/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 565, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 97 [supported]
INFO:Neuron: => aten::add: 39 [supported]
INFO:Neuron: => aten::contiguous: 12 [supported]
INFO:Neuron: => aten::div: 12 [supported]
INFO:Neuron: => aten::dropout: 38 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]
INFO:Neuron: => aten::layer_norm: 25 [supported]
INFO:Neuron: => aten::linear: 74 [supported]
INFO:Neuron: => aten::matmul: 24 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 48 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 97 [supported]
INFO:Neuron: => aten::slice: 5 [supported]
INFO:Neuron: => aten::softmax: 12 [supported]
INFO:Neuron: => aten::tanh: 1 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::transpose: 12 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 48 [supported]
Traceback (most recent call last):
File "sample_pytorch_model.py", line 38, in <module>
model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)
File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py", line 184, in trace
cu.stats_post_compiler(neuron_graph)
File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py", line 493, in stats_post_compiler
"No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```
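One thing worth checking, offered as an assumption rather than a confirmed diagnosis: the setup commands in this post install `tensorflow==2.8.1` after `neuron-cc[tensorflow]`, while the conda listing in the earlier post pairs `neuron-cc 1.11.7.0` with `tensorflow 1.15.0`. The sketch below compares the installed TensorFlow version against 1.15. The helper names and the expected version are illustrative, not an official Neuron requirement.

```python
def tf_major_minor(version):
    """Split a version string such as '1.15.0' into (1, 15)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def matches_expected_tf(version, expected=(1, 15)):
    # ASSUMPTION: (1, 15) comes from an environment on this page where
    # neuron-cc 1.11.7.0 sits next to tensorflow 1.15.0; it is not an
    # officially documented constraint.
    return tf_major_minor(version) == expected


def installed_tf_version():
    """Return the installed tensorflow version string, or None if absent."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:
        import importlib_metadata as metadata  # backport package on Python 3.7
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


found = installed_tf_version()
if found is not None and not matches_expected_tf(found):
    print(f"tensorflow {found} is installed; neuron-cc may expect 1.15.x")
```

If the check prints a warning, reinstalling without the explicit `tensorflow==2.8.1` pin and retrying the trace would be a cheap experiment.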
Hi. We are trying to convert all our in-house PyTorch models to AWS Neuron on Inferentia. We successfully converted one, but the second model we tried did not compile. Unfortunately, compilation produced neither an error message nor a log of any kind, so we are stuck.
The model is a rather simple but large U-Net that uses partial convolutions instead of regular ones, with no other unusual operators.
Converting this model to TorchScript works on the same instance.
Could it be a memory problem?
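One way to test the memory hypothesis is to run the compilation in a child process and record its return code and peak memory. The following is a minimal sketch under stated assumptions: it targets Linux (where `ru_maxrss` is reported in kilobytes), `run_and_measure` is a hypothetical helper, and the command shown is a placeholder for whatever script performs the compilation.

```python
import resource
import subprocess
import sys


def run_and_measure(command):
    """Run a command; return (return code, peak RSS of waited-for children)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    # ru_maxrss is the largest resident set size among child processes that
    # this interpreter has waited for; Linux reports it in kilobytes.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


# Placeholder command; substitute the script that performs the compilation.
code, peak_kb = run_and_measure([sys.executable, "-c", "x = bytearray(10**7)"])
print(code, peak_kb)
```

A negative return code together with a peak close to the instance's total RAM would point toward the process being killed for memory, though that remains an inference until the system logs confirm it.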
Hi Team,
I have a fine-tuned BERT model that was trained using the following libraries:
torch == 1.8.1+cu111
transformers == 4.19.4
I am not able to convert that fine-tuned BERT model to AWS Neuron and get the following compilation errors. Could you please help me resolve this issue?
**Note:** I am compiling the BERT model on a SageMaker notebook instance with the "conda_python3" conda environment.
**Installation:**
#### Set Pip repository to point to the Neuron repository
!pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
#### Install Neuron PyTorch - Note: Tried both options below.
"#!pip install torch-neuron==1.8.1.* neuron-cc[tensorflow] "protobuf<4" torchvision sagemaker>=2.79.0 transformers==4.17.0 --upgrade"
!pip install --upgrade torch-neuron neuron-cc[tensorflow] "protobuf<4" torchvision
---------------------------------------------------------------------------------------------------------------------------------------------------
**Model compilation:**
```
import os
import tensorflow # to workaround a protobuf version conflict issue
import torch
import torch.neuron
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_path = 'model/' # Model artifacts are stored in 'model/' directory
# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path, torchscript=True)
# create dummy input for max length 128
dummy_input = "dummy input which will be padded later"
max_length = 128
embeddings = tokenizer(dummy_input, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")
neuron_inputs = tuple(embeddings.values())
# compile model with torch.neuron.trace and update config
model_neuron = torch.neuron.trace(model, neuron_inputs)
model.config.update({"traced_sequence_length": max_length})
# save tokenizer, neuron model and config for later use
save_dir = "tmpd"
os.makedirs(save_dir, exist_ok=True)
model_neuron.save(os.path.join(save_dir,"neuron_model.pt"))
tokenizer.save_pretrained(save_dir)
model.config.save_pretrained(save_dir)
```
---------------------------------------------------------------------------------------------------------------------------------------------------
**Model artifacts:** We got these model artifacts from a multi-label topic classification model.
config.json
model.tar.gz
pytorch_model.bin
special_tokens_map.json
tokenizer_config.json
tokenizer.json
---------------------------------------------------------------------------------------------------------------------------------------------------
**Error logs:**
```
INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 565, fused = 548, percent fused = 96.99%
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/ops/aten.py:2022: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:Neuron:Compiling function _NeuronGraph$698 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/python3/bin/neuron-cc compile /tmp/tmpv4gg13ze/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpv4gg13ze/graph_def.neff --io-config {"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]} --verbose 35'
INFO:Neuron:Compile command returned: -9
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$698; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ec2-user/anaconda3/envs/python3/bin/neuron-cc compile /tmp/tmpv4gg13ze/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpv4gg13ze/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/convert.py", line 382, in op_converter
item, inputs, compiler_workdir=sg_workdir, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/decorators.py", line 220, in trace
'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ec2-user/anaconda3/envs/python3/bin/neuron-cc compile /tmp/tmpv4gg13ze/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpv4gg13ze/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 565, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 97 [supported]
INFO:Neuron: => aten::add: 39 [supported]
INFO:Neuron: => aten::contiguous: 12 [supported]
INFO:Neuron: => aten::div: 12 [supported]
INFO:Neuron: => aten::dropout: 38 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]
INFO:Neuron: => aten::layer_norm: 25 [supported]
INFO:Neuron: => aten::linear: 74 [supported]
INFO:Neuron: => aten::matmul: 24 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 48 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 97 [supported]
INFO:Neuron: => aten::slice: 5 [supported]
INFO:Neuron: => aten::softmax: 12 [supported]
INFO:Neuron: => aten::tanh: 1 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::transpose: 12 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 48 [supported]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-97bba321d013> in <module>
18
19 # compile model with torch.neuron.trace and update config
---> 20 model_neuron = torch.neuron.trace(model, neuron_inputs)
21 model.config.update({"traced_sequence_length": max_length})
22
~/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/convert.py in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, _neuron_trace, compiler_args, optimizations, verbose, **kwargs)
182 logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
183 neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 184 cu.stats_post_compiler(neuron_graph)
185
186 # Wrap the compiled version of the model in a script module. Note that this is
~/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/convert.py in stats_post_compiler(self, neuron_graph)
491 if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
492 raise RuntimeError(
--> 493 "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
494
495 if percent_operations_compiled < 50.0:
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```
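The line `Compile command returned: -9` in the log above is worth decoding. Assuming the value is a Python `subprocess` return code (the traceback raises `subprocess.SubprocessError`, which makes this plausible but not certain), a negative value `-N` means the child was terminated by signal `N`. A minimal sketch, with a hypothetical helper name:

```python
import signal


def describe_return_code(code):
    """Explain a subprocess-style return code."""
    if code < 0:
        # On POSIX, subprocess reports "terminated by signal N" as -N.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = f"signal {-code}"
        return f"killed by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with status {code}"


print(describe_return_code(-9))
print(describe_return_code(1))
```

Signal 9 is `SIGKILL`. On Linux this is often, though not always, the kernel's out-of-memory killer, so checking the notebook instance's available RAM during compilation is a reasonable next step.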
---------------------------------------------------------------------------------------------------------------------------------------------------
Thanks a lot.
Hi,
I am using an XRT application and trying to read/write a register, and I got this error:
[XRT] ERROR: xclRegRW: can't map CU:
My xrt.ini contains the following:
[Debug]
profile=true
[Runtime]
rw_shared=true
runtime_log=console
Inside my code I used these commands:
// Read clock count from clk_cnt* registers and unite into 64-bit value
// There is issue with [XRT] error : xclRegRW: can't map CU: clock count is 0
xclOpenContext(handle, xclbinId, cuidx, false);
xclRegRead(handle, cuidx, clk_cnt_lsb_offset, &clk_cnt_lsb);
xclCloseContext(handle, xclbinId, cuidx);
xclOpenContext(handle, xclbinId, cuidx, false);
xclRegRead(handle, cuidx, clk_cnt_msb_offset, &clk_cnt_msb);
xclCloseContext(handle, xclbinId, cuidx);
long int clock_count = ((long int)clk_cnt_msb << 32) | (long int)clk_cnt_lsb;
std::cout << std::endl << "Clock count is: " << clock_count << std::endl;
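The arithmetic of uniting the two 32-bit register reads into one 64-bit count can be sketched independently of XRT. The Python snippet below is illustrative only: `combine_counter` is a hypothetical name, and the masking step assumes the register values might be held in signed variables, since the declarations of `clk_cnt_lsb` and `clk_cnt_msb` are not shown in the post. It does not address the `can't map CU` error itself.

```python
MASK32 = 0xFFFFFFFF


def combine_counter(msb, lsb):
    """Join two 32-bit halves into one 64-bit value.

    Masking each half first keeps a negative (sign-extended) low word
    from overwriting the high word when the halves are OR-ed together.
    """
    return ((msb & MASK32) << 32) | (lsb & MASK32)


print(hex(combine_counter(0x1, 0x2)))
```

In the C++ code above, the equivalent safeguard would be to hold both halves in unsigned 32-bit variables before casting and shifting.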
Hi, all.
I am trying to deploy a PyTorch model to inf1 instances,
but I am unable to use torch.neuron to trace and compile the model.
Thanks for any help.