I am trying to test a model compiled for Inferentia on an inf1.2xlarge, but when loading the model I receive the following error messages:
2022-Sep-15 22:10:01.0152 3802:3802 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 1073741824
2022-Sep-15 22:10:01.0152 3802:3802 ERROR TDRV:dma_ring_alloc Failed to allocate TX ring
2022-Sep-15 22:10:01.0172 3802:3802 ERROR TDRV:io_create_rings Failed to allocate io ring for queue qPoolOut0_0
2022-Sep-15 22:10:01.0172 3802:3802 ERROR TDRV:kbl_model_add create_io_rings() error
2022-Sep-15 22:10:01.0182 3802:3802 ERROR NMGR:dlr_kelf_stage Failed to load subgraph
2022-Sep-15 22:10:01.0182 3802:3802 ERROR NMGR:stage_kelf_models Failed to stage graph: kelf-a.json to NeuronCore
2022-Sep-15 22:10:01.0184 3802:3802 ERROR NMGR:kmgr_load_nn_post_metrics Failed to load NN: 1.11.7.0+aec18907e-/tmp/tmptkxl8amn, err: 4
These are wrapped into a Python runtime exception:
RuntimeError: Could not load the model status=4 message=Allocation Failure
I presume this is because the model is on the large side: the failed allocation in the log is 1 GiB (1073741824 bytes), the .neff file is 373MB, and it takes ~4 hours to compile for a batch size of 1.
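As a rough sanity check on these sizes (assuming FP32 weights at 4 bytes per parameter, which matches the data types mentioned later in the thread):

```python
# Back-of-the-envelope arithmetic from the log and the compile artifacts.
GiB = 1024 ** 3
MiB = 1024 ** 2

failed_alloc = 1073741824   # bytes, from the TDRV:dmem_alloc error line
neff_size = 373 * MiB       # compiled .neff artifact
weights = 15 * MiB          # trained model checkpoint (assumed FP32)

print(failed_alloc / GiB)   # 1.0 -> the runtime asked for a single 1 GiB buffer
print(neff_size / weights)  # ~24.9 -> the NEFF is ~25x larger than the raw weights
print(weights // 4)         # 3932160 -> roughly 3.9M FP32 parameters
```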
This particular model is compiled for a single NeuronCore. I am now trying to compile with --neuroncore-pipeline-cores 4 to spread the model across multiple cores. However, this gives me the following log message:
INFO: The requested number of neuroncore-pipeline-cores (4) may not be suitable for this network, and may lead to sub-optimal performance. Recommended neuroncore-pipeline-cores for this network is 1.
(I can't find any technical details on how much memory an Inferentia chip has, although I'm guessing that, due to the Inferentia architecture, "memory" is not used in the same way as it might be on a CPU or GPU.)
So, what is a practical size limit for an Inferentia model and what can I do about running this model on Inf1?
Thank you, I'm not able to share details publicly (happy to talk privately), but the input has dimensions 1×32768×3 and the model is multiple layers of convolution and batch normalisation, running on PyTorch. The trained model is 15MB.
Also, the native data type is generally FP32, with integers for indices.
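For anyone following along, a hypothetical stand-in with a similar structure (plain PyTorch; the layer depth and channel counts here are illustrative guesses, not the real architecture, and 1×32768×3 is interpreted as batch × length × channels) might look like:

```python
import torch
import torch.nn as nn

class ConvBNStack(nn.Module):
    """Stacked Conv1d + BatchNorm1d blocks, FP32 (hypothetical repro sketch)."""
    def __init__(self, channels=(3, 64, 128, 128, 64)):
        super().__init__()
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv1d(c_in, c_out, kernel_size=3, padding=1),
                       nn.BatchNorm1d(c_out),
                       nn.ReLU()]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # x arrives as (batch, length, channels); Conv1d wants (batch, channels, length)
        return self.body(x.permute(0, 2, 1))

model = ConvBNStack().eval()
x = torch.randn(1, 32768, 3)  # FP32 input with the dimensions described above
with torch.no_grad():
    y = model(x)
print(tuple(y.shape))  # (1, 64, 32768)
```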
If you could please email us at aws-neuron-support@amazon.com, we can start a more in-depth conversation. In short, we won't need the exact model or weights, just something close enough to replicate the failure. Because of the operator fusing that happens during compilation, it will likely need to be relatively close in structure.
Hi @ntw-au, are you able to email us (aws-neuron-support@amazon.com) so that we can start a private conversation about the model?
Thanks @mrnikwaws and @AWS-mvaria, I've sent an email to your support address