Hello javinewtral,
This error message usually indicates that the Neuron runtime daemon is not running.
First, can you run the following to verify that the 'aws-neuron-dkms' package is installed (installed packages carry an [installed] tag in the output):
**sudo apt list | grep neuron**
Assuming it is installed, can you run the following to make sure that the daemon service is started:
**sudo systemctl status neuron-rtd**
If the service is not running, you can restart it by running:
**sudo systemctl restart neuron-rtd**
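If it helps, the three checks above can be combined into one small script. This is only a sketch, not an official tool: the helper names and the `--run` flag are made up for illustration, and it assumes `apt list` tags installed packages with `[installed...]`.

```shell
# Sketch only: helper names are hypothetical; package and service names are from this thread.

# Reads `apt list` output on stdin; succeeds if aws-neuron-dkms is tagged as installed.
dkms_pkg_installed() {
    grep '^aws-neuron-dkms/' | grep -q '\[installed'
}

# Runs the checks in order and restarts the daemon only if it is not active.
check_neuron_rtd() {
    if ! apt list 2>/dev/null | dkms_pkg_installed; then
        echo "aws-neuron-dkms is not installed"
        return 1
    fi
    if systemctl is-active --quiet neuron-rtd; then
        echo "neuron-rtd is running"
    else
        echo "neuron-rtd is not running, restarting it"
        sudo systemctl restart neuron-rtd
    fi
}

# Nothing runs unless the script is called with --run (a flag invented for this sketch).
if [ "${1:-}" = "--run" ]; then
    check_neuron_rtd
fi
```

Saved as a file, it could be run with `bash check.sh --run` (the file name is arbitrary).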
Let us know if that works. If the problem persists, then we may need to dive deeper into the logs to understand the failure.
Regards,
Taylor
Edited by: Taylor@AWS on Apr 26, 2021 8:36 AM
Looks like there is a related issue that results in the same error message.
The OS auto upgrade is dropping the Inferentia kernel modules.
Command-line tools, for example neuron-top, return with "context deadline exceeded".
neuron-ls reports the wrong number of "NEURON CORES" but appears to run!
The auto upgrade from 5.4.0-1049-aws to 5.4.0-1051-aws triggers the problem for me.
I'm using the Deep Learning AMI (Ubuntu 18.04).
caffeinate wrote:
> Looks like there is a related issue that results in the same error message.
> The OS auto upgrade is dropping the Inferentia kernel modules.
aws-neuron-dkms is a special kernel module package that depends on the Linux kernel version. When the package is installed through apt, it is only compatible with the Linux kernel version that was running on the instance at installation time.
You have to re-install this package if you change the Linux kernel version. You can check the current kernel version with uname -r, and list the kernels that have the dkms module installed with dkms status | grep aws-neuron. Refer to https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-runtime/nrt-troubleshoot.html#neuron-driver-installation-fails for steps on how to re-install aws-neuron-dkms on a new kernel.
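As a rough sketch, the comparison between uname -r and dkms status can be scripted as below. It assumes dkms status prints one line per module with the kernel version between commas (roughly "module, version, kernel, arch: status"); the function names and the `--run` flag are invented for illustration.

```shell
# Sketch only: function names are hypothetical; the output format of
# `dkms status` is an assumption and may differ between dkms versions.

# Reads `dkms status` output on stdin; succeeds if an aws-neuron module
# is reported as installed for the kernel version passed as $1.
neuron_module_built_for() {
    kernel="$1"
    grep 'aws-neuron' | grep -F -- ", ${kernel}," | grep -q 'installed'
}

# Compares the running kernel against what dkms reports.
check_neuron_dkms() {
    running="$(uname -r)"
    if dkms status | neuron_module_built_for "$running"; then
        echo "aws-neuron module is installed for kernel $running"
    else
        echo "no aws-neuron module for kernel $running, re-install aws-neuron-dkms"
        return 1
    fi
}

# Nothing runs unless the script is called with --run (a flag invented for this sketch).
if [ "${1:-}" = "--run" ]; then
    check_neuron_dkms
fi
```

Matching on the surrounding commas keeps a version such as 5.4.0-1049-aws from also matching a longer version string that merely starts with the same characters.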
> Command-line tools, for example neuron-top, return with "context deadline exceeded".
> neuron-ls reports the wrong number of "NEURON CORES" but appears to run!
That's rather interesting. Do you mind sharing the output neuron-ls is giving you, along with the output of apt list | grep aws-neuron? If we can reproduce this neuron-ls problem internally, we will likely fix it before the next release goes out. Thanks!
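If it is easier, something like the following could gather that output in one go. It is just a sketch: the helper names and the `--run` flag are hypothetical, and it simply prints each command's output under a header, noting any command that is not available on the instance.

```shell
# Sketch only: helper names are hypothetical.

# Prints a header, then runs the given command (or notes that it is missing).
section() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(command not found: $1)"
    fi
    echo
}

# Collects the kernel version, neuron-ls output, dkms state and Neuron packages.
neuron_report() {
    section uname -r
    section neuron-ls
    section dkms status
    echo "== apt list | grep aws-neuron =="
    apt list 2>/dev/null | grep aws-neuron || echo "(no aws-neuron packages listed)"
}

# Nothing runs unless the script is called with --run (a flag invented for this sketch).
if [ "${1:-}" = "--run" ]; then
    neuron_report
fi
```

Redirecting the output to a file, for example `bash report.sh --run > neuron-report.txt`, makes it easy to paste into a reply (both file names are arbitrary).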