@pgpx It looks like NEURON_RT_VISIBLE_CORES contains the IDs of all the cores (out of order) instead of a range. The docs say the value should be an integer or a range, but maybe a list of integers is also OK?
Let me know what version of the runtime you are using and I can open a ticket to confirm the behavior.
In the meantime, if you can't update your runtime, you can either export NEURON_RT_VISIBLE_CORES yourself to the value you think it should be, or take the list, sort it, and turn it into a range. I'm sure the Hugging Face team will also welcome a pull request.
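If you go the "sort it and turn it into a range" route, something like the following would do it. This is a minimal sketch, not part of the Neuron SDK or optimum-neuron; `cores_to_range` is a hypothetical helper name. It assumes the variable only needs rewriting when it contains commas, and that it must be rewritten before the Neuron runtime initializes in your process. If the cores turn out not to be contiguous, no single range can describe them, so it returns the sorted list instead.

```python
import os


def cores_to_range(value: str) -> str:
    """Normalize a comma-separated NeuronCore list into a single range.

    Hypothetical helper: "4,2,3" becomes "2-4" and "5" stays "5". If the
    cores are not contiguous, the sorted comma-separated list is returned.
    """
    # Parse, de-duplicate, and sort the core IDs; int() tolerates spaces.
    cores = sorted({int(part) for part in value.split(",") if part.strip()})
    if not cores:
        raise ValueError("no core IDs found in value")
    first, last = cores[0], cores[-1]
    if len(cores) == last - first + 1:
        # Contiguous block: every ID from first to last is present.
        return str(first) if first == last else f"{first}-{last}"
    return ",".join(str(core) for core in cores)


if __name__ == "__main__":
    # Assumption: this runs before the Neuron runtime reads the variable,
    # e.g. at the very top of the training script.
    raw = os.environ.get("NEURON_RT_VISIBLE_CORES", "")
    if "," in raw:
        os.environ["NEURON_RT_VISIBLE_CORES"] = cores_to_range(raw)
    print(os.environ.get("NEURON_RT_VISIBLE_CORES"))
```

With the value from the device plugin in this thread, `cores_to_range` returns `"0-23"`.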
Jim
The value you're seeing from the neuron-device-plugin is actually correct and reflects a recent change in how Neuron cores are indexed in EKS container environments.
Starting with Neuron release 2.16.18.0 (released on September 1, 2023), AWS introduced 0-based indexing for Neuron Devices and NeuronCores in EKS container environments. This change was made to enable easier programmability. Previously, the Neuron Device indexing was assigned randomly.
The comma-separated list you're seeing (4,17,14,2,21,3,15,0,12,16,11,9,13,22,19,8,20,5,6,7,23,10,18,1) enumerates the specific NeuronCores allocated to your container. Sorted, it is exactly 0 through 23, so it names the same 24 cores as the range 0-23, only in a different order. A list can also describe a non-contiguous allocation, which a single range cannot.
However, you're correct that this change may cause issues with libraries like HuggingFace's optimum-neuron that expect the older range format (0-23). It would be appropriate to open an issue or pull request against the HuggingFace optimum-neuron library so that it handles both the new comma-separated format and the older range format.
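For such a patch, a parser that accepts every format discussed here could look like the sketch below. It is illustrative only: `parse_visible_cores` is a hypothetical name, not an existing optimum-neuron or Neuron SDK function, and allowing ranges inside a comma-separated list ("0-3,8") is an extra assumption beyond what this thread confirms.

```python
from typing import List


def parse_visible_cores(value: str) -> List[int]:
    """Parse a NEURON_RT_VISIBLE_CORES value into a sorted list of core IDs.

    Accepts a single ID ("3"), an inclusive range ("0-23"), or a
    comma-separated list ("4,17,2"). As an extra assumption, list items may
    themselves be ranges ("0-3,8").
    """
    cores = set()
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray or trailing commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid core range: {part!r}")
            cores.update(range(start, end + 1))
        else:
            cores.add(int(part))
    if not cores:
        raise ValueError("NEURON_RT_VISIBLE_CORES is empty")
    return sorted(cores)
```

Code that only needs the number of visible cores can take `len()` of the result, which is 24 for both "0-23" and the shuffled list above.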
As for the documentation, it may not have been fully updated to reflect this change. Submitting feedback to AWS asking it to document the new NEURON_RT_VISIBLE_CORES format in EKS environments would help.
In summary, the value you're seeing is correct according to the latest Neuron updates, but it may require updates to both documentation and dependent libraries to fully support this new format.
Sources
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/_sources/release-notes/containers/neuron-k8.rst
‘neurondevice’ resource name in Neuron Device K8s plugin no longer supported — AWS Neuron Documentation
Hi, thanks - makes sense. I'll submit a patch to HuggingFace.