Hi there, I am having an issue with the latest GitHub version (RC_v1_4_12) of the aws-fpga software.
My goal is to create an AMI that includes the Xilinx XRT from GitHub branch 2019.2 and aws-fpga version RC_v1_4_12, so that
I can run instances on AWS F1.
To do that (using an f1.2xlarge instance):
- I base my new AMI on ami-03746875d916becc0 (ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20190628)
- I install all the requirements for the Xilinx XRT and aws-fpga .deb packages
- I install the Xilinx XRT and aws-fpga .deb packages
*** At that point, the tool "xbutil scan" lists the FPGA as ready to be used ***
- I create the AMI using the aws CLI: "aws ec2 create-image"
So far, so good: everything works and I can see my new AMI in the list of 'Owned by me' AMIs.
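For reference, the image-creation step can be sketched as below. This is only an illustration, assuming the AWS CLI is installed and configured; the helper names, the instance ID and the image name are hypothetical placeholders, not values from my setup.

```python
import subprocess


def create_image_cmd(instance_id, name):
    """Build the argument list for `aws ec2 create-image`."""
    return [
        "aws", "ec2", "create-image",
        "--instance-id", instance_id,  # instance the packages were installed on
        "--name", name,                # name of the new AMI
    ]


def create_image(instance_id, name):
    """Run the command; needs the AWS CLI and valid credentials."""
    result = subprocess.run(
        create_image_cmd(instance_id, name),
        check=True, capture_output=True, text=True,
    )
    return result.stdout  # the CLI output, which contains the new image ID
```

Calling `create_image("i-0123456789abcdef0", "xrt-2019.2-aws-fpga")` would then submit the request; both arguments here are made-up examples.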
The problem starts when I use this new AMI to launch an f1.2xlarge instance:
The instance boots with no issues, but the tool "xbutil scan" lists the FPGA *** as not ready ***.
At this point I cannot program/use the FPGA.
After trying various tricks, I found that a workaround is to restart the Xilinx MPD service, which comes
with the aws-fpga package (re-installing the aws-fpga .deb or simply rebooting the f1 instance also works).
After that the FPGA is ready to be used.
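For reference, the restart workaround could be scripted roughly as follows. This is a minimal sketch, assuming that `xbutil` is on the PATH, that the service unit is named `mpd.service` (as it appears in the journal), and that a card that is not ready always produces the "not ready" warning text; the function names are hypothetical.

```python
import subprocess


def fpga_ready(scan_output):
    """Return True if the `xbutil scan` output carries no 'not ready' warning."""
    return "not ready" not in scan_output


def ensure_fpga_ready():
    """Scan once; if the card is not ready, restart MPD and scan again."""
    scan = subprocess.run(["xbutil", "scan"], capture_output=True, text=True)
    if fpga_ready(scan.stdout + scan.stderr):
        return True
    # Assumption: restarting the service is enough, as observed on my instance.
    subprocess.run(["sudo", "systemctl", "restart", "mpd.service"], check=True)
    scan = subprocess.run(["xbutil", "scan"], capture_output=True, text=True)
    return fpga_ready(scan.stdout + scan.stderr)
```

The second scan may need a short delay after the restart; I have not measured how long the daemon takes to come back.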
Is this MPD service behavior a known issue, and is a fix planned for the next versions of aws-fpga?
Or is there something wrong with the steps I follow to create my new AMI?
This is the info that I get when I launch the new AMI:
System INFO:
System Configuration
OS name: Linux
Release: 4.4.0-1098-aws
Version: #109-Ubuntu SMP Fri Nov 8 09:30:18 UTC 2019
Machine: x86_64
Glibc: 2.23
Distribution: Ubuntu 16.04.6 LTS
XRT Information
Version: 2.3.0
Git Hash: ed096afaa193de4c8b0176e9245992cc3a3e7296
Git Branch: 2019.2
Build Date: 2019-12-20 10:20:49
XOCL: 2.3.0,ed096afaa193de4c8b0176e9245992cc3a3e7296
XCLMGMT: unknown
MPD log as reported by journalctl:
a) During the AMI launch:
Jan 08 07:18:20 ip-172-31-32-66 mpd[1247]: started
Jan 08 07:18:20 ip-172-31-32-66 mpd[1247]: found mpd plugin: /opt/xilinx/xrt/lib/libmpd_plugin.so
Jan 08 07:18:20 ip-172-31-32-66 mpd[1247]: aws mpd plugin init called: 0
Jan 08 07:18:20 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] write 56 bytes out of 56 bytes to fd 4
Jan 08 07:18:20 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] msg arrived on mailbox fd 4
Jan 08 07:18:20 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] retrieved msg size from mailbox: 40 bytes
Jan 08 07:18:20 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] read 72 bytes out of 72 bytes from fd 4, valid: 1
Jan 08 07:18:20 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] mpd daemon: request 11 received(reqSize: 24)
--> At this point xbutil scan reports: <----
[0] 0000:00:1d.0 xilinx_aws-vu9p-f1_dynamic_5_0(ts=0xabcd) user(inst=128)
WARNING: card(s) marked by '' are not ready, run xbmgmt flash --scan --verbose to further check the details.
b) After restarting mpd.service - Xilinx Management Proxy Daemon (MPD):
Jan 08 07:18:24 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] msg arrived on mailbox fd 4
Jan 08 07:18:24 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] retrieved msg size from mailbox: 32 bytes
Jan 08 07:18:24 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] read 64 bytes out of 64 bytes from fd 4, valid: 1
Jan 08 07:24:35 ip-172-31-32-66 mpd[1247]: mpd caught signal 15
Jan 08 07:24:35 ip-172-31-32-66 systemd[1]: Stopping Xilinx Management Proxy Daemon (MPD)...
Jan 08 07:24:37 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] write 56 bytes out of 56 bytes to fd 4
Jan 08 07:24:37 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] mpd_getMsg thread 0 exit!!
Jan 08 07:24:43 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] write 0 bytes out of 2104 bytes to fd 4
Jan 08 07:24:43 ip-172-31-32-66 mpd[1247]: [0:0:1d.0] mpd_handleMsg thread 0 exit!!
Jan 08 07:24:43 ip-172-31-32-66 mpd[1247]: aws mpd plugin fini called
Jan 08 07:24:43 ip-172-31-32-66 mpd[1247]: ended
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: started
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: found mpd plugin: /opt/xilinx/xrt/lib/libmpd_plugin.so
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: aws mpd plugin init called: 0
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] write 56 bytes out of 56 bytes to fd 4
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] msg arrived on mailbox fd 4
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] retrieved msg size from mailbox: 40 bytes
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] read 72 bytes out of 72 bytes from fd 4, valid: 1
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] mpd daemon: request 11 received(reqSize: 24)
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] write 2104 bytes out of 2104 bytes to fd 4
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] msg arrived on mailbox fd 4
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] retrieved msg size from mailbox: 48 bytes
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] read 80 bytes out of 80 bytes from fd 4, valid: 1
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] mpd daemon: request 10 received(reqSize: 32)
Jan 08 07:24:43 ip-172-31-32-66 mpd[2526]: [0:0:1d.0] write 524360 bytes out of 524360 bytes to fd 4
--> At this point xbutil scan reports: <----
[0] 0000:00:1d.0 xilinx_aws-vu9p-f1_dynamic_5_0(ts=0xabcd) user(inst=128)
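Comparing the two logs, one visible difference is that before the restart the 2104-byte write is only attempted at shutdown and writes 0 bytes (07:24:43, pid 1247), while after the restart the 2104-byte write completes right after request 11. Whether this is the cause is not confirmed. The sketch below is a small helper to pick such short writes out of a journalctl dump; the function name and the regular expression are illustrative and assume the "write N bytes out of M bytes" format shown in the logs.

```python
import re

# Matches e.g. "write 0 bytes out of 2104 bytes to fd 4"
WRITE_RE = re.compile(r"write (\d+) bytes out of (\d+) bytes")


def short_writes(log_lines):
    """Return the log lines where fewer bytes were written than requested."""
    hits = []
    for line in log_lines:
        match = WRITE_RE.search(line)
        if match and int(match.group(1)) < int(match.group(2)):
            hits.append(line)
    return hits
```

Feeding it the lines saved from journalctl for the mpd unit should flag only the "write 0 bytes out of 2104 bytes" entry in the logs above.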