Kernel NULL pointer dereference while using IO_URING with the latest Ubuntu AMI


I have been using IO_URING for my storage workloads, and it worked fine for the last few months, up to and including kernel version 5.19.0-1024-aws. However, since the kernel that ships with the default Ubuntu AMI was upgraded to 5.19.0-1025-aws, I see a kernel NULL pointer dereference in the block I/O and IO_URING subsystems. Any idea why this might be happening, or how I can resolve it? I am using only the IO_URING helper APIs (such as io_uring_get_sqe, io_uring_sqe_set_data, and io_uring_wait_cqe), not the low-level APIs.

Also, could you tell me where I can find the delta between the two versions (assuming the kernel code is GPL), or the changelogs?

Execution environment: Platform: m5n.2xlarge; OS: Ubuntu x86 AMI

Thanks, Binoy

Partial error log (the complete log is not shown):

[Tue May 23 15:43:31 2023] BUG: kernel NULL pointer dereference, address: 000000000000001d
[Tue May 23 15:43:31 2023] #PF: supervisor read access in kernel mode
[Tue May 23 15:43:31 2023] #PF: error_code(0x0000) - not-present page
[Tue May 23 15:43:31 2023] PGD 0 P4D 0
[Tue May 23 15:43:31 2023] Oops: 0000 [#1] SMP PTI
[Tue May 23 15:43:31 2023] CPU: 3 PID: 5432 Comm: WRITER-0 Not tainted 5.19.0-1025-aws #26~22.04.1-Ubuntu
[Tue May 23 15:43:31 2023] Hardware name: Amazon EC2 m5n.2xlarge/, BIOS 1.0 10/16/2017
[Tue May 23 15:43:31 2023] RIP: 0010:__blk_queue_split+0x53/0x1d0
[Tue May 23 15:43:31 2023] Code: 00 00 83 f8 09 0f 84 d3 00 00 00 83 f8 03 0f 84 fd 00 00 00 48 89 d1 4c 89 c6 4c 89 ca e8 b5 f2 ff ff 48 89 c3 48 85 db 74 5f <44> 8b 63 28 81 4b 10 00 40 00 00 49 be 00 00 00 00 00 00 00 80 4c
[Tue May 23 15:43:31 2023] RSP: 0018:ffff9c0b83473918 EFLAGS: 00010286
[Tue May 23 15:43:31 2023] RAX: fffffffffffffff5 RBX: fffffffffffffff5 RCX: 0000000000000000
[Tue May 23 15:43:31 2023] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[Tue May 23 15:43:31 2023] RBP: ffff9c0b83473938 R08: 0000000000000000 R09: 0000000000000000
[Tue May 23 15:43:31 2023] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ca6435731c0
[Tue May 23 15:43:31 2023] R13: ffff9c0b83473948 R14: ffff8ca6439b4080 R15: 0000000008000000
[Tue May 23 15:43:31 2023] FS:  00007f4987663700(0000) GS:ffff8cad42cc0000(0000) knlGS:0000000000000000
[Tue May 23 15:43:31 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue May 23 15:43:31 2023] CR2: 000000000000001d CR3: 000000020c8e6006 CR4: 00000000007706e0
[Tue May 23 15:43:31 2023] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Tue May 23 15:43:31 2023] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Tue May 23 15:43:31 2023] PKRU: 55555554
[Tue May 23 15:43:31 2023] Call Trace:
[Tue May 23 15:43:31 2023]  <TASK>
[Tue May 23 15:43:31 2023]  blk_mq_submit_bio+0x8c/0x450
[Tue May 23 15:43:31 2023]  __submit_bio+0xf6/0x190
[Tue May 23 15:43:31 2023]  submit_bio_noacct_nocheck+0xc2/0x120
[Tue May 23 15:43:31 2023]  submit_bio_noacct+0x209/0x560
[Tue May 23 15:43:31 2023]  submit_bio+0x40/0xf0
[Tue May 23 15:43:31 2023]  ? bio_iov_iter_get_pages+0x22/0x80
[Tue May 23 15:43:31 2023]  __blkdev_direct_IO+0x1bf/0x360
[Tue May 23 15:43:31 2023]  ? __io_complete_rw_common+0x1a0/0x1a0
[Tue May 23 15:43:31 2023]  blkdev_direct_IO+0x98/0xa0
[Tue May 23 15:43:31 2023]  generic_file_direct_write+0xa2/0x1f0
[Tue May 23 15:43:31 2023]  ? generic_update_time+0x6c/0xf0
[Tue May 23 15:43:31 2023]  __generic_file_write_iter+0xaf/0x1c0
[Tue May 23 15:43:31 2023]  blkdev_write_iter+0x114/0x1a0
[Tue May 23 15:43:31 2023]  io_write+0x13f/0x340
[Tue May 23 15:43:31 2023]  ? refill_stock+0x2a/0x50
[Tue May 23 15:43:31 2023]  ? memcg_slab_post_alloc_hook+0x18c/0x270
[Tue May 23 15:43:31 2023]  io_issue_sqe+0x61/0x400
[Tue May 23 15:43:31 2023]  ? io_init_req+0xfa/0x2f0
[Tue May 23 15:43:31 2023]  io_submit_sqe+0x51/0x240
[Tue May 23 15:43:31 2023]  io_submit_sqes+0xfc/0x290
[Tue May 23 15:43:31 2023]  __do_sys_io_uring_enter+0x304/0x660
[Tue May 23 15:43:31 2023]  __x64_sys_io_uring_enter+0x22/0x40
[Tue May 23 15:43:31 2023]  do_syscall_64+0x5c/0x90
[Tue May 23 15:43:31 2023]  ? exit_to_user_mode_prepare+0xaf/0xd0
[Tue May 23 15:43:31 2023]  ? irqentry_exit_to_user_mode+0x9/0x20
[Tue May 23 15:43:31 2023]  ? irqentry_exit+0x21/0x40
[Tue May 23 15:43:31 2023]  ? sysvec_reschedule_ipi+0x78/0xf0
[Tue May 23 15:43:31 2023]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[Tue May 23 15:43:31 2023] RIP: 0033:0x7f4a1919573d
[Tue May 23 15:43:31 2023] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 23 37 0d 00 f7 d8 64 89 01 48
[Tue May 23 15:43:31 2023] RSP: 002b:00007f498765f058 EFLAGS: 00000212 ORIG_RAX: 00000000000001aa
[Tue May 23 15:43:31 2023] RAX: ffffffffffffffda RBX: 00007f4a1a630000 RCX: 00007f4a1919573d
[Tue May 23 15:43:31 2023] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000008
[Tue May 23 15:43:31 2023] RBP: 0000562493b1f4e0 R08: 0000000000000000 R09: 0000000000000008
[Tue May 23 15:43:31 2023] R10: 0000000000000000 R11: 0000000000000212 R12: 0000562493b16cc0
[Tue May 23 15:43:31 2023] R13: 00000000001ff000 R14: 00007f498765f0d0 R15: 00000000001ff000
[Tue May 23 15:43:31 2023]  </TASK>
[Tue May 23 15:43:31 2023] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc overlay binfmt_misc nls_iso8859_1 ena ppdev parport_pc psmouse parport input_leds serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore ip_tables x_tables autofs4
[Tue May 23 15:43:31 2023] CR2: 000000000000001d
[Tue May 23 15:43:31 2023] ---[ end trace 0000000000000000 ]---
[Tue May 23 15:43:31 2023] RIP: 0010:__blk_queue_split+0x53/0x1d0
[Tue May 23 15:43:31 2023] Code: 00 00 83 f8 09 0f 84 d3 00 00 00 83 f8 03 0f 84 fd 00 00 00 48 89 d1 4c 89 c6 4c 89 ca e8 b5 f2 ff ff 48 89 c3 48 85 db 74 5f <44> 8b 63 28 81 4b 10 00 40 00 00 49 be 00 00 00 00 00 00 00 80 4c
[Tue May 23 15:43:31 2023] RSP: 0018:ffff9c0b83473918 EFLAGS: 00010286
[Tue May 23 15:43:31 2023] RAX: fffffffffffffff5 RBX: fffffffffffffff5 RCX: 0000000000000000
[Tue May 23 15:43:31 2023] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[Tue May 23 15:43:31 2023] RBP: ffff9c0b83473938 R08: 0000000000000000 R09: 0000000000000000
[Tue May 23 15:43:31 2023] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ca6435731c0
[Tue May 23 15:43:31 2023] R13: ffff9c0b83473948 R14: ffff8ca6439b4080 R15: 0000000008000000
[Tue May 23 15:43:31 2023] FS:  00007f4987663700(0000) GS:ffff8cad42cc0000(0000) knlGS:0000000000000000
[Tue May 23 15:43:31 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue May 23 15:43:31 2023] CR2: 000000000000001d CR3: 000000020c8e6006 CR4: 00000000007706e0
[Tue May 23 15:43:31 2023] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Tue May 23 15:43:31 2023] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Tue May 23 15:43:31 2023] PKRU: 55555554
[Tue May 23 15:43:31 2023] ------------[ cut here ]------------
[Tue May 23 15:43:31 2023] WARNING: CPU: 3 PID: 5432 at kernel/exit.c:788 do_exit+0x62c/0x680
[Tue May 23 15:43:31 2023] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc overlay binfmt_misc nls_iso8859_1 ena ppdev parport_pc psmouse parport input_leds serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore ip_tables x_tables autofs4
[Tue May 23 15:43:31 2023] CPU: 3 PID: 5432 Comm: WRITER-0 Tainted: G      D           5.19.0-1025-aws #26~22.04.1-Ubuntu
[Tue May 23 15:43:31 2023] Hardware name: Amazon EC2 m5n.2xlarge/, BIOS 1.0 10/16/2017
[Tue May 23 15:43:31 2023] RIP: 0010:do_exit+0x62c/0x680
[Tue May 23 15:43:31 2023] Code: 00 00 e9 08 fd ff ff 4c 89 ee bf 05 06 00 00 e8 3a 0d 01 00 e9 de fa ff ff 48 8b b8 10 0e 00 00 e8 49 e1 2c 00 e9 83 fd ff ff <0f> 0b e9 08 fa ff ff 48 89 df e8 55 11 11 00 e9 f1 fb ff ff 48 8b
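
One way to read the register dump above, offered as a hedged sketch rather than a confirmed root cause: the bytes at the `<44>` marker decode to a 32-bit load from `0x28(%rbx)`, and RBX holds `0xfffffffffffffff5`, which is -11 when read as a signed value (11 is EAGAIN on Linux). Adding 0x28 to that value wraps around to 0x1d, the address reported in CR2. This would be consistent with an error-encoded pointer being dereferenced after only a NULL check, but that interpretation is an assumption. The helper names below (`to_signed64`, `decode_mov_disp8`) are made up for illustration, and the decoder handles only this one instruction form.

```python
MASK64 = (1 << 64) - 1
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Values copied from the kernel-mode register dump in the log above.
RBX = 0xfffffffffffffff5                  # base register of the faulting load
CR2 = 0x000000000000001d                  # faulting address reported by the oops
FAULT_BYTES = bytes.fromhex("448b6328")   # bytes starting at the "<44>" marker

# Value copied from the user-mode register dump.
ORIG_RAX = 0x1aa                          # syscall number at kernel entry

LINUX_EAGAIN = 11                         # errno value on Linux
SYS_IO_URING_ENTER = 426                  # x86-64 syscall number of io_uring_enter


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def decode_mov_disp8(code):
    """Decode 'REX 8b modrm disp8', i.e. mov r32, [base + disp8].

    Returns (dest_index, base_index, displacement). Illustrative only:
    it rejects every other instruction form.
    """
    rex, opcode, modrm, disp = code
    assert rex & 0xF0 == 0x40 and opcode == 0x8B
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 1 and rm != 4           # disp8 addressing, no SIB byte
    reg |= (rex & 0x4) << 1               # REX.R extends the destination register
    rm |= (rex & 0x1) << 3                # REX.B extends the base register
    disp = disp - 256 if disp >= 128 else disp   # disp8 is sign-extended
    return reg, rm, disp


dest, base, disp = decode_mov_disp8(FAULT_BYTES)
fault_addr = (RBX + disp) & MASK64

print(f"faulting instruction: mov {disp:#x}(%{REGS[base]}),%{REGS[dest]}d")
print(f"RBX as signed value:  {to_signed64(RBX)}")
print(f"computed address:     {fault_addr:#x} (CR2 is {CR2:#x})")
print(f"syscall number:       {ORIG_RAX} (io_uring_enter is {SYS_IO_URING_ENTER})")
```

The last line only confirms what the call trace already shows: the faulting thread entered the kernel through io_uring_enter. Per the user-mode registers, it was called with fd 8 (RDI) and one submission (RSI).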
bjayan
asked a year ago, 352 views
2 Answers

Hello Binoy,

Thank you for your post. Are you aware of the conditions under which this error is observed? Are requests made with the IO_URING helper APIs after the target has been unregistered?

I reviewed the changelog for the kernel version linux-signed-aws_5.19.0-1025.26 and I found one change that may be related:

https://launchpad.net/ubuntu/+source/linux-aws

You can find background details on this change in the Linux Kernel mailing list:

https://lkml.org/lkml/2022/11/4/1192

It may be necessary to inspect the specific actions taken by the liburing APIs to determine if the issue is indeed associated with the patch identified above.

Note that support for the default Ubuntu AMI is available directly from Canonical. I would encourage you to reach out to Canonical via their website (http://www.ubuntu.com/cloud/services) to report this issue, along with additional details such as replication steps.

Please let me know if you have any questions.

AWS
SUPPORT ENGINEER
answered a year ago

Thank you for the response, and sorry for the delay. I also reached out to AWS support; they recommended that I contact premium support.

What do you mean by unregistering the target? Do you mean io_uring_register? I have never used that API directly. However, I have used io_uring_register_files(), and I keep the files registered until the application ends; I never unregister anything.

bjayan
answered a year ago
