I’ve been attempting to use kexec to write a network bootloader for an Acer R13 Chromebook. I initially tried on a 5.8.7 kernel, then moved on to 5.9-rc4 to see if that fixed the problem (spoiler: it didn’t).
When the kernel is booted via kexec (instead of directly from the native U-Boot), the console is flooded with IOMMU fault messages (fortunately rate-limited!) like these:
Code:
[ +0.001401] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xffb39000 pa=0xb6d39500 larb=0 port=0 layer=0 read
[ +0.001402] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xffbe4000 pa=0xb6de4400 larb=0 port=0 layer=0 read
[ +0.001377] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xffc90000 pa=0xb6e90400 larb=0 port=0 layer=0 read
[ +0.001357] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xffd3a000 pa=0xb6f3a500 larb=0 port=0 layer=0 read
[ +0.001342] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xffde0000 pa=0xb6fe0e00 larb=0 port=0 layer=0 read
[ +0.001336] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xffe86000 pa=0xb6886b00 larb=0 port=0 layer=0 read
[ +0.001316] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xfff2b001 pa=0xb7d5b700 larb=0 port=0 layer=1 read
[ +0.001323] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xfffcf001 pa=0xb7dff200 larb=0 port=0 layer=1 read
[ +0.001325] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xff861000 pa=0xb7e61380 larb=0 port=0 layer=0 read
[ +4.991802] mtk_iommu_isr: 1209385 callbacks suppressed
That’s a lot of suppressed messages, and a similar block is printed every 5 seconds.
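For anyone who wants to look at a longer capture, a rough Python sketch like the one below could tally which larb/port the faults come from and what IOVA range they cover. It assumes the lines look exactly like the excerpt above; the names FAULT_RE, SUPPRESSED_RE and summarize are made up for this example and are not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches the mtk-iommu fault lines as they appear in the excerpt above.
FAULT_RE = re.compile(
    r"fault type=(?P<type>0x[0-9a-f]+) iova=(?P<iova>0x[0-9a-f]+) "
    r"pa=(?P<pa>0x[0-9a-f]+) larb=(?P<larb>\d+) port=(?P<port>\d+) "
    r"layer=(?P<layer>\d+) (?P<dir>read|write)"
)
# Matches the rate limiter's "N callbacks suppressed" line.
SUPPRESSED_RE = re.compile(r"(?P<count>\d+) callbacks suppressed")


def summarize(lines):
    """Count faults per (larb, port, direction) and track the IOVA range."""
    sources = Counter()
    iovas = []
    suppressed = 0
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            sources[(int(m["larb"]), int(m["port"]), m["dir"])] += 1
            iovas.append(int(m["iova"], 16))
            continue
        m = SUPPRESSED_RE.search(line)
        if m:
            suppressed += int(m["count"])
    return {
        "sources": dict(sources),
        "iova_min": min(iovas) if iovas else None,
        "iova_max": max(iovas) if iovas else None,
        "suppressed": suppressed,
    }


# Three lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "[ +0.001401] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xffb39000 pa=0xb6d39500 larb=0 port=0 layer=0 read",
    "[ +0.001316] mtk-iommu 10205000.iommu: fault type=0x4 iova=0xfff2b001 pa=0xb7d5b700 larb=0 port=0 layer=1 read",
    "[ +4.991802] mtk_iommu_isr: 1209385 callbacks suppressed",
]

print(summarize(SAMPLE))
```

Feeding it the full dmesg output instead of SAMPLE would show whether every fault really comes from larb 0, port 0, as the excerpt suggests.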
The system appears to be perfectly functional otherwise, but the kernel is using ~70% of a CPU core at idle as opposed to ~10% after a non-kexec boot (as reported by htop).
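As a back-of-the-envelope check on whether the fault interrupts alone could account for that CPU usage, the suppressed count from the log above can be turned into a rate. This is only a sketch: it assumes the "callbacks suppressed" figure covers the ~5 second window shown on that line, and that all of the extra CPU time goes to servicing these faults.

```python
# Numbers taken from the "callbacks suppressed" line in the log above.
suppressed = 1209385   # messages dropped by the rate limiter
window_s = 4.991802    # time delta printed on that line, in seconds

faults_per_second = suppressed / window_s

# Assumption: the extra ~60% of one core (70% vs 10% in htop) is spent
# entirely on handling these faults.
extra_cpu_fraction = 0.70 - 0.10
microseconds_per_fault = extra_cpu_fraction / faults_per_second * 1e6

print(f"{faults_per_second:,.0f} faults/s, ~{microseconds_per_fault:.1f} us each")
```

That works out to roughly 240,000 faults per second at a couple of microseconds each, which seems plausible for an interrupt handler, so the high idle CPU usage is most likely just a symptom of the fault storm.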
So far I’ve identified mtk_iommu_isr as the source of the message, and a call to domain->handler(...) in report_iommu_fault as responsible for the status code triggering the message. I’m still struggling to figure out where that handler is registered though.
My current thinking is that the actual issue stems from mtk_iommu_hw_init, where the IOMMU hardware on the SoC is initialized. I suspect the hardware isn’t expecting to be initialized again, but without any documentation publicly available from Mediatek this is difficult to prove.
Does anyone have any suggestions as to how I could go about getting to the bottom of this?
PS: I hope this is the right place to post this. If not, please let me know!