These are some development notes I took a couple of weeks back while looking at this with limited ARM assembly knowledge. Most of the work is CreepNT's; someone just needs to complete it.
First, arm_acle.h provides MTE intrinsics, so the amount of assembly we have to write is effectively nil.
Let's talk about improvements!
get_random_tagged_pointer(ptr) can be replaced by __arm_mte_create_random_tag(ptr, 1 << MEM_TAG_FREE). The IRG instruction takes a bitmask in which the position of each set bit names a tag to exclude, as shown in choose_nonexcluded_tag(...). There, the tag parameter is the previously generated tag (RGSR_EL1.TAG) and offset is the result of random tag generation.
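To make the mask semantics concrete, here is a rough software model of that selection step, written from my reading of the architecture pseudocode. It is an illustration only, not what the allocator would ship: real code would just call __arm_mte_create_random_tag() and let the hardware do this. The parameter name exclude and the 4-bit wraparound are my assumptions about how the pseudocode maps to C.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of how IRG picks a tag given an exclusion mask.
 * tag:     starting tag (the previously generated one, RGSR_EL1.TAG)
 * offset:  result of random tag generation, 0..15
 * exclude: bit N set means tag N must never be returned
 * Sketch only; on hardware this is done by the IRG instruction. */
uint8_t choose_nonexcluded_tag(uint8_t tag, uint8_t offset, uint16_t exclude)
{
    assert(tag < 16 && offset < 16);

    if (exclude == 0xFFFF) {
        return 0; /* every tag is excluded, so the result is tag 0 */
    }
    if (offset == 0) {
        /* No random step: move forward to the first allowed tag. */
        while (exclude & (1u << tag)) {
            tag = (tag + 1) & 0xF;
        }
    }
    while (offset != 0) {
        /* Take one step, then skip over any excluded tags. */
        offset--;
        tag = (tag + 1) & 0xF;
        while (exclude & (1u << tag)) {
            tag = (tag + 1) & 0xF;
        }
    }
    return tag;
}
```

With exclude set to 1 << 0, for example, a start tag of 15 and an offset of 1 wraps to tag 0, finds it excluded, and lands on tag 1 instead.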
This also means the do/while loop in get_tagged_pointer can be replaced by a single call to __arm_mte_create_random_tag with a mask of (u64)((1 << MEM_TAG_FREE) | (1 << adj_tag_1) | (1 << adj_tag_2)). For previously tagged allocations, where we want to increment the tag until it differs from those of the adjacent allocations, a while loop seems fine as we're not actually going into the random tag path.
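A minimal sketch of both paths, in portable C so it can be checked without MTE hardware. The helper names tag_exclusion_mask and next_allowed_tag are made up for this example, and the value given to MEM_TAG_FREE here is a placeholder, not necessarily what the allocator uses. On an MTE build the mask from the first helper would be passed straight to __arm_mte_create_random_tag(ptr, mask).

```c
#include <assert.h>
#include <stdint.h>

#ifndef MEM_TAG_FREE
#define MEM_TAG_FREE 0u /* placeholder value for this sketch only */
#endif

/* Build the IRG exclusion mask: bit N set means tag N must not be chosen.
 * Excludes the free tag and the tags of both neighbouring allocations. */
uint64_t tag_exclusion_mask(uint8_t adj_tag_1, uint8_t adj_tag_2)
{
    assert(adj_tag_1 < 16 && adj_tag_2 < 16);
    return (UINT64_C(1) << MEM_TAG_FREE) |
           (UINT64_C(1) << adj_tag_1) |
           (UINT64_C(1) << adj_tag_2);
}

/* Re-tag path for a previously tagged slot: step the old tag forward
 * (wrapping at 16) until it is not excluded. No randomness involved. */
uint8_t next_allowed_tag(uint8_t prev_tag, uint64_t exclude)
{
    assert(prev_tag < 16);
    assert((exclude & 0xFFFF) != 0xFFFF); /* otherwise the loop never ends */

    uint8_t tag = (prev_tag + 1) & 0xF;
    while (exclude & (UINT64_C(1) << tag)) {
        tag = (tag + 1) & 0xF;
    }
    return tag;
}
```

Since at most three of the sixteen tags are ever excluded, the loop in next_allowed_tag runs at most three extra iterations, which is why a plain while loop looks acceptable there.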
When storing tags over a range, instead of looping STG / STZG, the HWASAN folks have found it faster to do DC GVA / DC GZVA. This is a potential optimization we can explore in the future.
The approach taken by CreepNT when storing the tags in the slab metadata looks pretty nice too. Overall, MTE support seems pretty complete, just missing a few pieces here and there.