We've been making more progress on hardware memory tagging support for Pixel 8 and Pixel 8 Pro. Our initial hardened_malloc integration has no noticeable overhead in fastest asynchronous mode and the asymmetric mode is lower overhead than legacy mitigations like stack canaries.
Asynchronous is very fast but can be bypassed via races. Synchronous is very high overhead and aimed at debugging. It's still much faster than HWAsan (based on Top Byte Ignore) and especially ASan. Asymmetric is nearly as fast as asynchronous and as secure as synchronous.
There isn't any clear way to bypass asynchronous wrote checks for the asymmetric mode since they're checked immediately on reads and system calls. io_uring might be able to bypass it, but it's not relevant since it's only allowed for 2 core SELinux domains (fastbootd, snapuserd).
We plan to enable memory tagging by default in asymmetric mode for the whole base OS including system apps. For user installed apps, our plan is to enable it by default for app processes without their own native code. It will be opt-in for apps with native code for compatibility.
Memory tagging is going to be a huge game changer and GrapheneOS will be on the leading edge deploying it. Stock Pixel OS has it as a developer option which isn't usable in practice since it breaks far too much. The implementation is also much less powerful than hardened_malloc.
Initial hardened_malloc implementation sets random tags for each 128k byte and below allocation. It excludes tags of adjacent allocations and reserves a tag for free allocations. It fully prevents sequential overflows as a vulnerability class and freed memory can't be accessed.
Use-after-free is detected until another allocation is made in the same slot with the same random tag chosen for it. hardened_malloc already defends this by quarantining freed allocations by default. They go through a First-In-First-Out ring buffer and a swap with a random array.
Arbitrary read/write via buffer overflows are caught by the random tags. They're unfortunately currently only 4 bit, but a future architecture revision could raise them to 8 bit. CFI, PAC, etc. only try to defend specific targets and don't work well against arbitrary read/write.
Nearly all remote code execution vulnerabilities in the OS are memory corruption bugs: either use-after-free or buffer overflows. The majority involves the malloc heap and the rest mostly involves the stack which could also use MTE-based defenses to replace SSP + ShadowCallStack.
Most apps are a similar story as the base OS. Chromium has pervasive type confusion bugs which MTE doesn't explicitly protect against, but CFI and PartitionAlloc already do. Vanadium already has CFI enabled unlike Android Chrome, but there are more CFI features we need to enable.
After the initial hardened_malloc memory tagging implementation is shipped and enabled by default for the OS and many user installed apps, we can consider using more selection of tags (see https://github.com/GrapheneOS/hardened_malloc/blob/main/README.md#memory-tagging). We can also consider using MTE beyond inside hardened_malloc.