To someone like me, who's not very well-versed in cybersecurity and memory exploits, features like memory tagging, guard regions, zeroing out and allocation randomization sound like quite a lot, as if they would effectively make software written in C/C++ memory safe. An attack might still crash the app, but an exploit that actually does something through a buffer overflow seems unlikely to succeed. However, I don't know all the tricks out there. How much security do these features actually add? Is it possible to overcome them?

As per the features page, Hardened Malloc and the other memory management hardening are integral, significant components of GrapheneOS's approach to defending against unknown vulnerabilities. Hardened Malloc is one of the more important hardening efforts the project has undertaken. In addition, the GrapheneOS MTE implementation is one of the most important additions to the OS since the project started ten years ago. GrapheneOS and Vanadium are the first platform and web browser to incorporate MTE in production. It is such a massive improvement that future devices will need to support MTE before they can be considered for GrapheneOS support.

Hardening memory management provides a very large security benefit, as most exploits against upstream Android (including the most severe) are memory corruption exploits such as overflows or use-after-free. Hardened Malloc is designed to mitigate these attacks in particular, plus other forms of heap/memory corruption exploitation. By hardening the most exploited components or replacing them with more secure alternatives, the OS is less likely to be affected by exploits (known or unknown) that target the Android platform. This is the second objective of the OS's exploit mitigations (the first being attack surface reduction): preventing an attacker from exploiting vulnerabilities by making exploits unreliable, unlikely to succeed, costly and difficult to develop.

As you've already read, the OS zeroes (sanitizes) kernel and slab memory when it is freed. This helps against use-after-free, since the stale data is no longer there to be used once the memory has been sanitized. Reuse of address space and memory allocations is also delayed through a combination of deterministic and randomized quarantines to mitigate use-after-free. Sanitization also protects against use of uninitialized data and keeps sensitive data in memory for less time (only as long as it is needed).
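To make the zero-on-free and delayed reuse ideas concrete, here is a minimal toy model in Python. It is only an illustration, not how hardened_malloc or the kernel implements them: the `ToyHeap` class, its slot sizes and the quarantine length are all made up for this sketch, and it models only a deterministic FIFO quarantine (the randomized quarantine is left out).

```python
from collections import deque


class ToyHeap:
    """Hypothetical fixed-size-slot heap with zero-on-free and a FIFO quarantine."""

    def __init__(self, slot_count=8, slot_size=16, quarantine_len=4):
        self.slot_size = slot_size
        self.memory = bytearray(slot_count * slot_size)
        self.free_slots = deque(range(slot_count))
        self.quarantine = deque()          # freed slots waiting before reuse
        self.quarantine_len = quarantine_len

    def malloc(self):
        # Hand out the oldest slot that is not in quarantine.
        return self.free_slots.popleft()

    def write(self, slot, data):
        assert len(data) <= self.slot_size
        start = slot * self.slot_size
        self.memory[start:start + len(data)] = data

    def read(self, slot):
        start = slot * self.slot_size
        return bytes(self.memory[start:start + self.slot_size])

    def free(self, slot):
        # Zero-on-free: the old contents are wiped immediately.
        start = slot * self.slot_size
        self.memory[start:start + self.slot_size] = bytes(self.slot_size)
        # Delayed reuse: the slot only becomes allocatable again after
        # quarantine_len later frees have pushed it out of the quarantine.
        self.quarantine.append(slot)
        if len(self.quarantine) > self.quarantine_len:
            self.free_slots.append(self.quarantine.popleft())


if __name__ == "__main__":
    heap = ToyHeap()
    slot = heap.malloc()
    heap.write(slot, b"secret")
    heap.free(slot)
    print(heap.read(slot))  # a dangling read now sees only zero bytes
```

In this model a dangling reference to a freed slot reads zeroes instead of the old data, and the slot is not handed out again right away, so an attacker cannot immediately place their own object where the freed one used to be.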

Other types of corruption are also covered. I would recommend checking the documentation here: https://grapheneos.org/features#exploit-protection
and the README for hardened_malloc itself for detailed information on its security properties. It is quite in-depth, and a forum reply wouldn't do it justice: https://github.com/GrapheneOS/hardened_malloc#security-properties

Hardware memory tagging is a separate thing from Hardened Malloc, but Hardened Malloc integrates it, and it helps provide a form of memory safety for memory-unsafe code (C and C++) and for low-level unsafe code in safe languages like Java and Kotlin. MTE is enabled for all base OS apps and almost all executables, with a few exceptions for those that have bugs. You can use hardware memory tagging if you have a Pixel 8 or later.

As for overcoming this hardening, it is still possible for someone to develop an exploit for the OS, provided they have the skills and budget, even if it isn't known to have been done. However, it would be far more difficult than on other platforms because of the hardening work done, ESPECIALLY on a device with MTE like the Pixel 8 or later. It would also be far more difficult to maintain an exploit kit, because these sophisticated threats would have to keep up with our pace of adding new security/privacy features and improvements. Making the exploit is already difficult, but keeping it persistent or functional across updates is harder still, especially when it would need to be bespoke and purposely made to target GrapheneOS.

Thank you for writing this up, that is helpful. I have a couple of questions regarding what you said:

  • You mentioned that the kernel also zeroes its freed memory. Does the hardened kernel that's used by GrapheneOS implement its own heap allocation algorithm, like the hardened malloc? AFAIK, malloc cannot be used in the kernel, since it must define all the functions itself.
  • Is there any overflow protection in the stack? Canaries and/or randomisation to protect the return address?
  • You mentioned that if GrapheneOS is targeted specifically, an overflow exploit can potentially be designed for it. Would that supposed attacker need to break the CSPRNG algorithm and/or the hardware memory tagging or would a good understanding of the implementation suffice?

    Carpool7341 You mentioned that the kernel also zeroes its freed memory. Does the hardened kernel that's used by GrapheneOS implement its own heap allocation algorithm, like the hardened malloc? AFAIK, malloc cannot be used in the kernel, since it must define all the functions itself.

    We enable kernel heap hardening features not used by the stock OS and add our own, but it's not a whole new hardened allocator as we do for userspace via hardened_malloc.

    Carpool7341 Is there any overflow protection in the stack? Canaries and/or randomisation to protect the return address?

    The stock Pixel OS uses ShadowCallStack (SCS) for kernel return address protection before 8th gen Pixels and PAC on 8th gen. We use both SCS and PAC on the 8th gen Pixels. Clang Control-Flow Integrity is used for forward-edge CFI in the kernel, and it will switch to the more efficient kCFI implementation, same as the stock OS.
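As a rough illustration of the shadow call stack idea mentioned above, the toy model below keeps return addresses on a second stack that the simulated overflow cannot reach, and returns through that copy. Everything here (the `ToyCallStacks` class, the addresses) is hypothetical. The real mechanism is compiler instrumentation in native code, and PAC and CFI are not modeled at all.

```python
class ToyCallStacks:
    """Hypothetical model: a regular stack plus a separate shadow call stack."""

    def __init__(self):
        self.stack = []         # regular stack, reachable by a buffer overflow
        self.shadow_stack = []  # holds only return addresses, kept out of reach

    def call(self, return_address):
        # On function entry the return address is saved in both places.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def overflow(self, forged_address):
        # Simulated stack buffer overflow: overwrite the copy on the regular stack.
        self.stack[-1] = forged_address

    def ret(self):
        # On return, the address is taken from the shadow stack, so the
        # overwritten copy on the regular stack is never used.
        self.stack.pop()
        return self.shadow_stack.pop()


if __name__ == "__main__":
    cpu = ToyCallStacks()
    cpu.call(0x1000)
    cpu.overflow(0xDEADBEEF)
    print(hex(cpu.ret()))
```

The point of the sketch is only that corrupting the regular stack does not redirect the return. An attacker would also need a way to write to the shadow stack, which is what the real feature tries to make hard.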

    Carpool7341 You mentioned that if GrapheneOS is targeted specifically, an overflow exploit can potentially be designed for it. Would that supposed attacker need to break the CSPRNG algorithm and/or the hardware memory tagging or would a good understanding of the implementation suffice?

    I didn't exactly say that it would be an overflow exploit; I was talking about remote exploitation of the OS in general. Apologies if that was misleading. If a specific, sophisticated actor were to target the OS, they could also consider other categories of exploitation that these mitigations don't cover. Likewise, hardware memory tagging is only bulletproof against certain kinds of vulnerabilities, like linear overflows, small overflows, and certain classes of use-after-free where the attacker cannot delay the use until the memory has been reused, among other things. It is also not the only mitigation on offer in GrapheneOS.

    Hardware memory tagging is mostly based on randomization. Tags are only 4 bits, so there are 16 possible values. It is standard for 0 to be reserved, and hardened_malloc does this, using it only for free data, which leaves 15 possible random tags. Since the check applies to every read and write, attackers would need to bypass it multiple times in most cases, not just once. It would be extremely difficult to exploit successfully. I would point you to the hardened_malloc README again for some MTE details:

    https://github.com/GrapheneOS/hardened_malloc?tab=readme-ov-file#memory-tagging
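A quick back-of-the-envelope calculation shows why needing several bypasses matters. The sketch below assumes a simplified model that is not taken from the README: every tag check is an independent, blind guess among the 15 random tags, and a single wrong guess ends the attempt. The function name is made up, and any deterministic guarantees the README describes are ignored.

```python
from fractions import Fraction

# 4-bit tags give 16 values; 0 is reserved for free data, leaving 15 random tags.
RANDOM_TAGS = 15


def blind_bypass_probability(checks, tags=RANDOM_TAGS):
    """Chance that `checks` independent blind tag guesses all match.

    Assumption for this sketch: guesses are uniform and independent, and the
    attacker has no way to leak or infer a tag beforehand.
    """
    return Fraction(1, tags) ** checks


if __name__ == "__main__":
    for checks in (1, 2, 3):
        p = blind_bypass_probability(checks)
        print(checks, p, float(p))
```

Under these assumptions one guess succeeds 1 time in 15, two in a row 1 time in 225, and three in a row 1 time in 3375, with every failed attempt being a tag check failure that can be noticed. An attacker who can leak tags is outside this model.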

    Carpool7341 You mentioned that if GrapheneOS is targeted specifically, an overflow exploit can potentially be designed for it. Would that supposed attacker need to break the CSPRNG algorithm and/or the hardware memory tagging or would a good understanding of the implementation suffice?

    If one notices that lots of exploits use some technique X, getting rid of X, or cutting down on it, is genuinely helpful as hardening.

    But it's hard to make positive statements or to be confident, because a bug is, by definition, something that is unexpectedly wrong. So the problem with trying to say "For somebody to exploit a bug they would need to X" is that some bugs probably don't need X.

    The GrapheneOS project works hard to reduce the exploitability of GrapheneOS. But every month things are patched that were potentially exploitable last month. So running GrapheneOS shouldn't be seen as having a magic shield that enables one to confidently download hundreds of apps from sketchy sources, or visit random shady web sites, on the grounds that the OS has your back.

      de0u So the problem with trying to say "For somebody to exploit a bug they would need to X" is that some bugs probably don't need X.

      Yup, makes sense. I was just wondering whether the malloc change makes heap buffer overflows hard to exploit or just inconvenient. It seems that, in theory, it makes them hard, though it might be too early to say, since it hasn't been battle-tested enough.

      Stack-based buffer overflows might be more exploitable, since Control Flow Integrity has to be enabled at compilation time, so third party apps aren't expected to use it. At the same time, I guess, there are fewer bugs with stack buffers...

      And, of course, there are many other types of bugs and exploits that aren't covered.