Are the GOS devs using any tools to block AI web crawlers from accessing valuable resources on the GOS homepage & forums?

If chatbots have unrestricted access, I'm afraid bad actors may use this information to potentially build exploits with ease.

For images there are tools from TheGlazeProject etc., but for websites I couldn't find much information other than this article on PCWorld. Interesting read.

I understand implementing such restrictions may contravene the ethos of an open-source project, but in my humble opinion AI is too unregulated to be given that freedom without certain checks and balances.
Thank you.

    TheAwesomenESQ
    Real question.

    If real people struggle to design exploits for GOS, then how does AI supposedly do so?

    TheAwesomenESQ I'm not sure I understand why this would be necessary, or what kind of threat they pose?

    TheAwesomenESQ I'm afraid bad actors may use this information to potentially build exploits with ease.

    How? So far, all AI chatbots can do is make up a bunch of junk that reads like real people wrote it. GitHub Copilot is trained on public repositories (according to its FAQ), so it can generate code. If you know even basic programming, you can try using it in an editor that supports AI autocomplete; some suggestions are just really, really bad. If an AI can't help me with basic programming, I don't see how it could find exploits in GrapheneOS code. I don't think there's any reason to be at all concerned about AI reading through GrapheneOS's public repos.

    TheAwesomenESQ I couldn't find much information other than this article on PCWorld.

    Bots don't have to respect the robots.txt file.
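
    For reference, a robots.txt can only request that crawlers stay away. The user-agent strings below are the ones OpenAI, Common Crawl, and Google publicly document for their AI crawlers, but nothing enforces compliance:

```
# robots.txt -- a voluntary request; well-behaved crawlers honor it,
# scrapers are free to ignore it entirely.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

    That is exactly why the file offers no real protection: it is a convention, not an access control.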

    @other8026 @Dumdum
    Agreed, very valid questions indeed. I'm aware of the amount of junk AI chatbots output, and that their limited capabilities are ridiculed across the web. But you'd all agree that AI is evolving and getting better every day.

    My line of thinking was based on the amount of brilliant explanations, technical overviews and workarounds for questions within the forum.

    If this data is extracted, all it takes is a chat query where the response could cite "according to GOS forum......." and directly point to the solution instead of hours of manual research. This information could therefore potentially be reverse engineered to create exploits based on the answers given.

    Maybe my concerns are too hypothetical and far from reality. Sorry about that.


      TheAwesomenESQ My line of thinking was based on the amount of brilliant explanations, technical overviews and workarounds for questions within the forum.

      If this data is extracted, all it takes is a chat query where the response could cite "according to GOS forum......." and directly point to the solution instead of hours of manual research.

      If that would work, it could be great! Honestly, I think some people may be a little worn out from answering the same questions over and over, roughly weekly (e.g., "If I install Play Services, doesn't that defeat the whole purpose of installing GrapheneOS?"). If a chat bot could spit out an engaging, breezy answer, that could be great -- as long as the answers were engaging, breezy, and accurate.

      TheAwesomenESQ This information could therefore potentially be reverse engineered to create exploits based on the answers given.

      I don't think this forum has a lot of content about how to exploit GrapheneOS, so it's not clear what would serve as the basis of the "reverse engineering". I think most people who uncover a vulnerability in GrapheneOS will either disclose it responsibly to the developer team or sell it to Cellebrite/XRY etc.

      TheAwesomenESQ Maybe my concerns are too hypothetical and far from reality.

      1. Is it possible to identify some actual posts on this forum that could be "reverse engineered" into an attack?
      2. Or is it possible to identify "reverse-engineerable" public posts on some other forum for some other project?

      If there is an absence of concrete examples, that would align with the definition of "hypothetical".

      There are programs that analyze source-code changelogs to uncover vulnerabilities and craft attacks. Here is a paper from 2008 (long before LLMs) on that topic: Brumley, Poosankam, Song, and Zheng, Automatic Patch-based Exploit Generation, IEEE Symposium on Security and Privacy, May 2008. Open-source code bases are vulnerable to this sort of analysis because the changes are public, but closed-source code bases are vulnerable too, because there are tools to compare a new executable against an old one and recover the relevant code back into source form, which can then be analyzed by source-based tools.
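
      As a toy illustration of the underlying idea (the function and the diffing code below are my own invention, not from the paper): a plain textual diff between pre- and post-patch source immediately pinpoints the security check that the patch added, which is exactly the starting point such tools automate.

```python
# Hypothetical example: diffing pre- and post-patch source to locate a fix.
import difflib

# Pre-patch version of an (invented) parser routine: no bounds check.
before = """\
def read_field(buf, length):
    return buf[:length]
""".splitlines()

# Post-patch version: the fix adds a length validation.
after = """\
def read_field(buf, length):
    if length > len(buf):
        raise ValueError("length exceeds buffer")
    return buf[:length]
""".splitlines()

# Keep only the added lines from a unified diff (skip the "+++" header).
added = [line for line in difflib.unified_diff(before, after, lineterm="")
         if line.startswith("+") and not line.startswith("+++")]

for line in added:
    print(line)
# The two added lines are the bounds check -- i.e., the diff itself
# reveals which unchecked input the old version mishandled.
```

      An attacker reading the public patch then knows precisely which input condition the old version failed to check, and can target unpatched systems.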

      What to do?

      1. Try to run code bases with fewer vulnerabilities rather than more,
      2. Patch quickly!