I'm looking at switching to GrapheneOS for my main device (replacing about 85% of my laptop use) because of the upcoming microdroid/virtualization work (source: https://grapheneos.social/@GrapheneOS/113185686714810236) as well as the 'desktop mode' that should be improving bit by bit. My workflow is nearly all CLI/TUI and web browser shenanigans, or otherwise replaceable by good FOSS apps, so I don't expect too much friction going from a laptop to a phone.
That 85% laptop replacement number could be closer to 95% if I can also run fairly performant local LLMs (2B-7B params) directly on the phone. The hardware should be no problem, and I've seen the odd forum post (Reddit and elsewhere) where people say they've run 7B models on the Pixel 9 Pro/XL at 5-10 t/s, but I haven't received a reply as to what they're doing/using to get these results (maybe running something like llama.cpp in Termux or similar, idk). I currently have an iPhone, so I can't test it before purchasing the P9P, and I'd really like to know before dropping $1,000 on it.
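For context on why I think the hardware should cope, here's my back-of-envelope math as a quick Python sketch. It only counts the weights (parameters × bits per weight / 8) and ignores the KV cache and runtime overhead. The bit widths are just illustrative assumptions on my part, not what any particular app actually uses, and `weight_size_gib` is a made-up helper name.

```python
def weight_size_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores KV cache, activations, and runtime overhead, so real
    memory use will be higher than this.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Illustrative sizes and bit widths only (assumptions, not benchmarks).
    for params in (2, 7):
        for bits in (4, 8, 16):
            size = weight_size_gib(params, bits)
            print(f"{params}B @ {bits}-bit: ~{size:.1f} GiB")
```

If that rough math is off for real-world use on a phone, I'd love to be corrected.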
So yeah, my main questions are in the title. If you have this working, what tools/apps do you use? Is it running on the CPU or the TPU/GPU? What kind of performance do you get out of it? Any additional useful info?
If anyone has any insight, I'd love to hear it! Thanks for your time!