I have been having a play with local LLM chat on a Pixel 7 Pro. I've had a look over this thread and others on the forum for app recommendations. Just based on my experiences playing with this for a couple of hours, and without claiming to be "fair" (I'm sure some problems could be rectified with further playing around, and I haven't gone and tried to download alternate versions or report problems to the authors), here's how it went for me:
Maid
I got this from F-Droid, which was nice. I was able to chat with it after downloading the 1b-parameter TinyLlama, but it wasn't terribly coherent - probably a limitation of the model. I tried to download Mistral 7b but the download kept getting stuck part way through, so I had to give up. I tried to run a local GGUF of Meta Llama 3.1 8B Instruct Q6_K_L but it either didn't work or was so slow I never got a response - given it wasn't even chewing up my battery, I suspect it wasn't working.
Private AI (by FireEdge)
I got this from Aurora Store. It appears to prevent you from using larger models or your own GGUFs unless you pay - certainly I kept getting errors about a billing SDK. However, it ran both Gemma 2b and Qwen 2.5 7b from its built-in list of models at very acceptable speeds.
The first time I tried to download Qwen it got wedged somehow: it would neither finish the download nor let me delete the partial download, and every time I tried to chat it would just spew out garbage characters. I fixed this by uninstalling and reinstalling the app, and (not sure this was necessary) I kept tapping the screen to stop it from turning off while the model downloaded and installed.
The selection of models is limited but this is very easy to use, if you ignore the glitch.
MLCChat
I got the APK from the GitHub releases page and downloaded the Gemma 2b model from the link within the app. Performance was noticeably bad, much worse than "Private AI" running notionally the same model. My phone was visibly chugging and the whole phone UI was unresponsive. The response to a simple question took a minute or so, whereas Private AI took a few seconds. The GUI said "prefill: 0.2 tokens/s, decode: 4.0 tokens/s", for what it's worth.
ollama in termux
I installed termux from Aurora Store and otherwise followed the instructions here. I had to edit one of the files to change "gzip --best" to "gzip -9" but otherwise the instructions worked fine.
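For anyone who hits the same snag, the fix is roughly this shape (a sketch only; the grep step is there because, as explained below, I can't point you at the exact file, and the path at the end is just a placeholder):

    # Inside the ollama source tree, find whichever file still uses the long flag
    grep -rl 'gzip --best' .
    # ...then swap it for the numeric equivalent (gzip -9 means the same thing as --best)
    sed -i 's/gzip --best/gzip -9/' path/to/whatever-grep-found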
The performance with Llama 3.2 3b was bad. Borderline usable, but very bad. I left it answering a simple question, came back ten minutes later, and found my phone had essentially crashed. I had a black screen; pressing the power button gave me a menu allowing me to choose Lockdown/Power Off/Restart, but I couldn't get anything else to come up, so I forced it to restart. (As a result, I can't tell you which file I had to make the gzip change in.)
This could be a termux problem rather than an ollama one, I don't know. In any case, the performance was bad enough that I'm not that interested in it for running LLMs, although I think termux could be very useful to me in general.
ChatterUI
The latest beta from GitHub wouldn't let me add a new model (the button seemed unresponsive), but the 0.8.2 release is working like a charm. I don't properly understand what kind of models it wants - on the Models->Show Settings tab it says that supported quantizations are "Q4_0_4_4 Available" and "Q4_0_4_8 Not Available", but I'm used to downloading models from Hugging Face with quant names like Q4_K_M. However, I had a copy of Meta Llama 3.1 8B Instruct Q6_K_L lying around on my PC, which I copied over to my phone, and ChatterUI is running it at completely usable speeds.
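If you don't already have a GGUF lying around, something along these lines works on the PC side (a sketch only; the repo name and filename are examples I'm assuming, so check Hugging Face for the actual quant you want, and adb over USB debugging is just one way to get the file onto the phone):

    # Grab a GGUF from Hugging Face (repo and filename here are assumed examples)
    huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF \
      Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf --local-dir .
    # Copy it to the phone; any other file-transfer method works just as well
    adb push Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf /sdcard/Download/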
TL;DR
Of the apps I tried here, only "Private AI" and "ChatterUI" performed well. Given I tried different models on each, it's hard to say if one is faster than the other, but they are roughly comparable and streets ahead of the others.
If you want to play around with a local LLM on your phone with minimum fuss, try "Private AI" - except for the download glitch I had, it's pretty much install-and-play.
If you want to run arbitrary models or go larger than 7b, ChatterUI is the way to go. Depending on what you want to do, a lot of the fun in the LLM space is playing around with whatever the hot new model is, so ChatterUI has a definite advantage there. And while I didn't build it from source, the fact that it's open source (AGPL-3.0) is very nice.