Inference on AI Edge Gallery (made by Google) is decently fast. You need network permission and a Hugging Face account to download Gemma 4, but after that you can disable network access. It's also possible to install models with no network access at all if you already have the model file in .litert format. Alternatively, if you use Termux you can install Ollama, which potentially avoids Google altogether, though inference is a little slower. Either way, I wouldn't expect serious intelligence right now, since the models that fit on phones don't have a ton of parameters.
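
To put rough numbers on the "fits on a phone" point, here's a back-of-the-envelope sketch in Python. It only estimates memory for the weights (parameter count times bits per weight) and ignores the KV cache and runtime overhead, so real usage will be higher. The parameter counts and bit widths below are illustrative assumptions, not the specs of any particular model, and `weight_memory_gib` is just a helper name I made up.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical model sizes (billions of parameters) and quantization levels.
    for params in (1, 4, 8, 70):
        for bits in (4, 16):
            gib = weight_memory_gib(params, bits)
            print(f"{params}B params @ {bits}-bit: ~{gib:.1f} GiB")
```

Compare the output against how much RAM your phone has free and it's pretty clear why only the small, heavily quantized models are practical.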