I have been having a play with local LLM chat on a Pixel 7 Pro. I've had a look over this thread and others on the forum for app recommendations. Just based on my experiences playing with this for a couple of hours, and without claiming to be "fair" (I'm sure some problems could be rectified with further playing around, and I haven't gone and tried to download alternate versions or report problems to the authors), here's how it went for me:
Maid
I got this from F-Droid, which was nice. I was able to chat with it after downloading the 1b-parameter TinyLlama, but it wasn't terribly coherent - probably a limitation of the model. I tried to download Mistral 7b but the download kept getting stuck part way through, so I had to give up. I tried to run a local GGUF of Meta Llama 3.1 8B Instruct Q6_K_L but it either didn't work or was so slow I never got a response - given it wasn't even chewing up my battery, I suspect it wasn't working.
Private AI (by FireEdge)
I got this from Aurora Store. It appears to prevent you from using larger models or your own GGUFs unless you pay - certainly I kept getting errors about a billing SDK. However, it ran both Gemma 2b and Qwen 2.5 7b from its built-in list of models at very acceptable speeds.
The first time I tried to download Qwen it got wedged somehow: it would neither finish the download nor let me delete the partial download, and every time I tried to chat it would just spew out garbage characters. I fixed this by uninstalling and reinstalling the app, and (not sure this was necessary) I kept tapping the screen to stop it from turning off while the model downloaded and installed.
The selection of models is limited but this is very easy to use, if you ignore the glitch.
MLCChat
I got the APK from the GitHub releases page and downloaded the Gemma 2b model from the link within the app. Performance was noticeably bad, much worse than "Private AI" running notionally the same model. My phone was visibly chugging and the whole phone UI was unresponsive. The response to a simple question took a minute or so, whereas Private AI took a few seconds. The GUI said "prefill: 0.2 tokens/s, decode: 4.0 tokens/s", for what it's worth.
ollama in termux
I installed termux from Aurora Store and otherwise followed the instructions here. I had to edit one of the files to change "gzip --best" to "gzip -9" but otherwise the instructions worked fine.
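For anyone who hits the same snag, the fix is roughly this shape (a sketch only; the grep step is there because, as explained below, I can't point you at the exact file, and the path at the end is just a placeholder):

    # Inside the ollama source tree, find whichever file still uses the long flag
    grep -rl 'gzip --best' .
    # ...then swap it for the numeric equivalent (gzip -9 means the same thing as --best)
    sed -i 's/gzip --best/gzip -9/' path/to/whatever-grep-found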
The performance with Llama 3.2 3b was bad. Borderline usable, but very bad. I left it answering a simple question, came back ten minutes later, and found my phone had essentially crashed. I had a black screen; pressing the power button gave me a menu allowing me to choose Lockdown/Power Off/Restart, but I couldn't get anything else to come up, so I forced it to restart. (As a result, I can't tell you which file I had to make the gzip change in.)
This could be a termux problem rather than an ollama one, I don't know. In any case, the performance was bad enough that I'm not that interested in it for running LLMs, although I think termux could be very useful to me in general.
ChatterUI
The latest beta from GitHub wouldn't let me add a new model (the button seemed unresponsive), but the 0.8.2 release is working like a charm. I don't properly understand what kind of models it wants - on the Models->Show Settings tab it says that supported quantizations are "Q4_0_4_4 Available" and "Q4_0_4_8 Not Available", but I'm used to downloading models from Hugging Face with quant names like Q4_K_M. However, I had a copy of Meta Llama 3.1 8B Instruct Q6_K_L lying around on my PC, which I copied over to my phone, and ChatterUI is running it at completely usable speeds.
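If you don't already have a GGUF lying around, something along these lines works on the PC side (a sketch only; the repo name and filename are examples I'm assuming, so check Hugging Face for the actual quant you want, and adb over USB debugging is just one way to get the file onto the phone):

    # Grab a GGUF from Hugging Face (repo and filename here are assumed examples)
    huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF \
      Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf --local-dir .
    # Copy it to the phone; any other file-transfer method works just as well
    adb push Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf /sdcard/Download/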
TL;DR
Of the apps I tried here, only "Private AI" and "ChatterUI" performed well. Given I tried different models on each, it's hard to say if one is faster than the other, but they are roughly comparable and streets ahead of the others.
If you want to play around with a local LLM on your phone with minimum fuss, try "Private AI" - except for the download glitch I had, it's pretty much install-and-play.
If you want to run arbitrary models or go larger than 7b, ChatterUI is the way to go. Depending on what you want to do, a lot of the fun in the LLM space is playing around with whatever the hot new model is, so ChatterUI has a definite advantage there. And while I didn't build it from source, the fact that it's open source (AGPL-3.0) is very nice.