Unleashed.chat

From the folks that protect your coins at coinkite.com

treenutz68 Nice, I'll try it. But it can't deanonymize me, right? GrapheneOS protects me, right?
I mean, I heard that when people use invasive applications like Spotify, the applications manage to deanonymize the users. I think it was because of the router the person used, but normally it should not be possible because he used a VPN.

What do you think?

    000000

    Hmm, I think at most they would be able to track the IP address, cookies, or other browser-based identifiers. If you use techniques that obfuscate your IP address and browser-based identifiers, like Orbot, a VPN, and/or specific browsers, you should be able to avoid that type of tracking.

    My understanding is that the prompt and response data are only stored in the user's browser, so perhaps that helps as well?

    Here is a link to their privacy policy:
    https://venice.ai/legal/privacy-policy

    16 days later

    Pocketstar

    Hey there! Would you be able to share some example models and the speeds you're getting on the G2? I'd love to see it, and I've only been able to find about three people who have done this on Pixels, and only one listed tokens/s.

    Thanks for your time!

      I can't speak for the G2; I have a G4, and there are two models I sometimes use locally. Llama 3.2 3B works fairly well in terms of speed. Llama 3.1 8B runs :).

      To be honest, I didn't test many models locally, as I mostly use my self-hosted Ollama instance, which is much more performant. I mainly keep the two models mentioned above for when I have no internet, and use them basically as a small 'Wikipedia'.
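
      In case it is useful to anyone, here is a minimal sketch of querying a self-hosted Ollama instance from another machine with the official Python client (pip install ollama); the host address and model tag below are placeholders, so substitute whatever your server actually exposes:

```python
import ollama

# Placeholder address of a self-hosted Ollama server (the default port is 11434).
client = ollama.Client(host="http://192.168.1.50:11434")

# The model tag is an example; use whatever `ollama list` reports on the server.
response = client.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Explain GGUF quantization in two sentences."}],
)
print(response["message"]["content"])
```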

      7 days later

      0xAB
      My apologies for the delay, I haven't been on the forum for a while.

      I use the 4-bit (Q4_K_M) Spicyboros 7b from https://huggingface.co/TheBloke/Spicyboros-7B-2.2-GGUF?not-for-all-audiences=true

      The 4-bit (Q4_K_M) Silicon Maid 7B also works decently:
      https://huggingface.co/TheBloke/Silicon-Maid-7B-GGUF?not-for-all-audiences=true

      Strangely, Silicon Maid is slow on my Pixel Tablet but Spicyboros is not, and vice versa on the phone, where Spicyboros is slow but Silicon Maid is fast.

      I use uncensored models because I don't want to have to argue with my phone in order to ask "naughty" questions and get answers to them.

      The speed is decent; it's about as fast as a person typing. It gets the job done, so for me it is not an issue.
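
      In case anyone wants to grab the same quants, here is a minimal sketch of pulling one of these GGUF files with the huggingface_hub client; the exact filename is my guess at TheBloke's usual naming scheme, so check the repo's file list before relying on it:

```python
from huggingface_hub import hf_hub_download

# Repo from the link above; the filename is assumed from TheBloke's usual naming
# convention, so confirm it under "Files and versions" on the model page.
model_path = hf_hub_download(
    repo_id="TheBloke/Spicyboros-7B-2.2-GGUF",
    filename="spicyboros-7b-2.2.Q4_K_M.gguf",
)
print(f"Downloaded to {model_path}")
```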

      a month later

      I have been having a play with local LLM chat on a Pixel 7 Pro. I've had a look over this thread and others on the forum for app recommendations. Just based on my experiences playing with this for a couple of hours, and without claiming to be "fair" (I'm sure some problems could be rectified with further playing around, and I haven't gone and tried to download alternate versions or report problems to the authors), here's how it went for me:

      Maid

      I got this from F-Droid, which was nice. I was able to chat with this if I downloaded the 1B-parameter TinyLlama, but it wasn't terribly coherent - probably a limitation of the model. I tried to download Mistral 7B, but the download kept getting stuck part way through, so I had to give up. I tried to run a local GGUF of Meta Llama 3.1 8B Instruct Q6_K_L, but it either didn't work or was so slow I never got a response - given it wasn't even chewing up my battery, I suspect it wasn't working.

      Private AI (by FireEdge)

      I got this from the Aurora Store. It appears to prevent you from using larger models or your own GGUFs unless you pay - certainly I kept getting errors about a billing SDK. However, it ran both Gemma 2B and Qwen 2.5 7B from its built-in list of models at very acceptable speeds.

      The first time I tried to download Qwen it got wedged somehow and would neither finish the download nor let me delete the partial download. Every time I tried to chat it would just spew out garbage characters. I fixed this by uninstalling the app and reinstalling, and (not sure it was necessary) I made sure to keep tapping the screen to stop it turning off during the download and installation of the model.

      The selection of models is limited but this is very easy to use, if you ignore the glitch.

      MLCChat

      I got the APK from the GitHub release page and downloaded the Gemma 2B model from the link within the app. The performance seemed noticeably bad, much worse than "Private AI" running notionally the same model. My phone was visibly chugging and the whole phone UI was unresponsive. The response to a simple question took a minute or so, whereas Private AI was taking a few seconds. The GUI said "prefill: 0.2 tokens/s, decode: 4.0 tokens/s", for what it's worth - which is at least consistent with a couple-hundred-token answer taking about a minute at 4 tokens/s.

      ollama in termux

      I installed Termux from the Aurora Store and otherwise followed the instructions here. I had to edit one of the files to change "gzip --best" to "gzip -9", but otherwise the instructions worked fine.

      The performance with Llama 3.2 3B was bad. Borderline usable, but very bad. I left it answering a simple question, came back ten minutes later, and for some reason my phone had essentially crashed. I had a black screen; pressing the power button gave me a menu allowing me to choose Lockdown/Power Off/Restart, but I couldn't get anything else to come up, so I forced it to restart. (As a result, I can't tell you which file I had to make the gzip change in.)

      This could be a Termux problem rather than an Ollama one, I don't know. In any case, the performance was bad enough that I'm not that interested in it for running LLMs, although I think Termux could be very useful to me in general.
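
      Since people earlier in the thread were asking for tokens/s numbers: if you do get Ollama running (in Termux or self-hosted), its HTTP API reports eval_count and eval_duration for each non-streamed response, so you can compute the decode speed directly. A minimal sketch, with the host and model name as placeholders:

```python
import requests

# Inside Termux the server listens on localhost:11434 by default; adjust as needed.
OLLAMA_URL = "http://localhost:11434/api/generate"

resp = requests.post(
    OLLAMA_URL,
    json={"model": "llama3.2:3b", "prompt": "Name three uses for a paperclip.", "stream": False},
    timeout=600,
)
resp.raise_for_status()
data = resp.json()

# eval_count is the number of generated tokens; eval_duration is in nanoseconds.
tokens_per_second = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"{data['eval_count']} tokens at {tokens_per_second:.1f} tokens/s")
```

      The prompt_eval_count and prompt_eval_duration fields give the prefill speed in the same way.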

      ChatterUI

      The latest beta from GitHub wouldn't let me add a new model (the button seemed unresponsive), but the 0.8.2 release is working like a charm. I don't properly understand what kind of models it wants - on the Models->Show Settings tab it says that the supported quantizations are "Q4_0_4_4 Available" and "Q4_0_4_8 Not Available", but I'm used to downloading models from Hugging Face with quant names like Q4_K_M. However, I had a copy of Meta Llama 3.1 8B Instruct Q6_K_L lying around on my PC which I copied over to my phone, and ChatterUI is running it at completely usable speeds.
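
      One way to sanity-check whether a particular GGUF or quant is usable at all, before copying it over to the phone, is to load it on the PC with llama-cpp-python. A minimal sketch, with the file path as a placeholder for whatever GGUF you downloaded:

```python
from llama_cpp import Llama

# Placeholder path to the GGUF you want to test.
MODEL_PATH = "./Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf"

# Loading fails fast if the file is truncated or the quant is not supported by
# this llama.cpp build, which is cheaper to discover here than on the phone.
llm = Llama(model_path=MODEL_PATH, n_ctx=2048, verbose=False)

out = llm("Q: What is the capital of France?\nA:", max_tokens=16)
print(out["choices"][0]["text"].strip())
```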

      TL;DR

      Of the apps I tried here, only "Private AI" and "ChatterUI" performed well. Given I tried different models on each, it's hard to say whether one is faster than the other, but they are roughly comparable and streets ahead of the others.

      If you want to play around with local LLM on your phone with minimum fuss, try "Private AI" - except for the download glitch I had, it's pretty much install-and-play.

      If you want to run arbitrary models or go larger than 7b, ChatterUI is the way to go. Depending on what you want to do, a lot of the fun in the LLM space is playing around with whatever the hot new model is, so ChatterUI has a definite advantage here. And while I didn't build it from source, the fact it is open source (AGPL-3.0) is very nice.

      +1 for Private AI (by FireEdge)
      +1 for LM Studio on Windows

      On my P9P, using Private AI, I run the AI Model from Brighteon.ai
      https://brighteon.ai/Home/

      It is from Mike Adams, aka the Health Ranger, founder of Brighteon... it does skew towards health, but I count that as a plus.

      It is from June, but they are working on an update for 2025.

      You have to register to download, but it can be run fully offline (deny the Network permission to Private AI) in Private AI on my phone, or in LM Studio.

      Here is a partial list of what is included:

      Neo-Dolphin-Mistral 7B V0.1.6
      8,973 articles from Mercola.com
      26 books on vitamins, minerals, nutrients and natural medicine
      18 books on survival, foraging, wild foods, off grid survival skills, bushcrafting
      17 books on mainstream medicine, COVID, pharmaceuticals, pesticides and herbicides
      Plus all the data from earlier data sets

      HTH

      6 days later

      Has anyone got Lite Mistral working in ChatterUI? It's lightweight and fast but rambles incoherently to itself. I can't figure out the instruct sequence. I see it featured in one of the screenshots on the ChatterUI GitHub.