LinuxQuestions.org
Old 07-27-2023, 06:29 PM   #1
rico001
Member
 
Registered: Aug 2007
Distribution: TwisterOS lite, Linux in a VM
Posts: 94
Blog Entries: 3

Rep: Reputation: 3
Question: How do we find out how an LLM is organized; breaking it up due to VRAM limits


An LLM has parts that help with programming. Couldn't we separate those out? Everyday speech doesn't use programming or that data... So could we split the parts of an LLM out, like the sections of a human brain (medulla... cerebrum), using software such as H2O LLM Studio?

How do we make it more usable on low-end computers? Looking over the Orca and FreeWilly2 papers might give ideas. I think it might help to use multiple computers across a network to run a single LLM. Human language uses a whole system, not just text: it uses sight, smell, hearing, and checks the feedback it hears or receives from itself (for example). Opinion: TV with Nielsen ratings is like a quantum computer system, or has some quantum properties, perhaps? Thanks for any help.

Example: want to fit a 12 GB model into 8 GB of VRAM? Example LLM: Falcon, 7 billion parameters, Apache 2.0 license; blog with the company's info: https://huggingface.co/blog/falcon Thanks for feedback.
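A rough back-of-the-envelope estimate (my own sketch, not from the Falcon blog) of why a 7B model doesn't fit in 8 GB of VRAM at full precision but can once quantized; this counts weights only and ignores the KV cache and runtime overhead:

Code:
# Rough VRAM needed just for the weights of a dense 7B-parameter model.
# Real usage is higher: KV cache, activations, and framework overhead add more.
def weights_gib(params_billion, bits_per_weight):
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / (1024 ** 3)

for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit: ~{weights_gib(7, bits):.1f} GiB")
# 16-bit: ~13.0 GiB (won't fit in 8 GB), 8-bit: ~6.5 GiB, 4-bit: ~3.3 GiB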

Last edited by rico001; 07-27-2023 at 07:59 PM. Reason: clarity, more info on llm
 
Old 07-28-2023, 07:20 PM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,268
Blog Entries: 24

Rep: Reputation: 4195
Moved: This thread is more suitable in Linux General and has been moved accordingly to help your thread/question get the exposure it deserves.
 
Old 07-31-2023, 06:18 PM   #3
rico001
Member
 
Registered: Aug 2007
Distribution: TwisterOS lite, Linux in a VM
Posts: 94

Original Poster
Blog Entries: 3

Rep: Reputation: 3
Not feasible to program, but found a smaller model

Quote:
Originally Posted by astrogeek View Post
Moved: This thread is more suitable in Linux General and has been moved accordingly to help your thread/question get the exposure it deserves.
Thanks. I think it might not be feasible or possible at the current time for most people without cloud computing/supercomputers, as training an LLM takes around 100 days from what I have researched so far... I'll try to work on this later; maybe the technology will change. Posting on my blog. Found psmathur's orca_mini_3b, which is about 2/3 faster but has a different license. Will try to give works cited, security permitting. [subject to change]

H2O LLM Studio supposedly lets you fine-tune without programming.
That appears to be different from cutting unneeded parts out and reformatting the config files, such as removing bin files. Simply cutting parts out was a waste of time, as the model was still slow; I tried it with a different model.

https://www.youtube.com/watch?v=u48QaIAIFw4
Discovering the Potential of LLMs: A Journey through H2O.ai's LLM Studio!
watched up to about the 23-minute mark...

BigScience model: book not found, and it's too big for me, but here is the license info: https://bigscience.huggingface.co/bl...e-rail-license


For the mini Orca model, the developers may work on making a 4-bit version:
https://huggingface.co/psmathur/orca_mini_3b
also found: https://huggingface.co/TheBloke/orca_mini_3B-GGML
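For reference, a minimal sketch of running one of those GGML files locally with llama-cpp-python. This is my own illustration, not something from TheBloke's page: the model file name and prompt template are assumptions to check against the repo, and newer llama-cpp-python releases only read GGUF, so an older version or a converted file may be needed.

Code:
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="./orca-mini-3b.ggmlv3.q4_0.bin",  # hypothetical local file name from the GGML repo
    n_ctx=2048,         # context window
    n_gpu_layers=20,    # offload some layers to VRAM; the rest stay in system RAM
)

out = llm("### User:\nExplain swap space in one sentence.\n### Response:\n", max_tokens=64)
print(out["choices"][0]["text"])

The n_gpu_layers knob is how a model bigger than your VRAM can still partly use the GPU, with the remaining layers running from system RAM.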




Other info:

fine-tuning: https://dzone.com/articles/custom-tr...dels-a-compreh
vs.
building your own: https://www.databricks.com/resources...ge-model-dolly
another, better link for beginners: the article "A jargon-free explanation of ...ge models work"

info on the Orca mini language model: https://huggingface.co/psmathur/orca_mini_3b; worked well, may be based on GPT-4
web UI; used the one-click installer (Windows/multi-platform) around the middle of the page: https://github.com/oobabooga/text-generation-webui
page with instructions to install and view an LLM model on Ubuntu, and to view and manage datasets: https://docs.h2o.ai/h2o-llmstudio/gu...s/view-dataset
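To give a concrete picture of what the fine-tuning link above means in practice, here is a minimal parameter-efficient (LoRA) sketch with Hugging Face's peft library. This is my own illustration, not H2O LLM Studio's actual code, and the target module names assume a LLaMA-style architecture such as orca_mini_3b.

Code:
# pip install transformers peft
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("psmathur/orca_mini_3b")

# LoRA trains small adapter matrices instead of all ~3B weights,
# which is what makes fine-tuning feasible on a low-end GPU.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of the weights end up trainable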
Security: avoid personal info and follow proper guidelines and laws... Who defines what is ultimately good? The supreme lawgiver of the universe.

Last edited by rico001; 08-12-2023 at 05:10 PM. Reason: too much info/bad organization; not from nothing
 
Old 08-23-2023, 01:33 PM   #4
rico001
Member
 
Registered: Aug 2007
Distribution: TwisterOS lite, Linux in a VM
Posts: 94

Original Poster
Blog Entries: 3

Rep: Reputation: 3
Solutions that work for now

1. Use cloud computing / a friend's computer.
2. Use load-in-4bit: after launching your server, a. load a small LLM model that is quantized or able to run in 4-bit, b. click the 'load-in-4bit' option under the Model tab. This was faster for speech (see the sketch after this list).
3. (tentative) Possibly working solution I plan to test sometime: use a model template and/or eliminate unneeded info from the model with H2O LLM Studio or other software.
4. Use a smaller model.
5. Pay for a custom LLM, or get a free smaller one.
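Roughly what that 'load-in-4bit' checkbox does behind the scenes, as a stand-alone sketch with transformers and bitsandbytes. This is my approximation, not the webui's exact code, and the prompt format is a guess for orca_mini.

Code:
# pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "psmathur/orca_mini_3b"  # the small model mentioned earlier in the thread
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,   # weights stored in 4-bit, so ~3B params need roughly 2 GB
    device_map="auto",         # spill anything that doesn't fit in VRAM to system RAM
)

prompt = "### User:\nWhat does VRAM stand for?\n### Response:\n"
ids = tok(prompt, return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**ids, max_new_tokens=48)[0], skip_special_tokens=True))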

Let me know if I need to show screenshots.

Last edited by rico001; 08-23-2023 at 07:51 PM. Reason: clarity
 
  


