How is an LLM organized internally? Can it be broken up to fit in limited VRAM?
An LLM has parts that help with programming. Couldn't we separate those out? Everyday speech doesn't use programming or that data, so could we separate parts of the LLM out, like the sections of a human brain (medulla, cerebrum, ...), using software such as H2O LLM Studio?
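Cutting pieces out of a trained network is usually called pruning. Below is a minimal sketch of the idea (not H2O-specific; the function name and sizes are made up for illustration) using NumPy: zero out the smallest-magnitude weights in one layer's weight matrix.

```python
import numpy as np

def prune_magnitude(weights, fraction=0.5):
    """Zero out the `fraction` of weights with the smallest absolute value."""
    w = weights.copy()
    threshold = np.quantile(np.abs(w), fraction)
    w[np.abs(w) < threshold] = 0.0
    return w

rng = np.random.default_rng(0)
layer = rng.normal(size=(8, 8))   # stand-in for one layer's weight matrix
pruned = prune_magnitude(layer, 0.5)
print((pruned == 0).mean())       # about half the weights are now zero
```

Note that the matrix stays the same size: zeroed weights still occupy memory unless the runtime stores them sparsely, which is one reason naively cutting parts out doesn't automatically make a model smaller or faster.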
How do we make LLMs more usable on low-end computers? Looking over the Orca and FreeWilly 2 papers might give ideas. I think it might help to use multiple computers across a network to run a single LLM. Human language uses a whole system, not just text: it uses sight, smell, and hearing, and checks the feedback it hears or receives from itself. Opinion: TV with Nielsen ratings is like a quantum computer system, or has some quantum properties, perhaps? Example problem: how do you fit a 12 GB model into 8 GB of VRAM? Example LLM: Falcon, 7 billion parameters, Apache 2.0 license; blog with the company's info: https://huggingface.co/blog/falcon Thanks for any help or feedback.
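For the 12-GB-into-8-GB question, a back-of-the-envelope estimate helps: the memory for a model's weights is roughly parameter count times bytes per weight. A rough Python sketch (the 1.2 overhead factor is an assumption, and real usage also needs room for activations and the KV cache):

```python
# Back-of-the-envelope VRAM estimate for holding an LLM's weights.
def model_vram_gb(params_billion, bits_per_weight, overhead=1.2):
    """Approximate GB needed just to hold the weights, plus a guessed overhead."""
    bytes_per_weight = bits_per_weight / 8
    weight_gb = params_billion * 1e9 * bytes_per_weight / (1024 ** 3)
    return weight_gb * overhead

# Falcon-7B (7 billion parameters) at common precisions:
for bits in (16, 8, 4):
    print(f"{bits:2d}-bit: ~{model_vram_gb(7, bits):.1f} GB")
```

Under these assumptions a 7B model needs roughly 16 GB at 16-bit but under 4 GB at 4-bit, which is why quantization is the usual answer to fitting a model that is too big for your VRAM.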
Moved: This thread is more suitable in Linux General and has been moved accordingly to help your thread/question get the exposure it deserves.
Not feasible to program, but found a smaller model
Quote:
H2O LLM Studio supposedly lets you fine-tune without programming. That appears to be different from cutting unneeded parts out and reformatting the config files (for example, removing .bin files).

Simply cutting parts out: this was a waste of time, as the model was still slow. I tried it with a different model too.

Video I watched (to about the 23-minute mark): https://www.youtube.com/watch?v=u48QaIAIFw4 "Discovering the Potential of LLMs: A Journey through H2O.ai's LLM Studio!"

BigScience model: couldn't find the book, but the license info is here (the model was too big for me): https://bigscience.huggingface.co/bl...e-rail-license

For the orca_mini model, the developers may work on a 4-bit version: https://huggingface.co/psmathur/orca_mini_3b
Also found a GGML build: https://huggingface.co/TheBloke/orca_mini_3B-GGML

Other info:
Fine-tuning: https://dzone.com/articles/custom-tr...dels-a-compreh
vs. building your own: https://www.databricks.com/resources...ge-model-dolly
Better link for beginners (this URL came through garbled): http://A jargon-free explanation of ...ge models work
Info on the orca language model: https://huggingface.co/psmathur/orca_mini_3b — it worked well, and may be based on GPT-4.

Web UI: I used the one-click installer (Windows/multi-platform), about the middle of this page: https://github.com/oobabooga/text-generation-webui
Page with install, viewing an LLM model on Ubuntu, and viewing/managing datasets: https://docs.h2o.ai/h2o-llmstudio/gu...s/view-dataset

Security: avoid personal info and follow proper guidelines, laws, and so on. Who ultimately defines what is good? The supreme lawgiver of the universe.
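The GGML build of orca_mini_3B linked above is meant for CPU inference, and one common way to run it is llama-cpp-python. A sketch under assumptions: the model filename is an example of a 4-bit quantized file you would download yourself, and the prompt template is assumed from the orca_mini model card — check the card for the exact format.

```python
def build_orca_prompt(user_msg: str) -> str:
    """Format a prompt in the orca_mini instruction style (assumed template)."""
    return f"### User:\n{user_msg}\n\n### Response:\n"

if __name__ == "__main__":
    # pip install llama-cpp-python; the filename is an assumed example of a
    # quantized .bin downloaded from TheBloke/orca_mini_3B-GGML
    from llama_cpp import Llama
    llm = Llama(model_path="orca-mini-3b.ggmlv3.q4_0.bin", n_ctx=2048)
    out = llm(build_orca_prompt("What is VRAM?"), max_tokens=64)
    print(out["choices"][0]["text"])
```

The import is kept inside the main guard so the prompt-formatting part can be read and tested even without the library installed.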
Solutions that work for now
1. Use cloud computing or a friend's computer.
2. Use load-in-4bit: after launching your server, (a) load a small LLM model that is quantized or able to run in 4-bit, then (b) click the 'load-in-4bit' option under the Model tab. This was faster for speech.
3. (Tentative) Possibly working solution, which I plan to test sometime: use a model template and/or eliminate unneeded info from the model with H2O LLM Studio or other software.
4. Use a smaller model.
5. Pay for a custom LLM, or get a free smaller one.

Let me know if you need to see screenshots.
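For step 2, the web UI's 'load-in-4bit' checkbox corresponds to loading the model through the bitsandbytes 4-bit path in Hugging Face transformers. A sketch of doing the same thing directly (the model id is just an example, and this path needs a CUDA GPU, so the actual load is kept behind a main guard):

```python
def load_4bit(model_id: str):
    """Load a causal LM with its weights quantized to 4 bits on the fly."""
    # imports kept inside so the sketch can be read without the libs installed
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)
    quant = BitsAndBytesConfig(load_in_4bit=True)
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant, device_map="auto"
    )
    return tok, model

if __name__ == "__main__":
    # example model id only; requires a CUDA GPU plus
    # `pip install transformers bitsandbytes accelerate`
    tok, model = load_4bit("tiiuae/falcon-7b")
```

device_map="auto" additionally lets accelerate spill layers that don't fit in VRAM over to system RAM, which is another way to squeeze a too-big model onto a small GPU (at a speed cost).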