Vicuna does a better job than GPT4All, but I did notice some of the going-off-the-rails/not-stopping behavior. It also leans heavily on "as an AI language model" responses; IMO, any fine-tune based on ChatGPT output really should filter those out, since they kneecap the responses. GPT4All w/ the unfiltered checkpoint was the only one that did OK until I tried Vicuna (13B, load-8bit on GPU). I tried Baize but wasn't impressed, and have yet to try Koala, but I don't have high expectations.

I tried a bunch of other Alpaca/instruction-tuned models and they're better, but IMO still not very good. Pythia, GPT-J, GPT-NeoX, ChatGLM, and the other open raw models I found to be much worse than the various eval scores (PIQA, HellaSwag, WinoGrande, ARC-e, etc.) would suggest. I did a fair amount of playing w/ inference hyper-parameters early on to no avail, but did not do much k-shot learning or proper prompting (like the prompts Scale AI uses for training). I found all the raw LLaMA variants I could run (up to 30B) to not be very coherent or useful.

I've used ChatGPT 3.5 and 4 quite a bit, and have done a bunch of comparisons with nat.dev's Playground across a variety of models: claude-instant provides gpt-3.5-turbo-level output and is about 3-4X faster than gpt-3.5-turbo, while gpt-3.5-turbo and text-davinci-003 are about equal to me and mark the cutoff where a model is generally useful to me as an end user (reliable for summarization, Q&A, code assistance, etc.).

I have an M2 MacBook Air and a 5950X w/ 64GB RAM and an RTX 4090 (24GB VRAM), and I'm in the process of testing the various self-hosted LLMs.
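In case it's useful, here's a rough sketch of what an 8-bit 13B setup and the sampling knobs look like with Hugging Face transformers + bitsandbytes. This isn't necessarily exactly what I ran; the checkpoint name and prompt format are placeholders, so substitute your own.

```python
# Rough sketch: load a 13B chat model in 8-bit on a single 24GB GPU and generate
# with the sampling hyper-parameters that most affect the "not stopping" failure mode.
# Requires: transformers, accelerate, bitsandbytes. Checkpoint name is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-13b-v1.5"  # placeholder checkpoint; use whichever weights you have

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place layers on the available GPU(s)
    load_in_8bit=True,   # bitsandbytes 8-bit quantization; ~14GB for a 13B model
)

# Prompt format is an assumption; match whatever template the fine-tune expects.
prompt = "USER: Summarize the trade-offs of 8-bit inference.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=512,          # hard cap so generation can't run forever
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,      # discourages looping / running off the rails
    eos_token_id=tokenizer.eos_token_id,
)
# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```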