
Amazing, but instruction fine-tuning is still a huge challenge for businesses, since what was released cannot be used for commercial purposes. Instruction-tuned models are much more useful.

I have a feeling some people will see the "commercial okay" license on the base models and decide that somehow makes it okay to use the instruction-tuned ones commercially too.

Maybe we don't really need the Instruct stuff? It seems like a huge amount of duplicated work. I wonder if the OpenAssistant people will start building on top of these models.



The instruct tuning can be done with several open datasets at minimal cost. Should be easy for someone to create their own open model.


How?


You can finetune 7B in a couple of hours on a $200 3060 with https://github.com/johnsmith0031/alpaca_lora_4bit
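Back-of-the-envelope arithmetic (my own numbers, not from that repo) for why a 12 GB 3060 is enough: at 4-bit quantization the base weights are half a byte per parameter, and with LoRA only the small adapter matrices carry optimizer state. The adapter size here is an assumed typical value.

```python
# Rough memory estimate for 4-bit 7B + LoRA on a 12 GiB GPU.
# These are illustrative assumptions, not measurements.
GIB = 1024 ** 3

params = 7e9
base_weights = params * 0.5          # 4 bits = 0.5 bytes per weight
lora_params = 20e6                   # assumed typical adapter size
# trainable adapter weights in fp16 plus Adam's two fp32 moment buffers
lora_train = lora_params * 2 + lora_params * 4 * 2

total_gib = (base_weights + lora_train) / GIB
print(f"{total_gib:.2f} GiB before activations")  # 3.45 GiB before activations
```

Activations and KV cache add on top of this, but there's comfortable headroom below 12 GiB, which is roughly why 4-bit LoRA fine-tuning fits on a consumer card.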



That dataset is licensed under CC BY-NC 4.0, which is not open. It also has a bunch of garbage in it; see https://github.com/gururise/AlpacaDataCleaned


I wonder what happens if you just feed that dataset back into another LLM to rewrite it and filter out the low-quality items? Is there still any connection to the original copyright? How would that even be proven?
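Setting the copyright question aside, the filtering half is mechanically simple. A minimal sketch of a rule-based pass over Alpaca-style records, where the "low quality" criteria are my own guesses, not the actual AlpacaDataCleaned rules:

```python
# Illustrative filter over Alpaca-style {"instruction", "output"} records.
# BAD_MARKERS are assumed examples of junk responses, not an official list.
BAD_MARKERS = ("as an ai language model", "i cannot", "<nooutput>")

def looks_clean(item: dict) -> bool:
    """Keep items with a non-trivial instruction and output."""
    out = item.get("output", "").strip()
    if len(out) < 2 or len(item.get("instruction", "").strip()) < 5:
        return False
    return not any(m in out.lower() for m in BAD_MARKERS)

data = [
    {"instruction": "Explain what a LoRA adapter is.",
     "output": "A LoRA adapter is a small low-rank weight update..."},
    {"instruction": "Translate to French: hello",
     "output": "<nooutput>"},
]
cleaned = [d for d in data if looks_clean(d)]
print(len(cleaned))  # 1
```

The rewrite half (prompting another LLM to paraphrase each item) is where the legal question actually lives; a filter like this only removes items, it doesn't launder their origin.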




