Another terrific article, Daniel, thanks for your efforts to express these ideas in such a readable way. You've got a new paid subscriber!
One aspect not mentioned here (perhaps it's for another article, this one is certainly long enough already!) is the definition of "open". Llama and DeepSeek R1 claim loudly to be "open source" but are not. Certainly they are MORE open and useful than OpenAI etc., as they allow you to download and run models locally, which is awesome, but they are still not FULLY open according to the one official definition we have for Open Source AI, the OSAID: https://opensource.org/ai/open-source-ai-definition
Thanks for the kind words and the sub!
I tend not to get too religious about "open". My sense is that more AI providers would love to release their training data, but there are significant hurdles to doing so: fair use is at stake, and even if training on copyrighted data counts as fair use, you still can't redistribute that copyrighted data for free as a package to the whole world, even though you could train a model on it. So I see it as a hostile legal environment rather than an unwillingness to share.
Open-weights models are very important, and I see no reason to throw shade at them. Of course, I would like to have fully open models, with model weights, training data, training scripts, etc., but the current legal environment makes open sharing challenging.
Magnificent, thank you Daniel!
Really good, thanks!