“More than any other place on the internet, Reddit is a home for authentic conversation. There’s a lot of stuff on the site that you’d only ever say in therapy, or A.A., or never at all. The Reddit corpus of data is really valuable, but we don’t need to give all of that value to some of the largest companies in the world for free.”
– Steve Huffman, co-founder and chief executive of Reddit.
Steve Huffman / Jason Henry for The New York Times
To build and scale models, AI developers need huge amounts of computing power, which the biggest developers have in abundance, and huge amounts of data.
Over the last few years, Reddit conversations have served as a free source of training data for companies developing large language models, including Google, OpenAI and Microsoft. Reddit believes this data is especially valuable for LLM development because it is constantly updated.
Now, as Reddit prepares for a potential initial public offering this year, it intends to begin charging for access to its API and the vast database of conversations it hosts. AI makers will have to pay.
The API will remain free, however, for developers building tools that help people use Reddit, including software that helps moderators monitor Reddit spaces.
“We think that is fair,” Huffman said.