It's about training AI language models. They are not doing this for pond knowledge. It is to teach these AI models how humans converse and how AI can sound more human. Online forums are a great way for the models to learn, because the discussions go back and forth. Reddit is a big source for learning: it is an open forum that is free, so it is perfect for these AI models. It's a big issue for online media companies like WSJ, NYT, and WaPo because their content is being used without compensation.
A quote from one article:
This battle exists because of the way AI chatbots are made. The so-called large-language-model algorithms that power these bots have to be trained by taking in and processing oceans of existing language to try to mimic what humans say, and how they say it. This isn’t the kind of data we’re used to thinking about as a commodity on the internet, such as the behavioral and personal information used to target ads by companies like Facebook parent Meta Platforms.
This data is the creative output of the human users of various services, such as the hundreds of millions of posts by Reddit’s users. Only on the web can you find sufficiently sized repositories of such human-generated words. And without it, all of today’s chat-based AIs and related technologies wouldn’t work.