Google’s AI Might Also Be Trained on Reddit Content

You’ve either heard of Reddit or live under a rock.

In case you belong to the group of uncultured netizens who do not know what Reddit is, Reddit is a social media platform where people have forum-style discussions on their interests, hobbies, passions, and everything else.

From cooking tips and career advice to unpopular opinions and discussions after every single O Level paper, there is a subreddit community for everything.

Think you have a unique niche that nobody else is interested in? Think again because you’ll probably find at least a hundred people on Reddit who enjoy the same unique things as you. More popular niches boast tens of millions of members.

It’s a surprisingly educational place, with certified doctors, lawyers, and technicians offering nuggets of wisdom or answering questions posed by other users.

Google AI Might Be Trained by Reddit

Reddit has reportedly struck a deal with Google to make its user-generated content available for training the tech giant’s artificial intelligence (AI) models. According to Bloomberg, the contract is worth about S$80.8 million per year.

Under this deal, Google’s AI will be given access to Reddit’s posts, comments, and discussions. The AI will then analyse this data to learn the patterns, language, and behaviour on the platform.

This training will help Google’s AI become better at tasks like understanding human language, generating responses, or making predictions based on patterns it learns from Reddit’s data.

Until recently, most AI companies trained their AI on data from the open web without seeking permission. Now that this practice has come to be seen as legally questionable (with an increasing number of copyright lawsuits being filed against AI companies by content creators), companies are trying to get their hands on data by more legal means.

You see, when you ask AI a question, it needs data to answer. If an AI system can’t access the latest news or statistics online, it won’t be able to give you an accurate answer when you ask for “breaking news this week” or “what is the weather like today?”

You might end up seeing news from 2021 or the weather for 24 February 2022, when you want the weather for 24 February 2024.

If companies can’t get their hands on data to train their AI, it will no longer be up to date with current happenings, and fewer people will use it.

OpenAI, the creator of ChatGPT, has reportedly been offering news publishers deals of S$6.7 million per year for their data, putting Reddit’s deal with Google at roughly 12 times the value of OpenAI’s offers.
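For readers who want to check the "12 times" comparison, here is a minimal sketch of the arithmetic in Python. It uses only the two figures reported above; the variable and function names are made up for illustration.

```python
# Figures as reported in the article, in millions of Singapore dollars per year.
REDDIT_GOOGLE_DEAL_SGD_M = 80.8
OPENAI_PUBLISHER_OFFER_SGD_M = 6.7


def deal_ratio(larger: float, smaller: float) -> float:
    """Return how many times larger the first figure is than the second."""
    return larger / smaller


ratio = deal_ratio(REDDIT_GOOGLE_DEAL_SGD_M, OPENAI_PUBLISHER_OFFER_SGD_M)
print(f"Reddit's deal is about {ratio:.1f} times OpenAI's reported offer")
```

The result comes out at just over 12, which matches the comparison above.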

Apple has also been proposing deals with major news companies that could be worth “at least” S$67 million, according to the New York Times.

Reader: Apple is developing AI?

Hah. You kidding?

The news of Reddit’s deal with Google comes shortly after Reddit reportedly threatened to cut off Google and Bing’s search crawlers if it couldn’t make a training data deal with AI companies.

This would have meant that if someone searched for “baking tips” on Google, user-generated baking content on Reddit would not show up. Results from other websites would still appear, but people would have to search for Reddit posts on Reddit directly instead of through a search engine.
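One common way a website tells search crawlers to stay away is a robots.txt file, which well-behaved crawlers check before fetching pages. The reports do not say which mechanism Reddit had in mind, so the sketch below is only an illustration: the rules are hypothetical and are not Reddit’s actual robots.txt, and the URL is just an example. It uses Python’s built-in urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (NOT Reddit's real robots.txt):
# block Google's and Bing's crawlers, allow every other crawler.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: bingbot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(HYPOTHETICAL_ROBOTS_TXT)
url = "https://www.reddit.com/r/Baking/"  # example URL, chosen for illustration

for agent in ("Googlebot", "bingbot", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, url)
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With these made-up rules, the two named crawlers are blocked while any other crawler is allowed. Note that robots.txt is a request, not a lock: it only works on crawlers that choose to respect it.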

The source of this rumour told The Washington Post that the company “can survive without search”.

I think that’s up for debate. Reddit does have a large userbase, but whether it could really survive without search is another matter.

What Led to the Deal

Last year, Reddit announced it would charge companies for access to its application programming interface (API) – the means by which it distributes its content.

Reddit post: “Addressing the community about changes to our API”, by u/spez in r/reddit

This came about because, in order to sustain itself, Reddit could no longer subsidise commercial entities that require large-scale data use. In other words, if an app wants to use Reddit’s content, it has to pay for it.

This is Reddit’s way of developing a new revenue stream so that it can remain competitive with rivals like TikTok and Instagram, which attract more advertising spending.

Unfortunately, the new API pricing led some third-party apps, such as Apollo, Reddit is Fun, and Sync, to shut down.

Reddit’s deal with Google is its first reported deal with a big AI company.

So yes, Google’s AI models, like Gemini, may be trained on Reddit content. Legally.

The Deal May Be Bad for Content Creators

This deal could potentially be seen as disadvantageous for content creators for a number of reasons.

Previously, I mentioned that some content creators had filed lawsuits against AI companies for using their content without seeking permission. Under this new deal with Reddit, Google will have permission to access a large amount of information on Reddit, and it may do so without compensating content creators.

Moreover, their contributions will be used without their direct consent, since Reddit and Google can’t possibly reach out individually to Reddit’s millions of contributors to get their consent. This may cause creators to feel like they’ve lost control over their own work.

If Google’s AI becomes more adept at understanding and generating content based on Reddit’s data, it could even diminish the value of the original creators’ work, which would be especially bad for creators who rely on their work for recognition or income.

To put it in simpler terms, let’s look at this analogy. Leonardo Da Vinci’s paintings are extremely valuable; one of his paintings even sold for S$604 million in 2017. Since Da Vinci is no longer around and no one in the world could make an exact replica of his works, his paintings are valuable.

Now imagine some young prodigy comes along who can somehow replicate Da Vinci’s work so perfectly that even the most experienced art connoisseur cannot distinguish between works by this child and Da Vinci. All of a sudden, the value of Da Vinci’s paintings will drop because there’s someone else who can replicate his works.

The child could paint 20 Mona Lisas and sell them at a fraction of the price. Since these replicas look exactly like the original work, who would pay so much money for the original one? The original work then loses its value.

You see what I mean?

In any case, Redditors who may end up losing their source of income as a result of Reddit’s deal with Google could look at Reddit shares as an alternative.

Reader: Reddit? Shares? No way, Reddit’s shares aren’t available for purchase by the public.

Times are changing, dear reader.

Reddit Might Float Soon

For three years, Reddit has been considering an Initial Public Offering (IPO). This refers to making its shares available for purchase by the public on a stock exchange, which would make it a publicly traded company.

Finally, Reddit is preparing to go ahead with its IPO this week.

The company was valued at about $13.4 billion in a funding round in 2021 and is looking to sell around 10% of its shares in the offering. The IPO filing, set to come out soon, will detail its financials to potential investors for the first time.

If Reddit floats, meaning it goes through with the IPO, it would be the first IPO of a major social media company since Pinterest floated its shares in 2019.