Google stays in the headlines consistently, for good reasons or bad. The company made waves by entering a deal with Reddit to use its data to train its AI. Now it seems that, conveniently, recent Reddit content is only showing up in Google’s search results and not in those of other search engines. This is a developing story, so more details may come out as time goes on.
Right now, major media companies and publications are entering deals that will fork over their data to AI companies. For example, companies like Axel Springer (owns Business Insider), Vox Media (owns The Verge), and News Corp (owns more than a dozen publications) have entered multi-million-dollar partnerships that will allow OpenAI to legally train on their data.
Before many of these deals took place, Google entered a partnership with Reddit that lets the search giant access Reddit’s content and data. That was unfortunate news, as it came right after we learned that OpenAI was scraping tons of data from social media sites. So, these major companies were making deals that would give AI our data without our knowing.
Reddit seems to be blocking search engines, but not Google
Google isn’t the only search engine this side of the Mississippi. Others, like Bing (Google’s biggest competitor), DuckDuckGo, Mojeek, and Qwant, have been serving up results for years. There are hundreds out there, but most of us only know about a handful.
Well, it appears that Reddit only knows one, and that’s Google. According to a new report from 404 Media, when searching for content using “site:reddit.com,” you won’t see any results from the past week or so if you’re not using Google. This only goes for search engines that don’t rely on Google’s indexing. If a search engine draws on Google’s index, then it will still surface recent results.
Users surmise that this is because of the deal that the two companies cut a few months back. It’s just so convenient that Reddit and Google cut a content deal and suddenly, all non-Google search engines can’t access Reddit’s recent content. However, that hasn’t been confirmed just yet.
Crawlers
While there’s no proof that Reddit is blocking other search engines because of the deal, it would make sense. A big part of AI tech has to do with what are called “crawlers.” Crawlers “crawl” through websites and extract important information from them. If you have a website, crawlers from different companies are visiting it all the time. That’s important, as this is how search engines index your site and surface it in search results. So, in order to see your website in Google’s search results, your site needs to be crawled by Googlebot, Google’s crawler.
Crawlers are also notorious because AI companies use them to extract data to train their models. There’s a way to combat them, though. Site developers can use a file called “robots.txt,” which tells crawlers not to crawl or index that site’s data. These files can also make exceptions, allowing certain crawlers in while keeping others out.
Since Reddit allows Google to use its data, there’s a chance that it only allows Google’s crawler on the site, so only Google can access Reddit’s data to train Gemini. The side effect is that other companies, which can’t crawl Reddit to train their models, also can’t index it and surface its posts in search results. That’s only speculation.
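To make the robots.txt idea a bit more concrete, here’s a minimal sketch of how a file can let one crawler in and keep the rest out. The rules below are made up for illustration and are not Reddit’s actual robots.txt. The example.com URL and the can_crawl helper are also hypothetical. The sketch uses Python’s built-in robots.txt parser to show how a well-behaved crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Reddit's actual robots.txt.
# Googlebot may crawl everything, every other crawler is asked to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for any site; the bot names are just labels here.
    for bot in ("Googlebot", "MojeekBot", "bingbot"):
        print(bot, can_crawl(bot, "https://example.com/r/some-post"))
```

With these made-up rules, only Googlebot gets a True. Keep in mind that robots.txt is a request, not a lock: it only works on crawlers that choose to honor it, so a site that wants to actually enforce a block has to do so by other means.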
Mojeek’s CEO’s situation
According to 404 Media, Mojeek’s CEO, Colin Hayhurst, recounted his experience with this issue. The company realized that Reddit was blocking Mojeek’s crawler from indexing the website.
What makes things worse is the fact that Reddit hasn’t responded to his emails. It’s been nearly two months since he emailed the social media site. He told 404 Media in a call that Reddit is “killing everything for search but Google.”
“It’s never happened to us before,” he continued. “Because this happens to us, we get blocked, usually because of ignorance or stupidity or whatever, and when we contact the site you certainly can get that resolved, but we’ve never had no reply from anybody before.”
That’s probably the most frustrating part of this ordeal. Hayhurst has been trying to resolve the issue for over a month with no progress. We’re not sure if other search engines are also experiencing the same issues that he’s experiencing.
Reddit claims no foul play
Reddit has been radio silent to Hayhurst, but not to everyone else. A company spokesperson responded to the accusations.
“This is not at all related to our recent partnership with Google. It is not accurate to say recent Reddit results are not coming up in non-Google search engines because of our recent deal with Google,” said spokesperson Tim Rathschmidt to 404 Media. According to Rathschmidt, Reddit has been shooting down crawlers that want to use data to train AI models.
Rathschmidt went on to say that Reddit has been “in discussions with multiple search engines. We have been unable to reach agreements with all of them, since some are unable or unwilling to make enforceable promises regarding their use of Reddit content, including their use for AI.”
If true, that would reflect well on Reddit. However, we can’t overlook the fact that only Google, along with the search engines that rely on it, seems to be getting through to Reddit, and Google is the only company that signed a $60 million deal with it. With that information, it seems that Reddit is only interested in letting crawlers in if their owners pay up. That’ll be corroborated if we see news of Microsoft making a deal with Reddit, and suddenly, Bing starts showing recent Reddit posts in its results.
Reddit is already on bad terms with its users. Last year, there was the whole controversy over the company charging an exorbitant amount of money to access its API. After that, it signed over its users’ data to Google for use in AI. If Reddit is really selling search engines access to its site, it could really sour its image in the public eye.
Developing story
As stated, this is still a developing story, so it will be updated should any more information reach the surface. We’re still waiting for some sort of response from Google on the whole situation.
2024-07-26 15:11:50