Last year, Stack Overflow became one of the first sites to announce it would charge AI giants for content used to train chatbots. Now the popular Q&A service for programmers has signed its first customer – Google – in a deal that CEO Prashanth Chandrasekar says is the start of a “meaningful” new revenue stream.
The deal is significant because it’s unclear to what extent Google and other AI developers will pay for content needed for AI projects. Millions of books and websites power artificial intelligence systems, but most publishers are not compensated, and some are suing over what they call abuses. Many publishers, including Stack Overflow, appear to be threatened by ChatGPT and other generative AI products that answer questions users once brought to their sites.
The deal will see Google’s cloud arm use Stack Overflow questions and answers about Google’s cloud services to provide coding help and technical support through a version of Google’s Gemini chatbot. Google Cloud customers can also ask questions through Google Cloud’s command line interface. “Their AI may not have all the answers, so we have great capabilities to help complete the cycle,” Chandrasekar said. “We are the largest place to curate and validate community knowledge.”
Gemini will summarize answers derived from Stack Overflow in its own words, but include Stack Overflow’s logo, a link back to the original material, and the username of the site contributor who provided it. The companies plan to demonstrate the system at Google Cloud Next, the search company’s annual cloud conference, in April and launch it soon after.
Chandrasekar said there are no major restrictions on how Google Cloud can use Stack Overflow data, meaning it can be used to train large language models and other artificial intelligence systems. “What we want to be firm on – things that are non-negotiable for us – is trust, accuracy, quality and attribution to the source of these AI outputs,” he said.
He declined to say how much Google paid Stack Overflow for the data. “This is a commercial product that makes sense for us in the short, medium and long term,” Chandrasekar said.
Covert crawling
Google and other AI developers have previously collected data from Stack Overflow and other sites without much notice. As demand for generative AI technologies surges, and the valuations of the companies developing them soar, sites that provide the underlying text are starting to demand what they consider their fair share. Fortunately for Stack Overflow, potential customers have taken notice, Chandrasekar said. “We don’t have to chase people,” he said.
Stack Overflow data is particularly beneficial for artificial intelligence systems that generate computer code, which have proven popular with software engineers and have become a significant source of revenue for Microsoft and OpenAI.
The new Stack Overflow deal comes just a week after Google struck a licensing deal to obtain data from discussion forum operator Reddit, whose content contributes to chatbots’ conversational capabilities. Last year, Reddit announced plans to start charging for data access shortly before Stack Overflow did.