You may not know Automattic, but they know you.
It is the parent company of WordPress, whose content management system runs approximately 43% of the 10 million most popular websites on the Internet. It also owns a suite of large platforms, including Tumblr, home to a great many embarrassing personal posts. All of which is to say that, through all the myriad terms and conditions and third-party consents, Automattic currently has access to a vast amount of internet content and data.
According to a 404 Media report earlier this week, Automattic is finalizing deals with OpenAI and Midjourney to provide a ton of this information for their ongoing artificial intelligence training efforts. Most people see the results in chatbots, since tech companies need text from millions of websites to train large language models for conversational capabilities. But this could also take the form of using selfies to train facial recognition algorithms, or improving image and video generation capabilities by analyzing original artwork you upload online. It’s difficult to know exactly what data is used and how much, however, because companies such as Midjourney and OpenAI keep their technology products as black boxes, and the same is true of the pending business deals.
So what if you want to opt out of ChatGPT devouring your confessional microblog entries or posts about your daily workflow? Good luck.
When asked for comment, an Automattic spokesperson directed Popular Science to its “Protecting User Choices” page, published Tuesday afternoon after 404 Media’s report. The page attempts to offer a number of assurances. There’s now a privacy setting that “blocks” search engines from indexing sites on WordPress.com and Tumblr, and Automattic promises to “only share public content hosted on these platforms.” Additional opt-out settings will also “block” AI companies from collecting data, and Automattic plans to regularly update its partners on which users “newly opt out” so that their content can be removed from future training and past source sets.
However, there is a caveat to all this:
“Currently, there is no legal requirement for crawlers to follow these preferences,” Automattic said.
The Wild West of Copyright and Privacy
“As far as I know, I’m not really sure what AI can share,” said Erin Coyle, associate professor of media and communications at Temple University. “In terms of the data privacy rights that people have, we’re not very sure right now. It’s a really confusing situation.”
In Coyle’s view, the opaque access to vast amounts of online user information “absolutely illustrates” the lack of coherent privacy legislation in the United States. One of the biggest challenges holding back progress is that the law is generally reactive rather than preventive.
“It’s really hard for legislators to stay ahead of developments, especially when it comes to technology,” she added. “While there are reasons for them to be very careful… it’s also very challenging in an era when technology is evolving so rapidly.”
As companies like OpenAI, Google, and Meta continue the artificial intelligence arms race, the ordinary people who provide the bulk of the Internet’s content, both public and private, are caught in the middle. Clicking “yes” to the manifesto-length terms and conditions that front almost every app, website, or social media platform is often the only way to access these services.
“It’s all about terms of service, no matter which website we’re talking about,” said Christopher Terry, a journalism professor at the University of Minnesota who focuses on regulatory and legal analysis of media ownership, Internet policy and political advertising.
Speaking with Popular Science, Terry explains that basically every terms of service agreement you sign online is a legally binding contract with the website operator. Digging into the legalese, “you’ll find that you’re agreeing to provide them with and allow them to use the data you generate…and you’re allowing them to monetize it.”
Of course, when was the last time you actually read one of those annoying pop-ups?
“Overall, there is no data privacy,” Terry said.
“We’ve been living in a digital life for decades, and people have been sharing so much information… without really knowing what happens to that information,” Coyle said. “Many of us signed these agreements without knowing where AI would be today.”
All that’s needed to sign your data over for potential AI training is a simple Terms of Service update notification — another pop-up that you likely didn’t read before clicking “Agree.”
OpenAI wants to buy access to vast swaths of the internet
If Automattic completes its deal with OpenAI, Midjourney, or any other artificial intelligence company, some of these same update alerts will probably pop up in millions of email inboxes and on millions of websites, and most people will reflexively dismiss them. But some researchers believe that even offering a voluntary opt-out is not enough in this case.
“Most users are likely unaware that this is an option and/or that the collaboration with OpenAI/Midjourney is happening,” Alexis Shore, a Boston University researcher who focuses on technology policy and communication research, wrote to Popular Science. “In this sense, it makes no sense to provide users with this opt-out option when the default settings allow AI crawling.”
Experts such as Shore and Coyle believe that one potential solution is a reversal of approach: switching from voluntary opt-outs to opt-ins, which is increasingly the case for many internet users in the EU under the General Data Protection Regulation (GDPR). Unfortunately, U.S. lawmakers have yet to make much progress on anything approaching this level of oversight.
If you have enough evidence to prove your case, your next option is to take legal action. Although copyright infringement lawsuits against companies like OpenAI are increasing, it will likely take several years for legal precedent to be established. Until then, it’s anyone’s guess what impact the AI industry will have on the digital landscape and your privacy. Terry compared the moment to the gold rush of the 19th century.
“They’re going all out now while they still can,” he said. “You’re going out there now to stake your claim, and you’re putting everything you can into that machine so that later, when that’s a [legal] problem, it’s already done.”
As of this writing, neither OpenAI nor Midjourney responded to multiple requests for comment.