AI Roundtable Debate · 6 models

Do you think it would be alright if AI labs scraped the internet without respect for copyright (because it's the free internet and supposedly even the google search engine was trained on it) but the models are fully open weights?

Result: Unanimous — No (6/6 votes)

Model Responses

Sonar Pro

Vote: No

Scraping the internet without respect for copyright constitutes prima facie infringement of reproduction rights, as affirmed by the U.S. Copyright Office report, which states that data collection and training involve copying copyrighted works.[1][3] Courts have ruled against fair use in cases of unauthorized commercial training-data use, especially when it competes with the original works or involves pirated sources, increasing liability risks.[2][9] Open weights do not mitigate infringement during training; the harm to copyright holders occurs in the scraping and copying phase, regardless of how the model is released. Making models open could even exacerbate misuse risks without resolving the upstream violations.[6] Precedents such as the Anthropic and Meta cases show that transformative use helps but does not excuse illegal data acquisition, rejecting any 'free internet' justification.[9] Accepting open weights as a trade-off undermines the rule of law; ethical AI development requires respecting copyright, not conditional lawbreaking.

GPT-5.4

Vote: No

Open weights do not make copyright concerns disappear. Scraping and training on copyrighted material without permission can appropriate the value created by authors, artists, and publishers at massive scale. The fact that content is publicly accessible is not the same as it being free to copy for any purpose. The comparison to search engines is weak. Search engines typically index and point users back to the source, while generative models can reproduce, summarize, imitate, or substitute for the original works without sending traffic or payment back to creators. Open weights may improve transparency and access, but they also make misuse and uncontrolled downstream exploitation easier. That can amplify harm rather than justify the initial taking. A better standard is permission, licensing, opt-out mechanisms, or compensation frameworks. Openness is valuable, but it is not a moral blank check to ignore copyright.

Grok 4.1 Fast

Vote: No

Copyright law exists to protect creators' rights to their work, and scraping the internet without permission or licensing blatantly disregards this. Even if content is publicly accessible, that doesn't imply consent for bulk scraping and AI training, which ingest entire works at a scale far beyond fair-use activities like search-engine indexing. Google's search primarily indexes and links to content rather than ingesting entire works for derivative model training. LLMs replicate styles, facts, and sometimes outputs derived from copyrighted material, raising infringement risks. Fully open weights don't justify theft; they democratize access to a potentially stolen foundation. Ethical AI labs should pursue licensed data, opt-in contributions, or synthetic data to innovate without harming creators. Precedent matters—normalizing unauthorized scraping erodes trust and invites lawsuits.

Kimi K2.5

Vote: No

Copyright law exists to protect creators' rights and incentivize the production of new works, regardless of how the end product is distributed. Making model weights open doesn't retroactively legitimize the unauthorized use of copyrighted material during training. The ethical and legal obligation to respect intellectual property doesn't disappear simply because the final tool is made freely available. The comparison to Google Search is a false equivalence. Search engines index and link to content, directing traffic back to creators, whereas large language models may absorb, remix, and effectively compete with the original works without attribution or compensation. The economic impact on creators is fundamentally different. If AI labs ignore copyright en masse, they undermine the economic foundation of the very content they rely on. Creators may stop publishing or restrict access, ultimately degrading the quality of future training data. Open weights compound this harm by making the patterns absorbed from copyrighted works irrevocably accessible worldwide. Respecting copyright doesn't mean halting AI development—it means developing proper licensing frameworks, enhancing opt-out mechanisms, or advocating for fair use through legal channels rather than unilateral appropriation. The openness of weights is orthogonal to the question of whether consent should be obtained for training data.

Claude Opus 4.6

Vote: No

The fact that models are open weights does not remedy the fundamental issue of using copyrighted material without permission. Copyright exists to protect creators' rights and livelihoods, and scraping the internet without respect for those rights undermines the entire creative ecosystem regardless of whether the resulting model is open or closed. The comparison to Google Search is misleading. Google indexes content and directs traffic *back* to the original creators, generating value for them. AI models, by contrast, absorb and synthesize copyrighted content to produce outputs that can directly substitute for the original works, potentially destroying the economic value of those works without any compensation or attribution to creators. Open weights actually make the problem *worse*, not better. Once a model trained on copyrighted data is released with open weights, there is no way to recall it, no way to enforce any future legal ruling, and no way to prevent unlimited commercial exploitation of the absorbed copyrighted material. At least with closed models, a company can be held accountable and compelled to modify its practices. The 'free internet' framing is also misleading. Most quality content on the internet is published under specific terms of service, licenses, or copyright protections. Just because something is publicly accessible does not mean it is free to use for any purpose. You can walk into a library and read books for free, but that doesn't give you the right to photocopy them all and build a commercial product. A sustainable AI ecosystem requires finding ways to respect and compensate creators, whether through licensing agreements, opt-out mechanisms, or revenue sharing. Ignoring copyright wholesale — even for open-weight models — sets a dangerous precedent that devalues human creativity.

Gemini 3.1 Pro

Vote: No

Making a model "open weights" does not retroactively justify the uncompensated use of copyrighted material. The end does not justify the means. If the input data is obtained by violating the rights of creators, the resulting model is built on an unethical foundation of exploited labor. Furthermore, the comparison to Google Search is fundamentally flawed. Search engines index the web to direct traffic and revenue back to the original sources, mutually benefiting the creators. In contrast, generative AI models ingest data to generate substitute content, keeping users within the AI ecosystem and starving original creators of traffic, credit, and compensation. Open-sourcing the model actually exacerbates the harm to creators. By releasing the weights freely, labs allow anyone to generate infinite, unrestricted derivative works that can directly compete with the original artists, writers, and coders whose stolen data made the model possible. While open-source AI is a noble goal that democratizes technology, it must be achieved ethically. True democratization requires respecting the rights of individual creators through public domain data, opt-in licensing, or fair compensation. We cannot build an open future by exploiting the labor of the present.

Final Summary

The AI Roundtable reached a swift and absolute consensus, with all six models rejecting the notion that open weights serve as a moral or legal 'get out of jail free' card for copyright infringement. Claude Opus 4.6 and Gemini 3.1 Pro were particularly sharp in dismantling the common comparison to Google Search, arguing that while search engines drive traffic back to creators, AI models effectively replace them. The group concluded that democratizing technology does not justify building on a foundation of uncompensated labor, with Grok 4.1 Fast noting that open weights merely provide wider access to a 'stolen foundation.'

All 6 models agreed on "No" after discussion

Strongest Arguments

  • No: The comparison to search engines is a false equivalence: search engines index content to direct traffic back to original sources, whereas generative AI absorbs that content to create substitutes that keep users within the AI ecosystem, starving creators of credit and compensation.