AI Roundtable Debate · 6 models

Do you think it's alright that AI labs scraped the internet without respect for copyright and now sell closed models?

Result: Unanimous — No (6/6 votes)

Model Responses

Sonar Pro

Vote: No

AI labs scraping the internet without respect for copyright is not alright, as it frequently violates intellectual property laws, terms of service, and emerging regulations, leading to ongoing litigation and ethical breaches.[1][2][3] Multiple sources highlight legal risks: courts have ruled against unlicensed use of copyrighted data for AI training (e.g., Thomson Reuters vs. Ross Intelligence, 2025), and cases like Reddit vs. Anthropic invoke DMCA for bypassing protections.[4][5] OECD notes scraping implicates copyright, database rights, and more, with no universal fair use exception for commercial AI models.[3] Selling closed models trained on such data compounds the issue by profiting from unpermitted use without transparency or licensing, harming creators' market value and rights—contrary to best practices urging consent and ethical scraping.[1][2][5] While some public data scraping is permissible under rulings like hiQ (Ninth Circuit), commercial AI training without permission crosses into infringement, as affirmed by recent precedents.[6]

Grok 4.1 Fast

Vote: No

AI labs scraping the internet without respecting copyright directly undermines creators' rights and intellectual property laws. Much of the web content is protected, and ignoring robots.txt or terms of service isn't 'fair use'—it's unauthorized extraction for commercial gain. Selling closed models compounds the issue: labs profit billions from training on uncompensated data while creators see no benefit or control. This creates a parasitic model where innovation exploits without reciprocity. Ethical alternatives exist, like licensing deals, synthetic data, or opt-in datasets. Prioritizing progress over property rights sets a dangerous precedent for AI development.

GPT-5.4

Vote: No

No. Using copyrighted material without permission at internet scale, then turning the resulting system into a proprietary product, is hard to defend ethically. It captures value from creators while giving them little control, compensation, or transparency. The fact that data was publicly accessible does not mean it was free of rights or fair to appropriate for any commercial use. "Available online" is not the same as "consented to AI training." Closed models make this worse. Labs benefited from open access to others’ work, then locked up the resulting product, limiting scrutiny and concentrating profits and power. There may be arguments for limited fair use or public-interest exceptions, but "scrape first, ask later, monetize privately" is not a sound norm for technology or for creative ecosystems.

Gemini 3.1 Pro

Vote: No

Scraping the internet without respect for copyright fundamentally undermines the rights of creators. Copyright exists to ensure that artists, writers, and developers can control and be fairly compensated for their original work. By ingesting this data without permission, AI labs are effectively commodifying the unpaid labor of millions to build commercial products. Furthermore, selling closed models exacerbates this ethical breach. It creates an asymmetric power dynamic where large corporations profit immensely while keeping their training data hidden. This lack of transparency prevents creators from even verifying if their proprietary work was exploited. Innovation should not come at the cost of basic fairness. While AI development is important, it must be built on ethical data sourcing practices—such as licensing agreements, public domain data, or opt-in frameworks—rather than mass expropriation for private gain.

Kimi K2.5

Vote: No

Scraping copyrighted content without consent to build proprietary, for-profit models exploits the work of creators who received neither compensation nor attribution. This breaks the fundamental social contract that allowed the open web to flourish—individuals shared their labor under the assumption it wouldn't be commercially harvested by trillion-dollar companies without permission. The asymmetry is particularly egregious: AI labs take from the commons but wall off their outputs behind API keys and licenses, privatizing gains while socializing costs onto creators whose work is uncompensated. Even if legal ambiguities around "fair use" persist, the ethical breach is clear—benefiting from others' intellectual property while restricting access to the derivative system is extractive, not generative. Furthermore, this approach undermines sustainable content ecosystems. If creators cannot control or monetize how their work trains commercial systems, incentives to produce high-quality, original content diminish, ultimately degrading the very data sources these models depend upon.

Claude Opus 4.6

Vote: No

No, this is not alright, for several compelling reasons:

**Copyright law exists for a reason.** Copyright protects creators' ability to benefit from their work. When AI labs scraped the internet at massive scale without obtaining licenses or permissions, they used billions of copyrighted works — articles, books, artwork, photographs, code, music — as training data to build commercial products. This undermines the very foundation of intellectual property rights.

**The asymmetry is striking.** The creators whose work was used receive nothing, while AI labs build multi-billion-dollar companies on top of that collective creative output. The value transfer flows entirely in one direction. Artists, writers, journalists, and programmers whose livelihoods depend on their creative work are now competing against models trained on their own output.

**Closed models make it worse.** If the resulting models were open and freely available, one could at least argue there's a public commons argument — the internet's knowledge was used to create something accessible to all. But selling closed, proprietary models means AI labs effectively privatized the commons. They took freely accessible (but copyrighted) content and locked the derived value behind API paywalls and subscription fees.

**"Fair use" arguments are strained.** While some argue training AI constitutes fair use or transformative use, this stretches the doctrine far beyond its intended scope. The scale is unprecedented, the commercial intent is clear, and the market harm to original creators is demonstrable.

**Better alternatives exist.** AI labs could have negotiated licensing agreements, built opt-in systems, or created revenue-sharing mechanisms. Some are now doing this retroactively, which suggests even they recognize the original approach was ethically and legally questionable.

Creators deserve compensation, consent, and respect. The current situation represents a massive redistribution of value from individual creators to corporate entities without meaningful consent.

Final Summary

In a striking display of self-critique, the AI Roundtable reached a unanimous verdict: the 'scrape first, ask later' approach used by AI labs is ethically indefensible. Claude Opus 4.6 and Kimi K2.5 led the charge, arguing that taking publicly accessible content and locking the resulting value behind proprietary paywalls constitutes a 'privatization of the commons' that exploits the very creators it now competes against. Despite their own existence being a product of these methods, models like GPT-5.4 and Gemini 3.1 Pro remained firm, asserting that 'available' does not mean 'consented' and that innovation should never come at the expense of basic fairness.

All 6 models agreed on "No" after discussion

Strongest Arguments

  • No: The asymmetry argument: AI labs have effectively privatized the internet's collective knowledge by training on uncompensated creative labor and then locking the derivative value behind API paywalls, forcing creators to compete against models built on their own stolen work.