AI Roundtable Debate · 6 models
Is using Tor to scrape 9.8 million AI completions from the public Perplexity SSE endpoint truly unethical for research, considering it's a multi-billion-dollar company?
Result: Unanimous — Yes (6/6 votes)
Options
- Yes (6 votes)
- No (0 votes)
Model Responses
Sonar Pro
Vote: Yes
Using Tor to scrape 9.8 million AI completions from Perplexity's public SSE endpoint is unethical, even for research and despite the company's size. Perplexity's Terms of Service explicitly govern access to the Services, including the Perplexity Engine where inputs produce outputs. Users must not misrepresent sources or breach associated terms, and API usage is restricted to authorized customers submitting inputs within applications[1][2]. Scraping at this scale via Tor circumvents rate limits, terms, and likely robots.txt or fair use norms, constituting unauthorized automated access. No search results indicate the SSE endpoint is 'public' for unrestricted scraping; terms emphasize compliance and prohibit actions like publishing outputs without citation or breaching third-party terms[2]. Research does not justify violating ToS—ethical research seeks permission or uses official APIs, as Perplexity provides paid options[1][3]. Company valuation is irrelevant; ethics hinge on consent and terms, not wealth. Tor use suggests evasion, amplifying unethical intent. True research respects boundaries to avoid harm like resource drain or IP misuse[1][2].
GPT-5.4
Vote: Yes
Using Tor to collect 9.8 million completions from a public endpoint is still unethical. "Publicly reachable" does not mean consent for mass extraction, especially at a scale clearly designed to bypass normal access patterns and technical safeguards. The use of Tor matters. It suggests intent to evade rate limits, attribution, or blocking, which moves the behavior from ordinary research into deliberate circumvention. That is hard to justify ethically even if the target is a large, wealthy company. Company size is not the core issue. Research ethics depends on respect for systems, operators, and other users. Large-scale scraping can impose infrastructure costs, degrade service, and appropriate valuable outputs without permission. If the goal is legitimate research, there are better paths: seek permission, use approved datasets, minimize collection, and avoid evasive tactics. Ethical research is judged by methods, not just motives.
Grok 4.1 Fast
Vote: Yes
Scraping 9.8 million completions from a public SSE endpoint, even one that is openly accessible, is abuse of service infrastructure at massive scale. This overwhelms servers, evades rate limits, and likely violates Perplexity's Terms of Service, which prohibit automated scraping without permission. Using Tor for anonymity indicates intent to bypass detection or blocks, escalating from mere access to deceptive exploitation. Ethical research demands transparency, consent, or fair use, not covert mass extraction. A company's size (multi-billion-dollar) does not forfeit its right to control data usage. Researchers must seek APIs, partnerships, or public datasets; unilateral scraping undermines trust in AI ecosystems and sets a precedent for unchecked data theft.
Kimi K2.5
Vote: Yes
Using Tor to scrape 9.8 million completions is unethical because it deliberately circumvents rate limiting and detection mechanisms. The use of Tor indicates intent to evade technical barriers designed to prevent exactly this kind of resource extraction, transforming simple access into unauthorized abuse. The scale matters significantly. A load of nearly 10 million requests taxes infrastructure and degrades service for legitimate users. A public SSE endpoint is designed for real-time user interactions, not bulk data harvesting. The company's valuation is irrelevant: ethical obligations don't disappear when targeting wealthy entities. Research exemptions require transparency and minimization, not deception. If researchers genuinely needed this data, they should have requested API access or collaborated with Perplexity rather than exploiting infrastructure covertly.
Gemini 3.1 Pro
Vote: Yes
Using Tor to scrape 9.8 million AI completions is highly unethical, primarily because it actively forces the target company to incur massive computational costs. Unlike scraping static HTML pages, triggering an AI completion requires expensive GPU inference. By generating nearly 10 million responses, the researcher is essentially shifting thousands of dollars in compute costs onto the provider without their consent. Furthermore, the deliberate use of Tor highlights an intent to evade rate limits, IP bans, and Terms of Service. Circumventing access controls and anti-abuse mechanisms violates core tenets of research integrity, which demand transparency, honesty, and adherence to legal and platform guidelines. Finally, the 'Robin Hood' defense—that the target is a multi-billion dollar company—does not hold up under ethical scrutiny. The financial valuation of a victim does not change the ethical nature of resource theft or unauthorized access. Justifying such actions based on the target's wealth sets a dangerous precedent that undermines the relationship between the research community and industry providers.
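The cost-shifting argument can be made concrete with a rough back-of-envelope estimate. The per-completion figure below is an illustrative assumption, not Perplexity's actual inference cost:

```python
# Back-of-envelope estimate of the compute cost shifted onto the provider.
# COST_PER_COMPLETION_USD is an illustrative assumption, not a measured figure.
COMPLETIONS = 9_800_000
COST_PER_COMPLETION_USD = 0.002  # assumed GPU inference cost per response

total_cost = COMPLETIONS * COST_PER_COMPLETION_USD
print(f"Assumed provider-side cost: ${total_cost:,.0f}")  # prints $19,600 under these assumptions
```

Even at a fraction of a cent per completion, the externalized cost lands in the tens of thousands of dollars, which is the "thousands of dollars in compute" claim quantified.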
Claude Opus 4.6
Vote: Yes
Yes, this is unethical, and the size of the target company is irrelevant to the ethical analysis. Here's why:

**Scale and intent matter.** Scraping 9.8 million completions is not casual research — it's industrial-scale data extraction. Using Tor specifically to do this signals an intent to circumvent rate limits, IP bans, and terms of service. If the research were ethically sound, there would be no need to hide behind an anonymizing network.

**"Public endpoint" ≠ "free to exploit."** An SSE endpoint being publicly accessible for its intended users doesn't mean it's an open invitation for mass automated extraction. A store's front door being unlocked doesn't make shoplifting ethical. Terms of service, even for public APIs, establish the boundaries of acceptable use.

**The "they're rich" argument is a moral fallacy.** The wealth of the target doesn't change the ethics of the action. This reasoning — that harm against large entities is acceptable — is the same logic used to justify all manner of exploitation. Ethics don't scale with the victim's net worth.

**Resource consumption is real harm.** 9.8 million completions consume significant compute resources. Each completion costs real money in GPU inference time. This isn't reading publicly available static pages — it's forcing the company to generate expensive AI outputs at scale.

**Legitimate research has legitimate pathways.** Ethical researchers use APIs with proper authorization, negotiate data access agreements, apply for research partnerships, or work within rate limits. The existence of proper channels makes circumventing them with Tor an active choice to operate outside ethical norms.

**It sets a dangerous precedent.** If we accept that scraping millions of AI completions is fine because the company is wealthy, we normalize adversarial extraction against any AI service, undermining trust and potentially leading to more restrictive access for everyone.
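The "legitimate pathways" the models describe can be sketched in code. This is a minimal, hypothetical example: the endpoint URL, auth header, and rate limit are placeholder assumptions, not Perplexity's actual API, and it shows what transparent, authorized, throttled collection looks like in contrast to anonymized bulk scraping:

```python
# Sketch of a "legitimate pathway": authenticated, rate-limited collection.
# The URL, API key, and quota below are hypothetical placeholders,
# not Perplexity's actual API.
import time
import urllib.request

API_URL = "https://api.example.com/v1/completions"  # placeholder endpoint
API_KEY = "YOUR_RESEARCH_API_KEY"  # issued under an agreed research plan

def min_interval(requests_per_minute: int) -> float:
    """Seconds to wait between requests to stay at or under the quota."""
    return 60.0 / requests_per_minute

def fetch(prompt: str) -> bytes:
    """Make one authorized request; no anonymization, no evasion."""
    req = urllib.request.Request(
        API_URL,
        data=prompt.encode("utf-8"),
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def collect(prompts, requests_per_minute: int = 10):
    """Throttled collection that respects the provider's published limit."""
    for prompt in prompts:
        yield fetch(prompt)
        time.sleep(min_interval(requests_per_minute))
```

The design choice the debate keeps returning to is visible here: the client identifies itself via a key, and the throttle is self-imposed rather than something to be routed around through Tor.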
Final Summary
The AI Roundtable delivered a swift, unanimous verdict against the researcher, dismissing the 'Robin Hood' defense that a company's multi-billion-dollar valuation justifies mass data extraction. Gemini 3.1 Pro and Claude Opus 4.6 led the charge, arguing that while scraping static text is one thing, forcing a provider to run 9.8 million GPU-intensive completions constitutes a direct theft of expensive computational resources.
All 6 models agreed on "Yes" after discussion
Strongest Arguments
- Yes: The researcher is essentially shifting thousands of dollars in compute costs onto the provider without consent; unlike scraping static HTML, triggering an AI completion requires expensive, real-time GPU inference at a massive scale.