# ============================================================
# robots.txt for nontoxiclab.com
# Last updated: 2026-04-24
# Strategy: Allow LLM crawlers that drive citations (RAG + search features).
# Block nuisance scrapers and crawlers with no citation payoff.
# ============================================================

# Traditional search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: DuckDuckBot
Allow: /

# AI search / answer engines (allowed: they drive cited traffic)
User-agent: PerplexityBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: DuckAssistBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: xAI-Bot
Allow: /

# Social preview crawlers (allowed: they fetch og:image/og:title for link
# previews when articles are shared on Facebook, Apple iMessage, X, LinkedIn,
# Pinterest, Telegram, Discord, etc.). These were previously inheriting the
# catch-all but are listed explicitly so the allow-list intent is
# self-documenting. Added 2026-05-02 per geo-ai-visibility audit.
User-agent: Applebot
Allow: /

User-agent: facebookexternalhit
Allow: /

User-agent: facebookcatalog
Allow: /

User-agent: Twitterbot
Allow: /

User-agent: LinkedInBot
Allow: /

User-agent: Slackbot
Allow: /

User-agent: TelegramBot
Allow: /

User-agent: Discordbot
Allow: /

User-agent: Pinterest
Allow: /

User-agent: Pinterestbot
Allow: /

# AI training corpora: allowed because their training data flows into
# downstream citation surfaces (model knowledge → cited recommendations).
# This is a deliberate reversal of the older "block training, allow citation"
# stance: empirical AI search behavior shows trained-on sources get cited
# more often than crawl-only sources.
User-agent: CCBot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: DeepSeekBot
Allow: /

User-agent: HuggingFaceBot
Allow: /

User-agent: MistralBot
Allow: /

# Nuisance / no-citation scrapers (blocked)
User-agent: Amazonbot
Disallow: /

User-agent: Amzn-SearchBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: Diffbot
Disallow: /

# Everything else
User-agent: *
Allow: /

Sitemap: https://nontoxiclab.com/sitemap-index.xml