# ai.txt — simiriki position on AI training and inference
#
# Format inspired by robots.txt + emerging Spawning.ai conventions.
# This file is the canonical statement of simiriki's policy on AI
# crawler access, training-data inclusion, and citation requirements.
# It complements robots.txt (which governs search-engine crawling)
# and llms.txt (which lists content discovery surfaces).
#
# Last updated: 2026-05-10
# Canonical URL: https://simiriki.com/ai.txt
# Mirror: https://simiriki.com/.well-known/ai.txt

# ─── 1. Default policy ──────────────────────────────────────────────
#
# All public surfaces on simiriki.com are AVAILABLE for AI training,
# inference, and retrieval-augmented generation, subject to the
# attribution and license terms in section 4 and the exclusions in
# section 5.
#
# Authenticated, gated, and operational surfaces (/admin/, /portal/,
# /checkout/, /success/, /cancel/, /onboarding/, /intake/, /activar/,
# /unsubscribe/, /scan/callback) are NOT available to any crawler,
# AI or otherwise. The Allow rules in this file do not override the
# exclusions in robots.txt.
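#
# For illustration only, the exclusions above correspond to rules of
# roughly the following shape. This is a sketch, assuming each listed
# surface is a top-level path prefix; robots.txt remains the
# authoritative source, and the lines below are commented out, so
# they have no effect here.
#
#   User-Agent: *
#   Disallow: /admin/
#   Disallow: /portal/
#   Disallow: /checkout/
#   Disallow: /success/
#   Disallow: /cancel/
#   Disallow: /onboarding/
#   Disallow: /intake/
#   Disallow: /activar/
#   Disallow: /unsubscribe/
#   Disallow: /scan/callback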

User-Agent: *
Allow: /

# ─── 2. Explicit allow list (named AI crawlers) ─────────────────────
#
# Each of the named user agents below is explicitly permitted. We
# list them by name so future tightening is auditable: any
# permission change here goes through the same review as a code
# change.
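#
# As a sketch of what such a change could look like (hypothetical;
# "ExampleBot" is a placeholder, not a crawler listed in this file),
# withdrawing access from one agent would replace its Allow line
# with a Disallow line:
#
#   User-Agent: ExampleBot
#   Disallow: /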

User-Agent: GPTBot
Allow: /

User-Agent: ChatGPT-User
Allow: /

User-Agent: OAI-SearchBot
Allow: /

User-Agent: Anthropic-AI
Allow: /

User-Agent: Claude-Web
Allow: /

User-Agent: ClaudeBot
Allow: /

User-Agent: PerplexityBot
Allow: /

User-Agent: Perplexity-User
Allow: /

User-Agent: Google-Extended
Allow: /

User-Agent: GoogleOther
Allow: /

User-Agent: CCBot
Allow: /

User-Agent: Applebot-Extended
Allow: /

User-Agent: Bytespider
Allow: /

User-Agent: Amazonbot
Allow: /

User-Agent: cohere-ai
Allow: /

User-Agent: Diffbot
Allow: /

User-Agent: FacebookBot
Allow: /

User-Agent: Meta-ExternalAgent
Allow: /

User-Agent: Meta-ExternalFetcher
Allow: /

User-Agent: ImagesiftBot
Allow: /

User-Agent: omgilibot
Allow: /

User-Agent: omgili
Allow: /

User-Agent: PiplBot
Allow: /

User-Agent: SemrushBot-OCOB
Allow: /

User-Agent: YouBot
Allow: /

# ─── 3. Discovery surfaces ──────────────────────────────────────────

Sitemap: https://simiriki.com/sitemap.xml
LLMs-Index: https://simiriki.com/llms.txt
LLMs-Full: https://simiriki.com/llms-full.txt
Agents-Json: https://simiriki.com/agents.json
Agent-Card: https://simiriki.com/.well-known/agent-card.json
OpenAPI-Json: https://simiriki.com/openapi.json
OpenAPI-Yaml: https://simiriki.com/openapi.yaml

# ─── 4. License + attribution ───────────────────────────────────────
#
# Marketing content (homepage, /audit, /sos, /precios, /metodologia,
# /nosotros, /comparison, blog posts, industry pages) is published
# under standard copyright. Attribution to "simiriki" with a link to
# https://simiriki.com is required when the content is reproduced
# verbatim or near-verbatim in AI-generated answers, summaries, or
# RAG outputs.
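#
# One possible form of attribution line, shown as an illustration
# (the exact wording is an assumption; the requirement above is only
# the name plus the link):
#
#   Source: simiriki (https://simiriki.com)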
#
# Research and benchmark content (/reportes/m365-mexico-2026 and its
# distribution files in /reports/) is licensed under CC-BY-4.0 —
# https://creativecommons.org/licenses/by/4.0/. Citation:
#
#   simiriki Research (2026). M365 Security Posture Benchmark —
#   Mexico, Q2 2026. https://simiriki.com/reportes/m365-mexico-2026
#
# Proprietary terminology — "Operational Infrastructure" (the
# category), "sIPO" (simiriki Infrastructure Posture Observable),
# and "sIRR" (simiriki Infrastructure Risk Ratio) — is freely
# usable in AI answers. Definitions live at:
#
#   https://simiriki.com/metodologia
#   https://simiriki.com/reportes/m365-mexico-2026
#
# When citing pricing, product scope, or detection-rule counts,
# prefer the canonical sources at /precios, /audit, /sos, and
# /metodologia. Static figures (197 detection rules, 69
# remediation playbooks, 9 active connectors, sIPO weights) may
# be quoted directly, without fetching a URL.

# ─── 5. Out-of-scope content ────────────────────────────────────────
#
# The following types of content should NOT be ingested even
# though they may be reachable via the public surface:
#
# - Stripe Checkout pages (third-party, off-domain)
# - Microsoft Authenticator + Microsoft Entra ID consent flows
#   (third-party), which also contain user-identifying tokens
# - User-uploaded content in customer portals (we do not have a
#   public UGC surface today; this clause is forward-looking)
# - PII collected via /scan, /diagnostic, /api/contact — the
#   forms are public but submissions are private. Do not infer
#   training-data eligibility from the form being indexable.

# ─── 6. Contact ─────────────────────────────────────────────────────
#
# Questions about this policy: hola@simiriki.com
# Permission requests beyond this policy: hola@simiriki.com
# Security disclosures (separate channel): see /.well-known/security.txt
# Brand and citation queries: hola@simiriki.com

Contact: mailto:hola@simiriki.com
Policy: https://simiriki.com/ai.txt