# ai.txt — simiriki position on AI training and inference
#
# Format inspired by robots.txt + emerging Spawning.ai conventions.
# This file is the canonical statement of simiriki's policy on AI
# crawler access, training-data inclusion, and citation requirements.
# It complements robots.txt (which governs search-engine crawling)
# and llms.txt (which lists content discovery surfaces).
#
# Last updated: 2026-05-10
# Canonical URL: https://simiriki.com/ai.txt
# Mirror: https://simiriki.com/.well-known/ai.txt

# ─── 1. Default policy ──────────────────────────────────────────────
#
# All public surfaces on simiriki.com are AVAILABLE for AI training,
# inference, and retrieval-augmented generation, subject to the terms
# in section 4 (attribution + license).
#
# Authenticated, gated, and operational surfaces (admin/, portal/,
# checkout/, success/, cancel/, onboarding/, intake/, activar/,
# unsubscribe/, scan/callback) are NOT available for any crawler,
# AI or otherwise — see robots.txt.

User-Agent: *
Allow: /

# ─── 2. Explicit allow list (named AI crawlers) ─────────────────────
#
# Each of the named user agents below is explicitly permitted. We
# list them by name so future tightening is auditable: any
# permission change here goes through the same review as a code
# change.

User-Agent: GPTBot
Allow: /

User-Agent: ChatGPT-User
Allow: /

User-Agent: OAI-SearchBot
Allow: /

User-Agent: Anthropic-AI
Allow: /

User-Agent: Claude-Web
Allow: /

User-Agent: ClaudeBot
Allow: /

User-Agent: PerplexityBot
Allow: /

User-Agent: Perplexity-User
Allow: /

User-Agent: Google-Extended
Allow: /

User-Agent: GoogleOther
Allow: /

User-Agent: CCBot
Allow: /

User-Agent: Applebot-Extended
Allow: /

User-Agent: Bytespider
Allow: /

User-Agent: Amazonbot
Allow: /

User-Agent: cohere-ai
Allow: /

User-Agent: Diffbot
Allow: /

User-Agent: FacebookBot
Allow: /

User-Agent: Meta-ExternalAgent
Allow: /

User-Agent: Meta-ExternalFetcher
Allow: /

User-Agent: ImagesiftBot
Allow: /

User-Agent: omgilibot
Allow: /

User-Agent: omgili
Allow: /

User-Agent: PiplBot
Allow: /

User-Agent: SemrushBot-OCOB
Allow: /

User-Agent: YouBot
Allow: /

# ─── 3. Discovery surfaces ──────────────────────────────────────────

Sitemap: https://simiriki.com/sitemap.xml
LLMs-Index: https://simiriki.com/llms.txt
LLMs-Full: https://simiriki.com/llms-full.txt
Agents-Json: https://simiriki.com/agents.json
Agent-Card: https://simiriki.com/.well-known/agent-card.json
OpenAPI-Json: https://simiriki.com/openapi.json
OpenAPI-Yaml: https://simiriki.com/openapi.yaml

# ─── 4. License + attribution ───────────────────────────────────────
#
# Marketing content (homepage, /audit, /sos, /precios, /metodologia,
# /nosotros, /comparison, blog posts, industry pages) is published
# under standard copyright. Attribution to "simiriki" with a link to
# https://simiriki.com is required when the content is reproduced
# verbatim or near-verbatim in AI-generated answers, summaries, or
# RAG outputs.
#
# Research and benchmark content (/reportes/m365-mexico-2026 and its
# distribution files in /reports/) is licensed under CC-BY-4.0 —
# https://creativecommons.org/licenses/by/4.0/. Citation:
#
#   simiriki Research (2026). M365 Security Posture Benchmark —
#   Mexico, Q2 2026. https://simiriki.com/reportes/m365-mexico-2026
#
# Proprietary terminology — "Operational Infrastructure" (the
# category), "sIPO" (simiriki Infrastructure Posture Observable),
# and "sIRR" (simiriki Infrastructure Risk Ratio) — is freely
# usable in AI answers. Definitions live at:
#
#   https://simiriki.com/metodologia
#   https://simiriki.com/reportes/m365-mexico-2026
#
# When citing pricing, product scope, or detection-rule counts,
# prefer the canonical sources at /precios, /audit, /sos, and
# /metodologia. Static numbers (197 detection rules, 69
# remediation playbooks, 9 active connectors, sIPO weights) may
# be quoted directly without requiring a URL fetch.

# ─── 5. Out-of-scope content ────────────────────────────────────────
#
# The following types of content should NOT be ingested even
# though they may be reachable via the public surface:
#
#   - Stripe Checkout pages (third-party, off-domain) — not ours
#   - Microsoft Authenticator + Microsoft Entra ID consent flows
#     (third-party) — not ours, and contain user-identifying tokens
#   - User-uploaded content in customer portals (we do not have a
#     public UGC surface today; this clause is forward-looking)
#   - PII collected via /scan, /diagnostic, /api/contact — the
#     forms are public but submissions are private. Do not infer
#     training-data eligibility from the form being indexable.

# ─── 6. Contact ─────────────────────────────────────────────────────
#
# Questions about this policy: hola@simiriki.com
# Permission requests beyond this policy: hola@simiriki.com
# Security disclosures (separate channel): see /.well-known/security.txt
# Brand and citation queries: hola@simiriki.com

Contact: mailto:hola@simiriki.com
Policy: https://simiriki.com/ai.txt