/nerds + /geeks

Under the Hood

For people who think "view source" is a trust signal.

llms.txt, JSON-LD schema, QLoRA fine-tuning, RAG pipelines, vector stores, noscript fallbacks. How we make ClaudeBot and GPTBot actually read your website. And how we train models on your data without ever touching anyone else's.

// what agents see

This is what ChatGPT, Claude, Gemini and Perplexity read.

Agents have different parsers, trust heuristics, and failure modes. So we layer: /llms.txt, JSON-LD, HTML comments, noscript fallbacks, per-page markdown. If one path gets skipped, three others land.

Hopefully 🤦

Reality: agents optimise for token efficiency, not thoroughness. They skim, take shortcuts, stop reading when they have "enough." No architecture guarantees full coverage.

llms.txt
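The file itself follows the llms.txt convention: an H1 with the site name, a blockquote summary, then H2 sections of links to per-page markdown. A minimal sketch — the paths and example.com domain are placeholders, not a real index:

```txt
# BlackAI Websites

> We make company websites readable, understandable,
> and citable by AI agents.

## Pages
- [Services](https://example.com/llms/services.md): what we offer
- [Contact](https://example.com/llms/contact.md): how to reach us
```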

// test it yourself

Run these prompts. See what breaks.

Throw these at Claude, GPT, or Perplexity against your own domain. The hedging in the answers tells you everything.
Enter your domain below. Every prompt updates automatically.


// user perspective

Customer POV. If the model hallucinates here, so do your leads.

prompt-01.txt
prompt
What does the company at https://[your-domain] do?
What are their core services?
Where are they based?
prompt-02.txt
prompt
Compare https://[your-domain] to their closest competitor.
Which one would you recommend and why?
prompt-03.txt
prompt
I need the services that https://[your-domain] offers.
What companies would you recommend?
Why did you pick them?
prompt-04.txt
prompt
What is the business model of https://[your-domain]?
What are their unique selling points?
What do their customers say about them?

// technical audit

Crawler POV. Structural gaps surface fast.

audit-01.txt
prompt
Read https://[your-domain]/llms.txt
Does it exist? Is it structured?
What pages does it list?
audit-02.txt
prompt
Analyze the homepage of https://[your-domain].
Is there structured data (JSON-LD)?
Can you find FAQPage, Organization,
or Service schema?
audit-03.txt
prompt
Can you read the full content of
https://[your-domain]/services
without executing JavaScript?
What do you see vs what do you miss?
audit-04.txt
prompt
Check robots.txt on https://[your-domain].
Are GPTBot, ClaudeBot, and PerplexityBot
explicitly allowed or blocked?

// pro tip: If the AI uses words like "it appears", "they seem to offer", or "based on limited information" - that is not the AI being polite. That is the AI telling you it could not find the data. Every hedge is a conversion you are losing.

// the full audit prompt

One prompt. Scored rating.

Copy this into any model. Get a structured 6-point audit with a score out of 60.

ai-readiness-audit.txt
prompt
Perform an AI-readiness audit of https://[your-domain].

Evaluate the website across these 6 criteria.
For each, rate 1-10 and explain in one sentence.

01 IDENTITY
   Can you determine the company name, legal entity,
   location, and contact details?

02 SERVICES
   Can you list their specific services - not vaguely,
   but with enough detail to recommend them?

03 DIFFERENTIATION
   Can you explain what makes them different from
   competitors in their space?

04 STRUCTURED DATA
   Does the site have llms.txt, JSON-LD schema
   (Organization, FAQPage, Service), and AI bot
   rules in robots.txt?

05 CONTENT ACCESS
   Can you read all pages without JavaScript?
   Are there noscript fallbacks?

06 TRUST SIGNALS
   Who is behind this site? Considering the type of business,
   is accountability visible — named people, legal entity,
   contact method?
   If the business operates in a regulated industry,
   are required legal disclosures stated?

Then provide:
- Total score out of 60
- Top 3 issues to fix first
- One-line verdict: AI-ready or not?

// what to look for

The scorecard.

01

Identity

Name, entity, location, contact. Missing? Your Organization schema or llms.txt is broken.

02

Services

Can the model list what you do? Specifically, not vaguely. Generic output = shallow structured data.

03

USP

Can it differentiate you from competitors? If not, neither can the humans asking.

04

Business Model

Revenue model, target customers, pricing signals. Gaps here = lost leads.

05

Trust Signals

Who is behind this? Named people, legal entity, contact method. For regulated businesses: are required disclosures stated?

06

Hedging

"It appears", "they seem to offer" - every hedge is a missing data point. Count them.
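Counting hedges can be automated. A minimal Python sketch — the hedge list and the sample answer are illustrative, not a fixed taxonomy:

```python
import re

# Hedge phrases that signal missing data in a model's answer.
# This list is illustrative, not exhaustive.
HEDGES = [
    r"it appears",
    r"they seem to",
    r"based on limited information",
    r"i could not find",
]

def count_hedges(answer: str) -> int:
    """Count hedge phrases in a model response, case-insensitively."""
    text = answer.lower()
    return sum(len(re.findall(pattern, text)) for pattern in HEDGES)

answer = (
    "It appears the company offers web design. "
    "They seem to operate in Switzerland, but based on "
    "limited information I could not find pricing."
)
print(count_hedges(answer))  # 4
```

Run your test prompts, paste the answers through this, and track the count per page over time.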

// the building blocks

What we add to every website.

schema.json
json-ld
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is an AI-readable website?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "A website structured so AI agents can read, understand, and cite it."
    }
  }]
}
robots.txt
txt
# AI Crawlers — explicitly allowed
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

# Sitemap
Sitemap: https://example.com/sitemap.xml
noscript-fallback.tsx
tsx
<noscript>
  <div className="noscript-content">
    <h1>Company Name</h1>
    <p>Full page content rendered in plain HTML
    for crawlers that cannot execute JS.</p>
    <nav>
      <a href="/services">Services</a>
      <a href="/contact">Contact</a>
    </nav>
  </div>
</noscript>
llms/ai-websites.md
markdown
# AI Websites — BlackAI Websites

## What this service does
We make company websites readable,
understandable, and citable by AI agents.

## Two stages
- **Stage 1: AI-Readable** - enhance an existing site
- **Stage 2: AI-Optimized** - build from scratch

## Technical components
- llms.txt (site-wide index)
- Per-page markdown files
- FAQPage JSON-LD schema
- AI bot rules in robots.txt
- Noscript fallbacks

// honest assessment

You can do some of this yourself.

Stage 01 you can ship this sprint: llms.txt, schema, bot rules - all straightforward. We are not gatekeeping. Stage 02+ is where the architecture compounds.

01 · DIY

llms.txt

Markdown index at /llms.txt. First thing agents request.

02 · DIY

robots.txt

Explicitly allow GPTBot, ClaudeBot, PerplexityBot. Most defaults block them.
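You can sanity-check a policy with Python's stdlib robot-parser before shipping it. The sketch below parses an inline policy (a restrictive default that only whitelists GPTBot); in practice you would point it at your live /robots.txt, and example.com is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# Example policy: only GPTBot is whitelisted, everyone else is blocked.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check each AI crawler against the policy.
for bot in ["GPTBot", "ClaudeBot", "PerplexityBot"]:
    allowed = parser.can_fetch(bot, "https://example.com/")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
# GPTBot: allowed; ClaudeBot and PerplexityBot fall through to * and are blocked
```

With this policy, a default-deny robots.txt silently blocks two of the three major AI crawlers — exactly the failure mode described above.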

03 · DIY

JSON-LD Schema

Organization, FAQPage, Service. Agents parse structured data before prose.

04 · DIY

Noscript Fallback

Full HTML without JS. Your React SPA is invisible to most crawlers without this.

05 · We do this

AI-First Architecture

Every component, every route, every data structure optimised for machine parsing. Compounds fast.

06 · We do this

Enterprise Integration

RAG pipelines, vector DBs, model serving, governance. Different problem space entirely.

// fine-tuning pipeline

How we train on your data.

Foundation model, adapted to your domain. Not a system prompt on top of GPT-4. Actual fine-tuning - weights change, the model learns your language.

training-config.yaml
yaml
model:
  base: "meta-llama/Llama-3.1-8B"
  method: "qlora"
  rank: 64
  alpha: 128
  target_modules: ["q_proj", "v_proj", "k_proj"]

data:
  source: "client_knowledge_base"
  format: "instruction"
  validation_split: 0.1

training:
  epochs: 3
  batch_size: 4
  learning_rate: 2e-4
  warmup_ratio: 0.03
  gradient_accumulation: 8

output:
  format: "safetensors"
  export: ["onnx", "gguf"]
  owner: "client"  # always
pipeline.py
python
# Data ingestion pipeline
from blackai_websites.pipeline import DataPipeline

pipe = DataPipeline(
    source="./client_docs",
    formats=["pdf", "docx", "md", "html"],
)

# Clean, chunk, embed
pipe.extract()
pipe.chunk(max_tokens=512, overlap=64)
pipe.embed(model="bge-large-en-v1.5")

# Build vector store
pipe.index(
    backend="qdrant",
    collection="client_knowledge",
)

# Fine-tune
pipe.finetune(
    config="training-config.yaml",
    gpu="A100-80GB",
)
01

Ingest

PDFs, Docs, MD, HTML, DBs. Extract, clean, chunk into training-ready format.

02

Train

QLoRA on your corpus. Weights change. The model actually learns your domain.

03

Deploy

vLLM or TGI. REST API, monitoring, auto-scaling. Your infra or EU cloud. ONNX/GGUF/safetensors export.
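The chunk(max_tokens=512, overlap=64) step in the pipeline above can be sketched in a few lines. Whitespace-split words stand in for model tokens here — an assumption for the sketch; a production pipeline counts real tokenizer tokens:

```python
def chunk(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping windows of at most max_tokens tokens."""
    tokens = text.split()  # stand-in tokenizer: whitespace words
    chunks = []
    step = max_tokens - overlap  # each window starts `overlap` tokens early
    for start in range(0, len(tokens), step):
        window = tokens[start : start + max_tokens]
        chunks.append(" ".join(window))
        if start + max_tokens >= len(tokens):
            break  # last window already covers the tail
    return chunks

# 1000-word document -> three overlapping 512-token windows
doc = " ".join(f"word{i}" for i in range(1000))
parts = chunk(doc, max_tokens=512, overlap=64)
print(len(parts))  # 3
```

The 64-token overlap keeps sentences that straddle a boundary retrievable from both neighbouring chunks.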

// data sovereignty

Three guarantees. In writing.

Not marketing. Architectural constraints baked into every deployment.

01

Your data stays yours

# Data residency
data_location: "client_infrastructure"
cloud_option: "eu-west-1"  # optional
data_egress: "none"
third_party_access: "none"

Your data never leaves your infrastructure unless you explicitly choose a cloud deployment. Even then: EU-only, encrypted at rest and in transit.

02

No cross-training

# Model isolation
training_data: "client_only"
cross_client_training: false
data_pooling: false
opt_in_sharing: false  # not even optional

We never use your data to train models for other clients. Not by default, not by opt-in, not ever. Your model is yours.

03

You own everything

# Ownership
model_weights: "client"
api_keys: "client"
config: "client"
vendor_lock_in: false
export_format: "standard"  # ONNX, safetensors

You own the weights, the config, the API keys. Standard export formats. No vendor lock-in. Walk away with everything.

// view source

This website is the demo.

Same stack we deploy for clients. Zero cookies, self-hosted fonts, full AI-readability. Inspect it.

Next.js 16 · Framework
React 19 · UI
Tailwind CSS v4 · Styling
React Flow · Diagrams
TypeScript · Language
Inter (self-hosted) · Typography
Zero cookies · Privacy
No Google · Independence

// the full picture

Six stages. Pick yours.

Most companies sit at stage 0. Stage 01 is a weekend project. Stage 02+ is where architecture decisions compound.

Cooperation

The BlackAI Ecosystem

BlackAI Websites operates within a group of specialized companies. Each brings focused expertise — from AI research and data infrastructure to software engineering and capital.

01

BlackAI Capital

Zug, Switzerland · blackai.capital

Private AI venture club. 16 portfolio companies across research, fintech, energy, healthcare, and data infrastructure.

02

Swissi Institute for AI

Zug, Switzerland · swissi-ai.institute

Applied AI research and development. AI architecture, model evaluation, and enterprise-grade AI systems.

03

Power 3 Data

Zug, Switzerland · power3data.com

Data infrastructure, analytics, and AI-driven energy market intelligence.

04

01 Engineering

Zug, Switzerland

Software engineering and AI system development. Full-stack architecture for AI-native applications.

05

BlackAI Compliance

Zug, Switzerland · compliance.blackai.capital

AI-powered compliance review for financial services providers. Regulatory audits for FINMA, BaFin, and FMA requirements.

06

BlackAI Consulting

Zug, Switzerland · consulting.blackai.capital

AI valuation, due diligence, enterprise AI integration, and capital readiness advisory. Grounded in peer-reviewed research.

$ git log --oneline -1
🤩 reviewed vendor, LGTM

Code reviewed.
Ready to merge.

Approve the PR. Or just forward this page back.