Intent Discovery

Overview

Intent Discovery enables LLM agents to find skills by natural language intent without knowing exact intent strings. BM25 keyword search is the default and is always available. Semantic embeddings are optional and recommended only for large skill catalogs where keyword matching may miss relevant results.

Solution: Convention (C) + Schema Summary (A) + Intent Discovery

| Component | Purpose |
|---|---|
| Convention | Predictable naming (verb_noun for intents, camelCase for properties) |
| Schema Summary | MCP Resource ontology://schema (2KB compact schema) |
| ontoskill | MCP Tool: unified skill discovery and context retrieval (BM25 + optional semantic) |

BM25 search requires no external dependencies or model downloads: the index is built in memory from Catalog data at MCP server startup.

  • Always available: no extra dependencies, no model downloads, no compile-time changes
  • Built at startup: the BM25 index is constructed from the Catalog data loaded into memory
  • Search fields: skill intents, aliases, and nature descriptions
  • Tokenization: English stemming and stop words via the bm25 crate
  • Response shape: results include "mode": "bm25" to identify the search method
{
  "query": "create a pdf document",
  "mode": "bm25",
  "matches": [
    {"intent": "create_pdf", "score": 12.4, "skills": ["pdf"]},
    {"intent": "export_document", "score": 8.1, "skills": ["pdf", "document-export"]}
  ]
}
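
To make the ranking step concrete, here is a minimal sketch of Okapi BM25 scoring over the three search fields (intents, aliases, nature descriptions). It is not the server's implementation: the real index is built by the Rust bm25 crate with English stemming, while this sketch only lowercases and drops a few stop words. The `CATALOG` entries, the `k1` and `b` defaults, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical catalog entries; real data comes from the compiled Catalog.
CATALOG = [
    {"intent": "create_pdf", "aliases": ["generate pdf"],
     "nature": "Create a PDF document from content", "skills": ["pdf"]},
    {"intent": "export_document", "aliases": ["save file"],
     "nature": "Export a document to another format",
     "skills": ["pdf", "document-export"]},
    {"intent": "edit_spreadsheet", "aliases": ["update cells"],
     "nature": "Edit cells in a spreadsheet", "skills": ["calc-skill"]},
]

STOP_WORDS = {"a", "an", "the", "to", "of", "in", "from"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop stop words (no stemming)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower())
            if t and t not in STOP_WORDS]


def bm25_search(query: str, top_k: int = 5, k1: float = 1.2, b: float = 0.75) -> dict:
    """Score every catalog entry against the query with Okapi BM25."""
    # One "document" per intent: intent name, aliases, and nature description.
    docs = [tokenize(" ".join([e["intent"], *e["aliases"], e["nature"]]))
            for e in CATALOG]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs for term in set(tokens))

    matches = []
    for entry, tokens in zip(CATALOG, docs):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:
            matches.append({"intent": entry["intent"],
                            "score": round(score, 3),
                            "skills": entry["skills"]})
    matches.sort(key=lambda m: m["score"], reverse=True)
    return {"query": query, "mode": "bm25", "matches": matches[:top_k]}
```

Calling `bm25_search("create a pdf document")` on this toy catalog ranks `create_pdf` above `export_document` and omits `edit_spreadsheet`, which shares no query terms. The absolute scores differ from the sample response above because the catalog and tokenizer are simplified.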

Semantic Search (Optional)

Semantic search is only needed for large skill catalogs where keyword matching may not capture the user’s intent. It uses pre-computed embeddings and ONNX inference for semantic similarity matching.

Requirements:

  • Compile time: ontocore[embeddings] (Python extra)
  • Rust MCP build: --features embeddings
  • Falls back to BM25 when semantic confidence is low

Embeddings are pre-computed per skill at compile time and optionally downloaded at install time. At startup, the MCP server scans the per-skill intents.json files across the ontology tree. At query time, ONNX inference runs only on the query text, because the intent embeddings are already stored.
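
Because the stored embeddings are L2-normalized, cosine similarity reduces to a dot product once the query vector is normalized too. The sketch below illustrates that matching step under stated assumptions: the query embedding is passed in as a plain list (the ONNX inference that would produce it is out of scope), the vectors are 3-dimensional toys rather than 384-dimensional, and the function names are hypothetical.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(a, b))


def semantic_search(query_vec: list[float], intents: list[dict], top_k: int = 5) -> list[dict]:
    """Rank pre-computed, L2-normalized intent embeddings against a query vector."""
    q = l2_normalize(query_vec)
    scored = [
        {"intent": e["intent"], "score": dot(q, e["embedding"]), "skills": e["skills"]}
        for e in intents
    ]
    scored.sort(key=lambda m: m["score"], reverse=True)
    return scored[:top_k]


# Toy data standing in for entries loaded from intents.json files.
TOY_INTENTS = [
    {"intent": "create_pdf", "embedding": l2_normalize([1.0, 0.1, 0.0]), "skills": ["pdf"]},
    {"intent": "edit spreadsheet", "embedding": l2_normalize([0.0, 1.0, 0.2]), "skills": ["calc-skill"]},
]
```

With these toy vectors, a query such as `semantic_search([2.0, 0.1, 0.0], TOY_INTENTS)` puts `create_pdf` first. Only the query needs normalizing at runtime, which is why the per-query cost is one inference plus a dot product per stored intent.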

┌─────────────────────────────────────────────────────────────────┐
│ COMPILE-TIME (Python) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ontocore compile │
│ │ │
│ ├──► ontoskill.ttl (existing) │
│ │ │
│ └──► intents.json # Optional per-skill file │
│ Pre-computed 384-dim embeddings (L2-normalized) │
│ │
│ ontocore export-embeddings # ONE-TIME: global ONNX model │
│ │ │
│ └──► model.onnx + tokenizer.json │
│ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ INSTALL-TIME (CLI) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ontoskills install <package> │
│ │ │
│ └──► Installs ontoskill.ttl + package.json │
│ │
│ ontoskills install <package> --with-embeddings │
│ │ │
│ ├──► Download model.onnx + tokenizer.json (once, cached) │
│ └──► Download per-skill intents.json │
│ │
│ MCP server scans per-skill intents.json at startup │
│ (no centralized merge step needed) │
│ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ RUNTIME (Rust MCP) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Tools: │
│ ontoskill(q: str, top_k: int) → SkillContext | SearchResults │
│ │ │
│ ├── If q matches a skill_id → returns full skill context │
│ ├── Otherwise → BM25 search (or semantic if features enabled) │
│ │ across intents, aliases, and nature descriptions │
│ │
└─────────────────────────────────────────────────────────────────┘
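
The runtime branch in the diagram can be sketched as a small dispatcher. This is an illustration rather than the Rust handler: it assumes the skill-ID check is an exact string match, and the `SKILLS` dictionary, the `kind` field, and the stub search function are all hypothetical.

```python
from typing import Callable

# Hypothetical in-memory catalog keyed by skill ID.
SKILLS = {
    "pdf": {"skill_id": "pdf", "intents": ["create_pdf", "export_document"]},
}


def stub_search(q: str) -> list[dict]:
    """Stand-in for BM25 or semantic search; returns a fixed match list."""
    return [{"intent": "create_pdf", "score": 1.0, "skills": ["pdf"]}]


def ontoskill(q: str, top_k: int,
              skills: dict = SKILLS,
              search: Callable[[str], list[dict]] = stub_search) -> dict:
    """Return full skill context on a skill-ID match, otherwise search results."""
    if q in skills:  # assumption: exact match on the skill ID
        return {"kind": "skill_context", "skill": skills[q]}
    return {"kind": "search_results", "query": q, "matches": search(q)[:top_k]}
```

Here `ontoskill("pdf", 5)` takes the context branch, while `ontoskill("create a pdf", 5)` falls through to search. Passing the search function as a parameter keeps the dispatcher independent of whether BM25 or semantic search is compiled in.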

Per-skill embeddings

When embeddings are enabled, every skill that declares intents gets an intents.json generated next to its ontoskill.ttl during compilation. Skills without declared intents simply skip embedding generation — compilation does not fail.

ontoskills/
└── <skill>/
    ├── ontoskill.ttl
    └── intents.json    # Optional (when embeddings enabled); skipped if no intents

intents.json format:

{
  "model": "sentence-transformers/all-MiniLM-L6-v2",
  "dimension": 384,
  "intents": [
    {
      "intent": "edit spreadsheet",
      "embedding": [0.12, -0.05, ...],
      "skills": ["calc-skill"]
    }
  ]
}

If a skill has zero declared intents and embeddings are enabled, the compiler skips embedding generation for that skill and logs a warning.
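
As a rough illustration of how per-skill files can be discovered without a merge step, the following sketch walks an ontology tree, reads every intents.json it finds, and collects the entries. It assumes the file format shown above; the function name, the `expected_dim` parameter, and the choice to silently skip mismatched files are assumptions, not the server's documented behavior.

```python
import json
from pathlib import Path


def load_intents(ontology_root: Path, expected_dim: int = 384) -> list[dict]:
    """Collect intent entries from every intents.json under ontology_root."""
    entries: list[dict] = []
    for path in sorted(ontology_root.rglob("intents.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if data.get("dimension") != expected_dim:
            # Assumption: files built with a different model dimension are skipped.
            continue
        for entry in data.get("intents", []):
            if len(entry.get("embedding", [])) == expected_dim:
                entries.append(entry)
    return entries
```

Skills that have no intents.json simply contribute nothing, which matches the optional nature of the file.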


Usage

Compile (mandatory)

ontocore compile -i skills/ -o ontoskills/

This produces ontoskill.ttl per skill. By default, no embedding dependencies are required. To generate per-skill embeddings, install the embeddings extra:

# Quotes keep shells such as zsh from expanding the square brackets
pip install "ontocore[embeddings]"

Export ONNX model (one-time)

ontoskills export-embeddings --ontology-root ./ontoskills --output-dir ./embeddings

This creates the global model artifacts (model.onnx + tokenizer.json) that the MCP server uses for query inference. The maintainer publishes these artifacts to the registry once.

Install + optional embeddings

ontoskills install obra/superpowers

By default, this installs only ontoskill.ttl and package.json (no embeddings). To include per-skill embedding files for semantic search:

ontoskills install obra/superpowers --with-embeddings

The CLI downloads per-skill intents.json files alongside the skill TTLs. The MCP server discovers them automatically at startup by scanning the ontology tree — no centralized merge step needed.

MCP Tool: ontoskill (unified discovery + context)

{
  "name": "ontoskill",
  "arguments": {
    "q": "create a pdf document",
    "top_k": 5
  }
}

When q matches a known skill ID, the tool returns the full skill context (payload, knowledge nodes, code examples). Otherwise, it searches across intents, aliases, and nature descriptions using BM25 (or semantic search when embeddings are enabled):

{
  "query": "create a pdf document",
  "matches": [
    {"intent": "create_pdf", "score": 0.92, "skills": ["pdf"]},
    {"intent": "export_document", "score": 0.78, "skills": ["pdf", "document-export"]}
  ]
}

Hybrid Scoring

Results are ranked by hybrid score — cosine similarity multiplied by a trust-tier quality multiplier. This ensures higher-trust skills rank above community skills even when their raw similarity is slightly lower.

| Trust Tier | Multiplier | Effect |
|---|---|---|
| local | 1.0 | Neutral for locally compiled skills |
| official | 1.2 | Boosts official/trusted author skills |
| verified | 1.0 | Neutral (baseline) |
| community | 0.8 | Dampens community contributions |

Example: a verified skill with cosine 0.80 (hybrid: 0.80) outranks a community skill with cosine 0.90 (hybrid: 0.72).
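
The arithmetic in this example can be reproduced with a few lines. The multipliers come from the table above; the function names and the candidate records are hypothetical, and the sketch raises `KeyError` for an unknown tier because the source does not specify a default.

```python
# Multipliers taken from the trust-tier table.
TRUST_MULTIPLIER = {"local": 1.0, "official": 1.2, "verified": 1.0, "community": 0.8}


def hybrid_score(cosine: float, trust_tier: str) -> float:
    """Cosine similarity scaled by the trust-tier quality multiplier."""
    return cosine * TRUST_MULTIPLIER[trust_tier]


def rank(candidates: list[dict]) -> list[dict]:
    """Attach a hybrid score to each candidate and sort best-first."""
    scored = [dict(c, hybrid=hybrid_score(c["cosine"], c["trust_tier"])) for c in candidates]
    return sorted(scored, key=lambda c: c["hybrid"], reverse=True)


# Hypothetical candidates mirroring the example in the text.
CANDIDATES = [
    {"skill": "community-skill", "cosine": 0.90, "trust_tier": "community"},
    {"skill": "verified-skill", "cosine": 0.80, "trust_tier": "verified"},
]
```

`rank(CANDIDATES)` places the verified skill first: its hybrid score stays at 0.80, while the community skill drops to roughly 0.72 (floating-point rounding aside).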

MCP resource: ontology://schema

A compact JSON schema describing available classes and properties:

{
  "version": "0.1.0",
  "base_uri": "https://ontoskills.sh/ontology#",
  "prefix": "oc",
  "classes": { ... },
  "properties": { ... },
  "example_queries": [ ... ]
}

Agent workflow

1. Agent starts → reads ontology://schema (2KB)
   → Knows all properties and conventions
2. User: "I need to create a PDF"
   → Agent calls: ontoskill(q: "create a pdf", top_k: 3)
   → Returns: full skill context for matching skill, or search results
     with matched intents [{intent: "create_pdf", score: 0.92, skills: ["pdf"]}]
3. Agent now has full skill context: payload, dependencies, knowledge nodes, code examples
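
On the wire, steps 1 and 2 correspond to two MCP JSON-RPC requests: `resources/read` for the schema and `tools/call` for the tool. The sketch below only builds the request payloads; transport, session initialization, and response handling are omitted, and the helper names and ID counter are assumptions for illustration.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing JSON-RPC IDs


def read_schema_request() -> dict:
    """Step 1: ask the server for the ontology://schema resource."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "resources/read",
        "params": {"uri": "ontology://schema"},
    }


def ontoskill_request(q: str, top_k: int = 3) -> dict:
    """Step 2: call the ontoskill tool with a natural-language query."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "ontoskill", "arguments": {"q": q, "top_k": top_k}},
    }


if __name__ == "__main__":
    print(json.dumps(read_schema_request()))
    print(json.dumps(ontoskill_request("create a pdf")))
```

An MCP client library would normally build these messages for you; the sketch is only meant to show which method and parameters each workflow step maps to.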

Performance targets

| Metric | Target | Verification |
|---|---|---|
| Schema resource size | < 4KB | test_schema_size |
| ontoskill latency (BM25) | < 5ms | Manual benchmark |
| ontoskill latency (semantic) | < 50ms | Manual benchmark |
| ONNX model size | ~90MB | Check file size |
| Memory footprint (without embeddings) | < 50MB | Monitor with top |
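
Two of these checks are easy to approximate in a script. The sketch below shows one way a schema-size assertion and a manual latency benchmark might look. It is not the project's `test_schema_size`: it assumes "4KB" means 4096 bytes of compact UTF-8 JSON, and the helper names are hypothetical.

```python
import json
import statistics
import time
from typing import Callable

MAX_SCHEMA_BYTES = 4 * 1024  # assumption: the "< 4KB" target means 4096 bytes


def schema_size_bytes(schema: dict) -> int:
    """Size of the schema when serialized as compact UTF-8 JSON."""
    return len(json.dumps(schema, separators=(",", ":")).encode("utf-8"))


def schema_within_target(schema: dict) -> bool:
    """True when the serialized schema is under the size target."""
    return schema_size_bytes(schema) < MAX_SCHEMA_BYTES


def median_latency_ms(fn: Callable[[], object], runs: int = 100) -> float:
    """Median wall-clock time of fn() in milliseconds over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For a latency check, wrap the call under test in a zero-argument function, for example `median_latency_ms(lambda: search("create a pdf"))`, where `search` is whatever client call you are measuring. The median is used instead of the mean so that a single slow run does not dominate the result.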

File structure

~/.ontoskills/
├── ontologies/
│   ├── system/
│   │   ├── index.enabled.ttl
│   │   └── embeddings/
│   │       ├── model.onnx        # Global ONNX model (~90MB)
│   │       └── tokenizer.json    # HuggingFace tokenizer
│   └── author/
│       └── <author>/<pkg>/<skill>/
│           ├── ontoskill.ttl
│           └── intents.json      # Per-skill pre-computed embeddings (optional)

Source code:

core/
├── src/embeddings/
│   └── exporter.py       # Per-skill export + ONNX model export
mcp/
├── src/
│   ├── embeddings.rs     # Rust embedding engine (ONNX inference + per-skill scan)
│   ├── bm25_engine.rs    # BM25 knowledge node + section ranking (always available)
│   ├── catalog.rs        # Catalog with trust tier quality multiplier
│   ├── schema.rs         # Schema resource
│   └── main.rs           # MCP tool handlers

Dependencies

Python (core/) — compile

# pyproject.toml — optional dependency (embedding generation)
[project.optional-dependencies]
embeddings = ["sentence-transformers>=2.2.0"]
# pyproject.toml — optional (for export-embeddings only)
optimum>=1.12.0
onnx>=1.15.0
onnxruntime>=1.16.0

Rust (mcp/)

[features]
# Optional: enables semantic search
embeddings = ["ort", "tokenizers", "ndarray"]

[dependencies]
# Mandatory, always included
bm25 = "1"
anyhow = "1.0"
# Optional, pulled in only by the embeddings feature
ort = { version = "2.0.0-rc.12", features = ["load-dynamic"], optional = true }
tokenizers = { version = "0.19", optional = true }
ndarray = { version = "0.17", optional = true }

Runtime requirement (optional)

When semantic search is enabled (--features embeddings), the ONNX Runtime shared library must be available. The MCP server uses ort with load-dynamic, which looks for libonnxruntime.so at runtime. Set ORT_DYLIB_PATH if needed:

export ORT_DYLIB_PATH=/path/to/libonnxruntime.so

Testing

Python tests

cd core && python -m pytest tests/test_embeddings.py -v

Rust tests

cd mcp && cargo test

E2E test

bash mcp/tests/e2e_search.sh