Intent Discovery
Overview
Intent Discovery enables LLM agents to find skills by natural language intent without knowing exact intent strings. BM25 keyword search is the default and is always available. Semantic embeddings are optional and recommended only for large skill catalogs where keyword matching may miss relevant results.
Solution: Convention (C) + Schema Summary (A) + Intent Discovery
| Component | Purpose |
|---|---|
| Convention | Predictable naming (verb_noun for intents, camelCase for properties) |
| Schema Summary | MCP Resource ontology://schema — 2KB compact schema |
| ontoskill | MCP Tool — unified skill discovery and context retrieval (BM25 + optional semantic) |
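The naming convention in the table lends itself to a mechanical check. The sketch below is illustrative only: the regular expressions, the function names, and the sample property name `hasIntent` are assumptions rather than part of the toolchain, and they read "verb_noun" as lowercase words joined by underscores.

```python
import re

# Assumed patterns, not taken from the project source.
# Intents: two or more lowercase words joined by underscores, e.g. create_pdf.
INTENT_RE = re.compile(r"[a-z]+(?:_[a-z0-9]+)+")
# Properties: camelCase with a lowercase first letter and no underscores.
PROPERTY_RE = re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*")


def is_valid_intent(name: str) -> bool:
    """Return True if name looks like a verb_noun intent."""
    return INTENT_RE.fullmatch(name) is not None


def is_valid_property(name: str) -> bool:
    """Return True if name looks like a camelCase property."""
    return PROPERTY_RE.fullmatch(name) is not None


# "hasIntent" is a hypothetical property name used only for illustration.
print(is_valid_intent("create_pdf"), is_valid_property("hasIntent"))
```

A check like this could run in a linter or test suite so that agents can keep relying on predictable names.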
BM25 Keyword Search
BM25 is the default search method and is always available. It requires no external dependencies or model downloads — the index is built in-memory from Catalog data at MCP server startup.
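To make the ranking concrete, here is a minimal pure-Python sketch of Okapi BM25 over an in-memory index. It is not the server's implementation, which uses the Rust bm25 crate. The class name `Bm25Index`, the parameter values, and the simplified tokenizer (lowercasing only, no stemming or stop words) are all assumptions, and the `skills` field of the real response is omitted for brevity.

```python
import math
import re
from collections import Counter

K1, B = 1.2, 0.75  # common BM25 defaults; the server's actual parameters may differ


def tokenize(text: str) -> list:
    # Lowercase word split only; the real index also applies stemming and stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


class Bm25Index:
    """Hypothetical in-memory index built once, e.g. at server startup."""

    def __init__(self, docs: dict):
        # docs maps an intent name to its searchable text (intent, aliases, description).
        self.ids = list(docs)
        self.tokens = [tokenize(docs[i]) for i in self.ids]
        self.tf = [Counter(t) for t in self.tokens]
        self.avg_len = sum(len(t) for t in self.tokens) / max(len(self.tokens), 1)
        self.df = Counter(term for t in self.tokens for term in set(t))

    def idf(self, term: str) -> float:
        n, df = len(self.ids), self.df.get(term, 0)
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def search(self, query: str, top_k: int = 5) -> dict:
        matches = []
        for doc_id, tokens, tf in zip(self.ids, self.tokens, self.tf):
            score = 0.0
            for term in tokenize(query):
                f = tf.get(term, 0)
                if f == 0:
                    continue  # term absent from this document
                norm = f + K1 * (1 - B + B * len(tokens) / self.avg_len)
                score += self.idf(term) * f * (K1 + 1) / norm
            if score > 0:
                matches.append({"intent": doc_id, "score": round(score, 3)})
        matches.sort(key=lambda m: m["score"], reverse=True)
        return {"query": query, "mode": "bm25", "matches": matches[:top_k]}


index = Bm25Index({
    "create_pdf": "create pdf generate pdf document",
    "edit_spreadsheet": "edit spreadsheet modify cells",
})
print(index.search("create a pdf document"))
```

Because the index is just token counts held in memory, building it requires no model download, which is consistent with BM25 being available in every build.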
- Always available: no extra dependencies, no model downloads, no compile-time changes
- Built at startup: the BM25 index is constructed from the Catalog data loaded into memory
- Search fields: skill intents, aliases, and nature descriptions
- Tokenization: English stemming and stop words via the bm25 crate
- Response shape: results include "mode": "bm25" to identify the search method
```json
{
  "query": "create a pdf document",
  "mode": "bm25",
  "matches": [
    {"intent": "create_pdf", "score": 12.4, "skills": ["pdf"]},
    {"intent": "export_document", "score": 8.1, "skills": ["pdf", "document-export"]}
  ]
}
```

Semantic Search (Optional)
Semantic search is only needed for large skill catalogs where keyword matching may not capture the user’s intent. It uses pre-computed embeddings and ONNX inference for semantic similarity matching.
Requirements:
- Compile time: ontocore[embeddings] (Python extra)
- Rust MCP build: --features embeddings
- Falls back to BM25 when semantic confidence is low
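The interplay between semantic search and BM25 might be sketched as follows. This is an assumption-laden illustration: the threshold value, the `"semantic"` mode string, and the function names are hypothetical, since the documentation does not state how confidence is measured.

```python
from typing import Callable, Optional

# A search function takes (query, top_k) and returns matches sorted best first,
# e.g. [{"intent": "create_pdf", "score": 0.92, "skills": ["pdf"]}].
SearchFn = Callable[[str, int], list]

# Hypothetical cutoff; the real threshold is not documented here.
MIN_SEMANTIC_SCORE = 0.5


def search_with_fallback(query: str, top_k: int,
                         semantic: Optional[SearchFn], bm25: SearchFn) -> dict:
    """Prefer semantic results, but use BM25 when they are missing or weak."""
    if semantic is not None:  # None models a build without the embeddings feature
        matches = semantic(query, top_k)
        if matches and matches[0]["score"] >= MIN_SEMANTIC_SCORE:
            return {"query": query, "mode": "semantic", "matches": matches}
    return {"query": query, "mode": "bm25", "matches": bm25(query, top_k)}
```

Passing `semantic=None` models a server built without `--features embeddings`, in which case every query goes straight to BM25.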
Embeddings are pre-computed per skill at compile time and optionally downloaded at install time. At startup, the MCP server scans the ontology tree for per-skill intents.json files. At query time, it runs ONNX inference only on the query text, because the intent embeddings already exist.
```text
┌─────────────────────────────────────────────────────────────────┐
│                      COMPILE-TIME (Python)                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ontocore compile                                               │
│       │                                                         │
│       ├──► ontoskill.ttl (existing)                             │
│       │                                                         │
│       └──► intents.json  # Optional per-skill file              │
│            Pre-computed 384-dim embeddings (L2-normalized)      │
│                                                                 │
│  ontocore export-embeddings  # ONE-TIME: global ONNX model      │
│       │                                                         │
│       └──► model.onnx + tokenizer.json                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       INSTALL-TIME (CLI)                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ontoskills install <package>                                   │
│       │                                                         │
│       └──► Installs ontoskill.ttl + package.json                │
│                                                                 │
│  ontoskills install <package> --with-embeddings                 │
│       │                                                         │
│       ├──► Download model.onnx + tokenizer.json (once, cached)  │
│       └──► Download per-skill intents.json                      │
│                                                                 │
│  MCP server scans per-skill intents.json at startup             │
│  (no centralized merge step needed)                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       RUNTIME (Rust MCP)                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Tools:                                                         │
│    ontoskill(q: str, top_k: int) → SkillContext | SearchResults │
│      │                                                          │
│      ├── If q matches a skill_id → returns full skill context   │
│      └── Otherwise → BM25 search (or semantic if features       │
│          enabled) across intents, aliases, and nature           │
│          descriptions                                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

Per-skill embeddings
When embeddings are enabled, every skill that declares intents gets an intents.json generated next to its ontoskill.ttl during compilation. Skills without declared intents simply skip embedding generation — compilation does not fail.
```text
ontoskills/
└── <skill>/
    ├── ontoskill.ttl
    └── intents.json   # Optional (when embeddings enabled); skipped if no intents
```

intents.json format:

```json
{
  "model": "sentence-transformers/all-MiniLM-L6-v2",
  "dimension": 384,
  "intents": [
    {
      "intent": "edit spreadsheet",
      "embedding": [0.12, -0.05, ...],
      "skills": ["calc-skill"]
    }
  ]
}
```

If a skill has zero declared intents and embeddings are enabled, the compiler skips embedding generation for that skill and logs a warning.
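As a rough illustration of how per-skill files in this format can be consumed, the sketch below walks a directory tree, collects the intent entries, and ranks them against a query vector. It is written in Python for brevity, whereas the actual runtime is the Rust MCP server. The function names are hypothetical, and the query embedding is assumed to come from the ONNX model, a step the sketch does not perform. Because the stored embeddings are L2-normalized, a plain dot product with a normalized query vector equals cosine similarity.

```python
import json
from pathlib import Path


def load_intents(ontology_root: Path) -> list:
    """Collect intent entries from every intents.json under ontology_root."""
    entries = []
    for path in sorted(ontology_root.rglob("intents.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        entries.extend(data.get("intents", []))
    return entries


def rank(query_embedding: list, entries: list, top_k: int = 5) -> list:
    """Rank entries by dot product (cosine similarity for unit vectors)."""
    scored = [
        {
            "intent": e["intent"],
            "score": sum(q * v for q, v in zip(query_embedding, e["embedding"])),
            "skills": e["skills"],
        }
        for e in entries
    ]
    scored.sort(key=lambda m: m["score"], reverse=True)
    return scored[:top_k]
```

Skills without an intents.json simply contribute no entries, which matches the behaviour described above for skills that declare no intents.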
Usage
Compile (mandatory)
```bash
ontocore compile -i skills/ -o ontoskills/
```

This produces one ontoskill.ttl per skill. By default, no embedding dependencies are required. To generate per-skill embeddings, install the embeddings extra:

```bash
pip install ontocore[embeddings]
```

Export ONNX model (one-time)

```bash
ontoskills export-embeddings --ontology-root ./ontoskills --output-dir ./embeddings
```

This creates the global model artifacts (model.onnx + tokenizer.json) that the MCP server uses for query inference. The maintainer publishes them to the registry once.
Install + optional embeddings
```bash
ontoskills install obra/superpowers
```

By default, this installs only ontoskill.ttl + package.json (no embeddings). To include per-skill embedding files for semantic search:

```bash
ontoskills install obra/superpowers --with-embeddings
```

The CLI downloads per-skill intents.json files alongside the skill TTLs. The MCP server discovers them automatically at startup by scanning the ontology tree, so no centralized merge step is needed.
MCP Tool: ontoskill (unified discovery + context)
```json
{
  "name": "ontoskill",
  "arguments": {
    "q": "create a pdf document",
    "top_k": 5
  }
}
```

When q matches a known skill ID, the tool returns the full skill context (payload, knowledge nodes, code examples). Otherwise, it searches across intents, aliases, and nature descriptions using BM25 (or semantic search when embeddings are enabled):

```json
{
  "query": "create a pdf document",
  "matches": [
    {"intent": "create_pdf", "score": 0.92, "skills": ["pdf"]},
    {"intent": "export_document", "score": 0.78, "skills": ["pdf", "document-export"]}
  ]
}
```

Hybrid Scoring
Results are ranked by hybrid score — cosine similarity multiplied by a trust-tier quality multiplier. This ensures higher-trust skills rank above community skills even when their raw similarity is slightly lower.
| Trust Tier | Multiplier | Effect |
|---|---|---|
| local | 1.0 | Neutral for locally compiled skills |
| official | 1.2 | Boosts official/trusted author skills |
| verified | 1.0 | Neutral (baseline) |
| community | 0.8 | Dampens community contributions |
Example: a verified skill with cosine 0.80 (hybrid: 0.80) outranks a community skill with cosine 0.90 (hybrid: 0.72).
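The arithmetic in this example can be reproduced with a few lines of Python. The multipliers come from the table above. The function names, the candidate dictionaries, and the skill names are hypothetical, and raising `KeyError` on an unknown tier is an assumption rather than documented behaviour.

```python
# Multipliers copied from the trust-tier table.
TRUST_MULTIPLIERS = {"local": 1.0, "official": 1.2, "verified": 1.0, "community": 0.8}


def hybrid_score(cosine: float, trust_tier: str) -> float:
    """Cosine similarity scaled by the trust-tier multiplier."""
    return cosine * TRUST_MULTIPLIERS[trust_tier]  # unknown tiers raise KeyError here


def rank_by_hybrid(candidates: list) -> list:
    """Sort candidates (each with 'cosine' and 'trust_tier') best first."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c["cosine"], c["trust_tier"]),
        reverse=True,
    )


# Hypothetical candidates mirroring the example in the text.
ranked = rank_by_hybrid([
    {"skill": "community-skill", "cosine": 0.90, "trust_tier": "community"},
    {"skill": "verified-skill", "cosine": 0.80, "trust_tier": "verified"},
])
print([c["skill"] for c in ranked])
```

Note that with a multiplier of 1.2, an official skill's hybrid score can exceed 1.0, so hybrid scores are best treated as ranking keys rather than as similarities.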
MCP resource: ontology://schema
A compact JSON schema describing available classes and properties:
```json
{
  "version": "0.1.0",
  "base_uri": "https://ontoskills.sh/ontology#",
  "prefix": "oc",
  "classes": { ... },
  "properties": { ... },
  "example_queries": [ ... ]
}
```

Agent workflow
```text
1. Agent starts
   → reads ontology://schema (2KB)
   → Knows all properties and conventions

2. User: "I need to create a PDF"
   → Agent calls: ontoskill(q: "create a pdf", top_k: 3)
   → Returns: full skill context for matching skill,
     or search results with matched intents
     [{intent: "create_pdf", score: 0.92, skills: ["pdf"]}]

3. Agent now has full skill context:
   payload, dependencies, knowledge nodes, code examples
```

Performance targets
| Metric | Target | Verification |
|---|---|---|
| Schema resource size | < 4KB | test_schema_size |
| ontoskill latency (BM25) | < 5ms | Manual benchmark |
| ontoskill latency (semantic) | < 50ms | Manual benchmark |
| ONNX model size | ~90MB | Check file size |
| Memory footprint (without embeddings) | < 50MB | Monitor with top |
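For the rows marked "Manual benchmark", one way to collect latency figures is a small timing harness such as the sketch below. The function name and the choice of median and 95th percentile are assumptions. The callable passed in stands in for a single call to the ontoskill tool, and how that call reaches the MCP server is outside the scope of this sketch.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], runs: int = 100, warmup: int = 10) -> dict:
    """Time fn() over several runs and report median and p95 in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return {"median_ms": statistics.median(samples), "p95_ms": p95}


# Placeholder workload; replace the lambda with a real tool invocation.
print(measure_latency_ms(lambda: sum(range(1000)), runs=50))
```

Comparing the reported p95 against the 5 ms and 50 ms targets gives a stricter check than comparing the median alone.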
File structure
```text
~/.ontoskills/
├── ontologies/
│   ├── system/
│   │   ├── index.enabled.ttl
│   │   └── embeddings/
│   │       ├── model.onnx        # Global ONNX model (~90MB)
│   │       └── tokenizer.json    # HuggingFace tokenizer
│   └── author/
│       └── <author>/<pkg>/<skill>/
│           ├── ontoskill.ttl
│           └── intents.json      # Per-skill pre-computed embeddings (optional)
```

Source code:

```text
core/
├── src/embeddings/
│   └── exporter.py        # Per-skill export + ONNX model export

mcp/
├── src/
│   ├── embeddings.rs      # Rust embedding engine (ONNX inference + per-skill scan)
│   ├── bm25_engine.rs     # BM25 knowledge node + section ranking (always available)
│   ├── catalog.rs         # Catalog with trust tier quality multiplier
│   ├── schema.rs          # Schema resource
│   └── main.rs            # MCP tool handlers
```

Dependencies
Python (core/) — compile
```toml
# pyproject.toml: optional dependency (embedding generation)
[project.optional-dependencies]
embeddings = ["sentence-transformers>=2.2.0"]

# pyproject.toml: optional (for export-embeddings only)
optimum>=1.12.0
onnx>=1.15.0
onnxruntime>=1.16.0
```

Rust (mcp/)

```toml
# Mandatory: always included
bm25 = "1"
anyhow = "1.0"

# Optional: behind [features] embeddings
[features]
embeddings = ["ort", "tokenizers", "ndarray"]

[dependencies]
ort = { version = "2.0.0-rc.12", features = ["load-dynamic"], optional = true }
tokenizers = { version = "0.19", optional = true }
ndarray = { version = "0.17", optional = true }
```

Runtime requirement (optional)
When semantic search is enabled (--features embeddings), the ONNX Runtime shared library must be available. The MCP server uses ort with load-dynamic, which looks for libonnxruntime.so at runtime. Set ORT_DYLIB_PATH if needed:
```bash
export ORT_DYLIB_PATH=/path/to/libonnxruntime.so
```

Testing
Python tests
```bash
cd core && python -m pytest tests/test_embeddings.py -v
```

Rust tests

```bash
cd mcp && cargo test
```

E2E test

```bash
bash mcp/tests/e2e_search.sh
```

Related
- OntoCore Compiler — Compilation reference
- MCP Runtime — Tool reference
- CLI Reference — Install and merge commands