Knowledge Extraction
A skill is not just code — it’s structured knowledge. OntoCore extracts this knowledge and compiles it into a queryable ontology.
Note: Embedding generation is an optional step in the compilation pipeline. Install
ontocore[embeddings]to produce per-skill vector embeddings for semantic intent search. When not installed, embedding generation is skipped with a warning — BM25 keyword search remains available in the MCP runtime.
What gets extracted
Every skill is compiled with:
| Element | Property | Description |
|---|---|---|
| Identity | oc:nature, oc:genus, oc:differentia | ”A is a B that C” definition |
| Intents | oc:resolvesIntent | What user intentions this skill resolves |
| Requirements | oc:hasRequirement | Dependencies (EnvVar, Tool, Hardware, API, Knowledge) |
| Knowledge Nodes | oc:impartsKnowledge | Epistemic + operational knowledge (8-15 per skill) |
| State Transitions | oc:requiresState, oc:yieldsState, oc:handlesFailure | Preconditions, outcomes, error handling |
| Execution Payload | oc:hasPayload | Optional code to execute |
| Provenance | oc:generatedBy | Attestation (which LLM compiled it) (optional) |
Components
| Element | Property | Description |
|---|---|---|
| Reference Files | oc:hasReferenceFile | Supporting docs with purpose (api-reference, examples, guide, domain-specific, other) |
| Workflows | oc:hasWorkflow | Multi-step processes with hasStep dependencies |
| Examples | oc:hasExample | Input/output pairs for pattern matching |
Knowledge nodes
The heart of knowledge extraction. Each skill contains 8-15 Knowledge Nodes — structured epistemic rules and operational instructions.
Epistemic Nodes
OntoCore organizes knowledge into 10 dimensions with 26 epistemic node types:
Dimension 1: NormativeRule
Rules that define what’s correct, incorrect, or constrained.
| Type | Description | Example |
|---|---|---|
| Standard | The correct practice | ”Use SPARQL for ontology queries” |
| AntiPattern | What to avoid | ”Don’t read entire files into memory for >100MB” |
| Constraint | Explicit limitations | ”Only works on Unix” |
Dimension 2: StrategicInsight
Strategic insights for effective decisions.
| Type | Description | Example |
|---|---|---|
| Heuristic | Rules of thumb | ”Prefer streaming for large files” |
| DesignPrinciple | Architectural principles | ”One skill = one responsibility” |
| WorkflowStrategy | Process strategies | ”Compile dependencies first” |
Dimension 3: ResilienceTactic
How to handle problems and recover.
| Type | Description | Example |
|---|---|---|
| KnownIssue | Known problems | ”Timeout on slow networks” |
| RecoveryTactic | How to recover | ”Retry with exponential backoff” |
Dimension 4: ExecutionPhysics
Physical characteristics of execution.
| Type | Description | Example |
|---|---|---|
| Idempotency | Safe to repeat | ”Compilation is idempotent” |
| SideEffect | Side effects | ”Writes files to ontoskills/“ |
| PerformanceProfile | Performance characteristics | ”O(n) on number of skills” |
Dimension 5: Observability
How to observe and measure.
| Type | Description | Example |
|---|---|---|
| SuccessIndicator | Success signals | ”.ttl file generated without SHACL errors” |
| TelemetryPattern | Telemetry patterns | ”Log extraction time per skill” |
Dimension 6: SecurityGuardrail
Security guardrails.
| Type | Description | Example |
|---|---|---|
| SecurityImplication | Security implications | ”Requires API key in env var” |
| DestructivePotential | Destructive potential | ”Can overwrite existing files” |
| FallbackStrategy | Fallback strategies | ”Use cache if offline” |
Dimension 7: CognitiveBoundary
Cognitive limits and ambiguity.
| Type | Description | Example |
|---|---|---|
| RequiresHumanClarification | When to ask the user | ”Ambiguous intent → ask for confirmation” |
| AssumptionBoundary | Assumptions made | ”Assumes UTF-8 encoding” |
| AmbiguityTolerance | Ambiguity tolerance | ”Accepts both .md and .MD” |
Dimension 8: ResourceProfile
Resource profile.
| Type | Description | Example |
|---|---|---|
| TokenEconomy | Token usage | ”SPARQL query: ~100 tokens vs 50KB skill files” |
| ComputeCost | Compute cost | ”LLM extraction: ~2s per skill” |
Dimension 9: TrustMetric
Trust metrics.
| Type | Description | Example |
|---|---|---|
| ExecutionDeterminism | How deterministic | ”SPARQL: 100% deterministic” |
| DataProvenance | Data provenance | ”Compiled by Claude 4 with verified hash” |
Dimension 10: LifecycleHook
Lifecycle hooks.
| Type | Description | Example |
|---|---|---|
| PreFlightCheck | Pre-execution checks | ”Verify ANTHROPIC_API_KEY is set” |
| PostFlightValidation | Post-execution validation | ”Validate .ttl with SHACL” |
| RollbackProcedure | How to roll back | ”Restore from .bak if validation fails” |
Operational Nodes
In addition to epistemic knowledge, OntoCore extracts operational nodes — compact, actionable instructions that tell the agent what to do. These condense verbose skill documentation into directly executable directives.
| Type | Description | Special Fields | Example |
|---|---|---|---|
| Procedure | Ordered step sequence | step_order (integer) | “1. Write failing test → 2. Run → 3. Minimal code → 4. Refactor” |
| CodePattern | Reusable code snippet | code_language | def test_add(): assert add(1,2) == 3 |
| OutputFormat | Expected output template | template_variables | ”## Summary\n- Finding\n- Recommendation” |
| Command | CLI command with exact syntax | — | pytest tests/ -v --tb=short |
| Prerequisite | Required precondition | — | “Python 3.10+ must be installed” |
Each skill generates 3-8 operational nodes. The compiler aggressively compacts multi-line instructions into minimal directives — removing filler words, explanations, and motivational text. Only what the agent needs to do is preserved.
How operational nodes help
| Without operational nodes | With operational nodes |
|---|---|
| Agent reads full SKILL.md (5-20KB) | Agent queries specific directives (~200 bytes) |
| Instructions buried in prose | Numbered procedures, exact commands |
| Code examples mixed with explanation | Minimal snippets with language context |
| Output format implicit | Explicit template with variables |
| Prerequisites scattered | Single prerequisite check list |
Knowledge node structure
Each Knowledge Node has:
Epistemic node (reasoning about the skill):
oc:kn_a1b2c3d4 a oc:Heuristic ; oc:directiveContent "Prefer streaming for files >100MB" ; oc:appliesToContext "When processing large files" ; oc:hasRationale "Avoids OOM errors on low-RAM machines" ; oc:severityLevel "HIGH" .Operational node (what to do):
oc:kn_e5f6g7h8 a oc:Procedure ; oc:directiveContent "1. Write failing test 2. Run test 3. Write minimal code 4. Refactor" ; oc:stepOrder 1 ; oc:appliesToContext "When implementing new features" .| Field | Description |
|---|---|
directiveContent | The rule, insight, or instruction |
appliesToContext | When it applies |
hasRationale | Why this rule exists (epistemic only) |
severityLevel | Importance: CRITICAL, HIGH, MEDIUM, LOW |
stepOrder | Step position for Procedure nodes |
codeLanguage | Programming language for CodePattern nodes |
templateVariables | Placeholder names for OutputFormat nodes |
Modular architecture
The skill as module
Each compiled skill is a self-contained .ttl file:
ontoskills/├── core.ttl # Core TBox (shared)├── index.ttl # Manifest with owl:imports├── pdf/│ └── ontoskill.ttl # PDF skill module├── markdown/│ └── ontoskill.ttl # Markdown skill module└── email/ └── ontoskill.ttl # Email skill modulePluggable knowledge
- Add a skill → Drop a
.ttlfile - Remove a skill → Delete the
.ttlfile - Update a skill → Replace the
.ttlfile
The global ontology grows by addition, not modification.
Querying the knowledge
Find skills by intent
SELECT ?skill WHERE { ?skill oc:resolvesIntent "create_pdf"}Get knowledge nodes for a skill
SELECT ?content ?type WHERE { <skill:pdf> oc:impartsKnowledge ?node . ?node oc:directiveContent ?content . ?node a ?type .}Find all AntiPatterns
SELECT ?skill ?content WHERE { ?skill oc:impartsKnowledge ?node . ?node a oc:AntiPattern . ?node oc:directiveContent ?content .}Find all PreFlightChecks
SELECT ?skill ?content WHERE { ?skill oc:impartsKnowledge ?node . ?node a oc:PreFlightCheck . ?node oc:directiveContent ?content .}The value proposition
| Before (Reading Files) | After (Ontology Query) |
|---|---|
| Parse 50 SKILL.md files | Single SPARQL query |
| ~500KB text scan | ~1KB query |
| Non-deterministic | Exact results |
| Context overflow | Query only what you need |
| LLM interprets | Graph returns |
Knowledge becomes queryable. Intelligence becomes democratized.