Knowledge Extraction

A skill is not just code — it’s structured knowledge. OntoCore extracts this knowledge and compiles it into a queryable ontology.

Note: Embedding generation is an optional step in the compilation pipeline. Install ontocore[embeddings] to produce per-skill vector embeddings for semantic intent search. When not installed, embedding generation is skipped with a warning — BM25 keyword search remains available in the MCP runtime.
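To give a feel for what keyword-only search can still do when embeddings are skipped, here is a minimal BM25 ranking sketch in plain Python. It is not OntoCore's implementation: the function names, the sample skill descriptions, and the `k1`/`b` defaults are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on any non-alphanumeric character."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def bm25_rank(query: str, docs: dict[str, str],
              k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Rank documents against a query with the standard BM25 formula."""
    if not docs:
        return []
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs

    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    query_terms = tokenize(query)
    scores: dict[str, float] = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical skill descriptions, for illustration only.
skills = {
    "pdf": "create pdf documents from text",
    "markdown": "convert markdown files to html",
    "email": "send email messages with attachments",
}
ranking = bm25_rank("create a pdf", skills)
```

Keyword scoring only rewards literal term overlap, which is why the optional embeddings are what enable semantic intent search.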


What gets extracted

Every skill is compiled with:

| Element | Property | Description |
|---|---|---|
| Identity | oc:nature, oc:genus, oc:differentia | "A is a B that C" definition |
| Intents | oc:resolvesIntent | What user intentions this skill resolves |
| Requirements | oc:hasRequirement | Dependencies (EnvVar, Tool, Hardware, API, Knowledge) |
| Knowledge Nodes | oc:impartsKnowledge | Epistemic + operational knowledge (8-15 per skill) |
| State Transitions | oc:requiresState, oc:yieldsState, oc:handlesFailure | Preconditions, outcomes, error handling |
| Execution Payload | oc:hasPayload | Optional code to execute |
| Provenance | oc:generatedBy | Attestation (which LLM compiled it) (optional) |
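The "A is a B that C" pattern is a classical genus-differentia definition. The sketch below shows how such a sentence could be assembled, assuming that oc:genus holds the broader class ("B") and oc:differentia holds the distinguishing trait ("C"). The class name, field names, and sample values are hypothetical and do not come from OntoCore.

```python
from dataclasses import dataclass, field


@dataclass
class SkillIdentity:
    """Illustrative container; field names only echo the oc: properties."""
    name: str
    genus: str        # assumed to be the broader class ("B")
    differentia: str  # assumed to be what sets the skill apart ("C")
    intents: list[str] = field(default_factory=list)  # oc:resolvesIntent values

    def definition(self) -> str:
        # Compose the "A is a B that C" sentence.
        return f"{self.name} is a {self.genus} that {self.differentia}"


# Hypothetical sample values.
pdf = SkillIdentity(
    name="pdf",
    genus="document generation skill",
    differentia="renders text into PDF files",
    intents=["create_pdf"],
)
sentence = pdf.definition()
```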

Components

| Element | Property | Description |
|---|---|---|
| Reference Files | oc:hasReferenceFile | Supporting docs with purpose (api-reference, examples, guide, domain-specific, other) |
| Workflows | oc:hasWorkflow | Multi-step processes with hasStep dependencies |
| Examples | oc:hasExample | Input/output pairs for pattern matching |

Knowledge nodes

Knowledge Nodes are the heart of knowledge extraction. Each skill contains 8-15 of them: structured epistemic rules and operational instructions.

Epistemic Nodes

OntoCore organizes knowledge into 10 dimensions with 26 epistemic node types:

Dimension 1: NormativeRule

Rules that define what’s correct, incorrect, or constrained.

| Type | Description | Example |
|---|---|---|
| Standard | The correct practice | "Use SPARQL for ontology queries" |
| AntiPattern | What to avoid | "Don't read entire files into memory for >100MB" |
| Constraint | Explicit limitations | "Only works on Unix" |

Dimension 2: StrategicInsight

Strategic insights for effective decisions.

| Type | Description | Example |
|---|---|---|
| Heuristic | Rules of thumb | "Prefer streaming for large files" |
| DesignPrinciple | Architectural principles | "One skill = one responsibility" |
| WorkflowStrategy | Process strategies | "Compile dependencies first" |

Dimension 3: ResilienceTactic

How to handle problems and recover.

| Type | Description | Example |
|---|---|---|
| KnownIssue | Known problems | "Timeout on slow networks" |
| RecoveryTactic | How to recover | "Retry with exponential backoff" |

Dimension 4: ExecutionPhysics

Physical characteristics of execution.

| Type | Description | Example |
|---|---|---|
| Idempotency | Safe to repeat | "Compilation is idempotent" |
| SideEffect | Side effects | "Writes files to ontoskills/" |
| PerformanceProfile | Performance characteristics | "O(n) on number of skills" |

Dimension 5: Observability

How to observe and measure.

| Type | Description | Example |
|---|---|---|
| SuccessIndicator | Success signals | ".ttl file generated without SHACL errors" |
| TelemetryPattern | Telemetry patterns | "Log extraction time per skill" |

Dimension 6: SecurityGuardrail

Security implications, destructive potential, and fallback strategies.

| Type | Description | Example |
|---|---|---|
| SecurityImplication | Security implications | "Requires API key in env var" |
| DestructivePotential | Destructive potential | "Can overwrite existing files" |
| FallbackStrategy | Fallback strategies | "Use cache if offline" |

Dimension 7: CognitiveBoundary

Cognitive limits and ambiguity.

| Type | Description | Example |
|---|---|---|
| RequiresHumanClarification | When to ask the user | "Ambiguous intent → ask for confirmation" |
| AssumptionBoundary | Assumptions made | "Assumes UTF-8 encoding" |
| AmbiguityTolerance | Ambiguity tolerance | "Accepts both .md and .MD" |

Dimension 8: ResourceProfile

What the skill costs in tokens and compute.

| Type | Description | Example |
|---|---|---|
| TokenEconomy | Token usage | "SPARQL query: ~100 tokens vs 50KB skill files" |
| ComputeCost | Compute cost | "LLM extraction: ~2s per skill" |

Dimension 9: TrustMetric

How deterministic the skill is and where its data comes from.

| Type | Description | Example |
|---|---|---|
| ExecutionDeterminism | How deterministic | "SPARQL: 100% deterministic" |
| DataProvenance | Data provenance | "Compiled by Claude 4 with verified hash" |

Dimension 10: LifecycleHook

Checks to run before and after execution, and how to roll back on failure.

| Type | Description | Example |
|---|---|---|
| PreFlightCheck | Pre-execution checks | "Verify ANTHROPIC_API_KEY is set" |
| PostFlightValidation | Post-execution validation | "Validate .ttl with SHACL" |
| RollbackProcedure | How to roll back | "Restore from .bak if validation fails" |
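The ten dimensions and their node types can be summarized as a simple lookup table. The sketch below copies the type names from the tables above into a Python dictionary and checks the stated totals. The variable and function names are illustrative and not part of OntoCore.

```python
# Dimension -> epistemic node types, transcribed from the tables above.
EPISTEMIC_DIMENSIONS: dict[str, list[str]] = {
    "NormativeRule": ["Standard", "AntiPattern", "Constraint"],
    "StrategicInsight": ["Heuristic", "DesignPrinciple", "WorkflowStrategy"],
    "ResilienceTactic": ["KnownIssue", "RecoveryTactic"],
    "ExecutionPhysics": ["Idempotency", "SideEffect", "PerformanceProfile"],
    "Observability": ["SuccessIndicator", "TelemetryPattern"],
    "SecurityGuardrail": ["SecurityImplication", "DestructivePotential", "FallbackStrategy"],
    "CognitiveBoundary": ["RequiresHumanClarification", "AssumptionBoundary", "AmbiguityTolerance"],
    "ResourceProfile": ["TokenEconomy", "ComputeCost"],
    "TrustMetric": ["ExecutionDeterminism", "DataProvenance"],
    "LifecycleHook": ["PreFlightCheck", "PostFlightValidation", "RollbackProcedure"],
}


def dimension_of(node_type: str) -> str:
    """Return the dimension that a given epistemic node type belongs to."""
    for dimension, types in EPISTEMIC_DIMENSIONS.items():
        if node_type in types:
            return dimension
    raise KeyError(f"unknown epistemic node type: {node_type}")


total_types = sum(len(types) for types in EPISTEMIC_DIMENSIONS.values())
```

A table like this is handy for validating extracted nodes, for example rejecting any node whose type is not one of the 26 listed.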

Operational Nodes

In addition to epistemic knowledge, OntoCore extracts operational nodes — compact, actionable instructions that tell the agent what to do. These condense verbose skill documentation into directly executable directives.

| Type | Description | Special Fields | Example |
|---|---|---|---|
| Procedure | Ordered step sequence | step_order (integer) | "1. Write failing test → 2. Run → 3. Minimal code → 4. Refactor" |
| CodePattern | Reusable code snippet | code_language | def test_add(): assert add(1,2) == 3 |
| OutputFormat | Expected output template | template_variables | "## Summary\n- Finding\n- Recommendation" |
| Command | CLI command with exact syntax | (none) | pytest tests/ -v --tb=short |
| Prerequisite | Required precondition | (none) | "Python 3.10+ must be installed" |

Each skill generates 3-8 operational nodes. The compiler aggressively compacts multi-line instructions into minimal directives — removing filler words, explanations, and motivational text. Only what the agent needs to do is preserved.

How operational nodes help

| Without operational nodes | With operational nodes |
|---|---|
| Agent reads full SKILL.md (5-20KB) | Agent queries specific directives (~200 bytes) |
| Instructions buried in prose | Numbered procedures, exact commands |
| Code examples mixed with explanation | Minimal snippets with language context |
| Output format implicit | Explicit template with variables |
| Prerequisites scattered | Single prerequisite check list |

Knowledge node structure

Each Knowledge Node has:

Epistemic node (reasoning about the skill):

oc:kn_a1b2c3d4
    a oc:Heuristic ;
    oc:directiveContent "Prefer streaming for files >100MB" ;
    oc:appliesToContext "When processing large files" ;
    oc:hasRationale "Avoids OOM errors on low-RAM machines" ;
    oc:severityLevel "HIGH" .

Operational node (what to do):

oc:kn_e5f6g7h8
    a oc:Procedure ;
    oc:directiveContent "1. Write failing test 2. Run test 3. Write minimal code 4. Refactor" ;
    oc:stepOrder 1 ;
    oc:appliesToContext "When implementing new features" .

| Field | Description |
|---|---|
| directiveContent | The rule, insight, or instruction |
| appliesToContext | When it applies |
| hasRationale | Why this rule exists (epistemic only) |
| severityLevel | Importance: CRITICAL, HIGH, MEDIUM, LOW |
| stepOrder | Step position for Procedure nodes |
| codeLanguage | Programming language for CodePattern nodes |
| templateVariables | Placeholder names for OutputFormat nodes |
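To make the field list concrete, the following sketch serializes a node into Turtle shaped like the examples above. It is an illustration only: the `KnowledgeNode` class, its snake_case attribute names, and the `turtle_literal` helper are hypothetical, and the real compiler may emit Turtle quite differently.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY_LEVELS = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


@dataclass
class KnowledgeNode:
    """Hypothetical in-memory form of a Knowledge Node."""
    node_id: str
    node_type: str
    directive_content: str
    applies_to_context: Optional[str] = None
    has_rationale: Optional[str] = None
    severity_level: Optional[str] = None
    step_order: Optional[int] = None

    def to_turtle(self) -> str:
        if self.severity_level is not None and self.severity_level not in SEVERITY_LEVELS:
            raise ValueError(f"invalid severity level: {self.severity_level}")
        # Collect predicate/object pairs, skipping fields that are not set.
        pairs = [
            ("a", f"oc:{self.node_type}"),
            ("oc:directiveContent", turtle_literal(self.directive_content)),
        ]
        if self.applies_to_context is not None:
            pairs.append(("oc:appliesToContext", turtle_literal(self.applies_to_context)))
        if self.has_rationale is not None:
            pairs.append(("oc:hasRationale", turtle_literal(self.has_rationale)))
        if self.severity_level is not None:
            pairs.append(("oc:severityLevel", turtle_literal(self.severity_level)))
        if self.step_order is not None:
            pairs.append(("oc:stepOrder", str(self.step_order)))
        body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
        return f"oc:{self.node_id}\n{body} ."


node = KnowledgeNode(
    node_id="kn_a1b2c3d4",
    node_type="Heuristic",
    directive_content="Prefer streaming for files >100MB",
    applies_to_context="When processing large files",
    has_rationale="Avoids OOM errors on low-RAM machines",
    severity_level="HIGH",
)
turtle = node.to_turtle()
```

The output omits the `@prefix oc:` declaration, so it is a fragment to be embedded in a file that already declares the prefix.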

Modular architecture

The skill as module

Each compiled skill is a self-contained .ttl file:

ontoskills/
├── core.ttl              # Core TBox (shared)
├── index.ttl             # Manifest with owl:imports
├── pdf/
│   └── ontoskill.ttl     # PDF skill module
├── markdown/
│   └── ontoskill.ttl     # Markdown skill module
└── email/
    └── ontoskill.ttl     # Email skill module

Pluggable knowledge

  • Add a skill → Drop a .ttl file
  • Remove a skill → Delete the .ttl file
  • Update a skill → Replace the .ttl file

The global ontology grows by addition, not modification.
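One way to picture "growth by addition" is a manifest that is regenerated from whatever modules are on disk. The sketch below scans a directory laid out like the tree above and emits a Turtle manifest with one owl:imports entry per module. The `build_index` function and the use of file URIs are assumptions made for illustration, and the actual format of index.ttl may differ.

```python
from pathlib import Path


def build_index(root: Path) -> str:
    """Return Turtle for a manifest importing core.ttl and every */ontoskill.ttl."""
    targets = [root / "core.ttl", *sorted(root.glob("*/ontoskill.ttl"))]
    # Only import files that actually exist; file URIs are an illustrative choice.
    imports = ",\n        ".join(
        f"<{path.resolve().as_uri()}>" for path in targets if path.exists()
    )
    header = "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n\n<> a owl:Ontology"
    if not imports:
        return header + " .\n"
    return f"{header} ;\n    owl:imports {imports} .\n"
```

Under this scheme, dropping a new `ontoskill.ttl` into its own folder or deleting one changes only the regenerated manifest; no existing skill module is edited.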


Querying the knowledge

Find skills by intent

SELECT ?skill WHERE {
  ?skill oc:resolvesIntent "create_pdf"
}

Get knowledge nodes for a skill

SELECT ?content ?type WHERE {
  <skill:pdf> oc:impartsKnowledge ?node .
  ?node oc:directiveContent ?content .
  ?node a ?type .
}

Find all AntiPatterns

SELECT ?skill ?content WHERE {
  ?skill oc:impartsKnowledge ?node .
  ?node a oc:AntiPattern .
  ?node oc:directiveContent ?content .
}

Find all PreFlightChecks

SELECT ?skill ?content WHERE {
  ?skill oc:impartsKnowledge ?node .
  ?node a oc:PreFlightCheck .
  ?node oc:directiveContent ?content .
}
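The AntiPattern and PreFlightCheck queries differ only in the node class, so the pattern can be generated for any node type. The helper below is a sketch: `nodes_of_type_query` is a hypothetical name, and `OC_NAMESPACE` is a placeholder IRI because this page does not state the real `oc:` namespace.

```python
import re

# Placeholder only: replace with the actual oc: namespace of your ontology.
OC_NAMESPACE = "https://example.org/ontocore#"


def nodes_of_type_query(node_type: str, namespace: str = OC_NAMESPACE) -> str:
    """Build a SPARQL query listing every skill's nodes of one class."""
    # Accept plain identifiers only, so the type cannot inject query syntax.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", node_type):
        raise ValueError(f"invalid node type: {node_type!r}")
    return (
        f"PREFIX oc: <{namespace}>\n"
        "SELECT ?skill ?content WHERE {\n"
        "  ?skill oc:impartsKnowledge ?node .\n"
        f"  ?node a oc:{node_type} .\n"
        "  ?node oc:directiveContent ?content .\n"
        "}"
    )


query = nodes_of_type_query("RecoveryTactic")
```

The resulting string can be passed to any SPARQL engine that has the compiled skill graphs loaded.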

The value proposition

| Before (Reading Files) | After (Ontology Query) |
|---|---|
| Parse 50 SKILL.md files | Single SPARQL query |
| ~500KB text scan | ~1KB query |
| Non-deterministic | Exact results |
| Context overflow | Query only what you need |
| LLM interprets | Graph returns |

Knowledge becomes queryable. Intelligence becomes democratized.