Knowledge Extraction

A skill is not just code — it’s structured knowledge. OntoCore extracts this knowledge and compiles it into a queryable ontology.


What gets extracted

Every skill is compiled with:

Element | Property | Description
Identity | oc:nature, oc:genus, oc:differentia | “A is a B that C” definition
Intents | oc:resolvesIntent | What user intentions this skill resolves
Requirements | oc:hasRequirement | Dependencies (EnvVar, Tool, Hardware, API, Knowledge)
Knowledge Nodes | oc:impartsKnowledge | Epistemic knowledge (8-12 per skill)
State Transitions | oc:requiresState, oc:yieldsState, oc:handlesFailure | Preconditions, outcomes, error handling
Execution Payload | oc:hasPayload | Optional code to execute
Provenance | oc:generatedBy | Attestation (which LLM compiled it)

Components

Element | Property | Description
Reference Files | oc:hasReferenceFile | Supporting docs with purpose (api-reference, examples, guide, domain-specific, other)
Executable Scripts | oc:hasExecutableScript | Scripts with executor, executionIntent, requirements
Workflows | oc:hasWorkflow | Multi-step processes with hasStep dependencies
Examples | oc:hasExample | Input/output pairs for pattern matching

Knowledge nodes

Knowledge Nodes are the heart of knowledge extraction. Each skill contains 8-12 Knowledge Nodes, each a structured epistemic rule.

The 10 Epistemic Dimensions

OntoCore organizes knowledge into 10 dimensions with 26 node types:

Dimension 1: NormativeRule

Rules that define what’s correct, incorrect, or constrained.

Type | Description | Example
Standard | The correct practice | “Use SPARQL for ontology queries”
AntiPattern | What to avoid | “Don’t read entire files into memory for >100MB”
Constraint | Explicit limitations | “Only works on Unix”

Dimension 2: StrategicInsight

Strategic insights for effective decisions.

Type | Description | Example
Heuristic | Rules of thumb | “Prefer streaming for large files”
DesignPrinciple | Architectural principles | “One skill = one responsibility”
WorkflowStrategy | Process strategies | “Compile dependencies first”

Dimension 3: ResilienceTactic

How to handle problems and recover.

Type | Description | Example
KnownIssue | Known problems | “Timeout on slow networks”
RecoveryTactic | How to recover | “Retry with exponential backoff”

Dimension 4: ExecutionPhysics

Physical characteristics of execution.

Type | Description | Example
Idempotency | Safe to repeat | “Compilation is idempotent”
SideEffect | Side effects | “Writes files to ontoskills/”
PerformanceProfile | Performance characteristics | “O(n) on number of skills”

Dimension 5: Observability

How to observe and measure.

Type | Description | Example
SuccessIndicator | Success signals | “.ttl file generated without SHACL errors”
TelemetryPattern | Telemetry patterns | “Log extraction time per skill”

Dimension 6: SecurityGuardrail

Security implications, destructive potential, and fallback behavior of a skill.

Type | Description | Example
SecurityImplication | Security implications | “Requires API key in env var”
DestructivePotential | Destructive potential | “Can overwrite existing files”
FallbackStrategy | Fallback strategies | “Use cache if offline”

Dimension 7: CognitiveBoundary

Cognitive limits and ambiguity.

Type | Description | Example
RequiresHumanClarification | When to ask the user | “Ambiguous intent → ask for confirmation”
AssumptionBoundary | Assumptions made | “Assumes UTF-8 encoding”
AmbiguityTolerance | Ambiguity tolerance | “Accepts both .md and .MD”

Dimension 8: ResourceProfile

What a skill costs in tokens and compute.

Type | Description | Example
TokenEconomy | Token usage | “SPARQL query: ~100 tokens vs 50KB skill files”
ComputeCost | Compute cost | “LLM extraction: ~2s per skill”

Dimension 9: TrustMetric

How deterministic execution is and where the data comes from.

Type | Description | Example
ExecutionDeterminism | How deterministic | “SPARQL: 100% deterministic”
DataProvenance | Data provenance | “Compiled by Claude 4 with verified hash”

Dimension 10: LifecycleHook

Checks before execution, validation after it, and how to roll back.

Type | Description | Example
PreFlightCheck | Pre-execution checks | “Verify ANTHROPIC_API_KEY is set”
PostFlightValidation | Post-execution validation | “Validate .ttl with SHACL”
RollbackProcedure | How to roll back | “Restore from .bak if validation fails”
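
The ten dimensions above can be written down as a simple lookup table. The following is a minimal sketch in plain Python, not OntoCore's own data model: the names `EPISTEMIC_DIMENSIONS` and `dimension_of` are hypothetical, and only the dimension and type names are taken from the tables.

```python
# Dimension -> node types, transcribed from the tables above.
# The constant and function names are illustrative, not part of OntoCore.
EPISTEMIC_DIMENSIONS: dict[str, list[str]] = {
    "NormativeRule": ["Standard", "AntiPattern", "Constraint"],
    "StrategicInsight": ["Heuristic", "DesignPrinciple", "WorkflowStrategy"],
    "ResilienceTactic": ["KnownIssue", "RecoveryTactic"],
    "ExecutionPhysics": ["Idempotency", "SideEffect", "PerformanceProfile"],
    "Observability": ["SuccessIndicator", "TelemetryPattern"],
    "SecurityGuardrail": ["SecurityImplication", "DestructivePotential", "FallbackStrategy"],
    "CognitiveBoundary": ["RequiresHumanClarification", "AssumptionBoundary", "AmbiguityTolerance"],
    "ResourceProfile": ["TokenEconomy", "ComputeCost"],
    "TrustMetric": ["ExecutionDeterminism", "DataProvenance"],
    "LifecycleHook": ["PreFlightCheck", "PostFlightValidation", "RollbackProcedure"],
}


def dimension_of(node_type: str) -> str:
    """Return the dimension a node type belongs to, or raise KeyError."""
    for dimension, node_types in EPISTEMIC_DIMENSIONS.items():
        if node_type in node_types:
            return dimension
    raise KeyError(f"unknown node type: {node_type}")
```

Summing the list lengths gives the 26 node types stated earlier, which makes the table easy to keep in sync with the documentation.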

Knowledge node structure

Each Knowledge Node has:

oc:kn_a1b2c3d4
    a oc:Heuristic ;
    oc:directiveContent "Prefer streaming for files >100MB" ;
    oc:appliesToContext "When processing large files" ;
    oc:hasRationale "Avoids OOM errors on low-RAM machines" ;
    oc:severityLevel "HIGH" .

Field | Description
directiveContent | The rule or insight
appliesToContext | When it applies
hasRationale | Why this rule exists
severityLevel | Importance: CRITICAL, HIGH, MEDIUM, LOW

Modular architecture

The skill as module

Each compiled skill is a self-contained .ttl file:

ontoskills/
├── core.ttl           # Core TBox (shared)
├── index.ttl          # Manifest with owl:imports
├── pdf/
│   └── ontoskill.ttl  # PDF skill module
├── markdown/
│   └── ontoskill.ttl  # Markdown skill module
└── email/
    └── ontoskill.ttl  # Email skill module

Pluggable knowledge

  • Add a skill → Drop a .ttl file
  • Remove a skill → Delete the .ttl file
  • Update a skill → Replace the .ttl file

The global ontology grows by addition, not modification.
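
One way to picture why adding or removing a file is enough: the manifest can be regenerated from whatever modules are on disk. The sketch below assumes that index.ttl is a single owl:Ontology whose owl:imports list core.ttl plus every `*/ontoskill.ttl`; the source only says the manifest uses owl:imports, so the exact shape, the function name `build_manifest`, and the `base_iri` parameter are assumptions.

```python
from pathlib import Path

OWL_PREFIX = "@prefix owl: <http://www.w3.org/2002/07/owl#> ."


def build_manifest(root: Path, base_iri: str) -> str:
    """Return Turtle for a manifest importing core.ttl and each skill module.

    Assumed layout: <root>/<skill>/ontoskill.ttl, as in the tree above.
    base_iri is a hypothetical IRI prefix ending in "/".
    """
    # Discover skill modules; sorting keeps the output stable across runs.
    modules = sorted(
        path.relative_to(root).as_posix() for path in root.glob("*/ontoskill.ttl")
    )
    imports = [f"{base_iri}core.ttl"] + [f"{base_iri}{module}" for module in modules]
    objects = " ,\n        ".join(f"<{iri}>" for iri in imports)
    return (
        f"{OWL_PREFIX}\n\n"
        f"<{base_iri}index.ttl> a owl:Ontology ;\n"
        f"    owl:imports\n        {objects} .\n"
    )
```

Calling `build_manifest(Path("ontoskills"), "https://example.org/ontoskills/")` after dropping in or deleting a skill directory yields a manifest that reflects the change, without touching any other module.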


Querying the knowledge

Find skills by intent

SELECT ?skill WHERE {
  ?skill oc:resolvesIntent "create_pdf"
}

Get knowledge nodes for a skill

SELECT ?content ?type WHERE {
  <skill:pdf> oc:impartsKnowledge ?node .
  ?node oc:directiveContent ?content .
  ?node a ?type .
}

Find all AntiPatterns

SELECT ?skill ?content WHERE {
  ?skill oc:impartsKnowledge ?node .
  ?node a oc:AntiPattern .
  ?node oc:directiveContent ?content .
}

Find all PreFlightChecks

SELECT ?skill ?content WHERE {
  ?skill oc:impartsKnowledge ?node .
  ?node a oc:PreFlightCheck .
  ?node oc:directiveContent ?content .
}
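
The last two queries share one shape: join skill to node, filter the node by type, and return its directive. As a rough illustration of what that join does, here is a plain-Python emulation over a list of triples. It is not a SPARQL engine and not OntoCore's API; the function name `directives_of_type`, the string-based triple encoding, and the sample data are all made up for this sketch.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object) as plain strings


def directives_of_type(triples: list[Triple], node_type: str) -> list[tuple[str, str]]:
    """Emulate the pattern:
    ?skill oc:impartsKnowledge ?node . ?node a <node_type> .
    ?node oc:directiveContent ?content .
    """
    # Nodes whose rdf:type matches the requested node type.
    typed_nodes = {s for s, p, o in triples if p == "rdf:type" and o == node_type}
    # Directive text per node (a node may carry more than one).
    directives: dict[str, list[str]] = {}
    for s, p, o in triples:
        if p == "oc:directiveContent":
            directives.setdefault(s, []).append(o)
    # Join skills to their matching nodes and collect (skill, content) rows.
    return sorted(
        (skill, content)
        for skill, p, node in triples
        if p == "oc:impartsKnowledge" and node in typed_nodes
        for content in directives.get(node, [])
    )


# Illustrative data only; node IDs and skill assignments are invented.
GRAPH: list[Triple] = [
    ("skill:pdf", "oc:impartsKnowledge", "oc:kn_1"),
    ("oc:kn_1", "rdf:type", "oc:AntiPattern"),
    ("oc:kn_1", "oc:directiveContent", "Don't read entire files into memory for >100MB"),
    ("skill:email", "oc:impartsKnowledge", "oc:kn_2"),
    ("oc:kn_2", "rdf:type", "oc:PreFlightCheck"),
    ("oc:kn_2", "oc:directiveContent", "Verify ANTHROPIC_API_KEY is set"),
]
```

Passing `"oc:AntiPattern"` or `"oc:PreFlightCheck"` as `node_type` reproduces the two queries above against the sample data; a real deployment would run the SPARQL directly against the compiled .ttl files.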

The value proposition

Before (Reading Files) | After (Ontology Query)
Parse 50 SKILL.md files | Single SPARQL query
~500KB text scan | ~1KB query
Non-deterministic | Exact results
Context overflow | Query only what you need
LLM interprets | Graph returns

Knowledge becomes queryable. Intelligence becomes democratized.