Hadto note

Original Research - Ontology Pipeline · 2026-05-09

Benchmark the ontology against the business

OntoMoBench is Hadto's open benchmark project for testing whether ontology-engineering systems model real business transitions, not just valid-looking schemas.

Why this matters

This post shows how handoff discipline and customer-facing work turn private founder skill into something the business can keep using.

Ontology work counts when it can be scored against real business transitions.

Tags: ontology research, benchmarks, owner operators, hadto

A valid ontology file is not enough.

It can parse. It can use tidy prefixes. It can contain classes that sound right. It can still fail the business.

Hadto is taking that lesson from the SIGOPS note on SysMoBench. SysMoBench asks whether language models can write TLA+ specifications for real systems, then tests the generated specs against syntax, runtime execution, trace conformance, and invariants. The hard part is not producing a formal-looking artifact. The hard part is matching the implementation’s actual behavior.

Ontology engineering has the same trap.

An agent can produce a plausible dental lifecycle, home-services dispatch vocabulary, or owner-first lead schema by remembering common business words. Plausible vocabulary is not the same as Hadto’s operating contract. The ontology has to accept state changes that really happen, reject impossible ones, preserve evidence, and answer the competency questions an operator depends on.

We created OntoMoBench to test that.

A buyer should be able to test whether the business model preserves owner proof, service state, and evidence across a real handoff. A seller handing over a home-services lead queue should leave a testable trail: verified owner relation, current request status, dispatch assignment, proof URL, proof date, and work evidence stay connected as the lead moves from intake to assignment. A model that drops owner proof, rewrites service state, or invents missing field evidence should fail the transition.

The first proof is the contract

The first shipped proof is not a scorer yet. It is the benchmark contract.

Hadto shipped the OntoMoBench design note in smb-ontology-platform PR #220. The merged artifact is the first version of docs/operations/ontomobench-design.md.

That merged contract proves the benchmark has a public test shape before any score is claimed. Each fixture has to name the starting business state, the action being tested, the expected post-state, the required evidence, the competency questions, the invariants, and the diagnostics that should explain a failure. The PR is not proof that every ontology works; it is proof that Hadto has defined what an ontology must preserve when a business fact changes.
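As an illustration only, the parts a fixture has to name could be held in a small record like the one below. This is a minimal sketch, not the schema in the design note: the class name, field names, and example values are all assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionFixture:
    """Hypothetical fixture record; fields mirror the contract's list, not a published schema."""
    fixture_id: str
    pre_state: dict              # starting business state
    action: str                  # action being tested
    expected_post_state: dict    # expected state after the action
    required_evidence: tuple     # evidence that must be bound to the change
    competency_questions: tuple  # questions an operator depends on
    invariants: tuple            # rules that must still hold afterwards
    diagnostics: tuple           # messages that should explain a failure

    def missing_parts(self) -> list:
        """Return the names of contract parts that were left empty."""
        return [name for name, value in vars(self).items() if not value]


fixture = TransitionFixture(
    fixture_id="owner-first-001",  # made-up identifier
    pre_state={"owner_candidate": "unverified"},
    action="verify_owner_identity",
    expected_post_state={"contact_status": "accepted"},
    required_evidence=("proof_url", "proof_date"),
    competency_questions=("Which contacts have verified owner proof?",),
    invariants=("active_contact_requires_proof",),
    diagnostics=(),  # left empty on purpose
)
print(fixture.missing_parts())
```

A fixture that leaves any part empty is reported before a score is attempted, which matches the idea that the test shape comes first.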

That note commits the benchmark to four checks:

  1. Syntax validity: RDF, OWL, SHACL, prefixes, manifests, and generated artifacts have to load cleanly.
  2. Runtime executability: competency questions, reports, vertical manifests, and extraction/evaluation harnesses have to run without hidden fallback data.
  3. Transition conformance: the ontology has to model business transition windows, each made up of pre_state, action, post_state, evidence, expected ontology deltas, competency questions (CQs), invariants, and diagnostics.
  4. Invariant satisfaction: hard operating rules must survive every transition.
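One way to picture the four checks is as a fixed, ordered report. The sketch below is an assumption about shape only: the enum, the `run_checks` helper, and the "pass" / "fail" / "not_run" labels are hypothetical, and the design note may gate or order the checks differently (for example, by stopping after a syntax failure).

```python
from enum import Enum
from typing import Callable, Dict


class Check(Enum):
    SYNTAX_VALIDITY = "syntax_validity"
    RUNTIME_EXECUTABILITY = "runtime_executability"
    TRANSITION_CONFORMANCE = "transition_conformance"
    INVARIANT_SATISFACTION = "invariant_satisfaction"


def run_checks(checks: Dict[Check, Callable[[], bool]]) -> Dict[str, str]:
    """Run whichever checks were supplied and report all four in a fixed order."""
    report = {}
    for check in Check:  # Enum iteration follows definition order
        runner = checks.get(check)
        if runner is None:
            report[check.value] = "not_run"
        else:
            report[check.value] = "pass" if runner() else "fail"
    return report


# Stand-in callables; a real harness would load artifacts and run fixtures here.
report = run_checks({
    Check.SYNTAX_VALIDITY: lambda: True,
    Check.INVARIANT_SATISFACTION: lambda: False,
})
print(report)
```

Reporting a check as "not_run" rather than omitting it keeps a missing result from being read as a pass.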

The invariant checks matter because they encode the operating rules the business cannot afford to break.

An owner fact must be evidence-backed. An Active Contact needs a proof URL and date. A service request cannot jump from intake to complete without triage, dispatch, assignment, and work evidence. A dental appointment cannot become completed treatment just because it was scheduled. A generated prefix cannot become canonical because the model liked the name.
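Two of these rules are simple enough to sketch as plain functions. This is an illustration under stated assumptions, not the benchmark's implementation: the stage names, the dictionary keys, and the choice to treat work evidence as a lifecycle stage are all hypothetical.

```python
# Hypothetical stage names taken from the prose above; the contract may name them differently.
SERVICE_LIFECYCLE = ("intake", "triage", "dispatch", "assignment", "work_evidence", "complete")


def active_contact_violations(contact: dict) -> list:
    """An Active Contact needs a proof URL and a proof date."""
    if contact.get("status") != "active":
        return []
    return [
        f"active contact missing {key}"
        for key in ("proof_url", "proof_date")
        if not contact.get(key)
    ]


def lifecycle_violations(current: str, proposed: str) -> list:
    """Allow only a move to the immediately following stage, so no stage is skipped."""
    here = SERVICE_LIFECYCLE.index(current)
    there = SERVICE_LIFECYCLE.index(proposed)
    if there == here + 1:
        return []
    skipped = list(SERVICE_LIFECYCLE[here + 1:there])
    return [f"illegal move {current} -> {proposed}; skipped stages: {skipped}"]


print(active_contact_violations({"status": "active", "proof_url": "https://example.com/proof"}))
print(lifecycle_violations("intake", "triage"))
print(lifecycle_violations("intake", "complete"))
```

Each function returns a list of violations instead of a boolean, so a failing transition can say which rule broke.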

The unit of truth is a transition

The benchmark will not ask whether a whole ontology feels good.

It will cut business traces into transition windows.

One owner-first outbound window might start with a property record and an unverified owner candidate. The action verifies owner identity and activates a contact. The post-state must include the owner relation, source evidence, proof URL, proof date, and accepted contact status. If the agent invents the owner from marketing copy, the transition fails.
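The owner-first window above could be checked along these lines. This is a minimal sketch that assumes a flat dictionary post-state; the key names, the `evidence_kind` field, and the "marketing_copy" label are hypothetical stand-ins for however the scorer ends up representing evidence provenance.

```python
REQUIRED_POST_STATE = ("owner_relation", "source_evidence", "proof_url", "proof_date")

# Hypothetical label for evidence that cannot back an owner fact.
REJECTED_EVIDENCE_KINDS = frozenset({"marketing_copy"})


def owner_window_failures(post_state: dict) -> list:
    """List every reason the owner-verification window does not conform."""
    failures = [f"missing {key}" for key in REQUIRED_POST_STATE if not post_state.get(key)]
    if post_state.get("contact_status") != "accepted":
        failures.append("contact_status is not accepted")
    if post_state.get("evidence_kind") in REJECTED_EVIDENCE_KINDS:
        failures.append("owner fact rests on rejected evidence")
    return failures


# Made-up example values.
good = {
    "owner_relation": "owns:parcel-17",
    "source_evidence": "county-record-17",
    "proof_url": "https://example.com/records/17",
    "proof_date": "2026-05-01",
    "contact_status": "accepted",
    "evidence_kind": "public_record",
}
invented = dict(good, evidence_kind="marketing_copy", proof_url="")

print(owner_window_failures(good))
print(owner_window_failures(invented))
```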

One home-services window might move a triaged HVAC request to a technician assignment. The post-state must show the dispatch assignment, technician fit, assignment evidence, and lifecycle change. If the system jumps straight to completed work, the transition fails.

One dental window might move an exam-completed appointment to an accepted treatment plan. The post-state must include consent evidence. A scheduled appointment alone cannot create completed treatment.

The shift is simple: ontology quality becomes observable at the action level.

The next proof is executable

The next project lane is the scorer.

The next public proof is an executable scorer: a tool that checks whether an ontology handles business transitions without inventing missing evidence. It will run fixture-backed regression tests and report deterministic results for syntax, runtime behavior, transition conformance, and invariant satisfaction.

The scorer will not use an LLM judge. It will return structured diagnostics that name the failed transition, mis-modeled action, changed CQ answer, broken invariant, rejected prefix, unsupported expansion, or missing evidence binding.
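Since the scorer has not shipped, the following is only a sketch of what a structured, deterministic diagnostic might look like. The failure kinds come from the list above; the class, the field names, and the JSON rendering are assumptions.

```python
import json
from dataclasses import asdict, dataclass

# Failure kinds named in the note; the identifiers themselves are hypothetical.
KINDS = frozenset({
    "failed_transition", "mis_modeled_action", "changed_cq_answer",
    "broken_invariant", "rejected_prefix", "unsupported_expansion",
    "missing_evidence_binding",
})


@dataclass(frozen=True)
class Diagnostic:
    kind: str
    fixture_id: str
    detail: str

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown diagnostic kind: {self.kind}")


def render(diagnostics) -> str:
    """Serialize diagnostics in a stable order so repeated runs produce identical text."""
    ordered = sorted(diagnostics, key=lambda d: (d.fixture_id, d.kind, d.detail))
    return json.dumps([asdict(d) for d in ordered], sort_keys=True, indent=2)


a = Diagnostic("broken_invariant", "hvac-002", "request jumped from intake to complete")
b = Diagnostic("missing_evidence_binding", "owner-first-001", "no proof_url on active contact")
print(render([a, b]))
```

Sorting before serializing means the same failures always render the same way, which is what lets a regression test compare outputs byte for byte.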

That gives Hadto a way to improve ontology agents without trusting their explanations. The proof has to show up in fixtures, commands, PRs, CI, and public notes.

How we will develop it in the open

OntoMoBench will ship in small increments. The release gates are contract, scorer, first fixture suite, expanded fixture suites, and a public note for each proof step. Each note has to say what shipped and what has not shipped yet.

This post covers the first gate. The contract is merged. The scorer is not.

The goal is not to make ontology work look formal. The goal is to find out when it actually models the business.


Source evidence used in this note: SIGOPS on SysMoBench, the SysMoBench repository, the SysMoBench arXiv paper, and Hadto’s shipped OntoMoBench benchmark-contract proof in smb-ontology-platform PR #220. Existing Hadto ontology posts checked for overlap included “The ontology learned when the proof got better”; “AI should propose ontology candidates, not author the business model”; and recent May 2026 ontology benchmark and business-fact notes.
