Hadto Journal
Semi-structured sources still need a review gate
The latest lessons drawn from Keet on bottom-up ontology development sharpen a practical rule for Hadto: spreadsheets, document stores, and property graphs can supply useful candidate signals, but they should not become ontology commitments without an explicit lifting and contradiction-review step.
A lot of software teams talk about “importing” knowledge as if the hard part were file format conversion.
The latest study pass through Keet’s Chapter 7 points to a stricter rule: a semi-structured source is not ontology truth. It is input to be reviewed.
That distinction matters for Hadto because the company is trying to turn messy operating reality into reusable owner-operator infrastructure. If the platform starts treating spreadsheet headings, document-store attributes, or graph predicates as if they were already clean statements of the business, it will formalize local accidents instead of repeatable business meaning.
Why semi-structured data feels safer than it is
Semi-structured data often looks more expressive than a rigid relational schema.
A document store can carry optional attributes that seem close to real-world nuance. A property graph can look like relationship-rich domain structure. A spreadsheet can look approachable enough that a domain expert might help shape the model directly.
That apparent flexibility is useful. It is also risky.
Flexible sources can hide several different problems at once:
- the same idea appears under slightly different labels,
- optional fields quietly imply uncertain cardinality,
- contradictory source assertions coexist without an explicit repair path,
- column headings or node labels mix business meaning with local shorthand,
- what looks like a reusable class is actually only a one-off data value,
- what looks like ontology structure should really stay in application logic or instance data.
A format can be easy to ingest while still being semantically unstable.
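Some of these problems can be surfaced mechanically before any human review starts. The sketch below is a minimal illustration, not Hadto tooling: `normalize_label` and `scan_records` are hypothetical names, and the normalization rule (lowercase, drop everything that is not a letter or digit) is an assumption that a real source would probably need to tune. The scan only raises signals for reviewers. It does not decide anything.

```python
import re
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Lowercase and drop separators so 'Cust_ID' and 'cust id' compare equal."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def scan_records(records: list[dict]) -> dict:
    """Flag label variants and fields that are missing from some records."""
    variants = defaultdict(set)   # normalized label -> raw labels seen
    presence = defaultdict(int)   # raw label -> number of records carrying it
    for record in records:
        for key in record:
            variants[normalize_label(key)].add(key)
            presence[key] += 1

    duplicate_labels = {
        norm: sorted(raw) for norm, raw in variants.items() if len(raw) > 1
    }
    optional_fields = sorted(
        key for key, count in presence.items() if count < len(records)
    )
    return {"duplicate_labels": duplicate_labels, "optional_fields": optional_fields}
```

One side effect is worth noticing: when the same idea appears under two spellings, each spelling also looks like an optional field. That is a small example of how one source quirk can masquerade as a cardinality fact.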
The practical Hadto lesson
The new study notes extend Hadto’s earlier semantic-lifting lesson.
The first Chapter 7 takeaway was that bottom-up ontology work is not schema export. The newer notes on Sections 7.2.1 and 7.2.2 sharpen the warning: even when the source is more flexible than a database table, the same governance problem remains.
For Hadto, that means three things.
1. Document and graph attributes are only candidates
A document attribute or graph predicate can suggest a useful concept, property, or workflow edge. It does not prove one.
Before it becomes an ontology commitment, someone still has to decide:
- what the term actually means,
- whether it is stable enough to reuse,
- whether it belongs in the ontology or only in source data,
- whether duplicate labels should be merged,
- whether conflicting source claims need to be resolved first.
Without that review, the ontology becomes a cleaner-looking copy of source drift.
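One way to keep those five questions from staying implicit is to record them next to the candidate itself. The following is a rough sketch under stated assumptions: `CandidateReview`, its field names, and the short question keys are all hypothetical, chosen only to mirror the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateReview:
    """Review answers for one candidate lifted from a source (hypothetical shape)."""
    source_label: str
    meaning: Optional[str] = None                # what the term actually means
    stable_enough: Optional[bool] = None         # stable enough to reuse?
    belongs_in_ontology: Optional[bool] = None   # ontology, or source data only?
    duplicates_checked: bool = False             # duplicate labels merged or ruled out
    conflicts_resolved: bool = False             # conflicting source claims settled

    def open_questions(self) -> list[str]:
        """Return the review questions that still have no recorded answer."""
        questions = []
        if not self.meaning:
            questions.append("meaning")
        if self.stable_enough is None:
            questions.append("stability")
        if self.belongs_in_ontology is None:
            questions.append("placement")
        if not self.duplicates_checked:
            questions.append("duplicates")
        if not self.conflicts_resolved:
            questions.append("conflicts")
        return questions
```

A candidate with a non-empty `open_questions()` list is still a candidate. Note that answering "no" to stability or placement also closes the question: the review is complete, and the answer is simply that the term should not be promoted.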
2. Constraint checking has to happen before promotion
Hadto already has downstream validation surfaces for ontology and graph artifacts. Those are useful, but they are not the same as a pre-ontology review gate.
The key issue is timing.
If a semi-structured source is lifted into ontology form before contradiction review, cardinality review, or duplicate handling, the platform may already be treating a source quirk as part of the business model. By the time validation runs, the system is checking a commitment that should still have been under review.
In other words: SHACL-style conformance is not the same thing as deciding whether a candidate source assertion should have become ontology structure at all.
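To make the timing point concrete, here is a small sketch of a contradiction check that runs on raw source assertions, before anything is lifted. It assumes assertions arrive as `(subject, predicate, value)` tuples and that reviewers have already named which predicates they expect to be single-valued. The function names `find_contradictions` and `gate` are hypothetical, and this is not a substitute for SHACL validation later in the pipeline.

```python
from collections import defaultdict


def find_contradictions(assertions, single_valued):
    """Return {(subject, predicate): sorted values} for every predicate that is
    expected to be single-valued but carries more than one distinct value."""
    seen = defaultdict(set)
    for subject, predicate, value in assertions:
        if predicate in single_valued:
            seen[(subject, predicate)].add(value)
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


def gate(assertions, single_valued):
    """Split assertions into those clear to continue and those held for review."""
    conflicts = find_contradictions(assertions, single_valued)
    cleared, held = [], []
    for assertion in assertions:
        subject, predicate, _ = assertion
        # Every assertion involved in a conflict is held, not just the later one,
        # because the source gives no basis for picking a winner.
        (held if (subject, predicate) in conflicts else cleared).append(assertion)
    return cleared, held
```

"Cleared" here only means the assertion may continue to the rest of the review. Held assertions stay out of the ontology until someone records how the conflict was repaired.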
3. Spreadsheets can help, but only under a declared pattern contract
Spreadsheets are especially tempting because they feel operator-friendly.
That makes them powerful for Hadto’s mission. If domain experts can contribute through a familiar surface, the platform could eventually capture more of the business without demanding that every operator think in OWL.
But that only works if the spreadsheet is treated as a governed authoring surface, not a free-form ontology feed.
A useful spreadsheet workflow would need explicit patterns:
- which columns represent candidate terms,
- which rows express allowable axiom shapes,
- which values remain examples or data,
- which review step approves promotion into shared ontology artifacts,
- which provenance record explains what was accepted, rejected, merged, or revised.
Otherwise the spreadsheet becomes another place where accidental vocabulary hardens into platform structure.
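A declared pattern contract can be quite small. The sketch below assumes a sheet exported as CSV with three columns named `term`, `axiom_shape`, and `review_status`; those column names, the two allowed shapes, and the `triage_sheet` function are illustrative assumptions, not an existing Hadto format. Rows that hold examples or data simply fall outside the allowed shapes and are logged rather than promoted.

```python
import csv
import io

# Hypothetical contract: which columns play which role, and which axiom
# shapes a row is allowed to express.
CONTRACT = {
    "term_column": "term",
    "shape_column": "axiom_shape",
    "review_column": "review_status",
    "allowed_shapes": {"subclass_of", "has_property"},
}


def triage_sheet(csv_text, contract=CONTRACT):
    """Return (queued, provenance): rows eligible for promotion, plus one
    provenance entry per row explaining what happened to it."""
    queued, provenance = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on sheet line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        term = (row.get(contract["term_column"]) or "").strip()
        shape = (row.get(contract["shape_column"]) or "").strip()
        status = (row.get(contract["review_column"]) or "").strip().lower()
        if not term:
            outcome = "rejected: no candidate term"
        elif shape not in contract["allowed_shapes"]:
            outcome = f"rejected: shape '{shape}' not in contract"
        elif status != "approved":
            outcome = "held: awaiting review"
        else:
            outcome = "accepted"
            queued.append((term, shape))
        provenance.append({"row": line_no, "term": term, "outcome": outcome})
    return queued, provenance
```

The design choice that matters is that every row produces a provenance entry, including the rows that go nowhere. That is what lets someone later explain why a term did or did not become shared structure.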
Why this matters for owner-operators
Hadto is not building ontology infrastructure as an academic exercise. The point is to make business systems transferable.
An owner-operator should be able to inherit a business model that reflects how the company really works, not one that accidentally mirrors a vendor’s export format, a team’s naming habits, or a one-off spreadsheet built under deadline pressure.
That is why this issue connects directly to the mission of turning employees into business owners.
When semantic-lifting decisions stay implicit:
- operators inherit hidden modeling assumptions,
- automation reasons over yesterday’s implementation shortcuts,
- handoffs get weaker because nobody can explain why a term became part of the ontology,
- the business stays dependent on the people who remember the source-system quirks.
A proper review gate makes those choices visible. That is what allows knowledge to be handed off instead of merely copied.
The standard Hadto should uphold
A venture platform should be able to say, for every imported candidate structure:
- this became ontology because the business meaning is stable and reusable,
- this stayed instance data because it describes local facts, not reusable categories,
- this remained an application enum or model because it supports implementation, not shared conceptual structure,
- this was rejected or merged because the source was duplicative, contradictory, or under-specified.
That is the real boundary between learning from business systems and becoming trapped inside them.
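The four outcomes above map naturally onto a closed set of dispositions plus a required rationale. This is a sketch under assumptions: `Disposition`, `Decision`, and `undecided` are hypothetical names, and the only rule enforced is that no decision is recorded without a written reason.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    ONTOLOGY = "stable, reusable business meaning"
    INSTANCE_DATA = "local facts, not reusable categories"
    APPLICATION_MODEL = "supports implementation, not shared concepts"
    REJECTED_OR_MERGED = "duplicative, contradictory, or under-specified"


@dataclass(frozen=True)
class Decision:
    candidate: str
    disposition: Disposition
    rationale: str

    def __post_init__(self):
        # A disposition without a reason cannot be explained at handoff.
        if not self.rationale.strip():
            raise ValueError(f"{self.candidate}: a decision needs a written rationale")


def undecided(candidates, decisions):
    """Return imported candidates that still have no recorded decision."""
    decided = {decision.candidate for decision in decisions}
    return [candidate for candidate in candidates if candidate not in decided]
```

If `undecided` returns anything for an import batch, the platform cannot yet make the statement above "for every imported candidate structure", and the batch is not finished.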
The practical takeaway
Semi-structured sources are valuable precisely because they expose business language before it has been perfectly normalized.
But that is also why they need governance.
For Hadto, the next useful rule from Chapter 7 is simple: do not let flexible source formats skip the semantic-lifting discipline. A document store, property graph, or spreadsheet should feed a reviewed candidate queue, not silently define the ontology.
That is how a platform learns from real operations without letting raw source structure decide what the business is.
Source evidence used in this note: smb-ontology-platform/docs/plans/2026-03-31-keet-ontology-engineering-progress-tracker.md (2026-04-12 entries), smb-ontology-platform/docs/issues/ONT-026-add-semantic-lifting-governance-for-source-schema-classification-and-app-boundaries.md, and existing Hadto blog posts reviewed to avoid duplicating prior notes on semantic lifting, AI-assisted candidate generation, and relation-governance work.