Hadto note
The real data point is the control
King's College London researchers show why one outside data point can prevent model collapse. Hadto's lesson is source governance: reality has to stay in the loop.
Why this note is here
Principle: States a principle Hadto expects to keep using.
What supports it: Uses evidence, definitions, and cause-and-effect.
A real-world source anchor is not decoration. It is the control that keeps generated summaries and ontology changes from averaging the business away.
The most useful lesson in the King’s College London model-collapse result is not that one magic example fixes AI.
It is that a closed loop needs an outside witness.
King’s College London reported on research into AI “Data Cannibalism” and model collapse. The underlying Physical Review Letters paper, “Lost in Retraining: Closed-Loop Learning and Model Collapse in Exponential Families”, makes a claim narrow enough to take seriously. In exponential-family statistical models, closed-loop maximum-likelihood training on model-generated data leads to collapse. Add one data point from outside the loop, or a prior built from previous knowledge, and collapse is prevented. Even an infinite volume of machine-generated data does not erase that outside point.
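The collapse half of that result can be illustrated with a toy simulation. The sketch below is not the paper's method or code. It assumes a one-dimensional Gaussian (one member of the exponential family), a hypothetical helper named `closed_loop_variance`, and arbitrary settings (10 samples per generation, 500 generations, seed 0). Each generation refits the mean and variance by maximum likelihood on samples drawn from the previous fit, so no outside data ever enters the loop. It does not attempt to reproduce the outside-data-point or prior result.

```python
import random
import statistics


def closed_loop_variance(generations: int, sample_size: int, seed: int = 0) -> list[float]:
    """Refit a Gaussian by maximum likelihood on its own samples, generation after generation."""
    rng = random.Random(seed)
    mean, variance = 0.0, 1.0  # the starting distribution stands in for the real world
    history = [variance]
    for _ in range(generations):
        # Draw training data only from the current model: a closed loop.
        samples = [rng.gauss(mean, variance ** 0.5) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        # pvariance divides by n, which is the maximum-likelihood variance estimate.
        variance = statistics.pvariance(samples, mu=mean)
        history.append(variance)
    return history


history = closed_loop_variance(generations=500, sample_size=10)
print(f"start variance: {history[0]:.3f}")
print(f"final variance: {history[-1]:.3e}")
```

In a run like this the fitted variance shrinks toward zero: each generation describes a narrower version of the previous generation's output, and nothing in the loop can widen it again.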
The researchers also found similar evidence in Restricted Boltzmann Machines and plan to test larger neural networks. None of this should be treated as a direct production recipe for every large language model. It is a principle discovered in simpler models.
The principle is enough.
Danger starts when a system teaches itself from its own output until the original world becomes optional. More fluent output does not fix that. A reality anchor does, because the loop does not get to invent it.
Hadto runs near the same failure mode.
The summary is not the witness
Generated summaries are useful. A good one compresses source material into something an operator can read. It can compare sources, extract candidate concepts, draft playbooks, propose ontology changes, and prepare review packets.
They can also become a closed loop.
First, a summary drops a strange exception because it looks like noise. Next, another summary reads that omission as absence. An ontology proposal reads both and decides the normal case is the whole case. A dashboard shows cleaner categories. Soon the next agent sees the dashboard and concludes the business is simpler than it is.
Nobody had to fake anything. The system simply learned from its own cleaned-up reflection.
At that point, the business gets averaged away.
A real source anchor interrupts that pattern. It says: this claim came from this invoice, this dispatch note, this payer manual, this owner interview, this customer complaint, this photo, this state policy page, this email, this failed handoff. The generated version does not get to become the only version.
The source anchor is not a citation ornament. It is a control.
Ontology work needs outside witnesses
Hadto’s ontology work is built around business meaning: jobs, owners, claims, rules, exceptions, authority, proof, handoff, and closure. Generated ontology candidates can sound right while being wrong in the places that decide money and responsibility.
Take a home-services callback.
Suppose the model sees repeated notes about warranty visits and proposes one clean category: WarrantyCallback. That category may be useful. It may also hide five different business facts: workmanship failure, manufacturer defect, customer access failure, goodwill work used to save an account, and a sales promise that never should have reached dispatch.
A summary may flatten them. The business cannot.
Reality lives in the job record where the owner refused to bill the customer because the sales promise was bad. It lives in the photo packet that showed the technician did the work correctly and the part failed. It lives in the dispatch note where the office manager moved a loyal customer ahead of a normal queue because retention mattered more than route purity that day.
These cases are governance facts, not edge decoration.
If Hadto lets generated summaries promote a clean category without preserving those cases, the ontology becomes easier to read and worse to operate. The next owner inherits a tidy model that lost the distinctions the old operator used to protect margin, trust, and accountability.
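One way to keep those distinctions from being flattened is to make the cause a required part of every callback record, so the clean category can always be opened back up. This is a minimal sketch, not Hadto's actual schema: `CallbackCause`, `Callback`, `causes_behind`, the job identifiers, and the source references are all hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class CallbackCause(Enum):
    # The five business facts named in the callback example; identifiers are illustrative.
    WORKMANSHIP_FAILURE = "workmanship failure"
    MANUFACTURER_DEFECT = "manufacturer defect"
    CUSTOMER_ACCESS_FAILURE = "customer access failure"
    GOODWILL_WORK = "goodwill work to save an account"
    BAD_SALES_PROMISE = "sales promise that should not have reached dispatch"


@dataclass(frozen=True)
class Callback:
    job_id: str            # hypothetical job identifier
    cause: CallbackCause   # required: a callback cannot be recorded without a cause
    source_ref: str        # pointer to the job record, photo packet, or dispatch note


def causes_behind(callbacks: list[Callback]) -> Counter:
    """Count the distinct causes that one flat WarrantyCallback category would hide."""
    return Counter(cb.cause for cb in callbacks)


# Hypothetical records, for illustration only.
callbacks = [
    Callback("job-101", CallbackCause.WORKMANSHIP_FAILURE, "job-record/101"),
    Callback("job-102", CallbackCause.MANUFACTURER_DEFECT, "photo-packet/102"),
    Callback("job-103", CallbackCause.BAD_SALES_PROMISE, "job-record/103"),
    Callback("job-104", CallbackCause.WORKMANSHIP_FAILURE, "job-record/104"),
]
breakdown = causes_behind(callbacks)
for cause, count in breakdown.items():
    print(f"WarrantyCallback / {cause.value}: {count}")
```

The category stays available for reporting, but each record still says which of the five facts it is and where the evidence lives.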
Minority business facts need protection
This model-collapse result makes the minority case harder to dismiss.
One outside data point can matter even against an infinite pile of generated data. In business terms, a rare real case may carry more governing force than a large set of repeated summaries.
Automation systems dislike that because rare cases look inefficient. A rare case interrupts the normal pattern. It makes the type system less pretty. It forces the workflow to ask one more question. It makes dashboards less clean.
But owner/operator systems live or die on those cases.
A minority fact might be the only Medicaid plan lane where the source authority changes. Another might be the one dental attachment route where the proof timing matters. In a service business, it might be the promise that changes who can approve a discount, or the recurring customer exception that explains why a scheduler is doing something that looks wrong from the outside.
Treating those facts as noise trains the next operator to miss the business.
Source-backed exemplars exist for exactly this reason. Accepted ontology distinctions should have live examples behind them. Generated summaries should keep a path back to the source cases they compressed. Rejected ontology changes should say which real cases blocked them. Each rule should know whether it came from a statute, a payer manual, a customer promise, an owner interview, a job trace, or an inferred pattern.
The source role matters as much as the source link.
A real data point changes governance
If real data is an anti-collapse control, governance has to treat source intake differently. The question stops being: did the agent cite something? Better governance asks: which outside witness is still allowed to correct the loop?
A generated research note should not only summarize the source. It should preserve the source role, review status, capture date, domain lane, authority level, and open uncertainty. A generated ontology change should not only list proposed classes. It should attach exemplar cases, counterexamples, and source-backed competency questions. Dashboards should not only show clean categories. They should let the operator inspect the cases that made the category true.
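The fields a generated research note should preserve can be sketched as a small record with a completeness check. This assumes a hypothetical `SourceAnchor` type and `missing_governance_fields` helper. The field names come from the paragraph above, while the types, example values, and placeholder URL are illustrative assumptions rather than a Hadto format.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class SourceAnchor:
    source_link: str
    source_role: Optional[str] = None       # e.g. "payer manual", "owner interview"
    review_status: Optional[str] = None
    capture_date: Optional[date] = None
    domain_lane: Optional[str] = None
    authority_level: Optional[str] = None
    open_uncertainty: Optional[str] = None


def missing_governance_fields(anchor: SourceAnchor) -> list[str]:
    """Return the names of fields that are unset or empty, in declaration order."""
    missing = []
    for f in fields(anchor):
        value = getattr(anchor, f.name)
        if value is None or value == "":
            missing.append(f.name)
    return missing


# Hypothetical anchor: it carries a link, a role, and a capture date, and nothing else.
anchor = SourceAnchor(
    source_link="https://example.com/payer-manual",  # placeholder URL
    source_role="payer manual",
    capture_date=date(2026, 1, 15),                   # hypothetical date
)
print(missing_governance_fields(anchor))
```

A note built on this anchor has a citation but no review status, lane, authority level, or recorded uncertainty, which is the gap between citing something and keeping an outside witness.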
The outside witness also changes promotion rules.
A summary can propose. A source-backed exemplar can constrain. A prior can guide. A review gate can decide. The generated artifact does not get to promote itself just because it is coherent.
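Those roles can be read as an ordered check. The following is a minimal sketch under stated assumptions: `Exemplar`, `Proposal`, `promotion_decision`, and the status strings are hypothetical, and the guiding role of a prior is left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Exemplar:
    source_ref: str   # pointer to a real case (job record, invoice, interview note)
    supports: bool    # False means the real case contradicts the proposal


@dataclass
class Proposal:
    name: str
    exemplars: list[Exemplar] = field(default_factory=list)
    reviewer_approved: bool = False


def promotion_decision(proposal: Proposal) -> str:
    """Decide whether a generated proposal may become governing memory."""
    # A proposal with no outside witness can only remain a proposal.
    if not proposal.exemplars:
        return "hold: no outside witness attached"
    # Source-backed counterexamples constrain: any one of them blocks promotion.
    blockers = [e.source_ref for e in proposal.exemplars if not e.supports]
    if blockers:
        return "blocked by: " + ", ".join(blockers)
    # The review gate decides; coherence alone never promotes.
    if not proposal.reviewer_approved:
        return "pending: awaiting review gate"
    return "promoted"


# Hypothetical proposals, for illustration only.
bare = Proposal("WarrantyCallback")
contested = Proposal("WarrantyCallback", [Exemplar("job-record/103", supports=False)])
print(promotion_decision(bare))
print(promotion_decision(contested))
```

The order of the checks is the design choice: a missing or contradicting real case stops the proposal before a reviewer is ever asked to approve it.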
That governance lesson sits under the research result.
Real data is not just training material. It is a veto surface. Outside evidence can stop the loop from concluding that the world is only the shape of its own output.
The Hadto rule
Turn that into a plain Hadto rule:
No generated summary, ontology proposal, dashboard category, or playbook change becomes governing memory unless the source anchor remains inspectable.
Practice follows from that rule.
Keep the raw source link when a public source exists. For sources that may change, keep the captured excerpt or manifest. Field-work facts need the operator interview note beside them. Preserve the negative case that blocked a category. Preserve the weird case that forced a split. Do not discard the minority plan lane, unusual claim path, exception approval, or handwritten office rule when those facts decide the work.
Then make the system use them.
When an agent proposes a category, ask for the outside witness. A summary that drops a distinction should name which source allowed the loss. A dashboard that looks cleaner after generated consolidation should show which minority cases were preserved. For any ontology update that claims a pattern, ask which real cases would falsify it.
Treat this as collapse prevention for business memory, not paperwork.
People usually frame model collapse as an AI safety or training-data problem. Hadto reads it as an operating doctrine.
Closed loops drift toward their own reflection. Real sources interrupt them. Provenance is not a footnote after the answer. It is the mechanism that lets the next operator prove the business is still attached to reality.
If Hadto wants AI systems that create owners instead of prettier reports, the source anchor has to stay in the loop.
One real data point can keep a model honest.
One real business case can keep an ontology honest.
Source evidence used in this note: King’s College London news release, “Scientists come up with way to overcome AI ‘Data Cannibalism’”, published 2026-05-14 and reviewed 2026-05-18; Physical Review Letters article, “Lost in Retraining: Closed-Loop Learning and Model Collapse in Exponential Families”, and its public arXiv preprint, both reviewed 2026-05-18. Hadto interpretation: source anchors as governance controls for ontology promotion and owner/operator business memory, with generated projections treated as derived review surfaces rather than ontology contracts.