AI People·~15 min·Hugh Mercer·1 May 2026

Hallucination is a room design problem.

When an AI person hallucinates, it is usually because we asked them to leave the room. Here is what we learned.

Hallucination gets treated as a model quality problem. A failure of the training data, a shortcoming in the architecture, something the next model version will fix. In my experience, most of the hallucination I see in production AI is not a model failure. It is a room design failure.

What I mean by room design

When we build a specialist AI co-worker at Graaft, we define the room: the specific domain, the specific task set, the specific kinds of questions the co-worker is expected to handle. The room is not a tag or a category. It is a hard constraint on what the co-worker is responsible for knowing well.

The co-workers who perform reliably are the ones whose room is well-designed. The co-workers who hallucinate reliably are the ones we asked to operate outside their room, usually because the room was too loosely defined to enforce a boundary.

A hallucination happens when a model is asked a question it cannot answer with precision, and it produces an answer anyway, in a confident register, because a confident register is what the training rewarded. The question is: why was the model being asked that question in the first place? In most production deployments, the answer is that nobody defined the room tightly enough to make the question impossible.

Hallucination is what happens when the room has no walls. The model does not refuse. It improvises.

The room boundary is the safety mechanism

This is a different frame from the one most AI safety discourse uses. The dominant frame is: how do we make the model better at saying it does not know? This is a useful question. It is also insufficient on its own.

A model that is good at acknowledging its limits in a vacuum can still hallucinate in a deployment where the question scope is ill-defined. The model does not have a map of the territory it is responsible for. It only has the question in front of it. If the question is outside the room, the model does not know it is outside the room, because the room was never made explicit.

The room boundary is the safety mechanism. Not just an instruction to refuse on certain topics. A designed constraint on the question set, enforced through the architecture of the deployment, not just through the model's training.

What good room design looks like

The hallucination risk is lowest when the room is most clearly designed. When the co-worker knows exactly where they are, what they are responsible for, and where the boundary is, the conditions for improvised confident wrong answers almost disappear.

Most hallucination problems are room design problems. Fix the room first.

The room boundary is not just an instruction to refuse on certain topics. It is a designed constraint on the question set. A co-worker who handles questions at the edge of their domain by routing appropriately is more valuable than one who either answers from general knowledge or refuses everything. The difference is design: the boundary was drawn, the adjacent territory was considered, and the response for each zone was specified before the build.
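One way to picture "the response for each zone was specified before the build" is a small specification object. The sketch below is illustrative only: the `Room` class, the zone names, the topic labels, and the `hr_legal` route are all hypothetical, and it assumes an upstream step has already classified the incoming question into a topic label, which is the hard part in practice.

```python
from dataclasses import dataclass, field
from enum import Enum


class Zone(Enum):
    CORE = "core"          # inside the room: answer as the specialist
    ADJACENT = "adjacent"  # edge of the room: route, do not answer
    OUTSIDE = "outside"    # beyond the room: name the limitation


@dataclass
class Room:
    """Hypothetical room specification: the boundary is data, not a hope."""
    name: str
    core_topics: frozenset
    # adjacent topic -> where questions on that topic are routed
    adjacent_routes: dict = field(default_factory=dict)

    def zone_for(self, topic: str) -> Zone:
        if topic in self.core_topics:
            return Zone.CORE
        if topic in self.adjacent_routes:
            return Zone.ADJACENT
        return Zone.OUTSIDE

    def plan_response(self, topic: str) -> str:
        """Return the designed response for the zone the topic falls in."""
        zone = self.zone_for(topic)
        if zone is Zone.CORE:
            return "answer"
        if zone is Zone.ADJACENT:
            return f"route:{self.adjacent_routes[topic]}"
        return "decline"


# Assumed example: a procurement co-worker with one mapped adjacent topic.
procurement = Room(
    name="procurement",
    core_topics=frozenset({"purchase_orders", "supplier_onboarding"}),
    adjacent_routes={"employment_law": "hr_legal"},
)
```

The point of the sketch is that every topic lands in exactly one zone, and anything not explicitly placed in the room falls through to "decline" rather than to general knowledge.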

This does not mean model quality is irrelevant. It means model quality operates inside the room. The room is prior. The boundary is the discipline. The hallucination rate is the report card on how well the room was designed.

Every time I see a deployed AI co-worker producing confident nonsense, I ask one question: who designed the room? Almost always, the answer is that nobody did. The room design failure is usually invisible at the start and obvious at scale.

The co-workers we build know their room. That is not a metaphor for being well-trained. It is a description of the design work that preceded the training, the work that made the training possible, the work we do before we call something a specialist.

How hallucination presents in production

The hallucination problem in production AI is not random. It has a pattern, and the pattern is diagnostic.

Hallucination appears first at the edge of the domain: the question adjacent to the co-worker's territory but not inside it. A procurement co-worker asked about employment law. A clinical reporting co-worker asked about pharmaceutical dosing. These questions are within the general intelligence of the model. They are outside the room. The model, lacking a hard boundary, answers anyway, from general knowledge, in the same confident register it uses inside the room. The answer sounds like the co-worker. It is not the co-worker.

It appears at the intersection of sectors. A co-worker built for construction is asked about a health and safety obligation that sits in the intersection of construction law and occupational health regulations. The room covers one part of the question and not the other. The model fills the gap from general knowledge and does not signal that it has done so. The response is partially correct and partially fabricated, presented as a single coherent answer. This is the most dangerous hallucination mode in high-stakes sectors: the confident blend of known and invented.

It appears at the edge of recency. The domain knowledge built into the co-worker reflects the regulatory and operational environment at the time the domain construction was done. Regulations change. Procedures update. The co-worker answers from the room as it was, not as it currently is. The answer was accurate twelve months ago. It is wrong today. Neither the co-worker nor the user knows which version they are working with.

Each of these failure modes has a room design cause. The first is a boundary definition problem: the boundary was not drawn explicitly enough to prevent the model from treating adjacent territory as its own. The second is a domain intersection problem: the room was designed with a single sector in mind without accounting for the places where that sector's work requires adjacent sector knowledge. The third is a room currency problem: the room was designed but not updated, and the co-worker is operating in a room that no longer matches the actual environment.

Hallucination is not random. The pattern tells you which part of the room design failed.

Room design failures

Room design failures produce hallucination in predictable ways. Naming them makes them preventable.

A mining operations control room: the kind of high-stakes environment where room-designed AI co-workers must know exactly what they are responsible for, and what they are not

The room without walls is the deployment where the system prompt establishes a persona and a topic focus but does not define the boundary explicitly. The co-worker is told to be a mining specialist. It is not told what falls outside that specialism. When a question arrives at the edge of the stated domain, the co-worker has no designed response. It does what the model does without a boundary: it answers from general knowledge and presents that answer with the confidence of a specialist. The wall was assumed to exist. It does not. Every question that finds the missing wall produces a hallucination risk.

The room that is too large is the deployment where the room was defined but defined broadly enough that the model cannot actually hold it with specialist precision. A co-worker designed to know "all aspects of Australian construction industry compliance" has been given a room whose walls are miles apart. Within that room, the model's coverage is uneven: dense on the topics the training data covered heavily, sparse on the topics it covered lightly, and confidently wrong at the places where it has thin coverage and no signal that the coverage is thin. The room design failure is optimism about how much a single specialism can hold at genuine specialist depth.

The unreviewed room is the deployment where the room was designed carefully at launch and not maintained. Regulatory changes occurred. Operational procedures were updated. The connected systems the co-worker interfaces with changed their terminology or their structure. The room reflects a version of the domain that no longer exists. The co-worker answers correctly against the room as designed and incorrectly against the room as it currently is. The hallucination is temporally displaced: it was accurate, once. It is not accurate now.
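A room review can be made mechanical by recording when each source behind the room was last checked against the live domain. The following is a minimal sketch under stated assumptions: `RoomSource`, `stale_sources`, the source names, and the 180-day review interval are all hypothetical, and the right interval depends on how quickly the sector's regulations and procedures actually change.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RoomSource:
    """Hypothetical record of one body of knowledge the room relies on."""
    name: str
    last_reviewed: date


def stale_sources(sources, today, max_age=timedelta(days=180)):
    """Return the names of sources not reviewed within max_age of today."""
    return [s.name for s in sources if today - s.last_reviewed > max_age]


# Assumed example data for illustration only.
sources = [
    RoomSource("whs_regulations", date(2025, 1, 10)),
    RoomSource("site_procedures", date(2026, 3, 1)),
]
overdue = stale_sources(sources, today=date(2026, 5, 1))
```

A check like this does not tell you whether the domain changed, only that nobody has looked recently enough to know.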

The room without walls. The room too large. The unreviewed room. Each is preventable. Each requires the room design work to have been done.

Designing the boundary in practice

The boundary definition work is the part of room design most commonly skipped or abbreviated. It is also the part whose absence produces the most expensive hallucination.

The boundary is not a content filter or a refusal list. It is a designed specification of what the co-worker is responsible for knowing, how it handles questions at the edge of that responsibility, and when it should stop and name the limitation rather than proceed. These are different problems and they require separate design decisions.

The boundary definition is the most important part of the brief. It is what prevents the model from filling the gaps in the room with confident improvisation.

The boundary definition work is the stage most subject to pressure to compress, because clients naturally want to focus on what the co-worker will do rather than what it will not. That pressure produces the room without walls: a co-worker who knows how to answer but not where to stop.

The room without walls. The model does not refuse. It improvises.

The most expensive hallucination in production is the one that could have been prevented at the brief stage if the boundary had been drawn.

Hallucination rate as a design metric

Once room design is understood as the primary variable in hallucination risk, hallucination rate becomes a useful design metric rather than a product quality verdict.

A high hallucination rate in a production deployment is information about the room. The pattern of the errors tells you where the room design failed: whether the boundary was drawn too loosely, whether the domain was scoped beyond what a single specialism can hold with genuine precision, or whether the room was not maintained as the domain changed. The diagnosis is available because the room design work produced a baseline to audit against.
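As a rough illustration of reading the error pattern, suppose domain experts have reviewed a sample of wrong answers and labelled each with the room design failure behind it. The labels, the `FAILURE_MODES` mapping, and the `diagnose` function below are hypothetical; the sketch only tallies the labels and reports which failure mode dominates.

```python
from collections import Counter

# Hypothetical labels mapped to the three room design failures named above.
FAILURE_MODES = {
    "boundary": "room without walls",
    "scope": "room too large",
    "currency": "unreviewed room",
}


def diagnose(error_labels):
    """Return (share of errors per failure mode, dominant mode or None)."""
    counts = Counter(error_labels)
    unknown = set(counts) - set(FAILURE_MODES)
    if unknown:
        raise ValueError(f"unrecognised labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {mode: 0.0 for mode in FAILURE_MODES}, None
    shares = {mode: counts[mode] / total for mode in FAILURE_MODES}
    dominant = max(shares, key=shares.get)
    return shares, dominant


# Assumed sample of four reviewed errors.
shares, dominant = diagnose(["boundary", "boundary", "currency", "scope"])
```

In this assumed sample half the errors trace to the boundary, which points at the boundary definition rather than at the model.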

A deployment that was stood up from a generic system prompt without explicit room design has no baseline. The hallucination rate is a symptom of an unknown cause. Fixing it requires doing the room design work that should have been done before deployment.

Hallucination rate is a report card on room design. High rates diagnose the room, not the model.

This framing changes the relationship between the AI development team and the domain experts. In the standard model, domain experts are consulted at launch to review output quality. In the room design model, domain experts are involved from the brief stage and remain the appropriate resource when the hallucination rate indicates the room has drifted from the domain. Their role is ongoing, not a one-off consultation at launch.

The room after the model improves

Model capability is improving consistently. Each generation of foundation models is better at acknowledging uncertainty, more accurate across a wider range of domains, and more capable of maintaining precision in long-context conversations. A reasonable question: as models improve, does room design become less important?

The answer is that room design becomes more important as models improve, not less.

A more capable model fills the gaps in a poorly designed room more convincingly. The confident wrong answer from a capable model is harder to identify than the confident wrong answer from a less capable one. The seams in the room, the places where the boundary was not drawn and the model is improvising from general knowledge, are harder to see in the output of a capable model because the improvisation is more plausible.

The safety case for room design is strengthened by model capability improvements, not weakened by them. As the model gets better at producing convincing output, the gap between output that was generated from genuine domain knowledge and output that was generated from capable improvisation becomes harder to detect from the surface of the response. The room design is what makes the distinction structurally enforced rather than detectable only by expert review.

The model capability improvement that makes room design easier is better adherence to boundaries when boundaries are defined. A more capable model, given a well-designed room with an explicit boundary, is better at staying inside the room. The improvement in boundary adherence makes well-designed rooms perform better over time. The improvement in general capability makes poorly designed rooms more dangerous over time.

As models improve, room design matters more. Better capability produces more convincing gaps.

The co-workers we build are designed to perform well with current models and to perform better as models improve, because the room design is the structure that captures the capability improvement rather than the structure that is rendered unnecessary by it. We are not building rooms that will be bypassed by model improvement. We are building rooms that will hold a better model.

The accountability argument

There is a structural argument for room design that sits behind the quality and safety arguments: accountability.

When a specialist AI co-worker produces a harmful response in a high-stakes sector, the accountability question is not only whether the model failed. It is whether the deployment was designed responsibly. A deployment that included explicit room design, documented boundary definitions, and a calibrated domain construction can demonstrate that it was built to minimise the conditions under which hallucination occurs. The design decisions are auditable. The boundary was drawn. The adjacent territory response was specified. The core domain was calibrated against practitioner judgment. When something still goes wrong, it is diagnosable and the design process is defensible.

A deployment without room design cannot make this case. The response that caused harm was produced because the boundary did not exist. Nobody drew it. The domain was not calibrated. The hallucination was not the model surprising the design. The hallucination was the design's absence doing exactly what an absent design does.

In sectors where AI deployment errors have serious consequences, the accountability argument for room design is not a quality argument. It is a due diligence argument. The question is not whether room design produces perfect output. It is whether the absence of room design is a responsible deployment decision given the stakes involved. In health and aged care, in construction, in mining, in infrastructure: the answer is that it is not.

The room design work is documentation of responsible deployment. It is the record that the deployment was built for the specific context with specific care for what the specific context requires. That record is part of what it means to build AI co-workers for high-stakes sectors. The disclosure line says the work was done. The brief is the evidence that it was.

The accountability argument also makes room design part of the client conversation in a way that pure quality arguments do not. Clients who understand that room design is the foundation of a defensible deployment are more willing to invest time in the brief sessions, because the brief sessions are not only about producing a better product. They are about building the documentation that makes the deployment accountable. A client who signs off a well-designed brief is not only approving the co-worker. They are approving the design of the room, and taking on the shared accountability that comes with it.

Much of the most important work in building specialist AI happens before the model runs. Room design is that work. The hallucination rate is the report card. A low hallucination rate in production is not evidence of a capable model. It is evidence of a room that was designed well enough to prevent the conditions under which hallucination occurs. That distinction matters because it tells you where to invest: in the model, which you do not control, or in the room, which you do.

Hugh Mercer


UX Designer

Hugh is Graaft's UX designer, with a focus on AI-interface experience: trust architecture, intent repair, and escalation flows. His view is that most onboarding is built for the people who designed it, not the people who actually use it.