AI People·~15 min·Dave Derman·3 May 2026

How we build a co-worker who actually knows the room.

Domain knowledge in specialist AI is constructed, not inherited. Here is how we do it.

An archive workroom with tall shelves of lever-arch folders and books. Pendant lights overhead. An open notebook of hand-drawn diagrams on a desk beside a leather satchel.

The question I get asked most often about our sector-specialist AI co-workers is some version of: how do you make it know what it knows? The assumption behind the question is that you feed a model a corpus of sector content and expertise appears. It does not work that way.

Domain knowledge in a specialist AI co-worker is not imported. It is constructed.

What construction means

A general-purpose model knows a great deal about most industries. It has been trained on technical manuals, regulatory documents, industry publications, and the accumulated written output of a hundred sectors. If you ask it about mining, it will answer competently. If you ask it about MSHA compliance timelines, it will make a reasonable attempt.

The problem is not ignorance. The problem is that general knowledge is not the same as operational knowledge. A mining engineer working on a specific operation knows things that are not in any published document: the way the site superintendent prefers to receive a safety observation, the vocabulary the ground-control team actually uses in a shift handover, the difference between what the ventilation plan says and what the crew checks first. General knowledge does not contain this. It cannot.

Domain knowledge, the kind that makes an AI co-worker genuinely useful on a working site rather than passably useful in a demo, is constructed from the specific. Not from the category, but from the instance.

General knowledge tells a co-worker what a sector knows. Domain knowledge tells them what this operation does.

How we construct it

Construction means building the domain from the ground up rather than relying on what a general model already contains. The co-worker needs to know things that are not publicly available: the vocabulary that practitioners actually use versus the vocabulary in the documentation, the gap between written procedure and how the work is actually done.

Calibration, in which the co-worker's responses are checked against practitioners who know the sector from the inside, is the stage that surfaces where the general model and the specialist diverge. Boundary definition is the stage that makes the specialism real, because domain knowledge without a clear boundary is dangerous. A co-worker who knows a great deal about two sectors simultaneously, but knows neither with specialist precision, is worse than a co-worker who knows one thoroughly and declines on the other.

Domain knowledge is not the model's memory. It is the design of the room the model operates in.

That distinction matters more than it sounds. When something goes wrong, it tells you where to look: the room, not the model. When something works well, it tells you what to replicate: the construction, not the raw capability.

The co-workers we build at Graaft know their sectors because we built the domain before we built the co-worker. The knowledge is not added at the end. It is the structure the co-worker is built inside.

Why this is slow

The honest answer to how long construction takes is: longer than any client initially expects, and shorter than doing it without a structured process.

The time-consuming part is that the most important information is not usually written down. The gap between official procedure and actual practice exists in conversations with experienced practitioners, in the patterns of questions that come up repeatedly on a site, in the places where the documentation is technically correct but operationally misleading. Getting to that knowledge takes time that cannot be compressed.

Clients who rush this stage get a co-worker who passes the demo and fails the first edge case. Clients who invest in it get a co-worker who keeps improving as the deployment deepens, because the underlying domain was built correctly.

The vocabulary gap

The fastest way to tell whether a co-worker has genuine domain knowledge or only general knowledge is the vocabulary it uses.

Every sector has two vocabularies: the official vocabulary and the operational one. The official vocabulary is in the manuals, the regulatory documents, the training materials, the job descriptions. The operational vocabulary is what people actually say when they are doing the work. The gap between them is real, and it matters enormously for whether a co-worker feels like an insider or an outsider to the people using it.

In mining, the official vocabulary for ground support includes "rock bolts," "mesh," and "shotcrete." On the ground, the crew might call them "bolts," "mesh," and "gunite," with subtle local variations depending on the site's history and the backgrounds of the workers who set the vocabulary years ago. A co-worker who uses only official vocabulary is legible but alien. A co-worker who knows the operational vocabulary is present.
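One way to picture the two vocabularies is as a site glossary applied to a draft response. The sketch below is illustrative only: the `SITE_GLOSSARY` table and the `localise` function are hypothetical names, the entries simply reuse the mining terms above, and a real glossary would be collected from the crew on one specific site rather than written by hand.

```python
import re

# Hypothetical site glossary: official term -> operational term.
# Entries mirror the mining example above; they are assumptions for
# illustration, not a real site's vocabulary.
SITE_GLOSSARY = {
    "rock bolts": "bolts",
    "shotcrete": "gunite",
}


def localise(text: str, glossary: dict[str, str] = SITE_GLOSSARY) -> str:
    """Rewrite official terms in `text` as the site's operational terms."""
    # Longest terms first so multi-word entries win over shorter overlaps.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Matching is case-insensitive; the original casing is not preserved.
    return pattern.sub(lambda m: glossary[m.group(0).lower()], text)


draft = "Inspect the rock bolts and shotcrete before the next shift."
print(localise(draft))  # Inspect the bolts and gunite before the next shift.
```

A lookup table like this captures only the surface of the gap. It swaps words; it does not carry the register or the local history behind them, which is why the vocabulary has to be gathered from practitioners in the first place.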

The operational vocabulary also carries information about the work that the official vocabulary omits. When a health and aged care worker says "she's been a bit off today," they are communicating a clinical observation with a specific register of urgency that the official equivalent would need three sentences to capture. A co-worker that can receive this vocabulary and respond in kind, without requiring the user to translate into official language, is doing something the official vocabulary cannot do.

The gap between official vocabulary and operational vocabulary is where domain knowledge lives. General models know the official. Construction is what it takes to know the operational.

The vocabulary gap is one of the first things we look for in the construction process. The divergences between official and operational vocabulary are where the domain knowledge that no published document contains actually lives.

Two vocabularies. The official one is in the manual. The operational one is on the margin.

Calibration in practice

Calibration is where the most consequential design decisions get made.

It involves experienced practitioners evaluating the co-worker's responses, not against a rubric developed by the studio, but against their own judgment. The question is not: is this response technically correct? The question is: is this the response that an experienced person in this sector would give, in this register, with this level of detail, in this situation?

The gaps between the co-worker's responses and what practitioners would say become the signal the construction responds to. Some gaps are knowledge gaps. More often they are register gaps: the co-worker has the right information but is expressing it in the wrong way for the room.
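The distinction between knowledge gaps and register gaps can be made concrete with a small tally of practitioner reviews. This is a minimal sketch under stated assumptions: the `Review` record, its `gap` labels, and the sample reviews are all invented for illustration and do not describe an actual calibration tool or real results.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Review:
    """One practitioner judgment on one co-worker response (hypothetical schema)."""
    question: str
    gap: str  # "none", "knowledge", or "register"
    note: str = ""


def summarise_gaps(reviews: list[Review]) -> Counter:
    """Count reviews per gap type, skipping responses the practitioner accepted."""
    return Counter(r.gap for r in reviews if r.gap != "none")


# Invented sample reviews, for illustration only.
reviews = [
    Review("Resident off food for three days", "register", "overview given, no threshold"),
    Review("Level 4 ventilation readings elevated", "knowledge", "missed a site-specific check"),
    Review("Shift handover summary", "register", "wrong vocabulary for the crew"),
    Review("Ground support inspection steps", "none"),
]
print(summarise_gaps(reviews).most_common())  # [('register', 2), ('knowledge', 1)]
```

Keeping the practitioner's note alongside the label matters more than the count: the note is the judgment being transferred, and the label only says where to file it.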

Calibration is where practitioner judgment becomes construction material.

Calibration also builds something that desk research alone cannot produce: a working relationship between the studio and the practitioners who know the sector from the inside. The practitioners evaluating the co-worker are doing more than identifying gaps. They are transferring the judgment that defines their expertise, the sense of what a correct response sounds like in this room, into the construction. Without that transfer, the domain is technically disciplined but not room-accurate. Both are required.

Calibration is the stage most subject to pressure to compress. It is the stage we protect most carefully.

What boundary definition produces

Boundary definition is the stage that has the most visible effect in production.

A co-worker with a well-designed boundary does something a general-purpose model cannot replicate reliably: it knows when a question is inside its domain and when it is not, and it handles both cases with appropriate confidence. Inside the domain, it answers with the precision and register of an experienced practitioner. Outside it, it names the limitation and routes the user to a better resource rather than producing a plausible-sounding response from general knowledge.

Boundary definition produces confident precision inside the domain and confident acknowledgment of limitation outside it. Both require the same design discipline.

The production benefit is a co-worker that users trust because it is consistently honest about what it knows and what it does not. Trust is built faster by a co-worker that says "I don't have that, here is where to find it" than by one that answers every question with apparent confidence. The boundary is not a restriction. It is the source of the co-worker's credibility.
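The boundary behaviour described in this section can be sketched as a simple router. Everything here is an assumption made for illustration: the topic lists, the referral targets, and the `respond` function are hypothetical, and the keyword check stands in for whatever domain classifier a real build would use.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    kind: str  # "answer" or "referral"
    text: str


# Hypothetical boundary for one deployment: topics the co-worker covers,
# and where to send users for topics it does not.
IN_DOMAIN = {"ventilation", "ground support", "shift handover"}
REFERRALS = {"payroll": "the site HR office"}
DEFAULT_REFERRAL = "your supervisor"


def respond(question: str) -> Reply:
    """Answer inside the boundary; name the limitation and refer outside it."""
    q = question.lower()
    # Keyword matching is a stand-in for a real domain classifier.
    if any(topic in q for topic in IN_DOMAIN):
        return Reply("answer", "(in-domain answer would be generated here)")
    for topic, resource in REFERRALS.items():
        if topic in q:
            return Reply("referral", f"I don't have that. Try {resource}.")
    return Reply("referral", f"I don't have that. Try {DEFAULT_REFERRAL}.")


print(respond("When is payroll processed?").text)  # I don't have that. Try the site HR office.
```

The design point is that the referral path is written with the same care as the answer path. An out-of-domain question never falls through to a general-knowledge answer; it always ends in a named limitation and a place to go.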

The failure without construction

When a general model is deployed with a sector-specific system prompt but no underlying domain construction, the failure mode is predictable.

The model produces output that is accurate in the general sense. The facts it cites are real. The vocabulary is technically correct. The advice is consistent with published guidance. In the majority of interactions, which involve the class of questions the published guidance covers directly, the output holds.

The failure appears in a recognisable pattern. Edge cases that require operational rather than textbook knowledge expose the gap first: the question that arises in a specific context, where the correct answer depends on details no published document contains because those details are specific to this operation, this site, this team. The general model answers these questions with confidence. It answers them generically. The experienced practitioner knows the generic answer is not right for this situation. The user often cannot tell the difference.

Register fails alongside accuracy. The general model responds in the register of a knowledgeable informant: accurate, well-structured, professional. It does not respond in the register of the room. A health and aged care worker asking about a resident who has been off their food for three days does not need a structured clinical overview of appetite changes in aged care. They need a response calibrated to the urgency of the situation, the specific observations worth recording, and the threshold at which the GP should be contacted. The general model does not know these things unless the room was constructed.

Trust follows. The first time a user encounters a response that is technically correct but wrong for the room, they flag it to a colleague. The colleague confirms it is wrong. The user adjusts their relationship with the co-worker: they begin treating it as a reference to be checked rather than a specialist to be relied on. The trust loss is structural. The co-worker was not built for the room, and the person in the room can feel it.

These three failures are not detectable in a standard quality assessment of the co-worker's output. Technical correctness, response completeness, and format adherence all pass. The failure only surfaces in the gap between what the output says and what an experienced practitioner in the room would have said. Measuring that gap requires knowing what the experienced practitioner would say, which requires the domain audit to have been run.

The general model deployment without domain construction is a common shortcut. It is faster to stand up and cheaper in the initial phase. It also produces co-workers who pass the capabilities review and fail the room: the failure that costs the most to recover from, because the user who stopped trusting the tool and returned to doing the work manually rarely comes back.

The cost of the shortcut is not the failure itself. Failures are diagnosable. The cost is the time spent diagnosing a failure whose root cause is not in the build but in the missing stage before the build began. When domain construction was skipped, debugging the deployment is an exercise in reverse-engineering the room design work that should have preceded it. That exercise is slower and less reliable than doing the design work correctly at the start, because the deployment context is already contaminated by user experience with the under-specified co-worker.

How construction changes the working pattern

The practical difference between a co-worker built with domain construction and one deployed as a general model with a sector prompt shows up most clearly in the third month of a production deployment.

Early in a deployment, both kinds of co-worker can appear similar in performance. The early questions tend to be standard questions: the ones the general model handles adequately and the specialist handles with precision. The difference is visible but not dramatic. Users are in the phase of learning what to ask.

As deployment matures, the question set expands to include the edge cases. The users who have built the most trust with the co-worker are asking the hardest questions: outside the official documentation, requiring operational knowledge rather than textbook knowledge, depending on what this operation actually does rather than what the sector standard says.

This is where the construction investment shows. The co-worker with genuine domain knowledge responds to these questions with the register and specificity of an insider. It has absorbed the operational vocabulary, the gap between procedure and practice, the kinds of edge cases that arise regularly in this sector and this kind of operation. The co-worker without domain construction responds with the register of a well-read outsider: technically adequate, contextually wrong.

The third-month divergence is also when the operational trust pattern becomes visible. Users who work with a genuinely specialist co-worker bring harder problems to it, because the track record of responses to hard problems has been good. The co-worker's use deepens. Users who work with a general model start routing around it: using it for the simple questions and handling the hard ones with existing resources. The use pattern flattens.

What construction produces, at its best, is a co-worker the operational team starts to rely on for the questions they would previously have reserved for a senior colleague. Not because the co-worker replaces that colleague, but because the co-worker holds the class of question that should not require escalation to a senior colleague, which is most of them. The senior colleague becomes available for the questions that genuinely require their judgment, because the co-worker is holding the surrounding territory.

What good domain knowledge enables

When the domain construction is done well, it enables something that general models cannot achieve even with extended prompting: context-carrying responses.

A co-worker with genuine domain knowledge does not treat each question as isolated. It carries the context of the sector, the operation, the workflow into each response. When a mining co-worker is told that the ventilation readings on Level 4 are elevated, it does not respond to that sentence in isolation. It connects the elevated readings to the standard for the ore type being mined, the maintenance schedule for the ventilation equipment, the shift handover that happened two hours ago. The response is integrated because the domain knowledge is integrated.

This is the practical difference between constructed domain knowledge and retrieval-augmented generation. A RAG approach can pull relevant documents when a question is asked. A co-worker with constructed domain knowledge holds the context continuously, so that the integration happens before the question is asked rather than in response to it.

The integration quality is also what makes specialist AI co-workers useful in the kinds of high-complexity, time-pressured situations where they are most needed. A mining engineer at a shift handover does not have time to ask good questions. They need a co-worker who already knows what is relevant and leads with it. That kind of performance requires domain knowledge that was built in, not retrieved on demand.

The production benefit of this is most visible in the conversations the co-worker is not having. A well-constructed specialist handles the standard questions efficiently, which is expected. It is the edge cases, handled with the confidence and register of an insider, that earn the relationship. Those responses are not a product of the model's general capability. They are a product of the specific work done before the build began: the domain audit, the calibration, the boundary definition. Slow work at the start. Specific value in production, for as long as the deployment runs.

The domain construction process is slow. It is also the only process that produces a co-worker who genuinely knows the room. Everything else produces a co-worker who knows about the room. That is a different thing to be, and a less useful one for the person who needs the answer at 5am with two supervisors waiting. Construction is the work that closes that distance. It is the only work that does.

Dave Derman

Co-founder, Product Innovation & Engineering

Dave is co-founder of Graaft, based in Perth. He sets the engineering and product-innovation direction, runs the front of every client engagement, and builds the infrastructure that makes AI products perform, evolve, and grow in production.