Digital Expert Panel, and worked example: Pain

Clinical decision making frequently undergoes expert review, often drawing on Key Opinion Leaders and other experts.

This is a human-led process that applies in a variety of contexts, such as guideline committees, Delphi rounds, and peer review. These methods, while valuable, suffer from:

  • limited reproducibility, especially when evidence or information changes
  • inconsistent reviewer expertise
  • cognitive biases (anchoring, groupthink, sunk-cost effects)
  • long turnaround times, in processes that are often slow and hard to manage.

My development of a Digital Expert Panel (DEP) addresses these limitations by creating a structured, evidence-based, reproducible simulation of expert deliberation, which enables users to test out their thinking and likely outcomes before putting people in a room.

Technically, the DEP functions as a multi-agent reasoning machine learning system composed of digitally instantiated expert personas (drawn from global expertise), each grounded in published literature, guideline frameworks, and known disciplinary heuristics.

The core objective is to create an understandable and testable approach to a reasoning and decision-making process involving a mix of ‘minds’. The focus is where the clinical plausibility of medicines, devices, protocols, or any clinical topic is under scrutiny and an expert consensus is desired: in this case, a digitally simulated but high-verisimilitude result.

DEP is designed to model decision making by experts in these illustrative areas:

  1. Assess medication relevance, interactions, and therapeutic logic.
  2. Identify omissions in dossiers, e.g., un-modelled variables or unrecognised risk modifiers.
  3. Expose and mitigate cognitive biases of product development or in the human designers.
  4. Generate consensus and dissent reports with justification and documenting the chain of logic or reasoning for later scrutiny.
  5. Support review of regulatory-grade documentation demonstrating systematic expert oversight.

The DEP consists of modular expert personas modelled as semi-autonomous reasoning agents with:

  • distinct evidence bases and areas of expertise
  • distinct heuristics and cognitive style
  • distinct weighting preferences
  • a defined role in the ecosystem.

This creates an epistemic diversity similar to what we see in the membership of real committees, but with more explicit behaviour, fewer scheduling conflicts, and no risk of interpersonal politics (one of the great virtues of digital committees).

Heuristics: Groupthink, Symmetry of Ignorance, Johari Window

Expert groups can suffer from a variety of decision and reasoning failures, including groupthink and the unmanaged symmetry of ignorance. Groupthink has been widely discussed, but the symmetry of ignorance has received less attention. The basic idea is straightforward: the expert knows something important, but is also ignorant of something important that someone else knows.

This is not simply a problem between experts and non-experts either. In a room full of experts, the symmetry of ignorance can arise among the experts themselves. Each participant brings a particular disciplinary lens, body of evidence, professional experience, and set of assumptions. Experts often reason from what is visible within their own domain, and may energetically encourage colleagues to adopt their framing of the issue under discussion. The task of an expert panel, therefore, is not merely to aggregate individual opinions, but to assemble distributed knowledge and expertise in service of the decision at hand.

The symmetry of ignorance asks a crucial question: What does this expert not see because of where they stand?

The Johari Window provides a practical way to visualise this problem. In expert reasoning, the critical issue is not simply the difference between expert and non-expert knowledge, but the distribution of open, hidden, blind, and unknown knowledge across the whole reasoning system. Medical error often occurs when clinically relevant knowledge remains hidden, blind, or unknown, rather than being brought into the shared reasoning space.

Blind spots are particularly important. Within the symmetry of ignorance, expert blind spots are not peripheral defects; they are often central to the failure of collective reasoning. The Johari Window can therefore be used as a heuristic device to improve insight into what is known, what is assumed, what is concealed, and what remains outside the current frame of analysis.

One area for further work concerns the “unknown” domain. This is where AI models may have particular value: not by replacing expert judgement, but by detecting latent signals, patterns, or risk states that human decision-makers have not yet recognised. In this sense, AI can help move information from the unknown or blind domain into the open reasoning space, provided its outputs are interpretable, contestable, and clinically meaningful.

My model is designed specifically to address this symmetry problem. The symmetry of ignorance is the condition in which all participants in a complex problem possess some necessary knowledge and some consequential ignorance. Good reasoning therefore requires not merely expert analysis, but the structured integration of distributed perspectives, evidence, values, and context.

The DEP is designed as one way to bridge that challenge. Its purpose is to help human decision-makers structure a more productive reasoning process: one that exposes blind spots, integrates distributed knowledge, makes uncertainty visible, and improves the quality of collective judgement.

For more information, Dr Mike Tremblay, mike_tremblay@skythunder.net

The following is a worked example of a DEP engagement.

I have developed a pain predictor, described on this site under the Tech Transfer heading. This is speculative work, and so it was submitted to the DEP for analysis. The following is the output of the DEP. It has not been reviewed by a human.

The DEP encounter was structured as a three-step Delphi process, a common approach used in consensus panels. At the end of the process, I then fed the results of the DEP into OpenAI’s LLM using the clinical persona I developed. All the results are here for inspection. I’d be interested in comments and reactions.

DEP OUTPUT >

Panel membership and remit:
To review the current design of a clinical tool for assessing patient pain, with specific focus on:

  1. Whether the relative weights assigned to key features (vital signs, prior pain, function, medication) are clinically defensible.
  2. Whether the medication features (drug classes, dosing, timing) are clinically sound and sufficiently complete for a real-world use case.

DEP was given a machine learning pain predictor in all its algorithmic and clinical detail. There was no prompting.

Model Assumptions to be Reviewed

  • Predicts probability of moderate-to-severe pain over the next 24–48 hours.
  • Uses vital signs and time-series patterns as major objective inputs.
  • Includes medication features: opioid dose (MME-like), NSAID use, gabapentinoids, some antidepressant adjuvants.
  • Uses a non-linear exponent (k) to capture diminishing or amplifying effects of certain variables (e.g. meds, vitals).
  • Current design heavily weights vital signs and recent opioid dosing; psychosocial and functional factors are present but lighter.

Note: As we don’t have the literal numeric coefficients in front of us here, the panel evaluates the structure and relative emphasis rather than exact numbers.
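Since only the structure is known, the assumptions above can be sketched as a toy weighted-sum predictor with a non-linear medication exponent k. Every coefficient, default value, and variable name below is hypothetical, invented purely to make the assumed shape concrete:

```python
import math

def predict_pain_probability(vitals_dev, opioid_mme, psychosocial,
                             k=0.8, w_vitals=1.2, w_opioid=-0.9, w_psych=0.4):
    """Illustrative sketch only: a weighted sum with a non-linear exponent k
    on the medication term, mapped to a probability. The real model's
    coefficients are not available, so all numbers here are invented."""
    med_effect = w_opioid * (opioid_mme ** k)   # exponent k: diminishing/amplifying effect
    score = w_vitals * vitals_dev + med_effect + w_psych * psychosocial
    return 1.0 / (1.0 + math.exp(-score))       # logistic map to [0, 1]
```

The sketch reproduces the emphasis the panel critiques: vital-sign deviation carries the largest weight, and higher recent opioid dose pushes predicted pain down.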

Panel members (digital personas, names do not refer to any person living or dead)

  1. Dr Alice Ward – Consultant in Pain Medicine
    Domain expert in acute and chronic pain, perioperative and cancer pain.
  2. Prof Ravi Patel – Clinical Pharmacologist
    Focus on PK/PD, analgesic mechanisms, interactions, dose-response.
  3. Dr Sarah Lind – Behavioural Medicine Psychologist
    Expertise in pain behaviour, sleep, mood, adherence.
  4. Dr Jonas Müller – Biostatistician & Machine Learning Methodologist
    Focus on feature weighting, calibration, overfitting, non-linear models.
  5. Dr Helen O’Connor – Patient Safety & Quality Lead
    Focus on harm, opioid safety, over- and undertreatment, system risk.
  6. Dr “C” – Contrarian Clinician–Scientist
    Role is to challenge assumptions, surface blind spots, and test robustness.

Delphi Round 1 – Individual (blinded) assessments

In Round 1, each expert reviews the Pain Predictor independently and submits a structured assessment.

Dr Alice Ward – Pain Specialist

On feature weights: The current architecture places substantial weight on vital-sign deviations (heart rate, blood pressure, respiratory rate) as proxies for pain. Clinically, these variables can correlate with acute, severe pain, but in chronic pain—particularly in older adults or patients on beta-blockers—the relationship is often weak or absent.

I would:

  • Reduce reliance on vital signs as primary drivers, especially for chronic non-malignant pain.
  • Increase weight on:
    • prior trajectory of pain scores (if available),
    • functional markers (mobility, sleep disturbance, interference with activity),
    • and response to previous medication changes.

The exponent (k) for medication effect seems too optimistic: it assumes a relatively smooth and monotonic reduction in predicted pain with increased dosing. In reality we see diminishing returns and toxicity risk, not a simple ‘more drug, less pain’ curve.

On medication features

“The current medication set is a reasonable start: opioids, NSAIDs, gabapentinoids, some antidepressants. However:

  • It does not yet distinguish short-acting vs long-acting opioids in any meaningful way. That matters tremendously for prediction horizons.
  • There is no explicit separate handling of:
    • SNRIs (duloxetine, venlafaxine) vs. classic TCAs
    • topical agents (lidocaine, capsaicin)
    • muscle relaxants (if present)
  • It also does not fully account for organ function constraints (renal/hepatic), which strongly modulate how we interpret NSAIDs and many adjuvants.

Overall: conceptually sound, but incomplete and currently too opioid-centric.”

Prof Ravi Patel – Pharmacologist

On feature weights: “The model appears to give a relatively strong negative weight to recent opioid dose—i.e. higher recent MME → lower predicted future pain. Pharmacologically, this is an oversimplification. There are at least three phenomena missing:

  1. Tolerance: with chronic use, increasing dose yields less incremental analgesia.
  2. Opioid-induced hyperalgesia: in some patients, higher dose may correlate with more pain over time.
  3. Time since last dose and half-life: the relationship between dosage and predicted pain must be modelled as time–concentration–effect, not raw daily dose.

So the medication-to-pain weight should be time- and class-dependent, not a single static coefficient.”

On medication features: Key gaps and mis-specifications:

  • No explicit inclusion of paracetamol (acetaminophen) as a baseline analgesic. It may seem ‘weak’, but its presence or absence often indicates how seriously the pain is being managed and is a fundamental anchor in stepwise analgesia.
  • NSAIDs are not differentiated by risk profile or typical use case; but for prediction purposes that is less critical than knowing they are used and whether they are stopped.
  • Gabapentinoids are modelled as a simple ‘on/off plus dose’ variable; but their time to effect and tapering are slow. Their short-term effect on 24–48h pain prediction is modest.
  • No explicit dimension capturing co-prescribed sedatives (benzodiazepines, Z-drugs); these matter massively for safety, even if they do not reduce pain.

Verdict: structurally reasonable but pharmacodynamically naïve. The panel should recommend refining the medication features towards:

  • class-specific time-to-onset and half-life modelling,
  • diminishing returns at higher doses,
  • separate ‘safety’ weights vs ‘analgesic effect’ weights.

Dr Sarah Lind – Behavioural Psychologist

On feature weights: The model, as currently described, treats pain largely as a physiological signal modifiable by medication and measurable via vital signs. It assigns relatively low or secondary weight to:

  • sleep disruption,
  • stress and mood indicators,
  • activity levels (e.g. drop-off in steps, mobility),
  • and catastrophising/expectation patterns (if any proxy data exist).

Yet, from the behavioural literature, these factors are often as predictive or more predictive of pain flares over 24–48h than marginal changes in drug dose.

I would:

  • Increase weight on sleep and circadian disruption.
  • Increase weight on sustained low mobility or abrupt changes in activity.
  • Treat medication changes as interacting with these psychosocial states, not as independent levers.

On medication features: The model assumes medication is largely a linear driver of pain reduction. It under-weights:

  • adherence (what is prescribed is not necessarily what is taken),
  • psychological trust in the treatment plan,
  • and fear of medication, which affects patterns of sporadic use.

These are not easy to encode, but at minimum I would add:

  • a ‘recent change in regimen’ flag,
  • a ‘complex regimen’ signal (polypharmacy, many dosing times),
  • and where possible, a proxy for adherence (e.g. refill patterns, smart blister data, or self-report).

Dr Jonas Müller – Methodologist

On feature weights: The current design appears to give:

  • Substantial positive weight to recently elevated pain scores (where available).
  • Moderate to strong weight to vital sign deviations.
  • Strong negative weight to recent higher analgesic dose, especially opioids.
  • Relatively smaller weights to sleep, activity, and psychosocial flags.

From a modelling perspective, this is the classic trap of over-privileging what is most easily measured and underweighting what is actually predictive.

I would recommend:

  • Regularisation or shrinkage that penalises overly large coefficients on noisy proxies such as heart rate.
  • Enforced non-linear saturation for the medication features, so that doubling dose does not ‘magically halve’ pain probability.
  • Introducing interaction terms (e.g. opioid dose × sleep disruption, NSAID use × inflammatory condition flags).
  • Calibrating weights against an empirical dataset where we can check discrimination and calibration, not just theoretical plausibility.
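Dr Müller’s recommendations can be sketched in minimal form. The feature names, the saturation cap, and the shrinkage parameter are all illustrative assumptions, not the model’s actual internals:

```python
import numpy as np

def build_features(opioid_dose, sleep_disruption, hr_dev, nsaid, inflam_flag):
    """Assemble a feature vector with a saturating dose term and the
    interaction terms suggested in Round 1 (names are illustrative)."""
    return np.array([
        min(opioid_dose / 50.0, 1.0),     # enforced saturation: doubling dose cannot double effect
        sleep_disruption,
        hr_dev,                           # noisy proxy, to be shrunk by regularisation
        opioid_dose * sleep_disruption,   # interaction: opioid dose x sleep disruption
        nsaid * inflam_flag,              # interaction: NSAID use x inflammatory condition flag
    ])

def ridge_penalty(weights, lam=1.0):
    """L2 shrinkage that penalises overly large coefficients on noisy proxies."""
    return lam * float(np.sum(weights ** 2))
```

In a fitted model the penalty would be added to the training loss; here it simply illustrates how large coefficients on variables like heart rate would be discouraged.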

On medication features: The feature-space for medications is too coarse:

  • Binary plus dose is insufficient; we need at least:
    • class,
    • route,
    • formulation (immediate vs extended),
    • timing relative to prediction window.

If we cannot measure them all, we should at least encode:

  • ‘long-acting opioid present’,
  • ‘short-acting opioid rescue within last 6 hours’,
  • ‘NSAID use last 24 hours’,
  • ‘adjuvant presence and duration of use’ (e.g. gabapentin > 2 weeks).
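A minimal encoding of these suggested flags might look like the following; the field names are illustrative, and the 14-day cut-off follows the “gabapentin > 2 weeks” example above:

```python
from dataclasses import dataclass

@dataclass
class MedicationFeatures:
    """Minimum medication encoding suggested in Round 1 (illustrative names)."""
    long_acting_opioid_present: bool
    short_acting_rescue_last_6h: bool
    nsaid_last_24h: bool
    adjuvant_present: bool
    adjuvant_duration_days: int

    def adjuvant_established(self) -> bool:
        # e.g. gabapentinoids need > 2 weeks before their effect is meaningful
        return self.adjuvant_present and self.adjuvant_duration_days > 14
```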

Dr Helen O’Connor – Safety & Quality

On feature weights: The model currently treats analgesic escalation almost entirely as a good thing for future pain levels, without representing:

  • safe vs unsafe dose thresholds,
  • concomitant sedative co-prescription,
  • or the increased risk of falls, confusion, respiratory depression.

From a safety lens, the model’s high negative weight on opioids (more dose → less pain) is clinically dangerous if not complemented by explicit harm features.

I recommend:

  • Separate ‘analgesia weight’ and ‘harm weight’ branches.
  • Any escalation beyond guideline doses should increase a safety risk score even if it lowers predicted pain.

On medication features: We must explicitly model:

  • co-prescription of benzodiazepines, Z-drugs, strong sedatives,
  • polypharmacy count,
  • and where feasible, FRID-style risk indicators for falls and cognitive impairment in older adults.

These may not change the pain score directly, but they change whether a given medication strategy is acceptable.
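The separate “harm branch” Dr O’Connor describes could be sketched as a simple additive score. Every threshold and increment here is an illustrative assumption for the sketch, not clinical guidance:

```python
def safety_risk_score(opioid_mme, benzo_or_zdrug, polypharmacy_count, frid_count):
    """A 'harm' branch kept apart from the pain prediction: higher scores
    flag riskier medication strategies. All cut-points are illustrative."""
    score = 0
    if opioid_mme >= 50:                       # caution threshold for daily opioid dose
        score += 2
    if benzo_or_zdrug:                         # opioid + sedative co-prescription risk
        score += 3
    score += max(polypharmacy_count - 5, 0)    # burden above five concurrent drugs
    score += frid_count                        # falls-risk-increasing drugs (FRID-style)
    return score
```

The point of the separate branch is that this score is displayed next to, never merged into, the predicted pain probability.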

Dr “C” – Contrarian

On feature weights: My concern is that we are trying to retrofit clinical common sense onto a model whose implicit prior is: ‘pain is what vital signs and drug doses say it is.’

Vital signs are noisy, easily confounded, and often flat in chronic pain. Medication is both a cause and an effect of pain (indication bias). The model therefore risks becoming a self-confirming echo of prescribing patterns rather than a genuine predictor of underlying pain physiology and lived experience.

I would radically down-weight:

  • raw vital signs,
  • raw opioid dose,

and instead construct latent factors from:

  • pain trajectory,
  • functional activity,
  • sleep architecture,
  • and pattern of requests for breakthrough medication.

In short: pain prediction should not simply be a dressed-up regression on what clinicians already do.

On medication features: The medication feature-set is still too ‘pharmacology textbook’ and not enough ‘real patient on a Tuesday morning’. No explicit sense of:

  • chaotic patterns of as-needed dosing,
  • self-management behaviour (extra pills, skipped doses),
  • abrupt stoppages (e.g. stock-out, side effects).

If we cannot measure those yet, the model should at least carry a ‘medication uncertainty’ feature and reflect that in the output confidence.

Delphi Round 2 – Cross-critique and discussion

In Round 2, the experts see each other’s anonymised comments and respond. Below is a curated transcript of key exchanges.

Transcript excerpts

Moderator (DEP):
“Thank you all. The key points emerging relate to: 1) over-weighting of vital signs and opioid dose; 2) under-representation of psychosocial and functional signals; 3) medication modelling that is linear and time-insensitive. I would like to invite reactions.”

Dr Ward (Pain Specialist):
“I agree with the pharmacology and method comments. For chronic pain especially, we simply do not see a reliable linear drop in pain with more opioid. In practice, dose escalations often reflect loss of control rather than good control. I would support applying a ceiling effect—beyond a certain dose, more opioid should not substantially reduce predicted pain.”

Prof Patel (Pharmacologist):
“Exactly. We can approximate this with a saturating function or an Emax-type curve: early increments matter, later increments mostly increase risk. The current linear coefficient is conceptually wrong for long-term use.”

Dr Müller (Methodologist):
“I can implement that mathematically. It also aligns with the general principle of monotone but non-linear dose–response, and we can empirically fit it once data arrive. In the interim, we can impose a prior that favours saturation.”

Dr Lind (Behavioural Psychologist):
“I would also emphasise that increasing medication dose without addressing sleep, mood, and activity often does nothing for medium-term pain. If we do not increase the weights on these behavioural factors, the model will keep recommending the wrong levers.”

Dr O’Connor (Safety):
“And in older adults it will recommend dangerous levers. I want a clear signal that at high opioid doses, the safety risk and falls risk escalate even if predicted pain slips a bit.”

Dr “C” (Contrarian):
“May I add that if we ‘reward’ the model with lower pain probability whenever opioids go up, we risk encoding an algorithmic version of the prescribing crisis. We must avoid building a polite opioid-maximiser.”

Moderator:
“Let us address vital signs. Currently they have substantial weight as objective pain correlates. Are any of you arguing for keeping them as a major driver?”

Dr Ward:
“For acute post-operative pain or short bursts of severe pain, yes, heart rate and blood pressure can be useful. For long-term trajectories, no. They should be contextual, not central.”

Prof Patel:
“I would keep them, but partly as a confounder check—e.g., high heart rate with no reported pain might signal infection, anxiety, or other pathology. That suggests a separate pathway rather than a direct weight.”

Dr Müller:
“Statistically, they are noisy and often poorly calibrated in real data, especially wearables. My suggestion:

  • Keep them, but apply strong regularisation.
  • Allow them to interact with acute clinical context (e.g. post-surgery flag) rather than driving chronic predictions alone.”

Dr “C”:
“I support that. Vital signs should modify the story, not write it.”

Moderator:
“Finally, on medication feature completeness: we have opioids, NSAIDs, gabapentinoids, and some antidepressants. What is the consensus on minimum viable clinical soundness?”

Prof Patel:
“At minimum:

  • Paracetamol explicitly included.
  • Distinction between short- and long-acting opioids and a time-since-last-dose window.
  • Presence vs absence of an adjuvant like gabapentinoid/SNRI/low-dose TCA.
  • Explicit flags for benzodiazepines/Z-drugs and sedating antipsychotics.”

Dr Ward:
“I agree. We do not need to model every molecule individually, but class + timing + simple dose banding is a very good start.”

Dr O’Connor:
“And a separate safety weighting branch for these sedatives and high opioid dose, feeding into a ‘do we really want this direction of travel?’ output.”

Dr Lind:
“Plus a marker for regimen complexity (e.g. number of daily dosing times). Complex regimens are notorious for adherence problems.”

Dr “C”:
“And a reminder that all of this should be wrapped in an uncertainty estimate. We must not pretend to be more certain than we are.”

Delphi Round 3 – Consensus and final positions

In Round 3, the panel was asked to state a clear position on two questions:

  1. Are the current weights broadly correct?
  2. Are the medication features clinically sound for a first real-world demonstrator?

Consensus on weights

  • Vital signs
    • Consensus: Over-weighted.
    • Recommendation: Reduce their “default” importance; maintain them as contextual modifiers, especially in acute pain. Regularise heavily.
  • Opioid dose
    • Consensus: Currently over-simplified and over-weighted as a linear analgesic effect.
    • Recommendation:
      • Implement non-linear saturation (ceiling effect).
      • Consider modelling tolerance and potential hyperalgesia at high chronic doses.
      • Avoid simple “more dose = less pain” logic.
  • Psychosocial and functional variables
    • Consensus: Under-weighted in the current design.
    • Recommendation:
      • Increase weights for sleep disruption, activity/mobility patterns, mood/stress proxies, and pain trajectory where available.
      • These should be central drivers of 24–48 hour pain prediction, not decorative variables.
  • Non-linearity exponent (k)
    • Consensus: Conceptually appropriate, but must be constrained.
    • Recommendation:
      • Learn (k) per patient where data allow, but restrict its range so that it cannot produce implausibly sharp medication effects.
      • Calibrate empirically once a dataset is present.
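Restricting the learned per-patient exponent to a plausible range is a one-line clamp; the bounds below are illustrative assumptions:

```python
def constrain_k(k_learned, k_min=0.5, k_max=1.5):
    """Clamp the per-patient exponent k so it cannot produce implausibly
    sharp medication effects. The [0.5, 1.5] range is illustrative."""
    return max(k_min, min(k_learned, k_max))
```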

Consensus on medication features

  • Coverage
    • Consensus: Reasonable core set but incomplete.
    • Recommendation to add as minimum:
      • Paracetamol.
      • Separate features for short-acting vs long-acting opioids and timing of last dose.
      • Presence and duration of key adjuvants (gabapentinoids, SNRIs, TCAs).
      • Sedatives (benzodiazepines, Z-drugs, sedating antipsychotics).
  • Structure
    • Consensus: Too linear and dose-only driven.
    • Recommendation:
      • Implement time-aware medication modelling (half-life/last-dose windows).
      • Use banded doses (low/medium/high) instead of fine-grained numeric dose for early versions, coupled with ceiling/saturation behaviour.
      • Distinguish between analgesic effect features and safety risk features so that a strategy can lower pain but raise risk.
  • Safety signalling
    • Strong consensus:
      • Incorporate a dedicated safety score (e.g. opioid dose > threshold, opioid + benzo, high FRID burden).
      • Display pain prediction and safety/risk side-by-side, not merged.
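The banded-dose recommendation for early versions can be sketched as a simple mapping; the cut-points are illustrative, not clinical guidance:

```python
def dose_band(mme):
    """Map a numeric daily opioid dose (MME) to a coarse band, as the
    panel recommends for early model versions. Cut-points illustrative."""
    if mme < 20:
        return "low"
    if mme < 50:
        return "medium"
    return "high"
```

Banding deliberately discards precision that the model cannot yet justify, and pairs naturally with the ceiling/saturation behaviour recommended above.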

Final panel summary

Are the weights correct?

  • Short answer: No, not yet.
  • Panel judgement: The overall qualitative structure is defensible, but:
    • Vital signs and raw opioid dose are over-weighted.
    • Psychosocial/functional determinants are under-weighted.
    • Medication effects are modelled too linearly.

The panel recommends rebalancing the model so that:

  • Pain trajectory, function, sleep, and activity are first-class features.
  • Medications exert bounded, non-linear, time-sensitive effects.
  • Vital signs are secondary modifiers except in clearly acute contexts.

Are the medication features clinically sound?

  • Short answer: Conceptually promising, but incomplete and pharmacologically naïve for deployment.
  • Panel judgement:
    • The included drug classes are reasonable, but:
      • Omitting paracetamol and sedatives is a notable gap.
      • Lack of time–dose–effect modelling undermines predictive realism.
      • No explicit safety weighting is a risk in itself.

The panel does not advise deploying the current medication feature set in a live clinical demonstrator without:

  1. Adding class + timing + safety parameters as above.
  2. Implementing saturating/ceiling effects for dose–response.
  3. Displaying a separate safety signal alongside predicted pain.

Comments from a clinical LLM reviewer

1. The central correction is conceptual, not just numerical.
The DEP correctly identifies that the current model risks treating pain as a physiological-and-pharmacological signal: vital signs rise, drugs are given, pain falls. That is too narrow. A clinically credible pain predictor should treat pain as a multidimensional state involving prior pain trajectory, function, sleep, activity, affective distress, medication response, and clinical context. This is consistent with contemporary chronic pain guidance, which emphasises function, self-management, psychological factors, shared decision-making, and quality of life rather than pain intensity alone. (nice.org.uk)

2. The strongest safety insight is the need to separate analgesic benefit from medication harm.
The panel’s most important architectural recommendation is that pain prediction and safety prediction should not be collapsed into a single score. A medication change may plausibly reduce pain while simultaneously increasing risk of sedation, falls, confusion, respiratory depression, dependence, or loss of function. This is particularly important for opioids, benzodiazepines, Z-drugs, sedating antipsychotics, gabapentinoids, and polypharmacy in older adults. CDC guidance stresses weighing benefits and risks of opioid therapy and using caution with concurrent opioid and benzodiazepine prescribing; FDA safety warnings similarly highlight serious risks from opioid–benzodiazepine combinations. (CDC)

3. The model needs a clinical-context gate before feature weighting.
The DEP says vital signs are over-weighted, but the more precise point is that their value is context-dependent. Tachycardia, hypertension, respiratory change, or disturbed sleep may mean different things in post-operative pain, cancer pain, neuropathic pain, chronic primary pain, anxiety, infection, withdrawal, delirium, or medication toxicity. The model should therefore first classify the pain context or phenotype, then apply different feature priors. Otherwise, the same variable may be assigned the wrong meaning in the wrong patient.

4. Medication variables are confounded by indication and behaviour.
Opioid dose, rescue medication use, missed doses, abrupt discontinuation, or regimen complexity are not just pharmacological facts; they are behavioural and clinical signals. They may indicate undertreated pain, tolerance, prescribing habit, poor adherence, fear of medication, supply problems, or clinician bias. The DEP usefully identifies this, but the final model should make it explicit: medication features are partly causal, partly reactive, and partly behavioural. Treating them as simple predictors risks encoding prior prescribing practice rather than discovering the patient’s pain trajectory.

5. The demonstrator should display uncertainty as a clinical feature, not as a technical footnote.
A real-world pain predictor should not present a single probability as if it were an oracle. It should display predicted pain probability, safety risk, confidence/uncertainty, key drivers, and “missing or weak evidence” indicators. This is especially important where adherence, psychosocial context, medication timing, or baseline pain history are incomplete. The DEP’s recommendation for uncertainty signalling should therefore be elevated into a core design requirement.
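One way to make uncertainty a first-class part of the display, per point 5, is a structured output rather than a single probability. The field names here are hypothetical:

```python
def demonstrator_output(pain_prob, safety_score, confidence, drivers, missing):
    """Side-by-side display of prediction, safety risk, confidence, key
    drivers, and missing-evidence flags. Field names are illustrative."""
    return {
        "pain_probability": round(pain_prob, 2),
        "safety_risk": safety_score,
        "confidence": confidence,                 # e.g. "low" / "moderate" / "high"
        "key_drivers": drivers,                   # features pushing the prediction
        "missing_or_weak_evidence": missing,      # e.g. adherence unknown
    }
```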

Overall clinical LLM judgement:
The DEP output is clinically coherent and appropriately cautious. Its main contribution is to prevent the Pain Predictor becoming a vital-sign-and-opioid-dose machine. The next version should be framed as a multidimensional pain-state predictor with separate safety modelling, context-sensitive feature weighting, medication uncertainty, and explicit clinical interpretability. In plainer terms: the model should predict pain without accidentally becoming an opioid escalator.