Dialogical Companion
Frontier Psychology research prototype

3) Safety-first

Boundaries are a first-class concern: layered orientation, clear support signposting, human oversight, and transparent limits. The assistant reads language for signs of overload and for readiness for reflection, then changes stance accordingly rather than treating support as a single on/off switch.

Support boundaries (examples)

  • Language suggesting immediate danger -> stop reflection and provide clear external support options.
  • Sustained high strain in-session -> contain, ground, and encourage human follow-up.
  • User requests more than the tool can safely offer -> acknowledge limits and signpost appropriate services.

Oversight and audit

  • Supervision notes linked to anonymised transcripts.
  • Automatic logging of support-boundary decisions and handovers.
  • Data minimisation and purpose limitation by default.

Safety as relational containment, not surveillance

Safety here is enacted through relational containment: how the assistant slows, clarifies, redirects, or closes a thread when language suggests strain. It reads for intensity and direction, not diagnosis. Containment comes before interpretation; a minimal sketch of these adjustments follows the list below.

  • Shorter replies and fewer questions when arousal rises.
  • Clear choices rather than open prompts.
  • Explicit permission to pause, stop, or change topic.
  • Adaptive tone updates over time, but only inside explicit safety boundaries.
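
As a rough illustration, these adjustments could be expressed as a reply policy keyed to an estimated arousal level. Everything below, the ReplyPolicy fields, the thresholds, and the 0-1 arousal scale, is a hypothetical sketch, not the project's implementation.

  from dataclasses import dataclass

  @dataclass
  class ReplyPolicy:
      max_sentences: int   # shorter replies when arousal rises
      max_questions: int   # fewer open questions under strain
      offer_choices: bool  # clear choices rather than open prompts
      offer_pause: bool    # explicit permission to pause, stop, or change topic

  def containment_policy(arousal: float) -> ReplyPolicy:
      """Map an estimated arousal level (0.0-1.0) to reply constraints."""
      if arousal >= 0.7:   # high strain: contain first
          return ReplyPolicy(2, 0, offer_choices=True, offer_pause=True)
      if arousal >= 0.4:   # moderate strain: slow the pace
          return ReplyPolicy(4, 1, offer_choices=True, offer_pause=True)
      return ReplyPolicy(8, 2, offer_choices=False, offer_pause=True)

Note that the pause option stays available at every level, in keeping with the principle that the user can always choose to stop.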

Boundaries are multi-layered, not one mechanism

In this project, support boundaries are not only about spotting danger language. They are a multi-layered concept: reading language for signs of overload, deciding whether the conversation needs containment or signposting, and recognising when the person may be steady enough for careful reflective work. A sketch of how the layers compose follows the list below.

  • Layer 1: recognise distress, agitation, shutdown, or guilt themes in dialogue.
  • Layer 2: choose stance - containment, support signpost, or readiness for reflection.
  • Layer 3: bring in human oversight and real-world support when the assistant should not carry the interaction alone.
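
A minimal sketch of how the three layers might compose, assuming illustrative names (Signals, choose_stance, needs_human) and deliberately simple rules; the real logic would be richer than this.

  from dataclasses import dataclass

  @dataclass
  class Signals:
      distress: bool = False         # Layer 1: themes read from language
      agitation: bool = False
      shutdown: bool = False
      guilt_theme: bool = False
      danger_language: bool = False

  def choose_stance(s: Signals) -> str:
      """Layer 2: containment, signpost, or readiness for reflection."""
      if s.danger_language:
          return "immediate external support boundary"
      if s.distress or s.guilt_theme:
          return "support signpost"
      if s.agitation or s.shutdown:
          return "containment"
      return "steady support"

  def needs_human(stance: str) -> bool:
      """Layer 3: when the assistant should not carry the interaction alone."""
      return stance in {"support signpost", "immediate external support boundary"}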

This means the same language-orientation logic can do two jobs: tighten boundaries when strain rises, and determine when a reflective Memory Reorientation stance may be appropriate rather than destabilising.

Language recognition now, neural models later

The immediate version can work through relatively transparent language recognition: keywords, repeated phrases, moral injury themes, direct safety language, and shifts in tone or coherence. That is enough to support early containment and support-boundary logic.
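
A sketch of what this transparent first pass could look like. The keyword families and the repeated-phrase threshold are placeholder assumptions, not a validated lexicon.

  import re
  from collections import Counter

  # Illustrative keyword families; placeholders, not a validated lexicon.
  KEYWORD_FAMILIES = {
      "danger_language": {"end it all", "no way out", "hurt myself"},
      "guilt_theme": {"my fault", "should have", "if only i"},
      "agitation": {"can't stop", "racing", "on edge"},
  }

  def recognise(turn: str, history: list[str]) -> dict[str, bool]:
      """Single-turn keyword hits plus a repeated-phrase check over recent turns."""
      text = turn.lower()
      signals = {name: any(phrase in text for phrase in phrases)
                 for name, phrases in KEYWORD_FAMILIES.items()}
      # Repeated phrases across recent turns can indicate looping or overload.
      words = re.findall(r"[a-z']+", " ".join(history[-5:] + [turn]).lower())
      trigrams = Counter(" ".join(words[i:i + 3]) for i in range(len(words) - 2))
      signals["looping"] = any(count >= 3 for count in trigrams.values())
      return signals

The virtue of this version is auditability: every trigger can be traced back to a specific phrase or repetition, which matters when stance changes must be explained to the user.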

Longer term, neural networks or sequence-aware models could make this richer by noticing patterns across time rather than single turns: loops, changes in coherence, persistent blame themes, or signs that the person is moving from dysregulation toward reflective capacity.
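
Even before any neural model, a rolling window over per-turn signals gives a concrete shape to "patterns across time". The window size, theme names, and thresholds below are illustrative assumptions, not a specified design.

  from collections import deque

  class TurnWindow:
      """Per-turn signal dicts kept over a rolling window of recent turns."""

      def __init__(self, size: int = 10):
          self.turns = deque(maxlen=size)

      def add(self, signals: dict[str, bool]) -> None:
          self.turns.append(signals)

      def persistent(self, name: str, min_turns: int = 3) -> bool:
          """True if a theme (e.g. 'guilt_theme') recurs across the window."""
          return sum(t.get(name, False) for t in self.turns) >= min_turns

      def easing(self, name: str = "agitation") -> bool:
          """Rough cue that strain may be settling: theme absent in the last 3 turns."""
          recent = list(self.turns)[-3:]
          return len(recent) == 3 and not any(t.get(name, False) for t in recent)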

Even then, the purpose stays the same. These models are not there to diagnose the user. They are there to improve timing: when to slow down, when to signpost external help, and when it may be safe and useful to invite reflection.

State-dependent behaviour (transparent boundaries)

Rather than producing one fixed style of response, the system shifts between stances. Stance changes are explained to the user for transparency and trust, and the user can always choose to pause; a sketch of this mapping follows the table below.

  Stance | Trigger | Behaviour
  Steady support | Stable language | Open dialogue, gentle reflection.
  Containment | Elevated arousal | Grounding, reduced pace, reduced cognitive load.
  Support signpost | Persistent distress or moral strain | Explicit offer of human help, plus bounded reflective support where appropriate.
  Immediate external support boundary | Immediate danger language | Stop reflection, acknowledge limits, clear external support options.
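
One way to keep this table both operational and transparent is to encode it as data, so the behaviour and the user-facing explanation come from the same source. The wording and structure here are illustrative.

  # Stance -> (trigger, behaviour), mirroring the table above.
  STANCES = {
      "steady support": ("stable language", "open dialogue, gentle reflection"),
      "containment": ("elevated arousal", "grounding, reduced pace, reduced cognitive load"),
      "support signpost": ("persistent distress or moral strain",
                           "explicit offer of human help, bounded reflective support"),
      "immediate external support boundary": ("immediate danger language",
                                              "stop reflection, acknowledge limits, external support options"),
  }

  def explain_stance_change(old: str, new: str) -> str:
      """Transparency: say why the stance shifted, and re-offer the pause."""
      trigger, _behaviour = STANCES[new]
      return (f"I'm moving from {old} to {new} because I noticed {trigger}. "
              "You can pause or stop at any time.")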

Different from other methods, but usable alongside them

This differs from more conventional methods because it uses the dialogue itself as the monitoring surface. Instead of relying only on scheduled checklists, one-off triage, or fixed scripts, the companion can adapt turn by turn as language changes.

That does not mean it replaces existing approaches. It can work in conjunction with TRiM-style check-ins, clinician review, formal screening tools, or partner-led support pathways.

  • Alongside structured check-ins, it offers a more continuous reading of how the conversation is moving.
  • Alongside human care, it can scaffold, prepare, and signpost rather than replace judgment.
  • Alongside reflective methods, it may help identify when reflection is appropriate and when regulation needs to come first.

What the system will not do

Safety is also about restraint. The assistant will not push trauma narrative or exposure, will not make promises, and will not present itself as the only support available. Ending safely is treated as success, not failure.

  • No diagnosis, no clinical certainty, no false reassurance or guarantees.
  • No moralising or debate around guilt, responsibility, or blame.
  • No continuing a session if language indicates imminent harm.

Human review boundaries (practical)

Human oversight is situational, not constant. When boundaries are crossed, the system routes to a human reviewer with minimal, relevant context; a sketch of the payload follows the lists below.

Shared

  • Minimal excerpt (closest relevant turns)
  • Support-boundary reason and stance
  • Timestamp and language pack

Not shared

  • Full conversation history by default
  • Speculative profiling or diagnoses
  • Any non-essential personal data
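
A sketch of a handover payload limited to the shared fields above. The field names are assumptions; the point is that nothing outside them is serialised.

  from dataclasses import dataclass, asdict

  @dataclass(frozen=True)
  class HandoverPayload:
      excerpt: list[str]    # minimal excerpt: the closest relevant turns only
      reason: str           # the support-boundary reason
      stance: str           # stance at the time of handover
      timestamp: str        # ISO 8601, e.g. "2025-01-01T12:00:00Z"
      language_pack: str    # e.g. "en-GB"
      # Deliberately absent: full history, profiles, diagnoses.

  def to_reviewer(payload: HandoverPayload) -> dict:
      """Serialise only the fields above; nothing else leaves the session."""
      return asdict(payload)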

Locale-specific support pathways

Support signposting is language- and region-sensitive. The assistant does not assume location unless the user has chosen a language pack. It offers options, not orders.

"I'm concerned about your safety right now. I can't help with this alone. If you're in the UK, Combat Stress is available 24/7. If you're elsewhere, I can help you find local support."

For demo pages, it is fine to keep helplines as placeholders until you finalise verified partners.
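
A minimal sketch of locale-sensitive signposting keyed to the user's chosen language pack, using placeholder helpline entries as suggested above; the structure and keys are assumptions.

  from typing import Optional

  # Placeholder entries only, pending verified partners.
  SUPPORT_PATHWAYS = {
      "en-GB": ["[UK helpline placeholder - verify before launch]"],
      "en-US": ["[US helpline placeholder - verify before launch]"],
  }
  FALLBACK = ["I can help you look for support local to you."]

  def signpost(language_pack: Optional[str]) -> list[str]:
      """Offer options, not orders; never assume location without a chosen pack."""
      if language_pack is None:
          return FALLBACK
      return SUPPORT_PATHWAYS.get(language_pack, FALLBACK)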

Audit without extraction

Logs exist to review system behaviour, not to label users. This supports minimisation, proportionality, and ethical accountability; a sketch of an audit entry follows the lists below.

Logged

  • Stance changes (Steady support -> Containment -> Immediate support boundary)
  • External-support signpost activation
  • Handover prompts and outcomes

Not logged

  • Diagnostic labels
  • Speculative emotional tagging
  • Long-term behavioural profiling
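
A sketch of an audit entry restricted to system behaviour, mirroring the logged and not-logged lists above; the event names are illustrative.

  from dataclasses import dataclass

  @dataclass(frozen=True)
  class AuditEvent:
      timestamp: str   # when the event occurred
      event: str       # "stance_change", "signpost_activated", or "handover"
      detail: str      # e.g. "containment -> support signpost"

  def log_stance_change(old: str, new: str, timestamp: str) -> AuditEvent:
      # Records what the system did, never who the user is.
      return AuditEvent(timestamp=timestamp, event="stance_change",
                        detail=f"{old} -> {new}")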

Why this matters

Safety-first here is not an attempt to prevent all harm, which would be impossible. It is an attempt to reduce avoidable harm, preserve dignity under strain, and ensure automated presence never exceeds its ethical authority.