LLMs as Digital Coaches for Metacognitive Training


For introduction and context, see: Large Language Models for the Assessment of Metacognitive Skills in Acquired Brain Injury

Introduction

Conventional rehabilitation for metacognitive deficits depends on guided self-reflection, feedback, instruction in strategies, and practice of these strategies across everyday situations. There is evidence that such metacognitive strategies can improve everyday task performance and participation outcomes for survivors of brain injury (Krasny-Pacini et al., 2014). The difficulty is that these treatments are resource-intensive and must often be individualized for each patient. This is where LLM-based tools could supplement or scale up metacognitive rehabilitation, by offering individualized prompts, feedback, and strategy coaching from an always-available conversational agent. Because metacognitive rehabilitation is itself an interactive process, requiring dialogue for feedback, strategy recommendations, reflection prompts, and motivation, it lends itself naturally to a conversational agent. A chatbot-based rehabilitation tool can be available to patients every day, not only during weekly clinic visits, for guided practice and strategy reinforcement. Below we describe a number of use cases in which LLMs may support or deliver metacognitive intervention.

Guided Strategy Practice

An LLM coach can guide a patient through a metacognitive strategy during real or practice tasks. Following the Goal Management Training approach, for instance, the chatbot might begin with a question such as, “Let’s pause and describe what your primary goal is right now.” It might then have the patient outline their plan, encouraging them to break it down into steps (offering suggestions if they have difficulty identifying them). During the task (which could be a real-life activity the patient describes, or a task the chatbot narrates), the LLM can ask questions such as, “What do you think happened with that last step? Did anything surprising come up?” If the patient gets stuck (e.g., they were distracted), the chatbot can gently point this out: “I noticed that you paused — what were you thinking there?” This mirrors the process a therapist would use to develop self-monitoring in real time. After the task, the chatbot can assist with self-assessment: “Let’s go over your goal. Did you accomplish what you had hoped to achieve? What worked, and what will you do differently next time?” By repeatedly working the patient through this plan-monitor-evaluate cycle, the LLM consolidates the habit of metacognitive strategy use. In particular, it can do this in an individualized way, tailoring its prompts to the patient’s level of awareness: for a patient with impaired awareness, for example, it could give more specific feedback (e.g., “You said your memory was okay, but you asked me three times to repeat the instructions — that may indicate some difficulty with remembering; what do you think?”), whereas for a patient with greater awareness it might simply help them make their own connections.
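
As a concrete illustration, the plan-monitor-evaluate loop described above could be encoded as a simple prompt scaffold around any chat-capable model. The sketch below is a minimal, hypothetical example in Python; `call_llm` is a placeholder for whichever model API is actually used, and the prompt wording is illustrative and would need to be authored and validated by clinicians.

```python
# Minimal sketch of a Goal Management Training (GMT)-style coaching loop.
# `call_llm` is a hypothetical placeholder for a chat-model API call.

SYSTEM_PROMPT = (
    "You are a metacognitive rehabilitation coach. Guide the user through "
    "one task using the cycle: state the goal, break it into steps, monitor "
    "progress, and evaluate the outcome. Ask one short question at a time, "
    "reflect the user's own words back, and never give medical advice."
)

PHASE_PROMPTS = {
    "goal":     "Let's pause. What is your main goal right now?",
    "plan":     "What steps will you take to get there? List them one by one.",
    "monitor":  "How is it going so far? Has anything surprising come up?",
    "evaluate": "Let's review your goal. Did you achieve what you hoped? "
                "What worked, and what would you do differently next time?",
}

def call_llm(messages):
    """Placeholder: send `messages` to the chosen LLM and return its reply."""
    raise NotImplementedError("Plug in the model API used by your system.")

def run_gmt_session(get_patient_reply):
    """Walk the patient through one plan-monitor-evaluate cycle."""
    history = [{"role": "system", "content": SYSTEM_PROMPT}]
    for phase in ["goal", "plan", "monitor", "evaluate"]:
        prompt = PHASE_PROMPTS[phase]
        history.append({"role": "assistant", "content": prompt})
        history.append({"role": "user", "content": get_patient_reply(prompt)})
        # Let the model comment on the reply before moving to the next phase.
        history.append({"role": "assistant", "content": call_llm(history)})
    return history
```

In practice the fixed phase prompts would be a starting point only; the model could rephrase them and insert additional monitoring questions when the transcript suggests the patient is stuck.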

Scenario-Based Training and Role-Play

LLMs can create a virtually unlimited variety of scenarios or dilemmas for patients to work through, which is useful for practicing metacognitive skills. As an example, the chatbot might present a typical everyday task, for example, “Imagine that you must make a meal while dealing with a phone call,” and have the patient describe what they would do. As the patient outlines their approach, the chatbot could ask questions to prompt foresight (“What might go wrong here?”), thereby training anticipatory awareness. If the patient misses something (such as forgetting the food on the stove while on the phone), the chatbot can introduce that as a consequence in the exercise so the patient experiences it in a low-risk setting. This process is somewhat akin to problem-solving treatment or gaming simulations (such as the VR example described earlier), but in the form of text-based chat. It allows patients to practice self-monitoring and strategy revision across a range of settings, with immediate feedback from the AI. Prior evidence supports the usefulness of such interactive digital treatments: Jang et al. (2021) used a mobile chatbot to deliver a four-week self-management program for adults with attention deficits. The chatbot presented users with psychoeducational modules and self-monitoring exercises of increasing difficulty. The findings indicated reduced attention-deficit symptoms and improvements in related areas such as concentration, memory, and even emotion regulation. This suggests that even without the immediacy of a face-to-face therapist-patient relationship, a well-designed chatbot can promote the kind of self-reflection and skill practice that leads to measurable cognitive improvements.
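
One way to realize such scenario-based practice is to have the model generate everyday-life vignettes at a requested difficulty level and then probe for anticipatory awareness before feeding back a consequence. The sketch below is illustrative only; the difficulty scale, the `call_llm` placeholder, and the prompt wording are assumptions, not a validated protocol.

```python
# Illustrative sketch: scenario generation with an anticipatory-awareness probe.
# `call_llm` is a hypothetical placeholder for the chat-model API.

def call_llm(messages):
    raise NotImplementedError("Plug in the model API used by your system.")

def build_scenario_prompt(difficulty: int, interests: list[str]) -> str:
    """Ask the model for an everyday-task scenario at a given difficulty (1-5)."""
    return (
        f"Create a short everyday scenario at difficulty {difficulty}/5 "
        f"involving {', '.join(interests)}. It should require planning and "
        "contain one built-in distraction. End by asking the user what they "
        "would do first."
    )

ANTICIPATION_PROBE = "Before you start: what might go wrong here, and how would you prepare for it?"

def run_scenario(difficulty, interests, get_patient_reply):
    scenario = call_llm([{"role": "user", "content": build_scenario_prompt(difficulty, interests)}])
    plan = get_patient_reply(scenario)
    foresight = get_patient_reply(ANTICIPATION_PROBE)
    # Feed the plan and foresight back so the model can introduce a realistic
    # consequence for anything the patient overlooked (e.g. the distraction).
    feedback = call_llm([
        {"role": "user", "content": scenario},
        {"role": "user", "content": f"My plan: {plan}\nPossible problems: {foresight}\n"
                                    "Point out anything I missed, as a gentle consequence in the story."},
    ])
    return feedback
```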

Homework Support and Daily Check-ins

Rehabilitation may involve homework (e.g., monitoring for errors, attempting a strategy at home, reporting back). An LLM can serve as a homework companion available 24/7. Each day, it might ask the patient, “Did you use any of your memory strategies today? How did it go?” If the patient did not use them, it can ask why and help plan for the next opportunity. It can prompt the patient to set a daily goal every morning (reinforcing goal-setting habits) and to rate their performance each evening. By monitoring these conversations, the chatbot can reinforce self-regulation habits. For example, if a patient had planned to take two rest breaks during a task to avoid fatigue (a self-regulatory strategy), the chatbot can ask: “You intended to take breaks — did you remember to do so? What happened?” This immediate feedback routine can consolidate the meta-strategy. Furthermore, the chatbot can offer support and encouragement, which matters because struggles with awareness can demotivate or demoralize patients. A well-designed LLM can use an empathetic voice and positive feedback to encourage patients to persist with the rehabilitation process. Keeping the patient engaged is indeed paramount: one recent article noted that generative AI technologies can keep patients motivated and engaged in treatment by offering interactive, tailored support (Maggio et al., 2024). Patients tend to respond favorably to persistent support and the sense that “someone” is monitoring their progress, even if that someone is a virtual coach.
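
A daily check-in routine like the one described could be managed with a small amount of application logic around the model: store the morning goal, then reference it verbatim in the evening prompt. The following is a minimal sketch under those assumptions; the data structure and wording are illustrative, not prescriptive.

```python
# Minimal sketch of a morning goal / evening review check-in loop.
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class DailyLog:
    day: date
    goal: str = ""
    planned_strategies: list = field(default_factory=list)
    evening_rating: Optional[int] = None   # patient's 1-5 self-rating

def morning_prompt() -> str:
    return "Good morning! What is one goal you want to work toward today, and which strategy will you use?"

def evening_prompt(log: DailyLog) -> str:
    # Reference the patient's own plan so the review is concrete, not generic.
    strategies = ", ".join(log.planned_strategies) or "the strategy you planned"
    return (
        f"This morning your goal was: '{log.goal}'. You planned to use {strategies}. "
        "Did you remember to do so? What happened? How would you rate today from 1 to 5?"
    )

# Example usage
log = DailyLog(day=date.today(), goal="Finish the grocery run without forgetting items",
               planned_strategies=["written checklist", "two rest breaks"])
print(morning_prompt())
print(evening_prompt(log))
```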

Language and Communication Practice

For individuals with aphasia or cognitive-communication deficits, LLMs (particularly when paired with speech interfaces) may provide a venue for practicing communication strategies under low-stakes conditions. Conversational use of ChatGPT has been reported to help patients improve their ability to communicate, and it could therefore be valuable for individuals recovering language after stroke (Maggio et al., 2024). As an example, a person with mild aphasia might use a chatbot to practice generating sentences or stories; the chatbot can offer gentle error correction or requests for clarification, imitating a conversational partner who prompts the use of strategies such as circumlocution or semantic self-cueing. Further, because metacognition in aphasia treatment encompasses attention to language errors and the application of strategies to correct them, the chatbot may ask the user to self-check: “I didn’t quite get that — can you think of another way to phrase it?” or “You paused on a word; what could you do to help yourself retrieve it?” This replicates methods a speech therapist might employ, and it can offer this kind of practice many times a day. Wadams et al.’s (2022) systematic review found that incorporating metacognitive paradigms into aphasia treatment is feasible, and that frameworks such as GMT or strategy instruction had positive effects in the majority of cases. An LLM with a constrained vocabulary and possibly multimodal capacity (images as cues) could supplement this by offering ongoing conversational practice pitched at the user’s level, potentially facilitating better carryover of strategies to everyday communication. A key consideration, though, will be adjusting the complexity of the language the LLM produces; it may need to simplify its wording for a person with aphasia, something an adaptive model, or one fine-tuned explicitly for plain language, could accommodate.
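
Adjusting output complexity could be handled by wrapping every patient-facing reply in an explicit simplification instruction before it is sent. The sketch below assumes a hypothetical `call_llm` interface and a deliberately crude readability check; a production system would use clinically informed readability criteria instead.

```python
# Sketch: constrain the complexity of patient-facing language.
# `call_llm` is a hypothetical placeholder for the chat-model API.

def call_llm(messages):
    raise NotImplementedError("Plug in the model API used by your system.")

SIMPLIFY_INSTRUCTION = (
    "Rewrite the following message for a person with mild aphasia: "
    "short sentences (at most 8 words), common words only, one idea per sentence, "
    "keep the meaning and the friendly tone."
)

def too_complex(text: str, max_words_per_sentence: int = 8) -> bool:
    """Very crude proxy for readability: flag long sentences."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return any(len(s.split()) > max_words_per_sentence for s in sentences)

def simplify_if_needed(reply: str) -> str:
    """Pass the draft reply through a simplification step when it looks too complex."""
    if too_complex(reply):
        return call_llm([{"role": "user", "content": f"{SIMPLIFY_INSTRUCTION}\n\n{reply}"}])
    return reply
```
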
Real-world examples of LLM-based coaching in rehabilitation are still in their infancy, but there are comparable successes in education and mental health. AI chatbots such as Woebot, for example, have delivered cognitive-behavioral techniques through text, and a recent analysis indicated that an LLM-powered chatbot could effectively assist individuals in working through the steps of cognitive restructuring via natural conversation (Wang et al., 2025). In the realm of executive functions, the 2025 systematic review by Pergantis et al. compiled several instances in which chatbots were used to improve cognition. Users who interacted with a chatbot were prompted to examine themselves and reflect, in effect to notice what they were thinking while solving problems. Across several studies, participants who used the chatbot showed greater self-observation and regulation of their learning process than those who did not. The authors concluded that conversational AI agents may enhance elements of working memory, attention, self-regulation, problem solving, and metacognitive abilities when employed in training interventions. To illustrate, one of the studies in that review indicated that using a chatbot for schoolwork assistance reduced students’ cognitive load and stress, with an indirect increase in their executive performance (Rostami et al., 2023). Those findings relate closely to the objectives of metacognitive rehabilitation.
It is worth emphasizing that LLM-based coaching is not designed to replace human therapists but rather to support and extend their reach. A human clinician would likely supervise the process, reviewing summaries of the patient’s chatbot conversations (recorded with the patient’s consent) and interpreting them clinically. The chatbot would act as a tireless adjunct, providing patients with additional practice and feedback between sessions. Generalization of learned skills would also be supported: because the AI can vary scenarios endlessly, patients can practice implementing their strategies in new contexts, with the hope of better skill transfer to the real world (an ongoing challenge identified in executive function rehabilitation (Krasny-Pacini et al., 2014)). Further, some patients may feel less embarrassed disclosing struggles to a nonjudgmental AI agent, which may at times elicit more candid self-report. Of course, some patients will always prefer human contact (individual differences will prevail), but offering a hybrid solution (therapist plus chatbot) could meet a wider range of needs.

Technical and Ethical Considerations for LLM Integration

While the potential of LLMs in metacognitive rehabilitation is significant, several technical and ethical considerations must be addressed to ensure these tools are safe and effective:

Accuracy and Hallucinations

LLMs occasionally produce fabricated or incorrect information (known as hallucination). In a rehabilitation context, giving a patient an incorrect or misleading suggestion could be harmful. The multi-agent GPT-4 approach with guideline verification (Zhenzhu et al., 2024) is one strategy for mitigating this risk: the LLM cross-checks its answers against authoritative sources and explicitly expresses uncertainty when it does not “know” a detail. Developers of therapeutic chatbots could build in similar safeguards, for example by coupling the LLM with an established database of rehabilitation guidelines, or by restricting its answers to validated procedures. Fine-tuning the model on clinically validated conversations can also decrease the likelihood of it going off-script. Crucially, the automatic feedback the AI gives on patient performance must itself be accurate: if an LLM misinterprets what the patient communicated or did, it might offer undeserved praise or unfair criticism. A supervised testing phase, in which the AI’s feedback is cross-checked by human clinicians, should therefore precede any larger-scale deployment of such systems.
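
A lightweight version of the guideline-verification idea is to retrieve relevant passages from a locally curated guideline store and require the model to ground its reply in them or to state that it is unsure. The retrieval shown below is deliberately naive keyword matching and the snippets are invented; it only illustrates the control flow, not the method used by Zhenzhu et al. (2024).

```python
# Sketch: ground chatbot replies in a curated guideline store, or abstain.
# `call_llm` is a hypothetical placeholder for the chat-model API.

def call_llm(messages):
    raise NotImplementedError("Plug in the model API used by your system.")

# A locally curated store of vetted guideline snippets (illustrative content only).
GUIDELINE_SNIPPETS = [
    "Goal Management Training: stop, define the goal, list the steps, monitor, check the result.",
    "Encourage external aids (checklists, alarms) for prospective memory problems.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval; a real system would use proper search."""
    scored = sorted(GUIDELINE_SNIPPETS,
                    key=lambda s: -len(set(query.lower().split()) & set(s.lower().split())))
    return scored[:k]

def grounded_reply(patient_message: str) -> str:
    """Answer only from retrieved guidelines, otherwise express uncertainty."""
    context = "\n".join(retrieve(patient_message))
    prompt = (
        "Answer the patient using ONLY the guideline excerpts below. "
        "If they do not cover the question, say you are not sure and suggest "
        "asking the therapist.\n\nGuidelines:\n" + context +
        "\n\nPatient: " + patient_message
    )
    return call_llm([{"role": "user", "content": prompt}])
```

The same wrapper could log which snippets were retrieved for each reply, making it easier for a supervising clinician to audit the chatbot's advice afterwards.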

Personalization and Adaptivity

ABI patients are a heterogeneous population: their cognitive profiles range from mild to severe impairment, and their language abilities may differ (particularly for patients with aphasia). A good LLM coach would have to dynamically adjust its language and approach to suit the individual’s needs. Technically, this might involve the model gauging the patient’s level of comprehension (e.g., simplifying vocabulary and sentence structure for a patient with language-comprehension difficulties, which an LLM can do by paraphrasing complex text into plain language). It might further adjust scenario complexity or prompt intrusiveness in response to performance (much like a game that adapts its difficulty as the player gains skill). Reinforcement learning or few-shot prompting could be employed, with the model given examples of how to respond to patients at varying levels of awareness. For example, for a patient who persistently denies deficits, prompts could include more educational information and polite confrontation of inconsistencies; for a patient who is highly aware but anxious, prompts could emphasize encouragement and refinement of strategy use. Developers would also want the model to cope with the kinds of responses typical of brain injury survivors, which may be tangential or disorganized, while still steering the conversation productively.
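
Few-shot prompting of the kind mentioned above could be as simple as selecting a different set of exemplar exchanges depending on the patient's current awareness profile. The profiles and example dialogues below are invented for illustration; real exemplars would come from clinician-authored dialogues.

```python
# Sketch: select clinician-authored few-shot examples by awareness profile.
# The profile labels and example dialogues here are invented placeholders.

FEW_SHOT_EXAMPLES = {
    "low_awareness": [
        {"role": "user", "content": "My memory is fine, I don't need these exercises."},
        {"role": "assistant", "content": "You mentioned your memory is fine, yet you asked me to "
         "repeat the steps three times today. What do you make of that difference?"},
    ],
    "high_awareness_anxious": [
        {"role": "user", "content": "I keep failing at this, I'm hopeless."},
        {"role": "assistant", "content": "You noticed the problem yourself, which is real progress. "
         "Which part went better than yesterday?"},
    ],
}

def build_messages(awareness_profile: str, system_prompt: str, patient_message: str):
    """Prepend profile-specific exemplars so the model mirrors the desired style."""
    examples = FEW_SHOT_EXAMPLES.get(awareness_profile, [])
    return ([{"role": "system", "content": system_prompt}]
            + examples
            + [{"role": "user", "content": patient_message}])
```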

Privacy and Data Security

The use of LLMs in therapy raises concerns about patient data, which must not leak into public or secondary use. Conversations with a rehabilitation chatbot may contain private health information, details of daily life, emotional disclosures, and more. If the LLM is a cloud-based service, the data must be encrypted and the provider must adhere to health data standards (such as HIPAA). One advantage of certain contemporary LLM frameworks is the potential for local deployment: a hospital or clinic might run an open-source model on secure servers, so that no data ever leaves their premises. Researchers are already developing on-device and on-premises solutions to mitigate the privacy risks of LLMs. Informed patient consent and transparency about how the AI uses their data are also ethical requirements. The chatbot should ideally remind users that it is not human and explain the limits of privacy (e.g., “I’m a computer program; our conversation here in the app is private, but I will forward summaries to your therapist if that’s what you wish…”). An ethical LLM design would also allow patients to erase their data or opt out whenever they wish.
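
On the application side, consent handling can be made explicit in code: summaries are forwarded to the therapist only if the patient has opted in, and the patient can delete their transcript at any time. The sketch below is a simplified, assumption-laden example of such bookkeeping; a deployed system would add encryption at rest, access control, and audit logging.

```python
# Sketch: consent-gated sharing and patient-controlled deletion of transcripts.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PatientRecord:
    patient_id: str
    share_summaries_with_therapist: bool = False   # explicit opt-in, off by default
    transcript: list = field(default_factory=list)

def log_turn(record: PatientRecord, speaker: str, text: str) -> None:
    """Store one conversation turn locally (never sent to third parties here)."""
    record.transcript.append({"speaker": speaker, "text": text})

def summary_for_therapist(record: PatientRecord) -> Optional[str]:
    """Return a session summary only if the patient has consented to sharing."""
    if not record.share_summaries_with_therapist:
        return None
    turns = len(record.transcript)
    return f"Patient {record.patient_id}: {turns} turns this session (full summary generated separately)."

def delete_all_data(record: PatientRecord) -> None:
    """Honour a patient's request to erase their conversation data."""
    record.transcript.clear()
```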

Ethical Use and Autonomy

An AI coach must respect patients’ autonomy and must not coerce or deceive. It might be tempting, for example, to have the chatbot feign having suffered the same injury in order to build rapport, but that crosses an ethical line into deception. The chatbot’s empathy must be genuine within the limits of its design (acknowledging feelings, offering encouragement) without false pretence. There is also the potential for patients to become over-dependent on the chatbot or to prefer its company to human contact. Some reliance is desirable (we want patients to practice with it regularly), but not to the point of replacing human empathy and clinical judgment. Integration with the rehabilitation team must therefore be handled carefully: the AI is there to assist, not to lead. Developers must also take care with the LLM’s tone and style: there is a fine line between being supportive and giving false hope. If, for example, a patient expresses a goal that is unrealistic given their deficits, the AI needs to address this truthfully but compassionately, as a human therapist would; this will depend on careful prompt design and perhaps fine-tuning with examples of responses to delicate or unrealistic statements.

Validation and Efficacy

From a research perspective, any intervention involving large language models (LLMs) must undergo rigorous testing within clinical trials. Outcome measures should encompass standard metacognitive assessments — such as awareness questionnaires, functional independence measures, and goal attainment scaling — as well as broader indicators like patient engagement and satisfaction. Novel metrics may also emerge, such as tracking chat transcripts to observe increased patient-initiated strategy use over time, or reduced prompting from the LLM as the individual gains independence — both of which may signal growing metacognitive competence.
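
One of the novel outcome metrics suggested above, patient-initiated strategy use, could be approximated directly from transcripts by counting turns in which the patient, rather than the coach, first names a strategy. The keyword list and counting rule below are illustrative assumptions, not a validated coding scheme.

```python
# Sketch: count patient-initiated strategy mentions per session transcript.
# The strategy keyword list is an illustrative assumption, not a validated measure.

STRATEGY_KEYWORDS = ["checklist", "pause", "break it into steps", "double-check", "set an alarm"]

def patient_initiated_strategy_mentions(transcript: list) -> int:
    """A turn counts if the patient names a strategy before the coach did in that exchange."""
    count = 0
    coach_mentioned_recently = False
    for turn in transcript:                      # turns in chronological order
        text = turn["text"].lower()
        mentions = any(k in text for k in STRATEGY_KEYWORDS)
        if turn["speaker"] == "coach":
            coach_mentioned_recently = mentions
        elif turn["speaker"] == "patient" and mentions and not coach_mentioned_recently:
            count += 1
    return count

# Example usage
session = [
    {"speaker": "coach", "text": "How did the cooking task go?"},
    {"speaker": "patient", "text": "I made a checklist first, so I didn't forget anything."},
]
print(patient_initiated_strategy_mentions(session))  # -> 1
```
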
It is also essential to identify which individuals are most likely to benefit from these tools. For example, a 2022 review on aphasia (Wadams et al.) suggested that individuals with milder language impairments and relatively intact non-verbal cognition may respond best to metacognitive training. By extension, patients with moderate self-awareness deficits might be ideal candidates for an AI-based coach, while those experiencing more profound denial may initially require human-led intervention. As the evidence base grows, the field can begin to establish formal guidelines for integrating LLM-based support into standard care — whether as a supplement during inpatient rehabilitation or as a continuity tool after discharge to help maintain therapeutic gains.

Interdisciplinary Development

For LLM-based tools to be effective in metacognitive rehabilitation, it is essential that their development involve close collaboration with clinical experts in neuropsychology and cognitive rehabilitation. The therapeutic utility of a chatbot hinges on how accurately it reflects evidence-based practices in its conversational behavior. Clinicians can play a pivotal role in shaping the model’s prompts, feedback strategies, and interaction patterns — drawing on established frameworks such as Goal Management Training (GMT), the Cognitive Orientation to daily Occupational Performance (CO-OP), and error-awareness training. These structured interventions can inform the creation of prompt libraries, training data, or rule-based scaffolds that guide the LLM’s behavior.
As Chakraborty et al. (2023) emphasize, while LLMs offer impressive technical capabilities, their use in healthcare must be grounded in user-centered design tailored to the needs of specific populations. For individuals with language or literacy impairments, for instance, this may include simplified user interfaces or multimodal input and output options. Ultimately, interdisciplinary collaboration is not just beneficial — it is essential. The goal must be to ensure that the LLM facilitates structured, goal-oriented interactions that meaningfully support metacognitive development, rather than offering undirected or surface-level conversation.

Conclusion

Large language models are opening up exciting new possibilities in cognitive rehabilitation, especially when it comes to helping people rebuild the ability to think about their own thinking — a skill often disrupted after brain injury. By engaging patients in natural, reflective conversations, tools like GPT-4 can offer consistent coaching, tailored feedback, and guided practice beyond the boundaries of traditional clinical sessions. This kind of support could meaningfully enhance key aspects of metacognition: helping patients recognize inconsistencies in their thinking, reflect on their performance in the moment, and build strategies for planning, monitoring, and adjusting their behavior. Early examples are encouraging — from chatbot-based programs improving attention and focus (Pergantis et al., 2025), to demonstrations of LLMs like ChatGPT reasoning through complex problems with surprising insight (Elyoseph et al., 2023). Perhaps most promising is the potential for LLMs to expand access to high-quality rehabilitation: to reach individuals who might otherwise receive little one-on-one support, and to deliver proven interventions like Goal Management Training in a scalable, individualized format.

That said, realizing this potential won’t happen automatically. Thoughtful, collaborative implementation is essential. These tools must be built — and deployed — with a clear understanding of the clinical landscape: what patients truly need, what therapists aim to achieve, and how to ensure privacy, safety, and ethical use at every step. LLMs should be seen not as substitutes for human therapists, but as helpful partners — extensions of the rehabilitation team. In the future, we might imagine a dynamic collaboration: a therapist tracking a patient’s chatbot-guided practice, using AI-generated summaries to tailor in-person sessions, and adjusting the chatbot’s approach as the patient makes progress. For developers, the challenge will be to continually refine LLMs so that they support, rather than confuse or overwhelm, users navigating recovery.

In the end, integrating LLMs into metacognitive rehabilitation is a frontier effort — bringing together neuroscience, psychology, and artificial intelligence in service of a deeply human goal: helping individuals regain control over how they think, act, and adapt in daily life. The early results are promising, pointing toward improved cognitive function, greater patient engagement, and a new kind of therapeutic relationship between people and machines. With continued research, clinical guidance, and thoughtful design, LLMs may become a vital part of the rehabilitation process — one that helps more people rebuild not only their skills, but also their confidence and independence after brain injury.

References

Krasny-Pacini, A., Chevignard, M., & Evans, J. (2014). Goal Management Training for rehabilitation of executive functions: A systematic review of effectiveness in patients with acquired brain injury. Disability and Rehabilitation, 36(2), 105–116.

Jang, S., Kim, J. J., Kim, S. J., Hong, J., Kim, S., & Kim, E. (2021). Mobile app-based chatbot to deliver cognitive behavioral therapy and psychoeducation for adults with attention deficit: A development and feasibility/usability study. International Journal of Medical Informatics, 150, 104440.

Maggio, M. G., Tartarisco, G., Cardile, D., Bonanno, M., Bruschetta, R., Pignolo, L., … & Cerasa, A. (2024). Exploring ChatGPT’s potential in the clinical stream of neurorehabilitation. Frontiers in Artificial Intelligence, 7, 1407905.

Wadams, A., Suting, L., Lindsey, A., & Mozeiko, J. (2022). Metacognitive treatment in acquired brain injury and its applicability to aphasia: A systematic review. Frontiers in Rehabilitation Sciences, 3, 813416.

Wang, Y., Wang, Y., Xiao, Y., Escamilla, L., Augustine, B., Crace, K., … & Zhang, Y. (2025). Evaluating an LLM-Powered Chatbot for Cognitive Restructuring: Insights from Mental Health Professionals. arXiv preprint arXiv:2501.15599.

Pergantis, P., Bamicha, V., Skianis, C., & Drigas, A. (2025). AI Chatbots and Cognitive Control: Enhancing Executive Functions Through Chatbot Interactions: A Systematic Review. Brain Sciences, 15(1), 47.

Rostami, M., & Mehdi, A. P. (2023). The impact of doing assignments with chatbots on the students' working memory.

Zhenzhu, L., Jingfeng, Z., Wei, Z., Jianjun, Z., & Yinshui, X. (2024). GPT-agents based on medical guidelines can improve the responsiveness and explainability of outcomes for traumatic brain injury rehabilitation. Scientific Reports, 14(1), 7626.

Chakraborty, C., Pal, S., Bhattacharya, M., Dash, S., & Lee, S. S. (2023). Overview of chatbots with special emphasis on artificial intelligence-enabled ChatGPT in medical science. Frontiers in Artificial Intelligence, 6, 1237704.

Elyoseph, Z., Hadar-Shoval, D., Asraf, K., & Lvovsky, M. (2023). ChatGPT outperforms humans in emotional awareness evaluations. Frontiers in Psychology, 14, 1199058.