Nursing: Research and Reviews, Volume 15

On-Premises AI-Tool for Generating Nursing Care Summaries: A Phased-Implementation Study in Japan

Authors Hirata R, Oda Y, Morikawa S, Shigematsu K, Yamamoto D, Ito S, Tago M

Received 4 July 2025

Accepted for publication 21 November 2025

Published 16 December 2025 Volume 2025:15 Pages 215–222

DOI https://doi.org/10.2147/NRR.S551576

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 6

Editor who approved publication: Professor Ferry Efendi



Risa Hirata,1 Yoshimasa Oda,2 Shinichi Morikawa,3 Kaori Shigematsu,4 Daisuke Yamamoto,5 Suzunosuke Ito,6 Masaki Tago1

1Department of General Medicine, Saga University Hospital, Saga, Japan; 2Department of General Medicine, Yuai-Kai Foundation and Oda Hospital, Saga, Japan; 3Department of Information Management, Yuai-Kai Foundation and Oda Hospital, Saga, Japan; 4Department of Nursing, Yuai-Kai Foundation and Oda Hospital, Saga, Japan; 5Business DX Division, Service Marketing Department, Medical DX Unite, OPTiM Corporation, Tokyo, Japan; 6Office of the President, Research and Development Unit, OPTiM Corporation, Tokyo, Japan

Correspondence: Masaki Tago, Department of General Medicine, Saga University Hospital, 5-1-1 Nabeshima, Saga, Japan, Tel +81 952 34 3238, Fax +81 952 34 2029, Email [email protected]

Introduction: This study details the phased development and implementation of an Artificial Intelligence (AI)-driven nursing summary system designed to reduce nurses’ documentation workload.
Methods: Over a 1-year development period beginning in December 2023, we constructed a system that automatically generates nursing care summaries from patient records to support care continuity. The system used large language models with prompt tuning and integrated the subjective, objective, assessment, plan (SOAP) documentation format.
Results: Through three iterations informed by clinical feedback, the system reduced the median summary creation time from 16 to 10 minutes (p < 0.001), representing an approximate 40% reduction. When validating the AI-generated nursing summaries, total scores for the five evaluation items in Versions 0.2 and 0.3 showed no significant difference.
Discussion: This system produces comprehensive summaries that consider both nursing care plans and changes in patient condition, simultaneously improving efficiency and enhancing documentation quality. This study demonstrates the importance of iterative development and expert knowledge integration in implementing AI in clinical practice, offering valuable insights for advancing digital healthcare transformation.
Conclusion: This study achieved an approximate 40% reduction in nursing record creation time and produced clinically acceptable summary documents through the phased development and implementation of an AI-driven nursing summary system.

Keywords: patient discharge summaries, artificial intelligence, nursing, large language models, electronic health records

Introduction

In healthcare settings, documentation demands have become a significant burden for nurses.1 A survey showed that over 20% of nurses’ working hours are devoted to documentation,2 reducing the time available for direct patient care. Nursing summaries are essential for ensuring continuity of patient care but require substantial time and effort. Nurses must concisely and accurately summarize changes in patients’ conditions, the interventions implemented, and their outcomes. These tasks require clinical judgment and strong documentation skills. With an aging population and growing healthcare demands, improving workflow efficiency has become increasingly urgent.

Artificial Intelligence (AI) applications in medicine now span image diagnosis support,3 drug discovery,4 and surgical assistance.5,6 Large language models (LLMs) and generative AI tools for medical documentation are also advancing rapidly worldwide.7–9 Recent applications of LLMs in medical text generation have shown promising results in creating discharge summaries and assisting with clinical notes. It has been suggested that AI assistance in schedule coordination and documentation tasks may enable more efficient nursing workflows.10 However, challenges remain in nursing-specific documentation because of the complex, continuous nature of nursing care and the need for integration with nursing care plans and subjective, objective, assessment, plan (SOAP) format documentation.11,12 Additionally, an on-premises deployment approach is critical for maintaining data security and regulatory compliance in healthcare settings.

This study details the phased development and implementation of an on-premises LLM-based nursing summary system and validates its generated documents and effects.

Materials and Methods

This study was conducted at the Yuai-Kai Foundation and Oda Hospital, a core regional hospital in Japan. The facility implemented an electronic health record (EHR) system, where nurses routinely created documentation and summaries. The development of the AI-based nursing summary creation system began in December 2023, following a phased approach from prototype (Ver. 0.1) to production (Ver. 1.0). The project adopted an iterative development model, well-suited for implementing novel, untested mechanisms. The AI system was integrated into the existing EHR system (CSI Co., Ltd.) through application programming interface (API) connections, allowing direct access to nursing progress records and care plans while maintaining data security within the hospital’s network infrastructure. The system utilized a proprietary LLM developed by OPTiM Corporation, based on transformer architecture with optimized prompts for Japanese medical terminology and nursing documentation patterns.

Nursing summaries were created for randomly selected hospitalized patients either by nurses during routine clinical practice or by the AI system under development. Different patient cases were assigned to the AI-assisted and manual documentation groups to avoid overlap. The nurses who conducted the verification were nurse managers with extensive experience and skills. Ver. 0.2 and 0.3 of the AI-based summary system were used for these cases, and validation was conducted separately for each version. Versions 0 and 0.1 were not subjected to verification.

Patient data, including length of hospital stay, AI-generated nursing summaries, and the nurses involved, were obtained from medical charts. We recorded the time nurses spent creating summaries manually and revising AI-generated ones; each nurse measured and recorded the task completion time in minutes using a personal timepiece. The primary endpoint was the time required for nursing summary creation. Total AI processing time plus revision time was compared with manual summary creation time using the Mann–Whitney U-test. Additionally, five charge nurses evaluated unmodified AI-generated summaries on five criteria, namely (1) factual accuracy, (2) clarity and readability, (3) appropriateness, (4) comprehensiveness, and (5) overall satisfaction, using a five-point Likert scale.13 Scores are reported as medians with quartiles. This study was approved by the facility’s ethics committee (Approval No. 20250313). Regarding safe medical data handling, we adopted an on-premises environment, with security measures comparable to the stringent ones found in US HIPAA-compliant systems.14 Analyses were performed using IBM SPSS Ver. 25 (IBM Corp., Armonk, NY, USA).
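The time comparison described above can be illustrated with a minimal sketch of a two-sided Mann–Whitney U-test. The study itself used IBM SPSS; the function below is an independent illustration that uses the normal approximation with tie and continuity corrections, so its p-values may differ slightly from those of an exact method. The function name and the timing values are hypothetical and are not the study data.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Tied values receive average ranks and the variance is tie-corrected.
    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        avg_rank = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0                # all values identical: no evidence of a shift
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p_two_sided = math.erfc(z / math.sqrt(2.0))
    return u_x, p_two_sided


# Hypothetical timings in minutes (NOT the study data).
manual_minutes = [16, 18, 14, 20, 15, 17, 22, 16]
ai_minutes = [10, 9, 12, 11, 8, 10, 13, 10]

u_stat, p_value = mann_whitney_u(manual_minutes, ai_minutes)
print(f"U = {u_stat}, two-sided p = {p_value:.4f}")
```

A rank-based test is a natural choice here because documentation times are typically skewed and are summarized as medians rather than means.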

This study was conducted in accordance with the Declaration of Helsinki and the Ethical Guidelines for Medical and Health Research Involving Human Subjects established by the Ministry of Health, Labour and Welfare and the Ministry of Education, Culture, Sports, Science and Technology of Japan. This study was approved by the Research Ethics Committee of Yuai-Kai Foundation and Oda Hospital (Approval No. 20250313). We disclosed the research information on the hospital’s website and allowed patients to opt out of the study.

In this study, a conflict of interest exists because employees of OPTiM Corporation participated as authors and the developed system was subsequently supplied as a product to Oda Hospital. To ensure objectivity, authors from multiple independent medical institutions collaborated on the research, with the corresponding author, who has no direct contractual relationship with the vendor, overseeing the entire process. Furthermore, during the development process, we adopted a “clinician-in-the-loop” approach that actively incorporated feedback from nurses in actual clinical settings and prioritized the clinical utility of the system to minimize bias from commercial interests.

Results

System Improvement Process (Figure 1)

Prototype Development Stage (Ver. 0 to 0.1)

Prototype development began in December 2023. In the early development stage, assigned nurses briefed technicians (OPTiM Corporation) on the purpose of nursing summaries, progress records, vital sign charts, nursing care plans, and basic patient admission databases. Discussions were held regarding the provision of patient chart information from the chart vendor (CSI Co., Ltd.) to technicians, reporting to the ethics committee, and obtaining patient consent, and decisions were made to submit existing nursing summary formats. By January 2024, CSI provided information from 40 patients who had consented to external use of their data, enabling technicians to generate AI-driven summaries from nursing progress records. During the initial stage, the system was implemented by having LLMs read recent hospitalization records to output summaries. However, this yielded fragmented and impractical documents such as “orthostatic hypotension present, caution during movement” and “infection prevention behaviors are maintained.” These lacked important context and had no clear linkage with corresponding nursing care plans, making them insufficient for understanding patient issues.

Figure 1 System improvement process.

Abbreviations: LLM, Large language models; SOAP, Subjective, objective, assessment, plan; EHR, Electronic health records.

In the final prototype stage (March 2024), a survey of 50 cases revealed that nurses spent a median time of 16 minutes creating summaries without using AI. Subsequently, implementation of Ver. 0.1 began at the end of March 2024, with a system that directly extracted and summarized nursing records from EHRs. However, the AI-generated summaries lacked clarity regarding which nursing care plans corresponded to the implemented records. They were sparse in content, failed to capture the essence of observational records or progress, and were evaluated as “not at a level that could be called a summary,” indicating very low practicality.

In April 2024, nurses revised 50 summaries created by the AI system and provided detailed feedback. This feedback identified challenges, including linking nursing care plans with nursing records, understanding medical terminology, and LLM execution times. Initially, the system attempted summarization using nursing records alone; however, the actual document structure showed that these records were connected to nursing care plans. Furthermore, there was a technical challenge: the initial nursing record-only summarization method required enormous input text, necessitating multiple LLM inference executions (eg, daily summarization of nursing records followed by weekly summarization of those results). This required substantial processing time, which was problematic.
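The cost of the two-level approach described above (daily summarization followed by weekly summarization of those results) can be sketched by counting inference calls. This is an illustration only: `StubLLM` is a hypothetical stand-in that truncates text instead of calling a model, and the record layout of (day index, note text) pairs is an assumption, not the system's actual data format.

```python
from collections import defaultdict


class StubLLM:
    """Hypothetical stand-in for an LLM: truncates text and counts calls."""

    def __init__(self):
        self.calls = 0

    def summarize(self, text, limit=40):
        self.calls += 1
        return text[:limit]


def hierarchical_summary(records, llm, days_per_week=7):
    """Summarize records in two levels and return the weekly summaries.

    records: iterable of (day_index, note_text) pairs (assumed layout).
    Level 1 makes one call per day; level 2 makes one call per week.
    """
    by_day = defaultdict(list)
    for day, note in records:
        by_day[day].append(note)

    # Level 1: one inference call per day of records.
    daily = {day: llm.summarize(" ".join(notes)) for day, notes in sorted(by_day.items())}

    by_week = defaultdict(list)
    for day, text in daily.items():
        by_week[day // days_per_week].append(text)

    # Level 2: one inference call per week of daily summaries.
    return [llm.summarize(" ".join(texts)) for _, texts in sorted(by_week.items())]


# Illustrative stay of 22 days with one note per day.
example_llm = StubLLM()
example_weekly = hierarchical_summary(
    [(day, f"note for day {day}") for day in range(22)], example_llm
)
print(len(example_weekly), "weekly summaries from", example_llm.calls, "calls")
```

Under these assumptions the number of calls grows with every additional day of stay, which is consistent with the processing-time problem reported for the record-only method.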

Transition to Ver. 0.2 and Improvements (Ver. 0.1 to 0.2)

To address the challenges of Ver. 0.1, we transitioned to Ver. 0.2. Generally, nursing care plans are recorded with details such as observation, direct care, and education, divided into numbered items such as #1, #2, and #3, and each item is then documented in a SOAP format. After examining the feedback received in April, we concluded that it would be beneficial for the LLMs to correctly interpret these recording patterns and to document the summary by nursing problem. We therefore revised the data acquisition method and tuned the prompts accordingly. This method optimized the character count per execution, thus shortening overall LLM execution time.
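Organizing records by nursing problem number before summarization can be sketched as follows. The line format assumed here (`#<number> <S|O|A|P>: text`) is hypothetical and chosen only for illustration; the actual EHR export format is not described in this paper, and the sample lines are invented.

```python
import re
from collections import defaultdict

# Assumed line format: "#<problem number> <S|O|A|P>: free text"
LINE_PATTERN = re.compile(r"^#(\d+)\s+([SOAP])\s*:\s*(.+)$")


def group_by_problem(lines):
    """Group SOAP entries by nursing problem number.

    Returns {problem_number: {"S": [...], "O": [...], "A": [...], "P": [...]}}.
    Lines that do not match the assumed format are skipped.
    """
    grouped = defaultdict(lambda: {key: [] for key in "SOAP"})
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue
        number, section, text = match.groups()
        grouped[int(number)][section].append(text.strip())
    return dict(grouped)


# Invented sample lines for illustration only.
sample_lines = [
    "#1 S: feels dizzy when standing",
    "#1 O: blood pressure drops on standing",
    "#2 O: hand hygiene observed",
    "#1 A: fall risk continues",
    "#1 P: continue assisted transfers",
    "free-text line without a problem number",
]

for number, soap in sorted(group_by_problem(sample_lines).items()):
    print(f"#{number}", soap)
```

Grouping in this way means each summarization request only needs the text for one nursing problem, which is one plausible reading of how the character count per execution was kept small.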

However, a new challenge emerged: the output contained more of the care content outlined in the nursing care plans than of the care actually provided, and the documentation of implementation records was weak. Therefore, attempts were made to incorporate summaries of nursing content into the nursing care plan items, and how to revise the prototype became an agenda item.

Based on these challenges, Ver. 0.2 was released on July 19, 2024. At that time, feedback indicated that summaries focused on care content, making hospitalization progress unclear. We then proposed summarizing weekly nursing care plan evaluations. However, the content became centered on summarizing each item of “nursing care plan,” “progress,” and “nursing care plan evaluation,” while the patient’s condition became difficult to understand. Therefore, Oda Hospital provided OPTiM with materials regarding the SOAP definitions used in the summaries (Supplement 1), and the SOAP definition text was reflected as part of the prompt.
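Reflecting SOAP definition text in the prompt can be sketched as simple prompt assembly. The definitions below are generic placeholders, not the hospital's definitions from Supplement 1, and the prompt wording and function name are hypothetical rather than those of the deployed system.

```python
# Generic placeholder definitions (NOT the text from Supplement 1).
SOAP_DEFINITIONS = {
    "S": "Subjective: what the patient or family reports.",
    "O": "Objective: observations and measurements made by staff.",
    "A": "Assessment: the nurse's interpretation of S and O.",
    "P": "Plan: the care to continue, change, or stop.",
}


def build_prompt(problem_label, soap_entries, definitions=SOAP_DEFINITIONS):
    """Assemble a summarization prompt for one nursing problem.

    problem_label: e.g. "#1"; soap_entries: {"S": [...], "O": [...], ...}.
    The definitions are placed first so the model reads them before the records.
    """
    parts = ["Definitions of the record sections:"]
    parts += [f"- {key}: {definitions[key]}" for key in "SOAP"]
    parts.append("")
    parts.append(f"Records for nursing problem {problem_label}:")
    for key in "SOAP":
        for text in soap_entries.get(key, []):
            parts.append(f"[{key}] {text}")
    parts.append("")
    parts.append("Summarize the care provided and how the patient's condition changed.")
    return "\n".join(parts)


# Invented entries for illustration only.
example_prompt = build_prompt(
    "#1",
    {"S": ["feels dizzy when standing"], "O": ["blood pressure drops on standing"]},
)
print(example_prompt)
```

Keeping the definitions in a separate mapping makes it straightforward to swap in a facility's own wording without touching the assembly logic.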

Ver. 0.3 Development and Evaluation (Ver. 0.2 to 0.3)

Based on the challenges faced by Ver. 0.2 and the newly provided SOAP definitions, Ver. 0.3 was released in late August 2024 to strengthen the interpretation of SOAP definitions and optimize the balance of nursing care plans, implementation, and progress. In Ver. 0.3, adjustments were made to emphasize the implementation content to resolve the challenges of the previous version.

In September 2024, we confirmed AI summary usage and usability. These summaries focused on nursing care plan goal achievement and rationales. However, clinical practice requires including patient condition changes, posing challenges in displaying nursing diagnosis names per number and managing lengthy content. At the end of October 2024, workload surveys indicated nurses spent a median time of 10 minutes creating and modifying summaries using the AI system.

Production Version (Ver. 1.0) Release (January 2025)

Following the improvements and evaluations of Ver. 0.3, the official version, Ver. 1.0, was released in January 2025. In addition to the previous nursing summary functions, this final version newly implemented extended functions for medical information providers, significant user interface improvements, and complete integration with EHR system icons. With these enhancements, the system entered formal operation and was integrated into daily nursing work.

Effectiveness Comparison with and without AI Nursing Summary Creation System Use

When validating the nursing summaries created by AI, the total scores for the five evaluation items in Ver. 0.2 and Ver. 0.3 showed no difference (Table 1). For individual items, items (1), (3), (4), and (5) demonstrated no changes in median scores between Ver. 0.2 and Ver. 0.3, whereas only item (2), clarity and readability, showed a median of 3 for Ver. 0.2 versus 3.5 for Ver. 0.3. When comparing the total scores for the five items using the Mann–Whitney U-test, no significant difference was found (p = 0.571).
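Reporting Likert scores as medians with quartiles can be sketched with the Python standard library. The scores below are hypothetical and are not the study ratings; the "inclusive" quantile method is one of several conventions, and SPSS may use a different one, so quartiles can differ slightly between tools.

```python
from statistics import quantiles


def median_iqr(scores):
    """Return (median, first quartile, third quartile) of a list of scores.

    Uses the 'inclusive' method, which treats the data as the full population
    and interpolates linearly between neighbouring observations.
    """
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return q2, q1, q3


# Hypothetical five-point Likert ratings (NOT the study data).
ratings_a = [2, 3, 3, 3, 4]
ratings_b = [3, 3, 3, 4, 4, 4]

for label, ratings in (("A", ratings_a), ("B", ratings_b)):
    med, q1, q3 = median_iqr(ratings)
    print(f"Set {label}: median {med} (quartiles {q1} to {q3})")
```

With an even number of ratings the median is the mean of the two middle values, which is how a half-point median can arise on an integer scale.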

Table 1 Nurse Evaluation of AI-Driven Nursing Summary System Ver. 0.2 and Ver. 0.3

Additionally, using AI significantly reduced the median time for nurses to create and modify summaries to 10 minutes, compared to 16 minutes without AI (p < 0.001, Mann–Whitney U-test; Table 2). The median time for AI-generated nursing summary creation alone was one minute (interquartile range, 1–2 minutes). While the median length of stay for patients in the AI-using group was 22 days (mean: 25.9 days) and 25 days for the non-AI group (mean: 30 days), this difference was not statistically significant (p = 0.432, Mann–Whitney U-test).
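As a quick check on the size of the effect, the relative reduction implied by the two reported medians (16 minutes without AI, 10 minutes with AI) can be worked out directly. This uses only the medians stated above and is the figure that the text rounds to approximately 40%.

```latex
\[
\text{relative reduction}
  = \frac{16 - 10}{16}
  = \frac{6}{16}
  = 0.375 \approx 38\%
\]
```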

Table 2 Median Time and Interquartile Range Comparison for Nursing Summary Creation

Discussion

After approximately one year of phased improvements since prototype development began in December 2023, we overcame the initial challenges and completed a practical AI nursing summary system that contributed to nurse workflow efficiency. Nurse-rated quality of the AI-generated summaries did not differ significantly between Ver. 0.2 and Ver. 0.3. We anticipate that this development process will serve as a model for future medical AI development, demonstrating the successful integration of practical knowledge with AI technology.

In this study, development of the AI nursing summary system reduced nursing record creation time by approximately 40% (from 16 to 10 minutes). While international studies have reported more dramatic time reductions (up to 90%) in medical documentation,15,16 direct comparison is challenging due to fundamental methodological differences. Previous studies primarily measured AI generation time alone, without accounting for essential human verification and modification processes that are critical in clinical practice. Our study implemented a comprehensive “human-in-the-loop” approach, measuring total time including AI generation (one minute) plus nurse verification/modification time (eight minutes). Even with this rigorous human oversight requirement, which is essential for patient safety, we achieved a clinically meaningful 40% time reduction.

The phased development approach adopted in this study (evolution from Ver. 0.1 to 1.0) implemented the iterative development model recommended for healthcare AI system development.17 In particular, the decision to actively incorporate feedback from clinical field nurses aligns with the “clinician-in-the-loop” approach emphasized in US medical AI development—an approach in which healthcare professionals participate from the design stage to enhance clinical utility.18 This study’s improvement in interpreting nursing care plan number recording patterns offers a notable solution to the technical challenge of understanding medical document-specific structures. While multiple studies emphasize integrating medical domain knowledge for accurate AI medical document creation,19,20 this study demonstrates its importance in the specific case of understanding the SOAP format and nursing care plan structures.

Regarding quality assessment, our study demonstrated limited improvement between Ver. 0.2 and Ver. 0.3, with only clarity and readability showing enhancement (median 3.0 to 3.5). However, this finding requires careful interpretation within the developmental context. The transition from Ver. 0.1 to Ver. 0.2 showed dramatic qualitative improvements in content coherence and clinical relevance, although these were not systematically evaluated using our standardized assessment criteria. This suggests that Ver. 0.2 may have achieved a functional quality threshold that left limited room for further improvement in subsequent iterations.

From a clinical effectiveness perspective, achieving time reduction is extremely useful. Approaches such as Retrieval-Augmented Generation contribute to improving the accuracy of medical AI,21 and appropriately linking nursing care plans with nursing records likely enables more clinically valuable summaries. Additionally, insights regarding the balance of nursing summary content represent important achievements. Nursing summaries are not mere records; they are communication tools to ensure the continuity of patient care. From this perspective, comprehensive information provision—including changes in the patient’s condition—is an essential element for future care providers to understand the patient’s overall condition. The results demonstrate the importance of maintaining “comprehensive and balanced information” in AI-generated nursing summaries, an essential value of these documents. This provides a crucial guideline for balancing the often-conflicting goals of AI efficiency and clinical value.

The study’s limitations include generalizability constraints due to single-facility implementation, data privacy and security challenges, risks of excessive AI dependence, and the need for standardized evaluation indicators. In particular, validating adaptability in facilities with different EHR systems or documentation practices remains a crucial future challenge. Regarding AI-generated content confirmation and modification, it is essential to clarify AI’s role as a tool supporting clinical judgment, necessitating the establishment of effective collaboration models between AI systems and healthcare professionals. Regarding evaluation, although nurses with a certain level of experience and skills conducted the assessments, we were unable to establish unified evaluation criteria through pre-training or assess inter-rater reliability. Blinding was not feasible during the evaluation of nursing summaries. Sample size was not determined in advance, and the number of samples represents what could be collected during the development phase. Potential confounding factors were not evaluated. A critical limitation of our quality evaluation is the absence of direct comparison between AI-generated and human-created summaries for identical patients, making absolute quality assessment challenging. Future studies should incorporate head-to-head comparisons with human-generated summaries and develop objective quality metrics that correlate with patient care outcomes.

Conclusion

This study achieved an approximate 40% reduction in nursing record creation time and produced clinically acceptable summary documents through the phased development and implementation of an AI-driven nursing summary system. This success stemmed from integrating iterative development with medical professional collaboration and AI technology, which aided in understanding nursing record-specific structures. As digital healthcare transformation advances, this study’s phased, collaborative development approach provides crucial guidelines for future medical AI development, serving as a model for building AI systems that meet the needs of complex medical settings. Although this is a single-center study, the phased approach employed in this research suggests the potential generalizability of the system across diverse clinical settings. Furthermore, this work may serve as an important starting point for developing a collaborative human–AI model in which AI effectively supports rather than replaces human clinical judgment. Further validation is needed to improve the quality of this system.

Abbreviations

AI, Artificial Intelligence; LLMs, Large language models; EHR, Electronic health records; SOAP, subjective, objective, assessment, plan.

Acknowledgments

We would like to express our gratitude to Ms. Yoshiko Kato, Ms. Yuka Hisamoto, Ms. Hitomi Shouho, and Mr. Shigeki Taniguchi for their cooperation in evaluating and verifying the summary. For this study, we utilized AI assistance (Claude 4.0 SONNET) to organize information, search the literature, and translate text into English.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Disclosure

Mr. Daisuke Yamamoto and Mr. Suzunosuke Ito, OPTiM Corporation employees, developed the AI-driven nursing summary system used in this study. This system has been provided as a product to Oda Hospital since 2025; consequently, a contractual relationship exists between OPTiM and Oda Hospital. The authors report no other conflicts of interest in this work.

References

1. De Groot K, De Veer AJE, Munster AM, Francke AL, Paans W. Nursing documentation and its relationship with perceived nursing workload: a mixed-methods study among community nurses. BMC Nurs. 2022;21(1):34. PMID: 35090442; PMCID: PMC8795724. doi:10.1186/s12912-022-00811-7

2. Cooper AL, Brown JA, Eccles SP, Cooper N, Albrecht MA. Is nursing and midwifery clinical documentation a burden? An empirical study of perception versus reality. J Clin Nurs. 2021;30(11–12):1645–1652. PMID: 33590554. doi:10.1111/jocn.15718

3. Lacroix M, Aouad T, Feydy J, et al. Artificial intelligence in musculoskeletal oncology imaging: a critical review of current applications. Diagn Interv Imaging. 2023;104(1):18–23. PMID: 36270953. doi:10.1016/j.diii.2022.10.004

4. Ocana A, Pandiella A, Privat C, et al. Integrating artificial intelligence in drug discovery and early drug development: a transformative approach. Biomark Res. 2025;13(1):45. PMID: 40087789; PMCID: PMC11909971. doi:10.1186/s40364-025-00758-2

5. Hamilton A. The future of artificial intelligence in surgery. Cureus. 2024;16(7):e63699. PMID: 39092371; PMCID: PMC11293880. doi:10.7759/cureus.63699

6. Cheruvu C, Davies A, Lu Y, Mak R, Ingley P. WP2. 10-the future of artificial intelligence (AI) in surgery. Brit J Surg. 2024;111(Supplement_8):znae197.132. doi:10.1093/bjs/znae197.132

7. Reunamo A, Peltonen LM, Mustonen R, et al. Text classification model explainability for keyword extraction - towards keyword-based summarization of nursing care episodes. Stud Health Technol Inform. 2022;290:632–636. PMID: 35673093. doi:10.3233/SHTI220154

8. Tung JYM, Gill SR, Sng GGR, et al. Comparison of the quality of discharge letters written by large language models and junior clinicians: single-blinded study. J Med Internet Res. 2024;26:e57721. PMID: 39047282; PMCID: PMC11306941. doi:10.2196/57721

9. Balloch J, Sridharan S, Oldham G, et al. Use of an ambient artificial intelligence tool to improve quality of clinical documentation. Future Healthc J. 2024;11(3):100157. PMID: 39371531; PMCID: PMC11452835. doi:10.1016/j.fhj.2024.100157

10. Mohammed RA, Ahmed SK, Nashwan AJ. Leveraging artificial intelligence to enhance nursing workflow in endoscopy units. Mesopotamian J Artif Intell Healthc. 2025;2025:93–95. doi:10.58496/MJAIH/2025/009

11. Moldskred PS, Snibsøer AK, Espehaug B. Improving the quality of nursing documentation at a residential care home: a clinical audit. BMC Nurs. 2021;20(1):103. PMID: 34154606; PMCID: PMC8215798. doi:10.1186/s12912-021-00629-9

12. Bjerkan J, Valderaune V, Olsen RM. Patient safety through nursing documentation: barriers identified by healthcare professionals and students. Front Comput Sci. 2021;3:624555. doi:10.3389/fcomp.2021.624555

13. Seo J, Choi D, Kim T, et al. Evaluation framework of large language models in medical documentation: development and usability study. J Med Internet Res. 2024;26:e58329. PMID: 39566044; PMCID: PMC11618017. doi:10.2196/58329

14. Elkourdi F, Wei C, Xiao L, Yu Z, Asan O. Exploring current practices and challenges of HIPAA compliance in software engineering: scoping review. IEEE Syst J. 2024;(2):94–104. doi:10.1109/OJSE.2024.3392691

15. Sánchez-Rosenberg G, Magnéli M, Barle N, et al. ChatGPT-4 generates orthopedic discharge documents faster than humans maintaining comparable quality: a pilot study of 6 cases. Acta Orthop. 2024;95:152–156. PMID: 38597205; PMCID: PMC10959013. doi:10.2340/17453674.2024.40182

16. Jin H, Guo J, Lin Q, Wu S, Hu W, Li X. Comparative study of Claude 3.5-Sonnet and human physicians in generating discharge summaries for patients with renal insufficiency: assessment of efficiency, accuracy, and quality. Front Digit Health. 2024;6:1456911. PMID: 39703756; PMCID: PMC11655460. doi:10.3389/fdgth.2024.1456911

17. Dwivedi S. Software development life cycle models-A comparative analysis. Int J Adv Res Comput Communic Engineer. 2016;5(2):232–233.

18. Dermody G, Fritz R. A conceptual framework for clinicians working with artificial intelligence and health-assistive smart homes. Nurs Inq. 2019;26(1):e12267. PMID: 30417510; PMCID: PMC6342619. doi:10.1111/nin.12267

19. Xie X, Niu J, Liu X, Chen Z, Tang S, Yu S. A survey on incorporating domain knowledge into deep learning for medical image analysis. Med Image Anal. 2021;69:101985. PMID: 33588117. doi:10.1016/j.media.2021.101985

20. Cabitza F, Campagner A, Ronzio L, et al. Rams, hounds and white boxes: investigating human-AI collaboration protocols in medical diagnosis. Artif Intell Med. 2023;138:102506. PMID: 36990586. doi:10.1016/j.artmed.2023.102506

21. Liu S, McCoy AB, Wright A. Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines. J Am Med Inform Assoc. 2025;ocaf008. PMID: 39812777; PMCID: PMC12005634. doi:10.1093/jamia/ocaf008

Creative Commons License © 2025 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.