A Novel 3D-Printed Mouse Model for Surgical Training: Multicenter Construct, Face, and Content Validation Study
Advancements in laboratory animal training increasingly incorporate technological innovations aiming to better align training standards with the 3Rs (Replacement, Reduction, and Refinement). This trend is shifting away from traditional reliance on live animals and cadavers toward simulation-based methods. This study introduces and assesses the validity of a novel 3D-printed rodent surgical simulator designed for the practice and training of basic rodent surgical skills. To evaluate its potential to partially replace animal use, refine rodent surgical training, and reduce the number of animals needed, a multicenter validation study was conducted across 5 academic research centers in Europe and the United States. The study assessed the simulator’s face, content, and construct validity, involving participants who were either inexperienced or expert in rodent surgery. Construct validity was evaluated through task completion times and blinded quality assessments across multiple training iterations. The results revealed that inexperienced participants demonstrated significant improvements in both speed and quality of surgical tasks with repeated simulator use, eventually reaching performance levels comparable to experts’ initial attempts. Expert participants consistently outperformed the inexperienced group. Face and content validity were supported by postuse surveys, with high ratings from both groups regarding the simulator’s anatomic realism and its perceived usefulness for the acquisition and development of fundamental surgical skills. Overall, the findings of this study indicate that this 3D-printed rodent surgical simulator offers a realistic, effective, and ethically sound alternative for basic rodent surgical skills training and competency assessment.
Introduction
In recent years, advancements in laboratory animal training have focused on leveraging technological innovations to enhance adherence to the 3Rs (Replacement, Reduction, and Refinement). Traditionally, skill development in laboratory animal biomethods has depended heavily on the use of live animal models and cadavers. While these methods remain valuable, there is a growing interest in incorporating novel inorganic simulation approaches to reduce or even replace animal use in early-stage training to fully align with the 3R principles. This shift in laboratory animal training emphasizes reducing reliance on live animals and cadavers and exploring alternatives such as basic simulation-based training1–5 and advanced inorganic simulation models.6–10 These innovative training tools replicate anatomy and appearance with high accuracy, providing safer, more ethical, and effective training methods.
The Guide for the Care and Use of Laboratory Animals stipulates that research personnel must be adequately trained before engaging in animal work.11 Furthermore, competence in laboratory animal science is mandated under Directive 2010/63/EU for those working with animals in scientific procedures, which is essential to promoting animal welfare.12 Despite these guidelines, competency in rodent surgery is often gained through short training courses, where trainees (predominantly without medical or surgical backgrounds13) must quickly develop essential surgical skills.
Surgical skills are known to be learned through deliberate practice,14 defined as the repetitive performance of psychomotor skills with rigorous assessment, specific feedback, and a progressive level of difficulty.15,16 Most rodent surgical training courses rely either on low-fidelity simulators that barely resemble basic rodent characteristics (in appearance and dimensions, and lacking tissue layers) or on animals, through cadaver laboratories or terminal surgical training procedures. Although using animals provides the most realistic training approach, cadavers quickly develop rigor mortis and autolysis, and the use of live animals, even as a terminal procedure, not only requires more resources (an approved animal use protocol, anesthesia, respiratory protection, adequately equipped surgical/animal facilities, etc) but also represents a fundamental ethical dilemma in the context of the 3Rs.
To address these challenges, a novel 3D-printed rodent surgical simulator was developed at the University of Arizona, aimed at providing an affordable, realistic, animal-free platform for the deliberate practice of basic surgical techniques (for example, incision and suturing). At a local level, this simulator has enabled both in-person and remote feedback while offering a standardized platform for assessment of basic rodent surgical skills competency.
To evaluate the simulator’s effectiveness and broader applicability, a multicenter validation study was conducted across 5 academic research institutions in Europe and the United States. The primary aim of this study was to assess the simulator’s face, content, and construct validity using a systematic, evidence-based approach.
The central hypothesis of this study was that the 3D-printed mouse simulator would have sufficient validity (specifically, “face,” “content,” and “construct validity”) to function as a reliable tool for foundational rodent surgical training. “Face validity” refers to the perceived realism of the simulator, while “content validity” reflects how well the simulator recreates the components and steps of real-life rodent surgery, as judged by subject matter experts. “Construct validity” represents the simulator’s ability to differentiate between users of different experience levels, which would enable its use in assessing skill acquisition.
This study evaluates participants’ perceptions of realism and educational relevance (face and content validity), as well as performance metrics across novice and expert users (construct validity), to determine the simulator’s utility for both training and competency-based assessment.
Ethical review.
This study was reviewed and approved by the University of Arizona Institutional Review Board, which operates under a Federalwide Assurance with the Office for Human Research Protections. The project was deemed exempt from full Institutional Review Board review, with a minimal level of risk to participants. The study did not involve animal subjects, and no IACUC approval was required. All research procedures were conducted following the approved protocol, the Belmont Report, and applicable regulations outlined in 45 CFR 46.111 and 21 CFR Part 50. The study adhered to ethical standards for human subject research, ensuring participant safety and confidentiality throughout the project.
Materials and Methods
Simulator design and setup.
The midlaparotomy mouse model simulator used in this study was conceptualized and developed by Dr. Celdran at the University Animal Care Department of the University of Arizona (Tucson, AZ). The simulator was designed based on a mouse cadaver fixed in dorsal recumbency (the standard surgical position for midline laparotomy procedures), maintaining 1:1 scale and external morphologic fidelity. For this validation study, early-stage prototypes were produced using fused deposition modeling with standard biocompatible thermoplastic (polylactic acid) using a P1P model printer (Bambu Labs, Austin, TX). A commonly available synthetic material (0.2032-mm-thick nitrile gloves) was used to simulate the structural composition (skin and muscular layers) of a mouse abdominal wall. The inorganic layers were held in place with interchangeable custom-fit inserts that allow for the replacement of the layers and repeated practice. The design of the “3R Mouse” simulator (Figure 1A) tested in this study has been submitted for full patent protection with the US Patent and Trademark Office.
Study design.
Each of the collaborating centers recruited 6 inexperienced and 3 expert subjects for a total of 45 participants. Inexperienced subjects were defined for this study as trainees requiring rodent surgery training who had never performed any surgical procedure. The expert subjects were defined for this study as experienced and proficient rodent surgeons with over 50 rodent surgeries performed during the 3 mo before the simulator assessment. The subjects were assigned to each group based on the above-mentioned criteria and independently of their educational or professional background.
A comprehensive and standardized approach was used to harmonize the data collection and testing procedures among participating institutions. All enrolled subjects attended a standardized orientation/training session in which the same presentation and aiding materials were used. Each center’s study coordinator, a laboratory animal veterinarian with extensive rodent surgical training experience, explained the study objectives and methodology, enrolled participants, provided standardized guidance on proper handling of surgical instruments and needles/sutures, and distributed identical testing materials. The standardized presentation also included step-by-step video demonstrations of all the standardized tasks to be performed by the participants as part of the validation study.
Construct validity assessment.
To evaluate the construct validity of the 3R Mouse as a simulator for surgical training, we aimed to determine (1) whether it is effective for training basic rodent surgical skills, (2) whether it helps the user to acquire key surgical competencies, and (3) whether it can distinguish between different skill levels (for example, inexperienced compared with expert). To do this, all enrolled subjects completed the same 3 tasks during each simulator iteration. To perform the tasks, the simulators were secured in place with tape and covered with a translucent plastic draping (Press ‘n Seal; SC Johnson Professional, Racine, WI) to replicate the standardized conditions typically encountered in rodent aseptic surgery (Figure 1B). All subjects were provided with the same standardized materials (Figure 2) to complete all tasks in every iteration.
Task 1 involved creating a window through the transparent draping, making a 20-mm-long incision through the premarked simulated skin layer, placing and fixing the provided skin retractors, and making a 15-mm-long incision through the simulated muscular layer along a premarked line. Task 2 involved closing the simulated muscular layer with a single, continuous, simple running suture. Task 3 involved closing the simulated skin layer using simple interrupted surgeon’s knots.
The same video demonstration used during the standardized orientation training session, along with identical step-by-step printed reference material, was available at all testing facilities for the subjects to consult during and between task exercises. Subjects in the inexperienced group completed a total of 5 iterations (performing all 3 tasks on each iteration) within a maximum period of 10 d, but completion times and standardized images were only collected for the first, third, and fifth iterations. Subjects in the expert group completed only 3 iterations within the same 10-d period, with completion times and standardized images collected for all 3 iterations.
The study coordinator at each participating center recorded the total time taken to complete each task, the total time needed to fully complete all 3 tasks, and standardized photographs once each task was completed (Figure 3).
The standardized photographs from tasks 2 and 3 were scored using preset criteria (Table 1). Each task received a quality score, and these individual scores were then combined to calculate an overall quality score for each simulator iteration. Once all standardized images were collected, the image folders were deidentified and renamed with computer-generated random alphanumeric codes to ensure that the scorer remained unaware of the participant’s location, iteration, or group. Scoring was then performed in a double-blind manner across all institutions. Each study coordinator was randomly assigned to score images from 3 other sites. The final quality score for each category was calculated as the average of the 3 independent scores for each image.
| Parameter | Criteria/score (points) | |||
|---|---|---|---|---|
| Number of passes | Insufficient | Slightly insufficient | Slightly too many | Ideal |
| Total number of suture passes to achieve an adequate and complete closure of the incision. | 0 | 1 | 2 | 3 |
| Suture spacing | Poorly spaced | Somewhat uneven | Somewhat even | Evenly spaced |
| Consistency and appropriateness of spacing between individual suture passes along the incision. | 0 | 1 | 2 | 3 |
| Needle placement accuracy | Too far | Too close | Generally adequate but inconsistent | Consistently adequate |
| Distance of the needle entry and exit points from the incision edge, reflecting precision and symmetry. | 1 | 2 | 3 | 4 |
| Suture tension | Too loose | Too tight | Adequately tensioned | |
| Appropriateness of tension applied to the suture. Too loose (gaping)/too tight (tissue strangulation/tearing). | 1 | 2 | 3 | |
Construct validity was assessed by analyzing both task completion times, evaluated for each individual task as well as cumulatively, and quality scores, which were similarly assessed per task and in aggregate. Quality scores were based on blind evaluations of posttask photographs.
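For illustration, the deidentification and score-averaging steps described above could be scripted along the following lines. This is a minimal sketch in R (the software used for this study’s statistical analysis); the folder names, file format, and column names are hypothetical and do not represent the coordinators’ actual workflow.

```r
# Minimal sketch of the blinded-scoring bookkeeping (hypothetical paths/columns).
set.seed(1)  # makes the random codes reproducible

image_files <- list.files("raw_images", full.names = TRUE)  # hypothetical folder

# One computer-generated random alphanumeric code per image
codes <- replicate(length(image_files),
                   paste0(sample(c(LETTERS, 0:9), 8, replace = TRUE),
                          collapse = ""))

# The code-to-image key stays with the coordinator, separate from the scorers
key <- data.frame(original = image_files, code = codes)

dir.create("blinded_images", showWarnings = FALSE)
file.copy(image_files, file.path("blinded_images", paste0(codes, ".jpg")))

# After scoring, the final quality score per image is the mean of the
# 3 independent scores (`scores`: hypothetical columns code, scorer, score)
# final_scores <- aggregate(score ~ code, data = scores, FUN = mean)
```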
Face and content validity assessment.
To evaluate the face and content validity of the 3R Mouse simulator, all participants filled out a postuse questionnaire that included basic deidentified demographic data (that is, dominant hand, level of education, current position, and experience working with laboratory animals), 3 questions rating the perceived level of realism and accuracy of the simulator (face validity), 4 questions assessing the usefulness of the simulator as a tool for acquiring basic rodent surgical skills (content validity), and 5 additional questions gauging the perceived value and impact of simulation-based rodent surgical training (perceived utility, emotional comfort, skill acquisition, and endorsement of the 3R Mouse simulator). A 5-level Likert scale was used to rate each of the listed aspects.
Statistical analysis.
Statistical analysis was conducted using R software (R Core Team, 2023), version 4.4.1. The analysis consisted of linear mixed-effects models to evaluate the effects of 2 fixed variables, group (expert and inexperienced) and iteration, as well as their interaction, on the response variables time and quality across the different tasks performed (task 1, task 2, and task 3) and each iteration as a whole (total). The models also included location and individual as random effects to account for variability between locations and participants while focusing on the primary variables of interest. Linear models were run using the function lmer (package:lme4).17
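For reference, the model specification described above can be written in lme4 syntax as follows. This is a minimal sketch assuming a hypothetical long-format data frame (one row per participant, task, and iteration) with illustrative column names and random intercepts for location and individual; it is not the authors’ actual analysis script.

```r
library(lme4)

# dat (hypothetical): time_s = completion time in seconds, quality = blinded
# quality score, group = expert/inexperienced, iteration = iteration number,
# location = study site, subject = participant ID.

# Fixed effects: group, iteration, and their interaction;
# random intercepts: location and individual participant.
m_time <- lmer(time_s ~ group * iteration + (1 | location) + (1 | subject),
               data = dat)

m_quality <- lmer(quality ~ group * iteration + (1 | location) + (1 | subject),
                  data = dat)

summary(m_time)  # fixed-effect estimates and random-effect variances
```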
Results
Participant demographics.
A total of 44 participants were included in the study: 29 in the inexperienced group and 15 in the expert group. One individual initially allocated to the inexperienced group was excluded after it was determined that the individual had prior, albeit limited, surgical experience and thus did not meet the inclusion criteria.
All participants completed the demographic questionnaire. Among the final cohort, 41 (93.2%) were right-handed, 2 (4.5%) were ambidextrous, and 1 (2.3%) was left-handed. In the expert group, 14 (93.3%) participants were right-handed, and 1 (6.7%) was ambidextrous. In the inexperienced group, 27 (93.1%) were right-handed, 1 (3.4%) was ambidextrous, and 1 (3.4%) was left-handed.
Regarding the education level, within the expert group, 5 (33.3%) participants held a doctoral degree (PhD), 5 (33.3%) held a professional medical degree (DVM, MD, or DDS), 2 (13.3%) held a master’s degree, 1 (6.7%) had an associate degree, and 2 (13.3%) reported a high school diploma as their highest level of education.
In the inexperienced group, 7 (24.1%) participants held a doctoral degree (PhD), 2 (6.9%) held a professional medical degree (DVM, MD, or DDS), 12 (41.4%) held a master’s degree, 2 (6.9%) held a bachelor’s degree, 2 (6.9%) had completed “some college” with no degree, and 4 (13.8%) reported a high school diploma as their highest educational attainment.
In relation to their current employment position, the participants in the expert group included 1 (6.7%) doctoral student, 3 (20.0%) postdoctoral researchers, 5 (33.3%) principal investigators, 4 (26.7%) research technicians, and 2 (13.3%) veterinarians. The inexperienced group included 11 (37.9%) doctoral students, 7 (24.1%) postdoctoral researchers, 1 (3.4%) master’s student, 5 (17.2%) research technicians, 4 (13.8%) veterinary technicians, and 1 (3.4%) classified under “other.”
Finally, in regards to the overall experience working with animals in research, in the expert group 12 (80.0%) participants reported more than 5 y of experience, 2 (13.3%) had 3 to 5 y of experience, and 1 (6.7%) had 1 to 3 y of experience. In the inexperienced group, 8 (27.6%) participants reported more than 5 y of experience, 5 (17.2%) had 3 to 5 y, 9 (31.0%) had 1 to 3 y, 4 (13.8%) had less than 1 y, and 3 (10.3%) reported no prior experience with research animals. A detailed survey response distribution for all the demographic data obtained is reported in Table 2.
| | Expert group (n = 15) | Inexperienced group (n = 29) | Total (n = 44) |
|---|---|---|---|
| Dominant hand | |||
| Right-handed | 14 (93.3%) | 27 (93.1%) | 41 (93.2%) |
| Ambidextrous | 1 (6.7%) | 1 (3.4%) | 2 (4.5%) |
| Left-handed | 0 (0%) | 1 (3.4%) | 1 (2.3%) |
| Highest education level | |||
| Doctorate (PhD) | 5 (33.3%) | 7 (24.1%) | 12 (27.3%) |
| Medical degree (DVM, MD, DDS) | 5 (33.3%) | 2 (6.9%) | 7 (15.9%) |
| Master’s degree | 2 (13.3%) | 12 (41.4%) | 14 (31.8%) |
| Bachelor’s degree | 0 (0%) | 2 (6.9%) | 2 (4.5%) |
| Some college, no degree | 0 (0%) | 2 (6.9%) | 2 (4.5%) |
| Associate degree | 1 (6.7%) | 0 (0%) | 1 (2.3%) |
| High school diploma | 2 (13.3%) | 4 (13.8%) | 6 (13.6%) |
| Current position | |||
| Doctoral student | 1 (6.7%) | 11 (37.9%) | 12 (27.3%) |
| Postdoctoral researcher | 3 (20.0%) | 7 (24.1%) | 10 (22.7%) |
| Master’s student | 0 (0%) | 1 (3.4%) | 1 (2.3%) |
| Principal investigator | 5 (33.3%) | 0 (0%) | 5 (11.4%) |
| Research technician | 4 (26.7%) | 5 (17.2%) | 9 (20.5%) |
| Veterinary technician | 0 (0%) | 4 (13.8%) | 4 (9.1%) |
| Other | 2 (13.3%) | 1 (3.4%) | 3 (6.8%) |
| Years of experience with research animals | |||
| >5 y | 12 (80.0%) | 8 (27.6%) | 20 (45.5%) |
| 3–5 y | 2 (13.3%) | 5 (17.2%) | 7 (15.9%) |
| 1–3 y | 1 (6.7%) | 9 (31.0%) | 10 (22.7%) |
| <1 y | 0 (0%) | 4 (13.8%) | 4 (9.1%) |
| No experience | 0 (0%) | 3 (10.3%) | 3 (6.8%) |
Construct validity assessment.
Construct validity was assessed by analyzing both task completion times and quality scores, which were derived from blinded evaluations of posttask photographs. The marginal R2 values, representing the proportion of variance explained by fixed effects (group and iteration), ranged from 0.042 to 0.518. The conditional R2 values, which account for variance explained by both fixed and random effects (including individual participant differences and location variability), ranged from 0.572 to 0.911.
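Marginal and conditional R2 values of this kind can be obtained from a fitted lmer model with, for example, the MuMIn package (a sketch reusing the hypothetical m_time object from the Methods; the performance package offers an equivalent r2() function).

```r
library(MuMIn)

# Nakagawa-Schielzeth R2 for mixed models:
#   R2m = variance explained by the fixed effects alone (marginal)
#   R2c = variance explained by fixed + random effects (conditional)
r.squaredGLMM(m_time)
```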
Total time to complete all tasks.
The expert group completed all tasks across 3 iterations in an average time of 19:43 ± 3:37 min, while the inexperienced group averaged 32:00 ± 7:11 min. On the first iteration, the expert group completed all tasks significantly faster than the inexperienced group (21:17 ± 3:01 compared with 40:02 ± 9:45 min, P < 0.05). By the third iteration, both groups showed improvement; however, the expert group continued to perform significantly faster (18:03 ± 3:25 compared with 29:10 ± 5:37 min, P < 0.05) (Figure 4). By the fifth iteration, the inexperienced group’s time, although still longer, was not statistically different from the expert group’s time in the first iteration.
Within-group comparisons revealed that the inexperienced group significantly reduced their total task time between the first (40:02 ± 9:45 min), third (29:10 ± 5:37 min), and fifth (26:47 ± 6:10 min) iterations (P < 0.05) (Figure 5A). The expert group showed consistent but nonsignificant improvement from the first to the third iteration (from 21:17 to 18:03 min) (Figure 5B).
Time to complete individual tasks.
The inexperienced group demonstrated continuous improvement across repetitions. For task 1, the average completion time decreased from 4:07 ± 1:47 min on the first iteration to 3:20 ± 1:30 min on the third and 3:00 ± 1:23 min on the fifth. This reduction reached statistical significance only when comparing the first and fifth iterations. For task 2, the completion time decreased from 16:46 ± 4:28 min in the first iteration to 11:27 ± 2:41 min (third iteration) and 10:17 ± 2:52 min (fifth iteration), with both reductions reaching statistical significance (P < 0.05). Similarly, for task 3, the time dropped from 19:09 ± 5:22 min in the first iteration to 14:46 ± 3:17 min (third iteration) and 13:30 ± 3:48 min (fifth iteration), also showing statistically significant improvement (P < 0.05) (Figure 5C).
The expert group’s task times remained relatively stable across iterations. For task 1, the average time decreased from 3:39 ± 1:27 min in the first iteration to 3:04 ± 1:07 min in the second and 2:48 ± 1:03 min in the third. The time recorded for task 1 on the third iteration was significantly different from the first (P < 0.05). For task 2, the average time was 8:05 ± 1:53 min in the first iteration, 7:03 ± 1:54 min in the second, and 6:10 ± 1:23 min in the third. For task 3, the expert group recorded times of 9:33 ± 1:22 min, 9:43 ± 2:38 min, and 9:06 ± 1:49 min for the first, second, and third iterations, respectively. None of these differences were statistically significant (Figure 5D).
Total quality scores.
Regarding overall quality, the inexperienced group averaged 17.6 ± 2.6 points across all 3 iterations, while the expert group achieved an average score of 21.7 ± 1.3 points. On the first iteration, the expert group’s quality score was significantly higher than that of the inexperienced group (20.5 ± 1.6 compared with 14.2 ± 2.5, P < 0.05). By the third iteration, both groups showed improvement, but the expert group continued to perform significantly better (22.7 ± 0.8 compared with 18.1 ± 2.9, P < 0.05) (Figure 6). By the fifth iteration, the total quality score of the inexperienced group (20.5 ± 2.3) was not statistically different from the expert group’s score in their first iteration (20.5 ± 1.6).
The inexperienced group exhibited a statistically significant increase in average quality scores from the first iteration (14.2 ± 2.5 points) to both the third (18.1 ± 2.9 points) and fifth (20.5 ± 2.3 points) iterations (P < 0.05; Figure 7A). The expert group also showed an increase in average scores across iterations, from 20.5 ± 1.6 points on the first iteration to 21.8 ± 1.4 points on the second and 22.7 ± 0.8 points on the third. However, these changes only reached statistical significance when comparing the first and third iterations (Figure 7B).
Quality scores for individual tasks.
The inexperienced group showed consistent improvement in quality scores for both scored tasks across repetitions. For task 2, the average quality score increased from 6.3 ± 1.5 points in the first iteration to 8.8 ± 1.5 points in the third and 9.7 ± 1.6 points in the fifth iteration, with both increases being statistically significant compared with the first iteration (P < 0.05). For task 3, the quality score improved from 8.1 ± 1.5 points in the first iteration to 9.2 ± 1.9 points in the third and 10.8 ± 1.3 points in the fifth iteration. All improvements, when compared with the first iteration, reached statistical significance (P < 0.05) (Figure 7C).
The expert group’s quality scores remained largely stable across iterations for both scored tasks. For task 2, the average quality score was 9.5 ± 1.0 points in the first iteration, 10.6 ± 1.1 points in the second, and 11.1 ± 0.8 points in the third iteration, with a significant increase observed between the first and third iterations (P < 0.05). For task 3, the quality score was 11.1 ± 1.1 points in the first iteration, remaining consistent at 11.1 ± 0.7 points in the second and 11.6 ± 0.6 points in the third iteration, with none of these differences reaching statistical significance (Figure 7D).
Face and content validity assessment.
For face validity, participants answered 3 questions related to the simulator’s realism and surface credibility. The first question inquired about the accuracy of the anatomic size and position. Among experts, 53.3% rated it as “very accurate” and 40.0% as “extremely accurate,” while only one participant (6.7%) rated it “moderately accurate.” The inexperienced group showed similar perceptions, with 51.7% selecting very accurate, 44.8% extremely accurate, and one individual (3.4%) reporting it as moderately accurate.
The second question asked the participants to rate the accuracy of the tissue feel and haptic feedback; ratings here were more varied. Among experts, 46.7% rated it very accurate, 46.7% moderately accurate, and 6.7% “not accurate at all.” In the inexperienced group, 44.8% found it moderately accurate, 34.5% very accurate, 13.8% “slightly accurate,” and 6.9% extremely accurate.
The third question was centered on assessing the overall anatomic realism of the simulator. Here, 53.3% of experts rated the simulator as very accurate, 13.3% as extremely accurate, and 33.3% as moderately accurate. Inexperienced participants responded similarly: 58.6% selected very accurate, 34.5% moderately accurate, and 6.9% extremely accurate.
Content validity was assessed with 4 questions addressing the simulator’s perceived educational value. First, the participants were asked about the usefulness of the simulator for learning general rodent surgery fundamentals. Among experts, 53.3% rated the simulator as “extremely useful,” 26.7% as “very useful,” and 20.0% as “moderately useful.” Inexperienced participants responded similarly, with 48.3% rating it extremely useful, 48.3% very useful, and 3.4% moderately useful. When asked about the usefulness of the simulator for learning instrument and tissue handling skills, expert ratings were 53.3% extremely useful, 40.0% very useful, and 6.7% moderately useful. The inexperienced group responded even more favorably, with 75.9% selecting extremely useful, 20.7% very useful, and 3.4% moderately useful.
The participants also judged favorably the usefulness of the simulator for the acquisition of basic suturing skills: 66.7% of experts and 79.3% of inexperienced users rated it extremely useful, while 33.3% and 20.7%, respectively, rated it very useful. Regarding the overall usefulness for rodent surgery training, 53.3% of experts and 65.5% of inexperienced participants rated the simulator as extremely useful, while 40.0% and 34.5%, respectively, rated it very useful. Only one expert (6.7%) considered it moderately useful.
The complete distribution of the participant responses to the Likert-scale statements assessing face (accuracy) and content (usefulness) validity of the simulator is reported in Table 3.
| | Expert group (n = 15) | Inexperienced group (n = 29) | Total (n = 44) |
|---|---|---|---|
| Accuracy of size and anatomic position | |||
| Extremely accurate | 6 (40.0%) | 13 (44.8%) | 19 (43.2%) |
| Very accurate | 8 (53.3%) | 15 (51.7%) | 23 (52.3%) |
| Moderately accurate | 1 (6.7%) | 1 (3.4%) | 2 (4.5%) |
| Slightly accurate | — | — | — |
| Not accurate at all | — | — | — |
| Accuracy of tissue feel and haptic feedback | |||
| Extremely accurate | — | 2 (6.9%) | 2 (4.5%) |
| Very accurate | 7 (46.7%) | 10 (34.5%) | 17 (38.6%) |
| Moderately accurate | 7 (46.7%) | 13 (44.8%) | 20 (45.5%) |
| Slightly accurate | — | 4 (13.8%) | 4 (9.1%) |
| Not accurate at all | 1 (6.7%) | — | 1 (2.3%) |
| Overall anatomic accuracy and realism | |||
| Extremely accurate | 2 (13.3%) | 2 (6.9%) | 4 (9.1%) |
| Very accurate | 8 (53.3%) | 17 (58.6%) | 25 (56.8%) |
| Moderately accurate | 5 (33.3%) | 10 (34.5%) | 15 (34.1%) |
| Slightly accurate | — | — | — |
| Not accurate at all | — | — | — |
| Usefulness for learning rodent general surgery fundamentals | |||
| Extremely useful | 8 (53.3%) | 14 (48.3%) | 22 (50.0%) |
| Very useful | 4 (26.7%) | 14 (48.3%) | 18 (40.9%) |
| Moderately useful | 3 (20.0%) | 1 (3.4%) | 4 (9.1%) |
| Slightly useful | — | — | — |
| Not useful at all | — | — | — |
| Usefulness for acquisition of basic instrument and tissue handling skills | |||
| Extremely useful | 8 (53.3%) | 22 (75.9%) | 30 (68.2%) |
| Very useful | 6 (40.0%) | 6 (20.7%) | 12 (27.3%) |
| Moderately useful | 1 (6.7%) | 1 (3.4%) | 2 (4.5%) |
| Slightly useful | — | — | — |
| Not useful at all | — | — | — |
| Usefulness for the acquisition of basic suturing skills | |||
| Extremely useful | 10 (66.7%) | 23 (79.3%) | 33 (75.0%) |
| Very useful | 5 (33.3%) | 6 (20.7%) | 11 (25.0%) |
| Moderately useful | — | — | — |
| Slightly useful | — | — | — |
| Not useful at all | — | — | — |
| Overall usefulness | |||
| Extremely useful | 8 (53.3%) | 19 (65.5%) | 27 (61.4%) |
| Very useful | 6 (40.0%) | 10 (34.5%) | 16 (36.4%) |
| Moderately useful | 1 (6.7%) | — | 1 (2.3%) |
| Slightly useful | — | — | — |
| Not useful at all | — | — | — |
Perceived value and impact of simulation-based training.
Participants were surveyed regarding their perspective on the value and impact of simulation as part of rodent surgical training. In terms of perceived utility, when asked whether ex vivo simulation should be a mandatory first step, 29 (65.9%) strongly agreed, 9 (20.5%) somewhat agreed, 2 (4.5%) neither agreed nor disagreed, 2 (4.5%) somewhat disagreed, and 2 (4.5%) strongly disagreed.
Regarding the emotional comfort and perceived pressure while training, a majority of participants (32; 72.7%) strongly agreed that they felt less pressured using artificial simulators, 7 (15.9%) somewhat agreed, and 5 (11.4%) neither agreed nor disagreed. When asked whether a variety of realistic models is needed to gain surgical skills and competencies, 23 (52.3%) strongly agreed, 16 (36.4%) somewhat agreed, 4 (9.1%) neither agreed nor disagreed, and 1 (2.3%) somewhat disagreed.
In terms of confidence gained from simulator training, 19 (43.2%) participants strongly agreed that they felt confident to perform skin and muscle sutures after using the simulator, 16 (36.4%) somewhat agreed, 5 (11.4%) neither agreed nor disagreed, and 4 (9.1%) somewhat disagreed. Finally, all participants endorsed the simulator, indicating that they would recommend it to peers, with 38 (86.4%) strongly agreeing and 6 (13.6%) somewhat agreeing.
The complete distribution of the participant responses to Likert-scale statements assessing attitudes toward simulation-based training is reported in Table 4.
| | Expert group (n = 15) | Inexperienced group (n = 29) | Total (n = 44) |
|---|---|---|---|
| Ex vivo simulators should be a mandatory first step | |||
| Strongly agree | 8 (53.3%) | 21 (72.4%) | 29 (65.9%) |
| Somewhat agree | 4 (26.7%) | 5 (17.2%) | 9 (20.5%) |
| Neither agree nor disagree | — | 2 (6.9%) | 2 (4.5%) |
| Somewhat disagree | 2 (13.3%) | — | 2 (4.5%) |
| Strongly disagree | 1 (6.7%) | 1 (3.4%) | 2 (4.5%) |
| I feel less pressured when training on artificial simulators | |||
| Strongly agree | 8 (53.3%) | 24 (82.8%) | 32 (72.7%) |
| Somewhat agree | 4 (26.7%) | 3 (10.3%) | 7 (15.9%) |
| Neither agree nor disagree | 3 (20.0%) | 2 (6.9%) | 5 (11.4%) |
| Somewhat disagree | — | — | — |
| Strongly disagree | — | — | — |
| A variety of realistic models is needed to gain rodent surgical skills and competencies | |||
| Strongly agree | 7 (46.7%) | 16 (55.2%) | 23 (52.3%) |
| Somewhat agree | 6 (40.0%) | 10 (34.5%) | 16 (36.4%) |
| Neither agree nor disagree | 2 (13.3%) | 2 (6.9%) | 4 (9.1%) |
| Somewhat disagree | — | 1 (3.4%) | 1 (2.3%) |
| Strongly disagree | — | — | — |
| After training on the simulator, I feel confident in performing skin/muscle sutures | |||
| Strongly agree | 6 (40.0%) | 13 (44.8%) | 19 (43.2%) |
| Somewhat agree | 5 (33.3%) | 11 (37.9%) | 16 (36.4%) |
| Neither agree nor disagree | 1 (6.7%) | 4 (13.8%) | 5 (11.4%) |
| Somewhat disagree | 3 (20.0%) | 1 (3.4%) | 4 (9.1%) |
| Strongly disagree | — | — | — |
| I would recommend the simulator to my peers | |||
| Strongly agree | 12 (80.0%) | 26 (89.7%) | 38 (86.4%) |
| Somewhat agree | 3 (20.0%) | 3 (10.3%) | 6 (13.6%) |
| Neither agree nor disagree | — | — | — |
| Somewhat disagree | — | — | — |
| Strongly disagree | — | — | — |
Discussion
The effectiveness of simulation-based clinical and surgical training18–20 for both medical21 and veterinary22–25 students has been widely reported, having a significant impact on the training curricula of veterinary and medical26,27 students alike. Only in the last decade has a growing emphasis on leveraging new technologies to foster innovation in laboratory animal sciences led to the development and use of advanced simulation in laboratory animal training, aiming to enhance adherence to the 3Rs. The use of 3D printing to develop realistic simulation models has emerged as a key component of this shift,6–10 and the application of this technology to a rodent surgical training model provides a promising alternative to traditional training methodologies that still rely heavily on live animals and cadavers. The surgical simulation tool presented in this study removes the ethical concerns of using animals and provides a versatile, convenient, yet reliable training platform that can be used successfully in a variety of training environments. While this surgical simulator does not fully recreate all the nuances of live animal surgery, it offers a highly valuable platform for acquiring and developing basic skills, significantly lowering the steep learning curve for novice rodent surgeons.
The 3D-printed mouse simulator evaluated in this study was developed to support early-stage rodent surgical training but also to contribute to standardized approaches for skills acquisition and competency assessment in the laboratory animal field. To improve the validity of the findings, a multicenter study design was chosen. Including instructors and trainees from diverse training environments allowed for a more representative evaluation of the simulator’s educational utility. This approach helped us mitigate biases stemming from institution-specific practices and varying baseline training standards, providing a broader perspective on the applicability and acceptance of simulation-based training across different settings.
The multicenter validation approach, while offering significant advantages,28 also presents several challenges, including a more elaborate study design, complicated coordination and logistics, additional efforts to ensure uniformity in protocols and supplies, and advanced statistical analysis to address intersite and population variability.29 To mitigate these challenges, we implemented robust strategies, including consistent communication across the geographically distant testing sites, a balanced study design that ensured equivalent participant numbers at each location, and the use of identical training and practice materials. In addition, study data were deidentified at collection, a double-blind scoring system was used, and the statistical models accounted for variables such as location and individual differences as random effects, enabling us to address site-specific variability while maintaining the focus on the primary variables of interest. These measures helped ensure the reliability and robustness of the data obtained, reinforcing the validity of our findings.
Validation process and results.
The validation process of a surgical simulator typically involves assessing its face, content, and construct validity using a combination of subjective and objective measures across participants with varying levels of expertise.30–32
Face validity.
The face validity of the 3R Mouse simulator was supported by structured postuse surveys centered on the level of realism of the simulator. The simulator received consistently high ratings across both groups for anatomic accuracy, size, positioning, and haptic feedback. Over 90% of respondents in each group rated the anatomic size and positioning as very or extremely accurate. Although the evaluations of tissue feel and haptic feedback were slightly more variable, the expert feedback on haptic realism also ranged from moderately to very accurate, with only a single negative rating, suggesting general agreement on the simulator’s realism. Despite the overall positive feedback, the face validity results highlight opportunities to improve the materials used to replicate muscle and skin layers. The prototype tested in this study used simple, low-cost materials (for example, nitrile gloves), chosen mainly for accessibility and because nitrile tears when excessive tension is applied to the suture. The trainers perceived this tear-prone quality as a positive one, since it encourages trainees to adopt gentle tissue handling and appropriate suture tension. Nevertheless, based on the face validity results obtained, we are currently exploring alternative materials that better mimic the visual and elastic properties of mouse tissues.
Content validity.
Content validity was evaluated based on participants’ perceptions of the simulator’s usefulness across various training domains. Both expert and inexperienced participants consistently rated the simulator as highly beneficial for learning fundamental rodent surgical skills. A large majority of respondents in each group (80.0% of experts and 96.6% of inexperienced participants) rated it as either very or extremely useful for acquiring general surgical competencies, with similarly high ratings for its role in developing instrument and tissue handling skills. These favorable ratings, although inherently subjective, are substantiated by the objective improvements documented in the construct validity assessment, which supports the credibility of the simulator’s educational value.
Inexperienced users were particularly enthusiastic, with more than 75% rating the simulator as extremely useful, reflecting its strong perceived impact and the high receptiveness of trainees to simulation during the early stages of surgical training. However, it is important to acknowledge that content validity ideally relies more on evaluations from experts, since they are expected to possess a deeper understanding of the required competencies. In this context, the more tempered endorsement from the expert group (only 53% rated the simulator extremely useful) likely reflects a more accurate and realistic representation of its strengths and limitations.
The high ratings received from both groups for suturing practice highlight the simulator’s value for task-specific skill development, and the similarly high ratings of its overall usefulness for rodent surgical training reinforce the relevance of the 3R Mouse simulator as a significant step forward in ethically grounded, accessible, and effective early-stage rodent surgical training.
Construct validity.
Construct validity was established through an objective assessment of participant performance over repeated simulator use, based on task completion times and blinded quality scores. The inexperienced participants’ significant improvements in both quantitative (time) and qualitative (quality score) measures demonstrate progressive skill acquisition and support the simulator’s utility as an effective tool for early-stage training.
It is worth highlighting that the average scores obtained by the inexperienced participants on the fifth iteration reached the level of the results obtained by the experts in their first use of the simulator (the values were not significantly different for either the completion times or the quality scores). This improvement underscores the value of the simulator as a training tool and suggests the number of iterations at which inexperienced trainees begin to reach a learning plateau on the simulator. While these results are highly encouraging, we acknowledge that further work is needed to assess whether the skills gained with the simulator adequately transfer to procedures performed on live animals.
The expert group demonstrated significantly shorter completion times and consistently higher quality scores (Figures 4 and 6), which supports the simulator’s construct validity through its ability to distinguish between experience levels. While some performance gains were observed among experts, statistically significant improvements were limited to comparisons between the first and third iterations on only 2 occasions (Figures 5 and 7). The absence of generalized significant changes in expert performance suggests that the simulator represents a limited challenge for experienced rodent surgeons. Taking into account their established technical proficiency, we believe that the isolated improvements observed are likely attributable to a brief adaptation to the simulator’s synthetic materials rather than true skill acquisition.
Survey insights on the perceived value and impact of simulation-based training.
Beyond the technical validation metrics, this study offers valuable insights into participants’ perceptions of simulation-based rodent surgical training. As experienced trainers, the authors of this study were not surprised to find that both expert and inexperienced users generally rated the simulator as extremely useful for acquiring basic rodent surgery skills. These findings are consistent with prior reports demonstrating broad trainee support for simulation-based learning in laboratory animal science.33
The majority of participants strongly agreed that ex vivo simulation should be a mandatory first step in surgical training, which, in our opinion, is a clear call to action that adds empirical support for the integration of simulation approaches into rodent surgery training programs.
A balanced interpretation of these findings requires considering both the subjective nature of self-reported usefulness ratings and the differences detected between groups. The high enthusiasm among inexperienced participants is encouraging and underscores the strong receptivity to simulation-based training. In contrast, the slightly lower ratings from experts may reflect a more critical perspective or perhaps an acknowledgment of the simulator’s limitations in fully replicating the complexity and intricacies of real in vivo surgery. Although live animal models remain the gold standard for advanced surgical training (for example, microsurgical techniques), evidence supports that prior ex vivo practice effectively consolidates foundational surgical skills and enhances overall training outcomes.34
Limitations.
This study has several limitations to acknowledge. First, the number of participants (particularly in the expert group) was limited, which may constrain the generalizability of the statistical comparisons. Another limitation is that although the quality assessments were double-blinded and based on a standardized scoring rubric, they remain inherently subjective. To mitigate individual bias, 3 independent instructors evaluated each image, and the average score was used for analysis. While this approach improves reliability, the development and implementation of automated or algorithm-based scoring systems would optimize consistency and objectivity in future studies.
Another notable limitation is the inconsistency in training session formats. While some participants completed their tasks individually, others did so in a group setting. In such environments, peer pressure may have influenced performance metrics. Prior research has demonstrated that stress can impact performance during surgical simulations,35 so peer-induced stress may have introduced variability in our outcomes. Future studies should consider standardizing the simulator training/testing environment, ideally as individualized sessions, to better isolate true performance and eliminate social influences.
Finally, the conditional R2 values obtained, ranging from 0.572 to 0.911, indicate that a substantial portion of the variance in performance can be attributed to both fixed and random effects, including participant- and site-specific factors. Although this reflects the overall robustness of our statistical models, it also highlights the influence of variables unrelated to the simulator itself, which limits the ability to fully isolate its effect on training outcomes.
Conclusion.
Overall, this study supports the validity of the 3R Mouse simulator as a reliable tool for early-stage rodent surgical training. High face validity ratings, significant performance improvements among novice users, and clear differentiation between skill levels collectively demonstrate the simulator’s educational effectiveness. Its repeatability, accessibility, and alignment with the 3Rs principles position it as a scalable and ethical alternative to traditional training approaches.
The simulator offers a repeatable, accessible, and low-risk environment for foundational surgical skill development, and its potential application in competency-based assessments addresses a critical need for standardization across training programs. Essential skills such as proper instrument handling, practice of different basic suture patterns, and principles of aseptic technique can be acquired and their competency assessed without the use of any animals.
The authors firmly believe that embracing advanced training methods, such as simulation, not only enhances and standardizes educational outcomes and user confidence but also plays a vital role in advancing the 3Rs principles.
As simulation technologies and synthetic models continue to evolve, they promise increasingly realistic and effective training experiences. Conducting well-designed studies to evaluate their validity and impact is essential to ensure their broader adoption and to shape a future where high-quality surgical training and animal welfare go hand in hand.

Figure 1. 3R Mouse simulator. (A) Prototype used in the validation study presenting a sutured nitrile layer. (B) Simulator in use during a training session featuring the plastic cover used to recreate a standard rodent aseptic surgical environment.

Figure 2. Set of standardized materials used in the simulator training sessions across all sites.

Figure 3. Representative images used for standardized quality scoring. Example photographs taken following the completion of each task. (A) Task 1. (B) Task 2. (C) Task 3.

Figure 4. Average total time (±SD) to complete all tasks, by group, across all iterations. *P < 0.05.

Figure 5. Task completion times by group, across iterations. (A) Average total time (±SD) to complete all tasks across 3 assessment points (iterations 1, 3, and 5) in the inexperienced group. *P < 0.05. (B) Average total time (±SD) to complete all tasks across 3 assessment points (iterations 1, 2, and 3) in the expert group. (C) Average time (±SD) to complete each task, by iteration, in the inexperienced group. *P < 0.05. (D) Average time (±SD) to complete each task, by iteration, in the expert group. *P < 0.05.

Figure 6. Average total quality score (±SD) combining both tasks, by group, across all iterations. *P < 0.05.

Figure 7. Quality score by group, across iterations. (A) Average total quality score (±SD) across 3 assessment points (iterations 1, 3, and 5) in the inexperienced group. *P < 0.05. (B) Average total quality score (±SD) across 3 assessment points (iterations 1, 2, and 3) in the expert group. *P < 0.05. (C) Average quality score (±SD) for each scored task, by iteration, in the inexperienced group. *P < 0.05. (D) Average quality score (±SD) for each scored task, by iteration, in the expert group. *P < 0.05.