Introduction: Improving public access to, and training for, epinephrine auto-injectors (EAIs) can reduce the time to initial treatment in anaphylaxis. Effective public use of EAIs requires bystanders to respond promptly and proficiently. We aimed to examine optimal methods for assessing the effectiveness of training and skill retention for public use of EAIs, including the use of microskills checklists. Methods: In this prospective, stratified, randomized study, 154 participants at 15 sites receiving installation of public EAIs were randomized to one of three educational interventions: A) didactic poster teaching (POS); B) poster and video teaching (VID); and C) poster, video, and simulation training (SIM). Participants were tested in a standardized simulated anaphylaxis scenario at 0 months (immediately following training) and again at 3-month follow-up. Participants' responses were video-recorded and assessed by two blinded raters using microskills checklists. The checklists were derived from the best available evidence and from interprofessional process mapping with a skills trainer. Interobserver reliability was assessed, and expressed as kappa values, for each item in a 14-step microskills checklist composed of 3-point and 5-point Likert-scale questions on EpiPen use. Results: Overall, agreement between the two raters was poor. Whether the participant appeared composed or panicked had the highest agreement (κ = 0.7, substantial agreement), although this did not reach statistical significance (p = 0.06). Calling for EMS support had the second-highest agreement (κ = 0.6, moderate agreement, p = 0.01). The remaining items showed very low to moderate agreement, with kappa values ranging from -0.103 to 0.48.
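The kappa values reported above measure inter-rater agreement beyond what chance alone would produce. As a minimal sketch of that arithmetic, assuming unweighted Cohen's kappa (the abstract does not state whether a weighted variant was used for the ordinal Likert items), the Python below computes κ = (p_o - p_e) / (1 - p_e) from two raters' scores on a single checklist item. It then maps the result to the Landis and Koch benchmark labels, which are consistent with the "substantial" and "moderate" labels used in the Results. The function names `cohens_kappa` and `landis_koch` and the ratings in the usage example are hypothetical illustrations, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same participants.

    Assumption: categories are treated as nominal (no partial credit for
    near-miss Likert scores, unlike weighted kappa).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of participants given the same score by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1:
        # Both raters used one identical category for everyone; kappa is undefined.
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal agreement label using the Landis and Koch (1977) benchmarks."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical 3-point scores from two blinded raters for ten participants.
    rater_1 = [1, 2, 2, 3, 1, 2, 3, 3, 2, 1]
    rater_2 = [1, 2, 3, 3, 1, 2, 2, 3, 2, 2]
    k = cohens_kappa(rater_1, rater_2)
    print(round(k, 2), landis_koch(k))
```

In this hypothetical example the raters agree on 7 of 10 participants (p_o = 0.70), but chance agreement from their marginal frequencies is already 0.35, so κ is only about 0.54 ("moderate"). This illustrates why raw percentage agreement can overstate reproducibility.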
Conclusion: Although microskills checklists have been shown to identify areas where learners and interprofessional teams require deliberate practice, these results support previously published evidence that microskills checklists have poor reproducibility when used to assess skills. Performance in this study will be further assessed using global rating scales, which have shown higher levels of agreement in other studies.