The advent of generative artificial intelligence (AI) models holds potential for aiding teachers in the generation of pedagogical materials. However, numerous knowledge gaps concerning the behavior of these models hinder the development of research-informed guidance for their effective use. Here, we assess trends related to prompt specificity, output variability, and weaknesses in lesson plans for foreign language teachers generated by zero-shot prompting in ChatGPT. Iterating through a series of prompts that increased in complexity, we found that the output lesson plans were generally of high quality, though adding context and specificity to a prompt did not guarantee a concomitant increase in quality. Additionally, we observed cases of extreme variability in outputs generated by the same prompt. In many cases, this variability reflected a conflict between outdated pedagogical practices (e.g., reciting scripted dialogues) and more current, research-based ones (e.g., a focus on communication). These results suggest that training generative AI models on classic texts concerning pedagogical practices may bias generated content toward teaching practices that research has long refuted. Collectively, our results offer immediate translational implications for practicing foreign language teachers and teachers in training regarding the use of AI tools. More broadly, these findings highlight trends in generative AI output that have implications for the development of pedagogical materials across a diversity of content areas.