The emergence of large language models (LLMs), exemplified by ChatGPT, has drawn growing attention to their potential to generate feedback on second language writing, particularly automated written corrective feedback (AWCF). In this study, we examined how prompt design – a generic prompt and two domain-specific prompts (zero-shot and one-shot) enriched with comprehensive domain knowledge about written corrective feedback (WCF) – influences ChatGPT’s ability to provide AWCF. The accuracy and coverage of ChatGPT’s feedback under these three prompts were benchmarked against Grammarly, a widely used traditional automated writing evaluation (AWE) tool. We found that ChatGPT’s ability to flag language errors improved considerably as prompt sophistication increased through the integration of domain-specific knowledge and examples. While the generic prompt yielded substantially lower performance than Grammarly, the zero-shot prompt achieved results comparable to Grammarly’s, and the one-shot prompt surpassed it considerably in error detection. Notably, the most pronounced improvement in ChatGPT’s performance was observed in its detection of frequent error categories, including word choice or expression, direct translation, sentence structure and pronoun use. Nonetheless, even with the most sophisticated prompt, ChatGPT still displayed certain limitations compared with Grammarly. Our study has both theoretical and practical implications. Theoretically, it lends empirical support to Knoth et al.’s (2024) proposition that domain-specific AI literacy be distinguished from generic AI literacy. Practically, it sheds light on the pedagogical application and technical development of AWE systems.
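To make the three prompting conditions concrete, the sketch below shows how a generic prompt, a zero-shot domain-specific prompt, and a one-shot domain-specific prompt might be assembled and sent to a chat-based model. It is a minimal illustration only: the prompt wording, error taxonomy, learner sentence, worked example, model name and use of the OpenAI Python SDK are assumptions for demonstration, not the study's actual materials or procedure.

```python
# Illustrative sketch of the three prompting conditions; all prompt content
# and the model choice are hypothetical, not the study's materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LEARNER_TEXT = "He suggested me to join the club because it have many activity."

# Condition 1: generic prompt with no WCF-specific guidance.
GENERIC_PROMPT = f"Please correct the errors in this text:\n{LEARNER_TEXT}"

# Condition 2: zero-shot domain-specific prompt enriched with WCF knowledge
# (hypothetical error taxonomy and feedback conventions).
ZERO_SHOT_PROMPT = f"""You are an ESL writing tutor providing written corrective feedback (WCF).
For the learner text below, identify every language error, classify it
(e.g. word choice/expression, direct translation, sentence structure, pronoun,
verb form, article), and give a brief metalinguistic explanation plus a correction.

Learner text:
{LEARNER_TEXT}"""

# Condition 3: one-shot domain-specific prompt, i.e. the zero-shot prompt plus
# a single worked example of the expected feedback format.
ONE_SHOT_PROMPT = ZERO_SHOT_PROMPT + """

Example:
Learner sentence: "She is interesting in music."
Feedback: [word choice/expression] "interesting" should be "interested";
the adjective describing the experiencer takes the -ed form.
Correction: "She is interested in music."
"""

def get_feedback(prompt: str) -> str:
    """Send one prompt variant to a chat model and return its feedback."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,   # reduce run-to-run variation when comparing prompts
    )
    return response.choices[0].message.content

for name, prompt in [("generic", GENERIC_PROMPT),
                     ("zero-shot", ZERO_SHOT_PROMPT),
                     ("one-shot", ONE_SHOT_PROMPT)]:
    print(f"--- {name} prompt ---")
    print(get_feedback(prompt))
```

In a comparison of this kind, the feedback returned under each condition would then be coded against the error categories and benchmarked against Grammarly's flags, as described above.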