A vocabulary acquisition learning activity was designed, and a learning system featuring image-to-text recognition technology was developed to support it. The effectiveness of the system in facilitating vocabulary acquisition was tested, and learners' perceptions of the tool and the system's affordances for vocabulary acquisition were explored. To this end, we designed an experiment in which 40 elementary school students, all native speakers of Russian learning English as a foreign language, participated. They were assigned to either a control condition or an experimental condition. All learners learned new vocabulary in class and then applied their new knowledge by completing a learning task in contexts that realistically simulated the real world. The learners in the control group used a traditional approach (i.e. they learned vocabulary from corresponding pictures in a textbook), whereas the learners in the experimental group learned vocabulary using the proposed learning system. A pre-test/post-test/delayed post-test design was employed to test the effect of the treatment on vocabulary acquisition. Learner perceptions and the perceived affordances of the system for vocabulary acquisition were explored through a questionnaire survey and interviews. The quantitative results showed that the learners in the experimental group outperformed their counterparts on both the vocabulary post-test and the delayed post-test. The qualitative results revealed that most learners in the experimental group had positive perceptions of the system and identified three main categories of affordances. Based on these results, several suggestions and implications are offered for the teaching and research community.