Deformable models have been studied in image analysis over the last decade and used to recognize flexible or rigid templates under diverse viewing conditions. This article addresses the question of how to define a deformable model for a real-time color vision system for mobile robot navigation. Instead of receiving a detailed model definition from the user, the algorithm extracts and learns the information from each object automatically. How well a model represents the template present in the image is measured by an energy function. The minimum of this function corresponds to the model that best fits the image, and it is found by a genetic algorithm that handles the model deformation. At a later stage, if there is symbolic information inside the object, it is extracted and interpreted using a neural network. The resulting perception module has been successfully integrated into a complex navigation system. Experimental results in real environments are presented in this article, showing the effectiveness and capacity of the system.
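
To make the idea of minimizing an energy function with a genetic algorithm concrete, the following is a minimal sketch, not the method used in the article. It assumes a hypothetical four-point square template, a deformation limited to translation and scale, and an energy defined as the sum of squared point distances; the names `deform`, `energy`, and `genetic_minimize`, as well as all parameter values, are illustrative assumptions. The article's actual energy terms and genetic operators are not specified here.

```python
import random

# Hypothetical template: corner points of a unit square (an assumption for illustration).
TEMPLATE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def deform(params):
    """Apply a (tx, ty, scale) deformation to the template points."""
    tx, ty, s = params
    return [(s * x + tx, s * y + ty) for x, y in TEMPLATE]


def energy(params, target):
    """Toy energy: sum of squared distances between deformed template and target points."""
    return sum(
        (px - qx) ** 2 + (py - qy) ** 2
        for (px, py), (qx, qy) in zip(deform(params), target)
    )


def genetic_minimize(target, pop_size=40, generations=200, sigma=0.3, seed=0):
    """Search for the deformation parameters with the lowest energy."""
    rng = random.Random(seed)
    # Random initial population of (tx, ty, scale) candidates.
    pop = [
        [rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(0.5, 4)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: keep the best quarter of the population unchanged (elitism).
        pop.sort(key=lambda p: energy(p, target))
        elite = pop[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover per gene, followed by Gaussian mutation.
            child = [rng.choice(pair) + rng.gauss(0, sigma) for pair in zip(a, b)]
            children.append(child)
        pop = elite + children
        sigma *= 0.98  # Shrink mutation size so the search refines over time.
    return min(pop, key=lambda p: energy(p, target))


if __name__ == "__main__":
    # Synthetic "image": the template shifted by (2, -1) and scaled by 1.5.
    target = deform((2.0, -1.0, 1.5))
    best = genetic_minimize(target)
    print("best parameters:", best)
    print("energy:", energy(best, target))
```

In this sketch the lowest-energy individual plays the role of the best-fitting model. A real system would compute the energy from image evidence such as color and edges rather than from known target points.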