All systems of scoring animal units (groups, farms, slaughter plants, etc) according to the level of the animals’ welfare are inevitably based on normative decisions. Similarly, all methods of labelling, in terms of acceptability, are based on choices reflecting ethical values. The evaluative dimension of scoring and labelling does not mean that we should reject them, but it does mean that we need to make the normative and ethical background explicit. The Welfare Quality® scoring system is used as a case study in order to highlight the role of underlying value-based decisions. In this scoring system, which was designed in accordance with assessments and judgments from experts in animal and social sciences and stakeholders, we identify value-based decisions at the following five levels. First, there are several definitions of animal welfare (eg hedonist, perfectionist, and preferentialist), and any welfare scoring system will reflect a focus upon one definition or another. In Welfare Quality®, 12 welfare criteria were defined, and the entire list of criteria was intended to cover relevant definitions of animal welfare. Second, two dimensions can structure an overall evaluation of animal welfare: the individual animals and the welfare criteria (here 12). Hence, a choice needs to be made between the aggregation of information at the individual level (which results in a proportion of animals from the unit in a good vs bad state) and the aggregation at criterion level (which results in a proportion of criteria with which the unit complies vs does not comply). Welfare Quality® opted for the second alternative to facilitate the provision of advice to farmers on solving the welfare problems associated with their farms. Third, one has to decide whether the overall welfare assessment should reflect the average state of the animals or give priority to worse-off animals. In the Welfare Quality® scoring system, the worse-off animals are treated as much more important than the others, but all welfare problems, major or minor, count. Fourth, one has to decide whether good scores on certain criteria can compensate for bad scores on others. In the opinion of most people, welfare scores do not compensate for each other. This was taken into account in the Welfare Quality® scoring system by using a specific operator instead of mere weighted sums. Finally, a scoring system may either reflect societal demands for high levels of welfare or be based on what can be achieved in practice; in other words, an absolute assessment or a relative one may be proposed. Welfare Quality® adopted an intermediate strategy: absolute limits between welfare categories (Not classified, Acceptable, Enhanced, or Excellent level of welfare) were set, but the rules governing the assignment of an animal unit to a category take into account what was observed on European farms. The scientists behind Welfare Quality® are keen to make the value-based choices underlying assessments of animal welfare transparent. This is essential to allow stakeholder groups to understand the extent to which their views are acknowledged and acted upon.
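The second and fourth decisions can be made concrete with a small numerical sketch. The Python code below is illustrative only: the scores, the threshold of 50, the rule that an animal is in a "good state" when all its scores reach the threshold, and the `limited_compensation` operator are all assumptions introduced here, and none of them is the Welfare Quality® procedure or its actual aggregation operator. The sketch shows that the same unit can look different depending on the direction of aggregation, and that an operator which gives weight to the worst criterion separates two units that a weighted sum rates as equal.

```python
from statistics import mean

# Hypothetical scores (0-100) for 4 animals (rows) on 3 criteria (columns).
# These are made-up numbers, not observed data.
scores = [
    [80, 70, 90],
    [60, 40, 85],
    [75, 65, 30],
    [90, 80, 70],
]
THRESHOLD = 50  # assumed cut-off between a "bad" and a "good" score


def share_of_animals_in_good_state(scores, threshold=THRESHOLD):
    """Aggregate per animal first: an animal counts as 'good' only if
    every one of its criterion scores reaches the threshold (assumption)."""
    good = [all(s >= threshold for s in animal) for animal in scores]
    return sum(good) / len(scores)


def share_of_criteria_met(scores, threshold=THRESHOLD):
    """Aggregate per criterion first: a criterion counts as 'met' if the
    mean score across animals reaches the threshold (assumption)."""
    by_criterion = list(zip(*scores))  # transpose rows into columns
    met = [mean(column) >= threshold for column in by_criterion]
    return sum(met) / len(by_criterion)


def weighted_sum(criterion_scores, weights):
    """Fully compensatory: a high score can offset a low one."""
    total = sum(w * s for w, s in zip(weights, criterion_scores))
    return total / sum(weights)


def limited_compensation(criterion_scores, weights, alpha=0.5):
    """Hypothetical stand-in for a non-compensatory operator: blends the
    worst score with the weighted mean, so a very low score drags the
    result down however good the other scores are."""
    worst = min(criterion_scores)
    return alpha * worst + (1 - alpha) * weighted_sum(criterion_scores, weights)


if __name__ == "__main__":
    print(share_of_animals_in_good_state(scores))
    print(share_of_criteria_met(scores))

    equal_weights = [1, 1, 1]
    unit_a = [90, 90, 10]  # two very good criteria, one very poor
    unit_b = [60, 65, 65]  # uniformly moderate
    print(weighted_sum(unit_a, equal_weights), weighted_sum(unit_b, equal_weights))
    print(limited_compensation(unit_a, equal_weights),
          limited_compensation(unit_b, equal_weights))
```

With these made-up numbers, half of the animals are in a good state, yet the unit meets every criterion on average. Likewise, the two units receive the same weighted sum, but the unit with one very poor criterion scores markedly lower under the limited-compensation rule.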