Semantically tagging a corpus is useful for many intermediate NLP tasks, such as the acquisition
of word argument structures in sublanguages, the acquisition of syntactic disambiguation cues,
and terminology learning. The general idea is that semantic tags allow the generalization of
observed word patterns, and facilitate the discovery of recurrent sublanguage phenomena and
selectional rules of various types. Yet, as opposed to POS tags in morphology, there is no
consensus in the literature about the type and granularity of the semantic tags to be used. In
this paper, we argue that an appropriate selection of semantic tags should be domain-dependent.
We propose a method by which we select from WordNet an inventory of semantic tags that are
‘optimal’ for a given corpus, according to a scoring function defined as a linear
combination of general and corpus-dependent performance factors. We believe that an optimal
selection of a category inventory is a necessary premise for obtaining better results in all
lexical learning algorithms that are based on, or concerned with, the semantic categorization of
words. Furthermore, an adequate inventory (one which intuitively ‘fits’ with the semantics of
a domain, e.g. phenomenon for Natural Science, or part, piece for a technical handbook) may
facilitate the manual annotation of large corpora.
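
As an illustration only, the idea of ranking candidate tag inventories by a linear combination of factors can be sketched as follows. This is a minimal sketch under stated assumptions, not the scoring function of the paper: the factor names (compactness, coverage), the weights, the toy tag list, and the candidate inventories are all hypothetical, and no WordNet access is performed.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List

Inventory = FrozenSet[str]
Factor = Callable[[Inventory], float]

# Hypothetical toy data: the category each corpus word occurrence maps to.
CORPUS_TAGS: List[str] = ["phenomenon", "phenomenon", "part", "piece", "entity"]


def compactness(inventory: Inventory) -> float:
    """Hypothetical general factor: smaller inventories score higher."""
    return 1.0 / len(inventory)


def coverage(inventory: Inventory) -> float:
    """Hypothetical corpus-dependent factor: share of occurrences covered."""
    covered = sum(1 for tag in CORPUS_TAGS if tag in inventory)
    return covered / len(CORPUS_TAGS)


def linear_score(inventory: Inventory,
                 factors: Dict[str, Factor],
                 weights: Dict[str, float]) -> float:
    """Weighted sum of the factor values for one candidate inventory."""
    return sum(weights[name] * factor(inventory)
               for name, factor in factors.items())


def best_inventory(candidates: Iterable[Inventory],
                   factors: Dict[str, Factor],
                   weights: Dict[str, float]) -> Inventory:
    """Return the candidate inventory with the highest linear score."""
    return max(candidates, key=lambda inv: linear_score(inv, factors, weights))


# Assumed factors and weights, chosen only for the example.
FACTORS: Dict[str, Factor] = {"compactness": compactness, "coverage": coverage}
WEIGHTS: Dict[str, float] = {"compactness": 0.3, "coverage": 0.7}

CANDIDATES: List[Inventory] = [
    frozenset({"entity"}),
    frozenset({"phenomenon", "part", "piece"}),
    frozenset({"phenomenon", "part", "piece", "entity"}),
]

best = best_inventory(CANDIDATES, FACTORS, WEIGHTS)
```

With these assumed weights the coverage factor dominates, so the largest candidate is selected; raising the weight on compactness would shift the choice toward smaller inventories, which is the kind of trade-off a linear combination makes explicit.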