
Experiments with Language-based Aids in Information Retrieval Systems

Published online by Cambridge University Press: 22 December 2008

Tove Fjeldvig
Affiliation:
Statens Datasentral a.s., Ulvenveien 89 B, N-0581 Oslo 5, Norway.
Anne Golden
Affiliation:
Institutt for norsk som fremmedspråk, Universitetet i Oslo, Blindern, N-0316, Oslo 3, Norway.

Abstract

A lexeme can appear in various forms, which causes problems in information retrieval. To address this, we have developed methods for automatic root lemmatization, automatic truncation and automatic splitting of compound words. All the methods are based on a set of rules describing inflected and derived forms of words, rather than on a dictionary. The methods have been tested on several collections of texts and have produced very good results. In controlled text retrieval experiments, we studied their effects on search results. These results show that both automatic root lemmatization and automatic truncation considerably improve search quality. The experiments with splitting of compound words did not give quite the same improvement, but they did show that such a method could contribute to a richer and more complete search request.
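As a rough illustration of rule-based root lemmatization without a dictionary, the following minimal sketch strips a few Norwegian noun endings and then expands a search term to every index term sharing the same root. The suffix list, the minimum root length, and the function names `root` and `expand_query` are assumptions made for this example; they are not the rule set or the procedure described in the article.

```python
# Hypothetical suffix rules, ordered longest first. For illustration only;
# this is not the rule set used by the authors.
SUFFIX_RULES = ["ene", "er", "en", "et", "e"]

# Assumed safeguard: never strip a suffix if the remaining root is too short.
MIN_ROOT_LEN = 3


def root(word):
    """Return a rule-derived root by stripping the first matching suffix."""
    w = word.lower()
    for suffix in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_ROOT_LEN:
            return w[: -len(suffix)]
    return w


def expand_query(term, vocabulary):
    """Return all vocabulary words that share a root with the search term."""
    target = root(term)
    return sorted(w for w in vocabulary if root(w) == target)


# Toy index vocabulary: forms of "bil" (car) and "bok" (book).
vocabulary = ["bil", "bilen", "biler", "bilene", "bok", "boken"]
expanded = expand_query("biler", vocabulary)
print(expanded)
```

In this toy vocabulary, a search for "biler" is expanded to all four forms of "bil" while the forms of "bok" are left out. A realistic rule set would also need to handle derivation, stem changes and exceptions, which this sketch ignores.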

Type
Research Article
Copyright
Copyright © Cambridge University Press 1988
