Article contents
A new approach for textual feature selection based on N-composite isolated labels
Published online by Cambridge University Press: 29 April 2019
Abstract
Textual Feature Selection (TFS) aims to extract relevant parts or segments from text as being the most relevant ones w.r.t. the information it expresses. The selected features are useful for automatic indexing, summarization, document categorization, knowledge discovery, so on. Regarding the huge amount of electronic textual data daily published, many challenges related to the semantic aspect as well as the processing efficiency are addressed. In this paper, we propose a new approach for TFS based on Formal Concept Analysis background. Mainly, we propose to extract textual features by exploring the regularities in a formal context where isolated points exist. We introduce the notion of N-composite isolated points as a set of N words to be considered as a unique textual feature. We show that a reduced value of N (between 1 and 3) allows extracting significant textual features compared with existing approaches even for non-completely covering an initial formal context.
- Type
- Article
- Information
- Copyright
- © Cambridge University Press 2019
References
- 1
- Cited by