Book contents
- Frontmatter
- Contents
- List of Figures
- List of Tables
- Preface
- 1 Introduction
- 2 The Perceptron
- 3 Logistic Regression
- 4 Implementing Text Classification Using Perceptron and Logistic Regression
- 5 Feed-Forward Neural Networks
- 6 Best Practices in Deep Learning
- 7 Implementing Text Classification with Feed-Forward Networks
- 8 Distributional Hypothesis and Representation Learning
- 9 Implementing Text Classification Using Word Embeddings
- 10 Recurrent Neural Networks
- 11 Implementing Part-of-Speech Tagging Using Recurrent Neural Networks
- 12 Contextualized Embeddings and Transformer Networks
- 13 Using Transformers with the Hugging Face Library
- 14 Encoder-Decoder Methods
- 15 Implementing Encoder-Decoder Methods
- 16 Neural Architectures for Natural Language Processing Applications
- Appendix A Overview of the Python Language and Key Libraries
- Appendix B Character Encodings: ASCII and Unicode
- References
- Index
8 - Distributional Hypothesis and Representation Learning
Published online by Cambridge University Press: 01 February 2024
Summary
All the algorithms we covered so far rely on handcrafted features that must be designed and implemented by the machine learning developer. This is problematic for two reasons. First, designing such features can be a complicated endeavor. Second, most words in any language tend to be very infrequent. In our context, this means that most word-occurrence features are very sparse, and a text classification algorithm trained on such features may generalize poorly. For example, if the training data for a review classification dataset contains the word great but not the word fantastic, a learning algorithm trained on these data will not be able to properly handle reviews containing the latter word, even though there is a clear semantic similarity between the two. In this chapter, we will begin to address this limitation. In particular, we will discuss methods that learn numerical representations of words that capture some semantic knowledge. Under these representations, similar words such as great and fantastic will have similar forms, which will improve the generalization capability of our machine learning algorithms.
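To make the last point concrete, here is a minimal sketch (not taken from the book) of how such learned representations behave: semantically related words end up with vectors that point in similar directions, which can be measured with cosine similarity. The three toy vectors below are invented purely for illustration; real embeddings are learned from large corpora, as the chapter describes.

```python
import numpy as np

# Toy embedding vectors, invented for illustration only.
# Real embeddings would be learned from text and have hundreds of dimensions.
embeddings = {
    "great":     np.array([0.8, 0.6, 0.1]),
    "fantastic": np.array([0.7, 0.7, 0.2]),
    "terrible":  np.array([-0.6, 0.1, 0.9]),
}

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: close to 1.0 means the
    # vectors point in nearly the same direction.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(embeddings["great"], embeddings["fantastic"]))  # high
print(cosine_similarity(embeddings["great"], embeddings["terrible"]))   # lower
```

Because great and fantastic receive similar vectors, a classifier trained on reviews containing only great can still generalize to reviews containing fantastic, which is exactly the benefit the chapter develops.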
- Type: Chapter
- Information: Deep Learning for Natural Language Processing: A Gentle Introduction, pp. 117-131
- Publisher: Cambridge University Press
- Print publication year: 2024