3 results

13 - Using Transformers with the Hugging Face Library
Mihai Surdeanu, University of Arizona, Marco Antonio Valenzuela-Escárcega, University of Arizona
Book: Deep Learning for Natural Language Processing
Published online: 01 February 2024
Print publication: 08 February 2024, pp 194-215
- Chapter
Summary

One of the key advantages of transformer networks is the ability to take a model that was pretrained over vast quantities of text and fine-tune it for the task at hand. Intuitively, this strategy allows transformer networks to achieve higher performance on smaller datasets by relying on statistics acquired at scale in an unsupervised way (e.g., through the masked language model training objective). To this end, in this chapter, we will use the Hugging Face library, which has a rich repository of datasets and pretrained models, as well as helper methods and classes that make it easy to target downstream tasks. Using pretrained transformer encoders, we will implement the two tasks that served as use cases in the previous chapters: text classification and part-of-speech tagging.

14 - Encoder-Decoder Methods
Mihai Surdeanu, University of Arizona, Marco Antonio Valenzuela-Escárcega, University of Arizona
Book: Deep Learning for Natural Language Processing
Published online: 01 February 2024
Print publication: 08 February 2024, pp 216-228
- Chapter
Summary

In Chapters 10 and 12, we focused on two common usages of recurrent neural networks and transformer networks: acceptors and transducers. In this chapter, we discuss a third architecture for both recurrent neural networks and transformer networks: encoder-decoder methods. We introduce three encoder-decoder architectures, which enable important NLP applications such as machine translation. In particular, we discuss the sequence-to-sequence method of Sutskever et al. (2014), which couples an encoder long short-term memory with a decoder long short-term memory. We follow this method with the approach of Bahdanau et al. (2015), which extends the previous decoder with an attention component that produces a different encoding of the source text for each decoded word. Last, we introduce the complete encoder-decoder transformer network, which relies on three attention mechanisms: one within the encoder (which we discussed in Chapter 12), a similar one that operates over decoded words, and, importantly, an attention component that connects the input words with the decoded ones.
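
The idea that attention yields a different encoding of the source text for each decoded word can be illustrated with a minimal sketch. This is not the chapter's implementation: it assumes plain dot-product scoring (Bahdanau et al. use a learned additive scoring function instead), and the encoder and decoder state vectors, as well as the helper names `softmax` and `attend`, are hypothetical values and names chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Assumption: scores are simple dot products between the decoder state
    and each encoder state, which stands in for a learned scoring function.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted average of encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical encoder states, one per source word.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical decoder states, one per decoding step.
decoder_states = [[2.0, 0.0], [0.0, 2.0]]

results = [attend(state, encoder_states) for state in decoder_states]
contexts = [context for _, context in results]

for step, context in enumerate(contexts, start=1):
    print(f"step {step}: context = {[round(x, 3) for x in context]}")
```

Because the weights are recomputed from the decoder state at every step, the two decoding steps receive different context vectors even though the encoder states never change.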

12 - Contextualized Embeddings and Transformer Networks
Mihai Surdeanu, University of Arizona, Marco Antonio Valenzuela-Escárcega, University of Arizona
Book: Deep Learning for Natural Language Processing
Published online: 01 February 2024
Print publication: 08 February 2024, pp 178-193
- Chapter
Summary

As mentioned in Chapter 8, the distributional similarity algorithms discussed there conflate all senses of a word into a single numerical representation (or embedding). For example, the word bank receives a single representation, regardless of its financial (e.g., as in the bank gives out loans) or geological (e.g., bank of the river) sense. This chapter introduces a solution to this limitation in the form of a new neural architecture called transformer networks, which learn contextualized embeddings of words that, as the name indicates, change depending on the context in which the words appear. That is, the word bank receives a different numerical representation for each of its instances in the two texts above because the contexts in which they occur are different. We also discuss several architectural choices that enabled the tremendous success of transformer networks: self-attention, multiple heads, stacking of multiple layers, and subword tokenization, as well as how transformers can be pretrained on large amounts of data through masked language modeling and next-sentence prediction.
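
The bank example can be made concrete with a small sketch of self-attention. It rests on several simplifying assumptions that are not from the chapter: a single attention head, identity query/key/value projections (a real transformer learns these), no positional information, and a tiny hand-written table of static embeddings (`STATIC`) whose values are invented for illustration. The helper names `self_attention` and `contextualize` are likewise hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the input vectors themselves
    (identity projections), which keeps the sketch free of learned weights.
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of every vector in the sequence.
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs


# Hypothetical static (context-independent) embeddings, one per word.
STATIC = {
    "the": [0.0, 0.0, 1.0],
    "bank": [1.0, 1.0, 0.0],
    "loans": [2.0, 0.0, 0.0],
    "river": [0.0, 2.0, 0.0],
}


def contextualize(tokens):
    """Map each token to an embedding that depends on its neighbors."""
    return self_attention([STATIC[token] for token in tokens])


# The same word, "bank", in a financial and a geological context.
financial = contextualize(["the", "bank", "loans"])[1]
geological = contextualize(["the", "river", "bank"])[2]

print("bank (financial): ", [round(x, 3) for x in financial])
print("bank (geological):", [round(x, 3) for x in geological])
```

Both occurrences of bank start from the same static vector, yet the outputs differ because each one mixes in the vectors of its neighbors: the financial instance is pulled toward loans and the geological instance toward river.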
