In this chapter, we implement a machine translation application as an example of an encoder-decoder task. In particular, we build on pretrained encoder-decoder transformer models, which are available in the Hugging Face library for a wide variety of language pairs. We first show how to use one of these models out-of-the-box to perform translation for one of the language pairs it has been exposed to during pretraining: English to Romanian. Afterward, we fine-tune the model on a new language combination that it has not seen before: Romanian to English. In both use cases, we use the T5 encoder-decoder model, which has been pretrained for several tasks, including machine translation.
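As a minimal sketch of the out-of-the-box use case, the snippet below loads a pretrained T5 checkpoint from the Hugging Face library and translates an English sentence to Romanian. The checkpoint name ("t5-small") and the example sentence are illustrative assumptions, not necessarily the ones used in the chapter; the task prefix follows the convention T5 was pretrained with.

```python
# Minimal sketch: out-of-the-box English-to-Romanian translation with a pretrained T5.
# Assumptions: the "t5-small" checkpoint and the example sentence are for illustration only.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "t5-small"  # any T5 checkpoint pretrained on English-Romanian would work
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# T5 was pretrained with natural-language task prefixes, so we prepend one
# that names the translation direction it saw during pretraining.
text = "translate English to Romanian: The cat sat on the mat."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Fine-tuning for the reverse direction (Romanian to English) follows the same pattern, with the model weights updated on parallel data instead of used as-is.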
In Chapters 10 and 12, we focused on two common usages of recurrent neural networks and transformer networks: acceptors and transducers. In this chapter, we discuss a third architecture for both recurrent neural networks and transformer networks: encoder-decoder methods. We introduce three encoder-decoder architectures, which enable important NLP applications such as machine translation. In particular, we discuss the sequence-to-sequence method of Sutskever et al. (2014), which couples an encoder long short-term memory with a decoder long short-term memory. We follow this method with the approach of Bahdanau et al. (2015), which extends the previous decoder with an attention component that produces a different encoding of the source text for each decoded word. Last, we introduce the complete encoder-decoder transformer network, which relies on three attention mechanisms: one within the encoder (which we discussed in Chapter 12), a similar one that operates over decoded words, and, importantly, an attention component that connects the input words with the decoded ones.
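To make the third mechanism concrete, here is a small sketch (not the book's implementation) of the cross-attention that connects decoder positions to encoder outputs, using scaled dot-product scores. It omits the learned query/key/value projections and multiple heads of the full transformer; the tensor shapes are illustrative assumptions.

```python
# Sketch of cross-attention: each decoded position attends over the encoder outputs,
# yielding a different summary of the source text per target word.
# Simplified: no learned projections, no multi-head split (assumptions for illustration).
import math
import torch

def cross_attention(decoder_states, encoder_states):
    # decoder_states: (batch, tgt_len, hidden); encoder_states: (batch, src_len, hidden)
    scores = decoder_states @ encoder_states.transpose(1, 2)   # (batch, tgt_len, src_len)
    scores = scores / math.sqrt(decoder_states.size(-1))       # scale to stabilize the softmax
    weights = torch.softmax(scores, dim=-1)                    # attention over source positions
    return weights @ encoder_states                            # (batch, tgt_len, hidden)

context = cross_attention(torch.randn(2, 5, 16), torch.randn(2, 7, 16))
print(context.shape)  # torch.Size([2, 5, 16]): one source summary per decoded position
```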
NN models with more hidden layers than the traditional NN are referred to as deep neural network (DNN) or deep learning (DL) models, which are now widely used in environmental science. For image data, the convolutional neural network (CNN) has been developed, where, in convolutional layers, a neuron is connected only to a small patch of neurons in the preceding layer, thereby greatly reducing the number of model weights. Popular DNN architectures include the encoder-decoder and U-net models. For time series modelling, the long short-term memory (LSTM) network and the temporal convolutional network have been developed. The generative adversarial network (GAN) produces highly realistic synthetic data.
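The weight-saving effect of local connectivity can be seen in a small sketch; the 64x64 input size and layer shapes below are assumptions chosen only to make the comparison concrete.

```python
# Sketch (assumed shapes): parameter count of a fully connected layer versus a
# convolutional layer on a 64x64 single-channel image.
import torch.nn as nn

dense = nn.Linear(64 * 64, 64 * 64)    # every pixel connected to every pixel
conv = nn.Conv2d(1, 1, kernel_size=3)  # each output connected to a 3x3 patch, weights shared

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense))  # 16,781,312 parameters
print(count(conv))   # 10 parameters (3*3 kernel + 1 bias), reused across the whole image
```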