Deep Learning
advanced

Transformer

Neural network architecture that uses self-attention mechanisms to process sequential data.

Detailed Explanation

Transformers are a type of neural network architecture introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017). Unlike previous sequence-to-sequence models that used recurrent or convolutional layers, transformers rely entirely on an attention mechanism to draw global dependencies between input and output. Because attention alone is order-agnostic, positional encodings are added to the input embeddings so the model can still distinguish token positions. This architecture has become dominant in natural language processing because it processes all tokens in a sequence simultaneously rather than sequentially, allowing far more parallelization and thus more efficient training on larger datasets.
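The attention mechanism described above can be sketched in a few lines. The following is a minimal NumPy illustration of scaled dot-product attention, the core operation of the transformer; the function name and toy shapes are illustrative, not from any particular library.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                            # weighted sum of value vectors

# Toy self-attention: 3 tokens, embedding dimension 4, with Q = K = V = X.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one contextualized vector per token
```

Every token attends to every other token in a single matrix multiply, which is what makes the computation fully parallel across the sequence; real transformers run many such heads in parallel and stack them with feed-forward layers.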

Examples

  • BERT
  • GPT models
  • T5

Tags

attention mechanism
sequence modeling
parallelization

Category Information

Deep Learning

Neural network architectures with multiple layers