Attention Is All You Need - Summary
The paper proposes a new network architecture, the Transformer, that relies solely on attention mechanisms, dispensing with recurrence and convolutions entirely. The Transformer allows for significantly more parallelization and reaches a new state of the art in translation quality while requiring significantly less time to train.
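The core building block of the Transformer is scaled dot-product attention, which the paper defines as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where Q, K, V are matrices of queries, keys, and values and d_k is the key dimension. Below is a minimal NumPy sketch of that formula for illustration; the shapes and random toy inputs are assumptions for the example, not values from the paper.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D Q, K, V."""
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted sum of the value vectors
    return weights @ V

# Toy usage: 3 query positions attending over 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)

The scaling by sqrt(d_k) is the paper's fix for large dot products pushing the softmax into regions of very small gradient; the full model runs many such attention functions in parallel ("multi-head attention") rather than the single head sketched here.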