


Associative Memory and Attention Function in Machine Learning - Key, Value & Query

The fascinating but terse 2017 paper on machine translation, “Attention Is All You Need” by Vaswani et al., has generated a lot of interest, and plenty of head-scratching to digest it! As far as I can tell, the Attention Function, with its Key, Value and Query inputs, is one of the obstacles to wrapping one's head around the Deep-Learning "Transformer" architecture presented in that paper.

In brief: whereas an RNN (Recurrent Neural Network) processes inputs sequentially, using each one to update an internal state, the Transformer architecture takes in all inputs at once and makes extensive use of “Key/Value/Query memory spaces” (called “Multi-Head Attention” in the paper.) Advantages include greater speed and overcoming the problem of “remembering early inputs” that affects RNNs.

That Attention Function, and related concepts, is what I will address in this blog entry. I will NOT discuss anything else about the Transformer architecture... but at the
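To make the Key/Value/Query idea concrete, here is a minimal NumPy sketch of the scaled dot-product attention from the paper: each query is compared against all keys, the resulting scores are run through a softmax, and the output is the correspondingly weighted average of the values. The function name, the toy matrices, and the dimensions below are my own illustrative choices, not anything from the paper itself.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention (Vaswani et al., 2017).

    Q: (n_queries, d_k)  -- the queries
    K: (n_keys,    d_k)  -- the keys
    V: (n_keys,    d_v)  -- the values
    Returns (n_queries, d_v): each output row is a weighted average of
    the value rows, weighted by how well the query matches each key.
    """
    d_k = K.shape[-1]
    # Similarity of every query to every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys (numerically stable form)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted average of the values
    return weights @ V

# Toy example: two key/value pairs; the query is close to the first key,
# so the output leans toward the first value row.
K = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[10.0, 0.0], [0.0, 10.0]])
Q = np.array([[1.0, 0.0]])
out = scaled_dot_product_attention(Q, K, V)
```

Notice that this is a "soft" associative lookup: instead of retrieving exactly one value for a matching key, attention blends all values according to query/key similarity, which is what makes it differentiable and trainable by gradient descent.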