Abstract: Visual soft attention has been widely adopted in image captioning models. The traditional soft attention mechanism (TSAM) assigns a weight to a given image region by using a multilayer perceptron ...
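
As an illustration of the mechanism named in the abstract, the following is a minimal sketch of MLP-scored soft attention in the form commonly used for image captioning. It is not taken from this paper. It assumes the perceptron scores each region from the region feature and the decoder hidden state, and all names (`soft_attention`, `W_v`, `W_h`, `w`) and dimensions are hypothetical.

```python
import math
import random


def soft_attention(regions, hidden, W_v, W_h, w):
    """Additive (MLP-scored) soft attention over image regions.

    regions: list of k region feature vectors, each of length d
    hidden:  decoder state vector of length m (assumed input to the MLP)
    W_v: n x d matrix, W_h: n x m matrix, w: vector of length n
    All parameter names are illustrative assumptions.
    """
    def matvec(M, x):
        return [sum(a * b for a, b in zip(row, x)) for row in M]

    h_proj = matvec(W_h, hidden)
    scores = []
    for v in regions:
        v_proj = matvec(W_v, v)
        # One-hidden-layer perceptron: score = w^T tanh(W_v v + W_h h)
        act = [math.tanh(a + b) for a, b in zip(v_proj, h_proj)]
        scores.append(sum(wi * ai for wi, ai in zip(w, act)))

    # Softmax makes the weights positive and sum to 1 (max-shifted for stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: weighted average of the region features
    d = len(regions[0])
    context = [sum(wt * v[j] for wt, v in zip(weights, regions)) for j in range(d)]
    return weights, context


def rand_mat(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy usage with random parameters: 4 regions, feature size 3
random.seed(0)
k, d, m, n = 4, 3, 5, 6
regions = rand_mat(k, d)
hidden = rand_mat(1, m)[0]
weights, context = soft_attention(
    regions, hidden, rand_mat(n, d), rand_mat(n, m), rand_mat(1, n)[0]
)
```

In this sketch each region's score is computed independently of the other regions, and the regions only interact through the softmax normalization.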