Seq2Seq Notes

Preface

This post collects my notes on the classic Seq2Seq paper, Sequence to Sequence Learning with Neural Networks, together with takeaways from my own coding practice.

Why reverse the input sequence?

The explanation below is quoted from a Stack Overflow answer: https://stackoverflow.com/questions/51003992/why-do-we-reverse-input-when-feeding-in-seq2seq-model-in-tensorflow-tf-reverse

The idea originated in machine translation (I’m not sure how it plays out in other domains, e.g. chatbots). Think of the following scenario (borrowed from the original paper). You want to translate:

```
A B C -> alpha beta gamma delta
```

In this setting, the model has to go through the full source sequence (A B C) before it starts predicting alpha, by which point it may have forgotten about A. But when you feed it as:

```
C B A -> alpha beta gamma delta
```

You have a strong communication link from A to alpha, where A is “probably” related to alpha in the translation.

Note: This entirely depends on your translation task. If the target language is written in the reverse order of the source language (e.g. think of translating from a subject-verb-object language to an object-verb-subject language), I think it’s better to keep the original order.
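As a concrete illustration, the reversal is typically a preprocessing step applied to the source side only; here is a minimal sketch (the helper name and token ids are made up for illustration):

```python
# Minimal sketch: reverse each source sequence before it goes to the encoder.
# Target sequences are left in their original order.
def reverse_source(batch_of_token_ids):
    return [list(reversed(seq)) for seq in batch_of_token_ids]

src_batch = [[4, 5, 6], [7, 8]]       # hypothetical token ids for "A B C", "D E"
print(reverse_source(src_batch))      # [[6, 5, 4], [8, 7]] -> encoder sees "C B A"
```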

Decoder

The decoder actually decodes only one token per step, so the decoder input has shape [batch_size, 1, embedding_dim], i.e. max_seq_len is 1.
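A minimal PyTorch sketch of a single decoding step (the dimensions and module names are my own assumptions, not from the paper), showing that the embedded input indeed has shape [batch_size, 1, embedding_dim]:

```python
import torch
import torch.nn as nn

batch_size, vocab_size, embedding_dim, hidden_dim = 32, 16, 64, 128

embedding = nn.Embedding(vocab_size, embedding_dim)
rnn = nn.GRU(embedding_dim, hidden_dim, batch_first=True)

token = torch.randint(0, vocab_size, (batch_size,))   # current input token ids
hidden = torch.zeros(1, batch_size, hidden_dim)       # e.g. the encoder's final hidden state

embedded = embedding(token).unsqueeze(1)   # [batch_size, 1, embedding_dim]
output, hidden = rnn(embedded, hidden)     # output: [batch_size, 1, hidden_dim]
print(embedded.shape, output.shape)
```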

Teacher forcing

The words in the decoder are always generated one after another, with one per time-step. We always use <sos> for the first input to the decoder, $y_1$, but for subsequent inputs, $y_{t>1}$, we will sometimes use the actual ground-truth next word in the sequence, $y_t$, and sometimes use the word predicted by our decoder, $\hat{y}_{t-1}$. This is called teacher forcing.

The point of sometimes feeding the decoder its own predictions is to keep training and inference consistent: at inference time there is no ground truth, so the decoder only ever sees its own previous outputs.
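Below is a minimal sketch of how this decision is usually made inside the decoding loop (all names, sizes, and the teacher_forcing_ratio of 0.5 are illustrative assumptions, not taken from the original post; loss computation is omitted):

```python
import random
import torch
import torch.nn as nn

vocab_size, embedding_dim, hidden_dim, batch_size, trg_len = 16, 64, 128, 2, 12
teacher_forcing_ratio = 0.5

embedding = nn.Embedding(vocab_size, embedding_dim)
rnn = nn.GRU(embedding_dim, hidden_dim, batch_first=True)
out_proj = nn.Linear(hidden_dim, vocab_size)

trg = torch.randint(0, vocab_size, (batch_size, trg_len))  # ground-truth target batch
hidden = torch.zeros(1, batch_size, hidden_dim)            # stand-in for the encoder state
input_token = trg[:, 0]                                    # first input is the <sos> token

for t in range(1, trg_len):
    embedded = embedding(input_token).unsqueeze(1)   # [batch_size, 1, embedding_dim]
    output, hidden = rnn(embedded, hidden)
    logits = out_proj(output.squeeze(1))             # [batch_size, vocab_size]
    pred = logits.argmax(dim=1)                      # the decoder's own prediction

    # Teacher forcing: with probability teacher_forcing_ratio feed the ground-truth
    # token y_t as the next input, otherwise feed the predicted token \hat{y}_{t-1}.
    use_teacher_forcing = random.random() < teacher_forcing_ratio
    input_token = trg[:, t] if use_teacher_forcing else pred
```

The tensor dump below appears to be such a padded target batch (presumably with 1 = <pad>, 2 = <sos>, 3 = <eos>): each row begins with the <sos> id that seeds the decoder and is padded out to a common length.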

```
tensor([[ 2, 11,  4, 13,  3,  1,  1,  1,  1,  1,  1,  1],
        [ 2, 11, 11,  7, 10,  9, 12, 13,  5,  3,  1,  1],
```