The dataset is based on UM-Corpus, a large English-Chinese parallel corpus for statistical machine translation. It provides two million aligned English-Chinese sentence pairs, categorized into ...