Model:

TransQuest/monotransquest-da-multilingual

Task:

Text Classification

Libraries:

PyTorch Transformers

Language:

multilingual-multilingual

Other:

xlm-roberta Quality Estimation monotransquest da

License:

apache-2.0

TransQuest: Translation Quality Estimation with Cross-lingual Transformers

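The goal of translation quality estimation (QE) is to evaluate the quality of a translation without access to a reference translation. High-accuracy QE is the missing piece in many commercial translation workflows because it has many potential uses. A QE system can select the best translation when several translation engines are available, or inform the end user about the reliability of automatically translated content. It can also be used to decide whether a translation can be published as it is in a given context, or whether it needs human post-editing or a fresh human translation. Quality estimation can be done at different levels: document level, sentence level and word level.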
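
The workflow decisions described above can be sketched in a few lines. This is a minimal illustration and not part of TransQuest: the names `route_translation` and `pick_best` and both thresholds are hypothetical, and the score is assumed to be a sentence-level quality estimate where higher means better.

```python
from typing import Dict


def route_translation(score: float,
                      publish_at: float = 0.5,
                      post_edit_at: float = -0.5) -> str:
    """Map a quality score to a workflow decision.

    The thresholds are placeholders; in practice they would be tuned
    on held-out data for the language pair and domain.
    """
    if post_edit_at > publish_at:
        raise ValueError("post_edit_at must not exceed publish_at")
    if score >= publish_at:
        return "publish"
    if score >= post_edit_at:
        return "post-edit"
    return "retranslate"


def pick_best(candidates: Dict[str, float]) -> str:
    """Return the engine name whose translation has the highest score."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=candidates.get)
```

Here `candidates` maps a hypothetical engine name to the score of its translation for the same source sentence.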

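With TransQuest, we have open-sourced our research in translation quality estimation, which also won the sentence-level direct assessment quality estimation shared task at WMT 2020. TransQuest outperforms current open-source quality estimation frameworks such as OpenKiwi and DeepQuest.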

Features

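Sentence-level translation quality estimation on two aspects: predicting post-editing effort and direct assessment.
Word-level translation quality estimation capable of predicting the quality of source words, target words and target gaps.
Outperforms current state-of-the-art quality estimation methods such as DeepQuest and OpenKiwi in all the languages experimented with.
Pre-trained quality estimation models for fifteen language pairs are available on HuggingFace.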
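
Word-level labels on the target side cover both the words and the gaps between them. The sketch below assumes the interleaved layout used in the WMT word-level shared tasks, where a target sentence of n tokens carries 2n + 1 tags (gap, word, gap, ..., word, gap) and each tag is OK or BAD. The helper name `split_target_tags` is hypothetical and is not part of TransQuest.

```python
from typing import List, Tuple


def split_target_tags(tokens: List[str],
                      tags: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Separate interleaved target tags into word tags and gap tags."""
    expected = 2 * len(tokens) + 1
    if len(tags) != expected:
        raise ValueError("expected %d tags, got %d" % (expected, len(tags)))
    gap_tags = tags[0::2]   # n + 1 gaps: before, between and after the words
    word_tags = tags[1::2]  # n tags, one per target token
    return list(zip(tokens, word_tags)), gap_tags
```

A BAD gap tag marks a position where one or more words are missing from the translation.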

Installation

From pip

pip install transquest

From source

git clone https://github.com/TharinduDR/TransQuest.git
cd TransQuest
pip install -r requirements.txt

Using pre-trained models

import torch
from transquest.algo.sentence_level.monotransquest.run_model import MonoTransQuestModel

# Load the multilingual direct-assessment model; use the GPU when one is available.
model = MonoTransQuestModel("xlmroberta", "TransQuest/monotransquest-da-multilingual", num_labels=1, use_cuda=torch.cuda.is_available())

# Each input is a [source sentence, machine translation] pair.
predictions, raw_outputs = model.predict([["Reducerea acestor conflicte este importantă pentru conservare.", "Reducing these conflicts is not important for preservation."]])
print(predictions)
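
When scoring several sentence pairs in one call, it can help to normalise the output into a plain list of floats. The sketch below assumes that the predictor returns one score per input pair and that a single pair may come back as a bare scalar rather than a sequence; both are assumptions about the library's behaviour, not documented guarantees. `score_pairs` and `predict_fn` are hypothetical names, and the predictor is passed in as a callable so the helper can be exercised without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple


def score_pairs(predict_fn: Callable[[List[List[str]]], object],
                pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Score (source, translation) pairs and return one float per pair."""
    if not pairs:
        return []
    batch = [[source, target] for source, target in pairs]
    raw = predict_fn(batch)
    try:
        scores = [float(value) for value in raw]
    except TypeError:
        # A scalar (or 0-d array) is not iterable: treat it as one score.
        scores = [float(raw)]
    if len(scores) != len(batch):
        raise ValueError("predictor returned %d scores for %d pairs"
                         % (len(scores), len(batch)))
    return scores
```

With the model loaded above, the predictor could be supplied as `lambda batch: model.predict(batch)[0]`.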

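Documentation

For more details, see the documentation.

Installation - Install TransQuest locally using pip.

Architectures - Check out the architectures implemented in TransQuest

Sentence-level Architectures - We have released two architectures, MonoTransQuest and SiameseTransQuest, to perform sentence-level quality estimation.

Word-level Architecture - We have released MicroTransQuest to perform word-level quality estimation.

Examples - We have provided several examples of how to use TransQuest in recent WMT quality estimation shared tasks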

Sentence-level Examples

Word-level Examples

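Pre-trained Models - We have provided pre-trained quality estimation models for fifteen language pairs, covering both sentence level and word level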

Sentence-level Models

Word-level Models

Contact - Contact us with any issues you have with TransQuest

Citations

If you are using the word-level architecture, please consider citing this paper, which was accepted to ACL 2021.

@InProceedings{ranasinghe2021,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {An Exploratory Analysis of Multilingual Word Level Quality Estimation with Cross-Lingual Transformers},
booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
year = {2021}
}

If you are using the sentence-level architectures, please consider citing these papers, which were presented at COLING 2020 and at WMT 2020, held at EMNLP 2020.

@InProceedings{transquest:2020a,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {TransQuest: Translation Quality Estimation with Cross-lingual Transformers},
booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
year = {2020}
}

@InProceedings{transquest:2020b,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {TransQuest at WMT2020: Sentence-Level Direct Assessment},
booktitle = {Proceedings of the Fifth Conference on Machine Translation},
year = {2020}
}

Author:

TransQuest

Dataset size:

6.27 GB