JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT

Paper title

JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT

Authors

Mayumi Ohta, Julia Kreutzer, Stefan Riezler

Abstract

JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch, seeking simplicity and accessibility. JoeyS2T's workflow is self-contained, starting from data pre-processing, over model training and prediction to evaluation, and is seamlessly integrated into JoeyNMT's compact and simple code base. On top of JoeyNMT's state-of-the-art Transformer-based encoder-decoder architecture, JoeyS2T provides speech-oriented components such as convolutional layers, SpecAugment, CTC-loss, and WER evaluation. Despite its simplicity compared to prior implementations, JoeyS2T performs competitively on English speech recognition and English-to-German speech translation benchmarks. The implementation is accompanied by a walk-through tutorial and available on https://github.com/may-/joeys2t.
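
Of the speech-oriented components named in the abstract, WER (word error rate) is the simplest to illustrate: it is the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words. The following is a minimal sketch in plain Python, not JoeyS2T's own implementation; the function name word_error_rate and the whitespace tokenization are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only (hypothetical helper, not part of JoeyS2T).
    Tokenization is plain whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Real evaluations usually normalize case and punctuation before scoring, which this sketch leaves out.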
