Paper title
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT
Authors
Abstract
JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch that aims for simplicity and accessibility. JoeyS2T's workflow is self-contained, running from data pre-processing through model training and prediction to evaluation, and is seamlessly integrated into JoeyNMT's compact and simple code base. On top of JoeyNMT's state-of-the-art Transformer-based encoder-decoder architecture, JoeyS2T provides speech-oriented components such as convolutional layers, SpecAugment, CTC loss, and WER evaluation. Despite its simplicity compared to prior implementations, JoeyS2T performs competitively on English speech recognition and English-to-German speech translation benchmarks. The implementation is accompanied by a walk-through tutorial and is available at https://github.com/may-/joeys2t.
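The abstract lists WER (word error rate) evaluation among the speech-oriented components. As a rough illustration of what that metric measures, the sketch below computes WER as the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. This is a minimal standalone sketch, not JoeyS2T's own implementation; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only; tokenization is plain
    whitespace splitting, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.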