Conformer-2

December 12, 2023

Conformer-2 is an automatic speech recognition (ASR) model trained on 1.1 million hours of English audio. Compared with its predecessor, Conformer-1, it recognizes proper nouns and alphanumerics more accurately and is more robust to noise.

Conformer-2 draws on DeepMind's Chinchilla paper, which found that large language models need training data scaled in proportion to their size, and applies that principle to speech by scaling up its training data. To label this data, it uses an ensemble technique: several strong teacher models generate the labels instead of a single teacher. Combining teachers reduces the variability of any one model's output and improves performance on data not seen during training.

Despite its larger size, Conformer-2 is faster than Conformer-1, reducing relative processing time by up to 55% across all audio file lengths. On user-oriented metrics in real-world use, it shows a 31.7% improvement in alphanumeric recognition, a 6.8% improvement in proper noun error rate, and a 12.0% improvement in noise robustness. These gains make Conformer-2 a valuable component for generative AI pipelines that work with spoken data, where accurate speech-to-text transcription is essential.
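The page does not say how the labels from multiple teachers are combined. As a hypothetical illustration only (not Conformer-2's actual method), one simple consensus rule picks, for each audio clip, the teacher transcript with the lowest total word-level edit distance to all the other teachers' transcripts. The function names `word_edit_distance` and `consensus_pseudo_label` are invented for this sketch.

```python
from typing import Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences (single-row DP)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete wa
                curr[j - 1] + 1,           # insert wb
                prev[j - 1] + (wa != wb),  # substitute, or match at cost 0
            )
        prev = curr
    return prev[-1]


def consensus_pseudo_label(teacher_transcripts: Sequence[str]) -> str:
    """Hypothetical ensemble rule: return the teacher transcript that is
    closest, in summed word edits, to every other teacher's transcript."""
    if not teacher_transcripts:
        raise ValueError("need at least one teacher transcript")
    tokenized = [t.lower().split() for t in teacher_transcripts]
    best_index = min(
        range(len(tokenized)),
        key=lambda i: sum(word_edit_distance(tokenized[i], other) for other in tokenized),
    )
    return teacher_transcripts[best_index]


if __name__ == "__main__":
    teachers = [
        "call me at five five five one two three four",
        "call me at five five five one two three for",
        "call me at five five nine one two three four",
    ]
    print(consensus_pseudo_label(teachers))
```

With these three transcripts, each teacher makes a different single-word mistake relative to the first one, so the first transcript has the smallest total distance and is chosen as the label. This shows why several teachers can reduce variability: an error made by only one teacher tends to be outvoted by the others.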