Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

A.I. Black Guy, March 15, 2023

This work studies the use of attention masking in transformer-transducer-based speech recognition to build a single configurable model for different deployment scenarios. We present a comprehensive set of experiments that compare fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries, in terms of both recognition accuracy and latency. We then explore variable masking, where attention masks are sampled from a target distribution at training time, to build models that can operate in different configurations. Finally, we investigate how a single configurable model can perform both first-pass streaming recognition and second-pass acoustic rescoring. Experiments show that chunked masking achieves a better accuracy-latency trade-off than fixed masking, both with and without FastEmit. We also show that variable masking improves accuracy by up to 8% relative in the acoustic rescoring scenario.
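The three masking schemes can be illustrated with boolean masks in which entry (i, j) says whether frame i may attend to frame j. What follows is a minimal NumPy sketch under our own assumptions, not the paper's implementation. The helper names (`fixed_mask`, `chunked_mask`, `sample_variable_mask`, `masked_attention_weights`) and parameters (`left`, `right`, `chunk_size`, `left_chunks`, and the chunk-size distribution) are hypothetical. The abstract does not state the actual context sizes, the target distribution, or whether masks are sampled per batch or per utterance.

```python
import numpy as np


def fixed_mask(num_frames, left, right):
    """Fixed masking (illustrative): every frame i sees the same window,
    `left` past frames and `right` future frames. mask[i, j] is True if i may attend to j."""
    idx = np.arange(num_frames)
    offset = idx[None, :] - idx[:, None]  # offset[i, j] = j - i
    return (offset >= -left) & (offset <= right)


def chunked_mask(num_frames, chunk_size, left_chunks):
    """Chunked masking (illustrative): frame i sees every frame in its own chunk
    plus `left_chunks` previous chunks, so its lookahead depends on where it sits in the chunk."""
    chunk_id = np.arange(num_frames) // chunk_size
    diff = chunk_id[:, None] - chunk_id[None, :]  # query chunk minus key chunk
    return (diff >= 0) & (diff <= left_chunks)


def sample_variable_mask(num_frames, chunk_sizes, probs, left_chunks, rng):
    """Variable masking (assumed form): draw a chunk size from a hypothetical
    target distribution at training time, then build the corresponding chunked mask."""
    chunk_size = int(rng.choice(chunk_sizes, p=probs))
    return chunk_size, chunked_mask(num_frames, chunk_size, left_chunks)


def masked_attention_weights(scores, mask):
    """Softmax over keys with disallowed positions set to -inf.
    Assumes every row of `mask` has at least one True entry (true for both masks above)."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) = 0 for masked keys
    return weights / weights.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(fixed_mask(4, left=1, right=0).astype(int))
    print(chunked_mask(6, chunk_size=2, left_chunks=1).astype(int))
    size, mask = sample_variable_mask(8, [2, 4, 8], [0.4, 0.4, 0.2], left_chunks=1, rng=rng)
    print(size, mask.shape)
```

In `chunked_mask`, the first frame of a chunk sees `chunk_size - 1` future frames and the last frame sees none, whereas `fixed_mask` gives every frame the same lookahead. With the sketch's definition, a chunk size at least as long as the utterance yields full-context attention. That is one plausible way a single variably trained model could serve both a streaming first pass and a full-context second pass, though the abstract does not describe the exact configuration used.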
