Abstract: The increasing ability of deep learning models to produce realistic-sounding synthetic speech poses serious problems for privacy, public trust, and digital security. To counter this danger, ...
Abstract: In this work, we propose CleanMel, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance ...
May, 2023: We have released demo for our audio large language model LTU (listen, think, and understand) that can do zero-shot audio classification and advanced reasoning. Try the online interactive ...