TCParaGSE: Target-Controllable Parallel Generative Speech Enhancement

Abstract

Speech enhancement (SE) aims to recover high-quality speech from recordings degraded by noise, reverberation, bandwidth limitation, and other distortions. Existing SE methods usually assume a fixed clean-speech target and remove all degradation components uniformly. However, not every acoustic interference is harmful in real-world communication. Reverberation, for example, may provide useful spatial cues and perceptual naturalness, yet current SE systems lack accurate control over whether such characteristics are removed or preserved. To address this problem, we propose TCParaGSE, a target-controllable parallel generative SE framework that performs controllable enhancement in the discrete codec-token space. To make codec-token-based enhancement more efficient, TCParaGSE adopts a neural codec based on group vector quantization, which produces independent token groups that can be predicted in parallel. Given degraded speech, the codec extracts grouped acoustic tokens, while a spectral feature extraction module provides global conditioning. An explicit control signal specifies the desired enhancement target and is injected into the prediction network, guiding the parallel prediction branches to predict the target tokens, which are then decoded into the enhanced waveform. We further construct a low-latency version by causalizing the main temporal modeling components. Experimental results show that TCParaGSE achieves competitive performance for clean-speech enhancement and clear advantages for reverberation-preserving enhancement, especially in perceptual quality and subjective MOS evaluation. Efficiency analysis further demonstrates that the proposed parallel prediction strategy significantly reduces inference time.
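To illustrate why group vector quantization yields token groups that can be handled independently, the following is a minimal NumPy sketch, not the TCParaGSE codec itself. The function names, the random codebooks, and all sizes (4 groups, 16 codewords, 8 dimensions per group, 10 frames) are assumptions chosen only for illustration.

```python
import numpy as np

def group_vector_quantize(features, codebooks):
    """features: (T, D); codebooks: (G, K, D // G). Returns tokens of shape (T, G)."""
    num_groups, _, group_dim = codebooks.shape
    frames = features.shape[0]
    # Split each frame's feature vector into G contiguous sub-vectors.
    grouped = features.reshape(frames, num_groups, group_dim)
    tokens = np.empty((frames, num_groups), dtype=np.int64)
    for g in range(num_groups):
        # Squared Euclidean distance from every frame to every codeword of group g.
        diff = grouped[:, g, None, :] - codebooks[g][None, :, :]
        tokens[:, g] = np.argmin((diff ** 2).sum(axis=-1), axis=1)
    return tokens

def dequantize(tokens, codebooks):
    # Look up each group's codeword and concatenate the groups back together.
    parts = [codebooks[g][tokens[:, g]] for g in range(codebooks.shape[0])]
    return np.concatenate(parts, axis=-1)

rng = np.random.default_rng(0)
codebooks = rng.normal(size=(4, 16, 8))  # hypothetical: 4 groups, 16 codewords, 8 dims each
features = rng.normal(size=(10, 32))     # hypothetical: 10 frames of 32-dim features
tokens = group_vector_quantize(features, codebooks)
recon = dequantize(tokens, codebooks)
```

Each column of `tokens` depends only on its own slice of the feature vector and its own codebook, so in this sketch one group's tokens can be computed or predicted without waiting for another group's.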

Audio Demo

Choose a task code and compare Input, Target, baseline systems, and TCParaGSE side by side.

N: Noise; R: Reverberation; B: Bandwidth limitation; C: Clean speech
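The task codes in this demo pair a set of input degradations with a target, and each task lists a control label. As a reading aid, the sketch below derives that label from a task code. It is inferred from the labels shown on this page and says nothing about how the control signal is encoded inside TCParaGSE; `REMOVAL_OPS` and `control_for` are hypothetical names.

```python
# Operation that removes each degradation (labels taken from this demo's Control fields).
REMOVAL_OPS = {"N": "denoise", "R": "dereverb", "B": "BWE"}

def control_for(task_code):
    """Turn a task code such as 'N+R+B→R' into the control label shown in the demo."""
    source, target = task_code.split("→")
    degradations = source.split("+")
    # 'C' means clean speech, so nothing is kept; otherwise the target names what stays.
    kept = set() if target == "C" else set(target.split("+"))
    removed = [REMOVAL_OPS[d] for d in "NRB" if d in degradations and d not in kept]
    label = " + ".join(removed)
    if "R" in kept:
        label += "; preserve reverb"
    return label

for code in ["N→C", "N+R→R", "N+R+B→C"]:
    print(code, "=>", control_for(code))
```

A degradation present in the input but absent from the target is removed, and reverberation named in the target is explicitly preserved.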
Selected control task

N→C

Remove noise to obtain clean speech.

Input: N
Target: C
Control: denoise
Audio samples (Example01 to Example05): Input, Target, DEMUCS, CMGAN, MP-SENet, UDSE, TCParaGSE (Ours).
Selected control task

R→C

Remove reverberation to obtain clean speech.

Input: R
Target: C
Control: dereverb
Audio samples (Example01 to Example05): Input, Target, DEMUCS, CMGAN, MP-SENet, UDSE, TCParaGSE (Ours).
Selected control task

B→C

Perform BWE to obtain clean speech.

Input: B
Target: C
Control: BWE
Audio samples (Example01 to Example05): Input, Target, DEMUCS, CMGAN, MP-SENet, UDSE, TCParaGSE (Ours).
Selected control task

N+B→C

Remove noise and perform BWE.

Input: N+B
Target: C
Control: denoise + BWE
Audio samples (Example01 to Example05): Input, Target, DEMUCS, CMGAN, MP-SENet, UDSE, TCParaGSE (Ours).
Selected control task

N+R→C

Remove both noise and reverberation.

Input: N+R
Target: C
Control: denoise + dereverb
Audio samples (Example01 to Example05): Input, Target, DEMUCS, CMGAN, MP-SENet, UDSE, TCParaGSE (Ours).
Selected control task

N+R→R

Remove noise while preserving reverberation.

Input: N+R
Target: R
Control: denoise; preserve reverb
Audio samples (Example01 to Example05): Input, Target, DEMUCS, CMGAN, MP-SENet, UDSE, TCParaGSE (Ours).
Selected control task

R+B→R

Perform BWE while preserving reverberation.

Input: R+B
Target: R
Control: BWE; preserve reverb
Audio samples (Example01 to Example05): Input, Target, DEMUCS, CMGAN, MP-SENet, UDSE, TCParaGSE (Ours).
Selected control task

R+B→C

Remove reverberation and perform BWE.

Input: R+B
Target: C
Control: dereverb + BWE
Audio samples (Example01 to Example05): Input, Target, DEMUCS, CMGAN, MP-SENet, UDSE, TCParaGSE (Ours).
Selected control task

N+R+B→R

Remove noise and perform BWE while preserving reverberation.

Input: N+R+B
Target: R
Control: denoise + BWE; preserve reverb
Audio samples (Example01 to Example05): Input, Target, DEMUCS, CMGAN, MP-SENet, UDSE, TCParaGSE (Ours).
Selected control task

N+R+B→C

Remove all degradations to obtain clean speech.

Input: N+R+B
Target: C
Control: denoise + dereverb + BWE
Audio samples (Example01 to Example05): Input, Target, DEMUCS, CMGAN, MP-SENet, UDSE, TCParaGSE (Ours).