Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoder