Cao, H., Cooper, D.G., Keutmann, M.K., Gur, R.C., Nenkova, A., Verma, R.: CREMA-D: Crowd-sourced emotional multimodal actors dataset. IEEE Transactions on Affective Computing 5(4), 377–390 (2014)
Wang, K., Wu, Q., Song, L., Yang, Z., Wu, W., Qian, C., He, R., Qiao, Y., Loy, C.C.: MEAD: A large-scale audio-visual dataset for emotional talking-face generation. In: European Conference on Computer Vision. pp. 700–717. Springer (2020)