I also encountered this problem. It looks like you used the 'Use emotion vector' mode. You may not have provided a voice reference, which supplies the timbre of the generated result in this mode. As ...