Total: 1
English names and expressions are frequently inserted into Swedish text. Humans intuitively adjust the degree of English pronunciation of such insertions. This work aims at a Swedish text-to-speech synthesis (TTS) capable of similar controlled adaptation. We focus on two key aspects: (1) the development of a TTS system with controllable degrees of perceived English-accentedness (PEA); and (2) the exploration of human preferences related to PEA. We trained a Swedish TTS voice on Swedish and English sentences with a conditioning parameter for language (English-accentedness, EA) on a scale from 0 to 1, and estimated a psychometric mapping of the perceived effect of EA to a perceptual scale (PEA) through perception tests. PEA was then used in Best-Worst listening tests presenting English insertions with varying PEA. The results confirm the effectiveness of the training and the PEA scale, and that listener preferences change with different insertions.