Total: 1
We present a database of code-switched conversational human–machine dialog in English–Hindi and English–Spanish. We leveraged HALEF, an open-source standards-compliant cloud-based dialog system to capture audio and video of bilingual crowd workers as they interacted with the system. We designed conversational items with intra-sentential code-switched machine prompts, and examine its efficacy in eliciting code-switched speech in a total of over 700 dialogs. We analyze various characteristics of the code-switched corpus and discuss some considerations that should be taken into account while collecting and processing such data. Such a database can be leveraged for a wide range of potential applications, including automated processing, recognition and understanding of code-switched speech and language learning applications for new language learners.