High-quality conversational speech datasets are essential for developing and evaluating Speech-LLMs. However, collecting real-world recordings is costly, raises privacy concerns, and yields inconsistent quality, while existing synthetic approaches often lack authenticity because of limited acoustic variety and insufficient paralinguistic information. We present SpeechDialogueFactory, a framework that addresses these limitations through a three-stage pipeline: generating comprehensive metadata, creating detailed scripts, and producing utterances enriched with paralinguistic features. The framework retrieves speaker voices from a voice bank and uses paralinguistic tags to drive expressive TTS. We also introduce an automated evaluation protocol whose scores correlate strongly with human assessments. Experimental results show that our synthesized dialogues achieve quality comparable to human recordings while offering greater flexibility and control.
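The three-stage flow described above can be illustrated with a minimal sketch. It is not the framework's actual implementation. All names here (`generate_metadata`, `generate_script`, `retrieve_voice`, `VOICE_BANK`, the `emotion` tag) are hypothetical, the first two stages are stubbed with fixed outputs where a real system would presumably call a language model, and the final TTS call is omitted.

```python
from dataclasses import dataclass

# Hypothetical voice bank: each entry pairs a voice ID with speaker attributes.
VOICE_BANK = [
    {"voice_id": "v001", "gender": "female", "age": "adult"},
    {"voice_id": "v002", "gender": "male", "age": "adult"},
]


@dataclass
class Utterance:
    speaker: str
    text: str
    tags: dict      # paralinguistic tags (assumed format), e.g. {"emotion": "cheerful"}
    voice_id: str   # voice retrieved from the bank for this speaker


def generate_metadata(topic: str) -> dict:
    """Stage 1 (stub): dialogue-level metadata such as topic and speaker profiles."""
    return {
        "topic": topic,
        "speakers": {
            "A": {"gender": "female", "age": "adult"},
            "B": {"gender": "male", "age": "adult"},
        },
    }


def generate_script(metadata: dict) -> list:
    """Stage 2 (stub): a turn-by-turn script of (speaker, text, emotion) tuples."""
    topic = metadata["topic"]
    return [
        ("A", f"Hi, I need help with {topic}.", "neutral"),
        ("B", "Of course, happy to help!", "cheerful"),
    ]


def retrieve_voice(profile: dict, bank: list) -> str:
    """Pick the bank voice that shares the most attributes with the profile."""
    def score(voice: dict) -> int:
        return sum(voice.get(key) == value for key, value in profile.items())
    return max(bank, key=score)["voice_id"]


def generate_utterances(metadata: dict, script: list, bank: list) -> list:
    """Stage 3: attach a retrieved voice and paralinguistic tags to each turn."""
    voices = {
        name: retrieve_voice(profile, bank)
        for name, profile in metadata["speakers"].items()
    }
    return [
        Utterance(speaker, text, {"emotion": emotion}, voices[speaker])
        for speaker, text, emotion in script
    ]


def build_dialogue(topic: str, bank: list = VOICE_BANK) -> list:
    """Run the three stages in order; a TTS engine would consume the result."""
    metadata = generate_metadata(topic)
    script = generate_script(metadata)
    return generate_utterances(metadata, script, bank)
```

Calling `build_dialogue("booking a flight")` returns a list of `Utterance` objects, each carrying the text, its tags, and a voice ID that an expressive TTS engine could then render. Keeping the stages as separate functions mirrors the staged design in the abstract and lets each one be replaced independently.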