Optimizing AI Text-to-Speech in the Development of Interactive Chinese Learning Media: A Collaborative Pedagogical Approach
Rizky Wardhani (a*), Nuryansyah Adijaya (b*), Rendy Aditya (c*), Icha Fais Nurul Karimah (d*), Erfan Agus Munif (e*), Syafa Aditya (f*), Aisyah Luthfiyyah (g*), Lusi (h*)

(a,c,d,f,g,h) Universitas Negeri Jakarta
(b) Universitas Borobudur
(e) Balai Guru dan Tenaga Kependidikan DKI Jakarta


Abstract

Mastering tonal variation (shengdiao) is a major challenge in learning Chinese. Limited access to native speakers leaves learners without a consistent pronunciation model. In response to this problem, this study offers a strategic solution through the optimization of neural network-based AI Text-to-Speech (TTS) technology, specifically TTSmaker, which can produce natural, expressive speech that closely approximates Standard Mandarin (Putonghua). The novelty of this research lies in the collaboration between teachers and learners in using TTSmaker as a pedagogical tool to ensure linguistic and phonetic accuracy. This research uses a Research and Development (R&D) approach with the ADDIE (Analysis, Design, Development, Implementation, Evaluation) model. Product effectiveness at the trial stage was analyzed using the N-Gain test. The target output of this research is a prototype of interactive audio learning media. The media enables learners to study independently and flexibly using techniques such as shadowing (imitation) and dictation (tingxie). This solution is expected to significantly improve learners' vocabulary retention, tonal accuracy, and listening competence in the digital era.
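
The N-Gain test mentioned in the abstract is commonly computed as Hake's normalized gain, g = (posttest - pretest) / (maximum score - pretest). The following is a minimal sketch of that calculation, assuming a 100-point scale and the commonly cited thresholds (high: g >= 0.7, medium: 0.3 <= g < 0.7, low: g < 0.3). The function names and the sample scores are hypothetical illustrations and are not data or procedures from this study.

```python
from statistics import mean
from typing import Sequence


def n_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Normalized gain for one learner: (post - pre) / (max_score - pre)."""
    if pre >= max_score:
        # No room for improvement, so the gain is undefined.
        raise ValueError("pretest score must be below the maximum score")
    return (post - pre) / (max_score - pre)


def categorize(g: float) -> str:
    """Map a gain value to the commonly used high/medium/low bands (assumed thresholds)."""
    if g >= 0.7:
        return "high"
    if g >= 0.3:
        return "medium"
    return "low"


def mean_n_gain(pre_scores: Sequence[float], post_scores: Sequence[float],
                max_score: float = 100.0) -> float:
    """Average of the individual normalized gains across a group of learners."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("pretest and posttest lists must have the same length")
    return mean(n_gain(p, q, max_score) for p, q in zip(pre_scores, post_scores))


if __name__ == "__main__":
    # Hypothetical pretest/posttest scores, for illustration only.
    pre = [50, 60, 20]
    post = [80, 70, 90]
    g = mean_n_gain(pre, post)
    print(f"mean N-Gain = {g:.3f} ({categorize(g)})")
```

Averaging individual gains is one of two common conventions; the other computes a single gain from the class mean pretest and posttest scores, and the two can give slightly different values.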

Keywords: AI Text-to-Speech, TTSmaker, Chinese, Learning Media, ADDIE

Topic: AI for Learning

ICTL 2026 Conference | Conference Management System