In K–12 education, Automatic Speech Recognition (ASR) is increasingly applied to support learning, engagement, and accessibility. However, developing ASR systems for children continues to pose major challenges, especially given the stringent requirements imposed by the various use cases in children’s educational settings. Moreover, representative speech datasets of children are particularly scarce, and data collection is further complicated by ethical considerations such as privacy and informed consent.