Radzikowski Kacper Pawel (ラジコヲスキ カツペル パエル)
FY2019 co-researchers: Osamu Yoshie, Robert Nowak
Summary of research results: Automatic speech recognition (ASR) systems can achieve high accuracy, depending on the methodology applied and the datasets used. Accuracy drops significantly, however, when the same ASR system is used with a non-native speaker of the target language. The main reason is the pronunciation and accent features carried over from the speaker's mother tongue. At the same time, the limited volume of labeled non-native speech data makes it difficult to train sufficiently accurate ASR systems for non-native speakers from the ground up. In this research we addressed the problem using dual supervised learning and style transfer. We designed a pipeline that modifies the speech of a non-native speaker so that it more closely resembles native speech. The publications cover accent-modification experiments with different experimental setups and approaches. The experiments were conducted on English spoken by Japanese speakers (UME-ERJ dataset). The results show a significant relative improvement in speech recognition accuracy. Our methodology can be used as a real-time wrapper for any existing ASR system, which reduces the need to train new models for non-native speech and thus overcomes the obstacle of data scarcity.