English mispronunciation detection module using a Transformer network integrated into a chatbot

Bibliographic Details
Main Author: Sánchez Solís, Julia Patricia
Other Authors: Quezada, Marcos, Rivera Zarate, Gilberto, Florencia, Rogelio, Lopez Orozco, Francisco
Format: Article
Language: en_US
Published: 2022
Online Access: https://www.ijcopi.org/ojs/article/view/268
Description
Summary: Companies today need up-to-date information to remain competitive. Some applications use speech recognition to give access to data stored in databases. However, these applications work properly only when the speaker's pronunciation is good, a skill that most people do not have. This paper proposes the architecture of an English mispronunciation detection module integrated into a chatbot. Users submit audio of the phrases whose pronunciation they want evaluated, and the module returns the mispronounced words, helping them practice their English pronunciation. The proposed architecture consists of an Automatic Speech Recognizer (ASR) model, based on a Transformer network, that converts the audio signal to text, and a string alignment algorithm that identifies mispronounced words using the Levenshtein distance. The Transformer network was trained on the LibriSpeech and L2-ARCTIC datasets. The module was evaluated using the Accuracy metric, reaching 90%, and the Character Error Rate metric, reaching 9.5%. Additionally, its performance was evaluated with a group of real users, showing promising results.
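The alignment step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the word-level granularity, the lowercasing and whitespace tokenization, and the function names `levenshtein_table`, `mispronounced_words`, and `character_error_rate` are all assumptions made for illustration. The recognized text would come from the ASR model; here it is passed in as a plain string.

```python
def levenshtein_table(ref, hyp):
    """Build the Levenshtein dynamic-programming table for two sequences."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every remaining reference item
    for j in range(cols):
        dist[0][j] = j  # insert every remaining hypothesis item
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist


def mispronounced_words(expected, recognized):
    """Return expected words that the recognizer replaced or dropped.

    Assumption: alignment is done on lowercased, whitespace-separated words.
    """
    ref = expected.lower().split()
    hyp = recognized.lower().split()
    dist = levenshtein_table(ref, hyp)
    flagged = []
    i, j = len(ref), len(hyp)
    # Walk back from the bottom-right corner to recover one optimal alignment.
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1          # words match
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            flagged.append(ref[i - 1])   # substituted word
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            flagged.append(ref[i - 1])   # word missing from the transcript
            i -= 1
        else:
            j -= 1                       # extra word in the transcript
    flagged.reverse()
    return flagged


def character_error_rate(expected, recognized):
    """Character-level edit distance divided by the reference length."""
    if not expected:
        raise ValueError("expected text must not be empty")
    return levenshtein_table(expected, recognized)[-1][-1] / len(expected)
```

For example, comparing the expected phrase "the weather is nice today" with a hypothetical transcript "the wetter is nice today" flags only "weather". Reusing the same table at character level gives a Character Error Rate, the second metric mentioned in the summary.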