Springer papers


Daniel Spasovski, Goran Pesanski, Gjorgji Madjarov and Dejan Gjorgjevikj


For more than five decades, researchers have tried to design machines that accurately transcribe spoken words. Although satisfactory accuracy has been achieved, machines still cannot recognize every voice, from any speaker, in any environment. In this paper we address the robustness of Automatic Speech Recognition for isolated Macedonian speech in noisy environments. The goal is to overcome the problem of changing background noise types. Five different types of noise were artificially added to the audio recordings, and the models were trained and evaluated for each one. The worst-case scenario for the speech recognition systems turned out to be babble noise, for which the error rate reaches 81.10% at the higher noise levels. The results show that the error rate increases as the noise level increases, and that the model trained on clean speech gives considerably better results at lower noise levels.
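To make the idea of artificially adding noise concrete, the following is a minimal sketch of mixing a noise recording into a speech signal at a chosen signal-to-noise ratio. It is an illustration under stated assumptions, not the procedure used in the paper: the function names (`mean_power`, `mix_at_snr`, `measured_snr_db`), the synthetic sine "speech", the white Gaussian noise, and the 10 dB target are all hypothetical choices made for the example.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB.

    The noise is repeated or truncated to match the speech length, then
    scaled by a gain g chosen so that
        snr_db = 10 * log10(P_speech / (g**2 * P_noise)).
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat the noise enough times to cover the speech, then cut to length.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB of a mixture, given the clean speech it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


# Hypothetical stand-ins: a 440 Hz tone as "speech", white Gaussian noise
# as the background (the paper's noise recordings are not reproduced here).
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Repeating this for each noise type and each target SNR would yield one noisy copy of the corpus per condition, which is one plausible way to organize the per-noise training and evaluation described above.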


speech recognition, robustness, isolated speech, signal-to-noise ratio, background noise