Automatic speech recognition (ASR) systems have become an important part of our lives and are used by millions of people. Nevertheless, researchers are still trying to improve their accuracy using many different techniques. In this paper, we focus on the influence that training set size has on the performance of a Hidden Markov Model (HMM) based digit recognition system for Macedonian. The experiments are conducted on a dataset of 3093 samples, divided into several training sets of different sizes and one test set. Additionally, the behavior of several classification techniques is evaluated with respect to the same question. The best result, a 19.9% error rate, was obtained with the HMM-based ASR system using 1500 samples in the training set. This indicates that, for this particular problem and dataset, the ideal training set size is around 1500 samples.
automatic speech recognition; training set size; Hidden Markov Model; Word Error Rate
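
As a minimal illustration of the Word Error Rate metric listed among the keywords, the sketch below uses the standard definition WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions needed to turn the recognized word sequence into the reference, and N is the number of reference words. The function name `word_error_rate` and the transliterated digit strings are hypothetical and serve only as an example; they are not taken from the experiments described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level edit distance.

    Illustrative helper (an assumption, not the paper's evaluation code).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substituted digit out of four.
    print(word_error_rate("eden dva tri pet", "eden dva pet pet"))
```

For isolated digit recognition, where each utterance contains a single word, this quantity reduces to the fraction of misrecognized test samples.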