Web proceedings papers

Authors

Andreja Naumoski, Georgina Mirceva and Kosta Mitreski

Abstract

Rule induction algorithms have always appealed to machine learning researchers because their models take a human-understandable form (IF-THEN rules). However, like any machine learning algorithm, they consist of several components, one of which is the heuristic metric, which influences the accuracy of the model and thus directly contributes to the quality of the knowledge discovery process. This paper therefore examines the influence of different similarity metrics on a particular rule induction algorithm applied in the field of life sciences. The results obtained from the classification algorithm are evaluated using a standardized method for estimating classification accuracy. According to the experimental results, a set of metrics (Manhattan, Euclidean, Squared Euclidean and Sorensen) proved important for improving model accuracy on the two life sciences datasets considered. Additionally, we present some of the multi-target rule models obtained for both datasets. In the future, we plan to investigate the influence of these metrics on other machine learning tasks and to implement further metrics to improve this rule induction algorithm.
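For readers unfamiliar with the four metrics named above, the following is a minimal sketch of their common textbook definitions for two numeric vectors of equal length. The function names are hypothetical, and the sketch does not reproduce the paper's implementation or show how the metrics are used inside the rule induction algorithm. The Sorensen distance is written here in its usual Bray-Curtis form, which is an assumption about the variant intended.

```python
import math

def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def squared_euclidean(x, y):
    # Sum of squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def euclidean(x, y):
    # Square root of the squared Euclidean distance.
    return math.sqrt(squared_euclidean(x, y))

def sorensen(x, y):
    # Assumed Bray-Curtis form: Manhattan distance normalised by the
    # sum of absolute coordinate sums; defined as 0.0 when that sum is zero.
    denom = sum(abs(a + b) for a, b in zip(x, y))
    return manhattan(x, y) / denom if denom else 0.0
```

For example, for x = [1, 2, 3] and y = [2, 4, 6], the Manhattan distance is 6 and the squared Euclidean distance is 14.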

Keywords

Predictive Clustering, Rules, Life Sciences, Classification Accuracy