Web proceedings papers


Genc Hamzaj, Zamir Dika and Goce Armenski


Assessing and improving data quality through matching and linking personal records becomes a very difficult task due to the very large volume of data and the different data sources with differing data structures. Among the many existing algorithms for matching data and linking records across different data sources, the main focus of this paper is on algorithms suited to large amounts of data, such as the Damerau–Levenshtein distance (DL) algorithm and the Levenshtein distance (LV) algorithm. In this paper we conduct an experiment on two large datasets with more than 3 million records to compare the performance and the quality of the results of the stated algorithms. We also evaluate the data quality dimensions as a prerequisite for the effective implementation of our measurements.
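To illustrate the difference between the two compared algorithms, the following is a minimal Python sketch (not the authors' implementation): a standard dynamic-programming Levenshtein distance, and the restricted Damerau–Levenshtein variant (optimal string alignment), which additionally counts a transposition of two adjacent characters as a single edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]


def damerau_levenshtein(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment):
    like Levenshtein, but an adjacent transposition costs 1 edit."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Transposition of two adjacent characters counts as 1 edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

On typo-like variations of personal names, the transposition rule matters: `levenshtein("ahmed", "ahemd")` is 2 (two substitutions), while `damerau_levenshtein("ahmed", "ahemd")` is 1 (one transposition), so DL scores swapped-letter records as closer matches than LV does.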


Data matching, Data linking, Distance Algorithms, Blocking Variables.