Web proceedings papers


Genc Hamzaj, Zamir Dika and Goce Armenski


Assessing and improving data quality through matching and linking personal records becomes a very difficult task due to the very large volume of data and the different data sources with differing data structures. Among the many existing algorithms for matching data and linking records across different data sources, the main focus of this paper is on algorithms suited to large amounts of data, such as the Damerau–Levenshtein distance (DL) algorithm and the Levenshtein distance (LV) algorithm. In this paper we conduct an experiment on two large datasets with more than 3 million records to compare the performance and the quality of the results of the stated algorithms. We also evaluate the data quality dimensions as a prerequisite for the effective implementation of our measurements.
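To illustrate the difference between the two compared algorithms, the following is a minimal Python sketch (not the authors' implementation): a standard dynamic-programming Levenshtein distance, and the restricted Damerau–Levenshtein variant (optimal string alignment), which additionally counts a transposition of two adjacent characters as a single edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]


def damerau_levenshtein(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment):
    like Levenshtein, but an adjacent transposition costs 1 edit."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Transposition of two adjacent characters counts as 1 edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

On typo-like variations of personal names, the transposition rule matters: `levenshtein("ahmed", "ahemd")` is 2 (two substitutions), while `damerau_levenshtein("ahmed", "ahemd")` is 1 (one transposition), so DL scores swapped-letter records as closer matches than LV does.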


Data matching, Data linking, Distance Algorithms, Blocking Variables.