Web proceedings papers

Authors

Vesna Gega, Ilija Kumbaroski, Ivan Chorbev and Danco Davcev

Abstract

This paper describes the use of a highly structured XML document collection (IMDb) to simulate the retrieval of non-overlapping XML multimedia objects. In our experiments, conducted in the Ad Hoc Task of the Data Centric Track, we examine two types of dynamically formed retrieval units and combine them with different models for searching and ranking. Our goal is to find the combination that yields the most effective retrieval of XML multimedia objects from very complex documents. We conclude that retrieval is more effective with a language model using Dirichlet smoothing than with a vector space model using the pivoted cosine measure. Better scores are also achieved using the textual content of the whole document, because it supplies more terms and identifies the multimedia object better than the textual content of smaller document parts, such as the root's first-level descendants.
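As a rough illustration of the two ranking models compared above, the following minimal sketch scores a retrieval unit (a whole document or one first-level descendant of the root, given as a list of terms) against a query. It is not the implementation used in the experiments: the function names, the parameter defaults (mu, s) and the exact pivoted normalization formula are assumptions based on the textbook forms of these models.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, unit_terms, collection_tf, collection_len, mu=2000.0):
    """Query log-likelihood of a retrieval unit under Dirichlet smoothing.

    collection_tf maps each term to its count in the whole collection and
    collection_len is the total number of term occurrences (assumed inputs).
    """
    tf = Counter(unit_terms)
    unit_len = len(unit_terms)
    score = 0.0
    for term in query_terms:
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term unseen in the collection: skipped in this sketch
        score += math.log((tf[term] + mu * p_coll) / (unit_len + mu))
    return score


def pivoted_score(query_terms, unit_terms, doc_freq, num_units, avg_len, s=0.2):
    """Vector space score with pivoted length normalization (textbook form).

    doc_freq maps each term to the number of units containing it; num_units
    and avg_len describe the set of retrieval units (assumed inputs).
    """
    tf = Counter(unit_terms)
    norm = (1.0 - s) + s * len(unit_terms) / avg_len
    score = 0.0
    for term, q_count in Counter(query_terms).items():
        if tf[term] == 0 or doc_freq.get(term, 0) == 0:
            continue  # only terms shared by query and unit contribute
        tf_part = 1.0 + math.log(1.0 + math.log(tf[term]))
        idf = math.log((num_units + 1) / doc_freq[term])
        score += tf_part / norm * q_count * idf
    return score
```

Either function can be applied to both kinds of retrieval unit, so the four combinations of unit type and ranking model can be compared on the same query.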

Keywords

XML, INEX, IMDb, Data Centric Track, Ad Hoc Task, relevant, retrieval