Utilizing Vision Large Language Models for Automatic Image Annotations: A Comparative Study

Ali Abd Almisreb; Tarik Namas; Ozge Buyukdagli; Alessandro Cantelli-Forti; Edmond Jajaga; Nurlaila Ismail

Web proceedings papers

Authors

Ali Abd Almisreb, Tarik Namas, Ozge Buyukdagli, Alessandro Cantelli-Forti, Edmond Jajaga and Nurlaila Ismail

Abstract

Image annotation can be a time-consuming task. This study examines how well the OWLv2 and Grounding-DINO-Tiny models annotate objects in four categories: airplanes, birds, drones, and helicopters. We report preliminary findings based on a comparison of confidence scores and detection rates. Grounding-DINO-Tiny performed well, producing no empty frames and relatively high confidence scores most of the time for distinct categories such as helicopters and drones. It fared poorly on birds, however, with lower confidence scores and more annotations below 50% confidence, which indicates a weakness in identifying birds. OWLv2 produced moderate results, and the quality of the data differed from one category to another, which undermined the model's reliability. To improve future performance, we recommend improving bird identification, eliminating inconsistencies in the datasets, and improving the quality of the data gathered.
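The comparison criteria named in the abstract (empty frames, detection rate, confidence scores, and annotations below 50%) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `summarize`, the input layout (one list of confidence scores per image), and the definition of detection rate as the share of frames with at least one annotation are all assumptions made for illustration, and the sample scores are invented.

```python
from statistics import mean


def summarize(frames, threshold=0.5):
    """Summarize one model's annotations for one category.

    frames: list of lists; each inner list holds the confidence scores
            of the annotations produced for a single image (assumed layout).
    threshold: confidence below which an annotation counts as weak
               (0.5 mirrors the 50% figure mentioned in the abstract).
    """
    scores = [s for frame in frames for s in frame]
    empty = sum(1 for frame in frames if not frame)
    total = len(frames)
    return {
        "frames": total,
        # Frames for which the model returned no annotation at all.
        "empty_frames": empty,
        # Assumed definition: share of frames with at least one annotation.
        "detection_rate": (total - empty) / total if total else 0.0,
        "mean_confidence": mean(scores) if scores else 0.0,
        # Share of annotations whose confidence falls below the threshold.
        "low_confidence_share": (
            sum(1 for s in scores if s < threshold) / len(scores)
            if scores else 0.0
        ),
    }


# Invented scores for three images, purely for illustration.
example = summarize([[0.9, 0.4], [], [0.7]])
print(example)
```

Running such a summary once per model and per category would give figures that can be compared side by side, which is the kind of comparison the abstract describes.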

Keywords

image annotation, OWLv2, Grounding-DINO-Tiny.
