Writing survey and review papers is a tedious process that requires the methodical and systematic collection and analysis of large numbers of scientific papers. Selecting the relevant papers is manual, time-consuming work. This paper proposes a framework based on natural language processing that aims to simplify the selection of relevant papers and the identification of their key properties. Given a list of search phrases, the framework first queries three digital libraries (Google Scholar, IEEE Xplore and Springer) and selects papers with at least one citation. It then crawls all relevant information for the selected papers. Next, it processes each abstract by searching for a list of predefined properties (i.e. keywords) and their synonyms obtained from WordNet; to make the search more robust, stemming and lemmatization are applied. Finally, it aggregates the processed papers by year, property, citation count, publisher, etc. and produces various charts and figures. This greatly simplifies the researcher's work, allowing relevant papers to be found quickly and then analyzed manually in more detail. Furthermore, the framework identifies the publishers, conferences and journals that are relevant to a particular topic. The framework was evaluated with a use case from connected health, and the results allowed various trends to be identified very quickly.
taxonomy, crawling, WordNet, natural language processing
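The abstract does not describe how the framework is implemented. The following is a minimal Python sketch of the last three steps (citation filtering, property matching on abstracts, and aggregation by year and property), written under stated assumptions. The `Paper` record, the `SYNONYMS` table, the `naive_stem` function and the sample abstracts are all hypothetical. `SYNONYMS` is a hand-written stand-in for WordNet synonym lookups (a real implementation might use NLTK's `wordnet.synsets`). `naive_stem` is a crude suffix stripper that substitutes for proper stemming and lemmatization.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record for one crawled paper.
    title: str
    year: int
    citations: int
    publisher: str
    abstract: str


# Hypothetical property -> synonym table; stands in for WordNet lookups.
SYNONYMS = {
    "monitoring": {"monitoring", "supervision", "surveillance"},
    "sensor": {"sensor", "detector"},
}

# Crude suffix list; a stand-in for a real stemmer/lemmatizer.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matched_properties(abstract: str, synonyms: dict) -> set:
    """Return the properties whose stemmed synonyms occur in the abstract."""
    tokens = {naive_stem(t) for t in re.findall(r"[a-z]+", abstract.lower())}
    return {
        prop
        for prop, words in synonyms.items()
        if tokens & {naive_stem(w) for w in words}
    }


def aggregate(papers, synonyms, min_citations=1) -> Counter:
    """Count matched properties per (year, property), skipping uncited papers."""
    counts = Counter()
    for paper in papers:
        if paper.citations < min_citations:
            continue  # keep only papers with at least one citation
        for prop in matched_properties(paper.abstract, synonyms):
            counts[(paper.year, prop)] += 1
    return counts


# Hypothetical sample data for illustration only.
papers = [
    Paper("A", 2017, 3, "IEEE", "Remote monitoring of patients with wearable sensors."),
    Paper("B", 2018, 0, "Springer", "Sensor networks for hospitals."),
    Paper("C", 2018, 5, "Springer", "Surveillance of chronic conditions using detectors."),
]

print(aggregate(papers, SYNONYMS))
```

In this sample, paper B is skipped because it has no citations. Paper C matches both properties only through synonyms ("surveillance", "detectors"), which is why the synonym expansion matters. Grouping by publisher instead of year would only change the key used in `aggregate`.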