Analyzing huge amounts of data is a big challenge. On the one hand, we face the problem of storing a large amount of data, and on the other, of processing it in reasonable time or even in real time. Real-time analytics can be defined as the capacity to use all available enterprise data and sources at the moment they arrive or occur in the system. In this paper, we present an infrastructure that we have implemented to analyze data from large log files in real time. The main components of the infrastructure are Redis, Logstash, Elasticsearch, and Kibana. Redis is used for temporary buffering of the log data, Logstash applies different filters to manipulate and analyze the data, Elasticsearch is used for indexing and storing the data, and Kibana is a user interface used to visualize the results. We explore the implementation of several filters that post-process the log information and produce various statistics suited to our needs in analyzing log files containing SQL queries from a large national education system. The post-processing of the SQL queries focuses mainly on preparing the log information in an adequate format and on information extraction. The purpose of the analysis is to monitor performance and detect unusual behavior in order to alert on or prevent possible unwanted activities, or to develop (in the future) triggers that can indicate or even prevent possible problems in real time.
Big data, log data, real-time processing, Redis, Logstash, Elasticsearch, Kibana.
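
To make the data flow described in the abstract concrete, the following minimal Python sketch mimics the buffer, filter, and index stages in memory. It is an illustration only, not the implementation presented in this paper: a `deque` stands in for the Redis buffer, a plain list stands in for the Elasticsearch index, and the log line format, the table names in the sample data, the `slow_ms` threshold, and all function names are assumptions introduced for the example.

```python
import re
from collections import Counter, deque

# Hypothetical log format (an assumption, not the format used in the paper):
# "<timestamp> <duration>ms <SQL statement>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<ms>\d+)ms\s+(?P<sql>.+)$")


def parse_line(line):
    """Filter stage: turn one raw log line into a structured event, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    sql = match.group("sql")
    return {
        "timestamp": match.group("ts"),
        "duration_ms": int(match.group("ms")),
        # The first keyword of the statement, e.g. SELECT or UPDATE.
        "operation": sql.split(None, 1)[0].upper(),
        "sql": sql,
    }


def run_pipeline(lines, slow_ms=1000):
    """Drain the buffer, filter each line, and collect the resulting events."""
    buffer = deque(lines)  # stand-in for the Redis buffer
    index = []             # stand-in for the Elasticsearch index
    while buffer:
        event = parse_line(buffer.popleft())
        if event is not None:  # unparseable lines are dropped
            # Flag slow queries; the threshold is an illustrative assumption.
            event["slow"] = event["duration_ms"] >= slow_ms
            index.append(event)
    return index


def operation_counts(index):
    """A simple statistic of the kind a dashboard could display."""
    return Counter(event["operation"] for event in index)


# Made-up sample lines, including one that does not match the format.
sample = [
    "2015-03-01T10:00:00 12ms SELECT * FROM students WHERE id = 7",
    "2015-03-01T10:00:01 1500ms update grades SET mark = 5 WHERE id = 7",
    "not a log line",
]
index = run_pipeline(sample)
counts = operation_counts(index)
```

In this sketch the malformed third line is discarded by the filter stage, while the second line is flagged as slow. In a deployment along the lines of the abstract, the parsing and flagging would be expressed as Logstash filters and the statistics would be computed from the Elasticsearch index and visualized in Kibana.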