Web proceedings papers

Authors

Zirije Hasani, Margita Kon-Popovska and Goran Velinov

Abstract

Defining an environment for analyzing streamed big data in real time is not an easy task. Many architectures have been proposed for real-time big data analytics, but the most interesting one for our problem is the Lambda Architecture. In this paper we present the motivation for developing such an architecture, how it works, and our practical work on implementing it. The Lambda Architecture comprises three layers: batch, speed and serving. Thus far we have implemented the batch layer using the Hadoop framework. We also briefly review the other two layers in order to implement them in the next phase of our work, and we conclude that Storm is the best choice for the serving and speed layers. A practical example demonstrates the analytical process in Hadoop for analyzing Wikipedia text data.

Keywords

Hadoop, Lambda Architecture, text data, Storm.