Real-Time Big Data Analytics for Data Stream Challenges: An Overview


  •   Alaa Abdelraheem Hassan

  •   Tarig Mohammed Hassan

Abstract

Conventional approaches to evaluating massive data sets are unsuited to real-time analysis; analysing big data in a data stream therefore remains a critical issue for numerous applications. Real-time big data analytics must process data as they arrive in order to support quick reactions and good decision making, which necessitates a novel architecture that allows real-time processing at high speed and low latency. Processing and analysing a data stream in real time is critical for a variety of applications; however, handling large volumes of data from a variety of sources, such as sensor networks, web traffic, social media, and video streams, is considerably difficult. The main goal of this paper is to give an overview of current architectures for real-time big data analytics and of the available real-time data stream processing methods, including the Lambda, Kappa, and Delta system architectures for large data stream processing.


Keywords: Apache Spark, Apache Storm, Delta, Hadoop, Kappa, Lambda



How to Cite
Hassan, A. A., & Hassan, T. M. (2022). Real-Time Big Data Analytics for Data Stream Challenges: An Overview. European Journal of Information Technologies and Computer Science, 2(4), 1–6. https://doi.org/10.24018/compute.2022.2.4.62