11 May 2022

Discussion of a master’s thesis in the College of Computer Science and Mathematics – Department of Computer Science entitled: (A Framework for Real-Time Big Data Analytics)

On Tuesday, 10 May 2022, the College of Computer Science and Mathematics at the University of Mosul held the discussion of the master’s thesis entitled:
(A Framework for Real-Time Big Data Analytics)
by the student Rana Abdel Ghafour Muhammad Taher, under the supervision of Prof. Dr. Duha Bashir Abdullah.
The thesis submitted by the student proposed a framework for analyzing big data in real time to support decision-making, using modern technologies for processing streaming data. The importance of this work stems from the major role that real-time big data analysis plays today in fields such as health, education, economics, cyber security, and fraud detection.

The study built a Periodic Task Model for real-time data processing and combined big data tools, namely several components of the Apache Spark ecosystem, with artificial intelligence techniques. The work proceeded in two phases, covering offline data and real-time data.

The goal of the first phase was to build a binary machine learning model that classifies tweets by polarity as positive or negative. Several machine learning algorithms were tested using Spark MLlib, the machine learning library of Apache Spark, through the PySpark application interface. In addition, a model based on an LSTM network was used to predict the future price of a stock from its price after the close of the trading session, based on historical data acquired from the Yahoo Finance API.

In the second phase, the Twitter platform was chosen as the source of the streaming data, which is processed using the machine learning model adopted in the first phase. The real-time processing of tweets is based on Spark Structured Streaming, a stream processing engine built on Spark SQL. A pipeline was created to stream live tweets from the Twitter API to the local system, where the data is processed as soon as it arrives in the form of an unbounded in-memory table, without storing the data, and the results needed to support the decision are delivered within the specified time period.
The work also applied the principle of parallel processing in carrying out the data analysis operations, which made heavy use of the available computational power and allowed several tasks to be completed within the same period of time.

The study aims to use the concept of Structured Streaming and to build a Periodic Task Model for analyzing big data streaming in real time, together with parallel processing, within a framework that can later be applied in different fields to support decision-making.
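
The idea of a periodic task that processes arriving data within a fixed time period can be illustrated with a small sketch. This is not the thesis implementation: it uses plain Python instead of Spark, simulates time rather than reading a live stream, and the names `run_periodic`, `PeriodResult`, `count_positive`, and `cost_per_record` are hypothetical. The keyword counter only stands in for a trained polarity classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class PeriodResult:
    window_start: float   # start of the period, in seconds
    n_records: int        # records that arrived during the period
    output: object        # result of the analysis function
    met_deadline: bool    # True if the assumed cost fits within the period


def run_periodic(
    arrivals: Iterable[Tuple[float, str]],
    period: float,
    analyze: Callable[[List[str]], object],
    cost_per_record: float,
) -> List[PeriodResult]:
    """Group timestamped records into fixed periods and analyze each group.

    Assumption: processing a batch costs len(batch) * cost_per_record
    seconds, and each job must finish within one period.
    """
    batches = {}
    for timestamp, record in arrivals:
        index = int(timestamp // period)          # which period the record falls in
        batches.setdefault(index, []).append(record)

    results = []
    for index in sorted(batches):
        batch = batches[index]
        cost = len(batch) * cost_per_record       # simulated processing time
        results.append(
            PeriodResult(
                window_start=index * period,
                n_records=len(batch),
                output=analyze(batch),
                met_deadline=cost <= period,
            )
        )
    return results


def count_positive(batch: List[str]) -> int:
    """Toy stand-in for a classifier: counts texts containing 'good'."""
    return sum("good" in text.lower() for text in batch)


# Simulated stream of (arrival time in seconds, tweet text).
stream = [
    (0.2, "Good news"),
    (0.7, "bad day"),
    (1.1, "so good"),
    (1.4, "good good"),
    (1.9, "meh"),
]

results = run_periodic(stream, period=1.0, analyze=count_positive, cost_per_record=0.4)
for result in results:
    print(result)
```

In this sketch the second period receives three records, so its assumed processing cost exceeds the period and the deadline check fails. A real-time framework would react to such a case, for example by processing the batch in parallel.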
The discussion committee was chaired by Assistant Professor Dr. Naglaa Badi Ibrahim, with Assistant Professor Dr. Safwan Omar Hassoun and Assistant Professor Dr. Mashari Ayed Askar from Tikrit University, Salah al-Din Governorate, as members, and with Prof. Dr. Duha Bashir Abdullah as supervisor and member. After the scientific discussion and the student’s defense of her thesis, the thesis was accepted and the researcher was awarded a master’s degree in computer science.
