12 September 2022

Ph.D. Dissertation on “Gesture Recognition Using Heterogeneous Computing”

A Ph.D. dissertation entitled “Gesture Recognition Using Heterogeneous Computing”, submitted by Mohammed Haqi Ismaeel, was defended in the Department of Computer Engineering, College of Engineering, University of Mosul, on Sunday, Sep. 11, 2022. The study aims to develop a model for detecting signers in a video stream of Arabic sign language and to classify the signs as static, dynamic, or non-sign. The first preprocessing step extracts pose keypoints from video frames using the MediaPipe library: MediaPipe's pose-detection pipeline first locates the person in the video with a CNN, and a second CNN then predicts the human keypoints. The best architecture for sign detection in Arabic sign language classifies the signs as static, dynamic, or non-sign with a training accuracy of 100% and a test accuracy of 99%. The model uses a limited number of keypoints selected from the body pose and a special normalization approach to extract features. It comprises two bidirectional layers built from Gated Recurrent Units (GRUs), covering the forward and backward directions, and ends with a fully connected layer that classifies using softmax.
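The announcement does not detail the normalization scheme. As an illustration only, one common approach centers each frame's keypoints on the shoulder midpoint and scales by shoulder width; the keypoint indices and the scheme itself are assumptions here, not the dissertation's method:

```python
import numpy as np

def normalize_keypoints(keypoints, left_shoulder=0, right_shoulder=1):
    """Center (x, y) keypoints on the shoulder midpoint and scale by
    shoulder width -- a plausible normalization, assumed for illustration."""
    kp = np.asarray(keypoints, dtype=float)
    mid = (kp[left_shoulder] + kp[right_shoulder]) / 2.0
    scale = np.linalg.norm(kp[left_shoulder] - kp[right_shoulder])
    return (kp - mid) / scale

# Example: four (x, y) keypoints from one frame, shoulders first
frame = [(0.4, 0.3), (0.6, 0.3), (0.5, 0.5), (0.5, 0.2)]
norm = normalize_keypoints(frame)
```

After this step the features are invariant to the signer's position and distance from the camera, which is the usual motivation for such a normalization.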
For Arabic static sign language recognition, pre-trained models were modified and then trained on a prepared Arabic Sign Language dataset. In addition, pairs of the pre-trained models were adopted as parallel deep feature extractors: the two branches are trained in parallel, and their features are then combined and passed to the classification stage. The results compare the performance of single models against multi-models and show that most multi-models are better at feature extraction and classification than single models. Measured by the total number of incorrectly recognized sign images across the training, validation, and testing datasets, the best CNN for feature extraction and classification of Arabic sign language is DenseNet121 among single models and DenseNet121 & VGG16 among multi-models.
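The parallel feature-extraction idea can be sketched as two branches whose feature vectors are concatenated before a shared classification head. The stand-in extractor functions below are trivial placeholders for the pre-trained CNN backbones (e.g. DenseNet121 and VGG16), and the random weights are purely illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical stand-ins for two pre-trained CNN feature extractors;
# in practice these would be DenseNet121 and VGG16 backbones.
def features_a(x):
    return x.mean(axis=0)   # branch A: global-average-pooled features

def features_b(x):
    return x.max(axis=0)    # branch B: global-max-pooled features

rng = np.random.default_rng(0)
x = rng.random((8, 4))                                 # fake image activations
fused = np.concatenate([features_a(x), features_b(x)]) # parallel branches fused
W = rng.random((3, fused.size))                        # dense classification head
probs = softmax(W @ fused)                             # class probabilities
```

The design choice is that the two backbones see the same input but learn complementary features, so the concatenated vector is richer than either branch alone.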
For Arabic dynamic sign language recognition, four deep neural network models were proposed using 2D and 3D CNNs to cover the feature extraction methods, with the extracted features passed to a recurrent neural network (RNN) for sequence classification; both Long Short-Term Memory (LSTM) and GRU variants of the RNN were used. The study also evaluated fusion techniques for various kinds of multiple models. The experimental results show that the optimal multi-model for the dynamic Arabic sign language dataset achieved 100% accuracy.
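The GRU cell at the heart of the sequence stage can be sketched in plain NumPy. The weight shapes, the single-layer loop, and the omission of biases are all simplifications for illustration, not the dissertation's architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step (biases omitted for brevity)."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_cand           # blend old and candidate state

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                              # feature and hidden sizes (illustrative)
Wz, Wr, Wh = (rng.standard_normal((d_h, d_in)) for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((d_h, d_h)) for _ in range(3))

h = np.zeros(d_h)
for x in rng.standard_normal((10, d_in)):     # a 10-frame feature sequence
    h = gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh)
# the final hidden state h would feed a fully connected softmax classifier
```

Running the cell over the whole frame sequence compresses it into a fixed-size state, which is what makes recurrent units suitable for classifying dynamic signs of varying length.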
A set of the proposed sign language detection and recognition models was optimized with the TensorRT inference accelerator to increase inference speed. This was done on different devices: a laptop, a Jetson Xavier NX, and cloud computing. With TF-TRT integration optimization, inference speed increases by about 3x-11x for the static sign language recognition models and about 1x-3x for the dynamic sign recognition models, while optimizing directly with the TensorRT C++ API increases inference speed by about 14x-110x for the static models and about 2x-10x for the dynamic models.
