Abstract
Hand gesture recognition is a challenging computer vision problem, especially for dynamic gestures. Sign language is a significant and interesting application field for dynamic hand gesture recognition systems. Recognizing human hands is an extremely complicated task, and solving it requires a robust hand tracking method built on an effective feature set and classifier. This paper presents a novel, fast and simple method for dynamic hand gesture recognition based on one hundred features extracted from two rows of a real-time video frame. Feature selection is used to represent the hand shape and recognize dynamic words of Kurdish Sign Language. The features are extracted in real time from the pre-processed hand object and represented as the binary values of the captured frame. Finally, an Artificial Neural Network classifier recognizes the performed hand gestures, with 80% of the data used for training and 20% for testing, achieving a success rate of 98%.
Introduction
Deaf and dumb people have become isolated from the world, and normal people have difficulty learning sign language. Sign language recognition has been introduced not only for deaf and dumb people but also as a tool for normal people to interact with them. The aim of this paper is to reduce the distance between deaf and dumb people and normal people by introducing a Real-Time Dynamic Hand Gesture Recognition System (RTDHGRS) that allows the user to understand the meaning of Kurdish Sign Language. The computer is used to capture, pre-process, recognize and finally classify the hand gesture in real time.
There are two types of gestures, namely static gestures and dynamic gestures; both can be expressed through finger spelling. Several researchers have worked in the domain of Human-Computer Interaction (HCI), whose main objective is to create a simple visual interface that provides a natural method of communication between humans and computers. The visual interface is created using hand gestures and head movements [1].
S. K. Yewale and P. K. Bharne (2011) introduced a gesture recognition system using neural networks. A webcam captured input images at a slow sampling rate of 15-25 frames per second. The input image was pre-processed in MATLAB and converted into a sequence of (x, y) coordinates using one of two methods, edge detection or skin color, then passed to a feed-forward neural network classifier, which assigned the gesture to one of several predefined classes known to the system [2].
P. Neto, et al. (2013) recognized hand gesture patterns using Artificial Neural Networks (ANNs) to control an industrial robot. In continuous gesture recognition, communicative gestures (with an explicit meaning) appear alongside non-communicative ones (transition gestures, emotional expressions, idling motion, etc.), so the authors proposed an architecture with two ANNs in series to recognize both kinds of gestures. A data glove (CyberGlove) was used as the interface technology; it relies on a magnetic tracking system that can measure body motion precisely, but it is very sensitive to magnetic noise, expensive, and must be attached to the human body. Transitions between gestures were analyzed, and an ANN-based solution was proposed to deal with them. The system works in real time: when a user performs a gesture, he or she expects a response from the robot with minimum delay, which led the authors to favor static gestures over dynamic ones [3].
V. T. Gaikwad and M. M. Sardeshmukh (2014) used an artificial neural network (ANN) to classify EMG signals recorded while deaf and dumb people signed the characters A, B, C, D and E. A back-propagation (BP) network was used to classify the EMG signs, as it works well for different bio-signals. Features such as autoregression coefficients, fast Fourier transform, short-time Fourier transform, wavelet transform coefficients, root mean square, mean absolute value, mean frequency, variance, standard deviation, zero crossings, and slope sign changes were selected to train the neural network [4].
S. C. Mesbahi, et al. (2018) presented a method for hand gesture recognition using background subtraction and convexity defects. First, background subtraction removes the useless information; contour segmentation is then applied to the hand image, and the convex hull, convexity defects and fingertips are computed for the contour. Feature extraction determines the significance of a given hand gesture, and the features must be able to distinguish between gestures. The researchers tested five hand gesture classes, the counts from one to five [5].
Sign language recognition is one of the important research domains in computer vision. The difficulty encountered by researchers is that instances of signs vary in both motion and shape [6]. There are different sign languages in the world, and the sign language used in any region depends on the education and spoken language of that society [7]. Examples are American Sign Language (ASL), British Sign Language (BSL), Japanese Sign Language [8], Chinese Sign Language [9], Italian Sign Language [10], Korean Sign Language, Arabic Sign Language (ArSL) [11], and Kurdish Sign Language (KurdSL), which is used by the deaf and dumb community in Kurdistan.
The main goal of RTDHGRS is to build a novel system capable of selecting the features of dynamic hand gestures for KurdSL and recognizing the correct meaning of each gesture. Each gesture is characterized with as few features as possible so as to reduce the classification time.
Section 2 explains the methodology and describes the prepared dataset. Section 3 presents the experimental results and their discussion. Finally, Section 4 puts forward the conclusion and the suggested future work.
Methodology
The proposed methodology comprises four stages, namely Image Capturing, Pre-Processing, Feature Selection and Hand Classification. The structural design of the proposed RTDHGRS serves both training and testing. During training, the captured images are stored as data images, while the features are stored in training dataset vectors. During testing, on the other hand, the test image passes through the RTDHGRS and is classified by the ANN based on two lines of features that are compared with the features stored in the training dataset vectors. Each line consists of fifty features, and both lines are saved together in the training dataset vectors.
The data images are collected from a real-time video. They are stored in Bitmap Image File Format (.bmp) only for display purposes, while the features are stored as vectors of one hundred bits (pixels) in the training dataset vectors.
KurdSL recognition has been developed so that deaf people in the society can interact with normal people through RTDHGRS without any difficulty.
The dynamic words are captured from the video at two frames, 16 and 30. Ten words were considered for training, performed by 10 persons; consequently, the total was 200 images, because every input gesture is captured at frames 16 and 30. One hundred features were extracted from every image to be used as input to the MLP feed-forward ANN. In the end, the dataset consisted of 200 vectors, each containing one hundred values, representing the 200 images. A test image passes through all stages, is compared with the training dataset vectors, and is then classified.
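A minimal sketch of the resulting dataset layout, assuming NumPy arrays and the ordering of twenty vectors per word described in the classification stage; the variable names are illustrative and not from the paper:

```python
import numpy as np

# 10 words x 10 signers x 2 key frames (16 and 30) = 200 training vectors,
# each holding the 100 binary features extracted from one image.
NUM_WORDS, NUM_SIGNERS, NUM_FRAMES = 10, 10, 2

X_train = np.zeros((NUM_WORDS * NUM_SIGNERS * NUM_FRAMES, 100), dtype=np.uint8)
# Twenty consecutive vectors share one word label (class 1..10 in the paper).
y_train = np.repeat(np.arange(NUM_WORDS), NUM_SIGNERS * NUM_FRAMES)
```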
For reliable, convincing results and user independence, training involves more than one person [12], while recognition finds the stored dataset features that best match those extracted from an input image [13]. During testing, RTDHGRS captures images from the video in real time, then processes and classifies them.
A. Image Capturing
A real-time hand gesture recognition system needs a good-quality webcam; here the HP Pavilion dv6 camera was used to capture the data images from video at a resolution of 640x480 and 30 fps. A single camera is usually sufficient to capture an image from a sequence of images (video); other systems need more than one camera to extract more information from the hand gesture image [14].
In a dynamic hand gesture, the hand moves from one direction to another to represent the word. It is therefore enough to capture frames 16 and 30 and extract the features from these two frames.
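A possible way to grab the two key frames with OpenCV is sketched below; the function name, camera index and loop structure are assumptions for illustration, since the paper does not give code:

```python
import cv2

def grab_key_frames(camera_index=0, key_frames=(16, 30)):
    """Read a live 640x480 stream and keep only frames 16 and 30."""
    cap = cv2.VideoCapture(camera_index)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    grabbed, frame_no = {}, 0
    while frame_no < max(key_frames):
        ok, frame = cap.read()
        if not ok:
            break
        frame_no += 1
        if frame_no in key_frames:
            grabbed[frame_no] = frame   # BGR image holding the hand pose
    cap.release()
    return grabbed
```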
B. Pre-Processing
This stage requires some pre-processing steps to guarantee the successful extraction of hand features. After extracting the hand image from frames 16 and 30, the next phase separates the hand shape from the background. Many algorithms for extracting the hand from an input image are given in [15]. The Region of Interest (ROI) is a 2D area in the video image that is activated when the user presses the start button, and the hand gesture must appear inside the ROI. The first five milliseconds are used to capture a 640x480 input image for the background; the next twenty-five milliseconds are used to capture the hand image inside the ROI. Subtraction between the ROI background and the ROI hand gesture is then applied. The resulting hand image is converted into a grayscale image with intensity levels ranging from 0 to 255, and finally a threshold is applied to binarize it.
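The subtraction-and-threshold pipeline could look like the following OpenCV sketch; the threshold value of 30 is an assumed placeholder, as the paper does not state one:

```python
import cv2

def segment_hand(roi_background, roi_frame, thresh=30):
    """Background subtraction inside the ROI followed by binarization."""
    diff = cv2.absdiff(roi_frame, roi_background)    # remove the static background
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)    # gray levels 0..255
    _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    return binary                                    # white hand on black background
```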
C. Feature Extraction
An image is captured in RGB color space and, through the image capturing and pre-processing stages, is converted to a binary image of size 399x199. In this stage, the binary image is resized again to 50x50 by extracting the hand region only. Resizing is required to decrease the computational time and memory space, as well as the number of inputs to the Neural Network (NN), because the number of NN inputs depends on the number of pixels in an image row.
Feature selection relies on a trial-and-error method to choose the best rows for recognition; after testing images, the best results were obtained with rows 13 and 38. Every hand image contains fifty rows, and each row represents part of the shape of the fingers. The image is divided into two regions, R1 and R2, each containing twenty-five rows. Row 13 is the middle line of R1 (the upper shape of the hand), and row 38 is the middle line of R2 (the lower shape of the hand). Selecting rows 13 and 38 is enough to recognize the hand gesture.
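A minimal sketch of this feature selection, assuming OpenCV/NumPy and 0-based row indexing; the paper's rows 13 and 38 may be counted from 1, so the exact indices are an assumption:

```python
import cv2
import numpy as np

def hand_features(binary_hand):
    """Resize the segmented hand to 50x50 and keep rows 13 and 38 only."""
    small = cv2.resize(binary_hand, (50, 50), interpolation=cv2.INTER_NEAREST)
    small = (small > 0).astype(np.uint8)             # keep the image strictly binary
    return np.concatenate([small[13], small[38]])    # 2 rows x 50 = 100 features
```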
The hand features are extracted from a black-and-white (BW) image of size 50x50 pixels. The features of the dynamic hand image are stored in the training dataset vectors and are then matched against the features of the test image by the ANN. The features extracted from the dynamic hand image must be clear and unique for each gesture. In every dynamic hand gesture there are many images along the trajectory of movement, and extracting features from all of them is a very hard task. Accordingly, it is enough to extract the features from the two key movements, for example for the words "butterfly" and "photocopy". These features are enough to recognize the meaning of the dynamic hand gesture.
D. Hand Classifier
This is the final stage. It depends on the hundred features extracted from a real-time dynamic hand gesture (the dynamic test image), which are classified using a neural network. The 200 dynamic training images pass through the MLP feed-forward neural network to find the weights for every image; after training, these are saved in the training dataset vectors (200 vectors, each consisting of 100 values). The ANN has one hundred input neurons, one hidden layer of fifty neurons, and ten output neurons. The dynamic test image passes through the trained MLP feed-forward ANN to find the matching weights: its features (100 values) are compared with the features saved in the dynamic dataset vectors to classify it. This stage ends with a classifier that assigns each input test gesture (dynamic test image) to its matching class.
Classification is the task of matching the features (two lines) of an input image, using the ANN, against the classes saved in the training dataset vectors, in order to output the meaning of the dynamic hand gesture. Every twenty vectors represent one class (vectors 1 to 20 are class 1, vectors 21 to 40 are class 2, and so on up to class 10). The system also uses voice to read out the recognized gesture, and the user can translate the hand movement into four languages: Kurdish, English, Arabic and Turkish.
The inputs of the ANN are one hundred neurons representing the two rows extracted from the 50x50 BW image, which are called the features. There are ten outputs, one for each class, and these classes are recognized by the MLP feed-forward ANN.
The net input of each hidden and output neuron is calculated as the weighted sum of the outputs of the neurons in the previous layer (Equation 1) [16]:

$$net_j = \sum_{i \in A} W_{ji} O_i \qquad (1)$$

where $A$ is the set of neurons in the previous layer, $O_i$ is the output of neuron $i$, and $W_{ji}$ represents the connection weight between neurons $j$ and $i$. Equation 2 represents the sigmoid function, which is used in the MLP:

$$O_j = \frac{1}{1 + e^{-net_j}} \qquad (2)$$

The output of each neuron goes to all neurons in the next layer until the final outputs of the network are generated.
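As an illustration of Equations 1 and 2 and the 100-50-10 topology described above, the following minimal NumPy sketch implements the forward pass; the weight initialization and names are assumptions for demonstration, not details from the paper:

```python
import numpy as np

def sigmoid(net):
    """Equation 2: squashing function applied to every neuron's net input."""
    return 1.0 / (1.0 + np.exp(-net))

class MLP:
    """Forward pass of the described 100-50-10 feed-forward topology."""
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (50, 100))  # input -> hidden weights W_ji
        self.w2 = rng.normal(0.0, 0.1, (10, 50))   # hidden -> output weights

    def forward(self, x):
        hidden = sigmoid(self.w1 @ x)      # Equation 1 then 2 for the hidden layer
        return sigmoid(self.w2 @ hidden)   # ten outputs, one score per word class
```

Training such a network with back-propagation would adjust w1 and w2 until each of the 200 training vectors activates its word's output neuron.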
In a dynamic hand gesture, many frames represent the hand movement; the main hand shapes are captured in frames 16 and 30, and the other frames lie between these two captured images. The input of the network is the hundred features of the BW image for frame 16 or frame 30, with frame 30 following the same processing steps; if the hand pose does not change, frame 30 will be like frame 16. The test dynamic image passes through all the stages and is then classified by the ANN to recognize the gesture.
Conclusion
Recently, many researchers have achieved success in the fields of computer vision and HCI. Hand gestures are a powerful method of communication for hearing-impaired persons when they communicate with normal people. Normal people have difficulty recognizing the meaning of sign language articulated by the hearing impaired, so it is usual to have an interpreter translate the sign language. To overcome this difficulty, an intelligent hand gesture recognition system has been built. It translates sign language into text in more than one language and uses voice to read the translation.
The main difficulty of dynamic hand gesture recognition lies in the complexity of the feature selection and classification algorithms, especially when high-dimensional feature vectors are used to distinguish the meaning of hand gestures. Accordingly, the faster and more precise classification method developed here, using a novel set of one hundred extracted features, reduces the storage requirements and shows that such systems can run in real time with low recognition latency. Table 2 compares this work with other research; all of the cited researchers used ANNs.
This paper presented the RTDHGRS, which tracks and recognizes ten words of KurdSL. The dynamic hand gesture recognition system is based on image capturing, hand pre-processing and tracking, and finally recognizing the gesture from the extracted hand features. It is possible to extend this work to also recognize the head and body movements of the signer in real time with better accuracy.
References
- S. Nagarajan and T. Subashini, 'Static hand gesture recognition for sign language alphabets using edge oriented histogram and multi class SVM,' International Journal of Computer Applications, vol. 82, 2013.
- S. K. Yewale and P. K. Bharne, 'Artificial neural network approach for hand gesture recognition,' International Journal of Engineering Science and Technology, vol. 3, 2011.
- P. Neto, et al., 'Real-time and continuous hand gesture spotting: an approach based on artificial neural networks,' in Robotics and Automation (ICRA), 2013 IEEE International Conference on, 2013, pp. 178-183.
- V. T. Gaikwad and M. M. Sardeshmukh, 'Sign language recognition based on electromyography (EMG) signal using artificial neural network,' International Journal of Industrial Electronics and Electrical Engineering, vol. 2, 2014.
- S. C. Mesbahi, et al., 'Hand gesture recognition based on convexity approach and background subtraction,' in IEEE International Conference on Intelligent Systems and Computer Vision (ISCV), DOI: 10.1109/ISACV.2018.8354074, 2018.
- J. Singha and K. Das, 'Recognition of Indian sign language in live video,' arXiv preprint arXiv:1306.1301, 2013.
- N. V. Tavari, et al., 'A review of literature on hand gesture recognition for Indian Sign Language,' International Journal, vol. 1, 2013.
- R. L. Thompson, et al., 'The eyes don’t point: Understanding language universals through person marking in American Signed Language,' Lingua, vol. 137, pp. 219-229, 2013.
- W. Gao, et al., 'A Chinese sign language recognition system based on SOFM/SRN/HMM,' Pattern Recognition, vol. 37, pp. 2389-2402, 2004.
- V. Lombardo, et al., 'A virtual interpreter for the Italian sign language,' in International Conference on Intelligent Virtual Agents, 2010, pp. 201-207.
- D. Dahmani and S. Larabi, 'User-independent system for sign language finger spelling recognition,' Journal of Visual Communication and Image Representation, vol. 25, pp. 1240-1250, 2014.
- J. LaViola, 'A survey of hand posture and gesture recognition techniques and technology,' Brown University, Providence, RI, vol. 29, 1999.
- T. Messer, 'Static hand gesture recognition,' University of Fribourg, Switzerland, 2009.
- H. Zhou, et al., 'Static hand gesture recognition based on local orientation histogram feature distribution model,' in Computer Vision and Pattern Recognition Workshop, 2004. CVPRW'04. Conference on, 2004, pp. 161-161.
- V. Bhame, et al., 'Vision Based Calculator for Speech and Hearing Impaired using Hand Gesture Recognition,' International Journal of Engineering Research & Technology (IJERT), vol. 3, 2014.
- R. Rezvani, et al., 'A new method for hardware design of multi-layer perceptron neural networks with online training,' in Cognitive Informatics & Cognitive Computing (ICCI* CC), 2012 IEEE 11th International Conference on, 2012, pp. 527-534.
- T. H. H. Maung, 'Real-time hand tracking and gesture recognition system using neural networks,' World Academy of Science, Engineering and Technology, vol. 50, pp. 466-470, 2009.
- R. P. Anetha, 'Hand Talk - a sign language recognition based on accelerometer and sEMG data,' International Journal of Innovative Research in Computer and Communication Engineering, vol. 2, 2014.
- M. Maraqa, et al., 'Recognition of Arabic sign language (ArSL) using recurrent neural networks,' Journal of Intelligent Learning Systems and Applications, vol. 4, p. 41, 2012.
- Y. Guo, et al., 'Flexible Neural Trees for Online Hand Gesture Recognition using surface Electromyography,' JCP, vol. 7, pp. 1099-1103, 2012.