A Pilot Study On Sign Language Detection

This essay sample was donated by a student to help the academic community.



People with physical limitations such as speech and hearing impairments are often unable to convey their messages properly, which leads to them being left out of many aspects of life. To help them express themselves more easily, we have developed a sign language detection application: a translator that takes hand gestures as input and gives the equivalent letter of the alphabet as output. A convolutional neural network (CNN) was used for image recognition and classification. It distinguishes the hand from other objects on the screen and classifies the sign represented by the hand gesture at any given time, which lets us translate the signs into letters of the English alphabet.

Thus, specially abled people can use this application to communicate with others in a more efficient and hassle-free way. The tools used were Anaconda, Python, OpenCV, TensorFlow, Matplotlib, NumPy, and convolutional neural networks.


Many people are born with conditions that make it difficult for them to learn a spoken language. For such people, a different method of expression has been developed in which only hand gestures are used to express their thoughts. It is known as sign language. Sign language is the primary language of people with impaired hearing and speech, who use its gestures as a means of non-verbal communication to express their thoughts and emotions. However, non-signers find it extremely difficult to understand. As a result, trained sign language interpreters are needed during medical and legal appointments and during educational and training sessions [1].

According to the latest census, there are 5 million deaf and hearing-impaired people who use sign language daily to express themselves. The problem is that people who have not studied sign language do not easily understand it. This creates a communication barrier between them and people with hearing or speech impairments. Consequently, people who use sign language as their primary language face various problems in their day-to-day lives, because they cannot communicate effectively with the majority of the population.

We built the sign language detection application to help people who can only communicate in sign language communicate with the rest of the world, without the other person having to learn sign language. Disabled people can use it to translate their signs into the English alphabet, which makes communication with people who do not understand sign language simpler and hassle-free.

Our methodology uses convolutional neural networks to categorize images: first to distinguish the hand from other objects, and then to classify the gesture made by the hand. Gestures, or so-called sign languages, are used as a tool of communication in human-to-human interaction. Such communication is bound to a specific set of protocols or symbols, such as Indian Sign Language (ISL) or American Sign Language (ASL). The research aims to illustrate the extraction of unique features from each symbol or gesture in an existing dataset (ISL or ASL). The machine is then trained on the obtained feature vectors using standard classification models. The main objective has three parts: recognize the hand separately from the other objects on the screen, classify the gesture the hand is making at any given time, and output the mapped English letter for that gesture on the screen [2].
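
The paper does not name the standard classification models it uses. As a minimal sketch, assume each gesture image has already been reduced to a fixed-length feature vector. A nearest-centroid classifier then stores one mean vector per sign and labels a new vector with the closest mean. This is a deliberately simple stand-in, not the paper's model, and the function names are hypothetical.

```python
import numpy as np

def fit_centroids(features: np.ndarray, labels: np.ndarray):
    """Compute one mean feature vector (centroid) per class label."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_nearest_centroid(features: np.ndarray, classes: np.ndarray,
                             centroids: np.ndarray) -> np.ndarray:
    """Assign each feature vector to the class whose centroid is nearest (Euclidean)."""
    # dists has shape (num_samples, num_classes)
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]
```

Any other standard classifier (k-nearest neighbours, an SVM, a softmax layer) could take the same feature vectors. The nearest-centroid rule is shown only because its decision rule fits in one line.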

In sign language recognition, the motion of the hand and its location in consecutive frames are key features for classifying different signs, so a fixed reference point must be chosen. The hand's contour was chosen to obtain information on the shape of the hand. The hand's center of gravity (COG) was used as the reference point, which alleviates the bias introduced by choosing other reference points. After the reference point was defined, the distance from every point on the contour to the COG of the hand was estimated. The location of the tip of the hand was then easily extracted as the local maximum of this distance vector [3].
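
The COG and distance-vector idea can be sketched with NumPy alone. The sketch assumes the contour is already available as an ordered (N, 2) array of boundary points, for example a contour from OpenCV's `cv2.findContours` reshaped to (N, 2). Here the COG is taken as the area centroid of the contour polygon, computed with the shoelace formula. The function names and the `min_ratio` threshold are hypothetical.

```python
import numpy as np

def contour_cog(contour: np.ndarray) -> np.ndarray:
    """Centre of gravity (area centroid) of a closed contour via the shoelace formula."""
    x = contour[:, 0].astype(float)
    y = contour[:, 1].astype(float)
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    cross = x * y_next - x_next * y   # twice the signed area of triangle (origin, p_i, p_i+1)
    area = cross.sum() / 2.0
    cx = ((x + x_next) * cross).sum() / (6.0 * area)
    cy = ((y + y_next) * cross).sum() / (6.0 * area)
    return np.array([cx, cy])

def tip_indices(contour: np.ndarray, min_ratio: float = 0.8) -> np.ndarray:
    """Indices of contour points that are local maxima of the distance to the COG.

    min_ratio (hypothetical) discards weak peaks whose distance is below
    min_ratio * (largest distance), e.g. small bumps along the wrist.
    """
    dist = np.linalg.norm(contour - contour_cog(contour), axis=1)
    prev_d, next_d = np.roll(dist, 1), np.roll(dist, -1)  # closed contour: wrap around
    is_peak = (dist > prev_d) & (dist >= next_d)
    return np.flatnonzero(is_peak & (dist >= min_ratio * dist.max()))
```

For a synthetic five-lobed "hand" outline, `r = 10 + 3*cos(5*theta)` sampled at 100 points, the COG lands at the origin and the five lobe tips are found at indices 0, 20, 40, 60 and 80.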

The end result matters because it converts sign language gestures into the English alphabet. This helps disabled people convey their messages to people who do not understand sign language, so they can communicate more efficiently and express themselves better. Disabled people can use the application, and so can anyone whose relative, customer, business partner, or other associate is specially abled, without learning sign language themselves. The application thus forms a bridge between people using English and people using sign language, allowing them to communicate with each other without difficulty.


Neural networks are a machine learning technique loosely modelled on the structure of the brain. A neural network consists of learning units called neurons. These neurons learn how to convert input signals into corresponding output signals, which forms the basis of automated recognition. The methodology adopted in this paper is based on a convolutional neural network (CNN, or ConvNet). A CNN is a type of feed-forward artificial neural network in which the connectivity pattern between neurons is inspired by the organization of the animal visual cortex.
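
A minimal sketch can make "converting input signals into output signals" concrete. A neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function. ReLU, the activation this paper uses in its convolutional layers, is assumed here. Learning means adjusting the weights and bias, which is not shown. The function names are illustrative.

```python
import numpy as np

def relu(z):
    """ReLU activation: keeps positive values and replaces negative ones with 0."""
    return np.maximum(z, 0.0)

def neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """A single neuron: weighted sum of the input signals plus a bias, passed through ReLU."""
    return float(relu(np.dot(w, x) + b))

def dense_layer(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """A layer of neurons: row k of W holds the input weights of neuron k."""
    return relu(W @ x + b)
```

For example, with inputs `[2, 1]`, weights `[0.5, -1]` and bias `0.25`, the weighted sum is 0.25, so the neuron outputs 0.25. With inputs `[0, 1]`, the sum is -0.75 and ReLU outputs 0.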

A common first step in hand-image processing is the detection of skin-colored pixels. The aim is to separate the skin regions (face and hands) from the rest of the image and then to extract only the hand. Skin detection is a popular and useful technique for detecting and tracking human body parts. A skin detector aims to form a decision rule that separates skin pixels from non-skin pixels. Identifying skin-colored pixels involves finding, in a given color space, the range of values that contains the largest number of skin pixels while minimizing the misclassification of non-skin pixels. In this paper, however, skin detection is deliberately not used, in order to test whether hand gestures can be detected without detecting skin-colored pixels [4].
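
The paper deliberately skips skin detection, but the range-based decision rule it describes can be sketched as a per-pixel test in a chroma color space. This sketch makes two assumptions not taken from the paper. It uses the full-range ITU-R BT.601 RGB-to-YCbCr conversion (the JPEG convention). It also uses widely used heuristic skin ranges of Cb in [77, 127] and Cr in [133, 173]. Real systems tune these thresholds on their own data.

```python
import numpy as np

def rgb_to_cbcr(rgb: np.ndarray):
    """Chroma channels of full-range BT.601 YCbCr (JPEG convention) for an (..., 3) RGB array."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def skin_mask(rgb: np.ndarray, cb_range=(77, 127), cr_range=(133, 173)) -> np.ndarray:
    """Boolean mask: True where a pixel's chroma falls inside the assumed skin ranges."""
    cb, cr = rgb_to_cbcr(rgb)
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```

A light skin tone such as RGB (224, 172, 138) falls inside both ranges. Pure blue and pure green fall outside them. Luminance (Y) is ignored, which is the usual reason for testing only chroma.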

The dataset used in this paper is an American Sign Language (ASL) alphabet dataset [5]. It covers the 26 English letters and 3 additional signs that indicate “Space”, “Del” and “Nothing”. For training, 3,000 images are used for each sign [6]. First, a video of a sign language demonstration is sampled and concatenated into an image. A video camera with standard specifications is used to acquire the images. The videos are then converted into frames, which are cropped to select a specific region and then processed by the algorithm. The resulting image becomes the input of the convolutional neural network (CNN). Because the method uses only 2D images, a cheap camera can be used, which is the main advantage of the proposed method.
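
The label set above has 29 classes. At 3,000 images each, that gives 29 × 3,000 = 87,000 training images. The sketch below encodes the labels and shows one hypothetical way an application could turn a stream of predicted signs into text. The paper does not specify how “Space”, “Del” and “Nothing” are handled, so the behavior here is an assumption: append a space, delete the last character, and emit nothing.

```python
import string

# 26 letters plus the three extra signs described in the text.
CLASSES = list(string.ascii_uppercase) + ["Space", "Del", "Nothing"]
IMAGES_PER_CLASS = 3000
TOTAL_TRAINING_IMAGES = len(CLASSES) * IMAGES_PER_CLASS  # 29 * 3000 = 87,000

def labels_to_text(predicted: list[str]) -> str:
    """Turn a sequence of predicted class names into text (hypothetical handling of extra signs)."""
    chars: list[str] = []
    for label in predicted:
        if label == "Space":
            chars.append(" ")
        elif label == "Del":
            if chars:
                chars.pop()          # remove the most recent character, if any
        elif label == "Nothing":
            continue                 # no sign in view: emit nothing
        else:
            chars.append(label)
    return "".join(chars)
```

For example, `labels_to_text(["H", "I", "Space", "Y", "O", "X", "Del", "U", "Nothing"])` returns `"HI YOU"`. In a live video stream the same prediction repeats over many frames, so a real application would also need to suppress repeated labels. The paper does not describe that step.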

The hand gesture recognition process used in this paper is described in the flow chart above. As described above, the training data first enter the network and come out as probabilities for selecting among the candidate signs.
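
The step where data "come out as probabilities" is normally done with a softmax over the network's raw output scores. The paper does not name the function, so softmax is assumed here because it is the standard choice for multi-class CNN classifiers. A minimal, numerically stable version follows.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw network scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()
```

For scores `[2.0, 1.0, 0.1]` this gives approximately `[0.659, 0.242, 0.099]`, so the first candidate would be selected. Subtracting the maximum does not change the result but keeps large scores such as 1000 from overflowing.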

The sign language actions are learned through a CNN, which is known to perform strongly on image classification problems. The network in this paper consists of four convolutional layers and two fully connected layers. The main building block of a convolutional neural network is the convolutional layer, which comprises a set of independent filters [7]. Each of the four convolutional layers uses the ReLU (Rectified Linear Unit) activation function. ReLU controls how the signal passes from one layer to the next, emulating how neurons fire in the brain. A pooling layer is another building block of a CNN. Its function is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network. A pooling layer operates on each feature map independently. Two common variants are mean pooling and max pooling. The most common is max pooling, in which the maximum of each region is taken as its representative. Fully connected layers are the last layers in the network: every neuron in such a layer is connected to every neuron in the preceding layer [8].
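
The three operations described above (convolution, ReLU and max pooling) can be sketched in NumPy for a single-channel image and a single filter. This is an illustration of the building blocks, not the paper's network. The paper does not give filter counts or sizes, and a real layer applies many filters across all input channels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D convolution as used in CNNs (cross-correlation, no kernel flip)."""
    windows = sliding_window_view(image, kernel.shape)   # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def relu(z: np.ndarray) -> np.ndarray:
    """ReLU: negative responses become 0, positive ones pass through unchanged."""
    return np.maximum(z, 0.0)

def max_pool(fmap: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping size x size max pooling; rows/columns that don't fill a block are dropped."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))
```

Applying the filter `[[-1, 1]]` to a 6×6 image with a vertical dark-to-bright edge produces a 6×5 feature map that responds only at the edge. Pooling it with 2×2 blocks shrinks it to 3×2 while keeping the edge response. In the network described above, this conv-plus-ReLU step would be repeated over four layers, with pooling between them, before the two fully connected layers.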

In this paper, the Inception v3 model from the TensorFlow library has been used. It is a large image classification model with millions of parameters that can differentiate between many kinds of images. Only the final layer of the network is trained. Inception v3 was trained for the ImageNet Large Scale Visual Recognition Challenge using the 2012 data, where it reached a top-5 error rate as low as 3.46%. The pre-trained Inception v3 model (trained on the ImageNet dataset of 1,000 classes) is downloaded. A new final layer with one output per sign category is then added and trained on the sign language dataset.
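
Training only a new final layer on top of a frozen network amounts to fitting a softmax classifier on fixed feature vectors. The sketch below assumes those frozen features have already been extracted; in Inception v3 the pooled feature vector before the classifier has 2,048 values. Small synthetic vectors stand in for them so the example runs without TensorFlow. The learning rate, epoch count and function names are assumptions.

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_final_layer(features, labels, num_classes, lr=0.1, epochs=300):
    """Fit only a new softmax output layer on frozen feature vectors (batch gradient descent)."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        probs = softmax_rows(features @ W + b)
        grad = (probs - onehot) / n          # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict_final_layer(features, W, b):
    """Index of the highest-scoring class for each feature vector."""
    return np.argmax(features @ W + b, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for frozen features: 3 "signs", 16-dim vectors.
    centers = rng.normal(size=(3, 16))
    labels = rng.integers(0, 3, size=300)
    feats = centers[labels] + 0.05 * rng.normal(size=(300, 16))
    W, b = train_final_layer(feats, labels, num_classes=3)
    print("training accuracy:", (predict_final_layer(feats, W, b) == labels).mean())
```

With TensorFlow, the same structure is typically built from `tf.keras.applications.InceptionV3(include_top=False, weights="imagenet", pooling="avg")` with its weights frozen and a `Dense` softmax layer on top. The paper does not say which retraining script or API it used.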

In the first approach, we extracted spatial features for individual frames using the Inception model (a CNN). Each video, a sequence of frames, was then represented by the sequence of predictions the CNN made for its individual frames. Predictions are made against a plain (white) background. First, the frames are extracted from the video sequences of each gesture. Then the noise is removed from the frames, and a rectangular region of interest (ROI) is created to extract more relevant features from each frame.
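
The paper does not say which noise filter it applies. A 3×3 median filter is a common choice for removing isolated noisy pixels, so it is assumed in this sketch. The rectangular ROI is then a simple array slice. The ROI coordinates in the usage example are hypothetical. OpenCV's `cv2.medianBlur(frame, 3)` is a comparable library routine.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_denoise(gray: np.ndarray) -> np.ndarray:
    """3x3 median filter on a 2D grayscale frame; edges are handled by repeating border pixels."""
    padded = np.pad(gray, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))          # shape (H, W, 3, 3)
    return np.median(windows, axis=(-2, -1)).astype(gray.dtype)

def crop_roi(frame: np.ndarray, x: int, y: int, w: int, h: int) -> np.ndarray:
    """Cut out the rectangular region of interest: top-left corner (x, y), width w, height h."""
    return frame[y:y + h, x:x + w]
```

For example, `crop_roi(frame, x=100, y=50, w=200, h=150)` on a 480×640 color frame returns a 150×200×3 patch. A single bright noise pixel in an otherwise uniform frame is removed entirely by the median filter.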

ROI segmentation works as follows. First, a wide image is created by sampling and concatenating the original video frames [9]. Then a network that detects the hand area is used to obtain the ROI-segmented hand area. The second step is sign language learning. Here the ROI-segmented image is the input to the classification network, which outputs a probability vector, and the sign is determined from this vector. The dataset used here was captured from a distance of 1 m under various combinations of backgrounds, clothing, and so on. In the 1 m tests, the success rate was 84% without ROI segmentation and 97% with it. The frames are fed to the CNN model, which is trained as described above, and the model is tested against a plain (white) background.
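
The first and last steps of this pipeline can be sketched directly: building a wide image from sampled frames, and reading the sign off the probability vector. The paper does not say how frames are sampled. This sketch assumes evenly spaced frames placed side by side. The hand-detection network in between is not shown, and the function names are hypothetical.

```python
import numpy as np

def sample_and_concatenate(frames: np.ndarray, num_samples: int) -> np.ndarray:
    """Pick num_samples evenly spaced frames and place them side by side as one wide image."""
    idx = np.linspace(0, len(frames) - 1, num_samples).round().astype(int)
    return np.concatenate([frames[i] for i in idx], axis=1)

def decide_sign(prob_vector: np.ndarray, classes: list[str]) -> str:
    """The sign is the class with the highest probability in the network's output vector."""
    return classes[int(np.argmax(prob_vector))]
```

For a 10-frame clip of 4×5 frames, sampling 4 frames selects frames 0, 3, 6 and 9 and yields a 4×20 wide image. A probability vector of `[0.1, 0.7, 0.2]` over classes `["A", "B", "C"]` is decided as `"B"`.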


We have successfully built an application that translates sign language into the English alphabet with acceptable accuracy. It uses a convolutional neural network to categorize the images, distinguish the hand from other objects in the image, and classify the sign the image represents. Each sign language gesture is mapped to an English letter, and the equivalent letter is displayed on the monitor. The application is ready to help disabled people express themselves better and interact with people who do not understand sign language.


  1. Lihong Zheng, Bin Liang, and Ailian Jiang, “Recent Advances in Deep Learning for Sign Language Recognition.”
  2. Himadri Nath Saha, Sayan Tapadar, Shinjini Ray, Suhrid Krishna Chatterjee, and Sudipta Saha, “A Machine Learning Based Approach for Hand Gesture Recognition using Distinctive Feature Extraction.”
  3. Ashok K. Sahoo, Gouri Sankar Mishra, and Kiran Kumar Ravulakollu, “Sign Language Recognition: State of the Art,” ARPN Journal of Engineering and Applied Sciences, vol. 9, no. 2, February 2014.
  4. Marko Z. Šušić, Saša Z. Maksimović, Sofija S. Spasojević, and Željko M. Đurović, “Recognition and Classification of Deaf Signs using Neural Networks,” 11th Symposium on Neural Network Applications in Electrical Engineering (NEUREL-2012), Faculty of Electrical Engineering, University of Belgrade, Serbia, September 20-22, 2012.
  5. ASL Alphabet dataset, Kaggle, https://www.kaggle.com/grassknoted/asl-alphabet
  6. Purva A. Nanivadekar and Vaishali Kulkarni, “Indian Sign Language Recognition: Database Creation, Hand Tracking and Segmentation,” 2014 International Conference on Circuits, Systems, Communication and Information Technology Applications (CSCITA), 2014.
  7. Yangho Ji, Sunmok Kim, and Ki-Baek Lee, “Sign Language Learning System with Image Sampling and Convolutional Neural Network,” 2017 First IEEE International Conference on Robotic Computing, 2017.
  8. Marlon Oliveira, Houssem Chatbri, Suzanne Little, Noel E. O'Connor, and Alistair Sutherland, “A Comparison between End-to-End Approaches and Feature Extraction Based Approaches for Sign Language Recognition.”
  9. Sunmok Kim, Yangho Ji, and Ki-Baek Lee, “An Effective Sign Language Learning with Object Detection Based ROI Segmentation,” 2016 13th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), Sofitel Xian on Renmin Square, Xi'an, China, August 19-22, 2016.

Cite this paper

A Pilot Study On Sign Language Detection. (2022, February 21). Edubirdie. Retrieved July 19, 2024, from https://edubirdie.com/examples/a-pilot-study-on-sign-language-detection/
“A Pilot Study On Sign Language Detection.” Edubirdie, 21 Feb. 2022, edubirdie.com/examples/a-pilot-study-on-sign-language-detection/
A Pilot Study On Sign Language Detection. [online]. Available at: <https://edubirdie.com/examples/a-pilot-study-on-sign-language-detection/> [Accessed 19 Jul. 2024].
A Pilot Study On Sign Language Detection [Internet]. Edubirdie. 2022 Feb 21 [cited 2024 Jul 19]. Available from: https://edubirdie.com/examples/a-pilot-study-on-sign-language-detection/
