Indian Sign Language To Text Conversion

This essay sample was donated by a student to help the academic community.



Communication is the exchange of information by speech, visual signals, writing, or behaviour. Deaf and mute people communicate among themselves using sign languages, but they find it difficult to interact with the outside world. Research in this field is needed to bring deaf-mute people further into society and to increase their interaction with the general public. This paper takes a step forward in this field by developing a system that uses Convolutional Neural Networks (CNN) and Faster R-CNN to extract patterns from the feature vectors of the signs for the twenty-six letters of the alphabet, and then uses the results to recognize those signs. The paper proposes different designs for the neural networks to improve performance, and the trained models are implemented on a Raspberry Pi.


Indian Sign Language (ISL)

Sign languages are mainly used by deaf people to interact with others. They express meaning through particular manual actions. Each sign language has its own signs and meanings, which vary from place to place; examples include American, Indian and International sign languages. Very little research has been done on ISL, and there is no proper dataset dedicated to it. Fig. 1 shows the ISL sign chart for the 26 letters of the English alphabet.


Machine Learning

Machine learning is an application of Artificial Intelligence (AI). Machine learning techniques provide systems that learn from training data by themselves, without being explicitly programmed for every case. Many algorithms and techniques exist for implementing machine learning, depending on the problem at hand, and they are very useful for real-time computer vision applications.

Convolutional Neural Networks (CNN)

A Convolutional Neural Network is a class of deep neural network, modelled on the visual cortex and used mainly for image classification. Its hidden layers mainly consist of convolutional layers, ReLU activation functions, pooling layers, fully connected layers and normalization layers.
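The layer types listed above can be illustrated with a minimal pure-Python sketch that passes a toy 5x5 image through a convolution, a ReLU, a max-pooling step and a fully connected layer. It is only an illustration under our own assumptions: the function names, the hand-picked vertical-edge kernel and the dense-layer weights are hypothetical, whereas a real CNN learns its kernels and weights and runs on an optimised deep-learning library.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D convolution with stride 1 (computed as cross-correlation, as CNN libraries do)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap: Matrix) -> Matrix:
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping size x size max pooling; incomplete edge windows are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def dense(vector: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: output k = weights[k] . vector + bias[k]."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


# Toy 5x5 image with a vertical edge, and a hand-picked vertical-edge kernel (hypothetical).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 0, 1] for _ in range(3)]

features = max_pool(relu(conv2d(image, kernel)))  # 3x3 feature map pooled down to 1x1
flat = [v for row in features for v in row]       # flatten before the dense layer
scores = dense(flat, weights=[[1.0], [-1.0]], bias=[0.0, 0.5])
print(features, scores)
```

Here the kernel responds strongly where the image changes from 0 to 1, pooling keeps the strongest response, and the dense layer turns it into two class scores; stacking many such convolution, ReLU and pooling stages is what gives VGG-style networks their depth.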

Previous Work

Little research has been carried out on Indian Sign Language. One method detects key points in an image using SIFT and then matches the key points of a new image with those of standard images of each alphabet in a database, assigning the new image the label of the closest match [2]. Another computes the eigenvectors of the covariance matrix obtained from the vector representation of the images and classifies a new image by the Euclidean distance between its eigenvectors and those in the training set [3]. Some works used neural networks for training, but their datasets consisted only of single-handed images, and their feature vectors were based on the angles between fingers and the number of fingers [4]. In another approach, a real-time system uses skin segmentation to find the hand region, which is fed to a neural network model to predict the sign [5]; however, the dataset used contains only 3,000 images of single-handed number signs, and few details are given about the CNN model or the computation time.
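The classification step shared by the distance-based methods above can be sketched as a 1-nearest-neighbour search. In this minimal sketch the feature vectors are assumed to be already extracted (by SIFT or by projecting onto eigenvectors, which is not shown), the eigenvalue weighting of [3] is not reproduced, and the toy vectors and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(query: Sequence[float],
                  training: List[Tuple[Sequence[float], str]]) -> str:
    """Label of the training vector closest to `query` (1-nearest-neighbour)."""
    _, label = min(training, key=lambda item: euclidean(query, item[0]))
    return label


# Hypothetical, already-extracted 2-D feature vectors for two letters.
training = [([0.0, 0.0], "A"), ([0.2, 0.1], "A"), ([5.0, 5.0], "B")]
print(nearest_label([1.0, 0.5], training), nearest_label([4.0, 6.0], training))
```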

Our Contribution

There is no proper dataset available for ISL, and people in different regions of India use different signs for communication. The most commonly used signs are therefore collected and a dataset is prepared for them. The signs used in this paper involve both hands, rather than the single hand considered in many other papers [4] [5]. A modified version of VGGNet [1] is used to recognise and classify signs in ISL, and Faster R-CNN with the modified VGGNet as its base network is used for more accurate results. The modified VGGNet is trained and the model is implemented on a Raspberry Pi for real-time recognition of ISL signs. The following sections describe our approach, the principles and methods used, and the results.


As little data exists for Indian Sign Language (ISL), a dataset is first prepared from signs performed by two male and two female signers. Each alphabet's sign is captured under different lighting conditions and backgrounds, giving eight to nine hundred variations of each sign. Every one of the twenty-eight thousand images in the dataset is then labelled with a bounding box enclosing only the hand sign. Fig. 2 shows the sign 'F' captured in a dimly lit room during data collection.
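A bounding-box label of the kind described above can be represented and used to isolate the hand sign as in the sketch below. The record layout, field names and file path are hypothetical, since the paper does not specify its annotation format; coordinates are assumed to be pixel indices with exclusive right and bottom edges.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SignAnnotation:
    """Hypothetical label record: one box around the hand sign, in pixel coordinates."""
    image_path: str
    label: str   # the alphabet letter shown, e.g. "F"
    x_min: int   # left edge (inclusive)
    y_min: int   # top edge (inclusive)
    x_max: int   # right edge (exclusive)
    y_max: int   # bottom edge (exclusive)


def crop_sign(image: List[List[int]], ann: SignAnnotation) -> List[List[int]]:
    """Keep only the hand-sign region of an image stored as a list of pixel rows."""
    return [row[ann.x_min:ann.x_max] for row in image[ann.y_min:ann.y_max]]


# A 4x4 greyscale "image" whose pixel value is 4 * row + column.
image = [[4 * r + c for c in range(4)] for r in range(4)]
ann = SignAnnotation("signs/F_0001.png", "F", x_min=1, y_min=2, x_max=3, y_max=4)
print(crop_sign(image, ann))
```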

Modified VGGNet

As discussed in the previous section, the system should be both accurate and fast. The neural networks it uses should have small layers so that the computation in each convolutional layer is fast. This suggested modifying an existing Convolutional Neural Network (CNN), the Visual Geometry Group network (VGGNet), by reducing the size of each convolutional layer appropriately to get a good trade-off between accuracy and computation time. This in turn reduced the number of parameters, as shown in Table 1.
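The effect of shrinking the convolutional layers on the parameter count can be checked with simple arithmetic: a k x k convolution from c_in to c_out channels has k·k·c_in·c_out weights plus c_out biases. The sketch below compares the 13 convolutional layers of VGG16 [1] with a hypothetical variant in which every layer has half as many filters; the actual configuration of Table 1 is not reproduced here, and fully connected layers are left out.

```python
from typing import List


def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of one k x k convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out


def stack_params(widths: List[int], in_channels: int = 3, kernel: int = 3) -> int:
    """Total parameters of a plain stack of convolutions with the given output widths."""
    total, c_in = 0, in_channels
    for c_out in widths:
        total += conv_params(kernel, c_in, c_out)
        c_in = c_out  # each layer's output feeds the next layer
    return total


# Convolutional widths of VGG16 (pooling layers have no parameters and are omitted).
VGG16 = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
# Hypothetical "modified" widths: every layer halved (NOT the paper's actual configuration).
HALVED = [w // 2 for w in VGG16]

print(stack_params(VGG16), stack_params(HALVED))
```

Because each layer's weights scale with both c_in and c_out, halving every width cuts the convolutional parameters by roughly a factor of four, from 14,714,688 to 3,680,160 in this sketch.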

The modified VGGNet is trained on an NVIDIA GPU with a batch size of thirty-two for seventy-five epochs, using categorical cross-entropy as the loss function. After training on the whole dataset, an accuracy of 93.39% is achieved with a loss of 0.0856. The trained model is saved to disk for later use in predicting signs from images. It is implemented on a Raspberry Pi 3 A+, with a Pi camera capturing input images in real time. Images are captured at a rate of one every 1.5 seconds, since the Pi takes an average of 1.5 seconds to predict a given image.
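Categorical cross-entropy, the loss named above, compares the one-hot label of each image with the network's softmax output. A minimal sketch of the batch-averaged loss in plain Python follows; the function name and the small clipping constant eps, which keeps log(0) from occurring, are our own assumptions.

```python
import math
from typing import List


def categorical_cross_entropy(y_true: List[List[float]],
                              y_pred: List[List[float]],
                              eps: float = 1e-7) -> float:
    """Batch mean of -sum_k y_true[k] * log(y_pred[k]).

    y_true holds one-hot labels and y_pred softmax probabilities, one row per image.
    Probabilities are clipped below at eps so that log() stays finite.
    """
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)


# Two images, three classes: the first is predicted confidently and correctly,
# the second gives only 0.2 to its true class.
y_true = [[1, 0, 0], [0, 0, 1]]
y_pred = [[0.9, 0.05, 0.05], [0.4, 0.4, 0.2]]
print(round(categorical_cross_entropy(y_true, y_pred), 4))
```

A confident correct prediction contributes a loss near zero, while assigning low probability to the true letter is penalised heavily; a training framework computes the same quantity over each batch of thirty-two images.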

Faster R-CNN

Faster R-CNN consists of two different networks: a Region Proposal Network (RPN) and a second network that uses the regions proposed by the RPN to detect objects. The output of the RPN is a set of boxes, or proposals, which are examined by a classifier and a regressor to determine whether they contain objects. More precisely, the RPN predicts the probability of each anchor being background or foreground and refines the anchor. After the RPN, we obtain proposed regions of different sizes.
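The anchor mechanism can be sketched as follows: at each feature-map position a set of anchor boxes with several scales and aspect ratios is placed, and during RPN training each anchor is labelled foreground or background by its Intersection-over-Union (IoU) with a ground-truth box. The scales, ratios and the 0.7/0.3 thresholds below follow the defaults commonly used with Faster R-CNN and are assumptions here, not values from this paper; the extra rule that also marks the highest-IoU anchor as foreground is omitted, and the ground-truth box is hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def anchors_at(cx: float, cy: float, scales: List[float], ratios: List[float]) -> List[Box]:
    """One anchor per (scale, ratio) pair centred at (cx, cy).

    scale is the square root of the box area and ratio = height / width.
    """
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s / math.sqrt(r), s * math.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def label_anchor(anchor: Box, gt: Box, fg: float = 0.7, bg: float = 0.3) -> int:
    """1 = foreground, 0 = background, -1 = neither (ignored when training the RPN)."""
    overlap = iou(anchor, gt)
    if overlap > fg:
        return 1
    if overlap < bg:
        return 0
    return -1


anchors = anchors_at(100, 100, scales=[128, 256, 512], ratios=[0.5, 1.0, 2.0])
hand_box = (40.0, 40.0, 160.0, 170.0)  # hypothetical ground-truth box around a sign
print(len(anchors), [label_anchor(a, hand_box) for a in anchors])
```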

It is hard to design an efficient structure that works on features of different sizes. Region of Interest (RoI) Pooling solves this problem by reducing the feature maps to the same size. Unlike max pooling, which uses a fixed window size, RoI Pooling splits the input feature map into a fixed number of roughly equal regions and then applies max pooling to each region. The output of RoI Pooling therefore always has the same size, regardless of the size of the input.
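The RoI pooling step described above can be sketched in pure Python. The sketch assumes the feature map has already been cropped to one proposed region and uses one common way of choosing bin boundaries (start rounded down, end rounded up, so neighbouring bins may share a row or column); the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def roi_max_pool(region: Matrix, out_h: int, out_w: int) -> Matrix:
    """Max-pool a feature-map region of any size to a fixed out_h x out_w grid.

    The region is split into out_h x out_w roughly equal bins (bin starts rounded
    down, bin ends rounded up, so no bin is empty) and the maximum of each bin is kept.
    """
    h, w = len(region), len(region[0])
    pooled = []
    for i in range(out_h):
        y0, y1 = (i * h) // out_h, -(-(i + 1) * h // out_h)  # floor / ceil
        row = []
        for j in range(out_w):
            x0, x1 = (j * w) // out_w, -(-(j + 1) * w // out_w)
            row.append(max(region[y][x] for y in range(y0, y1) for x in range(x0, x1)))
        pooled.append(row)
    return pooled


region = [[5 * r + c for c in range(5)] for r in range(5)]  # a 5x5 proposed region
print(roi_max_pool(region, 2, 2))
```

Whatever the size of the input region, the output always has out_h x out_w entries, which is what allows proposals of different sizes to feed the same fully connected layers.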

The modified VGGNet is used as the base network, with its layers adjusted to reduce the number of parameters to be trained on the dataset. The dataset, labelled with boxes bounding the signs, is used to train the RPN. Training is done on Google Cloud Platform using a 16 GB NVIDIA Tesla P100 GPU.

Results and Discussions

The modified models are trained on the previously prepared dataset of 28,000 images. Images taken in good lighting conditions gave satisfactory results. Table 2 gives the evaluation of the modified VGGNet CNN trained on the ISL dataset in terms of precision, recall and F1-score. However, as conditions such as lighting and background varied, the predictions deviated considerably from the true labels. A sign for the letter 'A' given to the system was predicted correctly with a confidence of 94.14%, as shown in Fig. ??.
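The precision, recall and F1-score in Table 2 are per-class quantities combined with different averaging schemes. A minimal sketch of how such figures can be computed from true and predicted letters is given below (micro: pool the counts of all classes; macro: unweighted mean over classes; weighted: mean weighted by each class's support). The function names and toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, f1)


def prf(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def report(y_true: List[str], y_pred: List[str]) -> Dict[str, Scores]:
    """Per-class scores plus micro, macro and support-weighted averages."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted letter gains a false positive
            fn[t] += 1  # the true letter gains a false negative
    classes = sorted(set(y_true) | set(y_pred))
    per_class = {c: prf(tp[c], fp[c], fn[c]) for c in classes}
    support = Counter(y_true)
    out = dict(per_class)
    # micro: pool the counts of all classes, then compute the scores once
    out["micro avg"] = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # macro: plain mean of the per-class scores
    out["macro avg"] = tuple(sum(s[k] for s in per_class.values()) / len(classes)
                             for k in range(3))
    # weighted: per-class scores weighted by how often each letter occurs
    out["weighted avg"] = tuple(sum(per_class[c][k] * support[c] for c in classes) / len(y_true)
                                for k in range(3))
    return out


y_true = ["A", "A", "B", "B"]
y_pred = ["A", "B", "B", "B"]
for name, (p, r, f1) in report(y_true, y_pred).items():
    print(f"{name:>12}  {p:.2f}  {r:.2f}  {f1:.2f}")
```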

When a sign for the letter 'Z' is given to the system, Fig. ?? shows that the modified VGG CNN fails to detect it as 'Z' and instead predicts 'H'. It fails because the image contains a lot of background: the network treats the whole image as a single sign and tries to classify it, which is not appropriate for such an image. This is the reason for approaching the recognition problem with R-CNN.

To get better predictions, the Faster R-CNN method described in the previous section is applied. It improves on the modified VGG model, and its major advantage is that it draws bounding boxes around the sign in the input image; it can also detect multiple signs in a single image and draw a bounding box around each. Table 3 gives the various losses and the accuracy of the overall modified VGGNet R-CNN. The main disadvantage of this approach is that it takes a long time to predict the sign in an image: on average up to 8 seconds on a 7th-generation Intel Core i7 CPU with 8 GB of RAM. When the same sign for the letter 'Z' is given to this system, it is predicted correctly as 'Z', as shown in Fig. ??.
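Because the detector returns a labelled box for each sign it finds, the output for an image with several signs still has to be turned into text. One simple possibility is sketched below under our own assumptions: a hypothetical Detection record, a 0.5 confidence threshold and a left-to-right reading order, none of which are specified in the paper.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical detector output: one labelled, scored box per sign."""
    label: str     # predicted letter
    score: float   # confidence in [0, 1]
    x_min: float   # box corners in pixels
    y_min: float
    x_max: float
    y_max: float


def detections_to_text(dets: List[Detection], min_score: float = 0.5) -> str:
    """Drop low-confidence boxes and read the remaining letters from left to right."""
    kept = sorted((d for d in dets if d.score >= min_score), key=lambda d: d.x_min)
    return "".join(d.label for d in kept)


dets = [
    Detection("B", 0.90, 200, 10, 260, 90),
    Detection("A", 0.95, 20, 10, 80, 90),
    Detection("H", 0.30, 120, 10, 160, 90),  # below the threshold, ignored
]
print(detections_to_text(dets))
```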


The modified VGGNet model is lighter than the original VGGNet and can be used on small-scale computers such as the Raspberry Pi. It trades off some accuracy, but the loss is small and can be ignored in good lighting conditions. The modified VGGNet CNN cannot handle multiple signs in a single image, whereas the modified VGGNet R-CNN is trained to detect multiple signs in an image. The modified VGGNet CNN failed to cope with varying backgrounds, while the modified VGGNet R-CNN had the upper hand in this respect.

Overall accuracy is comparatively good for both models, but prediction accuracy dropped as the background changed; it can be seen that the confidence for 'Z' is 48%. Image segmentation works only up to a point: it fails when the background contains colours that match human skin, and human skin colour itself ranges widely from light to dark shades, which increases the complexity of a segmentation approach. Prediction confidence can be increased by building a dataset from more varied images, for example with more backgrounds and more lighting conditions. Mask R-CNN could be used to recognise sign language, but the computation involved is not feasible for smaller systems.
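The limitation of skin-colour segmentation noted above can be seen in a minimal sketch that thresholds pixels in the Cr and Cb chroma channels. The Cr range 133 to 173 and Cb range 77 to 127 are a commonly quoted skin-colour box, used here as an assumption rather than values from this paper, and the helper names and sample pixels are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def rgb_to_cr_cb(r: int, g: int, b: int) -> Tuple[float, float]:
    """Chroma channels of 8-bit YCrCb, with luma Y from the BT.601 weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return (r - y) * 0.713 + 128, (b - y) * 0.564 + 128


def skin_mask(image: List[List[Pixel]],
              cr_range: Tuple[float, float] = (133, 173),
              cb_range: Tuple[float, float] = (77, 127)) -> List[List[bool]]:
    """True where a pixel's (Cr, Cb) falls inside the assumed skin-colour box."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            cr, cb = rgb_to_cr_cb(r, g, b)
            mask_row.append(cr_range[0] <= cr <= cr_range[1] and cb_range[0] <= cb <= cb_range[1])
        mask.append(mask_row)
    return mask


image = [[(224, 172, 140), (0, 0, 255)],      # skin-like pixel, blue pixel
         [(128, 128, 128), (210, 180, 140)]]  # grey wall, tan (beige) wall
print(skin_mask(image))
```

The tan wall pixel (210, 180, 140) passes the same test as the skin pixel, which is the failure mode described above: background colours close to skin tone are segmented as part of the hand.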

The dataset and models proposed in this paper achieve the aim of increasing the interaction of deaf-mute people with computers. More signs can be added to the dataset, including signs that involve hand motion. MobileNets [7] could be used to reduce the computation time drastically and increase the frame rate of prediction.

                precision   recall   f1-score
micro avg         0.91       0.90      0.92
macro avg         0.91       0.89      0.91
weighted avg      0.93       0.93      0.90

Table 2. Precision, recall and F1-score of the modified VGGNet for different averages.


  1. Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
  2. Sakshi Goyal, Ishitha Sharma, S. S. "Sign language recognition system for deaf and dumb people." International Journal of Engineering Research Technology 2, 4 (April 2013).
  3. Joyeeth Singh, K. D. "Indian sign language recognition using eigen value weighted euclidean distance based classification technique." International Journal of Advanced Computer Science and Applications 4, 2 (2013).
  4. Padmavathi S., Saipreethy M. S., V. "Indian sign language character recognition using neural networks." IJCA Special Issue on Recent Trends in Pattern Recognition and Image Analysis, RTPRIA (2013).
  5. Sajanraj, T. D., and Mv Beena. "Indian sign language numeral recognition using region of interest convolutional neural network." 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT). IEEE, 2018.
  6. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
  7. Howard, Andrew G., et al. "MobileNets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
Cite this paper

Indian Sign Language To Text Conversion. (2022, February 21). Edubirdie. Retrieved July 23, 2024, from