Translating Arabic Sign Language into text and into other languages is an important problem that many researchers have worked on. Many applications on the market help deaf and hard-of-hearing people interact with the world. Several works have been reported that use Artificial Intelligence (AI) techniques to translate sign language without a human interpreter. Many researchers are active in this area; we review some of these works below.
In our proposed system, we use semantic web technologies to enhance the translation of one sign language into another. The first work to use an ontology to enhance sign language translation was ATLAS (13), a project for automatically translating Italian text into Italian Sign Language (LIS). The translation system communicates with the user through a virtual signer: it takes a text written in Italian and translates it into a formal intermediate representation of a sign language sentence called ATLAS Written Italian Sign Language (AWLIS). AWLIS sentences are then translated into the character's gestures using an Animation Language (AL), which describes how the basic movements are produced and linked. Reference (14) presents a system for semantically translating Arabic text into Arabic SignWriting in the domain of the jurisprudence of prayer, using an ontology as the semantic web technology. The system translates Arabic text by applying Arabic Sign Language (ArSL) grammatical rules and by semantically looking up words in the domain ontology. Its output is a sequence of SignWriting symbols.
Researchers have developed sign language recognition systems (SLRS) for many languages. Paper (15) presents an automatic visual SLRS that translates isolated Arabic word signs into text. Geometric features of the hands form the feature vector, and a Euclidean distance classifier is applied in the classification stage. A dataset of 30 isolated words used in the daily school life of hearing-impaired children was developed. The proposed system achieves a recognition rate of 97%.
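A Euclidean distance classifier assigns a feature vector to the class whose reference vector is nearest. The sketch below assumes one template per class, computed as the mean of that class's training vectors; paper (15) may instead compare against every training sample. The function names and toy vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_templates(samples):
    """Average the training vectors of each class; samples is a list of (label, vector)."""
    sums, counts = {}, {}
    for label, vector in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(vector, templates):
    """Return the label whose template is nearest in Euclidean distance."""
    return min(templates, key=lambda label: euclidean(vector, templates[label]))

# Toy 2-D geometric features (hypothetical values).
templates = build_templates([("book", [1, 1]), ("book", [3, 3]), ("pen", [10, 10])])
print(classify([2.5, 2.0], templates))  # book
```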
Menna ElBadawy et al. (16) developed a recognition system for Arabic sign language. A 3D Convolutional Neural Network (CNN) was used to recognize 25 gestures from the Arabic sign language dictionary, with depth maps as input. The system achieved 98% accuracy on observed data and 85% average accuracy on new data.
Aly et al. (17) propose a system for recognizing the Arabic sign language alphabet using depth and intensity images acquired from a SOFTKINECT™ sensor. The method requires no gloves or visual markers. Local features are learned with a method called PCANet, and the extracted features are recognized with a linear support vector machine classifier. The results show that combining depth and intensity information improves performance, giving an average accuracy of 99.5%.
B. Hisham and A. Hamouda (18) propose a model that recognizes both static gestures, such as numbers and letters, and dynamic gestures. They used the Leap Motion controller, an affordable and compact device that accurately detects and tracks the motion and position of the hands and fingers. The model applies several machine learning algorithms: Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Artificial Neural Network (ANN) and Dynamic Time Warping (DTW). It was applied to 38 static gestures (28 letters, the numbers 1 to 10, and 16 static words) and 20 dynamic gestures.
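Dynamic Time Warping, one of the algorithms listed above, compares two gesture sequences that may be performed at different speeds by finding the cheapest monotonic alignment between them. A minimal sketch follows. It assumes each frame is a single number; real Leap Motion frames would be feature vectors, handled by passing a different `dist` function. `classify_dynamic` and its reference sequences are hypothetical.

```python
def dtw_distance(s, t, dist=lambda a, b: abs(a - b)):
    """Classic O(len(s) * len(t)) dynamic time warping distance."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of s[:i] with t[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(s[i - 1], t[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # s advances, t element repeats
                                    cost[i][j - 1],      # t advances, s element repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def classify_dynamic(sequence, references):
    """Nearest-neighbour classification of a gesture sequence under DTW."""
    return min(references, key=lambda label: dtw_distance(sequence, references[label]))

references = {"hello": [0, 1, 2, 1, 0], "yes": [3, 3, 3]}  # hypothetical 1-D trajectories
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))             # 0.0
print(classify_dynamic([0, 1, 1, 2, 1, 0], references))  # hello
```

Because a repeated frame can be matched to the same reference frame, the slower performance `[0, 1, 1, 2, 1, 0]` still aligns with zero cost.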
A variety of methods have been proposed in recent years to recognize hand gestures, and most of them rely on deep learning. G. Anantha Rao et al. (19) discuss the results of applying deep convolutional neural networks to sign language recognition, reporting 92.88% recognition accuracy on a self-constructed dataset using the OpenCV and Keras libraries.
The Proposed System
Previous work on translating Arabic signs into English signs is scarce. Most of this research only translates words into signs and does not consider the semantics of the translated sentence or the rules for translating Arabic text into Arabic sign language.
An automatic sign translation system uses a camera to capture an image containing a sign, recognizes the sign, and translates the recognition result into a sign with the same meaning in another language. The proposed system can:
- Translate a sign gesture from one language to another
- Translate a word into a sign language gesture, and vice versa
The architecture of the proposed hand gesture translation system is illustrated in Figure 2. The system has two main phases: the training phase and the translation phase. The first phase has two sub-phases: creating the Arabic, French and English ontology (AraFraEngOnto), and training with deep learning algorithms. The second phase consists of sign image preprocessing, classification and semantic analysis. The output is a translated sign or the analyzed text.
The system uses an ontology to store Arabic dictionary words with their corresponding signs, together with the corresponding English and French words and their signs.
The basic deep learning tool used in this work is the Convolutional Neural Network (CNN) (9), specifically the LeNet architecture. LeNet was originally designed for classifying handwritten digits (8), but it can easily be extended to other types of images. We adopted the LeNet architecture (9) for the hand sign recognition task using our gesture database. The architecture of the LeNet convolutional network is shown in Figure 3. It consists of an input layer followed by two sets of alternating convolutional and max-pooling layers. These are followed by a convolutional layer and a fully connected layer; "fully connected" means that every neuron in the previous layer is connected to every neuron in the next layer. The last layer is the output layer, whose size (number of neurons) equals the number of output classes, e.g., ten signs for each language.
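To make the layer sequence concrete, the following sketch traces feature-map sizes through a LeNet-5-style stack. The 32x32 input, 5x5 kernels, 2x2 pooling, channel counts and 84-unit fully connected layer are assumptions taken from the classic LeNet-5 configuration, not values reported for our model; only the ten output classes come from the text.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution ('valid' when padding=0)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, pool, stride=None):
    """Spatial size after max pooling; the stride defaults to the window size."""
    stride = stride or pool
    return (size - pool) // stride + 1

# Assumed LeNet-5-style stack: (name, kind, kernel or window size, output channels).
LAYERS = [
    ("C1", "conv", 5, 6),
    ("S2", "pool", 2, 6),
    ("C3", "conv", 5, 16),
    ("S4", "pool", 2, 16),
    ("C5", "conv", 5, 120),
]

def trace_shapes(input_size=32, num_classes=10):
    """Return the output shape of every layer for a square grayscale input."""
    size = input_size
    shapes = []
    for name, kind, k, channels in LAYERS:
        size = conv_output_size(size, k) if kind == "conv" else pool_output_size(size, k)
        shapes.append((name, size, size, channels))
    shapes.append(("F6", 84))                # fully connected layer (assumed width)
    shapes.append(("output", num_classes))   # one neuron per sign class
    return shapes

for row in trace_shapes():
    print(row)
```

With these assumptions the third convolution reduces each map to 1x1, so the fully connected layer receives a 120-element feature vector.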
The hidden layers are convolution and pooling layers, which act as feature extractors for the input gesture images, while the fully connected layer acts as a classifier. The convolution layers extract features automatically from each input image, and the pooling layers then reduce the dimensionality of these features. Finally, the fully connected layer with a softmax activation function uses the learned high-level features to classify each input gesture image into one of the predefined classes.
Machine translation is automated translation. This system is implemented with computer software that transforms sign language from Arabic into another language (such as English or French) without any human involvement.
Ontologies are important for sharing a common understanding of domain knowledge, for describing how that knowledge is structured and interrelated, and for reusing knowledge. We could not use an existing WordNet ontology as-is; we used only some of its semantic relations. Because our ontology serves a different purpose, we propose a new one. It stores Arabic dictionary words together with their synonyms and the image file name of the sign that represents their meaning. It also stores, for each Arabic word, the corresponding English and French meanings, their synonyms, and the image file names of their signs. The ontology is decomposed into three ontologies (Arabic, English and French). It was developed manually with the help of Google Translate, and the three ontologies are mapped to one another using an Arabic, English and French index. The manually created ontology consists of 10 Arabic concepts mapped to their English and French equivalents. In addition to concept relations such as "is a" and "has a", the ontology includes "instance of" and several other relations.
- The main concepts of the ontology are: English_words, mots_france and كلمات_اللغة_العربية (Arabic for "Arabic language words")
- Properties: used to describe resources; they add further description to a class/concept or to individuals
A. Object properties, used for relationships between individuals:
- Synonym: the semantic relation between words of the vocabulary that share a meaning.
B. Data properties, used to relate individuals to data values:
- Label: links a word to its class label in the deep learning model.
C. Annotation property (label): also used to add all possible meanings of a word.
These relations are used to expand queries with semantically related concepts and so improve information retrieval.
For example, "أشكرك" ("I thank you") is the Arabic_meaning of the English_meaning "Thank you"; "أشكرك" is the Arabic_meaning of the France_meaning "merci"; "merci" is the France_meaning of the English_meaning "Thank you"; and vice versa. Each meaning has its own sign and all available equivalent meanings. So if "شكرا" ("thanks") is the query keyword, it is expanded to "أشكرك", and the system can return the English or French meaning in sign language according to the equivalent concepts in English or French.
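The expansion in this example can be sketched with a small in-memory triple list standing in for AraFraEngOnto. The property names (`synonym`, `english_meaning`, `france_meaning`) and the treatment of `synonym` as symmetric are assumptions for illustration; the actual ontology is queried with SPARQL.

```python
# Hypothetical stand-in for part of AraFraEngOnto: (subject, predicate, object) triples.
TRIPLES = [
    ("أشكرك", "english_meaning", "Thank you"),
    ("أشكرك", "france_meaning", "merci"),
    ("أشكرك", "synonym", "شكرا"),
]

def objects(subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def expand(word):
    """Query expansion: the word plus its synonyms (synonym treated as symmetric)."""
    related = {word}
    for s, p, o in TRIPLES:
        if p == "synonym" and s == word:
            related.add(o)
        elif p == "synonym" and o == word:
            related.add(s)
    return related

def meanings(word, predicate):
    """Meanings reachable from the word or any of its synonyms."""
    return sorted({m for w in expand(word) for m in objects(w, predicate)})

print(meanings("شكرا", "english_meaning"))  # ['Thank you']
print(meanings("شكرا", "france_meaning"))   # ['merci']
```

The query word "شكرا" has no meaning triples of its own; the English and French meanings are found only through its synonym "أشكرك".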
Preprocessing consists of several operations applied to the original image (an Arabic, English or French sign language image) after it is captured by the camera, in order to enhance it. Four operations are used: resizing the image; converting it from RGB color to grayscale; smoothing it by convolution with a Gaussian filter; and applying a median blur to reduce noise. We used the Python OpenCV library for image processing.
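In OpenCV these steps correspond to functions such as `cv2.resize`, `cv2.cvtColor`, `cv2.GaussianBlur` and `cv2.medianBlur`. To show what two of them compute without depending on OpenCV, the pure-Python sketch below converts an RGB image (a list of rows of `(R, G, B)` tuples, a hypothetical representation) to grayscale with the BT.601 luma weights and applies a 3x3 median filter that replicates edge pixels at the borders. It is an illustration, not the code used in this work.

```python
from statistics import median

def to_gray(rgb_image):
    """Rows of (R, G, B) tuples -> rows of luma values, Y = 0.299R + 0.587G + 0.114B."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def median_blur3(gray):
    """3x3 median filter; out-of-range neighbours are clamped to the nearest edge pixel."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(median(window))
        out.append(row)
    return out

noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]  # one "salt" noise pixel
print(median_blur3(noisy))  # [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
```

A median filter removes isolated noise pixels entirely, whereas a Gaussian blur would only spread them, which is why the two smoothing steps complement each other.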
At this step the samples are already arranged and ready for training, and each language has its own training dataset. The purpose of this step is to build the feature extraction model, in which the network learns to detect high-level features in the input images. The hidden convolution and pooling layers act as feature extractors, while the fully connected layer acts as a classifier.
In convolutional neural networks, convolution layers are followed by subsampling layers. A subsampling layer reduces the size of the convolution maps and introduces invariance to small rotations and translations in the input. Max pooling is a variant of subsampling that has shown several benefits in practice. The output of a max-pooling layer is the maximum activation over sub-windows within each feature map, which reduces the size of the feature map. After training, each language has its own trained model.
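The max-pooling operation can be sketched as follows for a single feature map stored as a list of rows. A 2x2 window with stride 2 (an assumed, common setting) halves each spatial dimension.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum activation of each size x size sub-window."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[y + dy][x + dx] for dy in range(size) for dx in range(size))
             for x in range(0, w - size + 1, stride)]
            for y in range(0, h - size + 1, stride)]

fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
print(max_pool2d(fm))  # [[6, 8], [14, 16]]
```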
In the translation phase, a sign gesture image is loaded, preprocessed and presented to the network, which classifies it into its corresponding label. This label is then semantically analyzed to find its meaning, and a SPARQL query retrieves the corresponding meaning in the other language. Finally, the user sees the sign image for that meaning. We tested our system using 30 signs, 10 for each language (Arabic, English and French).
The classification process, known as inference, uses the learned model to classify a new preprocessed sign image (i.e., an input that the model has not seen before). Inference runs each time a new sign image has to be classified. The label of the corresponding gesture is produced and passed to the next process.
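The translation phase chains the steps above. The sketch below wires them together with hypothetical stand-ins: `preprocess` and the per-language `classifiers` would be the OpenCV pipeline and the trained CNNs, `LABELS` plays the role of the `Label` data property that maps class indices to words, and `ONTOLOGY` replaces the SPARQL lookup with a dictionary. All names and file names are illustrative.

```python
# Hypothetical class-index -> word mapping per language (the "Label" data property).
LABELS = {
    "ar": ["أشكرك", "مرحبا"],
    "en": ["Thank you", "Hello"],
}

# Hypothetical ontology slice: (language, word) -> equivalents and sign image file.
ONTOLOGY = {
    ("ar", "أشكرك"): {"en": "Thank you", "fr": "merci", "sign": "ar_thank_you.png"},
    ("en", "Thank you"): {"ar": "أشكرك", "fr": "merci", "sign": "en_thank_you.png"},
    ("fr", "merci"): {"ar": "أشكرك", "en": "Thank you", "sign": "fr_merci.png"},
}

def translate_sign(image, source, target, preprocess, classifiers):
    """Return (meaning in the target language, target sign image) for one sign image."""
    x = preprocess(image)
    index = classifiers[source](x)              # class index from that language's model
    word = LABELS[source][index]                # label -> word in the source language
    meaning = ONTOLOGY[(source, word)][target]  # equivalent word in the target language
    return meaning, ONTOLOGY[(target, meaning)]["sign"]

# Usage with stub components: the "classifier" always predicts class 0.
print(translate_sign("frame.jpg", "ar", "fr", lambda img: img, {"ar": lambda x: 0}))
```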
In the classification step, when a new (unseen) image is input to the CNN, the fully connected layers are used, in which each neuron is connected to all learned feature maps from the previous layer of the network. These layers use the softmax activation function to determine the corresponding class. The input of the softmax classifier is the feature vector produced by the learned layers, and the output is the probability that the input gesture image belongs to each of the predefined classes.
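The softmax step can be sketched as follows: raw scores from the last layer are turned into probabilities that sum to 1, and the most probable class is reported. The scores and class names in the usage line are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                      # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, labels):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

label, p = predict([0.1, 3.0, 0.2], ["hello", "thank_you", "yes"])  # hypothetical scores
print(label)  # thank_you
```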
Semantic analysis is the process of mapping a word, label or text to a formal representation of its meaning. Words that share the same fundamental meaning have a single sign. This process takes the result of the previous step, searches for the word in the ontology to obtain its synonyms and its meaning in the other language, and then retrieves the sign URL. If the word has no corresponding sign, it is replaced by one of its synonyms that has a sign gesture in the ontology.
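The synonym fallback described here can be sketched as below. `signs` (word to sign image file) and `synonyms` (word to an ordered list of synonyms) are hypothetical dictionaries standing in for the ontology; the first synonym that has a sign is chosen.

```python
def find_sign(word, signs, synonyms):
    """Return (word_used, sign_file), falling back to a synonym; None if nothing matches."""
    if word in signs:
        return word, signs[word]
    for alternative in synonyms.get(word, []):
        if alternative in signs:
            return alternative, signs[alternative]
    return None

signs = {"أشكرك": "ar_thank_you.png"}   # hypothetical sign file names
synonyms = {"شكرا": ["أشكرك"]}
print(find_sign("شكرا", signs, synonyms))  # ('أشكرك', 'ar_thank_you.png')
```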
Given the user query, a semantic query analyzer accesses the ontology to find the related concepts (semantically related words).
SPARQL is used to access the RDF triples in the ontology, with a SELECT statement as the query. A SPARQL query has a standard syntax and uses variables that stand for the subject, predicate and object of RDF triples.
The SPARQL queries used to search and retrieve data from the ontology are listed below. Our ontology covers English, Arabic and French resources, and it can be extended to additional languages and resource types.
- SPARQL query to translate Arabic signs to English signs
- SPARQL query to translate English signs to Arabic signs
- SPARQL query to translate Arabic signs to French signs
- SPARQL query to translate an Arabic meaning into its corresponding Arabic sign
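The cross-language queries in this list share one shape: find the source-language individual carrying the predicted label, follow its meaning property to the target language, and read the sign image. The sketch below builds such a SELECT query as a string. The namespace, class names (`ArabicWord`, `EnglishWord`, `FrenchWord`) and properties (`label`, `englishMeaning`, `signImage`, and so on) are hypothetical, since the paper does not give the ontology's IRIs; the resulting text could be passed to a SPARQL engine such as rdflib's `Graph.query`.

```python
# Hypothetical namespace and names; the real AraFraEngOnto IRIs are not given in the paper.
NS = "http://example.org/arafraeng#"
CLASSES = {"ar": "ArabicWord", "en": "EnglishWord", "fr": "FrenchWord"}
MEANING = {"ar": "arabicMeaning", "en": "englishMeaning", "fr": "frenchMeaning"}

def translation_query(source, target, label):
    """Build a SPARQL SELECT mapping a classifier label in `source` to the
    equivalent word and its sign image in `target`."""
    if source == target:
        raise ValueError("source and target languages must differ")
    return (
        f"PREFIX onto: <{NS}>\n"
        "SELECT ?target ?sign WHERE {\n"
        f"  ?source a onto:{CLASSES[source]} ;\n"
        f"          onto:label {int(label)} ;\n"
        f"          onto:{MEANING[target]} ?target .\n"
        "  ?target onto:signImage ?sign .\n"
        "}"
    )

print(translation_query("ar", "en", 3))  # Arabic sign with label 3 -> English sign
```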
To evaluate our system on real data, we collected a new hand gesture dataset using a mobile phone camera, capturing images for each sign. We initially split the dataset into training and evaluation sets, and the classification accuracy was high. A basic requirement for successfully training CNN models is the availability of sufficient training data labeled with the classes.
The dataset consists of 10 Arabic gestures from the unified Arabic sign language dictionary (21), together with the gestures for their English and French meanings. It was collected from 4 different signers. In our experiments, 75% of the dataset is used for training and the remaining 25% for testing.
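A 75/25 split can be sketched as below. The paper does not state whether the split was random, stratified by gesture or separated by signer; this sketch assumes a stratified random split so that every gesture appears in both sets. The `(image_path, label)` sample format and file names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_fraction=0.75, seed=0):
    """Split (image_path, label) samples so each label keeps the same train/test ratio."""
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)
        train += items[:cut]
        test += items[cut:]
    return train, test

# Hypothetical dataset: 10 gestures x 8 images each.
samples = [(f"img{i}_{c}.jpg", c) for c in range(10) for i in range(8)]
train, test = stratified_split(samples)
print(len(train), len(test))  # 60 20
```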
We implemented the deep learning model in Python using Keras (22). Keras required a choice of backend between Theano and TensorFlow (23), and we chose the latter. TensorFlow, which is widely used for classification tasks, is an open-source software library for deep learning created and maintained by Google. The AraFraEngOnto ontology was created with the Protégé ontology editor (24). We recommend Anaconda (25) because it installs the required packages with a single install command, as described on its website, and it works on Windows, Linux and macOS. It also includes Spyder, a powerful Python editor with the functionality needed to write and run code and to view the results in the IPython console. Anaconda can also create separate environments in which to install packages without modifying the system installation, which is essential if another version of Python is installed and you want to avoid incompatibilities with it.
Sign language is a structured set of hand gestures with specific meanings. In this work, we propose a novel methodology that bridges ontology and deep learning, taking advantage of both. Arabic sign language recognition is an important field of research because it helps remove the barriers between deaf people and the wider community. Most of the surveyed Arabic sign language recognition techniques employ machine learning and computer vision for sign recognition.
In this system, the ontology-based approach proved more effective than a dictionary-based one. The benefit of using an ontology is not limited to plain word-to-word translation.
This paper presents a semantic translation system that translates Arabic static sign language into its meaning as text, and into the equivalent sign in another language, using ontologies for semantic translation. To check the feasibility of the proposed system, we limited the signs to ten Arabic gestures and their meanings in English and French. We evaluated the system based on the rate of successful classification. When a set of signs was translated both automatically and manually, the two results were identical. The next step is to design a system that translates continuous signs from Arabic to English using the ontology at the sentence level.