Natural Language Processing (NLP) is a branch of Artificial Intelligence concerned with the mathematical and computational modelling of various aspects of natural language and with the development of a wide range of systems. NLP researchers study how human beings understand and use language so that appropriate tools and techniques can be developed to enable computers to understand and manipulate natural language and perform the desired tasks. This paper describes the main techniques involved in NLP and the fields in which they are applied.
Natural Language Processing is a general term for a wide range of tasks and methods related to the automated understanding of human languages. In recent years, the amount of available diverse textual information has been growing rapidly, and specialized computer systems can cover ways of managing, sorting, filtering and processing this data more efficiently. As a larger goal, research in NLP aims to create systems that can also understand the sense of the text, gain relevant knowledge, arrange it into easily accessible formats, and even discover latent or previously unknown information using inference. Research in Machine Learning (ML) focuses on the development of algorithms for automatically learning patterns and making decisions based on empirical data, and it offers useful approaches to many NLP problems.
NLP, also known as Computational Linguistics (CL), is a blend of linguistics, machine learning and artificial intelligence. Four key factors enabled these developments:
- Vast increase in computing power
- Availability of very large amounts of linguistic data.
- Development of highly successful machine learning (ML) methods, and
- Richer understanding of the structure of human language and its deployment in social contexts.
Applications of NLP include a number of fields of studies, such as machine translations, natural language text processing and summarization, user interfaces, multilingual and cross language information retrieval (CLIR), speech recognition, artificial intelligence and expert systems, and so on.
The steps involved in Natural Language Processing take place in the following order:
Natural Language Understanding
At the core of any NLP task lies the important issue of natural language understanding. The process of building computer programs that understand natural language involves three major problems: the first relates to the thought process, the second to the representation and meaning of the linguistic input, and the third to world knowledge. An NLP system may begin at the word level – to determine the morphological structure and nature (such as part-of-speech and meaning) of the word – then move on to the sentence level – to determine the word order, grammar, and meaning of the entire sentence – and then to the context and the overall environment or domain. A given word or sentence may have a specific meaning or connotation in a given context or domain, and may be related to many other words and/or sentences in that context.
Liddy and Feldman suggest that, in order to understand natural languages, it is important to be able to distinguish among the following seven interdependent levels that people use to extract meaning from text or spoken language:
- Phonetic or phonological level that deals with pronunciation.
- Morphological level that deals with the smallest meaningful units of words, such as suffixes and prefixes.
- Lexical level that deals with the lexical meaning of words and parts of speech.
- Syntactic level that deals with grammar and structure of sentences.
- Semantic level that deals with the meaning of words and sentences.
- Discourse level that deals with the structure of different kinds of text, drawing on document structure, and
- Pragmatic level that deals with the knowledge that comes from the outside world, i.e., from outside the contents of the document.
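The lower levels above can be illustrated with a minimal sketch. The suffix list, lexicon, and one-rule grammar below are illustrative assumptions, not a real morphological analyzer or parser:

```python
# Toy illustration of three of the seven levels of analysis:
# morphological (suffix stripping), lexical (part-of-speech lookup),
# and syntactic (a crude word-order check).

SUFFIXES = ("ing", "ed", "s")          # morphological level
LEXICON = {"dog": "NOUN", "cat": "NOUN",
           "bark": "VERB", "chase": "VERB",
           "the": "DET"}               # lexical level

def morphemes(word):
    """Split a word into a stem and an inflectional suffix, if any."""
    for suf in SUFFIXES:
        if word.endswith(suf) and word[: -len(suf)] in LEXICON:
            return word[: -len(suf)], suf
    return word, ""

def pos_tag(sentence):
    """Tag each token with its part of speech from the toy lexicon."""
    tags = []
    for token in sentence.lower().split():
        stem, _ = morphemes(token)
        tags.append((token, LEXICON.get(stem, "UNK")))
    return tags

def is_grammatical(tags):
    """Syntactic level: accept only DET NOUN VERB order (toy grammar)."""
    return [t for _, t in tags] == ["DET", "NOUN", "VERB"]

tags = pos_tag("The dog barks")
print(tags)                 # [('the', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB')]
print(is_grammatical(tags)) # True
```

Real systems replace each of these hand-written rules with learned models, but the layering – morphology feeding the lexicon, the lexicon feeding syntax – is the same.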
Natural Language Text Processing Systems
Manipulation of texts for knowledge extraction, for automatic indexing and abstracting, or for producing text in a desired format, has been recognized as an important area of research in NLP. This is broadly classified as the area of natural language text processing that allows structuring of large bodies of textual information with a view to retrieving particular information or to deriving knowledge structures that may be used for a specific purpose. Automatic text processing systems generally take some form of text input and transform it into an output of some different form.
The central task for natural language processing is the translation of potentially ambiguous natural language queries and texts into unambiguous internal representations on which matching and retrieval can take place. Past research concentrating on natural language processing systems has been reviewed by Haas, Mani & Maybury, Smeaton, and Warner. Some NLP systems have been built to process texts using particular small sublanguages to reduce the size of the operations and the nature of the complexities. These domain-specific studies are largely known as ‘sublanguage analyses’.
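The translation of an ambiguous query into an unambiguous internal representation can be sketched as a simple normalization pipeline. The stopword list and the crude suffix stemmer below are simplifying assumptions:

```python
# Turn free text into an internal representation for matching:
# lowercase, remove stopwords, and apply crude suffix stemming
# so surface variants map to the same index terms.

STOPWORDS = {"the", "a", "an", "of", "in", "for", "on"}

def stem(word):
    """Strip a common inflectional suffix (very rough approximation)."""
    for suf in ("ing", "ed", "es", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def internal_representation(text):
    """Map text to a set of normalized index terms."""
    tokens = text.lower().replace("?", " ").replace(",", " ").split()
    return {stem(t) for t in tokens if t not in STOPWORDS}

q = internal_representation("Indexing of the latest reports?")
d = internal_representation("The report indexes latest filings")
print(q & d)  # shared terms enable matching despite surface differences
```

Matching then operates on these term sets rather than on the raw, ambiguous strings.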
Information retrieval has been a major area of application of NLP, and consequently a number of research projects dealing with the various applications of NLP in IR have taken place throughout the world, resulting in a large volume of publications. Lewis and Sparck Jones comment that the generic challenge for NLP in the field of IR is whether the necessary NLP of texts and queries is doable, and that the specific challenges are whether non-statistical and statistical data can be combined and whether data about individual documents and whole files can be combined. Feldman suggests that in order to achieve success in IR, NLP techniques should be applied in conjunction with other technologies, such as visualization, intelligent agents and speech recognition. Chandrasekar and Srinivas propose that coherent text contains significant latent information, such as syntactic structure and patterns of language use, and that this information could be used to improve the performance of information retrieval systems. They describe a system called Glean that uses syntactic information to filter irrelevant documents effectively, thereby improving the precision of information retrieval systems.
Natural Language Interfaces
A natural language interface is one that accepts query statements or commands in natural language and sends data to some system, typically a retrieval system, which then responds appropriately to the commands or query statements. A natural language interface should be able to translate natural language statements into appropriate actions for the system. It has been concluded that the goal of natural language systems cannot be fully achieved due to limitations of science, technology, business knowledge, and programming environments. The specific problems include:
- Limitations of NL understanding;
- Managing the complexities of interaction (for example, when using NL on devices with different bandwidths);
- Lack of precise user models (for example, knowing how the demographics and personal characteristics of a person should be reflected in the type of language and dialogue the system uses with that user), and
- Lack of middleware and toolkits.
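The translation from natural language statements to system actions can be sketched with keyword patterns. The patterns and the two actions below are illustrative assumptions; a real interface would need full NL understanding rather than pattern matching:

```python
import re

# Toy natural language interface: map free-text commands
# onto (action, argument) pairs with keyword patterns.

PATTERNS = [
    (re.compile(r"\b(find|search|show)\b.*\babout\s+(\w+)", re.I),
     lambda m: ("SEARCH", m.group(2).lower())),
    (re.compile(r"\b(delete|remove)\b\s+(\w+)", re.I),
     lambda m: ("DELETE", m.group(2).lower())),
]

def interpret(command):
    """Translate a natural language command into an (action, arg) pair."""
    for pattern, action in PATTERNS:
        m = pattern.search(command)
        if m:
            return action(m)
    return ("UNKNOWN", command)

print(interpret("Please show me articles about linguistics"))
# ('SEARCH', 'linguistics')
```

The fallback `("UNKNOWN", …)` branch is exactly where the limitations listed above surface: anything outside the anticipated patterns is not understood.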
A few years ago, we typed keywords into Google Search to get results. Today we have the comfort of vocally asking a technology assistant for help. One of the most pragmatic technology trends, NLP has multiple applications.
- Sentiment Analysis: Sentiment analysis with NLP helps business organisations gain insights into consumers, compare themselves with competitors, and make necessary adjustments to their business strategies.
- Chatbots: Intelligent chatbots are already offering personalised assistance to customers. Analysts predict that the use of chatbots will grow fivefold year on year.
- Customer Service: NLP has aided multiple customer service functions and serves as an excellent tool for gaining insights. Speech recognition and high-quality text-to-speech systems can even aid blind users.
- Managing the advertisement funnel: NLP is a great source for intelligent targeting and placement of advertisements in the right place at the right time and for the right audience.
- Market Intelligence: NLP extracts information from large repositories to give exhaustive insights into employment changes, the status of the market, and tender delays and closings.
There are other applications as well, such as neural machine translation, data visualisation, biometrics, robotics and more.
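The sentiment analysis application above can be sketched with a lexicon-based scorer. The tiny word lexicon is an illustrative assumption; production systems use large learned models rather than fixed word lists:

```python
# Lexicon-based sentiment: sum per-word polarity scores.
# Positive total -> positive review, negative total -> negative.

LEXICON = {"great": 1, "love": 1, "excellent": 1,
           "poor": -1, "slow": -1, "terrible": -1}

def sentiment(review):
    """Return a score: positive > 0, negative < 0, neutral == 0."""
    return sum(LEXICON.get(w.strip(".,!").lower(), 0)
               for w in review.split())

reviews = ["Great product, love it!", "Terrible support and slow delivery."]
print([sentiment(r) for r in reviews])  # [2, -2]
```

Aggregating such scores over thousands of reviews gives the consumer insight and competitive comparison described above.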
Fake News Detection
Fake news detection is a critical yet challenging problem in Natural Language Processing (NLP). The rapid rise of social networking platforms has not only yielded a vast increase in information accessibility but has also accelerated the spread of fake news. Given the massive amount of Web content, automatic fake news detection is a practical NLP problem required by all online content providers.
Natural Language Processing (NLP) approaches search for specific patterns or linguistic cues that indicate that a particular article is fake news. This differs from a fact-checking algorithm, which cross-references an article with other pieces to see whether it contains inconsistent information.
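A sketch of such surface cues follows: exclamation density, all-caps words, and sensational vocabulary. The cue list and threshold are illustrative assumptions, not a validated detector; real systems learn cues from labelled corpora:

```python
# Score a headline by counting simple linguistic cues that
# often correlate with sensationalist writing.

SENSATIONAL = {"shocking", "unbelievable", "miracle", "secret"}

def cue_score(headline):
    words = headline.split()
    caps = sum(1 for w in words if w.isupper() and len(w) > 1)
    sensational = sum(1 for w in words
                      if w.strip("!?.").lower() in SENSATIONAL)
    return headline.count("!") + caps + sensational

def looks_suspicious(headline, threshold=2):
    """Flag headlines whose cue count reaches the threshold."""
    return cue_score(headline) >= threshold

print(looks_suspicious("SHOCKING miracle cure doctors HATE!!"))      # True
print(looks_suspicious("Central bank holds interest rates steady"))  # False
```

Cue-based scoring of this kind is cheap to run over Web-scale content, which is why it complements slower fact-checking pipelines.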
NLP in Financial Markets
At the moment, NLP is mainly used to assess the sentiment of news feeds, but this is likely to evolve toward a more general approach: NLP could be used to analyze past earnings calls and annual reports to estimate future earnings growth or market capitalization.
Risk Management
When prices for a security are unavailable, its factor exposures cannot be computed directly. NLP could be used to predict those factor exposures by analyzing the latest company reports.
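One simple way to approximate this is to count factor-related vocabulary in a report and normalize the counts into exposure weights. The factor term lists and the sample report below are illustrative assumptions, not a production risk model:

```python
# Infer rough factor exposures from report text by counting
# mentions of factor-related terms and normalizing.

FACTOR_TERMS = {
    "growth": {"expansion", "growth", "launch", "innovation"},
    "value": {"dividend", "buyback", "undervalued", "earnings"},
}

def text_factor_exposures(report):
    """Return normalized per-factor term frequencies for a report."""
    tokens = [w.strip(".,").lower() for w in report.split()]
    counts = {f: sum(t in terms for t in tokens)
              for f, terms in FACTOR_TERMS.items()}
    total = sum(counts.values()) or 1
    return {f: c / total for f, c in counts.items()}

report = "Strong earnings and a new dividend support growth through innovation."
print(text_factor_exposures(report))  # {'growth': 0.5, 'value': 0.5}
```

A realistic model would regress historical factor exposures on text features rather than use fixed word lists, but the input signal is the same: language in company filings.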
NLP is also being used to assess sustainability by extracting knowledge from the historical data of various companies.
NLP in Healthcare
The rise of big data in the healthcare industry is setting the stage for natural language processing (NLP) and other artificial intelligence tools to assist with improving the delivery of care.
Some of the areas where NLP is involved are:
- Enhancing provider interactions with patients and electronic health records (EHRs)
- Improving patient health literacy
- Contributing to higher quality of care
- Identifying patients in need of improved care coordination
In the future, NLP and other machine learning tools could be the key to better clinical decision support and patient health outcomes.
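One concrete piece of EHR processing is pulling structured facts out of free-text clinical notes. The drug list, pattern, and sample note below are illustrative assumptions; clinical-grade extraction uses curated vocabularies and trained models:

```python
import re

# Extract (medication, dosage) pairs from a free-text clinical note
# by matching a drug name followed by a milligram dosage.

DRUGS = {"metformin", "lisinopril", "aspirin"}

def extract_medications(note):
    """Return (drug, dose) pairs, e.g. ('aspirin', '81 mg')."""
    found = []
    for m in re.finditer(r"\b([A-Za-z]+)\s+(\d+\s?mg)\b", note):
        if m.group(1).lower() in DRUGS:
            found.append((m.group(1).lower(), m.group(2)))
    return found

note = "Patient continues Metformin 500 mg twice daily; started aspirin 81 mg."
print(extract_medications(note))
# [('metformin', '500 mg'), ('aspirin', '81 mg')]
```

Structured output of this kind is what lets downstream clinical decision support reason over what was only narrative text in the record.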
NLP in Education Systems
NLP is very significant for developing new software systems and advanced techniques in educational settings. The main purpose is to bring improvements that help apply advanced technologies to educational systems.
- Text Evaluator: designed to evaluate any edited text that is formed as continuous prose.
- Language Muse: a web-based, instructional authoring application intended to support teachers in developing curricular materials for English-language learners (ELLs).
- Automated Test Item Generation.
NLP in educational applications provides a solution to barriers in educational systems that affect students' academic progress and learning.
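Text Evaluator itself is proprietary, but the kind of text-complexity signal such tools build on can be sketched with the classic Flesch reading-ease formula. The vowel-group syllable counter below is a rough approximation, an assumption of this sketch:

```python
import re

# Flesch reading ease: higher scores mean easier text.
# score = 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)

def count_syllables(word):
    """Approximate syllables as runs of vowels (rough heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / sentences
            - 84.6 * syllables / len(words))

easy = "The cat sat. The dog ran."
hard = "Comprehensive institutional accountability necessitates deliberation."
print(flesch_reading_ease(easy) > flesch_reading_ease(hard))  # True
```

Scoring prose this way lets teachers check whether material matches the reading level of their English-language learners.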
In conclusion, Natural Language Processing is a field of computer science and AI that focuses on the interaction between computers and humans. Existing real-life applications of NLP include Apple's Siri and Microsoft's Cortana.
NLP is advancing the future by enhancing human–machine interactions in ways such as:
- Controlling unstructured data
- Analyzing sentiment
- Smarter Search
- Intelligence Gathering
- Healthcare Recording.
No matter where it is applied, NLP will be essential in understanding the true voice of the user and the customer and facilitating more seamless interaction on any platform where language and human communication are used.