Text Processing And Natural Language Processing

Abstract

This paper analyzes and applies some of the analytical techniques and tools that can be used on human language. Natural language processing (NLP) has recently attracted much attention for representing, analyzing, and modifying text computationally. Its applications are widespread in fields such as machine translation, spam email detection, information extraction, summarization, medicine, and question answering. The paper is organized in four parts: the stages of NLP and the components of Natural Language Generation (NLG), the history and evolution of NLP, the state of the art across its application areas, and the current trends and challenges.

Introduction

Unstructured data, especially text, images, and videos, contains rich information. However, because of the inherent complexity of processing and analyzing this data, people often refrain from spending the extra time and effort to venture beyond structured datasets and analyze these unstructured sources, which can be a potential gold mine [1].

Natural Language Processing (NLP) is about leveraging tools, techniques, and algorithms to process and understand natural language data, which is usually unstructured, such as text and speech. In this paper, we look at tried and tested strategies, techniques, and workflows that practitioners and data scientists can use to extract useful insights from text data. We also cover some useful and interesting applications of NLP. The focus throughout is on processing and understanding text data, with examples.

The Basics of NLP for Text

The term text processing refers to the theory and practice of automating the creation or manipulation of electronic text. The input to a natural language processing system is a simple stream of Unicode characters (typically encoded as UTF-8). Some processing is required to turn this character stream into a sequence of lexical items (words, phrases, and syntactic markers), which can then be used to better understand the content.

The seven phases of text processing are:

1. Sentence Tokenization

Sentence tokenization (also called sentence segmentation) is the task of segmenting a string of written language into its component sentences. The basic idea is simple: in English and some other languages, we can split sentences apart whenever we see sentence-ending punctuation such as a period, question mark, or exclamation mark. Abbreviations such as “Dr.” make this rule only an approximation.
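The punctuation rule can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production segmenter; the function name `split_sentences` and the choice to require an uppercase letter after the punctuation mark are our own assumptions.

```python
import re

# Heuristic boundary: ., ! or ? followed by whitespace and an uppercase letter.
# Assumption for illustration only; abbreviations such as "Dr. Smith" are split wrongly.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def split_sentences(text: str) -> list[str]:
    """Return the sentences of `text` using a simple punctuation rule."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]

print(split_sentences("NLP is useful. It has many applications! Does it work?"))
# ['NLP is useful.', 'It has many applications!', 'Does it work?']
```

The same function splits "Dr. Smith arrived." into two pieces, which shows why real systems add abbreviation lists or trained models on top of the punctuation rule.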

2. Word Tokenization

Word tokenization (also called word segmentation) is the task of dividing a string of written language into its component words. In English and many other languages that use some form of the Latin alphabet, the space is a good approximation of a word divider.
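As a rough sketch of the difference between splitting on spaces and a slightly more careful tokenizer, consider the following. The function name `tokenize_words` and the decision to keep apostrophes inside words are assumptions made for this example.

```python
import re

def tokenize_words(sentence: str) -> list[str]:
    """Lowercase the text and return runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

text = "Don't split on spaces alone, please."

# Splitting on whitespace leaves punctuation attached to the words.
print(text.split())          # ["Don't", 'split', 'on', 'spaces', 'alone,', 'please.']

# The regex-based tokenizer drops the trailing punctuation.
print(tokenize_words(text))  # ["don't", 'split', 'on', 'spaces', 'alone', 'please']
```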

3. Text Lemmatization & Stemming

Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of deriving the stem correctly most of the time, and it often includes the removal of derivational affixes. For example, the stem of “beautiful” and “beautifully” is “beauti”.

Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.

The goal of both stemming and lemmatization is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form.
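To make the contrast concrete, here is a toy sketch. The suffix list and the tiny lemma dictionary are hypothetical and chosen only for illustration; real stemmers (for example Porter's) and real lemmatizers use far richer rules and vocabularies.

```python
# Hypothetical suffix list, checked in order (longer suffixes first).
SUFFIXES = ("fully", "ful", "ing", "ly", "es", "ed", "s")

# Hypothetical miniature lemma dictionary standing in for a full vocabulary.
LEMMAS = {"better": "good", "ran": "run", "mice": "mouse", "studies": "study"}

def toy_stem(word: str) -> str:
    """Chop the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)

for w in ("beautiful", "beautifully", "studies", "mice"):
    print(w, "->", toy_stem(w), "/", toy_lemmatize(w))
# beautiful -> beauti / beautiful
# beautifully -> beauti / beautifully
# studies -> studi / study
# mice -> mice / mouse
```

The stemmer produces non-words such as “beauti” and “studi”, while the dictionary lookup returns real base forms but only for words it knows.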

4. Stop Words

Stop words are words which are filtered out before or after processing of text. When machine learning is applied to text, these words can produce a lot of noise. That’s why we need to remove these irrelevant words.

Stop words usually refer to the most common words in a language, such as “and”, “the”, and “a”, but there is no single universal list of stop words. The list can change depending on your application.
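A minimal sketch of stop word filtering follows. The `STOP_WORDS` set is a small hand-picked list assumed for this example; as noted above, the right list depends on the application.

```python
# Small illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["The", "cat", "and", "the", "dog", "sat", "in", "a", "box"]
print(remove_stop_words(tokens))  # ['cat', 'dog', 'sat', 'box']
```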

5. Regular Expression

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. Regular expressions (RE) use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning.

We can use regex to apply extra filtering to our text. For example, we can remove all the non-word characters.
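For instance, removing non-word characters might look like the sketch below, which uses Python's standard `re` module. The helper name `remove_non_word_chars` is our own; `\w` and `\s` are the backslash forms for "word character" and "whitespace".

```python
import re

def remove_non_word_chars(text: str) -> str:
    """Replace every character that is not a word character or whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", cleaned).strip()

print(remove_non_word_chars("Hello, world! (NLP) costs $0?"))
# Hello world NLP costs 0
```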

6. Bag of Words

Machine learning algorithms cannot work with raw text directly; we need to transform the text into vectors of numbers. This step is called feature extraction [2]. The bag-of-words model is a popular and simple feature extraction technique for text. It describes the occurrence of each word within a document. Any information about the order or structure of words is discarded, which is why it is called a bag of words.
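The idea can be sketched with the standard library alone. The function name `bag_of_words` and the whitespace tokenization are simplifying assumptions; libraries offer more complete implementations.

```python
from collections import Counter

def bag_of_words(documents):
    """Return (vocabulary, vectors) with one count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is every distinct word, in a fixed (sorted) order.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

vocab, vectors = bag_of_words(["the cat sat", "the dog sat on the cat"])
print(vocab)    # ['cat', 'dog', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```

Note that "the dog sat on the cat" and "the cat sat on the dog" would produce the same vector, since word order is discarded.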

7. TF-IDF

TF-IDF, short for term frequency-inverse document frequency, is a statistical measure used to assess the importance of a word to a document in a collection of documents, or corpus.
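Several weighting variants exist. The sketch below assumes one common choice, where the term frequency is the count of the term divided by the document length and the inverse document frequency is log(N / df), with N the number of documents and df the number of documents containing the term. The function name `tf_idf` is our own.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumes tf = count / document length and idf = log(N / df),
    which is only one of several common variants.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

scores = tf_idf(["the cat sat", "the dog barked"])
print({term: round(score, 3) for term, score in scores[0].items()})
# {'the': 0.0, 'cat': 0.231, 'sat': 0.231}
```

A word such as "the" that occurs in every document receives a score of zero, while words specific to one document are weighted higher.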

Natural Language Processing

Natural Language Processing (NLP) is a branch of Artificial Intelligence and Linguistics devoted to making computers understand statements or words written in human languages. NLP came into existence to ease the user's work and to satisfy the wish to communicate with the computer in natural language. Since many users may be inexperienced or not well versed in machine-specific languages, NLP supports those who do not have enough time to learn new languages or master them.

Benefits of NLP

The benefits of natural language processing are innumerable. Natural language processing can be applied by companies to improve the efficiency of documentation processes, improve the accuracy of documentation, and identify the most relevant information from huge databases. For example, a hospital might use natural language processing to pull a specific diagnosis from a physician’s unstructured notes and assign the treatment.

Classification of NLP

NLP can broadly be divided into two parts, Natural Language Understanding (NLU) and Natural Language Generation (NLG), which cover the tasks of understanding and generating text respectively [4]. NLU is a component of NLP; more precisely, it is the subset concerned with understanding and comprehension. NLG is a technology that turns data into plain English. In other words, software can look at your data and write a story from it, just as a human analyst does today.

The important levels of analysis in Natural Language Processing are:

1. Phonology

Phonology is the part of linguistics concerned with the systematic arrangement of sounds. It deals with the function, behavior, and organization of sounds as linguistic items. It includes the semantic use of sound to encode meaning in any spoken human language.

2. Morphology

The origin of the word conveys its meaning: the study of form, shape, or structure. Words are built from morphemes, the smallest units of meaning, and morphology, the study of the structure of words, starts from them. For example, “unhappiness” consists of the morphemes “un”, “happy”, and “ness”. Because the meaning of a morpheme stays the same across words, humans can break an unfamiliar word into morphemes to understand it.

3. Lexical

In the lexical phase, both humans and NLP systems interpret the meaning of individual words. Words that can act as more than one part of speech are assigned the most probable part-of-speech tag based on the context in which they occur.

4. Syntactic

This level focuses on analyzing the words in a sentence to uncover the grammatical structure of the sentence. Both a grammar and a parser are required in this phase.

5. Semantic

Semantic processing determines the possible meanings of a sentence by focusing on the interactions among word-level meanings in the sentence. This level of processing can incorporate the semantic disambiguation of words with multiple meanings.

6. Discourse

The discourse phase of NLP works with units of text longer than a sentence. It does not treat a multi-sentence text as just a sequence of sentences, each of which can be interpreted on its own. Rather, discourse focuses on the properties of the text as a whole, which convey meaning through connections between component sentences or parts of sentences.

7. Pragmatic

Pragmatics is concerned with the purposeful use of language in situations and uses context over and above the content of the text for understanding. Its goal is to explain how extra meaning is read into texts without actually being encoded in them. In short, it deals with the practical use of language.

Applications of NLP

Natural Language Processing can be applied in various areas such as machine translation, email spam detection, information extraction, summarization, question answering, and text categorization [5].

1. Machine Translation

Manual translation is familiar to everyone: a person translates information from one language into another. When a machine does the same thing, it is called machine translation (MT). The idea behind MT is simple: to develop computer algorithms that translate automatically, without any human intervention.

2. Speech Recognition

Over the last few decades, NLP has enabled significant progress in this area. Speech recognition is the process of converting speech into text, which is then processed further to understand its meaning. A wide variety of speech recognition programs can now decipher the human voice. The technology is used in mobile telephony, home automation, hands-free computing, virtual assistants, video games, and more.

3. Sentiment Analysis

Sentiment analysis (also known as opinion mining or emotion AI) is a special type of data mining that measures the inclination of people’s opinions. The work of this analysis is to identify subjective information embedded in the text.
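One very simple way to approximate this is a lexicon-based score that counts positive and negative words. The word lists and the function name `sentiment_score` below are hypothetical and far too small for real use; practical systems rely on much larger lexicons or trained models.

```python
import re

# Tiny illustrative lexicons (assumptions for this sketch only).
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "sad"}

def sentiment_score(text: str) -> int:
    """Return (# positive words) - (# negative words) found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives

print(sentiment_score("I love this great phone"))       # 2
print(sentiment_score("Terrible battery, bad screen."))  # -2
```

A score above zero suggests a positive opinion and a score below zero a negative one; this sketch ignores negation ("not good") and sarcasm entirely.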

4. Automatic Summarization

Information overload used to be a real drawback, but automatic summarization now helps address it. It is the process of creating a short, accurate, and fluent summary of a longer text document. The most important advantage of a summary is that it reduces reading time.
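One classic family of approaches is extractive: score each sentence and keep the best ones. The sketch below assumes a simple scoring rule, the average corpus frequency of the words in a sentence, and a naive punctuation-based sentence splitter; the function name `summarize` is our own, and real summarizers are considerably more sophisticated.

```python
import re
from collections import Counter

def summarize(text: str, max_sentences: int = 1) -> str:
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        # Average frequency of the sentence's words across the whole text.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(word_freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return " ".join(s for s in sentences if s in top)

text = "NLP systems process text. NLP systems also process speech and text. Cats sleep."
print(summarize(text))  # NLP systems process text.
```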

5. Chatbot

The first chatbots appeared in the 1960s and were quite basic: they rearranged what a person said to them. Modern chatbots are not far removed from their ancestors. NLP has become the basis for building chatbots; although such systems are not yet fully reliable or accurate, they can easily handle standard tasks.
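The "rearranging" behavior of early chatbots can be sketched with a couple of pattern rules. The rules, the pronoun table, and the function names below are hypothetical and only illustrate the idea; they do not reproduce any particular historical program.

```python
import re

# Hypothetical pronoun swaps used to turn the user's words back on them.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

# Hypothetical (pattern, response template) rules, tried in order.
RULES = [
    (re.compile(r"i feel (.+)", re.IGNORECASE), "Why do you feel {}?"),
    (re.compile(r"i am (.+)", re.IGNORECASE), "How long have you been {}?"),
]

def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

def respond(message: str) -> str:
    """Answer with the first matching rule, or a generic prompt."""
    cleaned = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please tell me more."

print(respond("I feel sad about my job."))  # Why do you feel sad about your job?
print(respond("Hello"))                     # Please tell me more.
```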

Tools used for applying NLP

NLTK (Natural Language Toolkit) is a leading platform for building Python programs that work with human language data. It provides easy-to-use interfaces to many corpora and lexical resources. It also contains a suite of text processing libraries for classification, tokenization, stemming, lemmatization, tagging, parsing, and semantic reasoning [6]. Overall, NLTK is a free, open-source, community-driven project.

Conclusion

NLP plays a very important role in new human-machine interfaces. Products built on NLP technologies are already advanced and useful, but various limitations remain. For example, the language we speak is highly ambiguous, which makes it very difficult to understand and analyze. Also, with so many languages spoken all over the world, it is very difficult to design a system that is 100% accurate.

These problems become more complicated when we consider different people speaking the same language in different styles. Intelligent systems are being experimented with right now. Ideally, NLP will influence the development of programming languages, so that computer programming uses natural human languages rather than specialized code. We can expect to see improved applications of NLP in the near future.

References

  1. https://towardsdatascience.com/a-practitioners-guide-to-natural-language-processing-part-i-processing-understanding-text-9f4abfd13e72
  2. https://www.searchtechnologies.com/blog/natural-language-processing-techniques
  3. https://towardsdatascience.com/machine-learning-text-processing-1d5a2d638958
  4. https://towardsdatascience.com/nlp-vs-nlu-vs-nlg-know-what-you-are-trying-to-achieve-nlp-engine-part-1-1487a2c8b696
  5. https://towardsdatascience.com/natural-language-processing-nlp-top-10-applications-to-know-b2c80bd428cb
  6. https://towardsdatascience.com/introduction-to-natural-language-processing-for-text-df845750fb63

Cite this paper

Text Processing And Natural Language Processing. (2022, February 24). Edubirdie. Retrieved April 29, 2024, from https://edubirdie.com/examples/text-processing-and-natural-language-processing/