Abstract
Sentiment analysis, also called opinion mining, is the field of study that analyzes people’s opinions, sentiments, evaluations, attitudes, and emotions as expressed in written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining. The growing importance of sentiment analysis coincides with the growth of online activities such as product and movie reviews, forum discussions, blogs, Twitter and other social networks.
With the help of supervised learning and accurately labelled datasets, sentiments can be predicted with good accuracy.
There are several challenges in opinion mining, though. For instance, a word that is positive in one situation may be negative in another. Take the word “long”, for example. If a customer says that a phone’s battery life is long, that is a positive opinion. If the customer says that the phone’s start-up time is long, however, that is a negative opinion. Another challenge is that people don’t always express their opinions in the same way, and a slight change in a sentence can change its whole meaning. These differences show that an opinion system trained to gather opinions on one type of product or product feature may not perform well on another.
This project aims at a comparative analysis of the different classifiers that can be used for text-based sentiment analysis. It also uses context-based regularisation to reduce inconsistencies such as those in the examples above. I train each classifier on a large dataset and use it to predict the sentiment of a given paragraph. I then compare the accuracies and choose the classifier that gives the best result.
Comparative analysis on different classifiers for text-based sentiment Analysis
Chapter 1
Introduction
1.1 Sentiment Analysis
Sentiment analysis is the process of determining whether a piece of writing is positive, negative or neutral. It’s also known as opinion mining: deriving the opinion or attitude of a speaker. A common use case is finding out how different people feel about a particular topic.
1.2 Text-Based Sentiment Analysis
Say you see a new smartphone in an online store. Different people may have different opinions about the product. Humans are fairly intuitive when it comes to interpreting the tone of a piece of writing, but producing a statistical analysis of all the reviews of a particular product is too cumbersome for humans to do alone. To accomplish this task, we need a machine to process the data. Human language is complex. Teaching a machine to analyse the various grammatical nuances, cultural variations, slang and misspellings that occur in online mentions is a difficult process. But with the right training and historical datasets, a machine can produce good results on the data. SenText is a text sentiment analyser tool that determines the polarity of a given paragraph by classifying it as positive or negative. The project aims at a comparative analysis of the different classifiers that can be used for text-based sentiment analysis.
1.3 Motivation
Because of the large volume of user-generated data these days, analysing and classifying user opinions by hand has become a tough task. Text-based sentiment analysis helps to overcome this by automating the task without human intervention.
Throughout this project, we perform the following activities:
- Find out different approaches of sentiment analysis.
- Deduce the importance of text-based sentiment analysis.
- Elaborate the process of sentiment analysis using text.
- Explain different approaches currently available for text-based sentiment analysis.
- Compare different classifiers for sentiment analysis.
- Implement the analysis technique with different classifiers to get the best results.
Chapter 2
Literature Survey
2.1 Classifiers in Machine Learning
In machine learning, classification is divided into two types:
- Supervised Classification: All data is labeled and the algorithms learn to predict the output from the input data. Examples of such classifiers are: Naive Bayes, Support Vector Machines, Maximum Entropy, Decision Tree, Random Forest, Neural Networks, Regression.
- Unsupervised Classification: All data is unlabeled and the algorithms learn the inherent structure of the input data. Examples of such methods are: K-Means clustering, Hierarchical clustering, Hebbian learning model, Expectation-maximization algorithm.
2.2 Different Classifiers for text-based sentiment analysis
2.2.1 Naive Bayes Classifier
The Naive Bayes classifier is an uncomplicated and widely used method for supervised learning. Bayes’ theorem is named after the Reverend Thomas Bayes (1702–61), who studied how to compute a distribution for the probability parameter of a binomial distribution. Naive Bayes is one of the fastest learning algorithms and can deal with any number of features and classes. It performs well on a variety of problems, and it is robust enough that a small amount of noise does not disturb the results. Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. It is not a single algorithm for training such classifiers, but a family of algorithms based on a common principle: all naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable.
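As a rough illustration of the independence assumption, the sketch below trains a small multinomial Naive Bayes model on word counts. It is not the SenText implementation: the function names, the toy documents and the add-one (Laplace) smoothing are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, label) pairs. Returns the counts the model needs."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, tokens):
    """Return the class with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Start from the log prior, log P(class).
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokens:
            if word not in vocab:
                continue  # ignore words never seen in training
            # Each word contributes independently: log P(word | class),
            # estimated with add-one smoothing (an assumption of this sketch).
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
toy_docs = [
    ("a good movie".split(), "pos"),
    ("great acting and a good plot".split(), "pos"),
    ("a ridiculous movie".split(), "neg"),
    ("boring and bad plot".split(), "neg"),
]
model = train_naive_bayes(toy_docs)
print(predict_naive_bayes(model, "good acting".split()))  # pos
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.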
2.2.2 KNN Classifier
K-Nearest Neighbour (k-NN) is a non-parametric classifier that classifies unknown points using their nearest neighbours. A (k, l) nearest-neighbour classifier, given a feature vector x:
- Assigns the class with the most votes among the k nearest examples.
- Does not classify x if the winning class has fewer than l votes.
- Needs a way to find the nearest neighbours, typically a search over the training set.
- Needs a distance metric. For a feature vector such as (length, colour, angle), the Mahalanobis distance is one option.
- Can have excellent performance for arbitrary class-conditional pdfs.
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression. The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.
In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.
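The classification phase described above can be sketched in a few lines. The function names, the toy feature vectors and the choice of Euclidean distance are assumptions for illustration; the optional parameter l mirrors the (k, l) rule, under which a point is left unclassified when the winning class has too few votes.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(train, query, k=3, l=1):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs.
    Returns None when the winning class has fewer than l votes.
    """
    # "Training" is only storage, so all the work happens at query time.
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label if count >= l else None


# Toy two-dimensional feature vectors, invented for illustration only.
train = [
    ((1.0, 1.0), "pos"), ((1.2, 0.8), "pos"), ((0.9, 1.1), "pos"),
    ((4.0, 4.2), "neg"), ((4.1, 3.9), "neg"), ((3.8, 4.0), "neg"),
]
print(knn_classify(train, (1.1, 1.0), k=3))       # pos
print(knn_classify(train, (2.4, 2.6), k=3, l=3))  # None (only 2 of 3 votes agree)
```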
2.2.3 SVM Classifier
Support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An important property of SVMs is that their ability to learn can be independent of the dimensionality of the feature space, so they cope well with high-dimensional data sets such as text. Another advantage is that SVM training is a convex optimisation problem, so it does not get trapped in local minima of the error; this contributes to the accuracy of SVMs.
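To make the idea of a binary linear classifier concrete, here is a minimal sketch of a linear SVM trained by stochastic sub-gradient descent on the L2-regularised hinge loss. This is one simple way to fit such a model and not the solver used by standard libraries; the function names, hyperparameters and toy points are assumptions for illustration.

```python
def train_linear_svm(points, labels, epochs=200, lr=0.1, reg=0.01):
    """points: list of feature tuples; labels: +1 or -1. Returns (weights, bias)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 regulariser shrinks the weights on every step.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # The point is misclassified or inside the margin,
                # so take a hinge-loss step towards classifying it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def svm_predict(model, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy linearly separable data, invented for illustration only.
points = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.5, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
svm_model = train_linear_svm(points, labels)
print(svm_predict(svm_model, (3.0, 2.5)), svm_predict(svm_model, (-3.0, -2.5)))
```

Because the objective is convex, the order of the training points and the starting weights affect only how quickly the weights settle, not which kind of solution is reached.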
2.2.4 Decision Tree classifier
Decision trees are one of the most widely used machine learning algorithms. They are popular because they can be adapted to almost any type of data. A decision tree is a supervised machine learning algorithm that divides its training data into smaller and smaller parts in order to identify patterns that can be used for classification. To classify an unlabelled instance, its data is passed through the tree. At each decision node, a specific feature from the input data is compared with a constant that was identified in the training phase. The decision is based on whether the feature is greater than or less than the constant, creating a two-way split in the tree. The data passes through these decision nodes until it reaches a leaf node, which represents its assigned class.
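The traversal described above can be sketched as follows. The tree here is built by hand rather than learned, and the node layout (a dictionary per decision node, a plain string per leaf) and the word-count features are assumptions for illustration.

```python
def classify(node, features):
    """Walk from the root to a leaf, following one branch per decision node."""
    # A leaf is stored as a plain string holding the class label.
    while not isinstance(node, str):
        value = features.get(node["feature"], 0)
        # Two-way split: compare the feature value with the stored constant.
        node = node["right"] if value > node["threshold"] else node["left"]
    return node


# Hand-built tree (hypothetical): features are word counts in a review.
tree = {
    "feature": "bad", "threshold": 0,
    "left": {"feature": "good", "threshold": 0, "left": "neg", "right": "pos"},
    "right": "neg",
}
print(classify(tree, {"good": 2}))            # pos
print(classify(tree, {"good": 1, "bad": 1}))  # neg
```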
2.2.5 Neural Networks
Neural networks are used in a wide variety of domains for the purpose of classification. For text, the main adaptation is to use word features as the inputs of the network. We note that neural network classifiers are related to SVM classifiers. The basic unit of a neural network is the neuron. Each neuron receives a set of inputs, denoted by the vector Xi, which in this case correspond to the term frequencies in the ith document. Each neuron is also associated with a set of weights A, which are used to compute a function of its inputs.
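A single neuron of this kind can be sketched as a weighted sum of term frequencies passed through an activation function. The sigmoid activation, the bias term, the three-word vocabulary and the weight values are all assumptions for illustration; in practice the weights are learned from data.

```python
import math


def neuron_output(weights, bias, term_frequencies):
    """Weighted sum of the inputs, squashed into (0, 1) by a sigmoid."""
    z = sum(a * x for a, x in zip(weights, term_frequencies)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical vocabulary and hand-picked weights A (normally learned).
vocabulary = ["good", "great", "bad"]
weights = [1.5, 2.0, -2.5]

# Xi holds the term frequencies of one document over the vocabulary.
positive_doc = [2, 0, 0]  # "good" appears twice
negative_doc = [0, 0, 1]  # "bad" appears once
print(neuron_output(weights, 0.0, positive_doc) > 0.5)  # True
print(neuron_output(weights, 0.0, negative_doc) > 0.5)  # False
```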
2.2.6 Random Forest
Random Forest consists of many classification trees, known as tree classifiers, which are used to classify text documents (for example, news articles) into categories. Each tree gives a class for the input text document, and the class with the most votes across the trees is chosen. This classifier’s error rate depends on the correlation between any two trees in the forest and on the strength of each individual tree. In order to minimize the error rate, the trees should be strong and independent of each other.
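The voting step of a forest can be sketched as below. Real random forests learn their trees from bootstrap samples and random feature subsets; here three hand-written rules stand in for trained trees, and all names are assumptions for illustration.

```python
from collections import Counter


def forest_predict(trees, document):
    """Each tree votes for a class; the most common vote wins."""
    votes = Counter(tree(document) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical "trees", written as plain functions over a token list.
trees = [
    lambda doc: "pos" if "good" in doc else "neg",
    lambda doc: "neg" if "bad" in doc else "pos",
    lambda doc: "pos" if "great" in doc else "neg",
]
print(forest_predict(trees, "a good but bad movie".split()))  # neg (2 votes to 1)
```

Using an odd number of trees avoids tied votes in a two-class problem.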
Chapter 3
Software Requirements Specification
SenText runs machine learning algorithms, so a powerful processor is recommended for smooth execution. Apart from this, the following specifications are recommended:
3.1 Hardware Requirements
RAM: Minimum 1 GB
Hard Disk: Minimum 10 GB
Processor: Pentium 4 and above
3.2 Software Requirements
Operating system: Ubuntu (recommended), Windows, macOS
Python version: 2.7, 3.4, or 3.5
NLTK module
NLTK data
Other pip modules for python
Chapter 4
Requirement Analysis
Sentiment analysis, or opinion mining, is the process of determining the emotional tone behind a series of words, used to gain an understanding of the attitudes, opinions and emotions expressed. It is an ongoing field of research in text mining. An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people can, and do, actively use information technologies to seek out and understand the opinions of others. The average human reader will have difficulty identifying relevant sites and accurately summarizing the information and opinions contained in them. Moreover, human analysis of text is known to be subject to considerable biases. Additionally, a large collection of text quickly becomes too much for a human to analyse and deduce the sentiment from. These are some scenarios where text-based sentiment analysis comes in handy. We feed in the raw data to analyse and get the results in seconds.
There are numerous applications of text sentiment analysis:
- Determining the polarity of user reviews.
- Customer email response satisfaction.
- Analysis of questions such as: why are customers not buying a specific product?
- Politics and socialisation.
- Analyzing trends, identifying ideological bias.
- Targeting advertising/messages, gauging reactions.
- Evaluation of public/voters’ opinions.
- And a lot more.
Chapter 5
Implementation
5.1 Dataset used
SenText uses the movie reviews dataset provided by NLTK. The dataset contains 1000 positive review files in one directory and 1000 negative review files in another. Training is done on 750 positive and 750 negative review files, a total of 1500 training files. Testing is done on the remaining 250 positive and 250 negative review files, a total of 500 test instances.
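The split can be sketched as follows. The report does not say which files go into which part, so taking the first 750 of each class for training is an assumption, as are the function name and the stand-in file names.

```python
def split_reviews(pos_ids, neg_ids, n_train_per_class=750):
    """Return (train, test) lists of (file_id, label) pairs."""
    train = ([(f, "pos") for f in pos_ids[:n_train_per_class]] +
             [(f, "neg") for f in neg_ids[:n_train_per_class]])
    test = ([(f, "pos") for f in pos_ids[n_train_per_class:]] +
            [(f, "neg") for f in neg_ids[n_train_per_class:]])
    return train, test


# Stand-in file names. With NLTK installed and the corpus downloaded, the real
# lists would come from nltk.corpus.movie_reviews.fileids("pos") and ("neg").
pos_ids = ["pos/review_%d.txt" % i for i in range(1000)]
neg_ids = ["neg/review_%d.txt" % i for i in range(1000)]
train, test = split_reviews(pos_ids, neg_ids)
print(len(train), len(test))  # 1500 500
```

Keeping the two classes balanced in both parts means that a classifier which always predicts one class scores 50%, which gives a clear baseline.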
5.2 Platform used
The whole development is done in Python.
5.3 Result
5.3.1 Naive Bayes Classifier
Test ID | Test Condition     | System Behavior | Expected Result | Accuracy
T01     | Good Boy           | Positive        | Positive        | 80.8%
T02     | A ridiculous movie | Negative        | Negative        | 80.8%
T03     | Spread hatred      | Positive        | Negative        | 80.8%
T04     | Not a good movie   | Negative        | Negative        | 80.8%
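Accuracy is the fraction of predictions that match the expected labels. The sketch below applies that definition to the four sample rows T01 to T04 only; the function name is an assumption. If the reported 80.8% was measured on the 500 test files, which the report does not state explicitly, it would correspond to 404 correct predictions.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# System behaviour and expected result for sample rows T01 to T04.
predicted = ["Positive", "Negative", "Positive", "Negative"]
expected = ["Positive", "Negative", "Negative", "Negative"]
print(accuracy(predicted, expected))  # 0.75
```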
Chapter 6
Screenshots of Project
6.1 Positive Sentiment
6.2 Negative Sentiment
Chapter 7
Conclusion and Future Scope
7.1 Conclusion
I analysed the emotional polarity of text as a two-class classification problem. I used tokenisation to represent a text and then used a Naive Bayes classifier to produce the classification. The main operations on the data set were cleaning, word segmentation, stop-word removal, feature selection and classification. The experimental results show that the Naive Bayes classifier achieved a good accuracy of about 80%.
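The cleaning, segmentation and stop-word steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the SenText code: the tiny stop-word list, the regular expression and the function names are invented here. The dictionary of word-presence features matches the featureset form that NLTK classifiers accept.

```python
import re

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "and", "this", "it"}


def preprocess(text):
    """Lower-case the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def to_features(tokens):
    """Bag-of-words presence features as a dictionary."""
    return {token: True for token in tokens}


tokens = preprocess("This is NOT a good movie!")
print(tokens)               # ['not', 'good', 'movie']
print(to_features(tokens))  # {'not': True, 'good': True, 'movie': True}
```

Note that "not" is deliberately kept out of the stop-word list, since removing it would turn "not a good movie" into an apparently positive review.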
7.2 Future Scope
In subsequent development, I will apply the other classifiers available in the literature for text-based sentiment analysis and carry out a comparative performance analysis of them.
References
- Approaches, Tools and Applications for Sentiment Analysis Implementation; Alessia D’Andrea, Fernando Ferri, Patrizia Grifoni, Tiziana Guzzo
- Sentiment Analysis Algorithms and Applications: A Survey; Walaa Medhat, Ahmed Hassan, Hoda Korashy
- A Study and Comparison of Sentiment Analysis Methods for Reputation Evaluation; Anaïs Collomb, Crina Costea, Damien Joyeux, Omar Hasan, Lionel Brunie
- A Survey on Sentiment Analysis Methods and Approach; A. M. Abirami, V. Gayathri
- The Role of Text Pre-processing in Sentiment Analysis; Emma Haddi, Xiaohui Liu, Yong Shi
School of Computer Engineering, KIIT, BBSR