Modern technology has driven the rise of several social media platforms that aim to make daily life more convenient. Twitter is one of the major platforms, used by hundreds of millions of people worldwide. However, the growth of social media has also attracted malicious individuals who target ordinary users. Their activities range from spamming to hacking and phishing. Spammers send emails and messages that have no relevance to the recipient. Others send phishing texts: when a user clicks the link in the message, it leads to a malicious site that siphons off private data. Spammers may also distribute viruses and worms that infect the user’s device, making it easier for attackers to access and manipulate the personal data stored on it. Social media users have suffered heavily from these schemes, including attacks on individual bank accounts that have cost victims millions of dollars. This research report uses sentiment analysis to identify suspicious activity on Twitter. The experiment demonstrates one machine learning methodology that can readily be applied to filter Twitter messages in order to control and prevent Twitter spam.
This research applies sentiment analysis based on linguistic and textual assessment. Sentiment analysis uses natural language processing to examine word choice, word combinations, and word order in order to classify message data as positive, negative, or neutral in polarity (Santos et al., 2014). Sentiment analysis makes Twitter spam easier to study because the data it gathers gives detailed information about a topic, access to earlier data on the same topic, and a view of public opinion and feeling about it. The research also draws on Twitter trends and spam by monitoring and analyzing the social reaction to locally trending topics. The main aim of the study is to analyze Twitter spam, opinions, and the emotions expressed by citizens, using RStudio and machine learning algorithms.
The sentiment analysis process consists of four main steps: data acquisition, data classification, data pre-processing, and data analysis.
The Data Acquisition Process
This step uses R packages within RStudio to extract tweets from Twitter; the extracted tweets are subsequently used to create charts and are classified by emotion, spam, and polarity. It begins with installing the R packages “twitteR”, “plyr”, and “ROAuth” (Bindu, 2018). The searchTwitter() function from the twitteR library was then used to obtain tweets on the selected topic. Hashtags and keywords, enclosed in single or double quotes, were the main parameters accepted by searchTwitter() (Coberly et al., 2014). The tweets returned by the Twitter API for these search keywords formed the primary sample for the research. For example, a search for spam-related messages takes the form temp = searchTwitter("#marijuana"). The search API runs each query against its indices of recent and popular tweets. These search features are more powerful than those available in mobile or web clients, which makes the API an efficient way to search for Twitter spam.
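The effect of a hashtag query can be illustrated without API access. The following is a minimal Python sketch, not the R code used in the study: it filters an in-memory list of made-up tweets the way a hashtag search would. The sample tweets and the helper name search_hashtag are assumptions introduced for illustration only.

```python
import re

# Hypothetical sample tweets; the study pulled real ones through the Twitter API.
SAMPLE_TWEETS = [
    "Vote yes on #marijuana legalization",
    "We sell some product, click here #Marijuana",
    "Lovely weather in Kingston today",
    "marijuana laws are changing",  # keyword present, but not as a hashtag
]


def search_hashtag(tweets, tag):
    """Return the tweets that contain #tag as a whole hashtag, ignoring case."""
    pattern = re.compile(r"#" + re.escape(tag) + r"\b", re.IGNORECASE)
    return [t for t in tweets if pattern.search(t)]


matches = search_hashtag(SAMPLE_TWEETS, "marijuana")
for tweet in matches:
    print(tweet)
```

Only the first two sample tweets match, because the last one uses the keyword without the hashtag. A real query against the API would also be subject to rate limits and authentication, which this sketch leaves out.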
In R, RStudio provides two primary functions that were used to analyze the Twitter messages and classify them by polarity and as spammer or not spammer. These functions also test polarity (positive, negative, or neutral) and the emotions expressed by the public. The emotions considered are joy, fear, anger, and surprise. The analysis was conducted both on original tweets and on retweets. After running the two functions, it became easy to identify spam messages and to separate the negative, neutral, and positive retweets. On the other hand, the process exposed limitations: R could not interpret the dialects used on Twitter, and its dictionary did not cover many of the words used. Classifying tweets as spam or by emotion was a further challenge, since tweets vary widely and are shaped by topical issues, which left many spam and other tweets with an unknown classification (Davies & Ghahramani, 2011). These are the main components of sentiment analysis as applied to spam message detection. The limitations make it hard to describe the data collected from Twitter messages effectively.
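The emotion step, including the "unknown" outcome, can be sketched as a dictionary lookup. The Python example below is an illustration under stated assumptions, not the study's R functions: the four emotion categories come from the text, while the word lists and the tie-breaking rule (return "unknown" when no emotion or more than one emotion wins) are hypothetical.

```python
import re

# Tiny illustrative emotion lexicon (an assumption, not the study's dictionary).
EMOTION_LEXICON = {
    "joy": {"happy", "great", "love"},
    "fear": {"afraid", "scared", "worried"},
    "anger": {"angry", "furious", "hate"},
    "surprise": {"wow", "unexpected", "shocked"},
}


def classify_emotion(tweet):
    """Return the single best-matching emotion, or 'unknown' on no match or a tie."""
    words = re.findall(r"[a-z']+", tweet.lower())
    counts = {
        emotion: sum(word in vocab for word in words)
        for emotion, vocab in EMOTION_LEXICON.items()
    }
    best = max(counts.values())
    winners = [emotion for emotion, count in counts.items() if count == best]
    if best == 0 or len(winners) > 1:
        return "unknown"
    return winners[0]


for text in [
    "I love this great news",
    "I hate this but I love it too",
    "Dis ting nuh right",
]:
    print(text, "->", classify_emotion(text))
```

The second tweet mixes joy and anger and the third uses words the lexicon does not contain, so both come back as "unknown". This mirrors the two limitations described above: mixed reactions and dialect words missing from the dictionary.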
RStudio was essential for the data analysis. The software was used offline to analyze the data during pre-processing, post-processing, and classification. To process the data gathered from the Twitter API, the R dataset was saved in ARFF (Attribute-Relation File Format). The file was then converted to a CSV file, in which commas separate the values. An ARFF file is an ASCII text file that describes a list of instances, here the spam messages, sharing a set of attributes. RStudio was significant in this process because it supported the sentiment analysis, the handling of the comma-separated values file, and the creation of graphs and tables.
Results and Findings
This section outlines the classification of tweets as spam or not spam. The tweets were downloaded and classified along four topical issues. The results show how sentiment analysis is applied to tweets and how it can be used to detect and filter spam. The method is also used to gauge public perception of the topical issues selected from the spam tweets. The dominant approach starts from a lexicon of negative and positive words and phrases. Polarity is then used to check whether a word signals spam or evokes something negative or positive. For example, “beautiful” carries positive polarity, while “horrid” carries negative polarity. A phrase such as ‘we sell some product’ indicates a spam message.
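To make the two file formats concrete, the following Python sketch writes the same labelled tweets as ARFF text and as CSV text. It is an illustration only: the study produced its files from R, and the relation name, attribute names, and sample rows here are assumptions.

```python
import csv
import io

# Hypothetical labelled tweets; the attribute names below are illustrative.
ROWS = [
    ("We sell some product", "spam"),
    ("What a beautiful day", "not_spam"),
]


def to_arff(rows, relation="tweets"):
    """Render rows as ARFF text: a header declaring attributes, then @data lines."""
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute label {spam,not_spam}",
        "",
        "@data",
    ]
    for text, label in rows:
        # Quote the string value and escape backslashes and single quotes.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}',{label}")
    return "\n".join(lines) + "\n"


def to_csv(rows):
    """Render the same rows as comma-separated values with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "label"])
    writer.writerows(rows)
    return buffer.getvalue()


arff_text = to_arff(ROWS)
csv_text = to_csv(ROWS)
print(arff_text)
print(csv_text)
```

The ARFF header carries the attribute names and types, so the data lines need no header row of their own; the CSV version keeps the same values but stores the names in its first row and carries no type information.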
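The lexicon approach can be sketched in a few lines. The Python example below is a minimal illustration, not the study's R implementation: the lexicon entries are only the three examples given in the text, and the function name score_tweet and the scoring rule (positive count minus negative count) are assumptions.

```python
import re

# Illustrative lexicons built from the examples in the text; a real run
# would load a much larger word list.
POSITIVE = {"beautiful"}
NEGATIVE = {"horrid"}
SPAM_PHRASES = ["we sell some product"]


def score_tweet(tweet):
    """Return (polarity, is_spam) for one tweet using simple lexicon lookups."""
    lowered = tweet.lower()
    words = re.findall(r"[a-z']+", lowered)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        polarity = "positive"
    elif score < 0:
        polarity = "negative"
    else:
        polarity = "neutral"
    is_spam = any(phrase in lowered for phrase in SPAM_PHRASES)
    return polarity, is_spam


for text in [
    "What a beautiful morning",
    "That was a horrid match",
    "We sell some product, follow us",
]:
    print(text, "->", score_tweet(text))
```

Polarity and spam status are reported separately here, so a tweet can be neutral in tone and still be flagged as spam, as the third example is.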
Literature Review on Sentiment Analysis
Other research shows that sentiment analysis in R is valuable for company marketing. Companies can use it to run campaigns and to understand customer perceptions, which helps improve service delivery. Companies can also examine the sentiment in citizens’ tweets and use it to inform stock forecasting.
Technical Demonstration Chapter
With the Twitter API and sentiment analysis, Twitter spam is much simpler to collect. API access can be set up in about five minutes with a few simple pieces of code. It then becomes easy to capture spam posts through the Twitter API from RStudio (Souza & Vieira, 2012). The process involves obtaining Twitter API access, setting up a Twitter account, and using that access to track spam messages on Twitter.
How the R Algorithm Operates
Using R, it is possible to identify and classify the opinions that various demographics express on Twitter. The analysis covered the topical issues identified in the methodology. Tweets and retweets from different people were classified using a machine learning algorithm. For example, Barack Obama’s tweet during his visit to Jamaica is represented by the fourth of the four bars plotted in RStudio. According to the R analysis, the tweet received about 2583 positive retweets and 658 negative retweets.
Performance Evaluation Chapter
The results above show that most of the tweets posted by the public received mixed reactions and emotions. RStudio found it hard to classify combined responses that ranged from anger to joy. The image below shows that RStudio had difficulty classifying these emotional reactions. The majority of the tweets came from people whose feedback mixed several emotions.
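The two retweet counts can be put into proportion with a short calculation. The Python snippet below uses only the figures reported above; it assumes that the share is taken over retweets classified as positive or negative, since no neutral count is given.

```python
# Retweet counts reported in the text for the example tweet.
positive_retweets = 2583
negative_retweets = 658

# Share of each class among retweets classified as positive or negative.
total = positive_retweets + negative_retweets
positive_share = positive_retweets / total
negative_share = negative_retweets / total

print(f"classified retweets: {total}")
print(f"positive share: {positive_share:.1%}")
print(f"negative share: {negative_share:.1%}")
```

Of the 3241 classified retweets, roughly four in five were positive.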
The image shows that RStudio struggled with the majority of tweets about the legalization of marijuana and about discrimination against the drug. These messages were not classified as either spam or not spam; instead, they were classified as unknown reactions. Several factors could have caused this, and sarcasm is the main one, since it makes the emotion in a tweet hard for R to determine. The underlying tweets were either negative or positive toward the topical issues selected during the process.
Overall, sentiment analysis in R is one of the methods currently used to detect and check Twitter spam. The explanation above shows that the results were largely correct. However, the results also revealed a limitation: RStudio found it hard to understand the dialect used in some tweets and the sarcasm in some of the spam messages. In conclusion, R is valuable for analyzing Twitter spam as long as the tweets are written in a dialect it can interpret.
- Santos, I., Miñambres-Marcos, I., Laorden, C., Galán-García, P., Santamaría-Ibirika, A. and Bringas, P.G., 2014. Twitter content-based spam filtering. In International Joint Conference SOCO’13-CISIS’13-ICEUTE’13 (pp. 449-458). Springer, Cham.
- Coberly, J.S., Fink, C.R., Elbert, Y., Yoon, I.K., Velasco, J.M., Tomayao, A.D., Roque Jr, V., Tayag, E., Macasocol, D.R. and Lewis, S.H., 2014. Tweeting fever: Can twitter be used to monitor the incidence of dengue-like illness in the Philippines? Johns Hopkins APL Tech Dig, 32(4), pp.714-25.
- Martinez-Romo, J., and Araujo, L., 2013. Detecting malicious tweets in trending topics using a statistical analysis of language. Expert Systems with Applications, 40(8), pp.2992-3000.
- Davies, A., and Ghahramani, Z., 2011. Language-independent Bayesian sentiment mining of Twitter. In The 5th SNA-KDD Workshop’11 (SNA-KDD’11).
- Souza, M., and Vieira, R., 2012, April. Sentiment analysis on twitter data for Portuguese language. In International Conference on Computational Processing of the Portuguese Language (pp. 241-247). Springer, Berlin, Heidelberg.
- Bindu, P.V., Mishra, R. and Thilagam, P.S., 2018. Discovering spammer communities on Twitter. Journal of Intelligent Information Systems, 51(3), pp.503-527.
- Cui, A., Zhang, M., Liu, Y., Ma, S., and Zhang, K., 2012, October. Discover breaking events with popular hashtags in twitter. In Proceedings of the 21st ACM international conference on Information and knowledge management (pp. 1794-1798). ACM.