Exploratory Analysis of Different Types of Adjectives for Sentiment Classification

Online platforms such as Amazon let users share their opinions about products. Reviews are available in the thousands, and manually extracting useful information from them is a difficult task for managers, companies, and users. Because reviews are written in natural language, extracting useful information from them requires an automatic technique known as sentiment analysis. Researchers have used polarity features such as nouns, adjectives, verbs, and adverbs to mine sentiment, using either a single feature or a combination. These studies showed that adjectives remain the most prominent feature. However, previous research has not evaluated the many types of adjectives on a comprehensive dataset. This research focuses on the identification of different types of adjectives in a given text, the identification of the best type, and the classification of the different types as positive, negative, and neutral. Understanding the nuanced impact of various adjectives in sentiment classification is crucial for developing more accurate and context-aware natural language processing models, which are essential for applications such as the analysis of customer reviews. We comprehensively evaluate different types of adjectives on machine learning algorithms. The experiments are performed on an annotated dataset of 58,258 office product reviews collected from Amazon. Of the evaluated adjective types, opinion adjectives achieve the highest precision, 0.931, on the Naïve Bayes classifier and carry the strongest sentiment signal.


Introduction
The advancement of e-commerce has rapidly changed the trading process. In the past few years, people have moved from traditional commerce to e-commerce. Companies have enabled users to share their opinions about products online to create more traffic and increase sales. The number of reviews is growing quickly because most customers share their views about products on the web. So many reviews are available for a particular product that it has become difficult for a new customer to read them all before deciding about that product, yet new customers check and rely on these opinions. Analyzing the reviews manually takes considerable time: extracting reviews from the web, finding users' opinions in the textual data, and then classifying them as positive, negative, or neutral are difficult and time-consuming tasks. An automatic technique is therefore essential in opinion mining.

Sentiment analysis aims to automate this process by summarizing the opinions in reviews as positive, negative, or neutral. It takes a given text and determines whether its sentiment is positive, negative, or neutral. Sentiment analysis has therefore become a challenging research issue. Researchers have focused on two problems: identifying whether a given text is objective or subjective, and whether it is positive, negative, or neutral. Two approaches are mainly used: supervised and unsupervised machine learning sentiment classification. Researchers have proposed different methods for sentiment classification based on polarity-bearing features in reviews, e.g. nouns, verbs, adverbs, and adjectives. However, no research has been done on a comprehensive dataset of different types of adjectives. This research focuses on the identification of different types of adjectives in a given text, the identification of the best type, and the classification of the different types as positive, negative, and neutral. The adjective types are (1) descriptive adjectives or adjectives of quality, (2) quantity or numeric adjectives, (3) predicative adjectives, (4) personal titles, (5) possessive adjectives, (6) demonstrative adjectives, (7) indefinite adjectives, (8) interrogative adjectives, (9) comparative adjectives, and (10) superlative adjectives.

In sentiment analysis, polarity feature extraction (the extraction of polarity-bearing words) is one of the most complex tasks, since it requires natural language processing techniques to automatically identify the polarity features in the opinions under analysis. The feature extraction process depends on the data or reviews. Preprocessing is the first step in feature extraction. The input text is split into sentences and then analyzed by a Part-of-Speech (POS) tagger. Adjectives are then identified in each sentence, and for each noun in the candidate feature list, the procedure checks whether an adjective occurs immediately before or after the noun, separated only by prepositions or stop words. All the adjectives identified in this way are stored in the relevant adjectives list. The types of these adjectives are then identified manually. Adjectives have different types, and these types are classified as positive, negative, or neutral.

Section 2 reviews the literature. Section 3 discusses the methodology of the proposed architecture. Section 4 presents the details of the studies and the experimental results, and section 5 concludes with a discussion of the research findings.

Literature Survey
Current sentiment analysis focuses on classifying the polarity (positive, negative, or neutral) of reviews that express sentiment, in order to decide the polarity of a sentence in a document. Some previous studies focus on sentence-level sentiment polarity, using a bag-of-words (BOW) model to address the polarity shift problem (Kolekar et al., 2016) by detecting, modifying, and removing negation from the text. This paper also deals with opinion features. Users' opinions about a product are identified from their online reviews. Semantic orientation was proposed (Mehta et al., 2016) to automatically find frequently used terms in online reviews. For this, an unsupervised natural language processing (NLP) approach is used that automatically extracts the meaning of a text from natural language (Sri & Ajitha, 2016; Subrahmanian & Reforgiato, 2008; Bethard et al., 2004; Vermeij, 2005). It determines sentiment from patterns of word co-occurrence and uses lexical resources such as Sentiwordnet and Wiktionary to find the emotional similarity between words. This approach determines a word's sentiment by using its antonyms and synonyms.

A sentence-level sentiment analysis using online product reviews was proposed (Fang & Zhan, 2015), together with an algorithm for negative sentence identification and sentiment score computation. Sentiment analysis has been done at different levels: document level, aspect level, and sentence level. A Sentiwordnet algorithm (Tomar & Sharma, 2016) was proposed to find sentence-level polarity, using a POS (Part-of-Speech) tagger to determine the polarity of the text. At the document level, an Adverb-Adjective-Noun-Verb (AANV) combination was proposed (Sarkar et al., 2012). The AANV technique is based on analyzing adverbs, adjectives, abstract nouns, and categorized verbs, and it defines a set of general axioms. Entropy, conditional entropy, and information gain have been used to evaluate the proposed system.

The Adverb-Adjective Combination (AAC) is very important in sentiment analysis, but the Adverb-Adjective-Noun (AAN) combination (Sing et al., 2012) was reported to give better results than the Adverb-Adjective Combination alone. AAC (Benamara et al., 2017) gives higher Pearson correlations than earlier algorithms that did not use it. Another technique (Subrahmanian & Reforgiato, 2008) finds sentence-level polarity using Adjective-Verb-Adverb (AVA) combinations. A combination of adverbs and adjectives has been used to extract opinions (Bethard et al., 2004) at the sentence level. A sum-based scoring method using manually scored adjectives and adverbs (Yu & Hatzivassiloglou, 2003) has been used in sentiment analysis, and a template-based method (Chklovski, 2006) that assigns sentiment values on a [-2, 10] scale has also been proposed. Experiments on subjectivity and polarity classification of topic- and genre-independent blog posts, using linguistic features, verb class information, and the online Wikipedia dictionary (Chesley et al., 2006), were used to identify the polarity of adjectives. The Hu04 framework (Vermeij, 2005), which summarizes online users' reviews by extracting opinions on product features and classifying them as positive or negative, is expanded in this paper, as shown in table 1.

After critical analysis, we have identified that different researchers have used different polarity features, such as nouns, adjectives, verbs, and adverbs, to mine sentiment using one feature or different combinations. The literature survey showed that adjectives remain the most prominent feature; however, adjectives have many types that previous research has not evaluated on a comprehensive dataset. The types are (i) descriptive adjectives or adjectives of quality, (ii) quantity or numeric adjectives, (iii) predicative adjectives, (iv) personal titles, (v) possessive adjectives, (vi) demonstrative adjectives, (vii) indefinite adjectives, (viii) interrogative adjectives, (ix) comparative adjectives, and (x) superlative adjectives. This research focuses on (i) the identification of different types of adjectives in a given text, (ii) the identification of the best type, and (iii) the classification of the different types as positive, negative, and neutral.

Methodology
Amazon receives millions of users' reviews per day, and these reviews have become a gold mine for companies, which can analyze their brands by mining the sentiments of product reviews. The objective of polarity feature extraction is to extract the polarity features, restricted to adjectives and their different types, recognize the sentiments they express, and then classify them according to their polarities, as shown in figure 1.

Table 2: A Review (Sample)
{"reviewerID": "A1F1A0QQP2XVH5", "asin": "B00000JBLH", "reviewerName": "Mark B", "helpful": [3, 3], "reviewText": "I like HP calculators and its features.I also like Scientific calculators and normal calculators for numeric calculations ...", "overall": 4.0, "summary": "better scientific results , but not durable like old HPs", "unixReviewTime": 1293840000, "reviewTime": "01 1, 2011"}
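As a rough illustration of how a record in the format of Table 2 might be read, the following Python sketch parses one JSON review and keeps the fields relevant to sentiment analysis. The helper name `load_review` and the choice of retained fields are assumptions made for illustration, not part of the described system.

```python
import json

# One review record in the format of Table 2 (review text elided as in the sample).
RAW = (
    '{"reviewerID": "A1F1A0QQP2XVH5", "asin": "B00000JBLH", '
    '"reviewerName": "Mark B", "helpful": [3, 3], '
    '"reviewText": "I like HP calculators and its features.I also like '
    'Scientific calculators and normal calculators for numeric calculations ...", '
    '"overall": 4.0, '
    '"summary": "better scientific results , but not durable like old HPs", '
    '"unixReviewTime": 1293840000, "reviewTime": "01 1, 2011"}'
)

def load_review(line):
    """Parse one JSON line and keep only the fields used downstream (hypothetical helper)."""
    record = json.loads(line)
    return {
        "asin": record["asin"],
        "text": record.get("reviewText", ""),
        "summary": record.get("summary", ""),
        "rating": float(record.get("overall", 0.0)),
    }

review = load_review(RAW)
print(review["asin"], review["rating"])
```

Only the review text and summary carry adjectives, so the remaining metadata is dropped here; a full pipeline might keep more fields.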

Pre-processing
Preprocessing is the initial step in sentiment analysis; it cleans the raw review text. Preprocessing avoids unnecessary overhead in the sentiment analysis process and improves accuracy. In reviews, customers use symbols, periods, apostrophes, hyphens, and non-alphabetic characters such as numbers and smileys. In this paper, preprocessing involves three main steps: tokenization, stemming, and stop word removal. Tokenization is the process of breaking a string into pieces, called tokens, such as words, keywords, phrases, and symbols. Tokens can be individual words or full sentences. For example, the input "Apple iPhone is very good." is broken into the tokens 'Apple', 'iPhone', 'is', 'very', 'good'. Stemming is the process of removing morphological affixes from words, reducing each word to its root form. For example, the words look, looks, and looking all stem to look. Preprocessing also involves stop word removal. All punctuation (periods, apostrophes, and hyphens) and non-alphabetic characters such as numbers and smileys are removed from the given dataset of reviews.
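The three preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, and `stem` is a naive suffix stripper standing in for a real stemmer, since the paper does not name the stemming algorithm or stop-word list it uses.

```python
import re

# Tiny illustrative stop-word list (assumption); a real system would use a fuller one.
STOP_WORDS = {"is", "a", "an", "the", "and", "of", "to", "in"}

def tokenize(text):
    """Split text into alphabetic tokens, dropping digits, punctuation, and smileys."""
    return re.findall(r"[A-Za-z]+", text)

def stem(word):
    """Naive suffix stripping (not a full stemming algorithm): looks/looking -> look."""
    for suffix in ("ing", "s"):
        long_enough = len(word) - len(suffix) >= 3
        if word.endswith(suffix) and long_enough and not word.endswith("ss"):
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Tokenize, remove stop words (case-insensitively), then stem each token."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t.lower() not in STOP_WORDS]
    return [stem(t) for t in kept]

print(tokenize("Apple iPhone is very good."))
print(preprocess("Apple iPhone is very good. :)"))
```

The regular expression handles punctuation and non-alphabetic removal as a side effect of tokenization, which keeps the sketch short; a production pipeline would likely treat these as separate steps.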

Part-of-Speech (POS) Tagging and Polarity Feature Extraction
Part-of-Speech (POS) tagging is the next step in sentiment analysis. It is the process of assigning each word to its grammatical category in order to understand its role within the sentence. The parts of speech are verb, adjective, adverb, noun, pronoun, preposition, interjection, and conjunction. Part-of-speech taggers typically take a sequence of words (i.e. a sentence) as input and provide a list of tuples as output, where each word is associated with the related tag, as shown in Table 3. The Stanford parser [10] is used for POS tagging on the given file of reviews. For example: ('iPhone', 'NN'), ('is', 'VB'), ('very', 'RB'), ('good', 'JJ'). After POS tagging is applied to the given file, the polarity features, restricted to adjectives, adverbs, and verbs, are extracted from the tagged file into three separate files, as these three carry more sentiment in a given piece of text.

Predicative adjectives: These adjectives are not placed before a noun; they follow a linking verb. A predicative adjective does not act as part of the noun. It serves as the complement of a linking verb, which joins it to the noun of the sentence.
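The extraction of adjectives, adverbs, and verbs from tagged output can be sketched as follows. The sketch assumes the tagger has already run and works only on its output; the function names `parse_tagged` and `split_by_pos` are hypothetical, and the grouping relies on the Penn Treebank convention that adjective, adverb, and verb tags start with JJ, RB, and VB respectively.

```python
def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) tuples."""
    pairs = []
    for item in text.split():
        word, _, tag = item.rpartition("/")
        if word and tag:
            pairs.append((word, tag))
    return pairs

def split_by_pos(tagged):
    """Group words by tag prefix: JJ* adjectives, RB* adverbs, VB* verbs."""
    prefixes = {"JJ": "adjectives", "RB": "adverbs", "VB": "verbs"}
    groups = {"adjectives": [], "adverbs": [], "verbs": []}
    for word, tag in tagged:
        key = prefixes.get(tag[:2])
        if key is not None:
            groups[key].append(word)
    return groups

tagged = [("iPhone", "NN"), ("is", "VB"), ("very", "RB"), ("good", "JJ")]
features = split_by_pos(tagged)
print(features)

sample = "normal/JJ calculators/NNS for/IN numeric/JJ calculations/NNS"
print(split_by_pos(parse_tagged(sample))["adjectives"])
```

Matching on the two-letter prefix also picks up comparative and superlative tags (JJR, JJS), which is convenient when the adjective types are separated later.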

Sentiment Score Calculation
Sentiwordnet is a lexical resource explicitly devised to support sentiment classification and opinion mining applications. Sentiwordnet 3.0 is an improved version of Sentiwordnet 1.0, a lexical resource publicly available for research purposes. Sentiwordnet assigns to each word of WordNet three numerical sentiment scores (positive, negative, and neutral), as shown in figure 2. It is therefore a knowledge-base lexicon that can be used for assigning scores. The sentiment scores are calculated at both the sentence level and the review level using the scores provided by Sentiwordnet.
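One possible way to aggregate lexicon scores at the sentence and review level is sketched below. The aggregation rule (sum of positive minus negative scores per sentence, averaged over the review) and the zero threshold are assumptions, since the exact formula is not given here, and the scores in `TOY_LEXICON` are hypothetical placeholders rather than actual Sentiwordnet values.

```python
# Hypothetical (positive, negative) scores; real values would come from Sentiwordnet.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "durable": (0.5, 0.0),
    "bad": (0.0, 0.625),
}

def sentence_score(words, lexicon=TOY_LEXICON):
    """Sum of (positive - negative) over the words found in the lexicon."""
    return sum(lexicon[w][0] - lexicon[w][1] for w in words if w in lexicon)

def review_score(sentences, lexicon=TOY_LEXICON):
    """Average of the sentence-level scores; 0.0 for an empty review."""
    if not sentences:
        return 0.0
    return sum(sentence_score(s, lexicon) for s in sentences) / len(sentences)

def polarity(score):
    """Map a numeric score to positive, negative, or neutral (threshold at zero)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review = [["good", "calculator"], ["not", "durable"], ["bad", "keys"]]
print([polarity(sentence_score(s)) for s in review])
print(polarity(review_score(review)))
```

Note that the second sentence is scored as positive because this sketch ignores negation, which is consistent with negation handling being listed as future work.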

Experimental Results
The dataset used to evaluate this work consists of 53,258 office product reviews. The evaluation measures are precision, recall, and F-measure, and machine learning algorithms are used for testing on the dataset. The dataset is divided into 100 equal-size subsets. Of these 100 subsets, 10 are treated as the testing set for the classification models, and the remaining 90 are used as the training set. The cross-validation process is repeated 10 times, with each group of 10 subsets used once as validation data. The classifier then averages the results from these folds to produce a single value. The values of precision, recall, and F-measure vary with the dataset used; they are defined as follows:

$$\text{Precision} = \frac{\text{No. of correctly classified instances}}{\text{Total no. of classified instances}} \tag{1}$$

$$\text{Recall} = \frac{\text{No. of correctly classified instances}}{\text{Total no. of instances}} \tag{2}$$

$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$

The Naive Bayes classifier belongs to a family of supervised learning algorithms. It is based on applying Bayes' theorem with the "naive" assumption of independence between every pair of features. For each class, it calculates the posterior probability and predicts the class with the highest probability. The classifier settings are as follows: the training set is $(x^{(i)}, y^{(i)})$ for $i = 1, \dots, n$, where each $x^{(i)}$ is a vector and each $y^{(i)}$ is in $\{1, 2, \dots, j\}$. Here, $j$ is an integer specifying the number of classes in the problem.
(Equation 4)

The results show that the Naïve Bayes classifier achieved a precision of 0.931 on opinion adjectives (figure 3), 0.731 on feelings adjectives (figure 4), 0.355 on size adjectives (figure 5), and 0.245 on colour adjectives (figure 6); figure 7 shows the descriptive adjectives analysis graph. These findings suggest that the classifier performs well at identifying opinion and feelings adjectives, that is, adjectives that express subjective opinions and emotions within the text. However, precision drops significantly for size and colour adjectives. This suggests that the classifier struggles more with adjectives that describe physical attributes, possibly due to the nuances and variability in how these attributes are expressed in text. Figure 7 may offer insight into the patterns and frequencies of descriptive adjectives used within the analyzed text data. While the Naïve Bayes classifier performs well on opinion and feelings adjectives, it may require further refinement or additional features to improve its precision on size and colour adjectives.
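To make the evaluation procedure concrete, the sketch below trains a small multinomial Naive Bayes classifier on adjective tokens and computes precision, recall, and F-measure in the spirit of Equations 1 to 3. It is a minimal illustration, not the experimental setup: the add-one (Laplace) smoothing, the per-class reading of the metrics, the function names, and the toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Return (log-priors, per-class word counts, vocabulary) from token lists."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def predict_nb(model, tokens):
    """Pick the class with the highest log-posterior (add-one smoothing)."""
    priors, word_counts, vocab = model
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = prior
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def precision_recall_f(y_true, y_pred, target):
    """Per-class precision, recall, and F-measure for the class `target`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == target)
    predicted = sum(1 for p in y_pred if p == target)
    actual = sum(1 for t in y_true if t == target)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure

# Toy data (hypothetical): adjective tokens per review and their polarity labels.
train_docs = [["good", "reliable"], ["beautiful", "good"],
              ["bad", "poor"], ["poor", "unreliable"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_nb(train_docs, train_labels)

test_docs = [["good"], ["poor"], ["bad"]]
test_labels = ["positive", "negative", "positive"]
predictions = [predict_nb(model, d) for d in test_docs]
print(predictions)
print(precision_recall_f(test_labels, predictions, "positive"))
```

In a cross-validated run, the same metric function would be applied to each held-out fold and the per-fold values averaged.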

Conclusion
This research uses a sentiment analysis approach to extract different types of adjectives from users' reviews using POS tagging. The adjectives were then classified into three polarities (positive, negative, and neutral) using Sentiwordnet, and the following types were identified in the given dataset of 53,258 office product reviews collected from Amazon: descriptive adjectives (adjectives of quality), covering opinion, feelings, size, and colour. We comprehensively evaluated different types of adjectives on the Naïve Bayes classifier. The experimental results showed that, of all the evaluated adjective types, opinion adjectives have the highest precision, 0.931, on the Naïve Bayes classifier and carry the strongest sentiment signal. This research might help companies manage their online reputation and improve their products, because understanding customers' preferences can be highly valuable for product development, marketing, and customer relationship management. The exploratory analysis of different types of adjectives for sentiment classification deepens our understanding of how sentiment is conveyed through text and opens new avenues for developing more sophisticated NLP tools with practical applications in various domains.
Future work could exploit negation features, expressions, symbols, and semantic analysis. Furthermore, identifying the context represented in natural language remains a challenge for the research community.

(2006, January). Deriving quantitative overviews of free text assessments on the web.

Figure 1: Polarity feature extraction architecture

Personal titles: These adjectives are used as titles, such as Miss, Mr, Mrs., Dr, Uncle, Prof., etc., and explain the position of the noun.
Possessive adjectives: These adjectives are used in a sentence to show possession, e.g. our, my, your, his, her, etc.
Demonstrative adjectives: These adjectives point to something; this, that, these, those, and what are used.
Indefinite adjectives: These adjectives are formed from indefinite pronouns and do not indicate anything in particular, e.g. any, few, many, more, etc.
Interrogative adjectives: These adjectives modify a noun or a noun phrase and are similar to interrogative pronouns, e.g. where, why, which, what, who, etc.

Figure 6: Color adjectives on Naïve Bayes classifier

Table 1 : Previous research on sentiment analysis

Table 3 : POS tagging Reviews
The review text contains different parts of speech, such as nouns, adjectives, verbs, and adverbs. We identified adjectives and their different types using the Stanford POS tagger. For example: I/PRP like/IN HP/NNP calculators/NNS and/CC its/PRP features/NNS I/PRP also/RB like/IN Scientific/NNP calculators/NNS and/CC normal/JJ calculators/NNS for/IN numeric/JJ calculations/NNS. The different types of adjectives are described below, and a list of some common adjectives is shown in table 4.
1. Colors: Pink, Yellow, Grey, Brown.
2. Touch: Yummy, Bitter, Juicy, Sweet, Strong, Fresh.
3. Feelings: Joy, Alone, Fear.
4. Sizes: Tall, Small.
5. Origin: Latin, Greek.
6. Shapes: High, Flat, Low, Narrow.
7. Qualities: Good, Bad, Average.
8. Time: Fast, Late, Long, Modern, Slow, Rapid, Quick, Brief.
9. Age: Young, Old.
10. Material: Stone, Carbon, Silver.
11. Opinions: beautiful, soft, reliable.
Numeric adjectives: These adjectives answer the question 'how much'. They describe the quantity or the numeric value present in the sentence.

Table 4 : Adjectives list
Comparative adjectives: These are used to compare two things in a clause, e.g. nicer, taller, smarter, etc.
Superlative adjectives: These adjectives convey the supreme value of the noun, e.g. richest, happiest, etc.