• Aucun résultat trouvé

Semantic Intensity (Valence-based) Lexicons

Chapter 4 33

4.3 Sentiment Lexicons

4.3.2 Semantic Intensity (Valence-based) Lexicons

The ability to calculate the sentiment intensity would help researchers and analysts to do more rigorous and in-depth study and rather not just settle down with binary polarity of a positive or negative sentiment of a text. Valence based lexicons allow us to find the sentiment intensity in text. It is very crucial and helpful for analysts and research to understand how the sentiment intensity for a product has changed over the period. Due to these reasons having a lexicon with strength valence would be helpful as it allows analysts and researchers to rhetorically analyze the change in user interest over a time.

The extension of the WordNet database is SentiWordNet, which provides the valence strength of the text. Each word in the sentence is attached with a numerical score, which ranges between 0 to 1. These scores are calculated using highly complex semi-supervised algorithms. Although it is not the gold standard resource like WordNet, LIWC, etc., it is helpful in determining the strength of sentiment for most domains. Natural Language Processing Toolkit (NLTK) in Python provides the interface for SentiWordNet for the use of analysts and researchers. But often, SentiWordNet fails to determine the sentiment intensity for social media text correctly.

4.3.3. VADER (Vader Aware Dictionary for sEntiment Reasoning)

The major short coming of the two approaches discussed was that they are unable to handle social media texts. VADER promises to leverage the benefits of rule-based modeling for calculating the sentiments of text by adding the support of social media style texts, yet able to efficiently handle textual data from other domains too. This is achieved by adding additional lexical features commonly used in lexical resources like SentiWordNet, WordNet, LIWC, etc.

to express sentiment in social media text (emoticons, slang, acronyms). (Hutto & Gilbert 2013) [9] points out that the VADER

lexicon performs very well in the social media domain. The correlation coefficient determines that VADER (r = 0.881) performs as well as singular human raters (r = 0.888) at rivaling ground truth (aggregated group mean from 20 human raters for sentiment intensity of each tweet). Even though the corpus used in VADER has more than 9000 lexical features which is higher than other gold standard lexical resources it is still fast enough to be used online with streaming data and does not have any performance lag.

VADER sentiment lexicon administers both polarity and strength of sentiments shown in social media contents. It provides a ‘normalized, weighted composite score’ also called as

compound score which is the sum of valence scores of each word in the lexicon, and then it is normalized to a range of (-1, 1) where -1 is most negative and +1 is most positive.

In this thesis, after the comparative study of all the types of lexical resources, VADER has been used to calculate the sentiment polarity and intensity of every user tweet. VADER provides the compound scores; the user tweets are then classified into positive, negative, and neutral categories based on the threshold value. In this analysis, the value of 0.1 is used. The rules for labelling each user tweet into sentiment category based on the threshold are shown below: -.

1. Positive Sentiment: compound score >= 0.1 2. Negative Sentiment: compound score <= -0.1 3. Neutral Sentiment: -0.1 < compound score ≤ 0.1

The next chapter discusses the results and discussion of the data processed. The comparative study is conducted for major political parties and its leaders based on the volume of tweets, change in sentiment, etc. from 1st February 2019 to 23rd May 2019.

Chapter 5

Results and Discussion

This chapter aims to analyze the overall twitter data collected over the campaigning period of four months and present the results of the sentiment analysis conducted for each party and its leader.

5.1. Results

The election campaigns picked up from the month of March 2019 to May 2019, but to understand the transition and analyze campaign growth, the data was extracted from user tweets from 1st February 2019 to 23rd May 2019. The dataset contains approximately 200,000 tweets; this is after filtering the tweets based on the keywords listed in the data collection section. After conducting data preprocessing by cleaning the raw tweets and mapping the tweets to the party and its candidates Tables 5.1 and Table 5.2 shows the distribution of tweets by party and candidate.

Table 5.1: Number of Tweets by Party Party Number of Tweets

BJP 1,54,667

Congress 33,260

AAP 8,497

BSP 185

Table 5.2: Number of Tweets by Candidate

The distribution of user tweets shows that during the election campaign, BJP and Congress were the two major parties for whom the public was showing more interest, followed by AAP and BSP.

5.2. Analyzing Popularity of Party and its Candidates

Analyzing the popularity of parties and their candidate in the twitter sentiment analysis is used to understand how the user sentiment changes over the period. Parties and their leaders need to understand this shift in sentiment to adjust and change their campaigning strategies over the period during the elections. Moreover, it is also important to relate this by the change in the volume of user tweets during the elections for each party and its candidate.

Candidate/Party Number of Tweets

Narendra Modi (BJP) 1,35,822

Rahul Gandhi (Congress) 27,868

Arvind Kejriwal (AAP) 20,222

Mayawati (BSP) 161

5.2.1. Volume Analysis

Figure 5.1, Figure 5.2, and Figure 5.3 presented is the frequency of user tweets changing over months, each week, and each day for each party considered for comparison.

Figure 5.1: Change in frequency of tweets by month for each party

Figure 5.3: Change in frequency of tweets by day for each party

It can be clearly interpreted from Figure 5.1, Figure 5.2 and Figure 5.3 that the frequency of user tweets is more for BJP than its principal rival Congress; moreover, the frequency of user tweet has decreased for Congress over a period of 4 months. There is a slight increase in the number of user tweets for AAP, but it does not stand anywhere close to BJP.

A relative trend can also be seen in the frequency of user tweets for the candidate, Narendra Modi has far better reach over its rival candidates. Rahul Gandhi and Arvind Kejriwal user tweets very less compare to Narendra Modi. BSP tweets very lower to compare all leaders.

This can be easily interpreted from Figure 5.4, Figure 5.5, and Figure 5.6.

Figure 5.4: Change in frequency of tweets by monthly for each candidate

Figure 5.5: Change in frequency of tweets by week for each candidate

Figure 5.6: Change in frequency of tweets by day for each candidate

5.2.2. Analyzing change in sentiment

This sub-section focuses on analyzing the change in sentiment of parties and its candidate during election campaigns. As mentioned earlier, election campaigns picked up from the month of February 2019. Figure 5.7 displays the average sentiment (its compound sentiment value or sentiment strength provided by VADER) value for each month during the campaign period.

Since BJP was already elected at the center in the 2014 Lok Sabha elections, it was expected to have higher overall sentiment for February month. Although the plot shows that BSP has a higher sentiment score than all other parties, the reason being that BSP has a much lower number of user tweets than other parties, therefore, taking the overall sentiment score more than other parties. All the parties have seen the overall sentiment on the positive side during the initial election campaigning period, but for April and May when the actual voting started

in phases countrywide except BJP, all other parties has seen the shift in sentiment on the negative side, especially smaller parties like AAP and BSP. Figure 5.8 and Figure 5.9 plots displays average each party sentiment by each week and each day of campaign period.

Figure 5.7: Change in Party sentiment for each month

Figure 5.9: Change in Party sentiment for each day

A similar trend is observed for change in sentiment for each candidate of the parties over the campaigning period. Figure 5.10 shows that the overall sentiment of user tweets for the candidate Narendra Modi has doubled since elections campaigns started. The candidate appointed by Congress, Rahul Gandhi, has a significant drop in the overall sentiment; this is the same for Arvind Kejriwal, AAP, for whom the overall sentiment was negative for the month of May. Mayawati, BSP, although it sees a sharp rise in positive sentiment from April to May, again, the overall number of user tweets for Mayawati is much lower than other candidates; therefore, those numbers are not significant concerning other candidates. Figure 5.11 and Figure 5.12 display average sentiment of candidates by each week and each days of campaign period.

Figure 5.10: Change in Candidate sentiment for each month

Figure 5.11: Change in Candidate sentiment for each week

Figure 5.12: Change in Candidate sentiment for each day

From the above all plots, it can be positively stated that BJP and its leader Narendra Modi have higher overall sentiment and popularity.

5.3. Word Cloud

Word Clouds are the simplest and efficient way to understand the overall meaning of the text.

Moreover, it is used to represent the frequency of various words or phrases in the user tweet.

With the help of word cloud, it can easily be analyzed which words or phrases are used by the public to annotate political parties and their leaders. It can also help the political agenda of each of the parties and how the public responds to them with its opinions on twitter.

In this section, a comparison of word clouds of major parties is performed; these include BJP, Congress, and AAP. Smaller parties like BSP, DMK, TMC are not involved as the number of tweets related to these parties is not at a level with BJP and Congress, the two main rivals for 2019 Indian Lok Sabha Elections.

Figure 5.13 shown below is the word cloud for Bhartiya Janata Party, BJP and its leader Narendra Modi. The Figure reveals a positive sentiment towards BJP and its leader, phrases like “mainbhichowkidar”, “best defeat”, “modi ji”, “citizen loves”, etc. are used at frequencies than others which equates to higher popularity and following among people.

“mainbhichowkidar” was one of the critical slogans among people which translates to the

agenda of BJP for the 2019 elections which

was strengthened Indian Army and Make in

India movement.

Figure 5.13: Word Cloud of BJP (Narendra Modi)

Due to the dominance of BJP over the election campaign, the word cloud in Figure 5.14, for

tweets.

Figure 5.14: Word Cloud of Congress (Rahul Gandhi)

Similarly, for Aam Aadmi Party (AAP) and its leader Arvind Kejriwal, the word cloud in figure 5.15 shows words like “ashamed” “prove anything”, “withdraw”, etc. which highlight the negative sentiment. There are words like “cm like”, “modesty”, etc. which shows some

level of positive

sentiment from the

public. Overall, the

number of user tweets

for AAP was very less

than BJP and Congress,

and due to those reasons

overall sentiment is towards neutral which can also be seen in Table 5.3 and Table 5.4

Figure 5.15: Word Cloud of AAP (Arvind Kejriwal)

5.4. Discussion

The sentiment analysis was conducted using VADER for calculating the polarity and sentiment intensity of text. Table 5.3 and Table 5.4 shows the distribution of sentiment score for the party and its candidates in percentage since the number of tweets extracted for each party and candidate is different, so it is incorrect to compare the absolute count of tweets for each of positive, negative and neutral polarity respectively.

Table 5.3: Distribution of Tweet Sentiment by Party

Party Positive Negative Neutral

BJP 50.02% 18.94% 31.03%

Congress 35.78% 27.80% 36.42%

AAP 36.98% 38.51% 24.51%

BSP 22.94% 49.54% 27.52%

Table 5.4: Distribution of Tweet Sentiment by Candidate

Party Positive Negative Neutral

Narendra Modi 50.51% 18.54% 30.96%

Rahul Gandhi 36.80% 27.39% 35.82%

Arvind Kejriwal 36.65% 34.16% 29.19%

Mayawati 38.29% 36.57% 25.14%

In the overall results shown in Table 5.3 and Table 5.4, the positive sentiment for BSP party is 22.94%, whereas its candidate Mayawati has a positive sentiment of 38.29%. There is an approximately 15% gap between positive sentiment between BSP Party and its leader Mayawati as the number of tweets distribution for BSP and Mayawati is very low compared to all other parties. It is difficult to compare Party and its leader in this case as people show more negative sentiment for other BSP candidates except for Mayawati, as a result the overall party sentiment is more negative than its leader. That is why BSP sentiment is showing 49.54% negative and only 22.94% positive. But as people think positive for Mayawati, so candidate wise, Mayawati gets a positive sentiment of 38.29%. The tweets distribution for other parties and its leaders is more than BSP and its leader. BJP, Congress, and AAP and its leaders Narendra Modi, Rahul Gandhi, and Arvind Kejriwal have consistent results with the tweet distribution. It can be clearly inferred from Table 5.3 and Table 5.4 that both BJP and

its leader Narendra Modi have higher positive sentiment. Also, the number of tweets extracted based on keywords is more than the other parties and their leaders, which shows Narendra Modi had more popularity and positive opinion among the people. Figure 5.16 shows higher negative sentiment for the parties AAP and BSP and their leaders Arvind Kejriwal and Mayawati, respectively than BJP and Congress and their leaders Narendra

Figure 5.16: Distribution of Sentiment by Party and Candidate

The election results in Table 5.6 show that Congress won second highest number of seats in

the Lok

Sabha

election 2019. Due to higher popularity of Narendra Modi among people and since he was the existing Prime Minister during the 2019 elections, BJP won the election with a whopping 303 seats out of contested 542 seats. Rahul Gandhi, the leader of Congress party failed to reserve the majority and in contrast, Congress was able to win merely 52 seats.

Table 5.6: Distribution of seats won in parliament in Lok Sabha election 2019

Party No of Seats Won

BJP 303

Congress 52

BSP 10

AAP 1

Chapter 6

Conclusions and Future Work

6.1. Conclusions

In this thesis, our target was to conduct twitter sentiment analysis for the 2019 Indian Lok Sabha Election for major parties and their candidates. User tweets were extracted from twitter archiver between 1st February to 23rd May 2019 based on keywords and hashtags related to the trend observed during the election period. In order to pre-process or filter the noise from the textual Twitter data, it is necessary to perform the sequence of pre-processing steps, which included cleaning of raw tweets and mapping the tweets to referenced party and the candidate. This overall helped in generating more accurate tweet which enables precise calculation of polarity of tweets.

A model was proposed to perform sentiment analysis on a linguistic dataset, highlighting the use of the VADER sentiment analyzer specifically attuned to calculate the polarity of text expressed in social media rather than using a traditional dictionary-based approach that maintains WordNet for positive, negative and neutral polarity keywords. It was seen that people had higher positive sentiment and emotions for Narendra Modi and its party BJP than Congress, AAP, and BSP.

The results of sentiment analysis follow the actual election results which resulted in BJP winning the election by a majority. With this, we can conclude that sentiment analysis and opinion mining can help us understand how people respond to the campaign and which party and candidate are leading the election race. In this new age of internet and its ease of access,

politician and their parties are taking full advantage of it by professionally targeting user groups for running their campaigns and leading the election race.

6.2. Future Work

The future scope of this work can include more keywords for the data extraction process and increase Tweet distribution for the party and its candidates and more parties and candidates can be added in the tweet extraction. Additionally, changes in data extraction techniques can be made by using advanced big data frameworks like Apache Spark, as such sophisticated event processing engines can process vast amounts of data. Due to advances in technology and growing digitalization, the data size of social media texts is growing manifolds and using Apache Spark or Spark Streaming will help us perform the extraction, transformation, and loading in near real-time. This can be achieved by running spark jobs in a cluster environment created on AWS or Azure cloud services. This thesis mainly concentrated efforts on classifying positive, negative, or neutral sentiments correctly, but it would be interesting to distinguish between subjective and objective tweets since this would be a high initial filter. Also, the approach handles the topic of classification for tweets based on the presence or absence of a keyword that we called “main feature”, but a more elaborated method could be developed where the context or even synonyms are taken into consideration.

Not only expanding the amount of manually classified datasets but also expanding the domain of the inputs would be interesting. In this study, the focus was on Twitter, but more social networks could be added. Additionally, it would be interesting to create datasets with a

with a balancing algorithm more balanced distribution of tweets from different classes .

References

1. Indian General Election: https://en.wikipedia.org/wiki/2019_Indian_general_election 2. World bank internet:

https://data.worldbank.org/indicator/IT.NET.USER.ZS?locations=IN

3. Demographics of India: https://en.wikipedia.org/wiki/Demographics_of_India

4. Tumasjan, A.; Sprenger, T. O.; Sandner, P.; and Welpe, I. 2010. Predicting elections with twitter: What 140 characters reveal about political sentiment. In Proceedings of ICWSM 5. O‟Connor, B.; Balasubramanyan, R.; Routledge, B. R.; and Smith, N. A. 2010. From

Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM 6. Adam Bermingham and Alan F Smeaton. 2011. On using Twitter to monitor political

sentiment and predict election results. In Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011), pages 2–10.

7. A. Das and S. Bandyopadhyay, “SentiWordNet for Bangla,” Knowledge Sharing Event-4: Task, Volume 2, 2010.

8. D. Das and S. Bandyopadhyay, “Labeling emotion in Bengali blog corpus - a fine grained tagging at sentence level,” Proceedings of the 8th Workshop on Asian Language Resources, pp. 47–55, Aug. 2010.

9. C. J. Hutto and E. E. Gilbert, “VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media,” presented at the Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media, Ann Arbor, MI, 2015.

10. M. Eirinaki, S. Pisal, and J. Singh, “Feature-based opinion mining and ranking,” Journal of Computer and System Sciences, vol. 78, no. 4. pp. 1175–1184, 2012, doi:

10.1016/j.jcss.2011.10.007.

11. B. Liu, Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.

12. D. J. S. Oliveira, P. H. de Souza Bermejo, and P. A. dos Santos, “Can social media reveal the preferences of voters? A comparison between sentiment analysis and traditional opinion polls,” Journal of Information Technology & Politics, vol. 14, no. 1. pp. 34–45, 2017, doi: 10.1080/19331681.2016.1214094.

13. Kim, S., Hovy, E. Determining the sentiment of opinions. International conference on Computational Linguistics (COLING’04). 2004.

14. Parrott, W. G. "Emotions in social psychology: Volume overview." Emotions in social psychology: Essential readings. Ed. W. G. Parrott. Philadelphia: Psychology Press, 2001:

1-19.

15. Medhat, W., Hassan, A., & Korashy, H. Sentiment Analysis algorithms and applications:

A survey. AinShams Engineering Journal, 5(4), 1093-1113. 2014.

16. Luciano Barbosa and Junlan Feng. Robust Sentiment Detection on Twitter from Biased and Noisy Data. In Proceedings of the international conference on Computational Linguistics (COLING), 2010.

17. R. Jose and V. S. Chooralil, "Prediction of election result by enhanced sentiment analysis

17. R. Jose and V. S. Chooralil, "Prediction of election result by enhanced sentiment analysis

Documents relatifs