
Publisher’s version:

Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (ACL-HLT 2011), pp. 70-79, 2011-01-01


Tracking sentiment in mail : How genders differ on emotional axes

Mohammad, Saif M.; Yang, Tony (Wenda)


NRC Publications Record / Notice d'Archives des publications de CNRC: https://nrc-publications.canada.ca/eng/view/object/?id=2739d3c6-dc2d-479c-8e3a-65f42e7b7e3f https://publications-cnrc.canada.ca/fra/voir/objet/?id=2739d3c6-dc2d-479c-8e3a-65f42e7b7e3f


Tracking Sentiment in Mail:

How Genders Differ on Emotional Axes

Saif M. Mohammad† and Tony (Wenda) Yang†⋆

Institute for Information Technology, National Research Council Canada†. Ottawa, Ontario, Canada, K1A 0R6.

School of Computing Science, Simon Fraser University⋆

Burnaby, British Columbia, V5A 1S6.

saif.mohammad@nrc-cnrc.gc.ca, wenday@sfu.ca

Abstract

With the widespread usage of email, we now have access to unprecedented amounts of text that we ourselves have written. In this paper, we show how sentiment analysis can be used in tandem with effective visualizations to quantify and track emotions in many types of mail. We create a large word–emotion association lexicon by crowdsourcing, and use it to compare emotions in love letters, hate mail, and suicide notes. We show that there are marked differences across genders in how they use emotion words in work-place email. For example, women use many words from the joy–sadness axis, whereas men prefer terms from the fear–trust axis. Finally, we show visualizations that can help people track emotions in their emails.

1 Introduction

Emotions are central to our well-being, yet it is hard to be objective about one’s own emotional state. Letters have long been a channel to convey emotions, explicitly and implicitly, and now with the widespread usage of email, people have access to unprecedented amounts of text that they themselves have written and received. In this paper, we show how sentiment analysis can be used in tandem with effective visualizations to track emotions in letters and emails.

Automatic analysis and tracking of emotions in emails has a number of benefits including:

1. Determining the risk of a repeat attempt by suicide note analysis (Osgood and Walker, 1959; Matykiewicz et al., 2009; Pestian et al., 2008).1

2. Understanding how genders communicate through work-place and personal email (Boneva et al., 2001).

3. Tracking emotions towards people and entities, over time. For example, has a certain managerial course brought about a measurable change in one’s inter-personal communication?

4. Determining if there is a correlation between the emotional content of letters and changes in a person’s social, economic, or physiological state. Sudden and persistent changes in the amount of emotion words in mail may be a sign of psychological disorder.

5. Enabling affect-based search. For example, efforts to improve customer satisfaction can benefit by searching the received mail for snippets expressing anger (Díaz and Ruz, 2002; Dubé and Maute, 1996).

6. Assisting in writing emails that convey only the desired emotion, and avoiding misinterpretation (Liu et al., 2003).

7. Analyzing emotion words and their role in persuasion in communications by fervent letter writers such as Francois-Marie Arouet Voltaire and Karl Marx (Voltaire, 1973; Marx, 1982).2

In this paper, we describe how we created a large word–emotion association lexicon by crowdsourcing with effective quality control measures (Section 3). In Section 4, we show comparative analyses of emotion words in love letters, hate mail, and suicide notes. This is done: (a) To determine the distribution of emotion words in these types of mail, as a first step towards more sophisticated emotion analysis (for example, in developing a depression–happiness scale for Application 1), and (b) To use these corpora as a testbed to establish that the emotion lexicon and the visualizations we propose help interpret the emotions in text. In Section 5, we analyze how men and women differ in the kinds of emotion words they use in work-place email (Application 2). Finally, in Section 6, we show how emotion analysis can be integrated with email services such as Gmail to help people track emotions in the emails they send and receive (Application 3).

1 The 2011 Informatics for Integrating Biology and the Bedside (i2b2) challenge by the National Center for Biomedical Computing is on detecting emotions in suicide notes.

2 Voltaire: http://www.whitman.edu/VSA/letters

The emotion analyzer recognizes words with positive polarity (expressing a favorable sentiment towards an entity), negative polarity (expressing an unfavorable sentiment towards an entity), and no polarity (neutral). It also associates words with joy, sadness, anger, fear, trust, disgust, surprise, and anticipation, which are argued to be the eight basic and prototypical emotions (Plutchik, 1980).

2 Related work

Over the last decade, there has been considerable work in sentiment analysis, especially in determining whether a term has a positive or negative polarity (Lehrer, 1974; Turney and Littman, 2003; Mohammad et al., 2009). There is also work in more sophisticated aspects of sentiment, for example, in detecting emotions such as anger, joy, sadness, fear, surprise, and disgust (Bellegarda, 2010; Mohammad and Turney, 2010; Alm et al., 2005). The technology is still developing and it can be unpredictable when dealing with short sentences, but it has been shown to be reliable when drawing conclusions from large amounts of text (Dodds and Danforth, 2010; Pang and Lee, 2008).

Automatically analyzing affect in emails has primarily been done for automatic gender identification (Cheng et al., 2009; Corney et al., 2002), but it has relied mostly on surface features such as exclamations and on very small emotion lexicons. The WordNet Affect Lexicon (WAL) (Strapparava and Valitutti, 2004) has a few hundred words annotated with associations to a number of affect categories including the six Ekman emotions (joy, sadness, anger, fear, disgust, and surprise).3 General Inquirer (GI) (Stone et al., 1966) has 11,788 words labeled with 182 categories of word tags, including positive and negative polarity.4 Affective Norms for English Words (ANEW) has pleasure (happy–unhappy), arousal (excited–calm), and dominance (controlled–in control) ratings for 1034 words.5 Mohammad and Turney (2010) compiled emotion annotations for about 4000 words with eight emotions (six of Ekman, trust, and anticipation).

3 Emotion Analysis

3.1 Emotion Lexicon

We created a large word–emotion association lexicon by crowdsourcing to Amazon’s Mechanical Turk.6 We follow the method outlined in Mohammad and Turney (2010; 2011). Unlike Mohammad and Turney, who used the Macquarie Thesaurus (Bernard, 1986), we use the Roget Thesaurus as the source for target terms.7 Since the 1911 US edition of Roget’s is available freely in the public domain, it allows us to distribute our emotion lexicon without the burden of restrictive licenses. We annotated only those words that occurred more than 120,000 times in the Google n-gram corpus.8

Roget’s Thesaurus groups related words into about a thousand categories, which can be thought of as coarse senses or concepts (Yarowsky, 1992). If a word is ambiguous, then it is listed in more than one category. Since a word may have different emotion associations when used in different senses, we obtained annotations at word-sense level by first asking an automatically generated word-choice question pertaining to the target:

Q1. Which word is closest in meaning to shark (target)?
• car • tree • fish • olive

This is followed by ten questions asking if the target is associated with positive sentiment, negative sentiment, anger, fear, joy, sadness, disgust, surprise, trust, and anticipation. The questions are phrased exactly as described in Mohammad and Turney (2010; 2011).

3 WAL: http://wndomains.fbk.eu/wnaffect.html

4 GI: http://www.wjh.harvard.edu/~inquirer

5 ANEW: http://csea.phhp.ufl.edu/media/anewmessage.html

6 Mechanical Turk: www.mturk.com/mturk/welcome

7 Macquarie Thesaurus: www.macquarieonline.com.au
Roget’s Thesaurus: www.gutenberg.org/ebooks/10681

8

The near-synonym is taken from the thesaurus and the distractors are randomly chosen words. This question guides the annotator to the desired sense of the target word, after which we ask whether it is associated with different emotions and polarities or not. If an annotator answers Q1 incorrectly, then we discard information obtained from the remaining questions. Thus, even though we do not have correct answers to the emotion association questions, likely incorrect annotations are filtered out. About 10% of the annotations were discarded because of an incorrect response to Q1.

Each term is annotated by 5 different people. For 74.4% of the instances, all five annotators agreed on whether a term is associated with a particular emotion or not. For 16.9% of the instances four out of five people agreed with each other. The information from multiple annotators for a particular term is combined by taking the majority vote. The lexicon has entries for about 24,200 word–sense pairs. The information from different senses of a word is combined by taking the union of all emotions associated with the different senses of the word. This resulted in a word-level emotion association lexicon for about 14,200 word types. These files are together referred to as the NRC Emotion Lexicon version 0.92.
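The aggregation steps described in this section (discarding annotations with an incorrect Q1 response, majority vote per word–sense pair, union over senses) can be illustrated with a minimal Python sketch. The record layout, the key names (`word`, `sense`, `q1_correct`, `emotions`), and the function name are hypothetical and chosen only for illustration; this is not the code used to build the NRC Emotion Lexicon.

```python
from collections import Counter, defaultdict


def build_word_lexicon(annotations):
    """Aggregate crowd annotations into a word-level emotion lexicon.

    `annotations` is an iterable of dicts with hypothetical keys:
    'word', 'sense', 'q1_correct' (bool) and 'emotions' (set of labels
    the annotator marked as associated with the target).
    """
    # Step 1: discard annotations whose word-choice question (Q1) was wrong.
    responses = defaultdict(list)
    for a in annotations:
        if a["q1_correct"]:
            responses[(a["word"], a["sense"])].append(a["emotions"])

    # Step 2: majority vote per emotion within each word-sense pair.
    # Step 3: union of the winning emotions over all senses of a word.
    lexicon = {}
    for (word, _sense), answer_sets in responses.items():
        votes = Counter(e for answers in answer_sets for e in answers)
        winners = {e for e, n in votes.items() if n > len(answer_sets) / 2}
        lexicon.setdefault(word, set()).update(winners)
    return lexicon
```

A strict majority is assumed here (more than half of the retained annotators); how ties were handled in the actual lexicon is not stated in the text.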

3.2 Text Analysis

Given a target text, the system determines which of the words exist in our emotion lexicon and calculates ratios such as the number of words associated with an emotion to the total number of emotion words in the text. This simple approach may not be reliable in determining if a particular sentence is expressing a certain emotion, but it is reliable in determining if a large piece of text has more emotional expressions compared to others in a corpus. Example applications include detecting spikes in anger words in close proximity to mentions of a target product in a Twitter stream (Díaz and Ruz, 2002; Dubé and Maute, 1996), and literary analyses of text, for example, how novels and fairy tales differ in the use of emotion words (Mohammad, 2011b).
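As a rough illustration of the ratio described above, the following Python sketch counts, for each emotion, the tokens associated with it and divides by the number of tokens found in the lexicon. The tokenizer (a simple lowercase regular expression) and the toy lexicon are assumptions made for this example only; the paper does not specify how text was tokenized.

```python
import re
from collections import Counter


def emotion_ratios(text, lexicon):
    """Return {emotion: share of emotion-word tokens associated with it}.

    `lexicon` maps a lowercase word to the set of emotions it is
    associated with. Tokenization here is a simplifying assumption.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    per_emotion = Counter()
    emotion_tokens = 0
    for tok in tokens:
        emotions = lexicon.get(tok)
        if emotions:
            emotion_tokens += 1
            per_emotion.update(emotions)
    if emotion_tokens == 0:
        return {}
    return {e: n / emotion_tokens for e, n in per_emotion.items()}


# Toy lexicon, invented for illustration only.
toy_lexicon = {"love": {"joy", "trust"}, "smile": {"joy"}, "hate": {"anger"}}
ratios = emotion_ratios("I love your smile and hate goodbyes", toy_lexicon)
```

Because a word can be associated with several emotions, the ratios need not sum to one.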

4 Love letters, hate mail, and suicide notes

In this section, we quantitatively compare the emotion words in love letters, hate mail, and suicide notes. We compiled a love letters corpus (LLC) v 0.1 by extracting 348 postings from lovingyou.com.9 We created a hate mail corpus (HMC) v 0.1 by collecting 279 pieces of hate mail sent to the Millenium Project.10 The suicide notes corpus (SNC) v 0.1 has 21 notes taken from Art Kleiner’s website.11 We will continue to add more data to these corpora as we find them, and all three corpora are freely available.

Figures 1, 2, and 3 show the percentages of positive and negative words in the love letters corpus, hate mail corpus, and the suicide notes corpus. Figures 5, 6, and 7 show the percentages of different emotion words in the three corpora. Emotions are represented by colours as per a study on word–colour associations (Mohammad, 2011a). Figure 4 is a bar graph showing the difference of emotion percentages in love letters and hate mail. Observe that as expected, love letters have many more joy and trust words, whereas hate mail has many more fear, sadness, disgust, and anger words.
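The difference bar graphs can be thought of as a per-emotion subtraction of two percentage profiles. A minimal Python sketch follows; the function name and the numbers are invented for illustration and are not values from the corpora.

```python
def percentage_difference(pct_a, pct_b):
    """Per-emotion difference (corpus A minus corpus B), in percentage points.

    Emotions missing from one profile are treated as 0 percent.
    """
    emotions = sorted(set(pct_a) | set(pct_b))
    return {e: pct_a.get(e, 0.0) - pct_b.get(e, 0.0) for e in emotions}


# Made-up percentages, for illustration only.
corpus_a = {"joy": 30.0, "trust": 25.0}
corpus_b = {"joy": 10.0, "anger": 20.0}
diff = percentage_difference(corpus_a, corpus_b)
```

Positive values mark emotions that are more prominent in the first corpus, negative values those more prominent in the second.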

The bar graph is effective at conveying the extent to which one emotion is more prominent in one text than another, but it does not convey the source of these emotions. Therefore, we calculate the relative salience of an emotion word $w$ across two target texts $T_1$ and $T_2$:

$$\mathrm{RelativeSalience}(w \mid T_1, T_2) = \frac{f_1}{N_1} - \frac{f_2}{N_2} \qquad (1)$$

where $f_1$ and $f_2$ are the frequencies of $w$ in $T_1$ and $T_2$, respectively, and $N_1$ and $N_2$ are the total numbers of word tokens in $T_1$ and $T_2$. Figure 8 depicts a relative-salience word cloud of joy words in the love letters corpus with respect to the hate mail corpus. As expected, love letters, much more than hate mail, have terms such as loving, baby, beautiful, feeling, and smile. This is a nice sanity check of the manually created emotion lexicon. We used Google’s freely available software to create the word clouds shown in this paper.12
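Equation 1 translates directly into code. The sketch below is a minimal Python rendering under the assumption that each text is already available as a list of word tokens; the function name and the example tokens are hypothetical.

```python
from collections import Counter


def relative_salience(tokens1, tokens2, words):
    """Compute f1/N1 - f2/N2 (Equation 1) for each word in `words`.

    tokens1 and tokens2 are the token lists of the target texts T1 and T2.
    """
    n1, n2 = len(tokens1), len(tokens2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both texts must contain at least one token")
    freq1, freq2 = Counter(tokens1), Counter(tokens2)
    return {w: freq1[w] / n1 - freq2[w] / n2 for w in words}


# Toy token lists, invented for illustration only.
t1 = ["love", "you", "love", "smile"]
t2 = ["hate", "you", "smile", "now", "go"]
salience = relative_salience(t1, t2, ["love", "smile"])
```

Sorting the result by value, largest first, gives the words most salient in the first text relative to the second, which is the ordering a relative-salience word cloud would emphasize.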

9 LLC: http://www.lovingyou.com/content/inspiration/loveletters-topic.php?ID=loveyou

10 HMC: http://www.ratbags.com

11 SNC: http://www.well.com/ art/suicidenotes.html?w

12 Google WordCloud: http://visapi-gadgets.googlecode.com


Figure 1: Percentage of positive and negative words in the love letters corpus.

Figure 2: Percentage of positive and negative words in the hate mail corpus.

Figure 3: Percentage of positive and negative words in the suicide notes corpus.

Figure 4: Difference in percentages of emotion words in the love letters corpus and the hate mail corpus. The relative-salience word cloud for the joy bar is shown in the figure to the right (Figure 8).

Figure 5: Percentage of emotion words in the love letters corpus.

Figure 6: Percentage of emotion words in the hate mail corpus.

Figure 7: Percentage of emotion words in the suicide notes corpus.

Figure 8: Love letters corpus - hate mail corpus: relative-salience word cloud for joy.


Figure 9: Difference in percentages of emotion words in the love letters corpus and the suicide notes corpus.

Figure 10: Difference in percentages of emotion words in the suicide notes corpus and the hate mail corpus.

Figure 9 is a bar graph showing the difference in percentages of emotion words in love letters and suicide notes. The most salient fear words in the suicide notes with respect to love letters, in decreasing order, were: hell, kill, broke, worship, sorrow, afraid, loneliness, endless, shaking, and devil.

Figure 10 is a similar bar graph, but for suicide notes and hate mail. Figure 11 depicts a relative-salience word cloud of disgust words in the hate mail corpus with respect to the suicide notes corpus. The cloud shows many words that seem expected, for example ignorant, quack, fraudulent, illegal, lying, and damage. Words such as cancer and disease are prominent in this hate mail corpus because the Millenium Project denigrates various alternative treatment websites for cancer and other diseases, and consequently receives angry emails from some cancer patients and physicians.

5 Emotions in email: men vs. women

There is a large amount of work at the intersection of gender and language (see bibliographies compiled by Schiffman (2002) and Sunderland et al. (2002)). It is widely believed that men and women use language differently, and this is true even in computer mediated communications such as email (Boneva et al., 2001). It is claimed that women tend to foster personal relations (Deaux and Major, 1987; Eagly and Steffen, 1984) whereas men communicate for social position (Tannen, 1991). Women tend to share concerns and support others (Boneva et al., 2001) whereas men prefer to talk about activities (Caldwell and Peplau, 1982; Davidson and Duberman, 1982). There are also claims that men tend to be more confrontational and challenging than women.13

Figure 11: Suicide notes - hate mail: relative-salience word cloud for disgust.

Otterbacher (2010) investigated stylistic differences in how men and women write product reviews. Thelwall et al. (2010) examine how men and women communicate over social networks such as MySpace. Here, for the first time using an emotion lexicon of more than 15000 words, we investigate if there are gender differences in the use of emotion words in work-place communications, and if so, whether they support some of the claims mentioned in the above paragraph. The analysis shown here does not prove the propositions; however, it provides empirical support to the claim that men and women use emotion words to different degrees.

We chose the Enron email corpus (Klimt and Yang, 2004)14 as the source of work-place communications because it remains the only large publicly available collection of emails. It consists of more than 200,000 emails sent between October 1998 and June 2002 by 150 people in senior managerial positions at the Enron Corporation, a former American energy, commodities, and services company. The emails largely pertain to official business but also contain personal communication.

13 http://www.telegraph.co.uk/news/newstopics/howaboutthat/6272105/Why-men-write-short-email-and-women-write-emotional-messages.html

14 The Enron email corpus is available at

Figure 12: Difference in percentages of emotion words in emails sent by women and emails sent by men.

Figure 13: Emails by women - emails by men: relative-salience word cloud of trust.

In addition to the body of the email, the corpus provides meta-information such as the time stamp and the email addresses of the sender and receiver. Just as in Cheng et al. (2009), (1) we removed emails whose body had fewer than 50 words or more than 200 words, (2) the authors manually identified the gender of each of the 150 people solely from their names. If the name was not a clear indicator of gender, then the person was marked as “gender-untagged”. This process resulted in tagging 41 employees as female and 89 as male; 20 were left untagged. Emails sent from and to gender-untagged employees were removed from all further analysis, leaving 32,045 mails (19,920 mails sent by men and 12,125 mails sent by women). We then determined the number of emotion words in emails written by men, in emails written by women, in emails written by men to women, men to men, women to men, and women to women.
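The filtering and grouping steps above can be sketched in Python as follows. The record layout (one `sender`, one `recipient`, and a `body` per email), the address-to-gender mapping, and the whitespace word count are simplifying assumptions for illustration; the Enron corpus itself is distributed in a different format, and real emails may have several recipients.

```python
def filter_and_pair(emails, gender_by_address, min_words=50, max_words=200):
    """Keep emails of 50 to 200 words between gender-tagged people.

    Returns a list of (sender_gender, recipient_gender, email) tuples.
    Addresses absent from `gender_by_address` count as gender-untagged.
    """
    kept = []
    for mail in emails:
        n_words = len(mail["body"].split())  # whitespace tokens (assumption)
        if n_words < min_words or n_words > max_words:
            continue
        sender = gender_by_address.get(mail["sender"])
        recipient = gender_by_address.get(mail["recipient"])
        if sender is None or recipient is None:
            continue  # drop mail from or to gender-untagged people
        kept.append((sender, recipient, mail))
    return kept


# Hypothetical addresses and bodies, for illustration only.
genders = {"a@example.com": "F", "b@example.com": "M"}
body_60 = " ".join(["word"] * 60)
sample = [
    {"sender": "a@example.com", "recipient": "b@example.com", "body": body_60},
    {"sender": "a@example.com", "recipient": "b@example.com", "body": "too short"},
    {"sender": "a@example.com", "recipient": "c@example.com", "body": body_60},
    {"sender": "b@example.com", "recipient": "a@example.com",
     "body": " ".join(["word"] * 201)},
]
pairs = filter_and_pair(sample, genders)
```

Grouping the returned tuples by their first two elements yields the four sender-recipient sets (men to women, men to men, women to men, women to women) that the analysis compares.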

Figure 14: Difference in percentages of emotion words in emails sent to women and emails sent to men.

Figure 15: Emails to women - emails to men: relative-salience word cloud of joy.

5.1 Analysis

Figure 12 shows the difference in percentages of emotion words in emails sent by men from the percentage of emotion words in emails sent by women. Observe that the most marked difference is in the percentage of trust words. The men used many more trust words than women. Figure 13 shows the relative-salience word cloud of these trust words.

Figure 14 shows the difference in percentages of emotion words in emails sent to women and the percentage of emotion words in emails sent to men. Observe the marked difference once again in the percentage of trust words and joy words. The men receive emails with more trust words, whereas women receive emails with more joy words. Figure 15 shows the relative-salience word cloud of joy.

Figure 16 shows the difference in emotion words in emails sent by men to women and the emotions in mails sent by men to men. Apart from trust words, there is a marked difference in the percentage of anticipation words. The men used many more anticipation words when writing to women, than when writing to other men. Figure 17 shows the relative-salience word cloud of these anticipation words.

Figure 16: Difference in percentages of emotion words in emails sent by men to women and by men to men.

Figure 17: Emails by men to women - emails by men to men: relative-salience word cloud of anticipation.

Figures 18, 19, 20, and 21 show difference bar graphs and relative-salience word clouds analyzing some other possible pairs of correspondences.

5.2 Discussion

Figures 14, 16, 18, and 20 support the claim that when writing to women, both men and women use more joyous and cheerful words than when writing to men. Figures 14, 16 and 18 show that both men and women use lots of trust words when writing to men. Figures 12, 18, and 20 are consistent with the notion that women use more cheerful words in emails than men. The sadness values in these figures are consistent with the claim that women tend to share their worries with other women more often than men with other men, men with women, and women with men. The fear values in Figures 16 and 20 suggest that men prefer to use a lot of fear words, especially when communicating with other men. Thus, women communicate relatively more on the joy–sadness axis, whereas men have a preference for the trust–fear axis. It is interesting how there is a markedly higher percentage of anticipation words in cross-gender communication than in same-sex communication (Figures 16, 18, and 20).

Figure 18: Difference in percentages of emotion words in emails sent by women to women and by women to men.

Figure 19: Emails by women to women - emails by women to men: relative-salience word cloud of sadness.

6 Tracking Sentiment in Personal Email

In the previous section, we showed analyses of sets of emails that were sent across a network of individuals. In this section, we show visualizations catered toward individuals, who in most cases have access to only the emails they send and receive. We are using the Google Apps API to develop an application that integrates with Gmail (Google’s email service), to provide users with the ability to track their emotions towards people they correspond with.15 Below we show some of these visualizations by selecting John Arnold, a former employee at Enron, as a stand-in for the actual user.

Figure 22 shows the percentage of positive and negative words in emails sent by John Arnold to his colleagues. John can select any of the bars in the figure to reveal the difference in percentages of emotion words in emails sent to that particular person and all the emails sent out. Figure 23 shows the graph pertaining to Andy Zipper. Figure 24 shows the percentage of positive and negative words in each of the emails sent by John to Andy.

In the future, we will make a public call for volunteers interested in our Gmail emotion application, and we will request access to numbers of emotion words in their emails for a large-scale analysis of emotion words in personal email. The application will protect the privacy of the users by passing emotion word frequencies, gender, and age, but no text, names, or email ids.

15

Figure 20: Difference in percentages of emotion words in emails sent by men to men and by women to women.

7 Conclusions

We have created a large word–emotion association lexicon by crowdsourcing, and used it to analyze and track the distribution of emotion words in mail.16 We compared emotion words in love letters, hate mail, and suicide notes. We analyzed the difference in emotion words used by men and women in work-place email. We showed that women use and receive relatively more joy and sadness words, whereas men use and receive relatively more trust and fear words. We also found that there is a markedly higher percentage of anticipation words in cross-gender communication (men to women and women to men) than in same-sex communication. We showed how different visualizations and word clouds can be used to effectively interpret the results of the emotion analysis. Finally, we showed additional visualizations and a Gmail application that can help people track emotion words in the emails they send and receive.

Acknowledgments

This research was funded by the National Research Council Canada (NRC). Grateful thanks to Peter Turney and Tara Small for many wonderful ideas. Thanks to the thousands of people who answered the emotion survey with diligence and care.

16 Please send an e-mail to saif.mohammad@nrc-cnrc.gc.ca to obtain the latest version of the NRC emotion lexicon, suicide notes corpus, hate mail corpus, love letters corpus, or the Enron gender-specific emails.

Figure 21: Emails by men to men - emails by women to women: relative-salience word cloud of fear.

Figure 22: Emails sent by John Arnold.

Figure 23: Difference in percentages of emotion words in emails sent by John Arnold to Andy Zipper and emails sent by John to all.


References

Cecilia Ovesdotter Alm, Dan Roth, and Richard Sproat. 2005. Emotions from text: Machine learning for text-based emotion prediction. In Proceedings of the Joint Conference on HLT–EMNLP, Vancouver, Canada.

Jerome Bellegarda. 2010. Emotion analysis using latent affective folding and embedding. In Proceedings of the NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, Los Angeles, California.

J.R.L. Bernard, editor. 1986. The Macquarie Thesaurus. Macquarie Library, Sydney, Australia.

Bonka Boneva, Robert Kraut, and David Frohlich. 2001. Using e-mail for personal relationships. American Behavioral Scientist, 45(3):530–549.

Mayta A. Caldwell and Letitia A. Peplau. 1982. Sex differences in same-sex friendships. Sex Roles, 8:721–732.

Na Cheng, Xiaoling Chen, R. Chandramouli, and K.P. Subbalakshmi. 2009. Gender identification from e-mails. In Computational Intelligence and Data Mining, 2009. CIDM ’09. IEEE Symposium on, pages 154–158.

Malcolm Corney, Olivier de Vel, Alison Anderson, and George Mohay. 2002. Gender-preferential text mining of e-mail discourse. Computer Security Applications Conference, Annual, 0:282–289.

Lynne R. Davidson and Lucile Duberman. 1982. Friendship: Communication and interactional patterns in same-sex dyads. Sex Roles, 8:809–822. 10.1007/BF00287852.

Kay Deaux and Brenda Major. 1987. Putting gender into context: An interactive model of gender-related behavior. Psychological Review, 94(3):369–389.

Ana B. Casado Díaz and Francisco J. Más Ruz. 2002. The consumers reaction to delays in service. International Journal of Service Industry Management, 13(2):118–140.

Peter Dodds and Christopher Danforth. 2010. Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. Journal of Happiness Studies, 11:441–456. 10.1007/s10902-009-9150-9.

Laurette Dubé and Manfred F. Maute. 1996. The antecedents of brand switching, brand loyalty and verbal responses to service failure. Advances in Services Marketing and Management, 5:127–151.

Alice H. Eagly and Valerie J. Steffen. 1984. Gender stereotypes stem from the distribution of women and men into social roles. Journal of Personality and Social Psychology, 46(4):735–754.

Bryan Klimt and Yiming Yang. 2004. Introducing the Enron corpus. In CEAS.

Adrienne Lehrer. 1974. Semantic fields and lexical structure. North-Holland, American Elsevier, Amsterdam, NY.

Hugo Liu, Henry Lieberman, and Ted Selker. 2003. A model of textual affect sensing using real-world knowledge. In Proceedings of the 8th international conference on Intelligent user interfaces, IUI ’03, pages 125–132, New York, NY. ACM.

Karl Marx. 1982. The Letters of Karl Marx. Prentice Hall.

Pawel Matykiewicz, Wlodzislaw Duch, and John P. Pestian. 2009. Clustering semantic spaces of suicide notes and newsgroups articles. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, BioNLP ’09, pages 179–184, Stroudsburg, PA, USA. Association for Computational Linguistics.

Saif M. Mohammad and Peter D. Turney. 2010. Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon. In Proceedings of the NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, LA, California.

Saif M. Mohammad and Peter D. Turney. 2011. Crowdsourcing a word–emotion association lexicon. In Submission.

Saif Mohammad, Cody Dunne, and Bonnie Dorr. 2009. Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP-2009), pages 599–608, Singapore.

Saif M. Mohammad. 2011a. Even the abstract have colour: Consensus in word–colour associations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA.

Saif M. Mohammad. 2011b. From once upon a time to happily ever after: Tracking emotions in novels and fairy tales. In Proceedings of the ACL 2011 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), Portland, OR, USA.

Charles E. Osgood and Evelyn G. Walker. 1959. Motivation and language behavior: A content analysis of suicide notes. Journal of Abnormal and Social Psychology, 59(1):58–67.

Jahna Otterbacher. 2010. Inferring gender of movie reviewers: exploiting writing style, content and metadata. In Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM ’10, pages 369–378, New York, NY, USA. ACM.


Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2):1–135.

John P. Pestian, Pawel Matykiewicz, and Jacqueline Grupp-Phelan. 2008. Using natural language processing to classify suicide notes. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, BioNLP ’08, pages 96–97, Stroudsburg, PA, USA. Association for Computational Linguistics.

Robert Plutchik. 1980. A general psychoevolutionary theory of emotion. Emotion: Theory, research, and experience, 1(3):3–33.

Harold Schiffman. 2002. Bibliography of gender and language. In http://www.sas.upenn.edu/~haroldfs/popcult/bibliogs/gender/genbib.htm.

Philip Stone, Dexter C. Dunphy, Marshall S. Smith, Daniel M. Ogilvie, and associates. 1966. The General Inquirer: A Computer Approach to Content Analysis. The MIT Press.

Carlo Strapparava and Alessandro Valitutti. 2004. Wordnet-Affect: An affective extension of WordNet. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004), pages 1083–1086, Lisbon, Portugal.

Jane Sunderland, Ren-Feng Duann, and Paul Baker. 2002. Gender and genre bibliography. In http://www.ling.lancs.ac.uk/pubs/clsl/clsl122.pdf.

Deborah Tannen. 1991. You just don’t understand: women and men in conversation. Random House.

Mike Thelwall, David Wilkinson, and Sukhvinder Uppal. 2010. Data mining emotion in social network communication: Gender differences in MySpace. J. Am. Soc. Inf. Sci. Technol., 61:190–199, January.

Peter Turney and Michael Littman. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21(4):315–346.

Francois-Marie Arouet Voltaire. 1973. The selected letters of Voltaire. New York University Press, New York.

David Yarowsky. 1992. Word-sense disambiguation using statistical models of Roget’s categories trained on large corpora. In Proceedings of the 14th International Conference on Computational Linguistics (COLING-92), pages 454–460, Nantes, France.
