Suite Talking : a corpus comparison study of function words in banking-industry shareholder letters

Academic year: 2022

TARPLEY, James

Abstract

Working from a corpus of shareholder letters from ten financial institutions each from the English-speaking and French-speaking worlds from 2009 to 2013, this comparison study employs a methodology inspired by the work of James W. Pennebaker to catalogue and analyze in detail the frequency of use of "function words" such as pronouns, articles, and prepositions, as well as a number of other categories defined by Pennebaker and for which he has calculated baseline usage rates for comparison purposes. The ultimate goal of this methodology is to highlight aspects of language use in translated shareholder letters that differ markedly from those in similar letters originally written in English, in order to suggest tweaks which would enable translators to better reproduce the language style of these extremely high-profile corporate communication texts in their translations of such texts from French to English.

TARPLEY, James. Suite Talking : a corpus comparison study of function words in banking-industry shareholder letters. Master : Univ. Genève, 2015

Available at:

http://archive-ouverte.unige.ch/unige:75393

Disclaimer: layout of this document may differ from the published version.

JAMES H. TARPLEY

SUITE TALKING:

A CORPUS COMPARISON STUDY OF FUNCTION WORDS IN BANKING–INDUSTRY SHAREHOLDER LETTERS

Supervisor: David Jemielity Examiner: Ian Mackenzie

Thesis submitted to the Faculty of Translation and Interpreting (Translation Department, English Unit) in fulfilment of the requirements for the Maîtrise universitaire en traduction, mention traduction spécialisée (Master of Arts in Translation, specialized translation)

Declaration attesting to the originality of the work

I affirm that I have read the information and plagiarism–prevention documents issued by the Université de Genève and the Faculté de traduction et d’interprétation (in particular the Directive en matière de plagiat des étudiant–e–s, the Règlement d’études de la Faculté de traduction et d’interprétation, and the Aide–mémoire à l’intention des étudiants préparant un mémoire de Ma en traduction).

I attest that this work is the result of my own personal work and was written independently.

I declare that all sources of information used are cited completely and accurately, including Internet sources.

I am aware that failing to cite a source, or failing to cite it correctly, constitutes plagiarism, and that plagiarism is considered a serious offence within the University, subject to sanctions.

In view of the above, I declare on my honour that the present work is original.

Surname and first name: Tarpley James

Place / date / signature: Yverdon–les–Bains / 20 May 2015 /

Table of Contents

Declaration attesting to the originality of the work

Introduction

Theoretical Framework

Methodology

Corpus construction

Pennebaker’s The Secret Life of Pronouns

Category Analyses

Linguistic Processes

Word Counts

Longer words

Pennebaker’s ‘Dictionary’

Function words

Pronouns

Impersonal pronouns

Personal pronouns

First person singular

First person plural

Second person

Third person singular

Third person plural

Articles

Verbs

Auxiliary verbs

Past tense

Present tense

Future forms

Adverbs

Prepositions

Conjunctions

Negations

Quantifiers

Numbers

Swear words

Psychological Processes

Social processes

Affective processes

Cognitive processes

Perceptual processes

Biological processes

Relativity

Personal Concerns

Spoken Categories

Conclusions

Bibliography

Annex: Raw LIWC Results

Introduction

This study is part of on–going efforts centered around the Banque Cantonale Vaudoise (“BCV”) in Lausanne1 aimed at improving the quality of communication in French–to–English translation in the financial industry. David Jemielity, Head of the Translation Department at the Bank, and also a researcher and instructor at the Faculty of Translation and Interpreting at the University of Geneva, has been working for over a decade on numerous research projects rooted in the professional requirements of BCV and its peer institutions in Switzerland and around the world. This thesis specifically follows up on work conducted by Rosemary Wells in 2009 in which she demonstrated a broad range of stylistic conventions in English–language corporate communication and compared these conventions to observed language style in English translations of equivalent texts originally written in French. Working from a corpus of shareholder letters from ten financial institutions each from the English–speaking and French–speaking worlds from 2009 to 2013, this comparison study will employ the methodology of University of Texas scholar James W. Pennebaker to catalogue and analyze in detail the frequency of use of “function words” such as pronouns, articles, and prepositions, as well as a number of other categories defined by Pennebaker and for which he has calculated baseline usage rates for comparison purposes. The ultimate goal is to highlight aspects of language use in translated shareholder letters that differ markedly from those in similar letters originally written in English, in order to suggest tweaks which would enable translators to better reproduce the language style of these extremely high–profile corporate communication texts in their translations of such texts from French to English, and, by analogy, perhaps in other language pairs as well. The period under consideration makes this a particularly interesting snapshot of financial sector corporate communications, as 2009–2013 corresponds with the aftermath of and recovery from the economic crisis of 2008, and banks may reasonably be expected to have paid particular attention to anything touching on their public image during this difficult time for the financial industry.

1 http://www.bcv.ch

Theoretical Framework

While this study will strive to remain pragmatic and down to earth in an effort to increase the relevance of any results for the practical realities of professional financial translation, it is of course embedded within a theoretical framework. Because this study is in many ways a follow–up to work done by Wells in 2009, we shall generally apply the theoretical model which she developed and employed for her research, incorporating updates to the principal theories involved when applicable. Wells rooted her study in a functionalist approach, specifically that of Christiane Nord, which is in turn intrinsically linked to skopos theory elaborated by Katharina Reiss and Hans Vermeer.2 The summary provided by Wells of this theoretical framework is quite complete, permitting us the luxury of greater brevity.

Reiss and Vermeer have distilled the essential core of their theory into a hierarchical set of interdependent rules for translational action, which they emphasize is “not only a linguistic but also a cultural transfer.”3 The first of their rules states that “[a] translatum is determined by its skopos.”4 The term “translatum” as utilized by Reiss and Vermeer is defined as the product of “the process of translating or interpreting.”5 Their term “skopos” has a definition that has spawned multiple tomes as dense as they are long, but can be at least superficially understood as referring to the purpose ascribed to the text, with the caveat that different actors will almost certainly have differing understandings of the purpose (or, more likely, purposes) of the text. The second rule set forth by Reiss and Vermeer is that “[a] translatum is an offer of information in a target culture and language about an offer of information in a source culture and language.”6 They expand on the terminology in this rule as follows: “The process of translational action starts from a given text, which is understood and interpreted by the translator/interpreter. We can say that a text is a piece of information offered to a recipient by a text producer […]. The translator puts together a target text which, being a text, also offers a piece of information to a recipient.”7 The third rule is that “[a] translatum is a unique, irreversible mapping of a source–culture offer of information.”8 It is important to note that their use of the words “unique” and “irreversible” is a deliberate refutation of what they characterize as an “erroneous supposition”, namely that “[t]ranslation is a biunique, reversible mapping process of communication”.9 Reiss and Vermeer’s fourth rule for translational action is that “[a] translatum must be coherent in itself.”10 The fifth and final rule posited by Reiss and Vermeer is that “[a] translatum must be coherent with the source text.”11 These last two rules render explicit their belief that “intratextual coherence takes precedence over the assessment of ‘fidelity’ between the source and the target texts”.12

2 Wells, Rosemary Helen, A Stylistic Analysis of the English Translations of French-Language Annual Reports (Geneva: Université de Genève, 2009) 4-5. It is worthy of note that Christiane Nord is also responsible for the English translation of the book by Reiss and Vermeer that was employed in the drafting of this section.

3 Reiss, Katharina and Hans J. Vermeer, Towards a General Theory of Translational Action: Skopos Theory Explained (Manchester: St. Jerome Publishing, 2013) 3.

4 Reiss and Vermeer 107.

5 Reiss and Vermeer 1.

6 Reiss and Vermeer 107.

As Wells has argued, Nord’s refined version of skopos theory represents a “universal theory [of translation] that can be applied to all text types”.13 Translation should strive for congruency in what Nord refers to as intention, function and effect: “the function intended by the sender is also assigned to the text by the receiver, who experiences exactly the effect conventionally associated with this function.”14 As the shareholder letters examined in this study are component parts of financial–sector annual reporting that feature language that can be by turns idiomatic and technical, this broad adaptability of Nord’s concepts is particularly helpful to avoid having to force theoretical concepts to stretch to ill–fitting concrete examples.

7 Reiss and Vermeer 18.

8 Reiss and Vermeer 107.

9 Reiss and Vermeer 42.

10 Reiss and Vermeer 107.

11 Reiss and Vermeer 107.

12 Reiss and Vermeer 101-102.

13 Wells 4.

14 Nord, Christiane, Text Analysis in Translation: Theory, Methodology, and Didactic Application of a Model for Translation-Oriented Text Analysis, Amsterdamer Publikationen Zur Sprache Und Literatur, 2nd ed. (Amsterdam; Atlanta: Rodopi, 2005) 53.

Wells has thoroughly analyzed the functions of annual reports by applying Nord’s theoretical framework, specifically exemplified by the texts translated at the Banque Cantonale Vaudoise. She identifies the “sender” as the institution concerned by the report, which in the case of BCV is the bank itself.15 She points out that the “text producer” is actually the Investor Relations Officer, but the fact that the CEO and Chairman sign the shareholder letter gives them a significant stake as authors. Regarding the intention of the shareholder letter specifically, Wells shows that it is the same for the original letter and the translation, namely to “persuade [readers] to invest, or continue investing, in the company. In this regard, the sender’s intention is to portray the company in a positive light.”16 Wells has identified this putative reader, or “Audience”, as falling into nine categories, including shareholders, creditors, employees, analysts and advisors, customers, government and tax authorities, and the general public.17 She furthermore points out that the audiences for the French annual report and its English translation for BCV are differentiated essentially by where they are located geographically, with the readers of the English report often situated outside Switzerland.18

After identifying the media used for the BCV annual report (paper and electronic .pdf files available online), the timing of the communication (following the end of the financial year), and the motive for production of the annual report (required by law for the original, but optional and therefore motivated by marketing for the translation),19 Wells was able to sum up the function of the annual report, and notably the Chairman’s letter, specifically at BCV. “For the Chairman’s letter, the function is the same for the source and target texts. The Chairman’s letter serves first and foremost an informative function. To a large extent, it is also expressive, as the Chairman generally gives his own opinions. This, in turn, results in an appellative function, as the Chairman generally gives a positive opinion in order to appeal to the reader. Finally, the letter format serves a phatic function because the author wishes to make contact with the reader. This last function, however, is less prominent than the other functions but, interestingly, it is conventionally more important in the target culture genre.”20 From this analysis, Wells concludes that an “instrumental” translation approach is warranted in the case of annual reports, meaning that “the target should conform to target culture genre conventions and linguistic conventions”.21 This would certainly seem to jibe with Toury’s supposition that “[c]onventions, the necessary outcome and manifestation of any struggle for order and stability, are at the same time a means for their attainment.”22 Wells then calls for the identification of these conventions in order to better equip translators to adhere to them in their work. Her excellent study goes on to very carefully articulate the genre conventions of the target culture, and also addresses linguistic conventions, for example regarding the use of the first–person plural and various buzz words in shareholder letters.

15 Wells 13.

16 Wells 13-14.

17 Wells 14-15.

18 Wells 15-16.

19 Wells 16-17.

It cannot be overstated how crucial shareholder letters are to the CEOs and other corporate leaders who (at least putatively) draft them for the annual reports of their companies each year. L. J. Rittenhouse, who directs Rittenhouse Rankings, a prominent consulting firm specializing in executive–suite candor, describes the stakes in this way: “Executive communication reveals the character of the CEO. Is a letter written in a personal or impersonal style? Is the CEO comfortable with disclosing his unique persona, or is he protected by handlers? Does he offer a frank report about mistakes that were made and challenges that were met, or does he report only successes? Authentic leaders write balanced reports that build trust. Inauthentic leaders will twist facts and weaken trust.”23 This study, through application of a very broad–based corpus–comparison methodology, will seek to go further in identifying the linguistic conventions typical of shareholder letters written in English, and will compare them to observed practices in translated shareholder letters from a corpus derived from French–language financial institutions. Because this research study was also based at BCV, we shall also compare the translations of shareholder letters produced by the translation team at that bank in order to foreground specific areas for further study and possible improvement of the overall translational process at BCV and other French–language banks with a communication strategy that includes the English language, and thus the various cultures entwined with it. It should be noted that the author of this study was working directly with the translation team at BCV when several of the shareholder letters being analyzed in this study were translated (as was Wells when she was conducting her research), and that the director of this research project is also the head of the translation team at BCV. It is the position of this author that no conflict of interest obtains, and in fact this “embedding” of the project in the realities of professional translation in the financial sector in Switzerland informs the study in ways that would be impossible without this close relationship between researchers and the texts being analyzed.

20 Wells 18.

21 Wells 18.

22 Toury, Gideon, Descriptive Translation Studies - and Beyond, Benjamins Translation Library, Revised ed. (Amsterdam; Philadelphia: J. Benjamins, 2012) 63.

23 Rittenhouse, L. J., Investing between the Lines: How to Make Smarter Decisions by Decoding CEO Communications (New York, New York: McGraw-Hill, 2013) 13.

Methodology

This is a corpus–comparison study using the “Linguistic Inquiry and Word Count (LIWC)” text analysis software developed by James W. Pennebaker, Roger J. Booth, and Martha E. Francis.24 Pennebaker’s software package, which he pronounces like the name “Luke”, also grants access to an enormous database of standard English from various regions of the world, and provides built–in comparison of custom corpora with this baseline for immediate detection of unusual frequency of use of given words and phrases. The specific methodological steps employed in this study are:

1. Construction of a corpus of shareholder letters (aka “Chairman’s Letter”, “To Our Shareholders”, etc.) written originally in English from the annual reports of financial institutions.

2. Construction of a parallel corpus of such letters originally written in French and then translated into English as part of a multilingual corporate communication strategy.

3. Analysis of each corpus in comparison with baseline word–use frequency as calculated by Pennebaker. The various categories of language use defined by Pennebaker are examined to highlight potential areas for in–depth study.

4. Comparison of the two corpora in order to highlight similarities and divergences between these two selections of texts.

5. Comparison of the LIWC results for the English and translated corpora to a chronologically congruent selection of similar letters translated from French to English at BCV.

24 Full information available at http://www.liwc.net/
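The comparisons in these steps all rest on one basic quantity: the percentage of a text's words that fall into a given category. LIWC computes this internally with its own dictionaries. The sketch below only illustrates the idea and is not LIWC's implementation: the word list `FIRST_PERSON_PLURAL`, the tokenizer, and the two sample sentences are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical, deliberately tiny word list (an assumption for illustration);
# LIWC's actual category dictionaries are far more extensive.
FIRST_PERSON_PLURAL = {"we", "us", "our", "ours", "ourselves"}


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def category_rate(text, category_words):
    """Return the share of tokens that belong to the category, as a percentage."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in category_words)
    return 100.0 * hits / len(tokens)


# Invented sample sentences, not drawn from the corpora.
original = "We are proud of our results. We thank our shareholders."
translated = "The bank is proud of its results. The bank thanks its shareholders."

rate_original = category_rate(original, FIRST_PERSON_PLURAL)
rate_translated = category_rate(translated, FIRST_PERSON_PLURAL)
print(f"original: {rate_original:.1f}%  translated: {rate_translated:.1f}%")
```

Expressing each category as a percentage of total words is what makes texts of different lengths comparable with one another and with a baseline.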

Corpus construction

Because this study follows up on research conducted by Rosie Wells in 2008, the texts included in the corpora for this project date from 2009 through 2013. Ten financial institutions were selected for inclusion in the English–original corpus on the basis of having been examined by Wells and/or due to their prominent public identities. The companies included in the corpus were (in alphabetical order using their current corporate names of choice): Barclays,25 Bank of America,26 Citi,27 HSBC,28 JPMorgan Chase,29 Lloyds Banking Group,30 Royal Bank of Canada,31 RBS,32 U.S. Bank,33 and Wells Fargo.34 Because of ongoing (some might argue, never–ending) consolidation in the financial sector, some of the institutions that Wells examined have since been merged with other groups, but this corpus ensures that her results and those obtained here should be as comparable as is practicable. The annual reports for the years 2009 through 2013 for these ten institutions were downloaded from the investor relations areas of the various companies’ websites, and the shareholder letters were extracted and saved as separate text (.txt) files. In cases where there were two such letters for a given institution, which is more common in UK than US financial institutions, they have both been retained and are simply treated as a longer letter than would have otherwise been the case.

25 http://www.barclays.com/

26 http://www.bankofamerica.com/

27 http://www.citigroup.com/

28 http://www.hsbc.com/

29 http://www.jpmorganchase.com/

30 http://www.lloydsbankinggroup.com/

31 http://www.rbc.com/

32 http://www.rbs.com/

33 http://www.usbank.com/

34 http://www.wellsfargo.com/

Occasionally, instead of a message in letter format, other permutations are observed, for example putative interviews with the executive in question, or a more factual essay style. These cases have been retained as they generally provide a text sample similar in size to the more common convention of the letter from one or more high–ranking company executives and/or directors. The resulting corpus of shareholder letters from financial institutions, originally written in English, thus comprises 50 individual files.

As with the English corpus, ten financial institutions were selected for the French–to–English translated corpus. Because not all major banks from the French–speaking world have policies of providing English translations of their annual reporting, and because some companies that do have such policies produce annual reporting in a typically French format that does not include anything analogous to a shareholder letter, the selection of institutions for inclusion in this corpus was largely constrained by availability. The companies included in the corpus were (in alphabetical order using their current corporate names of choice): Banque Cantonale de Genève,35 Banque Cramer & Cie SA,36 Banque Laurentienne,37 BNP Paribas,38 BPCE,39 Compagnie Financière Tradition,40 Crédit Agricole,41 Crédit Mutuel,42 Desjardins,43 and Edmond de Rothschild (Suisse) S.A.44 The annual reports for the years 2009 through 2013 for these ten institutions were downloaded from the investor relations areas of the various companies’ websites, and the shareholder letters were extracted and saved as separate text (.txt) files. Occasionally, instead of a message in letter format, other permutations are observed, for example putative interviews with the executive in question, or a more factual essay style. These cases have been retained as they generally provide a text sample similar in size to the more common convention of the letter from one or more high–ranking company executives and/or directors. The resulting corpus of shareholder letters from financial institutions, originally written in French and then translated into English, thus comprises 50 individual files.

35 http://www.bcge.ch/

36 http://www.bdg.ch/

37 http://www.banquelaurentienne.ca/

38 http://www.bnpparibas.com/

39 http://www.bpce.fr/

40 http://www.traditiongroup.com/

41 http://www.credit-agricole.fr/

42 http://www.creditmutuel.com/

43 http://www.desjardins.com/

44 http://www.edmond-de-rothschild.ch/

Pennebaker’s The Secret Life of Pronouns

Texas–based scholar James W. Pennebaker added a mass–market success to his decades of academic publications with his 2011 monograph, The Secret Life of Pronouns: What Our Words Say About Us. This text was very widely (and overwhelmingly positively) reviewed in the popular press, ranging from the left–leaning National Public Radio in Washington, D.C.,45 to the mouthpiece of the 1%, the Wall Street Journal.46 Pennebaker is a social psychologist by training, but for more than twenty years he has been operating at the intersection of that field and linguistics, conducting corpus–based research into the way language use reflects various psychological (and social, and societal) phenomena. The Secret Life of Pronouns does not just examine pronouns, but in fact discusses the use of what Pennebaker interchangeably calls “stealth words”47 or “function words, including pronouns, prepositions, articles, and a small number of similar short but common words.”48 When Pennebaker realized that these words often revealed more about the speakers (or writers) than the specific content words that they used, he began a long process of cataloguing and classifying function words and the contexts in which they appeared. We note that there is a large amount of overlap between Pennebaker’s function words and the types of words associated with deixis, classically defined by Lyons as “the function of personal and demonstrative pronouns, of tense and of a variety of other grammatical and lexical features which relate utterances to the spatiotemporal co–ordinates of the act of utterance.”49 This would seem to link many of the observations in this study to on–going research by David Jemielity on the ramifications of deixis for the professional translator. Of concrete use for this study, Pennebaker has assembled enormous corpora which allow him to analyze function words in a statistically sound fashion.

45 Spiegel, Alix, "To Predict Dating Success, the Secret's in the Pronouns", Washington, D.C.: National Public Radio (30 April 2012).

46 Christian, Brian, "'I' Is a Window to the Soul: How Inconspicuous Words Like 'we' and 'the' Betray Our Emotions and Affect Our Audience's Perceptions", New York: The Wall Street Journal (4 October 2011).

47 Pennebaker, James W., The Secret Life of Pronouns: What Our Words Say About Us (New York; London: Bloomsbury, 2011) 3.

48 Pennebaker 12.

One of the basic conclusions that has come out of Pennebaker’s research is that “[t]here is a meaningful difference […] between language content and language style”.50 In other words, independently of the subject being discussed, the object being described, or the narrative being detailed, the language style itself, expressed largely through the use of function words, can be analyzed quantitatively, and this analysis can produce consistent results pointing to underlying psycho–social realities such as emotional states, socioeconomic statuses, personality traits, and even desired outcomes of the communicative act in question. “Style (or function) words are words that connect, shape, and organize content words. […] Stealth words are:

• used at a very high rate

• short and hard to detect

• processed in the brain differently than content words

• very, very social”.51

Pennebaker posits that “[b]y listening to, counting, and analyzing stealth words, we can learn about people in ways that even they may not appreciate or comprehend. At the same time, the ways people use stealth words can subtly affect how we perceive them and their messages.”52 Obviously, one of the primary goals of this study is to apply the filter elaborated by Pennebaker to the messages in shareholder letters in the annual reports of financial companies to determine whether the messages these stealth, or function, words are putting across differ significantly between communications written originally in English and those translated from French to English. This will involve first comparing a corpus of English–language letters to Pennebaker’s global corpus of the English language in order to highlight any observable specificities of this particular genre of financial corporate communication. We will subsequently compare our English corpus with a selection of letters translated from French to English in order to ascertain any significant variance between the style of the texts originally crafted in English and those translated into that language. A corpus of translations of shareholder letters from BCV for the same years as those from the other banks will provide another basis for comparison of particular interest to the author of this study and the translation team at that Lausanne–based bank. Pennebaker himself addresses translation in his book: “Interestingly, nouns and regular verbs generally translate across languages fairly smoothly. It is the function words that can cause the biggest problems.”53 The present research may provide indications regarding to what extent this claim (from someone who professes no particular expertise in translation) is borne out by our analysis of a specific subset of texts from a specific domain.

49 Lyons, John, Semantics (Cambridge; London: Cambridge University Press, 1977) 636.

50 Pennebaker 20.

51 Pennebaker 22-23.

52 Pennebaker 38.

Of particular interest to this study is the use of the pronoun “we” and other first person plural forms. In her research, Wells demonstrated that this pronoun appeared to be under–represented in translated texts compared to English–original texts,54 and we hope to provide a solid quantitative underpinning to this observation. According to Pennebaker, “[t]he use of we highlights one of the most enigmatic function words in our vocabulary.”55 Pennebaker clarifies that this pronoun covers widely varying semantic fields: the “warm and fuzzy we: ‘my wife and I;’ ‘my dog and me,’ ‘my family.’”56 This is in rather stark contrast to “the cooler, distanced, and largely impersonal we.”57 Pennebaker further muddies the waters by pointing to the somewhat historical royal we, as well as “the purely ambiguous we that is particularly loved by politicians. We need change in this country and we deserve it! Our taxes are too high and we need to do something about it!”58 This last usage is glossed by Pennebaker thus: “Sometimes, the politician means ‘you,’ sometimes ‘I,’ sometimes ‘you and I,’ and sometimes ‘everyone on earth who agrees with me.’”59 If, as Wells has demonstrated, “we” represents a point of divergence between the style of English–penned corporate communication and translated texts, then Pennebaker’s nuanced splitting of the semantic fields indicated by the various utilizations of this pronoun could well prove to offer compelling insights into the possible miscommunication, or at the least potentially less compelling communication, that could be caused by failing to hew more closely to natural distribution rhythms. This could well be of interest to financial corporate communication specialists, in particular those operating on the frontier of English and another language by dint of participation in a process of translation.

53 Pennebaker 37.

54 Wells 76.

55 Pennebaker 41.

56 Pennebaker 41.

57 Pennebaker 41.

58 Pennebaker 41.

Pennebaker contrasts his quantitative methodology of corpus–based analysis (which we intend to apply to this study) with a case–by–case qualitative process (which we also employ regarding carefully selected linguistic phenomena): “a relatively slow but careful qualitative approach can give us an in–depth view of a small group of people: a computer–based quantitative approach provides a broader social and cultural perspective. The two methods complement each other in ways the two research camps often fail to appreciate.”60 What Pennebaker offers that is of particular use for this study is a system of metrics. As Lord Kelvin would have it, “When you can measure what you are speaking about, and express it in numbers, you know something about it.” Through analysis of staggeringly large corpora, Pennebaker has established a baseline of linguistic normality, or at least commonality, in the form of the percentage of a given body of text which a given word or category of word represents. He has furthermore elaborated an algorithm for comparing the language of two speakers, or writers; he calls this calculation alternately “Language Style Matching” and “Linguistic Style Matching”, consistently abbreviated to “LSM”. This rating is calculated based on the usage percentages returned by LIWC analysis of texts, using the following formula:

$$\mathrm{LSM} = 1 - \frac{\left|\%_A - \%_B\right|}{\%_A + \%_B}$$

where the percentages are usage rates for a given category for two entities (“A” and “B”).61

59 Pennebaker 41.

60 Pennebaker 46.
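As a minimal sketch, the LSM calculation for a single category can be written in a few lines of Python. The function name `lsm`, the sample usage rates, and the handling of the case where both rates are zero (a case the formula as quoted leaves undefined) are assumptions made for illustration; they are not taken from Pennebaker or from LIWC.

```python
def lsm(rate_a, rate_b):
    """Language Style Matching score for one category.

    rate_a and rate_b are the usage percentages of the same category
    in two texts or corpora ("A" and "B").
    """
    total = rate_a + rate_b
    if total == 0:
        # Assumption: a category absent from both texts is treated as a
        # perfect match, since the quoted formula would divide by zero here.
        return 1.0
    return 1.0 - abs(rate_a - rate_b) / total


# Hypothetical usage rates (percent of all words), not results from this study.
score_identical = lsm(5.0, 5.0)   # identical rates
score_close = lsm(6.0, 4.0)       # moderately different rates
score_disjoint = lsm(3.0, 0.0)    # category used in one text only

print(score_identical, round(score_close, 2), score_disjoint)
```

Because the score depends only on the two percentages, it can be computed directly from the category rates LIWC reports, without access to the underlying texts.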

Pennebaker explains that “[t]he LSM scale ranges from a perfect 1.00 if the two people are in perfect function word harmony and as low as 0 if they are completely out of synch. In reality, numbers below .60 reflect very low synchrony and those above .85 reflect high synchrony.”62 Pennebaker’s LSM formula is sufficiently straightforward to use that we will be able to apply it to selected elements of our corpora in order to produce a metrically specific analysis of the extent to which translated shareholder letters are in synchrony with letters originally crafted in English and/or with baseline English–language usage.63 Between his LSM synchrony metrics and his baselines for word frequency in English globally, Pennebaker offers two invaluable tools for a quantitative analysis of the language style of translated shareholder letters from the financial sector. A final citation from Pennebaker will indicate how his analysis style, and the pronouns (and other function words) he tracks, can apply to the world of business, and specifically to the image of a corporation:

Management consultants sometimes distinguish among I–companies, we–companies, and they–companies. To get a rough idea of an organization’s climate, they ask employees to talk about their typical workday. If employees refer to “my office” or “my company,” the atmosphere of the workplace is usually fine. People working in these I–companies are reasonably happy but not particularly wedded to the company itself. However, if they refer to “our office” or “our company,” pay special attention. Those in we–companies have embraced the workplace as part of their own identities. This sense of we–ness may explain why they work harder, have lower employee turnover, and have a greater sense of fulfillment about their work lives. And be very concerned if an organization’s employees start calling it “the company” or, worse, “that company” and referring to their co–workers as “they.” They–companies can be nightmares because workers are proclaiming that their work identity has nothing to do with them. No wonder consultants report that they–companies have unhappy workers and high turnover.64

61 Niederhoffer, K. G., and James W. Pennebaker, "Linguistic Style Matching in Social Interaction", Journal of Language and Social Psychology 21 (2002) 340.

62 Pennebaker 205.

63 An interesting follow-up study would be to conduct a similar LSM assessment of different companies’ letters, to see if there are “schools” of communication style. Indeed, even a single company’s communications could be analyzed chronologically to determine to what extent their style remains consistent year in, year out. One might even seek out correlations between certain communication styles and the overall stock performance of the company in question for the financial year of the communication to determine if certain communication patterns are linked to certain levels of performance. Finally, were one of a decidedly cynical bent, one might elect to study letters attributed to CEOs who were subsequently discredited for ethical, or even criminal, failings. This last study might suggest styles to avoid, or at least minimize, in order to communicate honesty to the shareholders and/or other readers of the annual report.

That this illustration seems fairly common–sensical in no way detracts from the fact that it represents a sort of qualitative analysis of pronoun usage which, in a very specific context, returns salient, actionable data. Indeed, anyone working in financial translation in Switzerland would immediately see that the comments regarding “they–companies” are liable to require action when these translators are called upon to translate texts whose protagonist is “la Banque”.

In the following research study, we hope to shed light on a particular corner of the translation profession on a similarly linguistic basis through a thorough analysis of the relative frequency of certain function words in shareholder letters from financial institutions, specifically those written in English and those translated from French to English. The fact that the institutions under review belong to a sector that, in the years under review, was still emerging from the crisis of 2008, with its ramifications for both the bottom line and the corporate image, should add poignancy to the study. In any event, it makes it very likely that the letters to be analyzed were crafted with a tremendous amount of care and attention to detail, which should allay any possible concerns that we might at some point be reading too much into some detail of relatively tight focus.

Category Analyses

Pennebaker’s Linguistic Inquiry and Word Count (“LIWC”) software package, as its name suggests, performs a word–count analysis of the texts processed through it. It provides some basic statistics about the texts, including overall raw word counts. It then calculates the prevalence of each term as a percentage of the entirety of the text that contained it.
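The kind of calculation described here can be sketched in a few lines. This is an illustration of the general idea only, not LIWC's implementation: the function name `term_percentages`, the tokenisation rule and the sample sentence are all assumptions.

```python
import re
from collections import Counter


def term_percentages(text: str) -> dict[str, float]:
    """Return each word's share of the text, as a percentage of all words."""
    # Assumption: a word is a run of letters, optionally with internal
    # apostrophes; LIWC's own tokenisation rules may differ.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    total = len(words)
    return {word: 100 * n / total for word, n in counts.items()}


# Invented sample sentence, not taken from the corpus.
sample = "We thank our shareholders, and we thank our clients."
shares = term_percentages(sample)
print(round(shares["we"], 1))  # share of "we" as a percentage of all words
```

Expressing each term as a share of the whole text is what makes texts of very different lengths comparable.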

But what makes Pennebaker’s system unique is the categories that he has developed over time, and the data that LIWC provides based on these categories. There are 79 categories and sub–categories with which LIWC has been programmed, broken down into four overarching groups: Linguistic Processes, Psychological Processes, Personal Concerns, and Spoken Categories.65 This means that this software performs a particularly versatile and wide–ranging analysis.

64 Pennebaker 227.

In this section, representing the vast bulk of the primary research in this study, we will examine the results returned from our corpora of shareholder letters from financial institutions in terms of Pennebaker’s categories. While some of these will surely prove more germane than others, the strategy in this study is to at least glance at every single category in order to increase the likelihood that hints will be picked up regarding what Nord would call cultural and linguistic conventions of style and socio–cultural positioning of the voices in these corporate communication instruments, and also any differences between the language profiles of those letters originally penned in English and the ones that were translated into that language from French as part of an international, multilingual communication strategy. We will begin with Pennebaker’s linguistic process categories, as they appear to be the most relevant, then follow up with a more rapid analysis of our LIWC results in the psychological process categories before very briefly touching on the final two groups of categories, which are less relevant to our particular study, although still potentially capable of providing some insights into the texts comprising our corpora. Additionally, we will examine LIWC results, and occasionally LSM scores, for shareholder letters from the same years translated at BCV with an eye to highlighting areas of note, either because they show particularities of the translations produced by that team with whom the author of this study has worked over a number of years, or to point out possible areas where tweaks could result in a translatum offering higher coherence with the linguistic conventions for shareholder letters from financial–sector annual reports.

Linguistic Processes

Word Counts

The first of Pennebaker’s categories that we will examine is the raw number of words in the constituent files of the corpus. We immediately observe a striking disparity between the shareholder letters written in English and those translated from French to English: the former are far lengthier in terms of number of words employed, with 204,853 words in total compared to 72,564 for the letters translated from French, representing average word counts per shareholder letter of 4,097 and 1,451 respectively. Even accounting for the occasional existence of two letters in English–original annual reports, this still shows that the communications being analyzed in this study are on average nearly twice as long when written originally in English as they are when they are translated into English from French. This is somewhat counterintuitive given the prevailing impression that English–users are more concise, and less verbose, than counterparts from the French–speaking world. However, taken at face value, this divergence marks an apparent cultural difference at the level of the preparers of annual reports for financial institutions: less information is being offered by the corporate communication specialists at the French–language banks than by their counterparts from the Anglosphere. Put another way, this shows a relative taciturnity on the part of French–language financial executives. For the translator, this is clearly not the sort of language feature that can be brought easily into line, as said executives would be exceedingly unlikely to encourage their English–language translators to pad out their comments to shareholders to the extent of doubling the extant text. (On a purely logistical note, it would also be rather awkward to see such a divergence between the bound copies of the French and English annual reports, not to mention the additional problems this would pose in pagination, illustration, printing, etc.)

65 Pennebaker, James W., et al., "The Development and Psychometric Properties of LIWC2007", LIWC.net (2007) 5-6.
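The averages quoted above can be rechecked with a short calculation. The sketch below assumes 50 units per sub–corpus (ten institutions over five years); that divisor is not stated in this section, but it reproduces the reported averages. The last line prints the raw ratio of the two averages, before any adjustment for annual reports that contain two letters.

```python
# Totals reported above; the number of units per sub-corpus is an
# assumption (ten institutions over five years), not a figure from the text.
LETTERS_PER_CORPUS = 50

english_total = 204_853
translated_total = 72_564

english_mean = english_total / LETTERS_PER_CORPUS
translated_mean = translated_total / LETTERS_PER_CORPUS

print(round(english_mean))      # 4097
print(round(translated_mean))   # 1451
print(round(english_mean / translated_mean, 2))  # raw ratio of the averages
```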

On the other hand, corporate communication specialists in financial institutions in the French–speaking world might be interested in this relative paucity of words if they are interested in increasing the coherence of their communication strategies with those of institutions from the United Kingdom and North America. Pennebaker takes overall verbosity as one possible indication of the psycho–social identity of a communicating entity, and according to his analyses this divergence would suggest a greater self–confidence on the part of the English–language entities, and a certain hesitancy would be ascribed to the French–language institutions.66 It is not the purpose of this study to produce a grotesque psychoanalytical model of English–using and French–using financial institutions, and indeed we remain quite skeptical of that sort of analysis in general, but we anticipate that aligning the clues teased out by the word counting and Pennebaker’s established baselines will at the least suggest some broad brushstrokes of potential interest, and maybe even use, to translators in the financial domain.

66 Pennebaker 80.


Taken alone, any of these categories might indicate communicative strategies or conventions (intentional or not) which are not borne out when considered in the context of the many other categories defined—and normed—by Pennebaker. In some cases we may discover commonalities between financial–sector corporate communication, independent of language, which diverge from average English production across the wide socio–cultural swathe considered by Pennebaker in his long–term, and on–going, research. In other cases, which we would expect to be of greater interest to professional translators, particularly those working in the financial sector, we may identify language phenomena which show an inconsistency between the communication strategies employed by English–language institutions and those of similar companies from the French–speaking world. One thing that this very first category allows us to observe is that apparent differences in style exist between individual financial institutions in our survey.

For example, word counts show that JP Morgan Chase is far and away the most verbose of all the institutions examined, with nearly four times the volume of the next closest institution (Wells Fargo), and with almost seven times the production of the most loquacious French–language institution, Rothschild.

Longer words

Unlike the raw number of words considered above, for which it is not possible to calculate a baseline, for the other categories defined by Pennebaker we will be considering frequency of use as a percentage of the total production in a given text. This allows us to more easily observe relative frequency regardless of the overall size of a source text, and is thus a more flexible and subtle analytical tool than simply counting and reporting the number of occurrences of a given word or group of words in a chosen text. This is an instance in which LIWC performs much better than AntConc (used by Wells) and other similar concordancing software packages. Pennebaker has observed that, on average, English–speakers employ words more than six letters long 16.10% of the time. It is immediately striking from our corpus that in both texts originally written in English and those translated from French to English, there is a far higher incidence of words seven letters or longer than at baseline. For those texts from English–language financial institutions, we observe an average of 29.67% for longer words, and for our translated texts that percentage is even higher, at 31.53%. This shows that in the language style employed in shareholder letters from financial institutions, complex vocabulary is nearly twice as prevalent as it is in language use across all sectors; the LSM for baseline English and English shareholder letters is only .70, marking the synchrony between these two subsets of English usage as rather low. BCV comes in much closer to the norm for English–drafted shareholder letters than to that for translated ones, with 29.30% of words more than six letters in length, generating an LSM of .99 with the English–drafted letters, which indicates near–perfect synchrony.
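To make the measure concrete, the share of longer words in a text can be sketched as follows. The function name `long_word_share`, the tokenisation rule and the sample sentence are assumptions for illustration and do not reproduce LIWC's internals.

```python
import re


def long_word_share(text: str, min_letters: int = 7) -> float:
    """Percentage of words in `text` that have at least `min_letters` letters."""
    # Assumption: words are runs of alphabetic characters; LIWC's own
    # tokenisation and handling of hyphens or digits may differ.
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    long_words = [w for w in words if len(w) >= min_letters]
    return 100 * len(long_words) / len(words)


# Invented sample sentence, not taken from the corpus.
sample = "Our diversified business model delivered sustainable results"
print(round(long_word_share(sample), 1))
```

A threshold of seven letters corresponds to the "more than six letters" criterion used for this category.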

Graph 1: Usage rates for words seven letters or longer in length

[Bar chart: words ≥ seven letters in length as %, for Baseline, English, Translated, and BCV; vertical axis from 0 to 35.]

The tight correlation between the original and translated English, with an LSM of .97, as well as the results at BCV, here strongly suggests that the higher intellectual register that this use of longer, more complex words represents is an integral part of the language conventions specific to this particular socio–cultural niche, regardless of whether the banking executives in question operate primarily in New York or Paris. Far from attempting a ‘folksy’ style, the writers of these letters to shareholders, putatively from the highest–ranking executives and administrators of powerful and wealthy companies, instead clearly embrace a style that is significantly more ‘highbrow’ than the average enunciation on the street. This clearly indicates that the letters that form the raw material of this research study are carefully–crafted documents, written by people (or teams) interested in the style and vocabulary of their texts, and this deliberateness underscores the pertinence of the choice of this particular textual subset for such an analysis. While the observed frequency of use is quite similar in both our translated and untranslated texts, it is worth noting that there is a wider variance in the English–original texts, with both the lowest and highest percentages of longer–word use coming from that subset of our corpus.

Interestingly, the company already identified as the most voluminous communicator, JP Morgan Chase, also proves to be the entity that employs the lowest percentage of words greater than six letters in length. US Bancorp edges out the Royal Bank of Canada as the company using the highest percentage of longer words, with both of them above 34% in this category. All of the French–using institutions fall in the middle of the pack. Whether this is due to translators ‘smoothing out’ language use is not clear, but merely identifying the outliers in this category will begin the process of potentially highlighting stylistic features of individual components of our cross–section of the financial industry, and thereby pointing to possible veins that might be mined for further, more qualitative, study at a later date.

Pennebaker’s ‘Dictionary’

Over the decades he has spent refining LIWC and building his massive corpora, Pennebaker has built what he calls a ‘dictionary’. In fact, this is a compilation of words and word fragments that he has classified as representing his various categories. A word may be classified in more than one category, in some cases simultaneously reflecting multiple categories, in some cases contextually only representing one or more. One of the categories that LIWC tracks is the percentage of words in analyzed texts that were matched to one or more of Pennebaker’s categories. Our results from our original–English texts show a marked correlation with Pennebaker’s observations: 83.29% of the words in these texts were present in his dictionary, while the average result is 82.42%. Our originally French texts showed a somewhat lower correlation, only 78.41%. This may be imputed to a higher proportion of non–Anglophone proper nouns in texts describing people and places from outside the political and cultural boundaries of the English–speaking world. It may also suggest a slight degree of the ‘exotic’ from the original French that shows up even through the filter of translation.
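A dictionary of words and word fragments, and the resulting match rate, can be sketched as follows. Everything in this example is hypothetical: the entries, the category names and the trailing `*` used to mark a fragment are illustrative choices and are not taken from the LIWC2007 dictionary.

```python
import re

# Hypothetical mini-dictionary for illustration only; the entries and
# category names are NOT taken from the LIWC2007 dictionary. A trailing
# "*" marks a word fragment that matches any word starting with it.
TOY_DICTIONARY = {
    "we": {"pronoun", "function"},
    "our": {"pronoun", "function"},
    "the": {"article", "function"},
    "in": {"preposition", "function"},
    "grow*": {"achievement"},
}


def categories_for(word: str) -> set[str]:
    """Return every category whose entry matches `word`."""
    matched: set[str] = set()
    for entry, cats in TOY_DICTIONARY.items():
        if entry.endswith("*"):
            if word.startswith(entry[:-1]):
                matched |= cats
        elif word == entry:
            matched |= cats
    return matched


def dictionary_coverage(text: str) -> float:
    """Percentage of words matched by at least one dictionary entry."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if categories_for(w))
    return 100 * hits / len(words)


# Invented sample sentence: the proper noun finds no dictionary match.
print(dictionary_coverage("We delivered growth in Lausanne"))
```

In this toy case a proper noun such as a place name lowers the match rate, which is the mechanism suggested above for the translated texts.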

On the correlation side, Wells Fargo is the outlier with 85.02% of the words from those texts finding a match in Pennebaker’s dictionary. The lowest correlation, as we might expect, is from a French–language institution, the Compagnie Financière Tradition, with only 74.55% of its words finding dictionary matches. This spread of roughly ten and a half percentage points indicates that while, on average, there may not be a huge disparity between the French–original and English–original texts, when we zoom in we still find quite a bit of relief in the numbers. Surprisingly, the letters from BCV score 76.13%, suggesting that the in–house language at BCV is less well aligned with that of the English language as used in day–to–day life than it is with other financial reporting texts.

Going forward in this research study, we will examine each of the categories in Pennebaker’s dictionary in order to render our analysis more systematic and less prone to cherry picking. Some of the categories will of course provide more interesting or compelling results than others, but it is hoped that the overall, wide–angle approach provided by adhering to Pennebaker’s tried and tested methodology will ultimately point to some surprises that a more targeted analysis of likely phenomena would have missed entirely. And, of course, such a targeted, qualitative analysis can always be performed for any language–use details that this broad–based approach might highlight as promising avenues for further research.

Function words

As detailed earlier, Pennebaker’s research has tended to suggest that the relative frequency of use of what he calls ‘function’, or ‘stealth’, words, that is to say words that provide linguistic framework more than providing specific content details, is an excellent tool for the analysis of the verbal or written production of a person (or of an entity). The suite of analytical approaches that this opens has resulted in much of the attention to Pennebaker in the popular press, especially regarding his analyses of pronoun usage by US presidents and the intimations regarding their personalities that those studies conveyed (for one such article, see Ben Zimmer’s piece in the New York Times from August 2011).67 Pennebaker’s dictionary includes in the “Function words” category all manner of pronouns, prepositions, articles, and relatively short and commonly–used adjectives and adverbs, as well as a variety of other words and terms.68 It is immediately apparent from analysis of our corpus that there is a lower incidence of function word use in financial–sector shareholder letters than the average baseline usage determined by Pennebaker. He has established the norm in this category as 54.85%, which is a surprisingly high proportion of words that do not relate specific details, but instead form the framework inside which specific content words are arranged. Neither the English–original texts nor the translated texts in our corpus attain even 50% function–word usage, let alone the nearly 55% that Pennebaker has observed as an average result. Our English texts come in at 49.67% function words, roughly ten percent lower than Pennebaker’s average, but scoring an LSM of .95 that indicates high synchrony. The originally French texts show 46.98% function–word usage, nearly 15% below the average, and with an LSM of .92, lower but nevertheless still in the high range for synchrony.

67 Zimmer, Ben, "The Power of Pronouns", New York: The New York Times (26 August 2011).

68 Campbell, Sherlock, et al., LIWC2007 Dictionary Poster, Austin, Texas: LIWC.net, 2007.

Graph 2: Usage rates for function words

[Bar chart: function word usage as %, for Baseline, English, Translated, and BCV; vertical axis from 0 to 60.]

This represents the first indication from our corpus comparison study that a measurable divergence exists, at a linguistically fundamental level, between texts that were originally drafted in English and others that represent the product of a process of translation from French. If we accept Pennebaker’s supposition that function words really are key to the way different people and actors communicate, then this represents potentially fertile ground for further study. The fact that the texts from BCV come in even lower than the average for the translated texts, at 44.52% for an LSM of .77, further underscores this category as representing a tantalizing subset of words for further analysis, with the caveat that the selection of words in this category may also require some further scrutiny in order to determine if there are in fact words commonly serving as “function” words in shareholder letters that for some reason have not been included in Pennebaker’s dictionary entry for this category. It is to be hoped that the analyses for specific sub–categories of function words will shed some immediate light on these surprisingly divergent LIWC results.

Pronouns

Pennebaker has compiled a comprehensive list of words that act as pronouns, broken down further into personal pronouns and impersonal pronouns. He also tracks individual types of personal pronouns, with subcategories for first–person singular, first–person plural, second–person, third–person singular, and third–person plural pronouns. He clearly finds pronouns compelling from a research perspective, but they also seem to represent somewhat of a calling card for this scholar, to the extent that they were even foregrounded in the title of his highest–profile monograph. According to Pennebaker’s analysis of his comprehensive corpora of English–language production, on average 15.03% of English discourse is composed of pronouns, making this an extremely impactful category by sheer volume alone. Indeed, pronouns constitute the most voluminous category tracked by Pennebaker apart from function words as a whole.

It is immediately apparent from the analysis of our corpora that the shareholder letters from financial institutions show a markedly lower rate of pronoun usage than does the English language as a whole. The English–written letters, at 9.91%, are a third lighter in pronouns than the baseline determined by Pennebaker. The letters originally written in French display an even lower rate of pronoun usage than those drafted in English, with only 7.62% of these texts comprising pronouns across all categories. The texts from BCV, at 8.75%, come in almost exactly midway between those of the two corpora.

Graph 3: Usage rates for pronouns

[Bar chart: pronoun usage as %, for Baseline, English, Translated, and BCV; vertical axis from 0 to 16.]

The percentage from the corpus of translated texts is just over half as high as that of the baseline, indicating that these letters diverge quite significantly from the general language norm, and even noticeably from the English–language texts from corresponding institutions from the English–speaking regions of the world. The fact that English–language banks and financial institutions utilize pronouns with less frequency in their corporate communication instruments than do general users of English in forums such as fireside chats is perhaps not immediately surprising, but given that the vast majority of these documents are couched in terms of letters, or messages, from a person who in some way embodies the company they direct, it does not seem unrealistic to expect personal pronouns to remain quite prevalent, and there is no immediate reason to expect usage of impersonal pronouns to be affected at all. Nevertheless, the numbers in this category plainly show an across–the–board phenomenon that apparently serves as a distinctive convention of the language style of this sort of element of financial institutions’ annual reporting. Further investigation of the various sub–categories of pronouns will hopefully clarify in which areas this distinct style is the most clearly expressed, and also will point to the sub–categories showing the most marked divergence between this distinctive style and that of the translated French annual–reporting documents.

Impersonal pronouns

Pennebaker has identified a comprehensive list of impersonal pronouns.69 They make up, on average, 4.96% of English–language production, or roughly one third of all pronoun usage. As a cursory glance at the words comprising this category reveals, these are true function words in the Pennebakerian sense, in that they form the scaffolding on which meaning is hung by content–specific words, and in that they are generally in the background and not heavily emphasized. As such, we would not expect there to be significant variance between the base usage rate of these words and the specific proportion of shareholder letters made up of these words. The LIWC results bear out this supposition, showing that in our English–original texts, impersonal pronouns make up 3.39% of the words employed, with the French–original texts coming in somewhat higher at 3.80%. While in both cases impersonal pronouns are under–represented in the financial communication texts compared with the language overall, the difference is slight enough that it would be hard to make any compelling claims about stylistic or socio–cultural positioning based upon these terms. The difference already detected between pronoun usage in our corpus texts and the English language as a whole must then be mostly a phenomenon of personal pronouns.

The results from BCV are markedly lower in this category than in the baseline, at 2.56%, and in fact are lower than both the English–original texts and the translated shareholder letters, although not the lowest results posted for an individual bank. The general inconsistency of the results in this category when examined bank by bank, ranging from 1.78% at the Royal Bank of Canada to 6.22% at Crédit Mutuel, with no really clear trend distinguishing the translated texts from the English–original letters, marks the impersonal pronoun category as somewhat of an anomaly, and therefore potentially representing an intriguing – or frustrating – line for further research.

Personal pronouns

Pennebaker has compiled a far–ranging list of personal pronouns in his dictionary.70 He has observed that personal pronouns comprise 10.07% of general English language production as observed in his massive (and ever–increasing) corpora. However, this linguistic tithe does not seem to have been forthcoming in our financial annual–reporting documents. The texts originally written in English show a lower prevalence of personal pronouns, with only 6.52% of their language made up of words from this category. This confirms our supposition above that the stylistic divergence between baseline English and the language style specific to shareholder letters, where pronouns are concerned, is rooted in personal, rather than impersonal, pronouns. But the numbers regarding the translated letters to shareholders show a difference that is much more pronounced, between both the baseline numbers and, more intriguingly for the purposes of this study, the LIWC results for the English–language letters. At only 3.82%, the originally French letters show slightly more than half as much personal pronoun use as that in corresponding letters drafted in English, and just over a third as much personal pronoun usage as is observed in the language generally. Here, then, is the kind of linguistic phenomenon that LIWC can highlight, and an obvious venue for deeper, more qualitative, analytical digging.

69 Campbell et al., column K.

70 Campbell et al., column E.
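The claim that the pronoun gap between baseline English and the English–original letters lies mainly with personal pronouns can be checked against the reported rates. The sketch below splits the gap in percentage points between the two pronoun types; the variable names are illustrative, and the figures are those given in this study.

```python
# Usage rates (% of all words) reported in this study.
baseline = {"personal": 10.07, "impersonal": 4.96}
english = {"personal": 6.52, "impersonal": 3.39}

# Gap in percentage points between baseline English and the
# English-original shareholder letters, per pronoun type.
gaps = {kind: baseline[kind] - english[kind] for kind in baseline}
total_gap = sum(gaps.values())

for kind, gap in gaps.items():
    share = 100 * gap / total_gap
    print(f"{kind}: gap {gap:.2f} points, {share:.0f}% of the total gap")
```

The total gap recovered this way matches the difference between the overall pronoun rates of 15.03% and 9.91% reported earlier.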

The results from BCV are more closely aligned to the English–original texts than the other translations from French, coming in at 6.19%, marking this as a category likely reflecting the application of the research conducted at BCV and developed by Wells, particularly regarding first–person plural pronouns.71 The almost perfectly synchronous letters from English–language banks and BCV show an LSM of only .79 and .76 respectively when compared to baseline English usage, showing that they represent genre conventions which are stronger than the general patterns in language use.

71 Wells 76

Graph 4: Usage rates for personal pronouns

Further analysis of the specific sub–categories of personal pronouns should further sharpen the focal point of this target for qualitative analysis using a traditional concordance software package.

First person singular

“I” and associated pronouns are the sub–category of personal pronouns that Pennebaker has observed being used the most extensively in his baseline corpora.72 Indeed, they constitute more than half of all personal pronouns used in English on average, and make up 5.72% of all the words employed in typical communication. It is therefore quite striking to examine the results for this category returned by the processing through LIWC of our corpora of shareholder letters. Letters from our English–language corpus show a vastly lower incidence of “I” word usage, only 0.38% of total words employed, while the translated texts show even less “I” word usage, at 0.23% of the total. This means that first–person singular personal pronouns are used only one fifteenth, or 6.6%, as frequently in shareholder letters written in English as they are in general, which yields an extremely low LSM of .12.

The divergence is even more striking in the French–original letters, where “I” words are used only 4% as frequently as in baseline English, a vanishingly low LSM of .08.
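As a worked check, the two LSM scores and the two relative frequencies quoted for this sub–category follow directly from the reported percentages (5.72% at baseline, 0.38% in the English–original letters and 0.23% in the translated letters):

$$\mathrm{LSM}_{\text{English}} = 1 - \frac{|5.72 - 0.38|}{5.72 + 0.38} = 1 - \frac{5.34}{6.10} \approx 0.12$$

$$\mathrm{LSM}_{\text{translated}} = 1 - \frac{|5.72 - 0.23|}{5.72 + 0.23} = 1 - \frac{5.49}{5.95} \approx 0.08$$

$$\frac{0.38}{5.72} \approx 0.066, \qquad \frac{0.23}{5.72} \approx 0.040$$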

Here, then, is an extremely significant feature of shareholder letters in general, and also a case in which the translated letters do not correlate perfectly. The original to translated

72 Campbell et al., column F.

[Graph 4, bar chart: personal pronoun usage as %, for Baseline, English, Translated, and BCV; vertical axis from 0 to 12.]
