
4. METHODOLOGY

4.2. The corpora

4.2.1. The French monolingual corpus

FR-All contained 50 original French-language earnings releases. I chose to use only earnings releases issued by banks engaged in retail banking in order to have a more uniform corpus and facilitate comparisons among the constituent texts. This section will begin by providing an overview of the corpus and presenting its main characteristics (Section 4.2.1.1). Then it will describe how the texts were assembled (Section 4.2.1.2) and cleaned up (Section 4.2.1.3).

4.2.1.1. Overview of French monolingual corpus

Of the 50 earnings releases comprising FR-All, 20 were issued by French banks (Groupe BPCE,¹⁰ Crédit du Nord Group and BFCM Group) and 30 by Swiss banks (Banque Cantonale de Genève, BCV Group, Banque Cantonale du Jura and Banque Cantonale Neuchâteloise). (The nationality of these banks was not considered as a variable in any of the analyses in this study.) Table 4.1 summarizes the contents of this corpus. The word counts are based on the cleaned-up versions of the releases (that is, the releases in the form in which they actually appear in the corpus). The process of cleaning up the corpus is described in Section 4.2.1.3. Table 4.1 also indicates the banks for which human translations are available. The releases of those banks comprise FR1. The table includes the number of occurrences found of synonyms expressing the concept of an increase; further information on these occurrences and how they were found will be provided in Section 4.3.

10 The bank names used in this table and the rest of this thesis are taken from the human translations of the relevant bank’s earnings releases, where available. While the word “Groupe” is translated as “Group” in the relevant bank name in the BFCM Group, Crédit du Nord Group and BCV Group earnings releases, Groupe BPCE is called “Groupe BPCE” in the human translations of that bank’s earnings releases.

Table 4.1: French monolingual corpus (first column: Bank name)

The shortest release, at 399 words,¹¹ was issued by Banque Cantonale du Jura and the longest, at 4,967 words, came from Groupe BPCE. Word counts for the individual earnings releases comprising the corpus (as well as the GNMT and human translations) can be found in Annex 4. The sources and titles of these releases, as well as the dates they were issued, are provided in Annex 1. The document names used in the annexes and the rest of this thesis conform to the following structure: each name begins with a number (from 1 to 50) that is unique to the earnings release. An abbreviated form of the bank name follows the first underscore. (These abbreviated names are shown in brackets in the table above.) An indication of the financial period follows the second underscore. This contains the last two digits of the year and the letters “HY” for the first half-year period in a year, “FY” for the full year, or “1Q” or “3Q” if the release covers just the first or third quarter. If reference is being made only to the French-language original, the human translation or the GNMT translation, there is a final underscore followed by the letters “FR,” “HT,” or “GT,” respectively. (For example, the document name 1_BPCE_17FY_HT designates the human translation of the Groupe BPCE earnings release that reports the bank’s full-year results for 2017.)

11 All word counts provided in this and the following sections are after clean-up.
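The naming structure described above is regular enough to be checked mechanically. The following is a minimal sketch of how such names could be parsed. It was not part of the method used in this study. The pattern, the `DocumentName` type and the `parse_document_name` function are illustrative assumptions, including the assumption that a bank abbreviation never contains an underscore.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, derived from the naming description:
# number _ bank abbreviation _ two-digit year + period code [ _ version ]
NAME_PATTERN = re.compile(
    r"(?P<number>\d{1,2})_(?P<bank>[^_]+)_"
    r"(?P<year>\d{2})(?P<period>HY|FY|1Q|3Q)"
    r"(?:_(?P<version>FR|HT|GT))?"
)


class DocumentName(NamedTuple):
    """Hypothetical container for the parts of a document name."""
    number: int             # unique release number, 1 to 50
    bank: str               # abbreviated bank name
    year: int               # last two digits of the year
    period: str             # "HY", "FY", "1Q" or "3Q"
    version: Optional[str]  # "FR", "HT", "GT", or None if not specified


def parse_document_name(name: str) -> DocumentName:
    """Split a document name into its components or raise ValueError."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid document name: {name!r}")
    number = int(match["number"])
    if not 1 <= number <= 50:
        raise ValueError(f"release number out of range: {number}")
    return DocumentName(
        number=number,
        bank=match["bank"],
        year=int(match["year"]),
        period=match["period"],
        version=match["version"],  # None when the suffix is absent
    )


print(parse_document_name("1_BPCE_17FY_HT"))
```

For the example given in the text, the parsed fields are release number 1, bank abbreviation BPCE, year 17, period FY and version HT.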

4.2.1.2. Compilation of corpus

In order to find earnings releases, I used Google’s search engine and entered queries such as “banque communiqué de presse résultats financiers.” Because I had decided to include only banks that had a retail banking business, I excluded any private banks that appeared in the search results. I also excluded any banks where the French earnings release may not have been the original (for example, Swiss cantonal banks that publish German and French versions of their earnings releases, as the German version could have been the original, and major Swiss and French banks with a significant international presence, as their earnings releases could have originally been drafted in English). I excluded releases that could not be easily copied and pasted into a Word document or text file or that had a complex layout, as the releases would eventually need to be in a format that could be read by Google Translate and the concordancer AntConc.

I only used earnings releases that were published on the corresponding bank’s website; I did not take any releases from third-party websites. Once I found one earnings release on a bank website, it was generally easy to find other releases issued by the same bank for other financial periods. I placed no restrictions on the financial period covered, so the earnings releases included in the corpus sometimes cover a quarter, sometimes a six-month period, and sometimes an entire year. A release covering a six-month period will include comparisons to the immediately preceding six-month period and to the same six-month period of the prior year, but it will often also include comparisons for the quarters comprising the six-month period. Similarly, an earnings release covering an entire year might also include comparisons for the second six-month period or the last quarter of that year.

The releases from all of the banks were labeled “communiqué de presse” except for three, issued by Banque Cantonale Neuchâteloise. Each of the three was labeled as being intended for a “conférence de presse,” was between 9 and 12 pages in length, covered a full-year period and contained various components. The first of these components was a two- to three-page communiqué de presse; this was followed by a page of key figures, a balance sheet, an income statement, a statement by the chairperson, comments on the financial year, and a summary of important developments at the bank. Because the contents of the documents as a whole were similar to the contents of other earnings releases, I included the documents in their entirety in the corpus.

4.2.1.3. Clean-up of corpus

All the earnings releases comprising the corpus were available in PDF format at the website addresses indicated in Annex 1. I copied the contents of the releases into Word documents and then removed items that were unlikely to contain the types of comparisons I was interested in (such as tables, footnotes, page numbers, headers, footers, media contact information, financial calendars, notes on accounting methods, accounting definitions, a note that appears at the end of the Swiss earnings releases which states that the release was issued outside the Swiss stock exchange’s opening hours,¹² and a section present in a number of releases that provides general information about the bank in question). This not only made the size of the corpus more manageable and the texts easier to consult, but also made it easier to review the concordance search results in AntConc. Once the releases were copied into Word documents, the items mentioned above sometimes appeared in the middle of sentences and, if not removed, could have interfered with the identification of relevant comparisons in the AntConc concordance pane.

After having the French-language releases translated by Google Translate, I saw that the Groupe BPCE releases had not been translated in their entirety because of their length. For each release, the system stopped translating at the end of the last full paragraph before the 5,000-word mark. Once I realized this, I went back to the French monolingual corpus and deleted the portions of the Groupe BPCE releases that Google Translate had left untranslated.
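The truncation described above was carried out by hand. Purely as an illustration, the same cut-off rule (keep whole paragraphs up to the last one that ends before the word limit) could be sketched as follows. The function name `truncate_at_paragraph` is hypothetical. The sketch assumes that paragraphs are separated by blank lines and that words are counted by splitting on whitespace, which may not match how Google Translate or Word counts them.

```python
def truncate_at_paragraph(text: str, max_words: int = 5000) -> str:
    """Keep leading whole paragraphs whose combined length stays within max_words.

    Assumptions (not taken from the study): paragraphs are separated by
    blank lines, and a word is any whitespace-delimited token.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    kept = []
    total = 0
    for paragraph in paragraphs:
        count = len(paragraph.split())
        if total + count > max_words:
            break  # this paragraph would cross the limit, so stop before it
        kept.append(paragraph)
        total += count
    return "\n\n".join(kept)


sample = "un deux trois\n\nquatre cinq\n\nsix sept huit"
print(truncate_at_paragraph(sample, max_words=5))
```

With a limit of five words, the sample keeps its first two paragraphs (five words in total) and drops the third.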

After cleaning up the earnings releases in Word, I saved each one as a text file. All 50 text files were placed in one folder, which made the corpus available in a format that could be read by AntConc.
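With the corpus stored as plain text files in a single folder, per-release word counts of the kind reported in this section could be reproduced with a short script. The sketch below is an assumption-laden illustration, not the procedure used in this study: the function names are hypothetical, the files are assumed to be UTF-8 encoded with a `.txt` extension, and words are counted by whitespace splitting, so the figures may differ slightly from those produced by Word or AntConc.

```python
from pathlib import Path
from typing import Dict, Tuple


def count_words(folder: str) -> Dict[str, int]:
    """Return a mapping from file name (without extension) to word count."""
    counts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")  # encoding is an assumption
        counts[path.stem] = len(text.split())    # whitespace-delimited tokens
    return counts


def shortest_and_longest(counts: Dict[str, int]) -> Tuple[str, str]:
    """Return the names of the shortest and the longest documents."""
    if not counts:
        raise ValueError("no documents to compare")
    return min(counts, key=counts.get), max(counts, key=counts.get)
```

Calling `shortest_and_longest(count_words("corpus"))` on a folder named `corpus` (a placeholder path) would return the document names of the shortest and longest releases under these counting assumptions.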