Using word embeddings to explore the Aboriginality discourse in a corpus of Australian Aboriginal autobiographies



November 1, 2019

1 Poster submission - Abstract

The genre of Australian Aboriginal autobiography emerged in the 1960s with the rise of the Aboriginal rights movement. It is considered a literature of significant socio-political importance: its authors share an alternative history, a truth that differs from the one previously asserted by European settlers, in which Indigenous peoples and cultures were either misrepresented or ignored [1, 3].

With this poster submission, we present intermediate results of a work-in-progress PhD project in which we explore a newly created text corpus using the word embeddings technique [2]. We have analysed the discursive spaces of a corpus of 92 Australian Aboriginal autobiographies published between the 1970s and the 2010s, focusing in particular on the discourses of race and Aboriginality as some of the dominant themes characterizing the genre. The books have been digitised and converted into machine-readable text, and the words have been mapped to real-valued vectors (word embeddings), which are believed to reflect various linguistic properties and make it possible to calculate the semantic similarity between words.
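To make the notion of semantic similarity between word vectors concrete, the following minimal sketch computes cosine similarity, a measure commonly used with word embeddings. It is an illustration only: the three-dimensional vectors and the words chosen are hypothetical and are not taken from the corpus or from the trained model described here.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors, invented for illustration only.
# Real embeddings are learned from text and have far more dimensions.
toy_vectors = {
    "country": [0.9, 0.1, 0.3],
    "land": [0.8, 0.2, 0.4],
    "school": [0.1, 0.9, 0.2],
}

sim_close = cosine_similarity(toy_vectors["country"], toy_vectors["land"])
sim_far = cosine_similarity(toy_vectors["country"], toy_vectors["school"])

print(f"country ~ land:   {sim_close:.3f}")
print(f"country ~ school: {sim_far:.3f}")
```

Vectors pointing in similar directions score close to 1, while unrelated directions score close to 0, which is what allows words to be compared numerically.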

Word embedding models have recently been used in Digital Humanities as an exploratory technique to complement and guide traditional close reading approaches, a use justified by their potential to identify word use patterns in a collection of texts. Word embeddings also capture a great deal of cultural information, as language is considered a reflection of the cultural system of the social group using it.
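One common way to use embeddings for exploration is to list the nearest neighbours of a word of interest and then return to the texts to read the passages where those words occur. The sketch below shows the idea under stated assumptions: the function name `nearest_neighbours`, the vocabulary, and the toy vectors are all hypothetical and do not reproduce the model or the results of this project.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(query, vectors, top_n=3):
    """Rank all other words by cosine similarity to `query`, highest first."""
    scores = [
        (word, cosine(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy vectors, invented for illustration only.
toy_vectors = {
    "mother": [0.9, 0.2, 0.1],
    "father": [0.8, 0.3, 0.1],
    "family": [0.7, 0.4, 0.2],
    "river": [0.1, 0.2, 0.9],
    "school": [0.2, 0.9, 0.3],
}

for word, score in nearest_neighbours("mother", toy_vectors):
    print(f"{word}: {score:.3f}")
```

The ranked list is a prompt for close reading rather than a finding in itself: it indicates which words are used in similar contexts, not why.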

The results of our initial experiments suggest that the trained word embeddings reflect the historical, political, and cultural environment of the authors, as well as some unique experiences and perspectives related to their racial and gender identity.

2 Authors

Ekaterina Kamlovskaya, PhD student at the Doctoral Training Unit "Digital History and Hermeneutics", Centre for Contemporary and Digital History, University of Luxembourg. Graduated in 2009 with a degree in linguistics and translation studies from the Kamchatka State University (Russia).

Christoph Schommer, Associate Professor in Artificial Intelligence, Computer Science and Communications Research Unit, Interdisciplinary Lab for Intelligent and Adaptive Systems (ILIAS), University of Luxembourg.

{ekaterina.kamlovskaya, christoph.schommer}@uni.lu

References

[1] Anita Heiss. Dhuuluu-Yala = To Talk Straight: Publishing Indigenous Literature. Aboriginal Studies Press, Canberra, 2003.

[2] Tomas Mikolov, Greg Corrado, Kai Chen, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. Pages 1–12, 2013.

[3] Adam Shoemaker. Black Words White Page: Aboriginal Literature 1929–1988. ANU E Press, Canberra, 3rd edition, 2004.
