
Perceiving the World through Statistics: ENBIS-14


(1)

Perceiving the World through Statistics:

Some Dimensional Thoughts

Michel Lutz

OCTO Technology

Rodolphe Le Riche

CNRS and Ecole des Mines de Saint-Etienne

ENBIS-14

(2)

Motivation

Statistical analysis: positive facts or pure mental construction?

Our belief: a standpoint over contingency (Desrosières, 2010), knowledge creation, useful for decision-making, based on our capacity to grasp our multidimensional world

Consequence: « true », but inevitably partial and distorted descriptions of a complex world

(3)

Statistical analysis ≠ THE Truth

Risks that can be avoided/controlled by a careful analysis:

Spurious correlation due to hidden variables, large samples (Granville, 2013) or compositional data (Pearson, 1897)

Spurious regression caused by the presence of a unit root (Granger & Newbold, 1974)

Bertrand paradox: probabilities may be influenced by the mechanisms or methods that produce the random variables

Inevitable consequences of the analysis:

Distortions of our infinite-dimensional world, which cannot be analyzed as a whole (Rucker, 1984)

This is the scope of this presentation

(4)

Metaphor: welcome to Flatland

(Abbott, 2013)

(5)

Proposition: when crunching data, we are travelling through hyperspaces

[Figure: knowledge and its dimension D]

(Tsoukas & Vladimirou, 2001; Alavi & Leidner, 2001; Tuomi, 1999)

(6)

Proposition

knowledge creation

=

a dynamic back & forth process

between higher and lower dimensional hyperspaces

(7)

Going down through dimensions

$D_{\text{real}} = \infty$

$D_{\text{data}} = M$

$D_{\text{information}} = m$

$D_{\text{knowledge}} = n$

$\infty > M > m > n$

(8)

Why? Cognitive limitations

Capacity of perception

• High dimension is not intuitive

Interpretation is difficult

(9)

Why? Technical limitations

Capacities to measure, store and process data are limited

Curse of dimensionality: the higher the dimension, the more data are needed (Donoho, 2000)
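One common way to illustrate the curse of dimensionality is to count how many samples a regular grid needs in order to keep the same resolution along every axis. The sketch below assumes 10 points per axis, an arbitrary illustrative choice; the function name `samples_needed` is hypothetical and this is not necessarily the formulation used in Donoho (2000).

```python
def samples_needed(dimension, points_per_axis=10):
    """Number of grid points needed to keep `points_per_axis`
    samples along each of `dimension` axes (illustrative assumption)."""
    return points_per_axis ** dimension


if __name__ == "__main__":
    # The required sample count grows exponentially with the dimension.
    for d in (1, 2, 3, 10):
        print(f"dimension {d}: {samples_needed(d)} samples")
```

With this assumption, covering a 10-dimensional space at the resolution that 10 points give on a line already requires ten billion samples.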

(10)

Consequence: distortions

[Figure: points A, B and C plotted in the (x1, x2) plane]

Dimension: number of variables

Event: ... is closer to ... than ... (Euclidean norm)

[Let $A(x_a, y_a)$ and $B(x_b, y_b)$; $\mathrm{dist}(AB) = \sqrt{(x_b - x_a)^2 + (y_b - y_a)^2}$]

Data: A is farther from B than C.

According to x1: A is closer to B than C.

According to x2: A is farther from B than C.
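The distortion can be reproduced numerically. The sketch below uses made-up coordinates for A, B and C (the slide gives none) and reads "A is closer to B than C" as "the distance AB is smaller than the distance AC", which is an assumption about the intended meaning. It compares the two distances in the full plane and along each axis separately.

```python
import math

# Hypothetical coordinates (x1, x2), chosen only to reproduce the effect.
A, B, C = (0.0, 0.0), (1.0, 5.0), (3.0, 1.0)


def dist(p, q, axes=(0, 1)):
    """Euclidean distance between p and q restricted to the given axes."""
    return math.sqrt(sum((q[i] - p[i]) ** 2 for i in axes))


def a_closer_to_b_than_c(axes):
    """True if A is closer to B than to C when only `axes` are observed."""
    return dist(A, B, axes) < dist(A, C, axes)


full_data = a_closer_to_b_than_c((0, 1))  # both variables
only_x1 = a_closer_to_b_than_c((0,))      # projection on x1
only_x2 = a_closer_to_b_than_c((1,))      # projection on x2

if __name__ == "__main__":
    print("full data:", full_data)
    print("x1 only:  ", only_x1)
    print("x2 only:  ", only_x2)
```

With these coordinates the projection on x1 gives the opposite answer to the full data, while the projection on x2 agrees with it.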

(11)

Consequence: distortions

CONTRADICTION

Definition: “knowledge of reality is contradictory when it is both A and non-A”

Partial knowledge

(12)

Going up through dimensions thanks to technical innovations

• « Big Data » architectures: more measurements, more storage, more processing capacity

Machinery can help us grasp more complexity

(13)

Going up through dimensions thanks to ad hoc quantitative analysis

• Useful to overcome some contradictions…

Ex.: concatenation, SVM classification (Vapnik, 1995; Aizerman et al., 1964)
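The idea behind going up in dimension can be sketched without a full SVM. The toy example below is an assumption-laden illustration, not the method from the cited references: four hypothetical points on a line cannot be separated by any single threshold, but adding the explicit feature x² (a simple stand-in for what a kernel does implicitly) makes them separable.

```python
# Hypothetical 1-D data: the outer points belong to class 1, the inner to class 0.
points = [-2.0, -1.0, 1.0, 2.0]
labels = [1, 0, 0, 1]


def separable_by_threshold(values, labels):
    """True if one threshold puts every class-0 value strictly on one side
    and every class-1 value strictly on the other."""
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return max(zeros) < min(ones) or max(ones) < min(zeros)


def lift(x):
    """Explicit feature map from 1-D to 2-D: x -> (x, x^2)."""
    return (x, x * x)


separable_in_1d = separable_by_threshold(points, labels)
lifted = [lift(x) for x in points]
# In the lifted space, a threshold on the second coordinate is a linear separator.
separable_in_2d = separable_by_threshold([z[1] for z in lifted], labels)

if __name__ == "__main__":
    print("separable in 1-D:", separable_in_1d)
    print("separable after lifting to 2-D:", separable_in_2d)
```

The contradiction in one dimension (no threshold agrees with all labels) disappears once the extra dimension is available.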

(14)

Going up through dimensions

• … can also lead to other contradictions…

Ex.: SVM kernel trick: several kernels can be used!

• … and cognitive limitations still exist!

Perceiving hyperspaces (> 3 dimensions) is not straightforward for humans and requires specific training
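The remark that several kernels can be used can be made concrete with a small sketch. It assumes two standard kernels (linear and Gaussian RBF) and three hypothetical points, and shows that the two kernels disagree on which of B or C is more similar to A. The value of `gamma` is an arbitrary illustrative choice.

```python
import math


def linear_kernel(u, v):
    """Linear kernel: plain dot product."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# Hypothetical points, chosen only to make the two kernels disagree.
A, B, C = (1.0, 0.0), (3.0, 0.0), (1.0, 0.5)


def most_similar_to_a(kernel):
    """Name of the point (B or C) with the larger kernel value against A."""
    return "B" if kernel(A, B) > kernel(A, C) else "C"


if __name__ == "__main__":
    print("linear kernel:", most_similar_to_a(linear_kernel))
    print("RBF kernel:   ", most_similar_to_a(rbf_kernel))
```

Each kernel defines its own implicit higher-dimensional space, so the choice of kernel can itself produce conflicting readings of the same data.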

(15)

The knowledge: a dynamic back & forth process between higher and lower dimensional hyperspaces

[Figure: cycle diagram. Labels: Data, Too many dimensions, Reduction, Knowledge, Contradiction, More dimensions, Statistics (twice), D]

Human knowledge is dynamic, recurring and constantly surpassing itself

(16)

Conclusions

Dynamic human knowledge process: towards complex thinking (Morin, 2005)

Statistics serve human cognition: tools and methods to build reasoned and dialectic knowledge

Debate:

Machine (data processing) + human (intuition, interpretation) = increased human knowledge for decision-making

How far can the limits of human intelligence be pushed by artificial intelligence?

Could human limitations be surpassed by artificial intelligence?

E.g., could humans be excluded from the knowledge process for decision-making?

Black box models

Does a machine generate knowledge for itself?

(17)

References

Abbott, Flatland, Librio Littérature, 2013

Alavi & Leidner, Knowledge management and knowledge management systems: conceptual foundations and research issues, MIS Quarterly, 2001

Aizerman, Braverman & Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control, 1964

Desrosières, La politique des grands nombres – Histoire de la raison statistique, La Découverte, 2010

Donoho, Aide-Mémoire. High-dimensional data analysis: the curses and blessings of dimensionality, Department of Statistics, Stanford University, 2000

Granger & Newbold, Spurious Regression in Econometrics, Journal of Econometrics, 1974

Granville, The curse of big data, 2013, http://www.analyticbridge.com/profiles/blogs/the-curse-of-big-data

Morin, Introduction à la pensée complexe, Seuil, 2005

Pearson, Mathematical Contributions to the Theory of Evolution—On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs, Proceedings of the Royal Society of London, 1897

Rucker, The fourth dimension – A guided tour of the higher universes, Houghton Mifflin Company, 1984

Tsoukas & Vladimirou, What is organizational knowledge?, Journal of Management Studies, 2001

Tuomi, Data is more than knowledge – Implications of the reversed knowledge hierarchy for knowledge management and organizational memory, Journal of Management Information Systems, 1999

Vapnik, The nature of statistical learning theory, Springer-Verlag, 1995

(18)

Thank you ENBIS-14

Michel Lutz & Rodolphe Le Riche
