Perceiving the World through Statistics:
Some Dimensional Thoughts
Michel Lutz
OCTO Technology
Rodolphe Le Riche
CNRS and Ecole des Mines de Saint-Etienne
ENBIS-14
Motivation
• Statistical analysis: positive facts or pure mental construction?
• Our belief: a standpoint on contingency (Desrosières,
2010), knowledge creation, useful for decision-making,
based on our capacity to grasp our multidimensional world
• Consequence: "true", but inevitably partial and distorted descriptions of a complex world
Statistical analysis ≠ THE Truth
• Risks that can be avoided/controlled by a careful analysis:
• Spurious correlation due to hidden variables, large samples
(Granville, 2013) or compositional data (Pearson, 1897)
• Spurious regression caused by the presence of a unit root
(Granger & Newbold, 1974)
• Bertrand paradox: probabilities may be influenced by the
mechanisms or methods that produce the random variables
• Inevitable consequences of the analysis:
• Distortions of our infinite-dimensional world, which cannot be analyzed as a whole (Rucker, 1984)
• This is the scope of this presentation
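The Bertrand paradox listed among the risks above can be checked numerically. The sketch below is a minimal simulation, assuming the classic formulation (probability that a random chord of the unit circle is longer than the side of the inscribed equilateral triangle, i.e. longer than √3). The three function names and the sample size are illustrative choices, not part of the original slides.

```python
import math
import random


def chord_from_endpoints(rng):
    # Method 1: join two points drawn uniformly on the unit circle.
    t1 = rng.uniform(0.0, 2.0 * math.pi)
    t2 = rng.uniform(0.0, 2.0 * math.pi)
    return 2.0 * abs(math.sin((t1 - t2) / 2.0))


def chord_from_radius(rng):
    # Method 2: pick a distance uniformly along a radius and draw
    # the chord perpendicular to the radius at that distance.
    d = rng.uniform(0.0, 1.0)
    return 2.0 * math.sqrt(1.0 - d * d)


def chord_from_midpoint(rng):
    # Method 3: pick the chord midpoint uniformly inside the disc;
    # its distance to the centre is then sqrt(u) with u uniform.
    d = math.sqrt(rng.uniform(0.0, 1.0))
    return 2.0 * math.sqrt(1.0 - d * d)


def prob_longer_than_side(method, n=100_000, seed=0):
    # Monte Carlo estimate of P(chord length > sqrt(3)).
    rng = random.Random(seed)
    side = math.sqrt(3.0)
    hits = sum(method(rng) > side for _ in range(n))
    return hits / n


if __name__ == "__main__":
    for method in (chord_from_endpoints, chord_from_radius, chord_from_midpoint):
        print(method.__name__, round(prob_longer_than_side(method), 3))
```

The three estimates land near 1/3, 1/2 and 1/4 respectively: the same question receives three different answers depending on how "a random chord" is generated.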
Metaphor: welcome to Flatland (Abbott, 2013)
Proposition: when crunching data, we are travelling through hyperspaces
Knowledge D
(Tsoukas & Vladimirou, 2001; Alavi & Leidner, 2001; Tuomi, 1999)
Proposition: knowledge creation = a dynamic back & forth process between higher and lower dimensional hyperspaces
Going down through dimensions
D_real = ∞
D_data = M
D_information = m
D_knowledge = n
∞ > M > m > n
Why? Cognitive limitations
• Capacity of perception
• High dimension is not intuitive
• Interpretation is difficult
Why? Technical limitations
• Capacities to measure, store and process data are limited
• Curse of dimensionality: the higher the
dimension, the more data are needed
(Donoho, 2000)
Consequence: distortions
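As a rough illustration of the curse of dimensionality mentioned above: keeping a fixed resolution of k sample points per axis on a regular grid requires k^d points in dimension d. The sketch below only does this arithmetic. The function name and the choice k = 10 are assumptions made for illustration.

```python
def points_needed(points_per_axis, dimension):
    # A regular grid with k points per axis has k**d points in dimension d.
    return points_per_axis ** dimension


if __name__ == "__main__":
    for d in (1, 2, 3, 10):
        print(f"dimension {d}: {points_needed(10, d)} grid points")
```

With 10 points per axis, dimension 3 already needs 1,000 points and dimension 10 needs 10 billion.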
[Figure: points A, B and C plotted in the (x1, x2) plane]
Dimension: number of variables
Event: ... is closer to ... than ... (Euclidean norm)
[Let A(x_a, y_a) and B(x_b, y_b); dist(A, B) = $\sqrt{(x_b - x_a)^2 + (y_b - y_a)^2}$]
• Data: A is farther from B than C.
• According to x1: A is closer to B than C.
• According to x2: A is farther from B than C.
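The distortion described above can be reproduced with a small numeric sketch. The coordinates of A, B and C below are hypothetical (the slide's figure gives no values), and the statements are read as comparing dist(A, B) with dist(A, C), which is an assumption about the intended meaning.

```python
import math

# Hypothetical coordinates (x1, x2), chosen to reproduce the three statements.
A = (0.0, 0.0)
B = (1.0, 4.0)
C = (2.0, 1.0)


def dist(p, q, axes=(0, 1)):
    # Euclidean distance restricted to the selected coordinate axes.
    return math.sqrt(sum((q[i] - p[i]) ** 2 for i in axes))


def a_closer_to_b_than_c(axes):
    # True when A is closer to B than to C on the given axes.
    return dist(A, B, axes) < dist(A, C, axes)


if __name__ == "__main__":
    print("full data:", a_closer_to_b_than_c((0, 1)))  # False
    print("x1 only:  ", a_closer_to_b_than_c((0,)))    # True
    print("x2 only:  ", a_closer_to_b_than_c((1,)))    # False
```

Looking at x1 alone reverses the conclusion drawn from the full data, while x2 alone happens to agree with it.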
Consequence: distortions
CONTRADICTION
Definition: “a piece of knowledge about reality is contradictory when it asserts both A and non-A”
Partial knowledge
Going up through dimensions thanks to technical innovations
• "Big Data" architectures: more
measurements, more storage, more processing capacity
• Machinery can help us grasp more
complexity
Going up through dimensions thanks to ad hoc quantitative analysis
• Useful to overcome some contradictions…
E.g.: concatenation, SVM classification (Vapnik, 1995; Aizerman et al., 1964)
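A minimal sketch of how going up one dimension can remove a contradiction, in the spirit of the feature maps used by SVMs. It does not train an SVM. The toy data set and the map x → (x, x²) are illustrative assumptions: the squared value is concatenated to each one-dimensional point as a second variable.

```python
def separable_by_threshold(values, labels):
    # In one dimension, a linear classifier is a threshold: the classes are
    # separable only if one class lies entirely below the other.
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)


# Toy data (assumption): class +1 when |x| > 1, class -1 otherwise.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1 if abs(x) > 1 else -1 for x in xs]

# Going up one dimension: concatenate the new variable x**2 to each point.
lifted = [(x, x * x) for x in xs]
second_axis = [point[1] for point in lifted]

if __name__ == "__main__":
    print("separable on x alone:      ", separable_by_threshold(xs, ys))
    print("separable on the x**2 axis:", separable_by_threshold(second_axis, ys))
```

On x alone the negative class sits between the positive points, so no threshold works. In the lifted space the horizontal line x² = 1 separates the two classes.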
Going up through dimensions
• … but can also lead to other contradictions…
E.g.: SVM kernel trick: several kernels can be used!
• … and cognitive limitations still exist!
Perceiving hyperspaces (more than 3 dimensions) is not straightforward for humans and requires specific training
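The point that several kernels can be used, and may disagree, can be illustrated without training any SVM. The sketch below only compares kernel similarities. The query point, the two candidate points and the value gamma = 1 are hypothetical choices.

```python
import math


def linear_kernel(u, v):
    # Plain dot product: similarity in the original space.
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=1.0):
    # Gaussian (RBF) kernel: similarity decays with squared distance.
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def most_similar(query, candidates, kernel):
    # Name of the candidate with the largest kernel value against the query.
    return max(candidates, key=lambda name: kernel(query, candidates[name]))


# Hypothetical points chosen so that the two kernels disagree.
query = (1.0, 0.0)
candidates = {"a": (3.0, 0.0), "b": (1.0, 1.0)}

if __name__ == "__main__":
    print("linear kernel picks:", most_similar(query, candidates, linear_kernel))
    print("RBF kernel picks:   ", most_similar(query, candidates, rbf_kernel))
```

The linear kernel ranks "a" as most similar to the query (dot product 3 against 1), whereas the RBF kernel ranks "b" first (exp(-1) against exp(-4)): the choice of higher-dimensional space changes the conclusion.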
Knowledge: a dynamic back & forth process between higher and lower dimensional hyperspaces
[Diagram labels: Too many dimensions, Reduction, Contradiction, Knowledge, More dimensions, Data, Statistics]
Human knowledge is dynamic, recurring and constantly surpassing itself
Conclusions
• Dynamic human knowledge process: towards complex thinking (Morin, 2005)
• Statistics serve human cognition: tools and methods to build reasoned and dialectic knowledge
• Debate:
• Machine (data processing) + human (intuition, interpretation) = increased human knowledge for decision-making
How far can the limits of human intelligence be pushed by artificial intelligence?
• Could human limitations be surpassed by artificial intelligence?
E.g.: could humans be excluded from the knowledge process for decision-making?
Black box models
Does a machine generate knowledge for itself?
References
Abbott, Flatland, Librio Littérature, 2013
Alavi & Leidner, Knowledge management and knowledge management systems: conceptual foundations and research issues, MIS quarterly, 2001
Aizerman, Braverman & Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control, 1964
Desrosières, La politique des grands nombres – Histoire de la raison statistique, La Découverte, 2010
Donoho, Aide-Mémoire. High-dimensional data analysis: the curses and blessings of dimensionality, Department of Statistics, Stanford University, 2000
Granger & Newbold, Spurious Regression in Econometrics, Journal of Econometrics, 1974
Granville, The curse of big data, 2013, http://www.analyticbridge.com/profiles/blogs/the-curse-of-big-data
Morin, Introduction à la pensée complexe, Seuil, 2005
Pearson, Mathematical Contributions to the Theory of Evolution: On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs, Proceedings of the Royal Society of London, 1897
Rucker, The fourth dimension – A guided tour of the higher universes, Houghton Mifflin Company, 1984
Tsoukas & Vladimirou, What is organizational knowledge?, Journal of Management Studies, 2001
Tuomi, Data is more than knowledge – Implications of the reversed knowledge hierarchy for knowledge management and organizational memory, Journal of Management Information Systems, 1999
Vapnik, The nature of statistical learning theory, Springer-Verlag, 1995
Thank you ENBIS-14
Michel Lutz & Rodolphe Le Riche