Convex Optimization as a Tool for Correcting Dissimilarity Matrices for Regular Minimality

Matthias Trendtel and Ali Ünlü

Abstract Fechnerian scaling as developed by Dzhafarov and Colonius (e.g., Dzhafarov and Colonius, J Math Psychol 51:290–304, 2007) aims at imposing a metric on a set of objects based on their pairwise dissimilarities. A necessary condition for this theory is the law of Regular Minimality (e.g., Dzhafarov EN, Colonius H (2006) Regular minimality: a fundamental law of discrimination. In: Colonius H, Dzhafarov EN (eds) Measurement and representation of sensations. Erlbaum, Mahwah, pp. 1–46). In this paper, we solve the problem of correcting a dissimilarity matrix for Regular Minimality by phrasing it as a convex optimization problem in Euclidean metric space. In simulations, we demonstrate the usefulness of this correction procedure.

1 Preliminaries

For a set of stimuli $X = \{x_1, x_2, \ldots, x_n\}$, let $\psi \colon X \times X \to \mathbb{R}_+$ be some discriminability measure, mapping pairs of stimuli $x_i \in X$ and $x_j \in X$ into the set of nonnegative reals. For example, a pair of line segments $(x_i, x_j)$ may be repeatedly presented to an observer (or a group of observers), and $\psi(x_i, x_j)$ may be estimated by the frequency of responses "they are different (in length)". Possible examples are numerous, and more can be found in Dzhafarov and Colonius (2006b).

M. Trendtel (✉)
Division for Methodology and Statistics, Federal Institute for Educational Research, Innovation & Development of the Austrian School System (BIFIE), Salzburg, Austria
e-mail: m.trendtel@bifie.at

A. Ünlü
Chair for Methods in Empirical Educational Research, TUM School of Education, Technische Universität München, München, Germany
e-mail: ali.uenlue@tum.de


In such a pairwise presentation paradigm, even if stimuli $x_i$ and $x_j$ have the same value (say, they are line segments of the same length), they must occupy different spatial and/or temporal positions. This difference in spatial or temporal locations does not enter in the comparison, but it may affect the way people perceive lengths, and this in turn may lead to $\psi(x_i, x_i)$ being larger than $\psi(x_i, x_j)$. Therefore the notion of observation area was introduced (Dzhafarov 2002; Dzhafarov and Colonius 2006b).

Henceforth we call first observation area the set of stimuli presented, say, first in time or on the left, and second observation area the set of stimuli presented second or on the right.

In the context of pairwise same-different comparisons, it is a well-established empirical fact that $\psi(x_i, x_j)$, however obtained, is not a metric. So the data have to be modified to make such data-analytic procedures as multidimensional scaling (MDS; e.g., Kruskal and Wish 1978) applicable. Also, the class of allowable metrics in MDS is usually a priori restricted to Minkowskian metrics in low-dimensional spaces of real-component vectors. By contrast, Fechnerian scaling (FS) deals directly with $\psi$-data subject to same-different comparisons, and it imposes no a priori restrictions on the class of metrics computed from $\psi$. The only property of the $\psi$-data which is required by FS is Regular Minimality (RM).

This principle postulates the existence of pairs of stimuli that are mutually the most similar ones to each other. In regard to discrimination probability matrices (discrimination probabilities being presented as an $n \times n$ matrix $\psi = (\psi_{ij})_{i,j=1,\ldots,n}$), this means that a matrix satisfies RM iff every row has a unique minimum entry which is also the unique minimum entry in its column. In a matrix of $\psi$-data satisfying RM, these row-and-column minima mark the pairs of mutually most similar stimuli. Here, the first symbol in every pair refers to a row object (all row objects belonging to one, the "first", observation area) and the second symbol refers to a column object (in the "second" observation area).

FS imposes a metric $G$ if RM is satisfied. For every pair of objects $(x_i, x_j)$ we consider all possible chains of objects $x_i, x_{k_1}, \ldots, x_{k_r}, x_j$. Presupposing RM, for each such chain we compute what is called its psychometric length. Then we find a chain with minimal psychometric length, and take this minimal value for the quasidistance from $x_i$ to $x_j$ (referred to as the oriented Fechnerian distance). Quasidistance is a pairwise measure which satisfies all metric properties except for symmetry. In FS we symmetrize this quasimetric and transform it into a metric, taking it for the "true" or "overall" Fechnerian distance $G(x_i, x_j)$ between $x_i$ and $x_j$. (For a detailed discussion refer to Dzhafarov (2002), Dzhafarov and Colonius (2006a), Dzhafarov and Colonius (2006b), Dzhafarov and Colonius (2007), and Dzhafarov et al. (2011).)
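To make the chain construction concrete, here is a minimal sketch in Python (ours, not the authors' implementation). It assumes, for illustration only, that the psychometric length of a chain is the sum of link weights $\psi(x_a, x_b) - \psi(x_a, x_a)$; the full FS computation uses psychometric increments of the first and second kind as defined in the references above. Minimal-length chains are found by a shortest-path recursion, and the oriented quasidistances are symmetrized by adding the two directions.

```python
import numpy as np

def oriented_fechnerian_distances(psi):
    """Minimal-chain (shortest-path) distances over a matrix of link weights.

    Illustrative sketch: link weights are taken as the increments
    w[a, b] = psi[a, b] - psi[a, a], assumed nonnegative once RM holds
    in the canonical form with minima on the diagonal.
    """
    n = psi.shape[0]
    w = psi - np.diag(psi)[:, None]   # increments relative to self-dissimilarity
    d = w.copy()
    np.fill_diagonal(d, 0.0)
    # Floyd-Warshall: minimal psychometric length over all chains x_i, x_k1, ..., x_j
    for k in range(n):
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d

def fechnerian_metric(psi):
    """Symmetrize the oriented quasidistances into an overall distance G."""
    d = oriented_fechnerian_distances(psi)
    return d + d.T
```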

RM can be generalized to nonnegative reals:

Definition 1. Let $S_n = \{\pi \colon \{1, \ldots, n\} \to \{1, \ldots, n\} \mid \pi \text{ a permutation}\}$ be the group of permutations on $\{1, \ldots, n\}$. A matrix $\psi = (\psi_{ij})_{i,j=1,\ldots,n} \in \mathbb{R}_+^{n^2}$, viewed as the vector $(\psi_{11}, \psi_{12}, \ldots, \psi_{1n}, \psi_{21}, \psi_{22}, \ldots, \psi_{2n}, \ldots, \psi_{n1}, \psi_{n2}, \ldots, \psi_{nn})^T$ of discriminability measures, satisfies RM, iff there exists a $\pi \in S_n$ such that, for any $i = 1, \ldots, n$,

$$\psi_{i\pi(i)} < \psi_{ij} \quad \text{for } j \neq \pi(i), \quad \text{and}$$

$$\psi_{i\pi(i)} < \psi_{j\pi(i)} \quad \text{for } j \neq i.$$

Then, $\pi$ is also uniquely determined, and we say that $\psi$ satisfies RM in $\pi$-form. The set of all $n \times n$ matrices satisfying RM in $\pi$-form (for one given permutation $\pi$) is denoted by $\mathrm{RM}_n^{\pi}$.
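Read operationally, Definition 1 can be checked directly. The sketch below (Python/NumPy; the function name is ours) returns the permutation $\pi$ if a nonnegative matrix satisfies RM, and None otherwise:

```python
import numpy as np

def rm_permutation(psi, tol=0.0):
    """Return pi (as an index array) if psi satisfies RM, else None.

    psi : (n, n) array of nonnegative discriminability measures.
    Per Definition 1: psi[i, pi[i]] must be the strict minimum of row i
    and also the strict minimum of column pi[i].
    """
    psi = np.asarray(psi, dtype=float)
    n = psi.shape[0]
    pi = psi.argmin(axis=1)                 # candidate pi(i): column of the row minimum
    if len(set(pi.tolist())) != n:          # pi must be a permutation (bijective)
        return None
    for i in range(n):
        row_ok = np.all(np.delete(psi[i, :], pi[i]) > psi[i, pi[i]] + tol)
        col_ok = np.all(np.delete(psi[:, pi[i]], i) > psi[i, pi[i]] + tol)
        if not (row_ok and col_ok):
            return None
    return pi
```

For instance, rm_permutation(np.array([[0.1, 0.6], [0.7, 0.2]])) returns the identity permutation, since both row minima lie on the diagonal and are also column minima.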

The square matrix $\psi$ is the matrix of true (unknown) discrimination probabilities $\psi_{ij}$ in the population of reference. Same-different comparisons can be modeled by a Bernoulli random variable: 1 if the response is "different", with "success" probability $\psi_{ij}$, and 0 if the response is "same", with probability $1 - \psi_{ij}$. The relative success counts $\hat{\psi}_{ij}$ from independent samples of independent responses are the maximum likelihood estimators (MLEs) for the $\psi_{ij}$'s. The population matrix $\psi$ is unknown and estimated from the data using its MLE $\hat{\psi} = (\hat{\psi}_{ij})_{i,j=1,\ldots,n}$. The observed data matrix $\hat{\psi}$ may not satisfy RM, although the underlying population matrix $\psi$ does. In other words, the compliance of a matrix of discrimination probabilities with RM must be tested statistically. In the literature, a parametric hypothesis test based on a measure was proposed by Ünlü et al. (2010) and a nonparametric test based on permutations was derived in Dzhafarov et al. (2011). However, these tests do not allow correcting the data for RM.
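As an illustration of this sampling scheme (a sketch with made-up numbers, not data from the paper), one can simulate Bernoulli responses from a population matrix that satisfies RM and observe that the MLE $\hat{\psi}$, i.e., the matrix of relative frequencies, can nevertheless violate RM:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population matrix satisfying RM (minima on the diagonal).
psi_true = np.array([[0.10, 0.55, 0.80],
                     [0.50, 0.15, 0.60],
                     [0.85, 0.65, 0.12]])

n_trials = 30                              # presentations per stimulus pair
counts = rng.binomial(n_trials, psi_true)  # number of "different" responses per pair
psi_hat = counts / n_trials                # MLE: relative frequencies

# With few trials, psi_hat may violate RM even though psi_true satisfies it.
print(psi_hat)
```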

In this paper, a method is proposed for correcting a dissimilarity matrix for RM in an "optimal" way, with respect to the Euclidean metric. We interpret $\psi$ and $\hat{\psi}$ as points in the $n^2$-dimensional nonnegative orthant and propose finding that RM-compliant point $M$ of the orthant which minimizes the Euclidean norm $\|\hat{\psi} - M\|$ (up to an arbitrary $\epsilon > 0$; see Sect. 3). Stated in terms of convex optimization, this problem is solved by expressing it as an equivalent convex optimization problem:

minimize $g(M)$ subject to $M \in D$,

where $g \colon \mathbb{R}_+^{n^2} \to \mathbb{R}_+$ is a convex function and $D \subseteq \mathbb{R}_+^{n^2}$ is a convex feasible set.
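A minimal sketch of this formulation (ours, using the cvxpy modeling library; not the authors' code): for a fixed permutation $\pi$, the strict inequalities of Definition 1 are enforced with a small margin eps, which makes the feasible set $D$ a closed convex polyhedron within the nonnegative orthant, and the Euclidean (Frobenius) distance to the observed matrix is minimized.

```python
import cvxpy as cp
import numpy as np

def correct_for_rm(psi_hat, pi, eps=1e-3):
    """Nearest (in Euclidean norm) matrix to psi_hat satisfying RM in pi-form.

    The strict inequalities of Definition 1 are relaxed by a margin eps,
    so the feasible set is closed and convex (linear constraints).
    """
    n = psi_hat.shape[0]
    M = cp.Variable((n, n), nonneg=True)
    constraints = []
    for i in range(n):
        for j in range(n):
            if j != pi[i]:
                constraints.append(M[i, pi[i]] + eps <= M[i, j])      # unique row minimum
            if j != i:
                constraints.append(M[i, pi[i]] + eps <= M[j, pi[i]])  # unique column minimum
    problem = cp.Problem(cp.Minimize(cp.norm(M - psi_hat, "fro")), constraints)
    problem.solve()
    return M.value
```

If $\pi$ is not fixed in advance, one could, for instance, solve this problem for each candidate permutation (e.g., the row-argmin pattern of $\hat{\psi}$) and keep the feasible solution closest to the data.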

