Supplementary material for “Walsh-Hadamard Variational Inference for Bayesian Deep Learning”

Simone Rossi∗, Data Science Department, EURECOM (FR), simone.rossi@eurecom.fr
Sébastien Marmin∗, Data Science Department, EURECOM (FR), sebastien.marmin@eurecom.fr
Maurizio Filippone, Data Science Department, EURECOM (FR), maurizio.filippone@eurecom.fr
A Matrix-variate Posterior Distribution Induced by WHVI

We derive the parameters of the matrix-variate distribution q(W) = MN(M, U, V) of the weight matrix W̃ ∈ R^{D×D} given by WHVI,

W̃ = S1 H diag(g̃) H S2, with g̃ ∼ N(µ, Σ). (1)

The mean M = S1 H diag(µ) H S2 derives from the linearity of the expectation. The covariance matrices U and V are non-identifiable: for any scale factor s > 0, MN(M, U, V) equals MN(M, sU, (1/s)V). Therefore, we constrain the parameters such that Tr(V) = 1. The covariance matrices verify (see e.g. Section 1 in the supplement of [1])

U = E[(W − M)(W − M)^T],   V = (1/Tr(U)) E[(W − M)^T (W − M)].

The Walsh-Hadamard matrix H is symmetric. Denoting by Σ^{1/2} a root of Σ and considering ε ∼ N(0, I), we have

U = E[S1 H diag(Σ^{1/2} ε) H S2² H diag(Σ^{1/2} ε) H S1]. (2)
Define the matrix T2 ∈ R^{D×D²} whose i-th row is the column-wise vectorization of the matrix (Σ^{1/2}_{i,j} (HS2)_{i,j'})_{j,j'≤D}. We have

(T2 T2^T)_{i,i'} = ∑_{j,j'=1}^{D} Σ^{1/2}_{i,j} Σ^{1/2}_{i',j} (HS2)_{i,j'} (HS2)_{i',j'}
               = ∑_{j,j',j''=1}^{D} Σ^{1/2}_{i,j} (HS2)_{i,j'} E[ε_j ε_{j''}] Σ^{1/2}_{i',j''} (HS2)_{i',j'}
               = ∑_{j'=1}^{D} E[ (∑_{j=1}^{D} ε_j Σ^{1/2}_{i,j} (HS2)_{i,j'}) (∑_{j''=1}^{D} ε_{j''} Σ^{1/2}_{i',j''} (HS2)_{i',j'}) ]
               = E[diag(Σ^{1/2} ε) H S2² H diag(Σ^{1/2} ε)]_{i,i'}.
34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.
Using (2), a root of U = U^{1/2} U^{1/2 T} can be found:

U^{1/2} = S1 H T2. (3)

Similarly for V, we have

V^{1/2} = (1/√Tr(U)) S2 H T1,  with T1 = [vect(Σ_{1,:} (HS1)^T_{1,:})^T ; … ; vect(Σ_{D,:} (HS1)^T_{D,:})^T]. (4)
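As a numerical sanity check (not part of the original derivation), the following sketch compares the closed-form mean M = S1 H diag(µ) H S2 against a Monte Carlo estimate of E[W]. It assumes a normalized Hadamard matrix, Σ = I, and illustrative diagonal matrices S1 and S2:

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
D = 16  # must be a power of 2 for the Hadamard construction

# Illustrative parameters (not the ones used in the paper's experiments)
H = hadamard(D) / np.sqrt(D)          # normalized Walsh-Hadamard matrix: H = H^T, H @ H = I
S1 = np.diag(rng.standard_normal(D))
S2 = np.diag(rng.standard_normal(D))
mu = rng.standard_normal(D)

# Monte Carlo estimate of E[W] with g ~ N(mu, I)
samples = [S1 @ H @ np.diag(mu + rng.standard_normal(D)) @ H @ S2
           for _ in range(20000)]
W_mean_mc = np.mean(samples, axis=0)

# Closed-form mean from linearity of the expectation
M = S1 @ H @ np.diag(mu) @ H @ S2
print(np.max(np.abs(W_mean_mc - M)))  # small Monte Carlo error
```

The agreement up to Monte Carlo noise confirms that the mean of the induced matrix-variate distribution follows directly from linearity.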
B Geometric Interpretation of WHVI

The matrix A in Section 2.2 expresses the linear relationship between the weights W = S1 H G H S2 and the variational random vector g, i.e. vect(W) = A g. Recall the definition of

A = [S1 H diag(v1) ; … ; S1 H diag(vD)],  with v_i = (S2)_{i,i} (H)_{:,i}. (5)

We show that an LQ-decomposition of A can be explicitly formulated.

Proposition. Let A be a D²×D matrix such that vect(W) = A g, where W is given by W = S1 H diag(g) H S2. Then an LQ-decomposition of A can be formulated as

vect(W) = [s^(2)_i S1 H diag(h_i)]_{i=1,…,D} g = L Q g, (6)

where h_i is the i-th column of H, L = diag((s^(2)_i s^(1))_{i=1,…,D}), diag(s^(1)) = S1, diag(s^(2)) = S2, and Q = [H diag(h_i)]_{i=1,…,D}.
Proof. Equation (6) derives directly from block matrix and vector operations. As L is clearly lower triangular (even diagonal), let us prove that Q has orthogonal columns. Defining the D×D matrix Q^(i) = H diag(h_i), we have:

Q^T Q = ∑_{i=1}^{D} Q^(i)T Q^(i) = ∑_{i=1}^{D} diag(h_i) H^T H diag(h_i) = ∑_{i=1}^{D} diag(h_i²) = ∑_{i=1}^{D} (1/D) I = I.
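The orthogonality argument above can be checked numerically; the sketch below assumes the normalized Hadamard matrix with entries ±1/√D, so that H^T H = I and each h_i² has entries 1/D:

```python
import numpy as np
from scipy.linalg import hadamard

D = 8
H = hadamard(D) / np.sqrt(D)   # normalized: H symmetric, H @ H = I

# Stack the blocks Q^(i) = H @ diag(h_i), where h_i is the i-th column of H
Q = np.vstack([H @ np.diag(H[:, i]) for i in range(D)])   # shape (D^2, D)

# Q has orthonormal columns: sum_i diag(h_i)^2 = sum_i (1/D) I = I
print(np.allclose(Q.T @ Q, np.eye(D)))  # True
```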
Figure 1: Diagrammatic representation of WHVI. The cube represents the high-dimensional parameter space. The variational posterior (mean in orange) evolves during optimization in the (blue) subspace, whose orientation (red) is controlled by S1 and S2.
This decomposition gives direct insight into the role of the Walsh-Hadamard transforms: with complexity D log(D), they perform fast rotations Q of vectors living in a space of dimension D (the plane in Fig. 1) into a space of dimension D² (the cube in Fig. 1). Treated as parameters gathered in L, S1 and S2 control the orientation of the subspace by distortion of the canonical axes.
We empirically evaluate the minimum RMSE, as a proxy for some measure of average distance, between W and any given point Γ. More precisely, we compute for Γ ∈ R^{D×D},

min_{s1, s2, g ∈ R^D} (1/D) ||Γ − diag(s1) H diag(g) H diag(s2)||_Frob. (7)
Fig. 2 shows this quantity evaluated for Γ sampled with i.i.d. U(−1,1) entries, for increasing values of D. The bounded behavior suggests that WHVI can approximate any given matrix with an error that does not grow with the dimension.
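A minimal version of the optimization in (7) can be run with off-the-shelf tools; this sketch uses scipy.optimize with L-BFGS-B as an illustrative choice, not necessarily the optimizer used for Fig. 2:

```python
import numpy as np
from scipy.linalg import hadamard
from scipy.optimize import minimize

rng = np.random.default_rng(1)
D = 16
H = hadamard(D) / np.sqrt(D)            # normalized Walsh-Hadamard matrix
Gamma = rng.uniform(-1, 1, size=(D, D)) # random target matrix, i.i.d. U(-1, 1)

def objective(x):
    """Normalized Frobenius distance of Eq. (7)."""
    s1, g, s2 = np.split(x, 3)
    W = np.diag(s1) @ H @ np.diag(g) @ H @ np.diag(s2)
    return np.linalg.norm(Gamma - W, "fro") / D

x0 = rng.standard_normal(3 * D)
res = minimize(objective, x0, method="L-BFGS-B")
print(res.fun)  # empirical minimum distance for this Gamma
```

Repeating this for several samples of Γ and several values of D reproduces the kind of distribution shown in Fig. 2.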
Figure 2: Distribution of the minimum RMSE between S1 H G H S2 and a sample matrix with i.i.d. U(−1,1) entries, for D ∈ {4, 16, 32, 64, 128}. For each dimension, the orange dots represent 20 repetitions. The median distance is displayed in black. A few outliers (with distance greater than 3.0) appeared, possibly due to imperfect numerical optimization; they were kept for the computation of the median but are not displayed.
C Additional Details on Normalizing Flows
In the general setting, given a probabilistic model with observations x, latent variables z and model parameters θ, by introducing an approximate posterior distribution q_φ(z) with parameters φ, the variational lower bound to the log-marginal likelihood is obtained from the non-negativity of the KL divergence:

KL{q_φ(z) || p(z|x)} = E_{q_φ(z)}[log q_φ(z) − log p(z|x)]
                     = E_{q_φ(z)}[log q_φ(z) − log p_θ(x, z)] + log p(x) ≥ 0,

so that

log p(x) ≥ E_{q_φ(z)}[log p_θ(x|z) − log q_φ(z) + log p(z)], (8)

where p_θ(x|z) is the likelihood function with model parameters θ and p(z) is the prior on the latents. The objective is then to minimize the negative variational bound (NELBO):

L(θ, φ) = −E_{q_φ(z)}[log p_θ(x|z)] + KL{q_φ(z) || p(z)}. (9)

Consider an invertible, continuous and differentiable function f : R^D → R^D. Given z̃_0 ∼ q(z_0), z̃_1 = f(z̃_0) follows q(z_1), defined as

q(z_1) = q(z_0) |det ∂f/∂z_0|^{−1}. (10)
As a consequence, after K transformations the log-density of the final distribution is

log q(z_K) = log q(z_0) − ∑_{k=1}^{K} log |det ∂f_k/∂z_{k−1}|. (11)
We denote by f_k(z_{k−1}; λ_k) the k-th transformation, which takes as input the output of the previous flow z_{k−1} and has parameters λ_k. The final variational objective is

L(θ, φ) = −E_{q_φ(z)}[log p_θ(x|z)] + KL{q_φ(z) || p(z)}
        = E_{q_φ(z)}[−log p_θ(x|z) − log p(z) + log q_φ(z)]
        = E_{q_0(z_0)}[−log p_θ(x|z_K) − log p(z_K) + log q_K(z_K)]
        = E_{q_0(z_0)}[−log p_θ(x|z_K) − log p(z_K) + log q_0(z_0) − ∑_{k=1}^{K} log |det ∂f_k(z_{k−1}; λ_k)/∂z_{k−1}|]
        = −E_{q_0(z_0)}[log p_θ(x|z_K)] + KL{q_0(z_0) || p(z_K)} − E_{q_0(z_0)}[∑_{k=1}^{K} log |det ∂f_k(z_{k−1}; λ_k)/∂z_{k−1}|]. (12)
Setting the initial distribution q_0 to a fully factorized Gaussian N(z_0 | µ, σI) and assuming a Gaussian prior on the generated z_K, the KL term is analytically tractable. A possible family of transformations is the planar flow [10], for which f is defined as
f(z) = z + u h(w^T z + b), (13)

where λ = [u ∈ R^D, w ∈ R^D, b ∈ R] and h(·) = tanh(·). This is equivalent to a residual layer with a single-neuron MLP, as argued by Kingma et al. [5]. The log-determinant of the Jacobian of f is

log |det ∂f/∂z| = log |det(I + u [h'(w^T z + b) w]^T)| = log |1 + u^T w h'(w^T z + b)|. (14)

While simple, this flow parameterization requires only O(D) parameters and thus does not increase the time/space complexity of WHVI. Alternatives can be found in [10, 13, 5, 8].
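Equation (14) is an instance of the matrix determinant lemma, det(I + u v^T) = 1 + v^T u; a quick numerical check with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
D = 5
u, w, z = rng.standard_normal((3, D))
b = 0.3

hp = 1.0 - np.tanh(w @ z + b) ** 2      # h'(w^T z + b) for h = tanh
J = np.eye(D) + np.outer(u, hp * w)     # Jacobian of f(z) = z + u * tanh(w^T z + b)

lhs = np.log(np.abs(np.linalg.det(J)))  # full D x D determinant
rhs = np.log(np.abs(1.0 + u @ w * hp))  # closed form from the determinant lemma
print(np.isclose(lhs, rhs))  # True
```

The closed form makes the log-det term in (11) and (12) an O(D) operation per flow step instead of O(D³).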
D Additional Results
D.1 Experimental Setup for Bayesian DNN

The experiments on Bayesian DNNs are run with the following setup. For WHVI, we used a zero-mean prior over g with fully factorized covariance λI; λ = 10^−5 was chosen to obtain sensible variances in the output layer. It is possible to design a prior over g such that the prior on W has constant marginal variance and low correlations, although empirical evaluations showed this does not yield a significant improvement over the previous (simpler) choice. In the final implementation of WHVI that we used in all experiments, S1 and S2 are optimized. The dropout rate of MCD is set to 0.005. We used a classic Gaussian likelihood with optimized noise variance for regression and a softmax likelihood for classification.
Table 1: List of datasets used in the experiments
NAME TASK N. D-IN D-OUT
EEG CLASS. 14980 14 2
MAGIC CLASS. 19020 10 2
MINIBOO CLASS. 130064 50 2
LETTER CLASS. 20000 16 26
DRIVE CLASS. 58509 48 11
MOCAP CLASS. 78095 37 5
CIFAR10 CLASS. 60000 3×28×28 10
BOSTON REGR. 506 13 1
CONCRETE REGR. 1030 8 1
ENERGY REGR. 768 8 2
KIN8NM REGR. 8192 8 1
NAVAL REGR. 11934 16 2
POWERPLANT REGR. 9568 4 1
PROTEIN REGR. 45730 9 1
YACHT REGR. 308 6 1
BOREHOLE REGR. 200000 8 1
HARTMAN6 REGR. 30000 6 1
RASTRIGIN5 REGR. 10000 5 1
ROBOT REGR. 150000 8 1
OTLCIRCUIT REGR. 20000 6 1
Training is performed for 500 steps with fixed noise variance and for another 50000 steps with optimized noise variance. The batch size is fixed to 64, and for the estimation of the expected log-likelihood we used 1 Monte Carlo sample at train time and 64 Monte Carlo samples at test time. We chose the Adam optimizer [4] with exponential learning rate decay λ_{t+1} = λ_0 (1 + γt)^{−p}, with λ_0 = 0.001, p = 0.3, γ = 0.0005 and t being the current iteration.
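For reference, the decay schedule can be written as a two-line helper (values taken from the text; the helper name is ours):

```python
# Learning rate decay lambda_t = lambda_0 * (1 + gamma * t)^(-p),
# with the hyperparameters reported in the text.
lr0, p, gamma = 0.001, 0.3, 0.0005

def lr(t):
    return lr0 * (1.0 + gamma * t) ** (-p)

print(lr(0))       # 0.001 at the first iteration
print(lr(50000))   # decayed rate at the end of training
```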
A similar setup was also used for the Bayesian CNN experiment. The only differences are the batch size, increased to 256, and the optimizer, which is run without learning rate decay.
D.2 Regression Experiments on Shallow Models
For a complete experimental evaluation of WHVI, we also use the experimental setup proposed by Hernández-Lobato and Adams [3] and adopted in several other works [2, 7, 14]. In this configuration, we use one hidden layer with 50 hidden units for all datasets, with the exception of PROTEIN, where the number of units is increased to 100. Results are reported in Table 2.
Table 2: Test RMSE and test MNLL for regression datasets following the setup in [3].

TEST ERROR
DATASET      MCD          MFG          NNG          WHVI
BOSTON       3.40(0.66)   3.04(0.64)   2.74(0.12)   2.56(0.15)
CONCRETE     4.60(0.53)   5.24(0.53)   5.02(0.12)   5.01(0.25)
ENERGY       1.18(0.03)   1.52(0.09)   0.48(0.02)   1.20(0.07)
KIN8NM       0.09(0.00)   0.10(0.00)   0.08(0.00)   0.12(0.01)
NAVAL        0.00(0.00)   0.01(0.00)   0.00(0.00)   0.01(0.00)
POWERPLANT   4.20(0.12)   4.23(0.13)   3.89(0.04)   4.11(0.12)
PROTEIN      4.35(0.04)   4.74(0.05)   4.10(0.00)   4.64(0.07)
YACHT        1.72(0.32)   1.78(0.45)   0.98(0.08)   0.96(0.20)

TEST MNLL
DATASET      MCD          MFG          NNG          WHVI
BOSTON       5.04(1.76)   3.19(0.89)   2.45(0.03)   2.55(0.15)
CONCRETE     2.96(0.23)   3.03(0.15)   3.04(0.02)   2.95(0.06)
ENERGY       3.00(0.07)   3.49(0.11)   1.42(0.00)   3.01(0.12)
KIN8NM       −1.09(0.04)  −1.01(0.04)  −1.15(0.00)  −0.78(0.10)
NAVAL        −9.93(0.01)  −6.48(0.02)  −7.08(0.03)  −6.25(0.01)
POWERPLANT   2.76(0.03)   2.77(0.03)   2.78(0.01)   2.74(0.03)
PROTEIN      2.80(0.01)   2.89(0.01)   2.84(0.00)   2.86(0.01)
YACHT        2.73(0.74)   2.02(0.46)   2.32(0.00)   1.28(0.22)
D.3 ConvNet Architectures
Figure 3:Architecture layout ofRESNET18.
For the experiments on Bayesian convolutional neural networks, we used architectures adapted to CIFAR10 (see Tables 3, 4 and 5).
Table 3: ALEXNET
LAYER     DIMENSIONS
CONV      64×3×3×3
MAXPOOL
CONV      192×64×3×3
MAXPOOL
CONV      384×192×3×3
CONV      256×384×3×3
CONV      256×256×3×3
MAXPOOL
LINEAR    4096×4096
LINEAR    4096×4096
LINEAR    10×4096

Table 4: VGG16
LAYER     DIMENSIONS
CONV      32×3×3×3
CONV      32×32×3×3
MAXPOOL
CONV      64×32×3×3
CONV      64×64×3×3
MAXPOOL
CONV      128×64×3×3
CONV      128×128×3×3
CONV      128×128×3×3
MAXPOOL
CONV      256×128×3×3
CONV      256×256×3×3
CONV      256×256×3×3
MAXPOOL
CONV      256×256×3×3
CONV      256×256×3×3
CONV      256×256×3×3
MAXPOOL
LINEAR    10×256
Table 5: RESNET18
LAYER          DIMENSIONS
RESNET BLOCK   [3×3, 64; 3×3, 64] ×2
RESNET BLOCK   [3×3, 128; 3×3, 128] ×2
RESNET BLOCK   [3×3, 256; 3×3, 256] ×2
RESNET BLOCK   [3×3, 512; 3×3, 512] ×2
AVGPOOL
LINEAR         10×512
Table 6: Complexity table for GPs with random feature and inducing point approximations. In the case of random features, we include both the complexity of computing the random features and the complexity of treating the linear combination of the weights variationally (using VI and WHVI).

                   SPACE                               TIME
MEAN FIELD-RF      O(D_IN N_RF) + O(N_RF D_OUT)        O(D_IN N_RF) + O(N_RF D_OUT)
WHVI-RF            O(D_IN N_RF) + O(√N_RF · D_OUT)     O(D_IN N_RF) + O(D_OUT log N_RF)
INDUCING POINTS    O(M)                                O(M³)

Note: M is the number of pseudo-data/inducing points and N_RF is the number of random features used in the kernel approximation.
D.4 Results - Gaussian Processes with Random Feature Expansion
We test WHVI for scalable GP inference, focusing on GPs with random feature expansions [6]. In GP models, latent variables f are given a prior p(f) = N(0, K); the assumption of zero mean can easily be relaxed. Given a random feature expansion of the kernel matrix, say K ≈ ΦΦ^T, the latent variables can be rewritten as

f = Φw, (15)

with w ∼ N(0, I). The random features Φ are constructed by randomly projecting the input matrix X using a Gaussian random matrix Ω and applying a nonlinear transformation, which depends on the choice of the kernel function. The resulting model is now linear and, considering regression problems such that y = f + ε with ε ∼ N(0, σ²I), solving GPs for regression becomes equivalent to solving standard linear regression problems. For a given set of random features, we treat the weights of the resulting linear layer variationally and evaluate the performance of WHVI.
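To make the linearized model concrete, here is a sketch of random-feature GP regression with the exact (conjugate) Gaussian posterior over w; the paper instead treats w variationally, and all names and values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
N, D_in, N_rf = 200, 4, 128
lengthscale, noise = 1.0, 0.1

# Toy regression data (illustrative, not one of the paper's datasets)
X = rng.uniform(0, 1, size=(N, D_in))
y = np.sin(X.sum(axis=1)) + noise * rng.standard_normal(N)

# Random Fourier features for the RBF kernel: K ≈ Phi @ Phi.T
Omega = rng.standard_normal((D_in, N_rf)) / lengthscale
b = rng.uniform(0, 2 * np.pi, N_rf)
Phi = np.sqrt(2.0 / N_rf) * np.cos(X @ Omega + b)

# Exact Gaussian posterior over w in f = Phi w, with w ~ N(0, I), y = f + eps
A = Phi.T @ Phi / noise**2 + np.eye(N_rf)   # posterior precision
w_mean = np.linalg.solve(A, Phi.T @ y) / noise**2
f_mean = Phi @ w_mean
print(np.mean((f_mean - y) ** 2))  # training fit, roughly at the noise level
```

In the experiments, this exact posterior is replaced by a variational one, and WHVI reduces the number of variational parameters by reshaping w into a matrix.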
By reshaping the vector of parameters w of the linear model into a D×D matrix, WHVI allows the linearized GP model to reduce the number of parameters to optimize (see Table 6). We compare WHVI with two alternatives: one is VI of the Fourier feature GP expansion that uses fewer random features to match the number of parameters used in WHVI, and the other is the sparse Gaussian process implementation of GPFLOW [9] with a number of inducing points (rounded up) chosen to match the number of parameters used in WHVI.
We report the results on five datasets (10000 ≤ N ≤ 200000, 5 ≤ D ≤ 8, see Table 1). The datasets are generated from space-filling evaluations of well-known functions in the analysis of computer experiments (see e.g. [12]). The data are split into training and test points uniformly at random, 20% versus 80%. The input variables are rescaled between 0 and 1. The output values are standardized for training. All GPs have the same prior (centered GP with RBF covariance), initialized with equal hyperparameter values: each of the D lengthscales to √D/2, the GP variance to 1, and the Gaussian likelihood standard deviation to 0.02 (prior observation noise). Training is performed with 12000 steps of the Adam optimizer. The observation noise is fixed for the first 10000 steps. The learning rate is 6×10^−4, except for the dataset HARTMAN6, which uses a learning rate of 5×10^−3. Sparse GPs are run with a whitened representation of the inducing points.

Figure 4: Comparison of test error w.r.t. the number of model parameters (top: mean field, bottom: full covariance). Panels: Borehole, Robot, Hartman6, Rastrigin5, Otlcircuit. Methods: GP RFF with mean-field/full-covariance VI, GP RFF with WHVI, and sparse GP.
The results are shown in Fig. 4, with diagonal covariance for the three variational posteriors and with full covariance. In both the mean-field and full-covariance settings, this variant of WHVI, which reshapes the weight vector w into a matrix, largely outperforms the direct VI of Fourier features. However, this improvement of random feature inference for GPs is not enough to reach the performance of VI using inducing points. Inducing point approximations are based on the Nyström approximation of kernel matrices, which is known to lead to lower approximation error on the entries of the kernel matrix than random feature approximations. We attribute the lower performance of WHVI compared to inducing point approximations in this experiment to this fact.
D.5 Extended Results - DNNs
Being able to increase the width and depth of a model without drastically increasing the number of variational parameters is one of the competitive advantages of WHVI. Fig. 5 shows the behavior of WHVI for different network configurations. At test time, increasing the number of hidden layers and the number of hidden features allows the model to avoid overfitting while delivering better performance. This evidence is also supported by the analysis of the test MNLL during optimization of the ELBO, as shown in Fig. 6.
Thanks to the WHVI structure of the weight matrices, expanding and deepening the model is beneficial not only at convergence but during the entire learning procedure as well. Furthermore, the negative of the derived NELBO is still a valid lower bound of the true log-marginal likelihood and, therefore, a suitable objective function for model selection. Unlike the issue addressed in [11], during our experiments we did not experience problems with the initialization of variational parameters. We attribute this to both the reduced number of parameters and the effect of the Walsh-Hadamard transform.
Timing profiling of the Fast Walsh-Hadamard transform. Key to the log-linear time complexity is the Fast Walsh-Hadamard Transform (FWHT), which performs the operation Hx in O(D log D) time without generating and storing H. For our experimental evaluation, we implemented a FWHT operation for PYTORCH (v.0.4.1) in C++ and CUDA to leverage the full computational capabilities of modern GPUs. Fig. 8 presents a timing profile of our implementation versus the naive matmul (batch size of 512 samples, profiling repeated 1000 times). The break-even point for the CPU implementation is in the neighborhood of 512/1024 features, while on GPU FWHT is consistently faster.
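A pure-Python reference implementation of the (unnormalized, natural-ordered) FWHT illustrates the O(D log D) butterfly structure; our actual implementation is in C++/CUDA, so this sketch is only for exposition:

```python
import numpy as np
from scipy.linalg import hadamard

def fwht(x):
    """Fast Walsh-Hadamard transform in O(D log D), D a power of 2.
    Unnormalized and in natural (Sylvester) ordering: equivalent to hadamard(D) @ x."""
    x = x.copy()
    D = len(x)
    h = 1
    while h < D:
        # Butterfly stage: combine pairs at distance h
        for i in range(0, D, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

D = 16
v = np.random.default_rng(4).standard_normal(D)
print(np.allclose(fwht(v), hadamard(D) @ v))  # True
```

The explicit matrix-vector product costs O(D²) and requires materializing H; the butterfly recursion above touches each entry log2(D) times instead.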
Figure 5: Analysis of model capacity for different numbers of features (64 to 512) and hidden layers (2 and 3). Panels show test error and test MNLL on Powerplant, Protein, Mocap and Letter.
Figure 6: Comparison of test MNLL during training (50000 steps) on Letter, Miniboo, Mocap and Powerplant, for 2 and 3 hidden layers. Being able to increase features and hidden layers without worrying about overfitting or overparameterizing the model is advantageous not only at convergence but during the entire learning procedure.
Figure 7: Inference time on the test set with batch size 128 and 64 Monte Carlo samples, for WHVI (this work), MCD and NNG, as a function of the number of hidden units (64 to 512), on Boston, Powerplant, Yacht, Concrete, Kin8NM, Energy, Protein and Naval. Experiment repeated 100 times.
Figure 8: On the left, time performance of FWHT versus matmul (CPU and CUDA) as a function of the number of features D (2^6 to 2^11), with batch size fixed to 512. On the right, distribution of execution time versus batch size (64 to 1024, D = 512) with matmul and FWHT on GPU.
Table 7: Test error of Bayesian DNN with 2 hidden layers on regression datasets. NF: number of hidden features.

MODEL  NF   BOSTON      CONCRETE     ENERGY      KIN8NM     NAVAL      POWERPLANT  PROTEIN    YACHT
MCD    64   3.80±0.88   5.43±0.69    2.13±0.12   0.17±0.22  0.07±0.00  –           4.36±0.12  2.02±0.51
MCD    128  3.91±0.86   5.12±0.79    2.07±0.11   0.09±0.00  0.30±0.30  3.97±0.14   4.23±0.10  1.90±0.54
MCD    256  3.62±1.01   5.03±0.74    2.04±0.11   0.10±0.00  0.07±0.00  3.91±0.11   4.09±0.11  2.09±0.66
MCD    512  3.56±0.85   4.81±0.79    2.03±0.12   0.09±0.00  0.07±0.00  3.90±0.10   3.87±0.11  2.09±0.55
MFG    64   4.06±0.72   6.87±0.54    2.42±0.12   0.11±0.00  0.01±0.00  4.38±0.12   4.85±0.12  4.31±0.62
MFG    128  4.47±0.85   8.01±0.41    3.10±0.14   0.12±0.00  0.01±0.00  4.52±0.13   4.93±0.11  7.01±1.22
MFG    256  5.27±0.98   9.41±0.54    4.03±0.10   0.13±0.00  0.01±0.00  4.79±0.12   5.07±0.12  8.71±1.31
MFG    512  6.04±0.90   10.84±0.46   4.90±0.11   0.16±0.00  0.01±0.00  5.53±0.16   5.26±0.10  10.34±1.45
NNG    64   3.20±0.26   6.90±0.59    1.54±0.18   0.07±0.00  0.00±0.00  3.94±0.05   3.90±0.02  3.57±0.70
NNG    128  3.56±0.43   8.21±0.55    1.96±0.28   0.07±0.00  0.00±0.00  4.23±0.09   4.57±0.47  5.16±1.48
NNG    256  4.87±0.94   8.18±0.57    3.41±0.55   0.07±0.00  0.00±0.00  4.07±0.00   4.88±0.00  5.60±0.65
NNG    512  5.19±0.62   11.67±2.06   5.12±0.37   0.10±0.00  0.00±0.00  4.97±0.00   –          5.91±0.80
WHVI   64   3.33±0.82   5.24±0.77    0.73±0.11   0.08±0.00  0.01±0.00  4.07±0.11   4.49±0.12  0.82±0.18
WHVI   128  3.14±0.71   4.70±0.72    0.58±0.07   0.08±0.00  0.01±0.00  4.00±0.12   4.36±0.11  0.69±0.16
WHVI   256  2.99±0.85   4.63±0.78    0.52±0.07   0.08±0.00  0.01±0.00  3.95±0.12   4.24±0.11  0.76±0.13
WHVI   512  2.99±0.69   4.51±0.80    0.51±0.04   0.07±0.00  0.01±0.00  3.96±0.12   4.14±0.09  0.71±0.16

Table 8: Test MNLL of Bayesian DNN with 2 hidden layers on regression datasets. NF: number of hidden features.

MODEL  NF   BOSTON      CONCRETE     ENERGY      KIN8NM       NAVAL        POWERPLANT  PROTEIN    YACHT
MCD    64   5.67±2.35   3.19±0.28    4.19±0.15   −0.78±0.69   −2.68±0.00   –           2.79±0.01  2.85±1.02
MCD    128  6.90±2.93   3.20±0.36    4.15±0.15   −0.87±0.02   −1.00±2.27   2.74±0.05   2.76±0.02  2.95±1.27
MCD    256  6.60±3.59   3.31±0.45    4.13±0.15   −0.70±0.05   −2.70±0.00   2.75±0.04   2.72±0.01  3.79±1.88
MCD    512  7.28±3.31   3.45±0.59    4.13±0.17   −0.76±0.03   −2.71±0.00   2.77±0.04   2.68±0.02  3.76±1.65
MFG    64   2.83±0.33   3.26±0.08    4.42±0.10   −0.92±0.02   −6.24±0.01   2.80±0.03   2.90±0.01  2.85±0.24
MFG    128  2.99±0.41   3.41±0.05    4.91±0.09   −0.83±0.02   −6.23±0.01   2.83±0.03   2.92±0.01  3.38±0.29
MFG    256  3.33±0.53   3.57±0.07    5.44±0.05   −0.69±0.01   −6.22±0.01   2.89±0.02   2.95±0.01  3.65±0.32
MFG    512  3.69±0.54   3.73±0.05    5.83±0.05   −0.49±0.01   −6.19±0.01   3.04±0.03   2.98±0.01  3.86±0.31
NNG    64   2.69±0.06   3.40±0.15    1.95±0.08   −1.14±0.05   −5.83±1.49   2.80±0.01   2.78±0.01  2.71±0.17
NNG    128  2.72±0.09   3.56±0.08    2.11±0.12   −1.19±0.04   −6.52±0.09   2.86±0.02   2.95±0.12  3.06±0.27
NNG    256  3.04±0.22   3.52±0.07    2.64±0.17   −1.19±0.03   −5.73±0.21   2.84±0.00   3.02±0.01  3.15±0.13
NNG    512  3.13±0.14   3.91±0.20    3.07±0.07   −0.80±0.00   −5.30±0.05   3.51±0.00   –          3.21±0.14
WHVI   64   3.68±1.40   3.19±0.34    2.18±0.37   −1.13±0.02   −6.25±0.01   2.73±0.03   2.82±0.01  2.56±1.33
WHVI   128  4.33±1.80   3.17±0.37    2.00±0.60   −1.19±0.04   −6.25±0.01   2.71±0.03   2.79±0.01  1.80±1.01
WHVI   256  4.99±2.65   3.35±0.59    2.06±0.72   −1.23±0.04   −6.25±0.01   2.70±0.03   2.77±0.01  1.53±0.53
WHVI   512  5.41±2.30   3.33±0.56    2.05±0.46   −1.22±0.04   −6.25±0.01   2.70±0.03   2.74±0.01  1.37±0.57