Top PDF results for "compositional data analysis":

Relations between socio-economic factors and nutritional diet in Vietnam from 2004 to 2014: new insights using compositional data analysis

The aim of this study is to contribute to this literature by analyzing the evolution of diet patterns in Vietnam, focusing on macronutrient shares in the diet instead of macronutrient volumes. This approach allows us to take into account the dependence among macronutrients and to avoid overestimating total calorie intake when using household survey data. We use compositional data analysis (CODA) to analyze and model the relative information contained in those volumes and shares. CODA is a well-established field of statistics with diverse fields of application, such as geology and economics (Pawlowsky-Glahn and Buccianti, 2011; Pawlowsky-Glahn et al., 2015). The method has recently been applied in medical and nutritional epidemiology studies (Dumuid et al., 2017; Leite, 2016; Mert et al., 2016). A composition is a vector of D components for which the relative information is relevant (for example, a vector of D shares).
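The log-ratio machinery at the heart of CODA can be illustrated with a minimal sketch (hypothetical macronutrient shares; the centered log-ratio transform is standard CODA practice, not code from this paper):

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of a composition with positive parts.

    Maps a D-part composition to R^D by taking logs relative to the
    geometric mean, so standard multivariate tools can be applied.
    """
    x = np.asarray(x, dtype=float)
    g = np.exp(np.mean(np.log(x)))  # geometric mean of the parts
    return np.log(x / g)

# Hypothetical shares (protein, fat, carbohydrate) summing to 1.
shares = np.array([0.15, 0.25, 0.60])
z = clr(shares)
print(z, z.sum())  # clr coordinates sum to zero by construction
```

The clr coordinates sum to zero, so the unit-sum constraint of the simplex is traded for a zero-sum constraint; isometric log-ratio (ilr) coordinates remove that remaining redundancy.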

Representation Learning of Compositional Data

We consider the problem of learning a low-dimensional representation for compositional data. Compositional data consist of collections of nonnegative values that sum to a constant. Since the parts of the collection are statistically dependent, many standard tools cannot be applied directly; compositional data must first be transformed before analysis. Focusing on principal component analysis (PCA), we propose an approach that allows low-dimensional representation learning directly from the original data. Our approach combines the benefits of the log-ratio transformation from compositional data analysis and exponential family PCA. A key tool in its derivation is a generalization of the scaled Bregman theorem, which relates the perspective transform of a Bregman divergence to the Bregman divergence of a perspective transform and a remainder conformal divergence. Our proposed approach includes a convenient surrogate (upper-bound) loss for exponential family PCA that has an easy-to-optimize form. We also derive the corresponding form for nonlinear autoencoders. Experiments on simulated data and microbiome data show the promise of our method.
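A common baseline for this problem, a log-ratio transform followed by ordinary PCA, can be sketched as follows (simulated compositions; this is the classical approach, not the authors' exponential-family method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated compositional data: 100 samples, 5 parts, rows sum to 1.
raw = rng.gamma(shape=2.0, size=(100, 5))
X = raw / raw.sum(axis=1, keepdims=True)

# Step 1: centered log-ratio transform (requires strictly positive parts).
Z = np.log(X) - np.log(X).mean(axis=1, keepdims=True)

# Step 2: ordinary PCA on the clr coordinates via SVD.
Zc = Z - Z.mean(axis=0)
U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
scores = Zc @ Vt[:2].T  # 2-dimensional representation
print(scores.shape)  # (100, 2)
```

The transform-then-PCA pipeline loses the direct connection to the original data space, which is precisely the gap the paper's exponential-family formulation aims to close.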

High-dimensional compositional microbiota data: state-of-the-art of methods and software implementations

L. Chen, H. Liu, J.-P. A. Kocher, H. Li, and J. Chen. glmgraph: an R package for variable selection and predictive modeling of structured genomic data. Bioinformatics, 31(24):3991–3993, 2015.
J. J. Egozcue, V. Pawlowsky-Glahn, G. Mateu-Figueras, and C. Barceló-Vidal. Isometric logratio transformations for compositional data analysis.


Discriminant Analysis for Multiway Data

References
1. Rasmus Bro. Multi-way Analysis in the Food Industry: Models, Algorithms and Applications. PhD thesis, Royal Veterinary and Agricultural University, 1998.
2. D. Galanaud, V. Perlbarg, R. Gupta, R. D. Stevens, P. Sanchez, E. Tollar, N. M. Champfleur, J. Dinkel, S. Faivre, G. Soto-Ares, B. Veber, V. Cottenceau, F. Masson, T. Tourdias, E. André, G. Audibert, E. Schmitt, D. Ibarrola, F. Dailler, A. Vanhaudenhuyse, L. Tshibanda, J. F. Payen, J. F. Le Bas, A. Krainik, N. Bruder, N. Girard, S. Laureys, H. Benali, and L. Puybasset. Assessment of white matter injury and outcome in severe brain trauma: a prospective multicenter cohort. Anesthesiology, 117(6):1300–10, December 2012.

High-Dimensional Topological Data Analysis

Frédéric Chazal. INTRODUCTION. Modern data often come as point clouds embedded in high-dimensional Euclidean spaces, or possibly more general metric spaces. They are usually not distributed uniformly, but lie around highly nonlinear geometric structures with nontrivial topology. Topological data analysis (TDA) is an emerging field whose goal is to provide mathematical and algorithmic tools to understand the topological and geometric structure of data. This chapter provides a short introduction to this new field through a few selected topics. The focus is deliberately put on the mathematical foundations rather than specific applications, with particular attention to stability results asserting the relevance of the topological information inferred from data.

Advanced Data Interpolating Variational Analysis. Application to climatological data

Brasseur, P., Beckers, J.-M., Brankart, J.-M., and Schoenauen, R.: Seasonal temperature and salinity fields in the Mediterranean Sea: Climatological analyses of a historical data set, Deep-Sea Res. I, 43, 159–192, doi:10.1016/0967-0637(96)00012-X, 1996.


Advanced Data Interpolating Variational Analysis. Application to climatological data.

Troupin, C., Sirjacobs, D., Rixen, M., Brasseur, P., Brankart, J.-M., Barth, A., Alvera-Azcárate, A., Capet, A., Ouberdous, M., Lenartz, F., Toussaint, M.-E., and Beckers, J.-M.: Advanced Data Interpolating Variational Analysis. Application to climatological data, submitted to Ocean Model.


Grid Analysis of Radiological Data

INTRODUCTION. Harnessing the full power of computer-aided image analysis in clinical research and practice remains an open issue. Given the amount of data produced by X-ray Computed Tomography (CT), Magnetic Resonance Imaging (MRI), or PET scans, and the difficulty of interpreting medical images, algorithms for medical image analysis, processing, and diagnostic assistance have been developed over the last 15 years or so. Some of these algorithms have reached a high level of usability and proved to have a real impact in the clinical domain. However, their widespread adoption by clinicians has not yet been realized. Two striking examples, among many others, are radiotherapy, which could greatly benefit from advances in segmentation and registration algorithms, and intra-operative situations as well as intervention planning, which could exploit modern high-performance computing systems for augmented reality (Kikinis, 1998). In the 90s, G. A. Moore coined the term "crossing the chasm" (Moore, 1991) for the issue of selling high-tech products to mainstream customers. Crossing the long-standing chasm between advances in computer science and engineering in the field of medical image analysis on the one hand, and clinical research and practice on the other, is a challenge of the same nature.

Immersive Data Exploration and Analysis

A menu is displayed on top of the chair's tablet to choose parameters for a data selection based on event type or user ID; multi-selection is available. Upon confirmation, a virtual cube is generated in front of the user at arm's length, with the selection information written on it. This "selection cube" materializes the selection. Grabbing the cube with the right controller applies a highlight, i.e. a glowing effect, to the selected data points. The cube can also be released on top of the virtual plank to filter out all the data points that are not selected. Multiple selection cubes can be chained on the plank, keeping visible only the data points at the intersection of the selection cubes. To undo a filter, the user can simply remove a cube from the virtual plank. A temporal zoom on the time axis is also available: the user can select two markers on the curved horizontal axis using the raycast. These markers are then used to change the visualization according to the new time interval.

Data on Social Network Analysis on Malaria

As for the findings in the field, there has been increasing interest from the scientific community over the years, shown by the growing number of publications and a steady influx of new scientists into the area. Regarding the countries investing in the area, new players are coming forward, especially among countries with a high incidence of neglected diseases, such as India. As for the network analysis, we observed a high degree of collaboration, reflected in the network's clustering coefficient, which is in line with scientific communities in general. Among those authors we found some key players in terms of the number of publications, and those players might be a point for further study. This study, however, has not examined the evolution of specific authors using network analysis, or morphological evolution using similar methods. Those studies are part of our current work and may be presented in future analyses.

Analysis of longitudinal imaging data

Compound Symmetric (CS) intra-visit correlation structure. Naive-OLS (N-OLS) model, which includes subject indicator variables as covariates. Fast, but only valid with CS intra-visit correlation.


Analysis of ordinal longitudinal data

Ψ_ijk(s, t) = [F_ijk(s, t) {1 − γ_isj − γ_itk + F_ijk(s, t)}] / [{γ_isj − F_ijk(s, t)} {γ_itk − F_ijk(s, t)}]

Using this definition, F_ijk(s, t) = P[Z_is ≤ j, Z_it ≤ k] can be expressed in terms of Ψ_ijk(s, t), γ_isj, and γ_itk.
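The quantity Ψ_ijk(s, t) is a global odds ratio linking the joint CDF F to the marginal CDFs γ; a direct transcription of the formula (hypothetical probabilities, not values from the paper):

```python
def global_odds_ratio(F, gamma_sj, gamma_tk):
    """Global odds ratio Psi_ijk(s, t) for a joint CDF F = P[Z_is <= j, Z_it <= k]
    with marginal CDFs gamma_sj = P[Z_is <= j] and gamma_tk = P[Z_it <= k]."""
    num = F * (1.0 - gamma_sj - gamma_tk + F)
    den = (gamma_sj - F) * (gamma_tk - F)
    return num / den

# Hypothetical marginals; under independence F = gamma_sj * gamma_tk
# and the odds ratio is 1 (no association between the two visits).
gamma_sj, gamma_tk = 0.4, 0.5
print(global_odds_ratio(gamma_sj * gamma_tk, gamma_sj, gamma_tk))  # close to 1.0
```

Values of Ψ above 1 indicate positive association between the responses at visits s and t, which is why the joint CDF can be recovered from Ψ and the margins.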


Data Engineering for the Analysis of Semiconductor Manufacturing Data

2.4 Other Feature Engineering Problems. There are some common feature engineering problems that we have not discussed here, because they are not specific to manufacturing data. We briefly mention three of them. (1) Occasionally there are missing feature values in the data; we simply throw out all cases that have any missing values. (2) Bad sensor readings can cause outliers. To handle this problem, we have an upper and lower limit for each sensor reading; cases are flagged and discarded when any feature value is outside its limits. (3) Often the features are highly correlated with each other. This confuses the analysis, since highly correlated features can act as "synonyms" in the decision tree: when the decision tree induction algorithm is run on several batches of data from the same process, minor variations in the data can cause radically different trees to be generated. We screen the data for highly correlated features by generating a table of correlations for all pairs of features.
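The correlation screen described above, a table of pairwise correlations with near-synonymous features flagged, might look like the following sketch (hypothetical threshold and data; not the authors' code):

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.95):
    """Return pairs of feature names whose absolute Pearson correlation
    exceeds the threshold, as candidates for removal before tree induction."""
    corr = np.corrcoef(X, rowvar=False)  # features are columns
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], corr[i, j]))
    return pairs

rng = np.random.default_rng(1)
a = rng.normal(size=200)
# "sensor_a_dup" is sensor_a plus tiny noise, i.e. a near-synonym.
X = np.column_stack([a, a + rng.normal(scale=0.01, size=200),
                     rng.normal(size=200)])
print(correlated_pairs(X, ["sensor_a", "sensor_a_dup", "sensor_b"]))
```

In practice one member of each flagged pair would be dropped before running the tree induction algorithm, stabilizing the trees across batches.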

Compositional Contract Abstraction for System Design

Abstract interpretation (AI) is a formal framework proposed by P. Cousot and R. Cousot [10], [11], [12] which addresses the above problems by allowing the systematic simplification of certain classes of systems, making them amenable to formal verification: from undecidable to decidable, or from high to low complexity. At its core, AI offers formal means to travel back and forth between concrete (detailed and realistic) and abstract (simpler but approximate) representations of a system. Central to this is the Galois connection (α, γ), which relates the concrete and abstract domains such that the concretization function γ is the best possible approximation of an inverse for the abstraction function α. Based on this concept, AI theory offers powerful techniques to abstract classes of systems defined through fixpoint equations by "widening." A correctness proof of an abstract system representation obtained by applying the AI framework carries over to the concrete system. On the other hand, finding a bug in an abstraction may be a false alarm resulting from oversimplification; in this way AI can be seen as a dual activity to testing. Tool support for AI is well developed and mature [13], [14], [27] and has been successfully applied in various areas, in particular the static analysis of programs.
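As a toy illustration of the Galois connection (α, γ) (an assumed example, not taken from the paper), consider the interval abstract domain over finite sets of integers: α maps a concrete set to the smallest enclosing interval, and γ maps an interval back to the set of integers it denotes.

```python
def alpha(concrete):
    """Abstraction: smallest interval containing a nonempty set of ints."""
    return (min(concrete), max(concrete))

def gamma(interval):
    """Concretization: the set of ints denoted by an interval."""
    lo, hi = interval
    return set(range(lo, hi + 1))

s = {1, 3, 7}
iv = alpha(s)  # (1, 7)
# Soundness of the Galois connection: the abstraction over-approximates.
assert s <= gamma(iv)
# gamma may include spurious elements (here 2, 4, 5, 6), which is exactly
# why a bug found in the abstraction can be a false alarm.
print(iv, sorted(gamma(iv) - s))
```

The over-approximation is what makes correctness proofs transfer from the abstract to the concrete system, at the cost of possible false alarms.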

Compositional characterisation of the Themis family

In light of these recent associations, we investigate in this paper the surface mineralogy of a sample of Themis family members using a combined dataset of visible, near-infrared, and mid-infrared spectra. Specifically, we acquired near-infrared spectral data for 15 members of the Themis family (Sect. 2), thus expanding the number of objects surveyed in this wavelength range by a factor of 2 (Ziffer et al. 2011; de León et al. 2012). We complemented this dataset with existing spectra in the visible (Bus & Binzel 2002; Lazzaro et al. 2004) and mid-infrared (Licandro et al. 2012; Hargrove et al. 2015) ranges to perform a thorough analysis of the composition of the Themis family. Multi-wavelength spectral analysis has proven to be a powerful tool for constraining the surface composition of asteroids. For many asteroid classes that are relatively featureless in the visible and near-infrared (VNIR) wavelength ranges (e.g., the P- and D-classes), mid-infrared (MIR) spectroscopy provides a useful complement. Indeed, most major mineral groups and silicate glasses ...

Interactive-Compositional Authoring of Sound Spatialization

Automation in DAWs is also sometimes used to store, edit and lay out spatial controllers in compositional sessions. This type of workflow was adopted by the ViMic [Braasch et al., 2008] and Spatium [Penha and Oliveira, 2013] systems, among others, and made possible through dedicated control plug-ins that communicate with the auditory rendering engines. The AscoGraph application [Coffy et al., 2014] also offers visual interfaces for handling time structures that control electronic actions and processes, and has been used by some composers as an authoring tool for sound spatialization. In these editors, one automation track is typically allocated to each unidimensional parameter (e.g. a spatial coordinate); as such, it is difficult to graphically apprehend the spatial composition, which intrinsically requires a multidimensional standpoint.

Compositional simulation in perception and cognition

Overall, the hypothetical simulation model based on people's predictions explains participants' inferences best, suggesting they approach the inference task by simulating ...


Analysis of data virtualization & enterprise data standardization in business intelligence

In order to be used for data integration, a DV platform must perform data cleansing, data transformation, and data correlation. A DW does each of these stages separately and ...


The Role of Data Pre-Processing in Intelligent Data Analysis

Before the data are analyzed, on a given data file: (i) the length of each record is checked, and records with fewer or more attribute values than defined in the application are eliminated; (ii) each value of each attribute vector is checked for improper type (for example, if the attribute vector is defined as a timestamp and the attribute value is real, the user is informed); (iii) each value of each attribute vector is then checked for out-of-range data. This is especially important if the process engineers have different goals in each process (regulations and quality control standards). Facilities have been designed for importing or generating range files for a particular application; typical information in a range file consists of the upper and lower range for the attributes of interest. At the end of the error checking, a secondary data file can be generated that contains clean data. No mechanism has yet been developed to replace missing, corrupt, or out-of-range attribute values with other values (e.g. the average of the remaining values in the attribute vector). (iv) Once a clean data file is generated, the linear dependency between independent attributes can be checked. As part of the correlation checking mechanism, we have developed facilities that allow the user to look at the correlation between any pair of independent variables and, if desired, eliminate certain attributes from the analysis. In the presence of noisy data, a large number of irrelevant and linearly dependent attributes (that otherwise should be eliminated) can produce complex decision trees, requiring efficient pruning methods or some node-quality and tree-size control mechanisms.
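Checks (i)–(iii) above can be sketched as a small record filter (hypothetical record format and range file; not the original system's code):

```python
def clean_records(records, n_fields, ranges):
    """Keep only records with the right field count, numeric values,
    and every value inside its [lo, hi] range; drop the rest.

    ranges: list of (lo, hi) pairs, one per field, as read from a range file.
    """
    clean = []
    for rec in records:
        if len(rec) != n_fields:            # (i) wrong record length
            continue
        try:
            vals = [float(v) for v in rec]  # (ii) improper type
        except ValueError:
            continue
        # (iii) out-of-range values against the range file
        if all(lo <= v <= hi for v, (lo, hi) in zip(vals, ranges)):
            clean.append(vals)
    return clean

records = [["1.0", "20"], ["2.5"], ["bad", "30"], ["9.9", "25"], ["3.0", "999"]]
print(clean_records(records, 2, [(0, 10), (0, 100)]))
# → [[1.0, 20.0], [9.9, 25.0]]
```

The surviving rows would form the "secondary data file" of clean data; as the text notes, dropped values are discarded rather than imputed.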

Simulation Logic, Applets and Compositional Verification

