The Future of Data Analysis in the Neurosciences

Towards adaptive models? Today, parametric models are still the obvious choice in neuroscience. For instance, many big-sample studies (i.e., data from hundreds of animals or humans) currently apply the same parametric models as previous small-sample studies (i.e., a few dozen animals or humans). However, merely scaling up the sample size may not help: applying parametric analyses such as Student's t-test, ANOVA, and Pearson's linear correlation to brain data from many hundreds of individuals may not yield statistical insight into a neurobiological phenomenon beyond what could already be achieved with a dozen participants (Box 1). An important caveat lies in their systematic inability to grow in complexity no matter how much data is collected and analyzed [19]. In any classification task, a linear parametric classifier will always make predictions based on a linear decision boundary between classes, whereas a non-parametric classifier can learn a non-linear boundary whose shape grows more complex with more data. Analogously, parametric independent component analysis (ICA) and principal component analysis (PCA) require presetting the component number, yet the form and number of clusters in brain data are not known to neuroscientists. Such finite mixture models may give way to infinite mixture models that reframe the cluster number as a function of data availability and hence gradually yield more clusters without bound. Similarly, classical hidden Markov models may be upgraded to infinite hidden Markov models, and we might see more applications of non-parametric decision trees and nearest neighbors in future neuroscientific studies. Finally, it is currently debated whether the increasingly used "deep" neural network algorithms with many non-linear hidden layers are more accurately viewed as parametric or non-parametric.
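To make the contrast concrete, here is a minimal sketch (our illustration, not from the paper): a parametric linear classifier and a non-parametric nearest-neighbor classifier are trained on growing samples of a non-linearly separable problem, and only the latter exploits the extra data to refine its decision boundary.

```python
# Parametric (linear) vs non-parametric classifier as sample size grows.
# The linear model is capped by its fixed decision-boundary shape; the
# nearest-neighbor model keeps improving with more data.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=4000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (50, 500, 3000):  # growing "sample size"
    linear = LogisticRegression().fit(X_train[:n], y_train[:n])
    knn = KNeighborsClassifier(n_neighbors=15).fit(X_train[:n], y_train[:n])
    print(f"n={n:4d}  linear={linear.score(X_test, y_test):.3f}  "
          f"non-parametric={knn.score(X_test, y_test):.3f}")
```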

The future of urban models in the Big Data and AI era: a bibliometric analysis (2000-2019)

suggests, in line with the assumptions of this work, that the physically inspired "cellular automaton" method is in the process of being replaced by "deep learning" methods to monitor and predict traffic flow. The word "model" is also typical of pre-2012 Traffic Flow studies, which suggests that, at least for traffic issues, we might be witnessing "the end of theory" announced by Anderson. However, if we look at other post-2012 words, we observe the acronym "MFD" together with an interest in new types of vehicles (autonomous vehicles, smart vehicles and bicycles). This suggests that, in addition to a novel consideration for smart mobility and bicycles, Traffic Flow studies are still focussing on the fundamental diagram. This diagram gives a relation between traffic flux and traffic density; traffic operators use it to monitor urban congestion. In line with the example developed in 3.1, it seems that, so far, Traffic Flow research is integrating AI and Big Data techniques without abandoning classical approaches. To confirm this observation, we look at the evolution of the number of Traffic Flow publications in two important transportation journals: Transportation Research Part B and Transportation Research Part C. The difference between the two journals is topical: Part B focusses on physical models, whereas Part C focusses on new technologies. As shown in Figure 6, the number of Traffic Flow publications in Part C increased in recent years, but the topic remained important in Part B. In our view, monitoring the evolving distribution of Traffic Flow publications between these two journals is a good way to track the dynamics of the research area. Taking into account the evolving citation behaviour of the authors publishing in these two journals (analysing the evolving scope of their cited references) could also be an interesting way of following the current dynamics.
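For readers unfamiliar with the fundamental diagram, the following toy sketch (our illustration, not from the article, with assumed parameter values) evaluates the classical Greenshields model, in which speed falls linearly with density and flow is their product; empirical MFD studies estimate this relation from sensor data rather than postulating it.

```python
# Toy Greenshields fundamental diagram: flow q = k * v, with speed v
# decreasing linearly in density k. Parameter values are illustrative.
import numpy as np

v_free = 50.0                      # assumed free-flow speed (km/h)
k_jam = 150.0                      # assumed jam density (veh/km)

k = np.linspace(0.0, k_jam, 7)     # densities from empty road to jam
v = v_free * (1.0 - k / k_jam)     # Greenshields speed-density relation
q = k * v                          # flow (veh/h); maximal at k_jam / 2
for ki, qi in zip(k, q):
    print(f"density {ki:6.1f} veh/km -> flow {qi:7.1f} veh/h")
```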

The future of metabolomics in ELIXIR

Training in metabolite identification requires materials and case studies related to both data acquisition and bioinformatic analysis of the acquired data, and a multi-disciplinary training team of analytical chemists and bioinformaticians to deliver courses. The provision of courses currently focuses on hands-on training at training centres, as described above at the BMTC, which can typically train 6–12 scientists per course. However, in the growing discipline of metabolomics, there is a requirement to provide training to larger numbers that is only achievable through online training resources. Matching trainee learning objectives to the type of course provided is key, and recent examples of online courses have demonstrated their power in delivering Massive Open Online Courses (MOOCs) or more specialised Small Private Online Courses (SPOCs). At the BMTC, the introductory metabolomics MOOC has been used by more than 3,000 active learners, and the first SPOC, focussed on data processing and analysis in metabolomics, was completed by more than 50 people. However, courses offering greater levels of hands-on training remain tied to the laboratory; through the use of video media, we can envisage some of these courses also operating via online resources.

Analysis of present day and future OH and methane lifetime in the ACCMIP simulations

scenario, the key driver of the evolution of OH and methane lifetime is methane itself, since its concentration more than doubles by 2100 and it consumes much of the OH that exists in the troposphere. Stratospheric ozone recovery, which drives tropospheric OH decreases through photolysis modifications, also plays a partial role. In the other scenarios, where methane changes are less drastic, the interplay between various competing drivers leads to smaller and more diverse OH and methane lifetime responses, which are difficult to attribute. For all scenarios, regional OH changes are even more variable, with the most robust feature being the large decreases over the remote oceans in RCP8.5. Through a regression analysis, we suggest that differences in emissions of non-methane volatile organic compounds and in the simulation of photolysis rates may be the main factors causing the differences in simulated present day OH and methane lifetime. Diversity in predicted changes between present day and future OH was found to be associated more strongly with differences in modelled temperature and stratospheric ozone changes. Finally, through perturbation experiments we calculated an OH feedback factor (F) of 1.24 from present day conditions (1.50 from 2100 RCP8.5 conditions) and a climate feedback on methane lifetime of 0.33 ± 0.13 yr K⁻¹, on average. Models that did not include interactive stratospheric ozone effects on photolysis showed a stronger sensitivity to climate, as they did not account for negative effects of climate-driven stratospheric ozone recovery on tropospheric OH, which would have partly offset the overall OH/methane lifetime response to climate change.
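For context, the quoted feedback factor follows from a definition that is standard in this literature (our addition; the excerpt itself does not spell it out). With the sensitivity of the methane lifetime to the methane burden,

$$ s = \frac{\partial \ln \tau_{\mathrm{CH_4}}}{\partial \ln [\mathrm{CH_4}]}, \qquad F = \frac{1}{1 - s}, $$

a methane perturbation decays with an e-folding time of F times the steady-state lifetime, so F = 1.24 corresponds to s of roughly 0.19.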

Characterization of PET/CT images using texture analysis: the past, the present… any future?

82. Tixier F, Vriens D, Le Rest CC, Hatt M, Disselhorst JA, Oyen WJG, et al. Comparison of tumor uptake heterogeneity characterization between static and parametric 18F-FDG PET images in non-small cell lung cancer. J Nucl Med. 2016;in press.
83. Nyflot MJ, Yang F, Byrd D, Bowen SR, Sandison GA, Kinahan PE. Quantitative radiomics: impact of stochastic effects on textural feature analysis implies the need for standards. J Med Imaging (Bellingham). 2015;2:041002.


Traceability in chemical measurements: the role of data analysis

3) = 50.5 ± 0.2 mg/kg (fig. 4). The coefficient of determination r² = 0.9996, which most would take as a convincing level of linearity, yet the difference in the result between the linear and quadratic models is 8%. Both of these fitting models are empirical and both fit the data rather well. Shall one simply ignore the results from the quadratic model? While the Akaike Information Criterion (AIC) or Fisher's F-test of the residual variances can evaluate which model fits the data better, they do not tell us which is the correct model. Once the measurement model is adopted, the fitting method has to be chosen. For linear regression, chemists often use ordinary and weighted least squares with a variety of weighting alternatives such as w = x⁻², x⁻¹, 1, y⁻¹, and y⁻² [26,27]. With replicate measurements often lacking, expert…
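The model-comparison step can be made concrete with a minimal sketch (our illustration, using made-up calibration data): fit linear and quadratic models to the same points and compare them with AIC, keeping in mind that the criterion ranks fits but cannot certify which model is correct.

```python
# Fit linear and quadratic calibration models to the same (made-up) data
# and compare them via AIC; a high r^2 for the straight line does not by
# itself rule out the quadratic model.
import numpy as np

x = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])    # concentration
y = np.array([0.02, 1.05, 2.03, 2.95, 3.83, 4.65])   # instrument response

def aic_for_degree(deg):
    coefs = np.polyfit(x, y, deg)
    rss = float(np.sum((y - np.polyval(coefs, x)) ** 2))
    n, k = len(x), deg + 2          # fitted coefficients + residual variance
    return n * np.log(rss / n) + 2 * k

for deg, name in ((1, "linear"), (2, "quadratic")):
    print(f"{name:9s} AIC = {aic_for_degree(deg):8.2f}")
```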

The Future of Key Actors in the European Research Area

Imagining the potential of seeds to turn into saplings, if not the forest of 2020, is one of the main advantages of scenario methods for strategic (goal-seeking) purposes. Along these lines, an analysis of Diagram 4 provides a range of insights. The first has to do with the Diagram as a frame for the scenarios: is this frame too narrow or too open, given the specific subject and aims of this HLEG? The second question has to do with the way the specific points, the scenarios within the frame, were chosen: why these particular points? Answers to both of these questions depend, in part, on the methods used to construct both the frame and the stories painted within it. Were the methods rigorous, in the sense of a consistent theory and a model that tests that theory? Turning to the case at hand and looking at the results presented in Diagram 4, we can see that the RA papers and the associated scenarios map into fairly narrow bands. This draws attention to the ways in which the scenarios were constructed. For instance, even the most transformative scenario, the national governments scenario with C=7 and (F+H)/2=8, treats government as external to research and hence treats government policy as an 'instrument' that influences research as an 'object' outside the politico-administrative apparatus proper. Certainly there is considerable discussion in a number of the RA papers about how research within government needs to improve in order to become more effective at addressing the significantly changed research landscape outside of government. But this way of drawing the line between actor and acted, instrument and object, makes it difficult to address the whole set of possibilities described by a world of research in which the public sector is no longer an administrative apparatus but a series of experiments that have merged into a new system of ambient governance. In other words, the model that underlies these scenarios cannot give rise to a scenario that would help to detect weak signals from an emerging system where governance and research merge: a scenario where government in its 'modern' form, as 'external' decision maker using instruments (policies) to influence an object (research), fades away.

Data Engineering for the Analysis of Semiconductor Manufacturing Data

Abstract: We have analyzed manufacturing data from several different semiconductor manufacturing plants, using decision tree induction software called Q-YIELD. The software generates rules for predicting when a given product should be rejected. The rules are intended to help the process engineers improve the yield of the product, by helping them to discover the causes of rejection. Experience with Q-YIELD has taught us the importance of data engineering: preprocessing the data to enable or facilitate decision tree induction. This paper discusses some of the data engineering problems we have encountered with semiconductor manufacturing data. The paper deals with two broad classes of problems: engineering the features in a feature vector representation and engineering the definition of the target concept (the classes). Manufacturing process data present special problems for feature engineering, since the data have multiple levels of granularity (detail, resolution). Engineering the target concept is important, due to our focus on understanding the past, as opposed to the more common focus in machine learning on predicting the future.
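Q-YIELD itself is not publicly available, but the rule-induction idea can be sketched with an open-source stand-in (our illustration, with synthetic data and hypothetical feature names):

```python
# Induce a decision tree over synthetic wafer-level measurements and print
# the learned rules, in the spirit of rule generation for reject prediction.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                    # e.g. 3 process measurements
# Hypothetical ground truth: rejects cluster where features 0 and 2 are high.
y = ((X[:, 0] > 0.5) & (X[:, 2] > 0.3)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["temp", "pressure", "dopant"]))
```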

Immigration and the Future of the Welfare State in Europe

Table 4 extends the analysis of labor market effects by replacing our measure of native worker skill, formal education, by more specific measures of human capital. This strategy allows us to abstract even more clearly from potential endogenous preferences correlated with formal education. Panel 1 shows results for a measure of ISCO four-digit level occupation-specific human capital, derived from a question in the second and fifth rounds of the European Social Survey. This question asks people to specify how long it would take a third person to learn the skills required to perform the same job. [34] Higher values indicate higher levels of specific experience and human capital. Panels 2 and 3 show results for measures of task skill intensities as popularized by Peri and Sparber (2009), among others. We follow these authors and draw upon the O*Net database, which provides data on the importance of a full set of detailed communicative, cognitive and manual abilities for more than 400 highly detailed occupations in the United States. [35] We obtain ISCO four-digit level indicators of the importance of communication and manual skills following the definition outlined in Peri and Sparber (2009). We now use these indicators to approximate labor-market-relevant skills that are, to some extent, less correlated with potential other preference-forming effects of higher formal education (e.g. lower levels of ethnocentricity). [36] Throughout our estimations, we obtain very similar results. Our estimates for the interaction of skill with immigrants' skill ratio retain the expected sign and significance at the 5% level. These estimates are robust to the inclusion of occupation-level fixed effects but lose some (but not always all) significance once we further restrict our identification to country-occupation cells. This is not surprising, since in that latter case we now identify…
[Footnote 32] In models of attitudes to immigration, higher effects for individuals with more than 12 years of education are often interpreted as "preference effects" rather than pure labor market effects (Ortega and Polavieja, 2012).

MetExplore: Omics data analysis in the context of metabolic networks

[MetExploreViz] Maxime Chazalviel, Clément Frainay, Nathalie Poupin, Florence Vinson, Benjamin Merlet, Yoann Gloaguen, Ludovic Cottret and Fabien Jourdan. MetExploreViz: web component for interactive metabolic network visualization. (2017) Bioinformatics.
[MetaboRank] Frainay, C., Aros, S., Chazalviel, M., Garcia, T., Vinson, F., Weiss, N., … Jourdan, F. (2018). MetaboRank: network-based recommendation system to interpret and enrich metabolomics results. Bioinformatics (Oxford, England), 35(2), 274–283. doi:10.1093/bioinformatics/bty577
[MetExplore V2] Ludovic Cottret, Clément Frainay, Maxime Chazalviel, Floréal Cabanettes, Yoann Gloaguen, Etienne Camenen, Benjamin Merlet, Jean-Charles Portais, Stéphanie Heux, Nathalie Poupin, Florence Vinson and Fabien Jourdan. MetExplore: Manage and Explore metabolic networks. (2018) Nucleic Acids Research.
[DOI] F. van Ham and A. Perer, "'Search, Show Context, Expand on Demand': Supporting Large Graph Exploration with Degree-of-Interest," IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 6, pp. 953-960, Nov.-Dec. 2009. doi: 10.1109/TVCG.2009.108

The EADGENE microarray data analysis workshop

The results from the workshop may at first glance seem contradictory: the real-data results were quite different between groups (both in numbers of differentially expressed genes and in gene order) [9], while the results from the simulated data indicate that most of the approaches gave good and comparable results [15]. It could be argued that methods that give similar results for simulated data should give similar results for real data, if the two sets of data are comparable. The difference between the results from real and simulated data is most likely due to differences in the expected statistical power to detect differentially expressed genes between the two sets of data: while the real data consisted of 48 microarrays, any comparison between two time points within an infection had only four microarrays contributing to each time point, and the contrast was indirect via a reference design. The simulated data consisted of 10 microarrays for a direct A versus B comparison, making it more powerful than the comparisons within the real data. It could be argued retrospectively that for comparison of methods the real data had too little power and too many possible scenarios to be tested, while the simulated data had too much power to reveal subtle differences between methods. This emphasises the benefits of this kind of workshop, since this finding was only apparent after combining and contrasting the approaches and results of the different groups, and the observation will be fed forward into future workshops.
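The power gap the authors describe can be illustrated with a quick calculation (our sketch; the effect size and alpha are assumed, not taken from the studies):

```python
# How power to detect a fixed effect grows with arrays per group, echoing
# the gap between 4 arrays per time point (real data) and 10 arrays per
# group (simulated data). Effect size and alpha are illustrative choices.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (4, 10):
    power = analysis.power(effect_size=1.0, nobs1=n, alpha=0.05)
    print(f"arrays per group = {n:2d} -> power = {power:.2f}")
```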

A Generalisation of the Mixture Decomposition Problem in the Symbolic Data Analysis Framework

Key-words: Mixture decomposition, Symbolic Data Analysis, Data Mining, Clustering, Partitioning

1. Introduction

In a symbolic data table, a cell can contain a distribution (Schweitzer (1984) says that "distributions are the number of the future"!), or intervals, or several values linked by a taxonomy and logical rules, etc. The need to extend standard data analysis methods (exploratory, clustering, factorial analysis, discrimination, ...) to symbolic data tables is increasing, in order to get more accurate information and to summarise extensive data sets by describing the underlying concepts contained in databases (such as towns or socio-economic groups), considered as new kinds of units.
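To fix ideas, here is a toy illustration (ours, not from the paper) of what a symbolic data table can hold: cells containing an interval or a distribution, each row describing a concept (a town) rather than a single individual.

```python
# A "symbolic" data table: cells hold intervals or distributions instead
# of single values, and each row describes a concept, not an observation.
symbolic_table = {
    "town_A": {
        "temperature": (12.0, 27.5),                            # interval
        "income_class": {"low": 0.3, "mid": 0.5, "high": 0.2},  # distribution
    },
    "town_B": {
        "temperature": (5.0, 18.0),
        "income_class": {"low": 0.5, "mid": 0.4, "high": 0.1},
    },
}
for town, description in symbolic_table.items():
    print(town, description)
```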

Big data analysis in the field of transportation

…ideas from the NMF literature. Many options might be possible, depending on the form of f_ϑ(·). The most commonly used algorithm is the so-called multiplicative update, an alternating optimization method with respect to Φ and Λ that was proposed in the seminal papers [Lee and Seung, 1999, 2001]. Other algorithms include ADMM [Boyd et al., 2011; Sun and Fevotte, 2014], alternating projected gradient [Lin, 2007], and, for Bayesian approaches, Monte-Carlo methods [Paisley et al., 2014] and variational approximations [Alquier and Guedj, 2017]. A numerical comparison of many algorithms can be found in [Lin, 2007]. In practice, the multiplicative update is efficient in many settings and is very simple to use: it does not depend on any tuning parameter such as the step size in gradient-based methods, so this is the method we will use from now on. The method alternates a step in Φ and a step in Λ. Each step is shown to improve the fit criterion in [Lee and Seung, 2001]. Note that the authors claim that it also leads to convergence, but, as argued in [Gonzalez and Zhang, 2005], the proof of this fact is actually incomplete. We give the multiplicative update explicitly in the case of a mixture of multinomials below.
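As a minimal sketch (our illustration; the thesis derives the update for its own loss, while this version uses the squared Frobenius loss), Lee and Seung's multiplicative updates for V ~ Phi @ Lam multiply each factor entrywise by a ratio of non-negative terms, so non-negativity is preserved and the fit criterion does not increase at each step.

```python
# Multiplicative-update NMF for the squared Frobenius loss:
# alternate a step in Lam and a step in Phi, both entrywise multiplicative.
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((40, 30))                 # non-negative data matrix
r = 5                                    # number of components
Phi = rng.random((40, r))
Lam = rng.random((r, 30))

eps = 1e-10                              # guard against division by zero
for _ in range(200):
    Lam *= (Phi.T @ V) / (Phi.T @ Phi @ Lam + eps)   # step in Lam
    Phi *= (V @ Lam.T) / (Phi @ Lam @ Lam.T + eps)   # step in Phi

print("relative error:", np.linalg.norm(V - Phi @ Lam) / np.linalg.norm(V))
```

Note how the update needs no step size, which is exactly the practical advantage the excerpt highlights over gradient-based methods.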

Inference in the age of big data: Future perspectives on neuroscience

One may think that differences between the two ways of establishing neurobiological conclusions from brain measurements are mostly of technical relevance. Yet there is an often-overlooked misconception that statistical models with high explanatory power necessarily also exhibit high predictive power (Friedman, 2001; Lo et al., 2015; Shmueli, 2010; Wu et al., 2009). Put differently, a neurobiological effect assessed to be statistically significant by a p-value may sometimes not yield successful predictability based on cross-validation, and vice versa (cf. Fig. 4; Kriegeskorte et al., 2006). We also find it interesting to note that out-of-sample generalization with cross-validation puts the unavoidable theoretical modeling assumptions to an empirical test by directly assessing the model performance in unseen data (Kriegeskorte, 2015). In classical inference, the desired relevance of a statistical relationship in the general population remains grounded in formal mathematical proofs, without explicit evaluation on unseen data. Moreover, the many theoretical differences also manifest practically in the high-dimensional setting, where classical inference needs to address the multiple comparisons problem (i.e., accounting for many statistical inferences performed in parallel), whereas pattern generalization involves tackling the curse of dimensionality (i.e., difficulties of inferring relevant statistical structure in observations with thousands of variables) (Domingos, 2012; Friston et al., 2008; Huys et al., 2016). We therefore caution that care needs to be taken when combining both inferential regimes in practical data analysis (Bzdok, 2016; Yarkoni and Westfall, 2016).
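The dissociation between significance and predictability can be demonstrated in a few lines (our sketch, with simulated data): a small but real effect in a large sample is highly significant, yet contributes almost nothing to out-of-sample prediction.

```python
# A variable can be statistically significant in-sample (tiny p-value)
# while adding little out-of-sample predictive power, which is why
# classical inference and cross-validation can disagree.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)         # small but real effect

r, p = stats.pearsonr(x, y)
print(f"in-sample: r = {r:.3f}, p = {p:.2g}")          # highly significant

scores = cross_val_score(LinearRegression(), x.reshape(-1, 1), y,
                         cv=10, scoring="r2")
print(f"out-of-sample R^2 = {scores.mean():.3f}")      # close to zero
```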

The future of academic libraries

This article discusses various aspects of the history and survival of libraries, thanks to their adaptability, diversity and variability, as well as their future viewed from the past in light of: the rapid growth of content, the development of new services increasingly oriented to the user, impressive technological changes, the creation of new job profiles, the virtualization of their user communities, content production and globalization. It also shows the existence of several possible futures based on a SWOT analysis and on changes in key functions such as acquisitions, gateways to information and its preservation; some positive and negative points that depend largely on the academic institutions they serve; the management of new content (in particular the impact of the Internet, the preservation of heritage material and data management); and their new involvement in the information value chain (such as hosting and management of repositories, knowledge management, evaluation of research, and text and data mining), among others. It concludes with several important questions that need to be addressed to reach some conclusions about their possible future.

Exploiting the Potential of the Future “Maritime Big Data”

Index Terms: maritime surveillance, early detection, heterogeneous correlation, CISE, weak signals analysis

1. CURRENT STATE OF PLAY

While maritime surveillance was radically transformed by the introduction of the Automatic Identification System (AIS) in 2002 through the IMO SOLAS Agreement, suddenly populating the screens of vessel traffic management systems (VTMS) well beyond the range of the coastal radars, operational maritime surveillance capabilities have not much evolved since. Voluntary reporting systems (mainly AIS, VMS for fishery vessels and LRIT in distant sea lanes) remain the essential source of vessel monitoring, leaving in the shadows the smaller boats… and the deliberately cheating ones. Furthermore, non-cooperative ship detection provided by maritime radars (on board ships and on the coast) is now automatically fused with AIS, no longer supported by additional VHF voice contact and binoculars to confirm a vessel's identity and its planned route.

Contributions to the statistical analysis of microarray data.

We also hope that a better distinction among ipsilateral breast cancers of tumors that are genetically related to their primary tumors, that is, true recurrences, will help reveal genetic differences that would provide new information on radioresistance and tumor aggressiveness. To date, little is known about the differences or similarities in the pangenomic expression or the nature of both new primary tumors and ipsilateral breast cancers. Kreike et al. (54) performed a gene expression analysis of 18,000 cDNAs in nine pairs of primary breast cancers and their ipsilateral breast recurrences among women who were younger than 51 years at the time of their initial breast-conserving therapy. Paired data analysis showed no set of genes that had consistently different levels of expression in primary tumors and local recurrences. Another route that has still scarcely been explored is the search for a biologic signature to predict the risk of local recurrence, especially after breast-conserving treatment (54-56). A better distinction between new primary tumors and true recurrences is needed to perform a supervised study based on the occurrence of true recurrences only, and not of all ipsilateral breast cancers.

The effects of encoding data in diversity studies and the applicability of the weighting index approach for data analysis from different molecular markers.

In plant breeding programs, the use of the comparative approach in germplasm banks is more common in studies of genetic resources (Laurentin 2009). Generally, in this type of study, genetic diversity is evaluated using dissimilarity coefficients to establish genetic distance matrices. Thus, the use of robust coefficients is key to determining the true genetic variability. The choice of the most appropriate coefficients depends on the type of markers, the ploidy of the organism and the objective of each study (Kosman and Leonard 2005). To separate the types of markers, two classes are formed in accordance with their discriminatory ability. The first is formed by dominant markers, which are not able to distinguish heterozygous genotypes. Included in this class are the following: random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), inter-simple sequence repeat (ISSR) and diversity arrays technology (DArT). The other class is composed of codominant markers, which can distinguish heterozygous genotypes.
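As a toy illustration of such a dissimilarity coefficient (ours, with made-up band profiles), the Jaccard coefficient is a common choice for dominant markers because it is computed from band presence/absence alone:

```python
# Jaccard dissimilarity between two genotypes scored with dominant markers
# (band present = 1, absent = 0); shared absences are ignored by design.
import numpy as np

g1 = np.array([1, 0, 1, 1, 0, 1, 0, 0])
g2 = np.array([1, 1, 1, 0, 0, 1, 0, 1])

a = np.sum((g1 == 1) & (g2 == 1))        # bands shared by both genotypes
b = np.sum((g1 == 1) & (g2 == 0))        # bands only in genotype 1
c = np.sum((g1 == 0) & (g2 == 1))        # bands only in genotype 2
jaccard_dissimilarity = 1 - a / (a + b + c)
print(f"Jaccard dissimilarity = {jaccard_dissimilarity:.3f}")
```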

Analysis of Sponsored Data in the Case of Competing Wireless Service Providers

[Fig. 4: ISP revenues (left) and consumer surplus (right) with CP-revenue-maximizing γ values, comparing γ1 = γ2 with the optimal (γ1, γ2).] Figure 4 displays the impact of sponsoring on the revenues of ISPs and on consumer surplus. In terms of ISP revenue first, we can see that ISP 1, the one with the higher reputation, makes more money than ISP 2. For very low values of s, the three policies give the same revenues. ISP 2 always prefers sponsoring to no sponsoring, while for ISP 1 identical sponsoring is always the best option, something not so obvious at first sight. As a second choice for ISP 1, depending on the advertisement level, the preference varies between no sponsoring and differentiated sponsoring. Sponsoring is also always preferred by users, since it yields a higher consumer surplus. On the other hand, the consumer surplus decreases as the advertisement level increases: the increased sponsoring level does not sufficiently compensate for the negative externality of ads. Interestingly, there is no significant difference for users between the two sponsoring options, and no dominance either.

Analysis of acoustic communication channel characterization data in the surf zone

To test whether a geometric ray model is sufficient for modeling acoustic propagation under surf-zone bubble clouds, an OASES model was constructed of the Scripps Pier.

