Keywords: information extraction, event extraction, neural networks, word embeddings

Abstract: With the increasing amount of data and the exploding number of data sources, the extraction of information about events, whether from the perspective of acquiring knowledge or from a more directly operational perspective, becomes a more and more obvious need. This extraction nevertheless comes up against a recurring difficulty: most of the information is present in documents in a textual form, thus unstructured and difficult for a machine to grasp. From the point of view of Natural Language Processing (NLP), the extraction of events from texts is the most complex form of Information Extraction (IE), which more generally encompasses the extraction of named entities and of the relationships that bind them in texts. The event extraction task can be represented as a complex combination of relations linked to a set of empirical observations from texts. Compared to relations involving only two entities, there is therefore a new dimension that often requires going beyond the scope of the sentence, which constitutes an additional difficulty. In practice, an event is described by a trigger (the word or phrase that evokes the event) and a set of participants in that event (that is, arguments or roles) whose values are text excerpts.
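The trigger-plus-arguments view of an event can be captured by a simple record type. The sketch below is purely illustrative; the class name, the role labels and the example sentence are our own assumptions, not taken from any particular annotation scheme:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    trigger: str                                    # word or phrase evoking the event
    arguments: dict = field(default_factory=dict)   # role name -> text excerpt

# Hand-built event for the toy sentence "Acme acquired Bolt for $2M in 2021."
ev = Event(trigger="acquired",
           arguments={"Buyer": "Acme", "Acquired": "Bolt",
                      "Price": "$2M", "Time": "2021"})
print(ev.trigger)             # acquired
print(ev.arguments["Buyer"])  # Acme
```

In a real system the trigger and each argument span would of course be predicted from the text rather than written by hand.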
Biological systems rely on different operating principles: information is not represented in frames, but by means of data-driven pulsed messages exchanged by complex nervous cells; information processing is not performed algorithmically, but supported by specific neural circuitry. For example, Barlow and Levick ( 1965 ) demonstrated that an inhibitory mechanism is at the basis of the computation the biological retina performs to extract the direction of motion of an object in the visual field. Inspired by these studies, we hereby present an architecture that does not rely on capturing and processing frames. We make use of neuromorphic retinas ( Mead and Mahowald, 1988; Culurciello et al., 2003; Culurciello and Andreou, 2006; Lichtsteiner et al., 2008; Delbruck et al., 2010; Posch et al., 2011; Serrano-Gotarredona and Linares-Barranco, 2013 ): they are frame-free devices whose pixels, each independently and asynchronously, can directly communicate with the next processing stage without having to wait for a global synchronization step that collects all their output in a frame. By doing so, the precise timing at which a pixel is activated becomes a computational variable which is readily available. Neuromorphic pixels react only to light changes and are blind to a steady state of illumination, in analogy with ganglion cells, their biological counterparts. Since motion induces sparse spatio-temporal activity, retina pre-processing opens the way to lighter methods of analysis ( Bauer et al., 2007; Serrano-Gotarredona et al., 2009; Clady et al., 2014 ) more suited for hardware implementations in smart sensors.
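The output of such a sensor can be pictured as a stream of address-events (x, y, timestamp, polarity). The sketch below is only an analogy: a real neuromorphic retina is frame-free and each pixel emits events asynchronously, whereas here, for illustration, we derive events from the difference between two synthetic "frames". All names and the threshold value are our own assumptions:

```python
# Emit an ON (+1) or OFF (-1) address-event wherever brightness changed
# by more than a threshold between two successive snapshots.
def to_events(prev, curr, t, threshold=10):
    events = []
    for y, (row_p, row_c) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_p, row_c)):
            if abs(c - p) > threshold:
                polarity = 1 if c > p else -1
                events.append((x, y, t, polarity))
    return events

prev = [[100, 100], [100, 100]]
curr = [[100, 150], [ 80, 100]]   # one brightening pixel, one dimming pixel
print(to_events(prev, curr, t=0.001))
# [(1, 0, 0.001, 1), (0, 1, 0.001, -1)]
```

Note how pixels with steady illumination produce no events at all, which is what makes the resulting activity sparse.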
2.2.6 Statistical Conditioning of Data
We have seen in the previous sections that the results of the Bayesian methods hold only under specific hypotheses made on the input and output spaces. Conditions such as zero mean or uncorrelatedness (spatial and temporal), although restrictive, can be extended to the general neural network methodology addressed here. The principal transformation, or feature extraction, applied to the data set is usually normalization. One advantage of normalization is that the statistics of the signals can be interpreted in probabilistic terms, as is the case in the Bayesian approach. Therefore, the input and output spaces are detrended and normalized so that the variance of the signals is unity. The output space consists of zero-mean signals, and the absolute values of the outputs are probabilities of taking a certain value. Moreover, since the signals are normalized, the prediction error is also a probability of wrong forecast, and its mean and variance are meaningful characteristics. Analysis of the error mean (bias) and variance will be used to obtain confidence limits on the forecast, and hence on the probability of fitness of a given model.
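The detrending-plus-unit-variance conditioning described above amounts to a z-score transformation. A minimal sketch (function name ours):

```python
import statistics

def condition(signal):
    """Detrend (remove the mean) and scale to unit variance."""
    mu = statistics.fmean(signal)
    sigma = statistics.pstdev(signal)
    return [(x - mu) / sigma for x in signal]

raw = [2.0, 4.0, 6.0, 8.0]
z = condition(raw)
print(round(statistics.fmean(z), 10))   # 0.0 -> zero-mean signal
print(round(statistics.pstdev(z), 10))  # 1.0 -> unit variance
```

After this step, statements about the error of prediction can indeed be read on the same normalized scale as the signals themselves.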
The authors apply a sensitivity analysis method to a neural network in order to determine the influence of several variables on heat rate. The network is a combination of self-organizing and backpropagation neural networks, with 24 inputs and 2 outputs. The self-organizing network acts as an organizer and rearranges the original training patterns into clusters. The centroids of these clusters are then used as inputs for the multilayer perceptron, which has 24 input units, 10 hidden units, and 2 outputs. Once a reasonable error rate is reached, the derivatives are computed. It appears that, for this kind of network, these derivatives are functions of the weights of the network and of the input pattern; they must therefore be averaged over all input patterns. The resulting values can be ranked in order of sensitivity: the greater the derivative, the more important the input variable, since a small change in that input variable is likely to affect the output variable. The authors applied the method to the heat rate and attempted to apply it further to secure information. There is no comparison with classical methods such as Principal Component Analysis.
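The "derivative averaged over all input patterns" idea can be sketched with central finite differences. The tiny two-input network and its weights below are entirely hypothetical stand-ins for the trained 24-input model described above:

```python
import math

# Hypothetical trained network: a fixed 2-input, 1-output MLP.
def net(x1, x2):
    h = math.tanh(0.9 * x1 + 0.1 * x2)   # input 1 carries the larger weight
    return 2.0 * h

def sensitivities(model, patterns, eps=1e-5):
    """Average |d output / d input_i| over all input patterns."""
    n = len(patterns[0])
    totals = [0.0] * n
    for p in patterns:
        for i in range(n):
            hi = list(p); hi[i] += eps
            lo = list(p); lo[i] -= eps
            totals[i] += abs(model(*hi) - model(*lo)) / (2 * eps)
    return [t / len(patterns) for t in totals]

patterns = [(0.2, 0.5), (-0.3, 0.1), (0.7, -0.4)]
s = sensitivities(net, patterns)
print(s[0] > s[1])  # True: input 1 dominates, as its weight is larger
```

Ranking the averaged values then directly yields the importance ordering of the input variables.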
Furthermore, the computational cost is in theory an order of magnitude better using event-based sampling methods, although this may not always be the case in practice, as further discussed in this paper.
However, using event-based simulation methods is quite demanding: models can be simulated if and only if the next spike time can be explicitly computed in reasonable time. This is the case for only a subset of existing neuron models, so not all models can be used. Moreover, an event-based simulation kernel is more complicated to use than a clock-based one. Existing simulators are essentially clock-based; some of them integrate event-based simulation as a marginal tool or mix it with clock-based methods. According to this collective review, the only fully supported, scientifically validated, purely event-based simulator is MVASpike, with the NEURON software proposing a well-defined event-based mechanism, while several other implementations (e.g., DAMNED, MONSTER) exist but are not publicly supported.
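The core of an event-based kernel is a priority queue of spike events processed in timestamp order. The sketch below is a deliberately minimal illustration, not the design of any of the simulators named above: it assumes non-leaky integrate-and-fire neurons, so the "next spike time" is trivially computable (a fixed synaptic delay after threshold crossing). All names and parameters are ours:

```python
import heapq

# Spikes are (time, neuron) events; synapses map source -> [(target, weight, delay)].
def simulate(initial_spikes, synapses, threshold=1.0, t_max=10.0):
    queue = list(initial_spikes)
    heapq.heapify(queue)
    potential = {}                     # neuron -> membrane potential
    spikes = []                        # all spike events, processed in time order
    while queue:
        t, src = heapq.heappop(queue)
        if t > t_max:
            break
        spikes.append((t, src))
        for tgt, w, d in synapses.get(src, []):
            potential[tgt] = potential.get(tgt, 0.0) + w
            if potential[tgt] >= threshold:
                potential[tgt] = 0.0   # reset, schedule the output spike
                heapq.heappush(queue, (t + d, tgt))
    return spikes

# Neurons 0 and 1 each project to neuron 2 with weight 0.6 and delay 1.0.
syn = {0: [(2, 0.6, 1.0)], 1: [(2, 0.6, 1.0)]}
print(simulate([(0.0, 0), (0.5, 1)], syn))
# [(0.0, 0), (0.5, 1), (1.5, 2)]
```

The computation advances only when an event occurs; silent periods cost nothing, which is the source of the theoretical efficiency advantage discussed above.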
Concerning feature extraction, spatial decomposition is usually performed to extract the ERP components, using for instance Principal Component Analysis or Independent Component Analysis. These methods define the decomposition in terms of statistical properties that the components should satisfy in a specific time window. However, ERPs comprise several temporal components (peaks); spatial decomposition should thus be performed for each interval of interest within the analysis window. To this end, some algorithms have been proposed to study where the discriminative information lies in the spatio-temporal plane. They visualize a matrix of separability measures over the spatio-temporal plane of the experimental conditions. The matrix is obtained by computing a separability index for each pair of spatial electrode measurement and time sample. Several measures of separability have been used, for instance the signed-r² Blankertz et al. (2010), Fisher score and Student's t-statistic Müller et al. (2004), or the area under the ROC curve Green & Swets (1966). The separability matrix should then be searched so as to automatically determine intervals with fairly constant spatial patterns and high separability values. This proves difficult, and heuristics are often employed to approximate interval borders. In addition, the first three aforementioned measures rely on the assumption that the class distributions are Gaussian, which is seldom verified.
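The separability-matrix construction can be sketched for the signed-r² measure: one point-biserial correlation per (electrode, time sample) pair, over toy two-class trial data. The data layout and values below are our own illustrative assumptions:

```python
import statistics

def signed_r2(a, b):
    """Signed r^2 between samples of class A and class B (point-biserial)."""
    x = a + b
    y = [1.0] * len(a) + [0.0] * len(b)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    if vx == 0 or vy == 0:
        return 0.0
    r = cov / (vx * vy) ** 0.5
    return r * abs(r)

# trials[trial][channel][time]: 2 trials per class, 2 channels x 3 samples.
A = [[[1.0, 2.0, 5.0], [0.1, 0.1, 0.1]],
     [[1.2, 2.1, 4.8], [0.0, 0.2, 0.1]]]
B = [[[1.1, 2.0, 1.0], [0.1, 0.0, 0.1]],
     [[0.9, 1.9, 1.2], [0.2, 0.1, 0.0]]]

matrix = [[signed_r2([tr[ch][t] for tr in A], [tr[ch][t] for tr in B])
           for t in range(3)] for ch in range(2)]
# Channel 0, sample 2 was built to carry the class difference:
print(max((v, ch, t) for ch, row in enumerate(matrix)
          for t, v in enumerate(row))[1:])  # (0, 2)
```

Finding contiguous high-value regions of such a matrix, rather than a single maximum, is exactly the interval-selection step described above as difficult.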
In contrast, BRIAN ( Goodman and Brette, 2009 ) and NEST ( Gewaltig and Diesmann, 2007 ) are simulators often considered to be playing in the same league as EDLUT. As is the case with EDLUT, Brian claims to be mainly oriented to efficiently simulating medium-scale neural networks (tens of thousands of neurons), while NEST is designed for very large-scale neural networks (up to 1.86 billion neurons connected by 11.1 trillion synapses on the Japanese K supercomputer; Kunkel et al., 2014 ). These simulators mainly implement point neuron models, although some models with a few compartments can be simulated. Similarly, both treat neurons as a means to an end: they use neuron models to understand the behavior of the network behind them. Both natively implement time-driven simulation methods on CPU, and BRIAN in particular also implements a hybrid CPU-GPU co-processing scheme for time-driven models. Having said that, the conclusions and approaches proposed in this paper regarding time-driven methods would have a direct impact on Brian, and a substantial impact on NEST since CPU-GPU co-processing is still missing there. The other fundamental pillar of the methodology proposed here, the event-driven scheme, is not included in BRIAN but does exist in NEST. Whilst the EDLUT framework (originally an event-driven scheme) was adapted to also perform time-driven neural simulations ( Garrido et al., 2011 ), the NEST framework (originally a time-driven scheme) was adapted to also perform event-driven neural simulations ( Morrison et al., 2007; Hanuschkin et al., 2010 ). Thus, both simulators can perform combined event- and time-driven simulations. In fact, NEST proposes an event-driven method that presents similarities to our synchronous event-driven method: both minimize the number of spike predictions by processing all the synchronous input spikes conjointly, thus making only one prediction.
Likelihood methods using either an information-geometric ( Nakahara and Amari, 2002; Amari and Nakahara, 2006; Shimazaki et al., 2012 ) or a point-process ( Ogata, 1981; Chornoboy et al., 1988; Okatan et al., 2005 ) representation provide an alternative parametric model-based approach to analyzing ensemble neural spiking activity. Likelihood methods can relate the ensemble activity to any relevant covariates. When the parametric model accurately describes the data, these analyses have important optimality properties. However, this approach has an important shortcoming that is especially relevant for the analysis of coincident spiking activity: at an arbitrarily small time scale, these methods do not allow simultaneous spiking ( Ogata, 1981; Karr, 1991; Daley and Vere-Jones, 2003 ). Current analyses of spiking activity from neuronal ensembles circumvent this limitation either by assuming neurons are independent conditioned on history, or by simply ignoring simultaneous events. Ventura et al. (2005) developed a likelihood procedure to overcome this limitation for analyzing a pair of neurons, and Kass et al. (2011) extended this approach to multiple neurons. Solo recently reported a simultaneous-event multivariate point process (SEMPP) model to correct this key limitation in general ( Solo, 2007 ).
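To see why the restriction matters, consider discretizing spike trains into bins and counting bins in which two neurons fire together; these are exactly the simultaneous events that a standard orderly point process cannot represent at fine time scales. The spike times and bin width below are toy values of our own choosing:

```python
# Discretize a spike train into 0/1 bins of a given width.
def binarize(spike_times, bin_width, n_bins):
    counts = [0] * n_bins
    for t in spike_times:
        b = int(t / bin_width)
        if b < n_bins:
            counts[b] = 1
    return counts

n1 = [0.4, 1.1, 2.6, 3.2]
n2 = [0.45, 2.0, 3.25]
a = binarize(n1, 0.5, 8)
b = binarize(n2, 0.5, 8)
coincident = sum(x & y for x, y in zip(a, b))
print(coincident)  # 2: bins [0.0, 0.5) and [3.0, 3.5) hold spikes from both neurons
```

An analysis that drops or forbids such coincident bins discards precisely the joint-activity structure that models like SEMPP are designed to capture.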
In secondary analysis of electronic health records, a crucial task consists in correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNN) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept-extraction-based methods with CNNs and other commonly used models in NLP on ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept-extraction-based methods in almost all of the tasks, with an improvement of up to 26 percentage points in F1-score and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians.
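The link between a text CNN and salient phrases can be sketched without any deep learning library: a 1-D convolution filter scores every n-gram of token embeddings, and max-pooling keeps the best-scoring one, which can then be read back as the most relevant phrase. The embeddings, filter weights and example sentence below are entirely hypothetical:

```python
import math

# Score every n-gram with one convolution filter; return the max-pooled n-gram.
def most_salient_ngram(tokens, embed, filt, n=2):
    best_score, best_span = -math.inf, None
    for i in range(len(tokens) - n + 1):
        window = [v for tok in tokens[i:i + n] for v in embed[tok]]
        score = sum(w * v for w, v in zip(filt, window))
        if score > best_score:
            best_score, best_span = score, tokens[i:i + n]
    return best_span

# Toy 2-d embeddings; the filter is assumed tuned to "cardiac" terms.
embed = {"patient": [0.1, 0.0], "denies": [0.0, 0.2],
         "chest": [0.9, 0.1], "pain": [0.8, 0.3], "today": [0.0, 0.1]}
filt = [1.0, 0.0, 1.0, 0.0]  # responds to the first embedding dimension
print(most_salient_ngram(["patient", "denies", "chest", "pain", "today"],
                         embed, filt))  # ['chest', 'pain']
```

In a trained CNN the filters and embeddings are learned, but the read-back mechanism for salient phrases is the same argmax over the pooled activations.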
4.1 Entity Extraction
Our approach relies on Long Short-Term Memory networks (LSTMs) ( Hochreiter and Schmidhuber, 1997 ). The architecture of our model is presented in Figure 1. For a given sequence of tokens, represented as vectors, we compute representations of the left and right contexts of the sequence at every token. These representations are computed using two LSTMs (the forward and backward LSTMs in Figure 1). They are then concatenated and linearly projected onto an n-dimensional vector, where n is the number of categories. Finally, following Huang et al. (2015), we add a CRF layer to take the previous label into account during prediction. Following preliminary experiments, we built one specific classifier for each entity type (EVENT or TIMEX3).
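The role of the CRF layer at prediction time is to pick the label sequence maximizing emission scores plus label-transition scores, which is done with Viterbi decoding. The sketch below shows only that decoding step, with hand-picked scores (the label set mirrors a BIO scheme for EVENT; all numbers are illustrative assumptions, not learned parameters):

```python
# Viterbi decoding for a linear-chain CRF.
# emissions[t][l]: score of label l at token t
# transitions[a][b]: score of label b following label a
def viterbi(emissions, transitions, labels):
    best = {l: emissions[0][l] for l in labels}
    back = []
    for em in emissions[1:]:
        prev = best
        best, ptr = {}, {}
        for l in labels:
            p = max(labels, key=lambda a: prev[a] + transitions[a][l])
            best[l] = prev[p] + transitions[p][l] + em[l]
            ptr[l] = p
        back.append(ptr)
    last = max(labels, key=lambda l: best[l])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

labels = ["O", "B-EVENT", "I-EVENT"]
em = [{"O": 0.1, "B-EVENT": 2.0, "I-EVENT": 0.0},
      {"O": 0.8, "B-EVENT": 0.1, "I-EVENT": 0.9},
      {"O": 2.0, "B-EVENT": 0.0, "I-EVENT": 0.1}]
tr = {a: {b: 0.0 for b in labels} for a in labels}
tr["O"]["I-EVENT"] = -5.0          # penalize I-EVENT right after O
print(viterbi(em, tr, labels))     # ['B-EVENT', 'I-EVENT', 'O']
```

The transition scores are what let the previous label influence the current prediction, e.g. ruling out an I- tag that does not follow a B- tag.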
Mathieu Serrurier February 21, 2020
In this paper, we consider survival analysis with right-censored data, which is a common situation in predictive maintenance and in the health field. We propose a model based on the estimation of a two-parameter Weibull distribution conditionally on the features. To achieve this, we describe a neural network architecture and the associated loss functions that take the right-censored data into account. We then extend the approach to a finite mixture of two-parameter Weibull distributions. We first validate that our model is able to precisely estimate the right parameters of the conditional Weibull distribution on synthetic datasets. In numerical experiments on two real-world datasets (METABRIC and SEER), our model outperforms the state-of-the-art methods. We also demonstrate that our approach can consider any survival time horizon.
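The loss such a network would minimize is the right-censored negative log-likelihood of the two-parameter Weibull: observed events contribute log f(t), censored ones only log S(t). The sketch below uses standard Weibull formulas with shape k and scale lam (names and toy data are our assumptions; the paper's exact notation and parameterization may differ):

```python
import math

# Right-censored Weibull negative log-likelihood.
# S(t) = exp(-(t/lam)^k), hazard h(t) = (k/lam) * (t/lam)^(k-1), f = h * S.
def weibull_nll(times, observed, k, lam):
    nll = 0.0
    for t, d in zip(times, observed):
        log_surv = -((t / lam) ** k)          # log S(t)
        if d:   # event observed: log f(t) = log h(t) + log S(t)
            log_hazard = math.log(k / lam) + (k - 1) * math.log(t / lam)
            nll -= log_hazard + log_surv
        else:   # right-censored: only survival beyond t is known
            nll -= log_surv
    return nll

times    = [2.0, 5.0, 7.0, 9.0]
observed = [1,   1,   0,   0]     # last two subjects are right-censored
print(weibull_nll(times, observed, k=1.5, lam=6.0) <
      weibull_nll(times, observed, k=1.5, lam=1.0))  # True: lam=6 fits better
```

In the conditional model, k and lam would be the network's outputs for each feature vector rather than global constants.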
Split2 tests the generalization capabilities of the networks: one subclass from each class is not seen by the network during training and is then used to test the model. Unimodal classification based on visual information performs poorly. The most likely reason is that the 4 webcams have a wide viewpoint of the scene, so the action occupies only a few pixels. Moreover, since the images from the 4 webcams are concatenated and then resized before feature extraction, a great deal of information is probably lost. The network may base its classification on the room brightness, the location of the person in the room, the person's outfit, etc. However, this issue does not affect the audio modality, which obtains much better results. We notice that the visual modality has a detrimental influence in the case of the multimodal MAFnet but has no impact on the multimodal MHA.
More detailed results on CIFAR-10 are reported in Table III. D-EB E/PD reaches a higher final accuracy and a lower final loss regardless of λ(0). Even though D-EB E/PD has a higher FASD than AdaBound with λ(0) = 0.01 and λ(0) = 0.05, the FVA(±FASD) range of D-EB E/PD is always higher than that of AdaBound. Additionally, it only takes about 32 to 38 epochs to reach 95% of the best accuracy in any group, and all the indicators are very stable across the different groups for D-EB E/PD. One can also note that all 4 state-of-the-art algorithms perform very poorly with λ(0) = 0.05: they cannot even reach 95% of the best accuracy. We also ran the same experiments with λ(0) = 0.25. Apart from our algorithm, none of the others reaches a reasonable accuracy value, which can be explained by the fact that during the PD phase of E/PD control our learning rate can decrease to a low level while its counterparts' cannot. Those results are available in the appendices.
Acknowledgments

Working on this thesis was like living on a roller coaster for almost 4 years. Exciting times, I must admit, in which I visited amazing places such as Svalbard or Hong Kong while enjoying my work, and I also had a life in Paris I'd never dreamed of. Even so, in an almost unnoticeable way, I began to forget why I chose to be a mathematician, since that willingness to learn slowly transformed into a distaste for what I was doing. At some point, near what was supposed to be the end, I was only worried I wouldn't finish this manuscript in time, but then I simply faced reality and said: if Brexit was postponed several times, why can't I do the same? Still, sometimes it seemed closer than my long-awaited thesis defense; even Notre-Dame burned down in between, and I was also afraid that GRRM, a professional procrastinator, could publish his last book at any moment. By the end, this PhD was a mix of uneasiness, impostor syndrome, total relief and more than a bit of hair turning white, but still, we made it.
The interest in brain-like computation has led to the design of a plethora of innovative neuromorphic systems. Individually, spiking neural networks (SNNs), event-driven simulation and digital hardware neuromorphic systems get a lot of attention. Despite the popularity of event-driven SNNs in software, very few digital hardware architectures are found. This is because existing hardware solutions for event management scale badly with the number of events. This paper introduces the structured heap queue, a pipelined digital hardware data structure, and demonstrates its suitability for event management. The structured heap queue scales gracefully with the number of events, allowing the efficient implementation of large-scale digital hardware event-driven SNNs. The scaling is linear for memory, logarithmic for logic resources and constant for processing time. The use of the structured heap queue is demonstrated on a field-programmable gate array (FPGA) with an image segmentation experiment and an SNN of 65,536 neurons and 513,184 synapses. Events can be processed at the rate of 1 every 7 clock cycles, and a 406×158-pixel image is segmented in 200 ms.
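The behavior such a structure must provide can be illustrated with a software binary heap: events inserted in arbitrary order always leave the queue earliest-timestamp first, at O(log n) cost per operation (the hardware version pipelines these operations to achieve the constant processing time claimed above). This is only a software analogue, not the hardware design itself:

```python
import heapq, random

# Insert events in random order, then drain the queue.
random.seed(42)
queue = []
for _ in range(1000):
    heapq.heappush(queue, (random.random(), "spike"))  # (timestamp, payload)

out = [heapq.heappop(queue)[0] for _ in range(1000)]
print(out == sorted(out))  # True: events leave the queue in time order
```

The hardware challenge addressed by the structured heap queue is keeping this ordering guarantee while sustaining one operation every few clock cycles regardless of queue size.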
or mouse social defeat) have been shown to be significantly associated with overall survival. It is even more surprising that many random signatures can outperform most breast cancer signatures. Several authors have suggested that the selected sets of genes are not unique and are strongly influenced by the subset of patients included in the training cohort [9, 10] and by the variable selection procedures [11–14]. For low-dimensional data, the reference method to study associations with time-to-event endpoints is the Cox proportional hazards model. In the context of high-dimensional data (number of covariates >> number of observations), the Cox model may be nonidentifiable. Extensions based on boosting or penalized regression have been proposed in the literature to overcome these hurdles [15–18], as they shrink the regression coefficients towards zero. As an alternative to the Cox extensions, methods based on random forests have been adapted for survival analysis. This nonparametric method, random survival forest (RSF), combines multiple decision trees built on randomly selected subsets of variables. Since feature selection methods are questioned, it seems important to thoroughly assess and compare existing strategies that are significant components in prognostic signature development. Many studies have examined the false discovery rates or prognostic performances achieved by multiple variable selection methods and compared them on simulated or real datasets [20–23]. However, the impact of the training set on the stability of the results was only assessed by Michiels et al. on a binary endpoint, with a selection based on Pearson's correlation, and did not evaluate the most recent approaches. The main objective of this publication is to compare six typical feature selection methods which are commonly used for high-dimensional data in the context of survival analysis.
For this purpose, and as recommended in the literature, a simulation study is performed, with special focus on variable selection and prediction performance according to multiple data configurations (sample size of the training set, number of genes associated with survival). The feature selection methods are then applied to published data to explore their stability and prognostic performances on a real breast cancer dataset.
The article is structured as follows. Section 2 introduces a general class of epidemic models to which the simulation/estimation techniques subsequently described apply, and then reviews events related to these models that may correspond to health crisis situations and generally occur very rarely. Simulation-based procedures for estimating the probability of occurrence of these events are described in Section 3, while practical applications of these techniques, based on real data sets in some cases, are considered in Section 4 for illustration purposes. Some concluding remarks are finally collected in Section 5. In this work, it is shown that the crude Monte-Carlo method often fails to provide good estimates of rare-event probabilities. Importance sampling methods are a well-known alternative for estimating the occurrence probabilities of rare events. However, their efficiency relies on the choice of proper instrumental distributions, which is very complicated for most probabilistic models encountered in practice. Particle systems with genealogical selection offer an efficient computational tool for estimating the targeted small probabilities.
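The failure of crude Monte-Carlo and the role of the instrumental distribution can be illustrated on a textbook example far simpler than an epidemic model: estimating p = P(Z > 5) for a standard normal Z (roughly 2.9e-7). Crude sampling essentially never observes the event; sampling from a shifted instrumental distribution N(5, 1) and reweighting by the likelihood ratio recovers it. The choice of example and sample sizes is ours:

```python
import math, random

random.seed(0)
n = 100_000

# Crude Monte-Carlo: count direct hits of the rare event.
crude = sum(random.gauss(0, 1) > 5 for _ in range(n)) / n

# Importance sampling from N(5, 1); likelihood ratio phi(x)/phi(x-5) = exp(12.5 - 5x).
acc = 0.0
for _ in range(n):
    x = random.gauss(5, 1)
    if x > 5:
        acc += math.exp(12.5 - 5 * x)
p_is = acc / n

true_p = 0.5 * math.erfc(5 / math.sqrt(2))   # exact tail probability, ~2.87e-7
print(crude)                                 # typically 0.0: the event is almost never drawn
print(abs(p_is - true_p) / true_p < 0.1)     # True: within 10% of the truth
```

For the epidemic models of Section 2 no such convenient instrumental distribution is available in closed form, which is precisely why the particle methods of Section 3 are needed.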
2 WORD EMBEDDING IN KEYPHRASE
2.1 Word embedding
Word embedding represents words as vectors. It is based on the "Distributional Hypothesis", according to which words that are used and occur in the same contexts tend to have similar meanings. Word embedding follows the idea that contextual information constitutes a viable representation of linguistic items. Word embedding methods are generally unsupervised and use machine learning algorithms to build word representations from raw text. They can be categorized into two main types: count-based and Neural Network (NN)-based.
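The count-based family can be sketched in a few lines: each word is represented by its co-occurrence counts with every vocabulary word inside a small context window. The toy corpus and window size below are our own illustrative choices (real count-based methods typically add reweighting such as PPMI and dimensionality reduction):

```python
# Build one co-occurrence vector per word over a +/- `window` token context.
def cooccurrence_vectors(sentences, window=1):
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = {w: [0] * len(vocab) for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    vecs[w][index[s[j]]] += 1
    return vocab, vecs

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab, vecs = cooccurrence_vectors(corpus)
# "cat" and "dog" share contexts ("the", "sat"), so their vectors overlap:
print(vocab)
print(vecs["cat"], vecs["dog"])
```

The overlap between the "cat" and "dog" vectors is the Distributional Hypothesis at work: similar contexts yield similar representations.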
Works like this thesis are possible because of people like Héctor Cancela and Gerardo Rubino, always ready to support and to help, far beyond their formal duties. Because of these two persons, writing this thesis has been, above all, a great pleasure to me. I have also learned from them that everyday work can be lived with pleasure. Professional honesty, ethics and passion for one's work are also things that I learned from Héctor Cancela and Gerardo Rubino. For all this, my sincere gratitude to them both. I also want to say that this work would not have been possible without the valuable contribution of people like the jury members, who agreed to participate, offering their knowledge, ability, and above all, their time. My sincere thanks then to Nicolás Stier, Gustavo Guerberoff, Ernesto Mordecki, Alvaro Martín and Antonio Mautone. A special thanks to the Postgraduate Committee and to María Inés Sánchez Grezzi, secretary of the PEDECIBA, who for a long time took care of all the documentation and formalities related to my doctorate.