Results:
Overall, the estimation procedure with the SAEM algorithm in a non-linear mixed effect modelling framework for **count** **data** models showed satisfactory performance, with low bias and high precision. For parameter estimation, the absolute value of relative bias was less than 0.92% and 4.13% for fixed and random effects, respectively, and RMSE was less than 12.34% and 13.13%, across all tested models. For standard error estimation, the absolute value of relative bias was less than 1.7% and 1.6% for fixed and random effects, and RMSE was less than 1% and 1.54%. The variances of over-dispersion parameters, shown to be biased when estimated with LAPLACE, were precisely estimated with SAEM, exhibiting relative bias of 1.62%, 1.26% and 2.38% for p0, δ and OVDP, respectively. Detailed results are listed below. The distribution of REE and AEE for all models and all parameters is shown in Figure 1a-f, while the numerical results are presented in Table I. The summary of the imprecision estimates (RMSE) for all parameters and their standard error estimates across all models using SAEM is shown in Table II.
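As a rough illustration of the summary statistics quoted above, the sketch below computes relative bias and relative RMSE from a set of replicate estimates. It assumes the commonly used definitions (relative estimation error as the percentage deviation from the simulated true value, bias as its mean, RMSE as the root of its mean square); the excerpt does not spell out its formulas, and the function names and numbers here are hypothetical.

```python
import math


def relative_estimation_errors(estimates, true_value):
    """Percentage deviation of each replicate estimate from the true value."""
    return [100.0 * (est - true_value) / true_value for est in estimates]


def relative_bias(estimates, true_value):
    """Mean of the relative estimation errors, in percent."""
    ree = relative_estimation_errors(estimates, true_value)
    return sum(ree) / len(ree)


def relative_rmse(estimates, true_value):
    """Root mean square of the relative estimation errors, in percent."""
    ree = relative_estimation_errors(estimates, true_value)
    return math.sqrt(sum(e * e for e in ree) / len(ree))


# Hypothetical replicate estimates of a parameter whose simulated value is 1.0.
estimates = [0.98, 1.03, 1.01, 0.97, 1.02]
print(relative_bias(estimates, 1.0), relative_rmse(estimates, 1.0))
```

Low bias together with a large RMSE would indicate estimates that are centred on the true value but widely scattered, which is why the two figures are reported side by side.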

Prediction of subplastidial localization of chloroplast proteins from spectral **count** **data** - Comparison of machine learning algorithms
Thomas Burger (1), Samuel Wieczorek (1), Christophe Masselon (1), Daniel Salvi (2), Norbert Rolland (2), Myriam Ferro (1).

This paper generalizes the Poisson-Multifractal model for correlated time series of **count** **data**. We show that the model has useful properties; it captures long-term time dependence and flexible dependence between types of **count**. Based on real **data**, the correlated multifractal model is used to model the number of claims of two separate coverages in automobile insurance. Smoothed values of the underlying process can be estimated, and a specific property of the model allows us to split the unobserved process into separate elements. These elements can be considered as climatic, economic or social factors affecting the frequency of claims, which can be associated with exogenous information. Even if the model proposed in this paper implies dependence between **count** variables, we think that it can be easily generalized in many directions: to model dependence between claim cost and frequency, or between the claims frequency of different insurance products.

kernel in equation (1) with h = h(n) ∈ [0, 1] an arbitrary sequence of smoothing parameters that fulfills lim_{n→∞} h(n) = 0. In practice, when X is a binary variable or observations of X are **count** **data**, the generalized linear models (GLM) studied by McCullagh and Nelder (1989) for these cases can serve as parametric start regression models r(·; θ). The estimation accuracy of the chosen parametric regression model can then be improved by the nonparametric correction term, which is the second factor on the right-hand side of equation (8). See Glad (1998) and Fan et al. (2009) for the continuous version of the estimator in equation (8).

Multivariate **count** **data** are defined as the number of items of different categories issued from sampling within a population whose individuals are grouped into categories. The analysis of multivariate **count** **data** is a recurrent and crucial issue in numerous modelling problems, particularly in the fields of biology and ecology (where the **data** can represent, for example, children counts associated with multitype branching processes), sociology and econometrics. Denoting by K the number of categories, multivariate **count** **data** analysis relies on modelling the joint distribution of the K-dimensional random vector N = (N_0, ..., N_{K−1}) with discrete components. We focus

Abstract:
In this research, we focus on two additional dimensions of the issue of spatial autocorrelation in spillover measures. First, we use patent **data** as the dependent variable. These are **count** **data**, characterised by non-negative integers and a large number of zeros. The model usually used in this case is based on the Poisson distribution. The methods normally used to take spatial autocorrelation into account are no longer appropriate. We therefore propose in this article a new, original method to estimate the spatial dimension of spillovers with **count** **data**. The method is based on the generalised cross entropy approach. Second, contrary to previous studies, which use spatially aggregated **data**, we concentrate on individual **data**. The idea is that the local dimension can be smaller than that of a region, a French department or even an American MSA. This allows us to take a smaller geographical dimension and to test whether firms benefit from spillovers of their very close neighbours or of their more distant neighbours.

• The nonparametric nature of the estimated inter-event distribution that enables arbitrary shape, the only constraint being the size of the support.
Smoother estimated inter-event distributions are obtained for larger sample sizes and for smaller intrinsic dispersions of the **count** **data** (and consequently smaller dispersions of the estimated inter-event distributions). The first point is illustrated by comparing the results of the large sample simulation experiment (Figures 1 and 2) with the results of the small sample simulation experiment (Figures 3 to 6). The second point is illustrated by comparing the estimation from real **data** (Figures 11 and 12) with the results of the small sample simulation experiment. Hence, a sufficiently large sample size is required to apply this kind of estimation method (which is reasonable if one considers the degree of incompleteness of the **data**). The proposed methods are mainly useful in relatively high censoring situations since, in most applications, stationarity can only be assumed over relatively short observation periods. Moreover, interpretations may often be deduced by comparing the inter-event distributions estimated over consecutive observation periods. Hence, it is interesting to design a follow-up experiment with a sufficient number of observation dates. This also makes it possible to assess the sample homogeneity by comparing the results obtained over consecutive observation periods and grouped observation periods, as illustrated with the real **data**.

these classes provide tools for relative and absolute dating and analysis of (chronological) patterns.
tabula includes functions for matrix seriation (seriate_*), as well as chronological modeling and dating (date_*) of archaeological assemblages and objects. Resulting models can be checked for stability and refined with resampling methods (refine_*). Estimated dates can then be displayed as a tempo or activity plot (Dye, 2016) to assess rhythms over long periods. Beyond these, tabula provides several tests (test_*) and measures of diversity within and between archaeological assemblages (index_*): heterogeneity and evenness (Brillouin, Shannon, Simpson, etc.), richness and rarefaction (Chao1, Chao2, ACE, ICE, etc.), turnover and similarity (Brainerd-Robinson, etc.). Finally, the package makes it easy to visualize **count** **data** and statistical thresholds (plot_*): rank vs. abundance plots, heatmaps, and Ford (1962) and Bertin (1977) diagrams.
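To make two of the heterogeneity measures named above concrete, here is a minimal Python sketch of the Shannon index and the Simpson dominance index for a single assemblage of counts. It is not the tabula API: the function names are hypothetical, and the formulas are the textbook definitions (H = -Σ p ln p and D = Σ p², with p the proportion of each type).

```python
import math


def shannon_index(counts):
    """Shannon heterogeneity H = -sum(p * ln p) over types with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def simpson_dominance(counts):
    """Simpson dominance D = sum(p^2); larger values mean a less diverse assemblage."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


# Hypothetical assemblage: counts of four artefact types.
assemblage = [10, 10, 10, 10]
print(shannon_index(assemblage), simpson_dominance(assemblage))
```

For a perfectly even assemblage of S types, H equals ln S and D equals 1/S, which gives a quick sanity check on any implementation.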

5. CONCLUSION
In this paper, we propose a multivalued extension of MSVST associated with an appropriate 2D-1D wavelet transform, which proved very efficient at denoising Poisson **count** **data**. The proposed algorithm performs as well as the 2D-MSVST applied to summed frames. But unlike 2D denoising, the 2D-1D extension fully exploits the information in the whole **data** set and recovers the information along the z-axis, which is of paramount importance in many scientific fields such as hyperspectral imaging.

CLAUDE MANTÉ, SAÏKOU OUMAR KIDÉ, A.F. YAO, BASTIEN MÉRIGOT Abstract. A frequent issue in the study of species abundance consists in modeling empirical distributions of repeated counts by parametric probability distributions. In this setting, it is desirable that the chosen family of distributions is flexible enough to take into account very diverse patterns, and that its parameters possess clear biological/ecological meanings. This is the case of the Negative Binomial distribution, chosen in this work for modeling counts of marine fishes and invertebrates. This distribution depends on a vector (K, P) of parameters, and ranges from the Poisson distribution (when K → +∞) to Fisher's log-series (when K → 0). Besides, these parameters have biological/ecological interpretations detailed in the literature and recalled hereafter. We focus on the comparison of three estimators of K, P and the parameter α of Fisher's log-series, revisiting a nice paper of Rao (1971) about a three-parameter unstandardized variant of the Negative Binomial distribution. We investigate the coherency of the parameter values resulting from these different estimators, with both real **count** **data** collected in the Mauritanian Exclusive Economic Zone during the period 1987-2010 and realistic simulations of these **data**.
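The Poisson limit mentioned in this abstract can be checked numerically. The sketch below assumes the usual ecological (K, P) parametrization, with mean KP and probability mass Γ(K+x) / (x! Γ(K)) · (P/(1+P))^x · (1+P)^(-K); the abstract does not state its parametrization explicitly, so this choice and all function names are assumptions made for illustration.

```python
import math


def neg_binomial_pmf(x, k, p):
    """P(X = x) under the assumed (K, P) parametrization, with mean k * p."""
    log_pmf = (
        math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
        + x * (math.log(p) - math.log1p(p))   # x * log(P / (1 + P))
        - k * math.log1p(p)                   # -K * log(1 + P)
    )
    return math.exp(log_pmf)


def poisson_pmf(x, mean):
    """Poisson probability mass, computed on the log scale for stability."""
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))


def max_gap_to_poisson(k, mean, x_max=60):
    """Largest pointwise gap between the two pmfs, holding the mean k * p fixed."""
    p = mean / k
    return max(
        abs(neg_binomial_pmf(x, k, p) - poisson_pmf(x, mean))
        for x in range(x_max + 1)
    )


# The gap shrinks as K grows with the mean held at 3.
for k in (1.0, 10.0, 1e6):
    print(k, max_gap_to_poisson(k, 3.0))
```

Holding the mean KP fixed while K grows sends P to zero, so the variance KP(1 + P) collapses onto the mean, which is the Poisson case.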

In **count** **data** models, where a non-linearity is produced by the non-negative discrete nature of the **data**, the standard generalized method of moments (GMM) for the estimation of fixed effects models is not directly applicable. The usual panel **data** estimator for **count** models with correlated fixed effects is the Poisson conditional maximum likelihood estimator proposed by Hausman, Hall and Griliches (1984). This estimator is the same as the Poisson maximum likelihood estimator in a model with specific constants. But this estimator is inconsistent if the regressors are predetermined and so not strictly exogenous. To solve this problem, Chamberlain (1992) and Wooldridge (1997) have developed a quasi-differenced GMM estimator. Blundell, Griffith and Windmeijer (2002) have extended this estimator to dynamic linear models. Following Blundell, Griffith and Windmeijer (2002), we will estimate equation (6) with this quasi-differenced GMM estimator (see appendix for technical details).
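For orientation, the quasi-differencing transformations attributed to Chamberlain (1992) and Wooldridge (1997) are commonly written as follows. This is a sketch in generic notation, not the paper's equation (6), which is not reproduced in this excerpt: the symbols y_it (count outcome), x_it (regressors), η_i (multiplicative fixed effect) and u_it (error) are assumptions introduced here.

```latex
\begin{align}
  y_{it} &= \mu_{it}\,\eta_i + u_{it}, \qquad \mu_{it} = \exp(x_{it}'\beta), \\
  s_{it}(\beta) &= y_{it}\,\frac{\mu_{i,t-1}}{\mu_{it}} - y_{i,t-1}, \qquad
    \mathbb{E}\left[ s_{it}(\beta) \mid x_{i1},\dots,x_{i,t-1} \right] = 0
    \quad \text{(Chamberlain)}, \\
  q_{it}(\beta) &= \frac{y_{it}}{\mu_{it}} - \frac{y_{i,t-1}}{\mu_{i,t-1}}, \qquad
    \mathbb{E}\left[ q_{it}(\beta) \mid x_{i1},\dots,x_{i,t-1} \right] = 0
    \quad \text{(Wooldridge)}.
\end{align}
```

In both cases the fixed effect η_i cancels in the difference, and the moment conditions hold when the regressors are only predetermined, so lagged regressors can serve as instruments in the GMM step.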

variability for North American fires compares well even in relative magnitude, whereas the CO emissions show less variability than the burned area estimates in Russia. It is possible that this difference can be attributed to the poorer **data** quality for the Russian burned area estimates as pointed out by Wotawa et al. (2001). Another factor could be the higher fraction of ground fires in Russia, which are not detected by the satellite

We illustrate our work using polio **data**, a classic **data** set for time series of counts, first used by Zeger (1988), and later by Chan & Ledolter (1995), Kuk & Cheng (1997), Oh & Lin (2001), Jung & Liesenfeld (2001), and Farrell et al. (2007). We show that the fit of this new model is interesting and compares favourably with the Poisson-AR(1) model. For example, unlike the Poisson-AR(1) model, the multifractal **count** model of this paper can be estimated directly, without requiring simulations. A formal comparison of our approach with the Poisson-AR(1) illustrates major differences between the models, and shows that the multifractal **count** distribution captures an unobserved time dependence structure that is not present in the other model.

zeros. Proceedings of ESANN 2012, 133-138
[4] Ridgway J. (2011). Hidden Markov models for time series of **count** **data**. Internship report
[5] Hamilton J.D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57, 357-384.

Other models, such as models based on copulas, where the marginals are fixed and the dependence structure is based on a copula (see e.g. Joe (1997) and Frees and Wang (2006)), could have been considered. A review of time series models for **count** **data** can be found in the survey of McKenzie (2003) and the monographs of Cameron and Trivedi (1998) and Kedem and Fokianos (2002). All examples considered for N in this paper satisfy the constraints on the process {W_k; k ∈ N+} given in Müller and Pflug (2001).

1. Introduction
In many applications, it is now common to have to summarize large matrices with a large amount of missing **data** that may evolve over time. For instance, such **data** are commonly produced by e-commerce systems, which record in continuous time all purchases of products made by customers. It is of great interest for those companies to cluster both customers and products to better understand purchasing behaviors for marketing and purchase prediction. The simultaneous clustering of rows and columns of matrices is known as a co-clustering problem. We propose in this paper to add a third dimension of analysis to co-clustering by handling the dynamics of the **count** **data** generation.

Institut Pasteur 75724 Paris France 14050 Caen France 91191 Gif-sur-Yvette France
ABSTRACT
We propose in this paper a Multi-Scale Variance Stabilizing Transform (MSVST) for approximately Gaussianizing and stabilizing the variance of a sequence of independent Poisson random variables (RVs) filtered by a low-pass linear filter. This approach is shown to be fast, very well adapted to extremely low-**count** situations and easily applicable to **data** of any dimension. It is shown that the RV transformed using the Anscombe VST can be reasonably considered as stabilized for an intensity λ ≳ 10, using the Fisz VST for λ ≳ 1 and using our VST (after low-pass filtering) for λ ≳ 0.1. We then use the MSVST technique to stabilize the detail coefficients of the Isotropic Undecimated Wavelet Transform (IUWT) of multi-dimensional Poisson **count** **data**. We use a hypothesis testing framework in the wavelet domain to denoise the Gaussianized and stabilized coefficients, and then apply the inverse MSVST-IUWT to get the estimated intensity image underlying the Poisson **data**. Finally, the potential applicability of our approach is illustrated on an astronomical example where isotropic structures must be recovered.
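The behaviour of the Anscombe VST described above can be illustrated with a small simulation. The sketch below covers only the classical Anscombe transform 2·sqrt(x + 3/8), not the MSVST proposed in the paper; the Poisson sampler (Knuth's multiplication method), the sample size and the seed are illustrative assumptions, and all function names are hypothetical.

```python
import math
import random


def anscombe(x):
    """Classical Anscombe transform for a Poisson count x."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def stabilized_variance(lam, n=20000, seed=0):
    """Empirical variance of Anscombe-transformed Poisson(lam) draws."""
    rng = random.Random(seed)
    return sample_variance([anscombe(sample_poisson(lam, rng)) for _ in range(n)])


# Variance is close to 1 at high intensity and falls well below 1 at low intensity.
for lam in (20.0, 1.0, 0.1):
    print(lam, stabilized_variance(lam))
```

The drop in variance at low intensity is the regime in which, according to the abstract, the Fisz VST and the proposed MSVST remain usable.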

Second, it is surprising to see that the optimal configuration in 90nm with an SRAM L2 contains no L2 cache, but has 16KB L1I and 32KB L1D caches. This is largely due to the fact that decent sized caches can be built at the L1 level in 90nm and still fit in a single cycle. The model shows that it is not a good tradeoff to increase cache size at the expense of core **count** at 90nm in SRAM. L2 caches are often used to decouple the main processor pipeline from the rest of the cache system, but at larger feature sizes this may not be the optimal design. At all process nodes, the area advantage gained from using DRAM as an L2 cache makes it worth building an L2 cache.

fraction of 1%, which leads to an underestimation of the real uranium content up to a factor of 2.6 for 50% of uranium.
The simulation tool has also been used to check the validity of a semiempirical formula used to correct the reference calibration coefficient, measured in Bessines in a tubeless borehole filled with air, for in situ gamma-ray attenuation with different fluids filling the borehole and the tube, and with different tubes housing the NGRS probe. The discrepancy between the correction calculated with the semiempirical formula and the one calculated with MCNP can reach up to 26%. In order to improve the accuracy of this correction, a multiparametric analysis has been performed with a large series of simulated **data**, evidencing linear correlations between the correction and different parameters including information on the borehole diameter, density of filling fluids, tube material, and thickness. This alternative approach leads to an estimation of the calibration coefficient correction with a precision better than 3%. This approach has also been tested in the case of a Geiger-Müller probe, showing that the same formulas can be used for both NGRS and Geiger probes to correct for gamma attenuation.

Iñaki Soto-Rey 1* , Benjamin Trinczek 1 , Yannick Girardeau 2,3 , Eric Zapletal 2 , Nadir Ammour 4 , Justin Doods 1 , Martin Dugas 1 and Fleur Fritz 1
Abstract
Background: With the increase of clinical trial costs during the last decades, the design of feasibility studies has become an essential process to reduce avoidable and costly protocol amendments. This design includes timelines, targeted sites and budget, together with a list of eligibility criteria that potential participants need to match. The present work was designed to assess the value of obtaining potential study participant counts using an automated patient **count** cohort system for large multi-country and multi-site trials: the Electronic Health Records for Clinical Research (EHR4CR) system.
