Associate Editor: Pierre Neuvial
Reviewer: (chose to remain anonymous)
Reviewing history
Paper submitted 2022-09-22
Reviews received 2023-01-15
Decision: major revision 2023-01-27
Paper revised 2023-07-10
Reviews received 2023-10-20
Paper accepted 2023-10-22
Summary
The paper focuses on the computation of the empirical Fisher information matrix of models with latent variables. In that case, the EM algorithm can be used to estimate the parameters, and Louis' formula to compute the Fisher matrix. This formula is based on the second derivative of the complete-data log-likelihood, whereas the Fisher matrix can equivalently be defined through the expectation of the outer product of the score function. The idea of this paper is to propose an estimate based only on the score function.
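For concreteness, using generic notation that may differ slightly from the paper's own symbols, the two estimates being contrasted here are

$$
I_{n,obs}(\theta) = -\frac{1}{n}\sum_{i=1}^{n} \frac{\partial^2 \log f(y_i;\theta)}{\partial\theta\,\partial\theta^{\top}},
\qquad
I_{n,sco}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta \log f(y_i;\theta)\,\nabla_\theta \log f(y_i;\theta)^{\top},
$$

where, by the Fisher identity, each individual score $\nabla_\theta \log f(y_i;\theta)$ equals the conditional expectation of the complete-data score given $y_i$, so $I_{n,sco}$ involves no second derivatives.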
My point of view on this paper is positive. I think the proposed estimator can be useful in models where the second derivative of the likelihood is difficult to obtain.
Comments
(1) Section 2 is based on independent variables that are not identically distributed. What are the conditions on $Y$ to apply the law of large numbers? Could it be generalized to dependent variables?
We have replaced Proposition 1, which was incomplete, with a remark. To keep the focus on our main contributions, namely a FIM estimate and a numerical method to evaluate it, we have added to this remark on asymptotic results references to conditions under which the law of large numbers holds for independent variables that are not identically distributed. However, the proposed methodology cannot be extended directly to dependent variables; such work is beyond the scope of our paper.
(2) In section 3.2.1, the SAEM algorithm first estimates the parameters and then after $K$ iterations, estimates the FIM. While in Section 3.2.3, the quantity $\Delta_k$ is estimated during the first $K$ iterations. Why? Which gain is expected?
We emphasize here the difference between the two algorithms proposed in Sections 3.2.1 and 3.2.3. The first is designed for models belonging to the curved exponential family. In such models, the parameters and the expected individual sufficient statistics ($s_i$) can be estimated simultaneously during the algorithm. Note in particular that in these models the quantities ($\Delta_i$) depend only on the estimates ($s_i$), so the FIM estimate can be evaluated as an explicit function of all these quantities. Conversely, the second algorithm is designed for more general models, possibly outside the curved exponential family. In this case, the parameters and the expected sufficient statistics cannot both be estimated in parallel, so we compute direct estimates of the quantities ($\Delta_i$) during the iterations. The advantage of this second algorithm is that it applies to a wide range of models, including those that do not belong to the curved exponential family.
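For illustration, a minimal R sketch of the kind of update performed by the second algorithm, where `simulate_latent`, `complete_score`, and `update_parameters` are hypothetical placeholders for model-specific code, not the paper's actual implementation:

```r
## Sketch of the stochastic approximation of the individual quantities
## Delta_i in the general algorithm (Section 3.2.3). All function names
## are hypothetical placeholders for model-specific code.
saem_fim_sketch <- function(y, theta0, K, gamma = function(k) 1 / k) {
  n <- length(y)
  p <- length(theta0)
  Delta <- matrix(0, nrow = n, ncol = p)  # running estimates of Delta_i
  theta <- theta0
  for (k in seq_len(K)) {
    for (i in seq_len(n)) {
      z_i <- simulate_latent(y[[i]], theta)      # simulation step (e.g. MCMC draw)
      g_i <- complete_score(y[[i]], z_i, theta)  # gradient of complete log-likelihood
      ## stochastic approximation of E[score | y_i], using the Fisher identity
      Delta[i, ] <- (1 - gamma(k)) * Delta[i, ] + gamma(k) * g_i
    }
    theta <- update_parameters(y, theta)         # usual SAEM parameter update
  }
  ## score-based FIM estimate: average of outer products of the Delta_i
  crossprod(Delta) / n
}
```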
(3) Simulation section 4.1.
How are the matrices estimated? Directly from their definitions between equations (1) and (2) or using the definition (3)?
You are right, we should have included details on how we compute both estimators in the linear mixed-effects and Poisson mixture models. In both models the estimators can be computed explicitly by applying formula (3) and Louis' formula (Louis, 1982). The algorithms described in Section 3 of the paper are therefore not required for this part of the simulation study. These clarifications have been added in Section 4.1.
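As a self-contained illustration of what "computed explicitly" means here, the sketch below uses a plain Gaussian model $N(\mu, \sigma^2)$ rather than the paper's mixed-effects or mixture models (an assumption made purely to keep the example short and runnable); both estimators follow from closed-form scores and Hessians:

```r
## Toy illustration: closed-form computation of I_sco (formula (3)) and
## I_obs in a Gaussian model N(mu, sigma^2), NOT the paper's models.
set.seed(1)
n      <- 1000
mu     <- 2
sigma2 <- 4
y      <- rnorm(n, mu, sqrt(sigma2))

## individual scores d/d(mu, sigma2) of log f(y_i; theta), an n x 2 matrix
scores <- cbind((y - mu) / sigma2,
                -1 / (2 * sigma2) + (y - mu)^2 / (2 * sigma2^2))

## formula (3): average of outer products of the individual scores
I_sco <- crossprod(scores) / n

## observed information: minus the average Hessian of log f(y_i; theta)
I_obs <- matrix(0, 2, 2)
I_obs[1, 1] <- 1 / sigma2
I_obs[1, 2] <- I_obs[2, 1] <- mean(y - mu) / sigma2^2
I_obs[2, 2] <- -1 / (2 * sigma2^2) + mean((y - mu)^2) / sigma2^3
## both converge to the true FIM diag(1/sigma2, 1/(2*sigma2^2))
```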
What happens if the parameters are also estimated and not assumed to be known?
As this part of the numerical study aims to illustrate the asymptotic properties of $I_{n,sco}$ and $I_{n,obs}$, we had not considered the case of unknown parameters in the previous version of the paper. In the revised paper, we have added experiments in the linear mixed-effects model to compare the ability to evaluate the precision of the parameter estimates through coverages computed with either $I_{n,sco}$ or $I_{n,obs}$, evaluated at either the true or the estimated parameter values. The results show that the uncertainty related to the parameter estimation leads to a slight deterioration of the coverage rates, which diminishes as $n$ increases.
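A minimal sketch of how such coverages are typically obtained from a FIM estimate via Wald intervals, assuming `I_hat` estimates the per-observation FIM (the names and normalization are illustrative, not the paper's code):

```r
## Wald-interval coverage indicator from a per-observation FIM estimate.
## Averaging the result over simulation replications gives the coverage rate.
coverage <- function(theta_hat, theta_true, I_hat, n, level = 0.95) {
  se <- sqrt(diag(solve(n * I_hat)))   # asymptotic standard errors of theta_hat
  z  <- qnorm(1 - (1 - level) / 2)
  abs(theta_hat - theta_true) <= z * se
}
```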
In section 4.1.2, it is said "Note that in this particular model", which model is it?
This remark concerns the linear mixed-effects model. We have added some details to the sentence.
Estimator $I_{n,sco}$ seems more variable and more biased than the observed Fisher Information matrix. Why? What is the consequence on the coverage rates of the parameters?
We agree with this remark. The difference between the estimators $I_{n,sco}$ and $I_{n,obs}$ in terms of bias and variance strongly depends on the components of the matrices and on the model. As exemplified in Section 2, neither of the two estimators is systematically better than the other. As part of this revision, we have compared empirical coverages computed from either $I_{n,sco}$ or $I_{n,obs}$ in the linear mixed-effects model. Although the components of $I_{n,sco}$ and $I_{n,obs}$ have different empirical properties, the coverages obtained with the two estimators are similar. This is expected since both estimators are unbiased.
In Table 1, the bias is not always decreasing with n. Why? Is it due to the number of repetitions?
You are right. We repeated the experiment several times, first with the same number of repetitions as in the paper and then with an increased number of repetitions. The phenomenon still occurs in the linear mixed-effects and Poisson mixture models, but for varying components of the matrices. As the biases are very small regardless of the value of $n$, we suspect this is an artifact related to the orders of magnitude of the already slight biases.
Tables 3 and 4 have 3 rows that are exactly identical. I guess it is a mistake?
This is not a mistake. In the Poisson mixture model, some components of $I_{n,sco}$ and $I_{n,obs}$ have the same analytical expressions, which is why some rows of Tables 3 and 4 show identical values for $I_{n,sco}$ and $I_{n,obs}$.
(4) Simulation section 4.2.
A short introduction to the section, explaining its aim and how it differs from the previous section, would help the reader.
The objectives of the simulations conducted in Sections 4.1 and 4.2 are different. Section 4.1 studies the properties of the two FIM estimators, while Section 4.2 studies the properties of the algorithms proposed in Section 3. This was not clear enough in the previous version of the article, so a short paragraph has been added at the beginning of each of the two sections to clarify it.
Why are $K = 3000$ iterations needed for the SAEM algorithm? It seems a lot for this non-linear mixed-effects model.
We agree with the reviewer that fewer iterations are usually needed to estimate the parameters with the SAEM algorithm in this model. Our proposed algorithm differs from the classical SAEM algorithm in that it also implements a stochastic approximation of the derivatives of the individual complete log-likelihoods. For the algorithm to converge well in practice, both the parameter estimates and the estimates of these derivatives must stabilize over the iterations. Closely examining these quantities over the iterations shows that the derivative estimates stabilize much more slowly than the parameter estimates. To guarantee the convergence of all quantities, we have therefore increased the total number of iterations of the algorithm to $3000$.
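A minimal sketch of the kind of monitoring meant here (variable names are illustrative, not the paper's code): both sets of iterates are tracked, and the run is considered converged only when both have stabilized.

```r
## Relative-change diagnostic applied to successive iterates of both
## the parameters and the Delta_i estimates; names are illustrative.
rel_change <- function(current, previous) {
  max(abs(current - previous)) / max(abs(previous), 1e-8)
}
## inside the iteration loop over k:
## theta_stable <- rel_change(theta_k, theta_prev) < 1e-4
## delta_stable <- rel_change(Delta_k, Delta_prev) < 1e-4
## delta_stable is typically reached much later, hence the larger K
```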
How many Monte Carlo repetitions are used to estimate the true matrices? Is it enough?
The true matrices are estimated by Monte-Carlo integration based on $10^5$ iterations of a Metropolis-Hastings algorithm, including a burn-in of $5000$ iterations. These settings were chosen after several trials so that the estimates converged accurately.
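For illustration, a generic R sketch of this type of Monte-Carlo evaluation, where `log_target` and `integrand` are hypothetical placeholders for the model-specific conditional density and the quantity being averaged:

```r
## Generic Metropolis-Hastings Monte-Carlo integration with burn-in:
## draw latent variables, discard the burn-in, average the integrand.
## `log_target` and `integrand` are hypothetical placeholders.
mh_expectation <- function(z0, log_target, integrand,
                           n_iter = 1e5, burn_in = 5e3, step = 0.5) {
  z   <- z0
  acc <- vector("list", n_iter)
  for (k in seq_len(n_iter)) {
    z_prop <- z + rnorm(length(z), sd = step)   # random-walk proposal
    if (log(runif(1)) < log_target(z_prop) - log_target(z)) z <- z_prop
    acc[[k]] <- integrand(z)                    # value at the current draw
  }
  ## average over the post-burn-in draws
  Reduce(`+`, acc[-seq_len(burn_in)]) / (n_iter - burn_in)
}
```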
What are the coverage rates obtained with $I_{n,obs}$ in Section 4.2.2? What is the loss?
The coverage rates obtained with $I_{n,obs}$ are now provided in Section 4.2.2. They are very similar to those obtained with $I_{n,sco}$, so there is no loss in using $I_{n,obs}$ rather than $I_{n,sco}$ to quantify the MLE's uncertainty. The real advantages of $I_{n,sco}$ are that this FIM estimate is positive semi-definite by construction, and that it requires stochastic approximation only of the first-order derivatives of the complete log-likelihood, contrary to $I_{n,obs}$, which requires differentiating the complete log-likelihood to second order and thus involves more complicated formulas since the model does not belong to the exponential family.
(5) Could you provide the code?
The R scripts used for the simulation studies are now available in the paper's GitHub repository.