
History index model

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.
Model in functional data analysis

In statistical analysis, the standard framework of varying coefficient models (also known as concurrent regression models) relates the current value of a response process to the current value of a predictor process. This framework is inadequate when both past and present values of the predictor process are assumed to influence the current response. In contrast, the history index model includes the effect of recent past values of the predictor through a history index function. Specifically, the influence of past predictor values is modeled by a smooth history index function, while the effects on the response are described by smooth varying coefficient functions.

Definition

In functional data analysis, functional data are considered as realizations of a stochastic process $X(t)$, $t \in \mathcal{I}$, that is an $L^{2}$ process on a bounded and closed interval $\mathcal{I}$.

Suppose that the current functional response process $Y(t)$ at time $t$ depends on the recent history of the predictor process $X$ in a sliding window of length $\Delta$.

Then the history index model is defined as

$$\mathrm{E}\{Y(t) \mid X(t)\} = \beta_{0} + \beta_{1}(t)\int_{0}^{\Delta}\gamma(u)\,X(t-u)\,du, \qquad (1)$$

for $t \in [\Delta, T]$ with a suitable $T > 0$. Here $\gamma(\cdot)$ is the history index function: it defines the history index factor attached to $\beta_{1}(\cdot)$ and quantifies the influence of the recent history of the predictor values on the response. In most cases, $\gamma(\cdot)$ is assumed to be smooth. For identifiability, $\gamma(\cdot)$ is normalized by requiring that $\int_{0}^{\Delta}\gamma^{2}(u)\,du = 1$ and that $\gamma(0) > 0$, which is no real restriction because $\{-\beta_{1}(t)\}\{-\gamma(u)\} = \beta_{1}(t)\gamma(u)$.
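As a rough numerical illustration of model (1), the following sketch evaluates the right-hand side at one time point with the trapezoidal rule. The exponentially decaying $\gamma$, the linear $\beta_{1}$, the sinusoidal predictor and the helper names (`trapezoid`, `history_index_mean`) are all assumptions made for this example and are not taken from the cited literature.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def history_index_mean(x, t, delta, gamma, beta0, beta1, n_grid=200):
    """Approximate beta0 + beta1(t) * int_0^delta gamma(u) x(t - u) du."""
    step = delta / n_grid
    lags = [k * step for k in range(n_grid + 1)]
    integrand = [gamma(u) * x(t - u) for u in lags]
    return beta0 + beta1(t) * trapezoid(integrand, step)


# Hypothetical ingredients, chosen only for illustration.
DELTA = 1.0
# L2 norm of exp(-u) on [0, DELTA], used to enforce int gamma^2 = 1.
NORM = math.sqrt((1.0 - math.exp(-2.0 * DELTA)) / 2.0)


def gamma(u):
    """Decaying history index function with unit L2 norm and gamma(0) > 0."""
    return math.exp(-u) / NORM


def beta1(t):
    """Assumed varying coefficient function."""
    return 1.0 + 0.5 * t


def x(t):
    """Assumed predictor trajectory."""
    return math.sin(t)


mean_at_2 = history_index_mean(x, t=2.0, delta=DELTA, gamma=gamma,
                               beta0=0.5, beta1=beta1)
print(mean_at_2)
```

With this choice of $\gamma$, recent predictor values (small lags $u$) receive more weight than older ones inside the window of length $\Delta$.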

Estimation of the history index model

Estimation of the history index function

At each fixed time point $t$, the model in (1) reduces to a functional linear model between the scalar response $Y(t)$ and the functional predictor $X(s)$, $t - \Delta \leq s \leq t$. Let $X^{C}(s) = X(s) - \mathrm{E}\{X(s)\}$ denote the centered functional covariate and $Y^{C}(s) = Y(s) - \mathrm{E}\{Y(s)\}$ the centered response process. The model can then be written as

$$\mathrm{E}\{Y^{C}(t) \mid X^{C}(t)\} = \beta_{1}(t)\int_{0}^{\Delta}\gamma(s)\,X^{C}(t-s)\,ds = \int_{0}^{\Delta}\alpha_{t}(s)\,X^{C}(t-s)\,ds, \qquad (2)$$

with regression parameter functions $\alpha_{t}(s) = \beta_{1}(t)\gamma(s)$. The functions $\alpha_{t}(s)$ contain the common factor $\gamma(s)$ for each $t$. To satisfy the constraint $\int_{0}^{\Delta}\gamma^{2}(u)\,du = 1$ and to stabilize the resulting estimators, the functions $\alpha_{t_{r}}$ are combined over an equidistant grid of time points $(t_{1}, \ldots, t_{R})$ in $[\Delta, T]$, and we can define

$$\gamma(s) = \frac{\sum_{r=1}^{R}\alpha_{t_{r}}(s)}{\left[\int_{0}^{\Delta}\left\{\sum_{r=1}^{R}\alpha_{t_{r}}(s)\right\}^{2}ds\right]^{1/2}}. \qquad (3)$$

When the history index function is recovered, model (1) reduces to a varying coefficient model.
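The normalization in (3) can be sketched as follows, assuming that estimates of $\alpha_{t_{r}}(s)$ are already available on an equally spaced lag grid. The function name `history_index_from_alphas`, the toy inputs and the sign flip enforcing $\gamma(0) > 0$ are illustrative assumptions; how the $\alpha_{t_{r}}$ themselves are estimated is not shown here.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def history_index_from_alphas(alphas, step):
    """Combine alpha_{t_r}(s) curves into a unit-norm gamma(s), as in (3).

    alphas[r][k] is alpha_{t_r}(s_k) on the lag grid s_k = k * step.
    """
    summed = [sum(column) for column in zip(*alphas)]  # sum over r at each lag
    norm = math.sqrt(trapezoid([v * v for v in summed], step))
    gamma_hat = [v / norm for v in summed]
    if gamma_hat[0] < 0:  # sign convention gamma(0) > 0 from the definition
        gamma_hat = [-v for v in gamma_hat]
    return gamma_hat


# Toy input (hypothetical): alpha_{t_r}(s) = beta1(t_r) * exp(-s), no noise.
STEP = 0.1
LAGS = [k * STEP for k in range(11)]
SHAPE = [math.exp(-s) for s in LAGS]
ALPHAS = [[b * v for v in SHAPE] for b in (0.5, 1.0, 2.0)]
gamma_hat = history_index_from_alphas(ALPHAS, STEP)
gamma_flipped = history_index_from_alphas([[-v for v in SHAPE]], STEP)
```

In this noise-free toy case the recovered curve is proportional to `SHAPE`, since every row of `ALPHAS` is a multiple of the same function.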

Estimation of the varying coefficient function

Once the estimate of $\gamma(s)$ has been obtained, the remaining unknown component in model (2) is the varying coefficient function $\beta_{1}$. Define $\tilde{X}(t) = \int_{0}^{\Delta}\gamma(s)\,X^{C}(t-s)\,ds$. From (2),

$$\mathrm{cov}\{X(t), Y(t)\} = \mathrm{cov}[\mathrm{E}\{X^{C}(t) \mid X\}, \mathrm{E}\{Y^{C}(t) \mid X\}] + \mathrm{E}[\mathrm{cov}(X^{C}(t), Y^{C}(t) \mid X)] = \beta_{1}(t)\int_{0}^{\Delta}\gamma(s)\,\mathrm{cov}\{X(t-s), X(t)\}\,ds,$$

$$\mathrm{cov}\{X(t), \tilde{X}(t)\} = \int_{0}^{\Delta}\gamma(s)\,\mathrm{cov}\{X(t-s), X(t)\}\,ds,$$

and therefore $\beta_{1}(t) = \mathrm{cov}\{X(t), Y(t)\} \big/ \int_{0}^{\Delta}\gamma(s)\,\mathrm{cov}\{X(t-s), X(t)\}\,ds$.
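The covariance ratio for $\beta_{1}(t)$ can be illustrated with sample covariances across curves observed on a common grid. The sketch below assumes densely and regularly observed, noise-free curves and a known $\gamma$. The names `estimate_beta1` and `sample_cov` and the toy data are hypothetical, and the plug-in sample covariance is a simplification rather than the estimator used in the cited papers.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def sample_cov(a, b):
    """Unbiased sample covariance of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def history_index(curve, j, gamma, step):
    """Approximate int_0^Delta gamma(s) x(t_j - s) ds on the grid."""
    return trapezoid([gamma[k] * curve[j - k] for k in range(len(gamma))], step)


def estimate_beta1(X, Y, gamma, step):
    """Return {j: beta1_hat(t_j)} for all grid times with a full history window.

    X[i][j], Y[i][j]: curve i at time t_j = j * step.
    gamma[k]: history index function at lag s_k = k * step.
    Centering is omitted because covariances are unaffected by it.
    """
    n_lags = len(gamma) - 1
    estimates = {}
    for j in range(n_lags, len(X[0])):
        x_now = [xi[j] for xi in X]
        y_now = [yi[j] for yi in Y]
        x_tilde = [history_index(xi, j, gamma, step) for xi in X]
        estimates[j] = sample_cov(x_now, y_now) / sample_cov(x_now, x_tilde)
    return estimates


# Noise-free toy data (hypothetical): X_i(t) = a_i + b_i * t,
# beta_0 = 3 and beta_1(t) = 1 + t.
STEP = 0.25
RAW = [1.0, 0.5, 0.25]
GAMMA = [v / math.sqrt(trapezoid([w * w for w in RAW], STEP)) for v in RAW]
A = [0.2, -1.0, 0.7, 1.5, -0.4]
B = [1.0, 0.3, -0.8, 0.5, 2.0]
X = [[a + b * j * STEP for j in range(9)] for a, b in zip(A, B)]
Y = [[3.0 + (1.0 + j * STEP) * history_index(xi, j, GAMMA, STEP) if j >= 2 else 0.0
      for j in range(9)] for xi in X]
beta1_hat = estimate_beta1(X, Y, GAMMA, STEP)
```

Because the toy responses contain no noise, `beta1_hat[j]` reproduces $1 + t_{j}$ up to floating-point error. With real data the ratio would be noisy and would typically be smoothed over $t$.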

Application of the history index model

Applications of varying coefficient models that consider past and present information at the same time have received increasing attention in recent years. For example, Şentürk et al. proposed a time-varying lagged regression model to assess the association of predictors, such as cognitive and functional impairment scores, with the frequency of clinic visits of older adults. Zemplenyi et al. suggested a function-on-function regression model that leverages data from nearby DNA methylation probes to identify epigenetic regions that exhibit windows of susceptibility to ambient particulate matter smaller than 2.5 microns (PM2.5). In line with this trend, the history index model has also been used in various situations.

Delay differential equation

The modeling of time-dynamic systems is of interest in multiple scientific fields. A delay differential equation (DDE) is a natural extension of various types of differential equations, such as ordinary, random and stochastic differential equations, to settings where the observed processes have an aftereffect.

For dynamic learning of random differential equations with a delay (RDED), Dubey et al. utilize functional linear regression with history index to learn the distributed delay, where the regression parameter function then corresponds to a history index function for the process of interest.

Let $(X(\cdot), \mathbf{U}(\cdot))$ denote a multivariate stochastic process, where $X(\cdot)$ is a continuously differentiable process of interest, $\mathbf{U}(\cdot) = (U_{1}(\cdot), \ldots, U_{J}(\cdot))^{T}$ is a vector function of additional covariates, and $[t_{0}, T]$ is a time window of interest. The model is defined as

$$\frac{dX(t)}{dt} = \alpha(t) + \int_{0}^{\tau_{0}}\gamma(s,t)\,X(t-s)\,ds + \int_{0}^{\tau_{1}}\gamma_{1}(s,t)\,U(t-s)\,ds + Z(t), \quad t \in [t_{0}, T],$$

$$X(t) = g(t), \quad t \in [t_{0} - \tau_{0}, t_{0}],$$

where $g$ is an initial condition process, $\tau_{0}$ and $\tau_{1}$ are delays, $\alpha(t)$ is a smooth function, $\gamma(s,t)$ and $\gamma_{1}(s,t)$ are history index functions, and $Z(\cdot)$ is a random drift process that is independent of $(X(\cdot), \mathbf{U}(\cdot))$. For the purpose of illustration and technical derivations, $U(\cdot)$ is assumed here to be a univariate process; the corresponding multivariate generalization is straightforward. Dubey et al. used this RDED to predict the growth rate of COVID-19 cases in the United States.
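To give a feel for how the distributed delay term drives the dynamics, the following sketch integrates a simplified version of the equation with the forward Euler method. It drops the covariate term and the random drift $Z$, and it takes the coefficient functions as known. The Euler scheme, the function name `simulate_delay` and the example inputs are assumptions made for illustration. This is a forward simulation, not the learning procedure of Dubey et al.

```python
def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def simulate_delay(alpha, gamma, g, t0, T, tau0, step):
    """Forward Euler path of dX/dt = alpha(t) + int_0^tau0 gamma(s, t) X(t - s) ds.

    The initial segment X(t) = g(t) on [t0 - tau0, t0] seeds the history.
    Returns (times, path), with path[k] the value at t0 - tau0 + k * step.
    """
    n_lags = round(tau0 / step)
    n_steps = round((T - t0) / step)
    path = [g(t0 - tau0 + k * step) for k in range(n_lags + 1)]
    for m in range(n_steps):
        t = t0 + m * step
        current = n_lags + m  # index of X(t) in path
        integrand = [gamma(k * step, t) * path[current - k]
                     for k in range(n_lags + 1)]
        drift = alpha(t) + trapezoid(integrand, step)
        path.append(path[current] + step * drift)
    times = [t0 - tau0 + k * step for k in range(len(path))]
    return times, path


# Hypothetical inputs: constant history, uniform negative feedback over the window.
times, path = simulate_delay(
    alpha=lambda t: 0.0,
    gamma=lambda s, t: -1.0,
    g=lambda t: 1.0,
    t0=0.0, T=1.0, tau0=0.5, step=0.1,
)

# Sanity case: with no delay term and alpha = 1 the path grows linearly.
_, linear_path = simulate_delay(
    alpha=lambda t: 1.0,
    gamma=lambda s, t: 0.0,
    g=lambda t: 2.0,
    t0=0.0, T=1.0, tau0=0.5, step=0.1,
)
```

A smaller `step` reduces the discretization error of both the Euler update and the trapezoidal approximation of the delay integral.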

References

  1. Cardot, Hervé; Ferraty, Frédéric; Sarda, Pascal (1999), "Functional Linear Model", Statistics & Probability Letters, vol. 45, pp. 11–22, CiteSeerX 10.1.1.56.6256
  2. Morris, Jeffrey S. (2015-04-10). "Functional Regression". Annual Review of Statistics and Its Application. 2 (1): 321–359. arXiv:1406.4068. Bibcode:2015AnRSA...2..321M. doi:10.1146/annurev-statistics-010814-020413. ISSN 2326-8298. S2CID 18637009.
  3. Yao, Fang; Müller, Hans-Georg; Wang, Jane-Ling (2005-12-01). "Functional linear regression analysis for longitudinal data". The Annals of Statistics. 33 (6). arXiv:math/0603132. doi:10.1214/009053605000000660. ISSN 0090-5364. S2CID 1202441.
  4. Malfait, Nicole; Ramsay, James O. (2003). "The Historical Functional Linear Model". The Canadian Journal of Statistics. 31 (2): 115–128. doi:10.2307/3316063. ISSN 0319-5724. JSTOR 3316063. S2CID 55092204.
  5. Şentürk, Damla; Müller, Hans-Georg (2010). "Functional Varying Coefficient Models for Longitudinal Data". Journal of the American Statistical Association. 105 (491): 1256–1264. doi:10.1198/jasa.2010.tm09228. ISSN 0162-1459. S2CID 14296231.
  6. Ramsay, J. O.; Silverman, B. W. (2005). "Functional Data Analysis". Springer Series in Statistics. doi:10.1007/b98888. ISBN 978-0-387-40080-8. ISSN 0172-7397.
  7. Müller, Hans-Georg (2016). "Peter Hall, Functional Data Analysis and Random Objects". The Annals of Statistics. 44 (5): 1867–1887. doi:10.1214/16-AOS1492. ISSN 0090-5364. JSTOR 43974701.
  8. Şentürk, Damla; Ghosh, Samiran; Nguyen, Danh V. (2014-05-01). "Exploratory time varying lagged regression: modeling association of cognitive and functional trajectories with expected clinic visits in older adults". Computational Statistics & Data Analysis. 73: 1–15. doi:10.1016/j.csda.2013.11.001. ISSN 0167-9473. PMC 3890149. PMID 24436504.
  9. Zemplenyi, M.; Meyer, M.; Cardenas, A.; Hivert, M.; Rifas-Shiman, S.; Gibson, Heike; Kloog, I.; Schwartz, J.; Oken, E.; DeMeo, D.; Gold, D. (2021). "Function-on-function regression for the identification of epigenetic regions exhibiting windows of susceptibility to environmental exposures". The Annals of Applied Statistics. 15 (3): 1366–1385. arXiv:1912.07359. doi:10.1214/20-aoas1425. PMC 9615608. PMID 36313278. S2CID 209376792.
  10. Imkeller, Peter; Schmalfuss, Björn (2001-04-01). "The Conjugacy of Stochastic and Random Differential Equations and the Existence of Global Attractors". Journal of Dynamics and Differential Equations. 13 (2): 215–249. doi:10.1023/A:1016673307045. ISSN 1572-9222. S2CID 3120200.
  11. Dubey, Paromita; Chen, Yaqing; Gajardo, Álvaro; Bhattacharjee, Satarupa; Carroll, Cody; Zhou, Yidong; Chen, Han; Müller, Hans-Georg (2021). "Learning delay dynamics for multivariate stochastic processes, with application to the prediction of the growth rate of COVID-19 cases in the United States". Journal of Mathematical Analysis and Applications. 514 (2): 125677. arXiv:2109.07059. doi:10.1016/j.jmaa.2021.125677. PMC 8494512. PMID 34642503.