Getting Started with spaero

The spaero package (pronounced sparrow) currently supports the estimation of distributional properties along rolling windows of time series. Such estimates may in some cases provide signals that the system generating the data is approaching a critical transition. Examples of critical transitions include the eutrophication of lakes, changes in climate, and the emergence or eradication of infectious diseases. The spearo package will be developed to further support statistical methods to anticipate critical transitions in infectious disease systems. Because these methods will be based on generic properties of dynamical systems, they have the potential to apply to a broad range of models. spearo also provides functions to support computational experiments designed to evaluate these methods for applications relevant to infectious disease systems. This document provides a rudimentary demonstration of the application of such methods to simulated data.

Our simulated data is a time series produced by a stochastic SIR simulator included in the spaero package. See Keeling and Rohani (2008) for an introduction to the SIR model. The simulator is capable of including time dependent parameters. Gillespie’s direct method is used to update the model variables during the simulation. Transitions between states occurs according to the rules given in Table~, which makes use of the symbols defined in Table~. Because some of the transition rates may change continuously with time and because the simulation algorithm updates the rates only at points of time when the model’s state variables are updated, these simulations are not in general exact. However, for many realistic scenarios birth and death updates occur frequently enough that the simulation should be highly accurate. Also, in newer versions of pomp (versions >= 1.13.4), the ``hmax’’ parameter of pomp’s method can be used to set the maximum simulation time that can elapse before an update of the reaction rates.

Before demonstrating the statistical analysis functions of spearo, we provide an overview of the simulation functions. These functions essentially provide a convenient interface to the general simulation capabilities of the pomp package. The user calls the function to create an object of class . This object contains the model structure as well as default parameters for the simulation. Simulations of the model may then be run using the method in the pomp package. (To reproduce the output of the code chunks in this document, then, the pomp package must be installed.)

library(spaero)

sim <- create_simulator()
simout <- pomp::simulate(sim)

This code creates a new pomp object and runs a simulation with the default parameters. The variable simout contains a second pomp object that contains the simulation results. These results may be extracted like so:

as(simout, "data.frame")

##    time reports      S I  R      N cases gamma_t mu_t d_t eta_t beta_par_t p_t
## 1     0       0 100000 0  0 100000     0       0    0   0     0          0   0
## 2     1       0 100025 0  1 100026     1       0    0   0     0          0   0
## 3     2       0  99933 0  2  99935     1       0    0   0     0          0   0
## 4     3       0  99958 0  3  99961     1       0    0   0     0          0   0
## 5     4       0 100049 0  3 100052     0       0    0   0     0          0   0
## 6     5       0 100135 0  5 100140     2       0    0   0     0          0   0
## 7     6       0 100148 0  7 100155     2       0    0   0     0          0   0
## 8     7       1 100130 0 11 100141     4       0    0   0     0          0   0
## 9     8       0 100081 0 11 100092     0       0    0   0     0          0   0
## 10    9       0 100152 0 11 100163     0       0    0   0     0          0   0

Note that the covariates (i.e., the time-dependent components of the parameters) are included in addition to the state variables and observables. Also, β_t in Table~ is named with the string “beta_par_t” and not “beta_t”. Similarly, the parameter β is named “beta_par”. This name was chosen because “beta” cannot be used in the C code defining the simulator due to conflicts with the name of the beta function.

In addition to simulating the dynamics of disease spread, the pomp object also simulates imperfect observation of the dynamics. A cases variable is included in the output and it counts the total number of recoveries that occurred in the preceding interval between observations. A corresponding number of reports is simulated by sampling from a binomial probability mass function with a number of trials equal to the number of cases and a reporting probability equal to a user-supplied parameter, ρ.

Observation times and parameters can be set at simulation run time:

pars <- sim@params
pars["rho"] <- 0.5
as(pomp::simulate(sim, params=pars, times=seq(1, 4)), "data.frame")

##   time reports      S I R      N cases gamma_t mu_t d_t eta_t beta_par_t p_t
## 1    1       0 100027 0 0 100027     0       0    0   0     0          0   0
## 2    2       1 100011 0 1 100012     1       0    0   0     0          0   0
## 3    3       1 100054 0 2 100056     1       0    0   0     0          0   0
## 4    4       0 100084 0 2 100086     0       0    0   0     0          0   0

However, the covariate table that determines the time dependence of rates cannot be set at simulation time.

One can also set the number of replicates.

if (utils::packageVersion("pomp") < "2.0.0") {
  pomp::simulate(sim, nsim=2, times=seq(1, 2), as.data.frame = TRUE)
} else {
  pomp::simulate(sim, nsim=2, times=seq(1, 2), format = "data.frame")
}

##   time .id      S I R      N cases reports
## 1    1   1 100145 0 1 100146     1       0
## 2    1   2 100060 0 1 100061     1       0
## 3    2   1 100141 0 1 100142     0       0
## 4    2   2 100001 0 1 100002     1       0

The random number seed allows simulations to be reproduced.

as(pomp::simulate(sim, seed=342), "data.frame")

##    time reports      S I  R      N cases gamma_t mu_t d_t eta_t beta_par_t p_t
## 1     0       0 100000 0  0 100000     0       0    0   0     0          0   0
## 2     1       1 100013 0  2 100015     2       0    0   0     0          0   0
## 3     2       0 100008 0  7 100015     5       0    0   0     0          0   0
## 4     3       0  99947 0  7  99954     0       0    0   0     0          0   0
## 5     4       0  99966 0  8  99974     1       0    0   0     0          0   0
## 6     5       0  99936 0  9  99945     1       0    0   0     0          0   0
## 7     6       0  99895 0  8  99903     0       0    0   0     0          0   0
## 8     7       0  99929 0  8  99937     0       0    0   0     0          0   0
## 9     8       1 100021 0 11 100032     3       0    0   0     0          0   0
## 10    9       0 100014 0 10 100024     0       0    0   0     0          0   0

as(pomp::simulate(sim, seed=342), "data.frame")

##    time reports      S I  R      N cases gamma_t mu_t d_t eta_t beta_par_t p_t
## 1     0       0 100000 0  0 100000     0       0    0   0     0          0   0
## 2     1       1 100013 0  2 100015     2       0    0   0     0          0   0
## 3     2       0 100008 0  7 100015     5       0    0   0     0          0   0
## 4     3       0  99947 0  7  99954     0       0    0   0     0          0   0
## 5     4       0  99966 0  8  99974     1       0    0   0     0          0   0
## 6     5       0  99936 0  9  99945     1       0    0   0     0          0   0
## 7     6       0  99895 0  8  99903     0       0    0   0     0          0   0
## 8     7       0  99929 0  8  99937     0       0    0   0     0          0   0
## 9     8       1 100021 0 11 100032     3       0    0   0     0          0   0
## 10    9       0 100014 0 10 100024     0       0    0   0     0          0   0

An SIS model is also available. This model is identical to the SIR model except that the recovery event in Table~ results in an infective individual becoming a susceptible.

sim_sis <- create_simulator(process_model="SIS")
as(pomp::simulate(sim_sis), "data.frame")

##    time reports      S I R      N cases gamma_t mu_t d_t eta_t beta_par_t p_t
## 1     0       0 100000 0 0 100000     0       0    0   0     0          0   0
## 2     1       0  99925 0 0  99925     0       0    0   0     0          0   0
## 3     2       0 100001 0 0 100001     1       0    0   0     0          0   0
## 4     3       1 100021 0 0 100021     2       0    0   0     0          0   0
## 5     4       0 100002 0 0 100002     3       0    0   0     0          0   0
## 6     5       0  99987 0 0  99987     3       0    0   0     0          0   0
## 7     6       0  99922 0 0  99922     2       0    0   0     0          0   0
## 8     7       0  99957 0 0  99957     2       0    0   0     0          0   0
## 9     8       1  99897 0 0  99897     1       0    0   0     0          0   0
## 10    9       0  99816 0 0  99816     2       0    0   0     0          0   0

Now let’s simulate data according to the SIR model where the transmission rate starts out well below the threshold value and gradually increases. Note that parameters corresponding to the initial conditions (i.e., S₀, I₀, and R₀) are normalized to sum to N₀. Thus we can specify that the simulation begins with a population of 100,000 susceptibles as follows.

params <- c(gamma=24, mu=0.014, d=0.014, eta=1e-4, beta_par=0,
            rho=0.9, S_0=1, I_0=0, R_0=0, N_0=1e5, p=0)
covar <- data.frame(gamma_t=c(0, 0), mu_t=c(0, 0), d_t=c(0, 0), eta_t=c(0, 0),
                    beta_par_t=c(0, 24e-5), p_t = c(0, 0), time=c(0, 300))
times <- seq(0, 200, by=1/12)

sim <- create_simulator(params=params, times=times, covar=covar)
so <- as(pomp::simulate(sim, seed=272), "data.frame")
plot(ts(so[, "reports"], freq=12), ylab="No. reports")

By eye, we can see the distribution of reports seems to change over time. We can summarize these changes by computing statistics over moving windows.

st1 <- get_stats(so[, "reports"], center_kernel="uniform",
                 center_trend="local_constant", center_bandwidth=360,
                 stat_bandwidth=360)
plot_st <- function(st) {
  plot_vars <- ts(cbind(Residual=st$centered$x[, 1], Mean=st$stats$mean,
                  Autocorrelation=st$stats$autocor, Variance=st$stats$variance,
                  DeltaVariance=st$stats$variance_first_diff,
                  Skewness=st$stats$skew), freq=12)
  plot(plot_vars, main="")
}
plot_st(st1)

The increasing trends in the statistics are potential warning signals that the system is approaching the epidemic threshold. Readers interested in this type of analysis can find guidelines in Dakos et al. (2012) and may also want to consider performing it with the function in the earlywarnings package described in that paper. We’ll next review the input parameters and implementation of .

Two key parameters that the user must provide to are the shape and size of the rolling window. There is a rolling window for an estimate of the mean and for an estimate of moments within the window. Arguments controlling these windows are prefixed with “center_” and “stat_” respectively. An estimate of the mean is necessary because the calculations of the statistics involve deviations from the mean. supports estimation of the mean via several methods and users may also estimate the mean using other methods, subtract it from the input time series, and then set the “center_trend” argument to “assume_zero”. Regarding the shapes of windows, a rectangular window function and a Gaussian-shaped function are available by providing either “uniform” or “gaussian” to the kernel arguments. The rectangular function may be preferred for ease of interpretation while the Gaussian function may be preferred for obtaining a smoother series of estimates. The width of the window is controlled by the bandwidth arguments. For a window centered on a particular index, the absolute difference between that index and all other indices in the time series is divided by the bandwidth to determine a distance to all other observations. This distance is then plugged into a kernel function corresponding to the window type. For the gaussian window, the kernel function is a Gaussian probability density function with a standard deviation of one. For the rectangular window, the kernel function equals one if the distance is less than one and zero otherwise. The “backward_only” argument determines whether the rectangular window is backward-looking by controlling whether the kernel function is forced to zero for any indices greater than the window’s index. In other words, “backward_only = TRUE” makes the rectangular window right-aligned instead of centered. The output of the kernel function is a weight for each observation. These weights are used in the estimators described next. Note that these bandwidth conventions are different from those of . Note also that only when “backward_only = TRUE” does the bandwidth correspond directly to the size of a moving window.

By default, computes statistics via weighted sample moments. To clarify, the estimate of the moment for the moving window centered on index i of the time series x is where w_ij is a kernel weight, f_j(x) is the value of the moment at index j, and N_i = ∑_jw_ij is a normalization constant. Table~ provides the formulas for the statistics computed in terms of these moment estimates. In some cases, users may obtain less biased estimates by setting the “stat_trend” argument to “local_linear”. This replaces the weighted average estimate with a prediction from a local linear regression. This method can reduce bias near the ends of the time series if a trend exists such that f_j(x) for j near i tend to be above or below the expected value of f_i(x) across repeated realizations of a time series. If the prediction is less than zero for variance or kurtosis, it is replaced with zero.

Let’s look at the effect of changing some of these parameters on the computed statistics. First we try a Gaussian window.

st2 <- get_stats(so[, "reports"], center_kernel="gaussian",
                 center_trend="local_constant", center_bandwidth=360,
                 stat_bandwidth=360, stat_kernel="gaussian")
plot_st(st2)

Next, we’ll increase the bandwidths.

st3 <- get_stats(so[, "reports"], center_kernel="gaussian",
                 center_trend="local_constant", center_bandwidth=720,
                 stat_bandwidth=720, stat_kernel="gaussian")
plot_st(st3)

That concludes our initial overview of the package. The current version of spaero is just a starting point and the package will continue to be actively developed for the foreseeable future.

Funding

The development of this package was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number U01GM110744.

Disclaimer

The content is solely the responsibility of the authors and does not necessarily reflect the official views of the National Institutes of Health.

References

Dakos, Vasilis, Stephen R. Carpenter, William A. Brock, Aaron M. Ellison, Vishwesha Guttal, Anthony R. Ives, Sonia Kéfi, et al. 2012. “Methods for Detecting Early Warnings of Critical Transitions in Time Series Illustrated Using Simulated Ecological Data.” PLoS ONE 7 (7): e41010. https://doi.org/10.1371/journal.pone.0041010.

Keeling, Matt J., and Pejman Rohani. 2008. Modeling Infectious Diseases in Humans and Animals. Princeton, New Jersey: Princeton UP.