Package 'fastcpd'

Title: Fast Change Point Detection via Sequential Gradient Descent
Description: Implements fast change point detection algorithm based on the paper "Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis" by Xianyang Zhang, Trisha Dawn <https://proceedings.mlr.press/v206/zhang23b.html>. The algorithm is based on dynamic programming with pruning and sequential gradient descent. It is able to detect change points a magnitude faster than the vanilla Pruned Exact Linear Time(PELT). The package includes examples of linear regression, logistic regression, Poisson regression, penalized linear regression data, and whole lot more examples with custom cost function in case the user wants to use their own cost function.
Authors: Xingchi Li [aut, cre, cph] , Xianyang Zhang [aut, cph]
Maintainer: Xingchi Li <[email protected]>
License: GPL (>= 3)
Version: 0.14.5
Built: 2024-08-28 05:22:01 UTC
Source: https://github.com/doccstat/fastcpd

Help Index


Bitcoin Market Price (USD)

Description

The average USD market price across major bitcoin exchanges.

Usage

bitcoin

Format

A data frame with 1354 rows and 2 variables:

date

POSIXct,POSIXt (TZ: "UTC") from 2019-01-02 to 2023-10-28

price

The average USD market price across major bitcoin exchanges

Source

<https://www.blockchain.com/explorer/charts/market-price>

Examples

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(bitcoin, ggplot2::aes(x = date, y = price)) +
    ggplot2::geom_line()
  print(p)

  result <- suppressWarnings(fastcpd.garch(
    diff(log(bitcoin$price[600:900])), c(1, 1),
    beta = "BIC", cost_adjustment = "BIC"
  ))
  summary(result)
  bitcoin$date[result@cp_set + 600]
  plot(result)

  cp_dates <- bitcoin[600 + result@cp_set + 1, "date"]
  ggplot2::ggplot(
    data = data.frame(
      x = bitcoin$date[600:900], y = bitcoin$price[600:900]
    ),
    ggplot2::aes(x = x, y = y)
  ) +
    ggplot2::geom_line(color = "steelblue") +
    ggplot2::geom_vline(
      xintercept = cp_dates,
      color = "red",
      linetype = "dotted",
      linewidth = 0.5,
      alpha = 0.7
    ) +
    ggplot2::labs(
      x = "Year",
      y = "Bitcoin price in USD"
    ) +
    ggplot2::annotate(
      "text",
      x = cp_dates,
      y = 2000,
      label = as.character(cp_dates),
      color = "steelblue"
    ) +
    ggplot2::theme_bw()
}

Find change points efficiently

Description

fastcpd() takes in formulas, data, families and extra parameters and returns a fastcpd object.

Usage

fastcpd(
  formula = y ~ . - 1,
  data,
  beta = "MBIC",
  cost_adjustment = "MBIC",
  family = NULL,
  cost = NULL,
  cost_gradient = NULL,
  cost_hessian = NULL,
  line_search = c(1),
  lower = rep(-Inf, p),
  upper = rep(Inf, p),
  pruning_coef = 0,
  segment_count = 10,
  trim = 0.02,
  momentum_coef = 0,
  multiple_epochs = function(x) 0,
  epsilon = 1e-10,
  order = c(0, 0, 0),
  p = ncol(data) - 1,
  cp_only = FALSE,
  vanilla_percentage = 0,
  warm_start = FALSE,
  ...
)

Arguments

formula

A formula object specifying the model to be fitted. The (optional) response variable should be on the LHS of the formula, while the covariates should be on the RHS. The naming of variables used in the formula should be consistent with the column names in the data frame provided in data. The intercept term should be removed from the formula. The response variable is not needed for mean/variance change models and time series models. By default, an intercept column will be added to the data, similar to the lm() function. Thus, it is suggested that users should remove the intercept term by appending - 1 to the formula. Note that the fastcpd.family functions do not require a formula input.

data

A data frame of dimension T×dT \times d containing the data to be segmented (where each row denotes a data point ztRdz_t \in \mathbb{R}^d for t=1,,Tt = 1, \ldots, T) is required in the main function, while a matrix or a vector input is also accepted in the fastcpd.family functions.

beta

Penalty criterion for the number of change points. This parameter takes a string value of "BIC", "MBIC", "MDL" or a numeric value. If a numeric value is provided, the value will be used as the penalty. By default, the mBIC criterion is used, where β=(p+2)log(T)/2\beta = (p + 2) \log(T) / 2. This parameter usage should be paired with cost_adjustment described below. Discussions about the penalty criterion can be found in the references.

cost_adjustment

Cost adjustment criterion. It can be "BIC", "MBIC", "MDL" or NULL. By default, the cost adjustment criterion is set to be "MBIC". The "MBIC" and "MDL" criteria modify the cost function by adding a negative adjustment term to the cost function. "BIC" or NULL does not modify the cost function. Details can in found in the references.

family

Family class of the change point model. It can be "mean" for mean change, "variance" for variance change, "meanvariance" for mean and/or variance change, "lm" for linear regression, "binomial" for logistic regression, "poisson" for Poisson regression, "lasso" for penalized linear regression, "ar" for AR(pp) models, "arma" for ARMA(pp, qq) models, "arima" for ARIMA(pp, dd, qq) models, "garch" for GARCH(pp, qq) models, "var" for VAR(pp) models and "custom" for user-specified custom models. Omitting this parameter is the same as specifying the parameter to be "custom" or NULL, in which case, users must specify the custom cost function.

cost

Cost function to be used. cost, cost_gradient, and cost_hessian should not be specified at the same time with family as built-in families have cost functions implemented in C++ to provide better performance. If not specified, the default is the negative log-likelihood for the corresponding family. Custom cost functions can be provided in the following two formats:

  • cost = function(data) {...}

  • cost = function(data, theta) {...}

Users can specify a loss function using the second format that will be used to calculate the cost value. In both formats, the input data is a subset of the original data frame in the form of a matrix (a matrix with a single column in the case of a univariate data set). In the first format, the specified cost function directly calculates the cost value. fastcpd() performs the vanilla PELT algorithm, and cost_gradient and cost_hessian should not be provided since no parameter updating is necessary for vanilla PELT. In the second format, the loss function i=stl(zi,θ)\sum_{i = s}^t l(z_i, \theta) is provided, which has to be optimized over the parameter θ\theta to obtain the cost value. A detailed discussion about the custom cost function usage can be found in the references.

cost_gradient

Gradient of the custom cost function. Example usage:

cost_gradient = function(data, theta) {
  ...
  return(gradient)
}

The gradient function takes two inputs, the first being a matrix representing a segment of the data, similar to the format used in the cost function, and the second being the parameter that needs to be optimized. The gradient function returns the value of the gradient of the loss function, i.e., i=stl(zi,θ)\sum_{i = s}^t \nabla l(z_i, \theta).

cost_hessian

Hessian of the custom loss function. The Hessian function takes two inputs, the first being a matrix representing a segment of the data, similar to the format used in the cost function, and the second being the parameter that needs to be optimized. The gradient function returns the Hessian of the loss function, i.e., i=st2l(zi,θ)\sum_{i = s}^t \nabla^2 l(z_i, \theta).

line_search

If a vector of numeric values is provided, a line search will be performed to find the optimal step size for each update. Detailed usage of line_search can be found in the references.

lower

Lower bound for the parameters. Used to specify the domain of the parameters after each gradient descent step. If not specified, the lower bound is set to be -Inf for all parameters. lower is especially useful when the estimated parameters take only positive values, such as the noise variance.

upper

Upper bound for the parameters. Used to specify the domain of the parameters after each gradient descent step. If not specified, the upper bound is set to be Inf for all parameters.

pruning_coef

Pruning coefficient $c_0$ used in the pruning step of the PELT algorithm with the default value 0. If cost_adjustment is specified as "MBIC", an adjustment term plog(2)p\log(2) will be added to the pruning coefficient. If cost_adjustment is specified as "MDL", an adjustment term plog2(2)p\log_2(2) will be added to the pruning coefficient. Detailed discussion about the pruning coefficient can be found in the references.

segment_count

An initial guess of the number of segments. If not specified, the initial guess of the number of segments is 10. The initial guess affects the initial estimates of the parameters in SeGD.

trim

Trimming for the boundary change points so that a change point close to the boundary will not be counted as a change point. This parameter also specifies the minimum distance between two change points. If several change points have mutual distances smaller than trim * nrow(data), those change points will be merged into one single change point. The value of this parameter should be between 0 and 1.

momentum_coef

Momentum coefficient to be applied to each update. This parameter is used when the loss function is bad-shaped so that maintaining a momentum from previous update is desired. Default value is 0, meaning the algorithm doesn't maintain a momentum by default.

multiple_epochs

A function can be specified such that an adaptive number of multiple epochs can be utilized to improve the algorithm's performance. multiple_epochs is a function of the length of the data segment. The function returns an integer indicating how many epochs should be performed apart from the default update. By default, the function returns zero, meaning no multiple epochs will be used to update the parameters. Example usage:

multiple_epochs = function(segment_length) {
  if (segment_length < 100) 1
  else 0
}

This function will let SeGD perform parameter updates with an additional epoch for each segment with a length less than 100 and no additional epoch for segments with lengths greater or equal to 100.

epsilon

Epsilon to avoid numerical issues. Only used for the Hessian computation in Logistic Regression and Poisson Regression.

order

Order of the AR(pp), VAR(pp) or ARIMA(pp, dd, qq) model.

p

Number of covariates in the model. If not specified, the number of covariates will be inferred from the data, i.e., p = ncol(data) - 1. This parameter is superseded by order in the case of time series models: "ar", "var", "arima".

cp_only

If TRUE, only the change points are returned. Otherwise, the cost function values together with the estimated parameters for each segment are also returned. By default the value is set to be FALSE so that plot can be used to visualize the results for a built-in model. cp_only has some performance impact on the algorithm, since the cost values and estimated parameters for each segment need to be calculated and stored. If the users are only interested in the change points, setting cp_only to be TRUE will help with the computational cost.

vanilla_percentage

The parameter vv is between zero and one. For each segment, when its length is no more than vTvT, the cost value will be computed by performing an exact minimization of the loss function over the parameter. When its length is greater than vTvT, the cost value is approximated through SeGD. Therefore, this parameter induces an algorithm that can be interpreted as an interpolation between dynamic programming with SeGD (v=0v = 0) and the vanilla PELT (v=1v = 1). The readers are referred to the references for more details.

warm_start

If TRUE, the algorithm will use the estimated parameters from the previous segment as the initial value for the current segment. This parameter is only used for the "glm" families.

...

Other parameters for specific models.

  • include.mean is used to determine if a mean/intercept term should be included in the ARIMA(pp, dd, qq) or GARCH(pp, qq) models.

  • r.clock is used to create an RcppClock object to record the time spent in the C++ code. Default is an empty string. If set to any non-empty string, an object with specified name will be created. Usage: library(RcppClock); plot(VARIABLE_NAME).

  • r.progress is used to control the progress bar. By default the progress bar will be shown. To disable it, set r.progress = FALSE.

  • p.response is used to specify the number of response variables. This parameter is especially useful for linear models with multivariate responses.

Value

A fastcpd object.

Gallery

https://github.com/doccstat/fastcpd/tree/main/tests/testthat/examples

References

Xingchi Li, Xianyang Zhang (2024). “fastcpd: Fast Change Point Detection in R.” arXiv:2404.05933, https://arxiv.org/abs/2404.05933.

Xianyang Zhang, Trisha Dawn (2023). “Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis.” In Ruiz, Francisco, Dy, Jennifer, van de Meent, Jan-Willem (eds.), Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 series Proceedings of Machine Learning Research, 1129-1143.

See Also

fastcpd.family for the family-specific function; plot.fastcpd() for plotting the results, summary.fastcpd() for summarizing the results.

Examples

if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  p <- 4
  d <- 2
  x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
  theta_1 <- matrix(runif(8, -3, -1), nrow = p)
  theta_2 <- matrix(runif(8, -1, 3), nrow = p)
  y <- rbind(
    x[1:125, ] %*% theta_1 + mvtnorm::rmvnorm(125, rep(0, d), 3 * diag(d)),
    x[126:n, ] %*% theta_2 + mvtnorm::rmvnorm(75, rep(0, d), 3 * diag(d))
  )
  result_mlm <- fastcpd(
    cbind(y.1, y.2) ~ . - 1, cbind.data.frame(y = y, x = x), family = "lm"
  )
  summary(result_mlm)
}
if (
  requireNamespace("mvtnorm", quietly = TRUE) &&
    requireNamespace("stats", quietly = TRUE)
) {
  set.seed(1)
  n <- 400 + 300 + 500
  p <- 5
  x <- mvtnorm::rmvnorm(n, mean = rep(0, p), sigma = diag(p))
  theta <- rbind(
    mvtnorm::rmvnorm(1, mean = rep(0, p - 3), sigma = diag(p - 3)),
    mvtnorm::rmvnorm(1, mean = rep(5, p - 3), sigma = diag(p - 3)),
    mvtnorm::rmvnorm(1, mean = rep(9, p - 3), sigma = diag(p - 3))
  )
  theta <- cbind(theta, matrix(0, 3, 3))
  theta <- theta[rep(seq_len(3), c(400, 300, 500)), ]
  y_true <- rowSums(x * theta)
  factor <- c(
    2 * stats::rbinom(400, size = 1, prob = 0.95) - 1,
    2 * stats::rbinom(300, size = 1, prob = 0.95) - 1,
    2 * stats::rbinom(500, size = 1, prob = 0.95) - 1
  )
  y <- factor * y_true + stats::rnorm(n)
  data <- cbind.data.frame(y, x)
  huber_threshold <- 1
  huber_loss <- function(data, theta) {
    residual <- data[, 1] - data[, -1, drop = FALSE] %*% theta
    indicator <- abs(residual) <= huber_threshold
    sum(
      residual^2 / 2 * indicator +
        huber_threshold * (
          abs(residual) - huber_threshold / 2
        ) * (1 - indicator)
    )
  }
  huber_loss_gradient <- function(data, theta) {
    residual <- c(data[nrow(data), 1] - data[nrow(data), -1] %*% theta)
    if (abs(residual) <= huber_threshold) {
      -residual * data[nrow(data), -1]
    } else {
      -huber_threshold * sign(residual) * data[nrow(data), -1]
    }
  }
  huber_loss_hessian <- function(data, theta) {
    residual <- c(data[nrow(data), 1] - data[nrow(data), -1] %*% theta)
    if (abs(residual) <= huber_threshold) {
      outer(data[nrow(data), -1], data[nrow(data), -1])
    } else {
      0.01 * diag(length(theta))
    }
  }
  huber_regression_result <- fastcpd(
    formula = y ~ . - 1,
    data = data,
    beta = (p + 1) * log(n) / 2,
    cost = huber_loss,
    cost_gradient = huber_loss_gradient,
    cost_hessian = huber_loss_hessian
  )
  summary(huber_regression_result)
}

set.seed(1)
p <- 1
x <- matrix(rnorm(375 * p, 0, 1), ncol = p)
theta <- rbind(rnorm(p, 0, 1), rnorm(p, 2, 1))
y <- c(
  rbinom(200, 1, 1 / (1 + exp(-x[1:200, ] %*% theta[1, , drop = FALSE]))),
  rbinom(175, 1, 1 / (1 + exp(-x[201:375, ] %*% theta[2, , drop = FALSE])))
)
data <- data.frame(y = y, x = x)
result_builtin <- suppressWarnings(fastcpd.binomial(data))
logistic_loss <- function(data, theta) {
  x <- data[, -1, drop = FALSE]
  y <- data[, 1]
  u <- x %*% theta
  nll <- -y * u + log(1 + exp(u))
  nll[u > 10] <- -y[u > 10] * u[u > 10] + u[u > 10]
  sum(nll)
}
logistic_loss_gradient <- function(data, theta) {
  x <- data[nrow(data), -1, drop = FALSE]
  y <- data[nrow(data), 1]
  c(-(y - 1 / (1 + exp(-x %*% theta)))) * x
}
logistic_loss_hessian <- function(data, theta) {
  x <- data[nrow(data), -1]
  prob <- 1 / (1 + exp(-x %*% theta))
  (x %o% x) * c((1 - prob) * prob)
}
result_custom <- fastcpd(
  formula = y ~ . - 1,
  data = data,
  epsilon = 1e-5,
  cost = logistic_loss,
  cost_gradient = logistic_loss_gradient,
  cost_hessian = logistic_loss_hessian
)
result_builtin@cp_set
result_custom@cp_set


if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  n <- 480
  p_true <- 6
  p <- 50
  x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
  theta_0 <- rbind(
    runif(p_true, -5, -2),
    runif(p_true, -3, 3),
    runif(p_true, 2, 5),
    runif(p_true, -5, 5)
  )
  theta_0 <- cbind(theta_0, matrix(0, ncol = p - p_true, nrow = 4))
  y <- c(
    x[1:80, ] %*% theta_0[1, ] + rnorm(80, 0, 1),
    x[81:200, ] %*% theta_0[2, ] + rnorm(120, 0, 1),
    x[201:320, ] %*% theta_0[3, ] + rnorm(120, 0, 1),
    x[321:n, ] %*% theta_0[4, ] + rnorm(160, 0, 1)
  )
  small_lasso_data <- cbind.data.frame(y, x)
  result_no_vp <- fastcpd.lasso(
    small_lasso_data,
    beta = "BIC",
    cost_adjustment = NULL,
    pruning_coef = 0
  )
  summary(result_no_vp)
  result_20_vp <- fastcpd.lasso(
    small_lasso_data,
    beta = "BIC",
    cost_adjustment = NULL,
    vanilla_percentage = 0.2,
    pruning_coef = 0
  )
  summary(result_20_vp)
}

Find change points efficiently in AR(pp) models

Description

fastcpd_ar() and fastcpd.ar() are wrapper functions of fastcpd() to find change points in AR(pp) models. The function is similar to fastcpd() except that the data is by default a one-column matrix or univariate vector and thus a formula is not required here.

Usage

fastcpd_ar(data, order = 0, ...)

fastcpd.ar(data, order = 0, ...)

Arguments

data

A numeric vector, a matrix, a data frame or a time series object.

order

A positive integer specifying the order of the AR model.

...

Other arguments passed to fastcpd(), for example, segment_count. One special argument can be passed here is include.mean, which is a logical value indicating whether the mean should be included in the model. The default value is TRUE.

Value

A fastcpd object.

See Also

fastcpd()

Examples

set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
  x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
  x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
result <- fastcpd.ar(x[3 + seq_len(n)], 3)
summary(result)
plot(result)
set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
  x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
  x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
result <-
  fastcpd.ar(x[3 + seq_len(n)], 3, beta = "MDL", cost_adjustment = "MDL")
summary(result)
plot(result)

Find change points efficiently in ARIMA(pp, dd, qq) models

Description

fastcpd_arima() and fastcpd.arima() are wrapper functions of fastcpd() to find change points in ARIMA(pp, dd, qq) models. The function is similar to fastcpd() except that the data is by default a one-column matrix or univariate vector and thus a formula is not required here.

Usage

fastcpd_arima(data, order = 0, ...)

fastcpd.arima(data, order = 0, ...)

Arguments

data

A numeric vector, a matrix, a data frame or a time series object.

order

A vector of length three specifying the order of the ARIMA model.

...

Other arguments passed to fastcpd(), for example, segment_count. One special argument can be passed here is include.mean, which is a logical value indicating whether the mean should be included in the model. The default value is TRUE.

Value

A fastcpd object.

See Also

fastcpd()

Examples

set.seed(1)
n <- 271
w <- rnorm(n + 1, 0, 3)
dx <- rep(0, n + 1)
x <- rep(0, n + 1)
for (i in 1:180) {
  dx[i + 1] <- 0.8 * dx[i] + w[i + 1] - 0.5 * w[i]
  x[i + 1] <- x[i] + dx[i + 1]
}
for (i in 181:n) {
  dx[i + 1] <- -0.6 * dx[i] + w[i + 1] + 0.3 * w[i]
  x[i + 1] <- x[i] + dx[i + 1]
}
result <- fastcpd.arima(
  diff(x[1 + seq_len(n)]),
  c(1, 0, 1),
  segment_count = 3,
  include.mean = FALSE
)
summary(result)
plot(result)

Find change points efficiently in ARMA(pp, qq) models

Description

fastcpd_arma() and fastcpd.arma() are wrapper functions of fastcpd() to find change points in ARMA(pp, qq) models. The function is similar to fastcpd() except that the data is by default a one-column matrix or univariate vector and thus a formula is not required here.

Usage

fastcpd_arma(data, order = c(0, 0), ...)

fastcpd.arma(data, order = c(0, 0), ...)

Arguments

data

A numeric vector, a matrix, a data frame or a time series object.

order

A vector of length two specifying the order of the ARMA model.

...

Other arguments passed to fastcpd(), for example, segment_count.

Value

A fastcpd object.

See Also

fastcpd()

Examples

set.seed(1)
n <- 300
w <- rnorm(n + 3, 0, 3)
x <- rep(0, n + 3)
for (i in 1:200) {
  x[i + 3] <- 0.1 * x[i + 2] - 0.3 * x[i + 1] + 0.1 * x[i] +
    0.1 * w[i + 2] + 0.5 * w[i + 1] + w[i + 3]
}
for (i in 201:n) {
  x[i + 3] <- 0.3 * x[i + 2] + 0.1 * x[i + 1] - 0.3 * x[i] -
    0.6 * w[i + 2] - 0.1 * w[i + 1] + w[i + 3]
}
result <- suppressWarnings(
  fastcpd.arma(
    data = x[3 + seq_len(n)],
    order = c(3, 2),
    segment_count = 3,
    lower = c(rep(-1, 3 + 2), 1e-10),
    upper = c(rep(1, 3 + 2), Inf),
    line_search = c(1, 0.1, 1e-2),
    beta = "BIC",
    cost_adjustment = "BIC"
  )
)
summary(result)
plot(result)

Find change points efficiently in logistic regression models

Description

fastcpd_binomial() and fastcpd.binomial() are wrapper functions of fastcpd() to find change points in logistic regression models. The function is similar to fastcpd() except that the data is by default a matrix or data frame with the response variable as the first column and thus a formula is not required here.

Usage

fastcpd_binomial(data, ...)

fastcpd.binomial(data, ...)

Arguments

data

A matrix or a data frame with the response variable as the first column.

...

Other arguments passed to fastcpd(), for example, segment_count.

Value

A fastcpd object.

See Also

fastcpd()

Examples

if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  n <- 500
  p <- 4
  x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
  theta <- rbind(rnorm(p, 0, 1), rnorm(p, 2, 1))
  y <- c(
    rbinom(300, 1, 1 / (1 + exp(-x[1:300, ] %*% theta[1, ]))),
    rbinom(200, 1, 1 / (1 + exp(-x[301:n, ] %*% theta[2, ])))
  )
  result <- suppressWarnings(fastcpd.binomial(cbind(y, x)))
  summary(result)
  plot(result)
}

Find change points efficiently in GARCH(pp, qq) models

Description

fastcpd_garch() and fastcpd.garch() are wrapper functions of fastcpd() to find change points in GARCH(pp, qq) models. The function is similar to fastcpd() except that the data is by default a one-column matrix or univariate vector and thus a formula is not required here.

Usage

fastcpd_garch(data, order = c(0, 0), ...)

fastcpd.garch(data, order = c(0, 0), ...)

Arguments

data

A numeric vector, a matrix, a data frame or a time series object.

order

A positive integer vector of length two specifying the order of the GARCH model.

...

Other arguments passed to fastcpd(), for example, segment_count.

Value

A fastcpd object.

See Also

fastcpd()

Examples

set.seed(1)
n <- 400
sigma_2 <- rep(1, n + 1)
x <- rep(0, n + 1)
for (i in seq_len(200)) {
  sigma_2[i + 1] <- 20 + 0.5 * x[i]^2 + 0.1 * sigma_2[i]
  x[i + 1] <- rnorm(1, 0, sqrt(sigma_2[i + 1]))
}
for (i in 201:400) {
  sigma_2[i + 1] <- 1 + 0.1 * x[i]^2 + 0.5 * sigma_2[i]
  x[i + 1] <- rnorm(1, 0, sqrt(sigma_2[i + 1]))
}
result <- suppressWarnings(
  fastcpd.garch(x[-1], c(1, 1), include.mean = FALSE)
)
summary(result)
plot(result)

Find change points efficiently in penalized linear regression models

Description

fastcpd_lasso() and fastcpd.lasso() are wrapper functions of fastcpd() to find change points in penalized linear regression models. The function is similar to fastcpd() except that the data is by default a matrix or data frame with the response variable as the first column and thus a formula is not required here.

Usage

fastcpd_lasso(data, ...)

fastcpd.lasso(data, ...)

Arguments

data

A matrix or a data frame with the response variable as the first column.

...

Other arguments passed to fastcpd(), for example, segment_count.

Value

A fastcpd object.

See Also

fastcpd()

Examples

if (
  requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("mvtnorm", quietly = TRUE) &&
    requireNamespace("reshape2", quietly = TRUE)
) {
  set.seed(1)
  n <- 480
  p_true <- 5
  p <- 50
  x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
  theta_0 <- rbind(
    runif(p_true, -5, -2),
    runif(p_true, -3, 3),
    runif(p_true, 2, 5),
    runif(p_true, -5, 5)
  )
  theta_0 <- cbind(theta_0, matrix(0, ncol = p - p_true, nrow = 4))
  y <- c(
    x[1:80, ] %*% theta_0[1, ] + rnorm(80, 0, 1),
    x[81:200, ] %*% theta_0[2, ] + rnorm(120, 0, 1),
    x[201:320, ] %*% theta_0[3, ] + rnorm(120, 0, 1),
    x[321:n, ] %*% theta_0[4, ] + rnorm(160, 0, 1)
  )
  result <- fastcpd.lasso(
    cbind(y, x),
    multiple_epochs = function(segment_length) if (segment_length < 30) 1 else 0
  )
  summary(result)
  plot(result)

  thetas <- result@thetas
  thetas <- cbind.data.frame(thetas, t(theta_0))
  names(thetas) <- c(
    "segment 1", "segment 2", "segment 3", "segment 4",
    "segment 1 truth", "segment 2 truth", "segment 3 truth", "segment 4 truth"
  )
  thetas$coordinate <- c(seq_len(p_true), rep("rest", p - p_true))
  molten <- reshape2::melt(thetas, id.vars = "coordinate")
  molten <- dplyr::mutate(
    molten,
    segment = gsub("segment ", "", variable),
    segment = gsub(" truth", "", segment),
    height = as.numeric(gsub("segment.*", "", segment)) +
      0.2 * as.numeric(grepl("truth", variable)),
    parameter = ifelse(grepl("truth", variable), "truth", "estimated")
  )
  ggplot2::ggplot() +
    ggplot2::geom_point(
      data = molten,
      ggplot2::aes(
        x = value, y = height, shape = coordinate, color = parameter
      ),
      size = 4
    ) +
    ggplot2::ylim(0.8, 4.4) +
    ggplot2::ylab("segment") +
    ggplot2::theme_bw()
}

Find change points efficiently in linear regression models

Description

fastcpd_lm() and fastcpd.lm() are wrapper functions of fastcpd() to find change points in linear regression models. The function is similar to fastcpd() except that the data is by default a matrix or data frame with the response variable as the first column and thus a formula is not required here.

Usage

fastcpd_lm(data, ...)

fastcpd.lm(data, ...)

Arguments

data

A matrix or a data frame with the response variable as the first column.

...

Other arguments passed to fastcpd(), for example, segment_count.

Value

A fastcpd object.

See Also

fastcpd()

Examples

if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  n <- 300
  p <- 4
  x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
  theta_0 <- rbind(c(1, 3.2, -1, 0), c(-1, -0.5, 2.5, -2), c(0.8, 0, 1, 2))
  y <- c(
    x[1:100, ] %*% theta_0[1, ] + rnorm(100, 0, 3),
    x[101:200, ] %*% theta_0[2, ] + rnorm(100, 0, 3),
    x[201:n, ] %*% theta_0[3, ] + rnorm(100, 0, 3)
  )
  result_lm <- fastcpd.lm(cbind(y, x))
  summary(result_lm)
  plot(result_lm)

  set.seed(1)
  n <- 600
  p <- 4
  d <- 2
  x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
  theta_1 <- matrix(runif(8, -3, -1), nrow = p)
  theta_2 <- matrix(runif(8, -1, 3), nrow = p)
  y <- rbind(
    x[1:350, ] %*% theta_1 + mvtnorm::rmvnorm(350, rep(0, d), 3 * diag(d)),
    x[351:n, ] %*% theta_2 + mvtnorm::rmvnorm(250, rep(0, d), 3 * diag(d))
  )
  result_mlm <- fastcpd.lm(cbind.data.frame(y = y, x = x), p.response = 2)
  summary(result_mlm)
}

Find change points efficiently in mean change models

Description

fastcpd_mean() and fastcpd.mean() are wrapper functions of fastcpd() to find the mean change. The function is similar to fastcpd() except that the data is by default a matrix or data frame or a vector with each row / element as an observation and thus a formula is not required here.

Usage

fastcpd_mean(data, ...)

fastcpd.mean(data, ...)

Arguments

data

A matrix, a data frame or a vector.

...

Other arguments passed to fastcpd(), for example, segment_count.

Value

A fastcpd object.

See Also

fastcpd()

Examples

if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  p <- 3
  data <- rbind(
    mvtnorm::rmvnorm(300, mean = rep(0, p), sigma = diag(100, p)),
    mvtnorm::rmvnorm(400, mean = rep(50, p), sigma = diag(100, p)),
    mvtnorm::rmvnorm(300, mean = rep(2, p), sigma = diag(100, p))
  )
  result <- fastcpd.mean(data)
  summary(result)
}
set.seed(1)
data <- c(rnorm(10000), rnorm(1000) + 1)
(result_time <- system.time(
  result <- fastcpd.mean(data, r.progress = FALSE, cp_only = TRUE)
))
result@cp_set

set.seed(1)
data <- c(rnorm(10000), rnorm(10000, 1), rnorm(10000))
(result_time <- system.time(
  result <- fastcpd.mean(data, r.progress = FALSE, cp_only = TRUE)
))
result@cp_set


set.seed(1)
data <- c(rnorm(10000), rnorm(10000, 1), rnorm(10000))
(result_time <- system.time(
  result <- fastcpd.mean(
    data, beta = "BIC", cost_adjustment = "BIC",
    r.progress = FALSE, cp_only = TRUE
  )
))
result@cp_set

Find change points efficiently in mean variance change models

Description

fastcpd_meanvariance(), fastcpd.meanvariance(), fastcpd_mv(), fastcpd.mv() are wrapper functions of fastcpd() to find the meanvariance change. The function is similar to fastcpd() except that the data is by default a matrix or data frame or a vector with each row / element as an observation and thus a formula is not required here.

Usage

fastcpd_meanvariance(data, ...)

fastcpd.meanvariance(data, ...)

fastcpd_mv(data, ...)

fastcpd.mv(data, ...)

Arguments

data

A matrix, a data frame or a vector.

...

Other arguments passed to fastcpd(), for example, segment_count.

Value

A fastcpd object.

See Also

fastcpd()

Examples

set.seed(1)
p <- 1
result <- fastcpd.meanvariance(c(
  rnorm(300, 0, 1),
  rnorm(400, 10, 1),
  rnorm(300, 0, 10),
  rnorm(300, 0, 1),
  rnorm(400, 10, 1),
  rnorm(300, 10, 10)
))
summary(result)
plot(result)
if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  p <- 4
  result <- fastcpd.mv(
    rbind(
      mvtnorm::rmvnorm(300, mean = rep(0, p), sigma = diag(1, p)),
      mvtnorm::rmvnorm(400, mean = rep(10, p), sigma = diag(1, p)),
      mvtnorm::rmvnorm(300, mean = rep(0, p), sigma = diag(100, p)),
      mvtnorm::rmvnorm(300, mean = rep(0, p), sigma = diag(1, p)),
      mvtnorm::rmvnorm(400, mean = rep(10, p), sigma = diag(1, p)),
      mvtnorm::rmvnorm(300, mean = rep(10, p), sigma = diag(100, p))
    )
  )
  summary(result)
}

set.seed(1)
data <- c(rnorm(2000, 0, 1), rnorm(2000, 1, 1), rnorm(2000, 1, 2))
(result_time <- system.time(
  result <- fastcpd.variance(data, r.progress = FALSE, cp_only = TRUE)
))
result@cp_set


set.seed(1)
data <- c(rnorm(2000, 0, 1), rnorm(2000, 1, 1), rnorm(2000, 1, 2))
(result_time <- system.time(
  result <- fastcpd.variance(
    data, beta = "BIC", cost_adjustment = "BIC",
    r.progress = TRUE, cp_only = TRUE
  )
))
result@cp_set

Find change points efficiently in Poisson regression models

Description

fastcpd_poisson() and fastcpd.poisson() are wrapper functions of fastcpd() to find change points in Poisson regression models. The function is similar to fastcpd() except that the data is by default a matrix or data frame with the response variable as the first column and thus a formula is not required here.

Usage

fastcpd_poisson(data, ...)

fastcpd.poisson(data, ...)

Arguments

data

A matrix or a data frame with the response variable as the first column.

...

Other arguments passed to fastcpd(), for example, segment_count.

Value

A fastcpd object.

See Also

fastcpd()

Examples

if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  n <- 1100
  p <- 3
  x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
  delta <- rnorm(p)
  theta_0 <- c(1, 0.3, -1)
  y <- c(
    rpois(500, exp(x[1:500, ] %*% theta_0)),
    rpois(300, exp(x[501:800, ] %*% (theta_0 + delta))),
    rpois(200, exp(x[801:1000, ] %*% theta_0)),
    rpois(100, exp(x[1001:1100, ] %*% (theta_0 - delta)))
  )
  result <- fastcpd.poisson(cbind(y, x))
  summary(result)
  plot(result)
}

Find change points efficiently in time series data

Description

fastcpd_ts() and fastcpd.ts() are wrapper functions for fastcpd() to find change points in time series data. The function is similar to fastcpd() except that the data is a time series and the family is one of "ar", "var", "arma", "arima" or "garch".

Usage

fastcpd_ts(data, family = NULL, order = c(0, 0, 0), ...)

fastcpd.ts(data, family = NULL, order = c(0, 0, 0), ...)

Arguments

data

A numeric vector, a matrix, a data frame or a time series object.

family

A character string specifying the family of the time series. The value should be one of "ar", "var", "arima" or "garch".

order

A positive integer or a vector of length less than four specifying the order of the time series. Possible combinations with family are:

  • "ar", NUMERIC(1): AR(pp) model using linear regression.

  • "var", NUMERIC(1): VAR(pp) model using linear regression.

  • "arima", NUMERIC(3): ARIMA(pp, dd, qq) model using forecast::Arima().

  • "garch", NUMERIC(2): GARCH(pp, qq) model using tseries::garch().

...

Other arguments passed to fastcpd(), for example, segment_count. One special argument can be passed here is include.mean, which is a logical value indicating whether the mean should be included in the model. The default value is TRUE.

Value

A fastcpd object.

See Also

fastcpd()

Examples

set.seed(1)
n <- 400
w <- rnorm(n + 4, 0, 0.1)
x <- rep(NA, n)
for (i in 1:200) {
  x[i] <- w[i + 4] - 5 / 3 * w[i + 3] + 11 / 12 * w[i + 2] - 5 / 12 * w[i + 1] +
    1 / 6 * w[i]
}
for (i in 201:n) {
  x[i] <- w[i + 4] - 4 / 3 * w[i + 3] + 7 / 9 * w[i + 2] - 16 / 27 * w[i + 1] +
    4 / 27 * w[i]
}
result <- fastcpd.ts(
  x,
  "arma",
  c(0, 4),
  lower = c(-2, -2, -2, -2, 1e-10),
  upper = c(2, 2, 2, 2, Inf),
  line_search = c(1, 0.1, 1e-2),
  trim = 0.05
)
summary(result)
plot(result)

Find change points efficiently in VAR(pp) models

Description

fastcpd_var() and fastcpd.var() are wrapper functions of fastcpd_ts() to find change points in VAR(pp) models. The function is similar to fastcpd_ts() except that the data is by default a matrix with row as an observation and thus a formula is not required here.

Usage

fastcpd_var(data, order = 0, ...)

fastcpd.var(data, order = 0, ...)

Arguments

data

A matrix, a data frame or a time series object.

order

A positive integer specifying the order of the VAR model.

...

Other arguments passed to fastcpd(), for example, segment_count.

Value

A fastcpd object.

See Also

fastcpd()

Examples

set.seed(1)
n <- 300
p <- 2
theta_1 <- matrix(c(-0.3, 0.6, -0.5, 0.4, 0.2, 0.2, 0.2, -0.2), nrow = p)
theta_2 <- matrix(c(0.3, -0.4, 0.1, -0.5, -0.5, -0.2, -0.5, 0.2), nrow = p)
x <- matrix(0, n + 2, p)
for (i in 1:200) {
  x[i + 2, ] <- theta_1 %*% c(x[i + 1, ], x[i, ]) + rnorm(p, 0, 1)
}
for (i in 201:n) {
  x[i + 2, ] <- theta_2 %*% c(x[i + 1, ], x[i, ]) + rnorm(p, 0, 1)
}
result <- fastcpd.var(x, 2)
summary(result)

Find change points efficiently in variance change models

Description

fastcpd_variance() and fastcpd.variance() are wrapper functions of fastcpd() to find the variance change. The function is similar to fastcpd() except that the data is by default a matrix or data frame or a vector with each row / element as an observation and thus a formula is not required here.

Usage

fastcpd_variance(data, ...)

fastcpd.variance(data, ...)

Arguments

data

A matrix, a data frame or a vector.

...

Other arguments passed to fastcpd(), for example, segment_count.

Value

A fastcpd object.

See Also

fastcpd()

Examples

set.seed(1)
data <- c(rnorm(300, 0, 1), rnorm(400, 0, 100), rnorm(300, 0, 1))
result <- fastcpd.variance(data)
summary(result)
if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  p <- 3
  result <- fastcpd.variance(
    rbind(
      mvtnorm::rmvnorm(
        300, rep(0, p), crossprod(matrix(runif(p^2) * 2 - 1, p))
      ),
      mvtnorm::rmvnorm(
        400, rep(0, p), crossprod(matrix(runif(p^2) * 2 - 1, p))
      ),
      mvtnorm::rmvnorm(
        300, rep(0, p), crossprod(matrix(runif(p^2) * 2 - 1, p))
      )
    )
  )
  summary(result)
}

set.seed(1)
data <- c(rnorm(3000, 0, 1), rnorm(3000, 0, 2), rnorm(3000, 0, 1))
(result_time <- system.time(
  result <- fastcpd.variance(data, r.progress = FALSE, cp_only = TRUE)
))
result@cp_set


set.seed(1)
data <- c(rnorm(3000, 0, 1), rnorm(3000, 0, 2), rnorm(3000, 0, 1))
(result_time <- system.time(
  result <- fastcpd.variance(
    data, beta = "BIC", cost_adjustment = "BIC",
    r.progress = FALSE, cp_only = TRUE
  )
))
result@cp_set

An S4 class to store the output created with fastcpd()

Description

This S4 class stores the output from fastcpd() and fastcpd.family. A fastcpd object consist of several slots including the call to fastcpd(), the data used, the family of the model, the change points, the cost values, the residuals, the estimated parameters and a boolean indicating whether the model was fitted with only change points or with change points and parameters, which you can select using @.

Slots

call

The call of the function.

data

The data passed to the function.

order

The order of the time series model.

family

The family of the model.

cp_set

The set of change points.

cost_values

The cost function values for each segment.

residuals

The residuals of the model with change points. Used only for built-in families.

thetas

The estimated parameters for each segment. Used only for built-in families.

cp_only

A boolean indicating whether fastcpd() was run to return only the change points or the change points with the estimated parameters and cost values for each segment.


Occupancy Detection Data Set

Description

Data set for binary classification of room occupancy from temperature, humidity, light and CO2 measurements. Ground-truth occupancy was obtained from time stamped pictures that were taken every minute.

Usage

occupancy

Format

A data frame with 9752 rows and 7 variables:

date

Character in the format "YYYY-MM-DD hh:mm:ss" from 2015-02-11 14:48:00 to 2015-02-18 09:19:00

Temperature

Temperature in Celsius

Humidity

Humidity

Light

Light

CO2

CO2

HumidityRatio

Humidity Ratio

Occupancy

Binary variable with values 0 (unoccupied) and 1

Source

<https://github.com/LuisM78/Occupancy-detection-data>


Plot the data and the change points for a fastcpd object

Description

Plot the data and the change points for a fastcpd object

Usage

## S3 method for class 'fastcpd'
plot(
  x,
  color_max_count = Inf,
  data_point_alpha = 0.8,
  data_point_linewidth = 0.5,
  data_point_size = 1,
  legend_position = "none",
  panel_background = ggplot2::element_blank(),
  panel_border = ggplot2::element_rect(fill = NA, colour = "grey20"),
  panel_grid_major = ggplot2::element_line(colour = "grey98"),
  panel_grid_minor = ggplot2::element_line(colour = "grey98"),
  segment_separator_alpha = 0.8,
  segment_separator_color = "grey",
  segment_separator_linetype = "dashed",
  strip_background = ggplot2::element_rect(fill = "grey85", colour = "grey20"),
  xlab = NULL,
  ylab = NULL,
  ...
)

## S4 method for signature 'fastcpd,missing'
plot(
  x,
  color_max_count = Inf,
  data_point_alpha = 0.8,
  data_point_linewidth = 0.5,
  data_point_size = 1,
  legend_position = "none",
  panel_background = ggplot2::element_blank(),
  panel_border = ggplot2::element_rect(fill = NA, colour = "grey20"),
  panel_grid_major = ggplot2::element_line(colour = "grey98"),
  panel_grid_minor = ggplot2::element_line(colour = "grey98"),
  segment_separator_alpha = 0.8,
  segment_separator_color = "grey",
  segment_separator_linetype = "dashed",
  strip_background = ggplot2::element_rect(fill = "grey85", colour = "grey20"),
  xlab = NULL,
  ylab = NULL,
  ...
)

Arguments

x

A fastcpd object.

color_max_count

Maximum number of colors to use for the plotting of segments.

data_point_alpha

Alpha of the data points.

data_point_linewidth

Linewidth of the data points.

data_point_size

Size of the data points.

legend_position

Position of the legend.

panel_background

Background of the panel.

panel_border

Border of the panel.

panel_grid_major

Major grid lines of the panel.

panel_grid_minor

Minor grid lines of the panel.

segment_separator_alpha

Alpha of the segment separator lines.

segment_separator_color

Color of the segment separator lines.

segment_separator_linetype

Linetype of the segment separator lines.

strip_background

Background of the strip.

xlab

Label for the x-axis.

ylab

Label for the y-axis.

...

Ignored.

Value

No return value, called for plotting.

Examples

if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  p <- 1
  x <- mvtnorm::rmvnorm(300, rep(0, p), diag(p))
  theta_0 <- matrix(c(1, -1, 0.5))
  y <- c(
    x[1:100, ] * theta_0[1, ] + rnorm(100, 0, 1),
    x[101:200, ] * theta_0[2, ] + rnorm(100, 0, 1),
    x[201:300, ] * theta_0[3, ] + rnorm(100, 0, 1)
  )
  result <- fastcpd.lm(cbind(y, x), r.clock = "fastcpd_profiler")
  summary(result)
  plot(result)

  if (requireNamespace("RcppClock", quietly = TRUE)) {
    library(RcppClock)
    plot(fastcpd_profiler)
  }
}

Print the call and the change points for a fastcpd object

Description

Print the call and the change points for a fastcpd object

Usage

## S3 method for class 'fastcpd'
print(x, ...)

## S4 method for signature 'fastcpd'
print(x, ...)

Arguments

x

A fastcpd object.

...

Ignored.

Value

Return a (temporarily) invisible copy of the fastcpd object. Called primarily for printing the change points in the model.


Show the available methods for a fastcpd object

Description

Show the available methods for a fastcpd object

Usage

## S3 method for class 'fastcpd'
show(object)

## S4 method for signature 'fastcpd'
show(object)

Arguments

object

A fastcpd object.

Value

No return value, called for showing a list of available methods for a fastcpd object.


Show the summary of a fastcpd object

Description

Show the summary of a fastcpd object

Usage

## S3 method for class 'fastcpd'
summary(object, ...)

## S4 method for signature 'fastcpd'
summary(object, ...)

Arguments

object

A fastcpd object.

...

Ignored.

Value

Return a (temporarily) invisible copy of the fastcpd object. Called primarily for printing the summary of the model including the call, the change points, the cost values and the estimated parameters.


Transcription Profiling of 57 Human Bladder Carcinoma Samples

Description

Transcriptome analysis of 57 bladder carcinomas on Affymetrix HG-U95A and HG-U95Av2 microarrays

Usage

transcriptome

Format

A data frame with 2215 rows and 43 variables:

3

Individual 3

4

Individual 4

5

Individual 5

6

Individual 6

7

Individual 7

8

Individual 8

9

Individual 9

10

Individual 10

14

Individual 14

15

Individual 15

16

Individual 16

17

Individual 17

18

Individual 18

19

Individual 19

21

Individual 21

22

Individual 22

24

Individual 24

26

Individual 26

28

Individual 28

30

Individual 30

31

Individual 31

33

Individual 33

34

Individual 34

35

Individual 35

36

Individual 36

37

Individual 37

38

Individual 38

39

Individual 39

40

Individual 40

41

Individual 41

42

Individual 42

43

Individual 43

44

Individual 44

45

Individual 45

46

Individual 46

47

Individual 47

48

Individual 48

49

Individual 49

50

Individual 50

51

Individual 51

53

Individual 53

54

Individual 54

57

Individual 57

Source

<https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-TABM-147>

<https://github.com/cran/ecp/tree/master/data>

Examples

if (requireNamespace("ggplot2", quietly = TRUE)) {
  result <- fastcpd.mean(transcriptome$"10", trim = 0.005)
  summary(result)
  plot(result)

  result_all <- fastcpd.mean(
    transcriptome,
    beta = (ncol(transcriptome) + 1) * log(nrow(transcriptome)) / 2 * 5,
    trim = 0
  )

  plots <- lapply(
    seq_len(ncol(transcriptome)), function(i) {
      ggplot2::ggplot(
        data = data.frame(
          x = seq_along(transcriptome[, i]), y = transcriptome[, i]
        ),
        ggplot2::aes(x = x, y = y)
      ) +
        ggplot2::geom_line(color = "steelblue") +
        ggplot2::geom_vline(
          xintercept = result_all@cp_set,
          color = "red",
          linetype = "dotted",
          linewidth = 0.5,
          alpha = 0.7
        ) +
        ggplot2::theme_void()
    }
  )

  if (requireNamespace("gridExtra", quietly = TRUE)) {
    gridExtra::grid.arrange(grobs = plots, ncol = 1, nrow = ncol(transcriptome))
  }
}

UK Seatbelts Data

Description

Road Casualties in Great Britain 1969–84.

Usage

uk_seatbelts

Format

uk_seatbelts is a multiple time series, with columns

DriversKilled

car drivers killed.

front

front-seat passengers killed or seriously injured.

rear

rear-seat passengers killed or seriously injured.

kms

distance driven.

PetrolPrice

petrol price.

VanKilled

number of van (‘light goods vehicle’) drivers.

law

0/1: was the law in effect that month?

Source

R package datasets

Examples

if (
  requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("lubridate", quietly = TRUE) &&
    requireNamespace("zoo", quietly = TRUE)
) {
  result_ar <- fastcpd.ar(diff(uk_seatbelts[, "drivers"], 12), 1, beta = "BIC")
  summary(result_ar)
  plot(result_ar)

  result_lm <- suppressMessages(fastcpd.lm(
    diff(uk_seatbelts[, c("drivers", "kms", "PetrolPrice", "law")], lag = 12)
  ))
  cp_dates <- as.Date("1969-01-01", format = "%Y-%m-%d")
  cp_dates <- cp_dates + lubridate::period(month = 1 + result_lm@cp_set + 12)
  cp_dates <- zoo::as.yearmon(cp_dates)

  dates <- zoo::as.yearmon(time(uk_seatbelts))

  uk_seatbelts_df <- data.frame(
    dates = dates,
    drivers = c(uk_seatbelts[, "drivers"]),
    color = as.factor((dates < cp_dates[1]) + (dates < cp_dates[2]))
  )

  ggplot2::ggplot() +
    ggplot2::geom_line(
      data = uk_seatbelts_df,
      mapping = ggplot2::aes(x = dates, y = drivers, color = color)
    ) +
    ggplot2::geom_vline(
      xintercept = cp_dates,
      linetype = "dashed",
      color = "red"
    ) +
    zoo::scale_x_yearmon() +
    ggplot2::annotate(
      "text",
      x = cp_dates,
      y = 1025,
      label = as.character(cp_dates),
      color = "blue"
    ) +
    ggplot2::theme_bw() +
    ggplot2::theme(legend.position = "none")
}

Variance estimation for ARMA model with change points

Description

Estimate the variance for each block and then take the average.

Usage

variance_arma(data, p, q, max_order = p * q)

variance.arma(data, p, q, max_order = p * q)

Arguments

data

A one-column matrix or a vector.

p

The order of the autoregressive part.

q

The order of the moving average part.

max_order

The maximum order of the AR model to consider.

Value

A numeric value representing the variance.

Examples

set.seed(1)
n <- 300
w <- rnorm(n + 3, 0, 10)
x <- rep(0, n + 3)
for (i in 1:200) {
  x[i + 3] <- 0.1 * x[i + 2] - 0.3 * x[i + 1] + 0.1 * x[i] +
    0.1 * w[i + 2] + 0.5 * w[i + 1] + w[i + 3]
}
for (i in 201:n) {
  x[i + 3] <- 0.3 * x[i + 2] + 0.1 * x[i + 1] - 0.3 * x[i] -
    0.6 * w[i + 2] - 0.1 * w[i + 1] + w[i + 3]
}
(result <- variance.arma(x[-seq_len(3)], p = 3, q = 2))

Variance estimation for linear models with change points

Description

Estimate the variance for each block and then take the average.

Usage

variance_lm(data, d = 1, block_size = ncol(data) - d + 1, outlier_iqr = Inf)

variance.lm(data, d = 1, block_size = ncol(data) - d + 1, outlier_iqr = Inf)

Arguments

data

A matrix or a data frame with the response variable as the first column.

d

The dimension of the response variable.

block_size

The size of the blocks to use for variance estimation.

outlier_iqr

The number of interquartile ranges to use as a threshold for outlier detection.

Value

A numeric value representing the variance.

Examples

if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  n <- 300
  p <- 4
  x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
  theta <- rbind(c(1, 3.2, -1, 0), c(-1, -0.5, 2.5, -2), c(0.8, 0, 1, 2))
  y <- c(
    x[1:100, ] %*% theta[1, ] + rnorm(100, 0, 3),
    x[101:200, ] %*% theta[2, ] + rnorm(100, 0, 3),
    x[201:n, ] %*% theta[3, ] + rnorm(100, 0, 3)
  )
  (sigma2 <- variance.lm(cbind(y, x)))

  set.seed(1)
  n <- 300
  p <- 4
  d <- 2
  x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
  theta <- cbind(c(1, 3.2, -1, 0), c(-1, -0.5, 2.5, -2), c(0.8, 0, 1, 2))
  theta <- cbind(theta, theta)
  y <- rbind(
    x[1:100, ] %*% theta[, 1:2] +
      mvtnorm::rmvnorm(100, rep(0, d), diag(3, d)),
    x[101:200, ] %*% theta[, 3:4] +
      mvtnorm::rmvnorm(100, rep(0, d), diag(3, d)),
    x[201:n, ] %*% theta[, 5:6] +
      mvtnorm::rmvnorm(100, rep(0, d), diag(3, d))
  )
  (sigma <- variance.lm(cbind(y, x), d = d))
}

Variance estimation for mean change models

Description

Implement Rice estimator for variance in mean change models.

Usage

variance_mean(data)

variance.mean(data)

Arguments

data

A matrix or a data frame with data points as each row.

Value

A matrix representing the variance-covariance matrix or a numeric value representing the variance.

Examples

if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  p <- 3
  data <- rbind(
    mvtnorm::rmvnorm(300, mean = rep(0, p), sigma = diag(100, p)),
    mvtnorm::rmvnorm(400, mean = rep(50, p), sigma = diag(100, p)),
    mvtnorm::rmvnorm(300, mean = rep(2, p), sigma = diag(100, p))
  )
  (sigma <- variance.mean(data))
}

Variance estimation for median change models

Description

Implement Rice estimator.

Usage

variance_median(data)

variance.median(data)

Arguments

data

A vector of data points.

Value

A numeric value representing the variance.

Examples

(sigma2 <- variance.median(well_log))

Well-log Dataset from Numerical Bayesian Methods Applied to Signal Processing

Description

This is the well-known well-log dataset used in many changepoint papers obtained from Alan Turing Institute GitHub repository and licensed under the MIT license.

Usage

well_log

Format

A Time-Series of length 4050.

Source

<https://github.com/alan-turing-institute/TCPD>

Examples

result <- fastcpd.mean(well_log, trim = 0.001)
summary(result)
plot(result)

if (requireNamespace("matrixStats", quietly = TRUE)) {
  sigma2 <- variance.median(well_log)
  median_loss <- function(data) {
    sum(abs(data - matrixStats::colMedians(data))) / sqrt(sigma2) / 2
  }
  result <- fastcpd(
    formula = ~ x - 1,
    data = cbind.data.frame(x = well_log),
    cost = median_loss,
    trim = 0.002
  )
  summary(result)

  segment_starts <- c(1, result@cp_set)
  segment_ends <- c(result@cp_set - 1, length(well_log))
  residual <- NULL
  for (segment_index in seq_along(segment_starts)) {
    segment <-
      well_log[segment_starts[segment_index]:segment_ends[segment_index]]
    residual <- c(residual, segment - median(segment))
  }

  result@residuals <- matrix(residual)
  result@data <- data.frame(x = c(well_log))
  plot(result)
}