Time Series Cheat Sheet in R



Getting started with the forecast package for time series data in R, as quickly as possible and with no explanations.

Source: Forecasting: Principles and Practice

Coerce your data to ts format:
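A minimal sketch of the coercion step, using base R's ts(). The values and the start date are made up for illustration; swap in your own numeric vector or data frame column.

```r
# Hypothetical monthly observations (illustrative values only)
values <- c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
            115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140)

# start = c(year, period); frequency = observations per year (12 = monthly)
myts <- ts(values, start = c(1949, 1), frequency = 12)

frequency(myts)  # observations per seasonal cycle
start(myts)      # first time point
end(myts)        # last time point
```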

  • autoplot(): Useful function to plot data and forecasts

Seasonality

  • ggseasonplot(): Create a seasonal plot
  • ggsubseriesplot(): Create mini plots for each season and show seasonal means

Lags and ACF

  • gglagplot(): Plot the time series against lags of itself
  • ggAcf(): Plot the autocorrelation function (ACF)

White Noise and the Ljung-Box Test

White noise is another name for a time series of iid (independent and identically distributed) data, i.e. purely random. Ideally, your model residuals should look like white noise.

You can use the Ljung-Box test to check whether a time series is white noise. Here’s an example with 24 lags:
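One way to run the test is base R's Box.test(). The series below is simulated white noise, used only so the sketch is self-contained; replace it with your own ts object.

```r
set.seed(42)
wn <- ts(rnorm(200))  # simulated white noise, for illustration only

# Ljung-Box test using the first 24 autocorrelations
lb <- Box.test(wn, lag = 24, type = "Ljung-Box")
lb$p.value
```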

A p-value > 0.05 suggests the data are not significantly different from white noise.

The forecast package includes a few common models out of the box. Fit a model, then call forecast() on the fitted object with the number of periods h to predict.

Example of the workflow:
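A sketch of the fit-then-forecast workflow. The choice of ets() as the model and the built-in AirPassengers series are assumptions made for illustration; any model function from the package should slot in the same way.

```r
library(forecast)

y <- AirPassengers            # built-in monthly series, illustration only

fit <- ets(y)                 # step 1: fit a model
fc  <- forecast(fit, h = 12)  # step 2: forecast h = 12 periods ahead

fc$mean                       # point forecasts
```

Calling autoplot(fc) on the result plots the history together with the forecasts and prediction intervals.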

Naive Models

Useful to benchmark against naive and seasonal naive models.

  • naive()
  • snaive()

Residuals

Residuals are the difference between the model’s fitted values and the actual data. Residuals should look like white noise and be:

  • Uncorrelated
  • Have mean zero

And ideally have:

  • Constant variance
  • A normal distribution

checkresiduals(): helper function to plot the residuals, plot the ACF and histogram, and do a Ljung-Box test on the residuals.

Evaluating Model Accuracy

Train/Test split with window function:

window(data, start, end): to slice the ts data

Use accuracy() on the model and test set

accuracy(model, testset): Provides accuracy measures such as ME, RMSE, MAE, MPE, MAPE and MASE.
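A sketch of the train/test split and accuracy check. The split dates, the AirPassengers data and the snaive() benchmark are illustrative assumptions.

```r
library(forecast)

# Hold out the last two years as a test set
train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = c(1959, 1))

# Forecast over the test period with a seasonal naive benchmark
fc <- snaive(train, h = length(test))

# One row of measures for the training set, one for the test set
acc <- accuracy(fc, test)
acc
```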

Backtesting with one-step-ahead forecasts, aka “time series cross-validation”, can be done with the helper function tsCV().

tsCV(): returns forecast errors given a forecastfunction that returns a forecast object and a number of steps ahead h. At h = 1 these are one-step-ahead forecast errors; for a method with no estimated parameters, such as naive(), they match the model residuals (shifted by one time step).

Here’s an example using the naive() model, forecasting one period ahead:
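A sketch of that call, assuming the built-in AirPassengers series as the data. The RMSE summary at the end is one common way to condense the errors, not the only one.

```r
library(forecast)

# One-step-ahead cross-validated forecast errors for the naive method
e <- tsCV(AirPassengers, forecastfunction = naive, h = 1)

# Cross-validated RMSE; errors that could not be computed are NA
sqrt(mean(e^2, na.rm = TRUE))
```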

Exponential Models

  • ses(): Simple exponential smoothing; weights past observations using a smoothing parameter alpha
  • holt(): Holt’s linear trend method, i.e. SES plus a trend parameter. Use damped = TRUE for a damped trend
  • hw(): Holt-Winters method, which incorporates a linear trend and seasonality. Set seasonal = "additive" for the additive version or "multiplicative" for the multiplicative version
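The three functions above can be called as follows. The data set and the horizon of 12 are illustrative assumptions.

```r
library(forecast)

y <- AirPassengers  # built-in monthly series, illustration only

fc_ses  <- ses(y, h = 12)                              # level only
fc_holt <- holt(y, h = 12, damped = TRUE)              # level + damped trend
fc_hw   <- hw(y, h = 12, seasonal = "multiplicative")  # level + trend + seasonality
```

Each call returns a forecast object directly, so there is no separate forecast() step.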

ETS Models

The forecast package includes a function ets() for exponential smoothing models. ets() estimates parameters by maximizing the likelihood and selects the best model using the corrected AIC (AICc). The model components are:

  • Error = {A, M}
  • Trend = {N, A, Ad}
  • Seasonal = {N, A, M}

Here A = additive, M = multiplicative, N = none, and Ad = additive damped.
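A sketch of automatic versus manual ETS selection. The three-letter model string gives the error, trend and seasonal types in that order; the specific "MAM" choice and the data set are assumptions for illustration.

```r
library(forecast)

y <- AirPassengers  # built-in monthly series, illustration only

# Let ets() choose the error, trend and seasonal components by AICc
fit_auto <- ets(y)
fit_auto$method

# Or fix the components: M error, A trend (damped), M seasonality
fit_manual <- ets(y, model = "MAM", damped = TRUE)

fc <- forecast(fit_auto, h = 12)
```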

Transformations

You may need to transform the data if it is non-stationary to improve your model’s predictions. To deal with non-constant variance, you can use a Box-Cox transformation.

BoxCox(): Box-Cox uses a lambda parameter, typically between -1 and 1, to stabilize the variance. A lambda of 0 performs a natural log, 1/3 acts like a cube root, 1 leaves the shape of the data unchanged, and -1 performs an inverse transformation.
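A sketch of choosing lambda and applying the transformation. BoxCox.lambda() picks a value automatically; using it here, along with the AirPassengers data, is an illustrative assumption.

```r
library(forecast)

y <- AirPassengers  # built-in monthly series, illustration only

lambda <- BoxCox.lambda(y)         # automatic choice of lambda
y_bc   <- BoxCox(y, lambda)        # transformed series
y_back <- InvBoxCox(y_bc, lambda)  # back on the original scale

# The round trip recovers the original data
roundtrip_ok <- isTRUE(all.equal(as.numeric(y_back), as.numeric(y)))

# lambda = 0 is the natural log
log_ok <- isTRUE(all.equal(as.numeric(BoxCox(y, lambda = 0)), as.numeric(log(y))))
```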

Differencing is another transformation that uses differences between observations to model changes rather than the observations themselves.
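Differencing can be sketched with base R's diff(). The data set is an illustrative assumption.

```r
y <- AirPassengers  # built-in monthly series, illustration only

d1  <- diff(y)            # lag-1 difference: y[t] - y[t-1]
d12 <- diff(y, lag = 12)  # seasonal difference: y[t] - y[t-12]

length(d1)   # one observation shorter than y
length(d12)  # twelve observations shorter than y
```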

ARIMA

Parameters: (p,d,q)(P,D,Q)m

Parameter   Description
p           # of autoregressive lags
d           # of lag-1 differences
q           # of moving average lags
P           # of seasonal AR lags
D           # of seasonal differences
Q           # of seasonal MA lags
m           # of observations per year

Arima(): Fits an ARIMA model with the orders you specify. Set include.constant = TRUE to include the constant, aka drift.

auto.arima(): Automatic implementation of the ARIMA function in forecast. Estimates parameters using maximum likelihood and does a stepwise search across a subset of all possible models. Can take a lambda argument to fit the model to transformed data; the forecasts are then back-transformed onto the original scale. Set stepwise = FALSE to consider more models at the expense of more time.
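A sketch of both functions. The orders passed to Arima(), lambda = 0 (a log transform) and the data set are illustrative assumptions, not recommendations.

```r
library(forecast)

y <- AirPassengers  # built-in monthly series, illustration only

# Manual specification: ARIMA(0,1,1)(0,1,1)[12] on the log scale
fit_manual <- Arima(y, order = c(0, 1, 1), seasonal = c(0, 1, 1), lambda = 0)

# Automatic selection (stepwise search by default)
fit_auto <- auto.arima(y, lambda = 0)
# auto.arima(y, lambda = 0, stepwise = FALSE) searches more models but is slower

fc <- forecast(fit_auto, h = 12)  # forecasts are back-transformed automatically
```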

Dynamic Regression

Regression model with non-seasonal ARIMA errors, i.e. we allow e_t to be an ARIMA process rather than white noise.

Usage example:
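A sketch using simulated data, since no data set is given here. The regressor x, the coefficient 2 and the AR(1) error process are made up for illustration. Future values of the regressor must be supplied to forecast().

```r
library(forecast)

set.seed(1)
n <- 120
x <- rnorm(n)                                       # hypothetical external regressor
y <- ts(2 * x + arima.sim(list(ar = 0.6), n = n))  # response with AR(1) errors

# Regression on x with ARIMA errors chosen automatically
fit <- auto.arima(y, xreg = cbind(x = x))

# Forecast 8 periods ahead, assuming x stays at its historical mean
x_future <- cbind(x = rep(mean(x), 8))
fc <- forecast(fit, xreg = x_future)
```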

Dynamic Harmonic Regression

Dynamic regression with K Fourier terms to model seasonality. A higher K makes the model more flexible.

Pro: Allows for seasonality of any length. Con: Assumes the seasonal pattern is unchanging. Arima() and auto.arima() may run out of memory at large seasonal periods (e.g. >200).
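A sketch using fourier() to build the regressors. K = 2, the 24-period horizon, lambda = 0 and the data set are illustrative assumptions. seasonal = FALSE leaves the seasonality to the Fourier terms.

```r
library(forecast)

y <- AirPassengers  # built-in monthly series, illustration only
K <- 2              # number of sine/cosine pairs (at most half the seasonal period)

# Fourier terms as regressors, non-seasonal ARIMA errors
harmonics <- fourier(y, K = K)
fit <- auto.arima(y, xreg = harmonics, seasonal = FALSE, lambda = 0)

# Future Fourier terms for the forecast horizon
fc <- forecast(fit, xreg = fourier(y, K = K, h = 24))
```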

TBATS

Automated model that combines exponential smoothing, Box-Cox transformations, and Fourier terms. Pro: Automated, allows for complex seasonality that changes over time. Con: Slow.

  • T: Trigonometric terms for seasonality
  • B: Box-Cox transformations for heterogeneity
  • A: ARMA errors for short term dynamics
  • T: Trend (possibly damped)
  • S: Seasonal (including multiple and non-integer periods)
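A sketch of fitting and forecasting with tbats(). The data set and horizon are illustrative assumptions; fitting can take noticeably longer on long or multi-seasonal series.

```r
library(forecast)

y <- AirPassengers  # built-in monthly series, illustration only

fit <- tbats(y)                # components are selected automatically
fc  <- forecast(fit, h = 12)

fc$mean                        # point forecasts
```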