**ARIMA - Box-Jenkins Algorithm Example**

For details on the Box-Jenkins algorithm, please visit our ARIMA PROJECTIONS page.

In this example we show how the Box-Jenkins algorithm works by applying it to two empirical data sets. In the first part of this example we use the auto.arima() function available in R. In the second part we learn other R commands available for running an ARIMA analysis.

The auto.arima() function in R is used for preliminary identification of an ARIMA model. Essentially, auto.arima() uses a variation of the Hyndman-Khandakar algorithm, which combines unit-root tests, minimization of the AICc, and maximum likelihood estimation (MLE) to obtain an adequate ARIMA model. The auto.arima() function follows these steps.

**Step 1.** The number of differences *d* is determined using repeated KPSS tests.
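As a small illustration of what differencing accomplishes (a base-R sketch on a simulated series, not part of auto.arima()'s internals; with the fpp/forecast package loaded, `ndiffs(y, test = "kpss")` automates the repeated KPSS testing and returns *d* directly):

```r
# A toy series integrated of order 2 becomes stationary after two differences.
set.seed(1)
y  <- cumsum(cumsum(rnorm(200)))   # simulated series that needs d = 2
dy <- diff(y, differences = 2)     # double differencing removes the stochastic trend
var(dy) < var(y)                   # the differenced series is far less variable
```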

**Step 2.** The values of *p* and *q* are determined by minimizing the AICc after differencing the data *d* times.

(a) The best model (with the smallest AICc) is selected from the following four: ARIMA(2,d,2), ARIMA(0,d,0), ARIMA(1,d,0), ARIMA(0,d,1).

If *d = 0* then the constant *c* is included; if *d > 0* then the constant *c* is set to zero. This is called the "current model".

(b) Variations on the current model are considered:

- vary *p* and/or *q* from the current model by ±1;
- include/exclude *c* from the current model.

(c) Repeat Step 2(b) until no lower AICc can be found.
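The Step 2(a) comparison can be sketched with base R's stats::arima(): fit the four starting models and keep the one with the smallest AICc (auto.arima() automates this search; the names below are illustrative, not its internals):

```r
# Fit the four candidate models from Step 2(a) and compute AICc for each.
set.seed(1)
y <- arima.sim(list(ar = 0.6), n = 300)   # simulated AR(1) series
d <- 0                                    # d as chosen in Step 1
candidates <- list(c(2, d, 2), c(0, d, 0), c(1, d, 0), c(0, d, 1))
aicc <- sapply(candidates, function(ord) {
  fit <- arima(y, order = ord)            # d = 0, so a mean (constant) is included
  k <- length(fit$coef) + 1               # number of parameters, incl. sigma^2
  n <- length(y)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)  # AICc = AIC + small-sample correction
})
current <- candidates[[which.min(aicc)]]  # the "current model" for Step 2(b)
```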

**Step 3**. Fitting Procedure

When fitting an ARIMA model to a set of time series data, the following procedure provides a useful general approach.

- Plot the data. Identify any unusual observations.
- If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
- If the data are non-stationary: take first differences of the data until the data are stationary.
- Examine the ACF/PACF: Is an AR(*p*) or MA(*q*) model appropriate?
- Try your chosen model(s), and use the AICc to search for a better model.
- Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
- Once the residuals look like white noise, calculate forecasts. (This procedure is adapted from the book **"Forecasting: Principles and Practice"** by Rob J Hyndman and George Athanasopoulos.)
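As a small illustration of the "Examine the ACF/PACF" step above (a sketch on a simulated AR(1) series, not on the example data):

```r
# For an AR(1) process the ACF decays gradually while the PACF cuts off
# after lag 1 -- the classic signature suggesting an AR(1) model.
set.seed(3)
y <- arima.sim(list(ar = 0.7), n = 500)
a <- acf(y, plot = FALSE)    # base-R equivalents of the fpp Acf()/Pacf() plots
p <- pacf(y, plot = FALSE)
a$acf[2]   # lag-1 autocorrelation, close to 0.7
p$acf[1]   # lag-1 partial autocorrelation, also close to 0.7
p$acf[2]   # lag-2 partial autocorrelation, close to 0
```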

Now that we have some background on how the algorithm works, let's go through the procedure step by step.

**EXAMPLE 1 - AUTOMATED PROCEDURE**

Before we start the procedure, we need to download the data sets (**DS1.txt** and **DS2.txt**).

Next step is to plot the data.

Steps to create the time series plot (note that dark bold letters are the R commands used for generating plots and results, and blue letters are R outputs):

1. Download the DS1.txt file to a folder on your computer

2. Open your R program and load the fpp library (you need to install the fpp R package into your R library first)

**library(fpp)** # command that loads the fpp library

3. Import the DS1.txt file into R

**y = scan(file.choose())** # this command lets you browse your folders for the text file to import

4. Plot the data

**windows()** # keeps the figure window open

**plot(y, main="DS1")** ### plots the data set DS1

After visually inspecting the data, we are going to make auto-correlation plots to determine whether the data are stationary or not.

Plotting auto-correlation graphs:

1. Plot the auto-correlation and partial auto-correlation graphs in the same window

**windows()**

**par(mfrow=c(1,2))**## two plots in the same window

**Acf(y,main="autocorrelation DS1")**### auto-correlation plot of the data set

**Pacf(y,main=" partial autocorr. DS1")**### partial auto-correlation plot of the data set

2. The auto-correlation plots show that the data are non-stationary and complex in nature.

Now we are going to learn how to use auto.arima() function to analyze our data set.


**model <- auto.arima(y, max.P=0, max.Q=0, D=0)** ### Box-Jenkins algorithm: with max.P=0, max.Q=0 and D=0, determine the best fitting non-seasonal ARIMA model

**model** # display the model coefficients and AIC evaluations

Below is the R output of the auto.arima() function:

ARIMA(4,2,3) # this line tells you the best fitting model

Coefficients:

ar1 ar2 ar3 ar4 ma1 ma2 ma3

0.5903 0.0610 0.0411 -0.0118 -1.4669 0.5457 -0.0669

s.e. 0.8168 0.3566 0.1224 0.0581 0.8170 1.0644 0.2694

sigma^2 estimated as 7.364e-05: log likelihood=12380.63

AIC=-24745.26 AICc=-24745.22 BIC=-24695.52

As we can see from the output, the auto.arima() function picked ARIMA(4,2,3) as the best fitting model for our data set. The output also gives the autoregressive and moving-average coefficients, their standard errors, and the AIC/AICc/BIC scores. The model has *c* = 0 (because *d* > 0), indicating a non-deterministic process.

At this point we are ready to build the forecasting model. Let's forecast the next 10 points and plot them.

**windows()**

**forecast(model, h=10)** ##### the forecast() function predicts the next 10 time points of data set DS1 based on the ARIMA(4,2,3) coefficients

**plot(forecast(model, h=10), xlab="10 point DS1 forecast", include=100)** ### plots the last 100 data points and the 10 forecasted points

Remember that the auto.arima() function does not test the validity of the model. For the model to be valid, its residuals need to be tested (in other words, the residuals need to be random and not correlated).
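If the forecast package is not available, base R's predict() produces the same kind of point forecasts from a fitted ARIMA model. The sketch below uses simulated data, not the DS1 series:

```r
# stats::predict() on an arima() fit plays the role that forecast(model, h=10)
# plays for an auto.arima() fit: h steps of point forecasts plus standard errors.
set.seed(7)
y   <- arima.sim(list(ar = 0.6), n = 200)
fit <- arima(y, order = c(1, 0, 0))
fc  <- predict(fit, n.ahead = 10)
fc$pred   # the 10 forecasted points
fc$se     # their standard errors, which grow with the forecast horizon
```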

In this step we check the randomness of the model's residuals by plotting the auto-correlation and partial auto-correlation plots of the residuals.

**windows()**

**plot(residuals(model), main="residual plot ARIMA (4,2,3)")** ### plots the model residuals

**windows()**

**par(mfrow=c(1,2))**

**Acf(residuals(model))** #### auto-correlation plot of the model's residuals

**Pacf(residuals(model))** ### partial auto-correlation plot of the model's residuals

The ACF and PACF plots show no significant spikes in the first 20 lags, indicating that the residuals are mostly random (meaning they are not auto-correlated). However, we do have two significant lags (20 and 23); therefore a Ljung-Box test of the residuals is needed to further strengthen our model.

**Box.test(residuals(model), lag=24, fitdf=9, type="Ljung")** ### Ljung-Box test of the residuals; if p < 0.05, the residuals are not random and the model is not valid

**Box-Ljung test**

data: residuals(model)

X-squared = 23.4935, df = 15, p-value = 0.07421

**The results show that p = 0.07421, which is > 0.05; therefore the residuals are random and the forecasting model is valid.**
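To see what the Ljung-Box statistic looks like on genuinely random residuals, here is a small base-R sketch (simulated white noise, not the model's actual residuals):

```r
# Box.test() with type = "Ljung" is the Ljung-Box portmanteau test; fitdf
# subtracts the number of fitted ARMA parameters from the degrees of freedom.
set.seed(42)
e  <- rnorm(500)                                    # white-noise "residuals"
bt <- Box.test(e, lag = 24, fitdf = 0, type = "Ljung")
bt$parameter   # df = lag - fitdf = 24
# For white noise the p-value is usually well above 0.05, so we would
# (correctly) fail to reject randomness.
```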

The complete R code used in this example can be downloaded here.