Tutorial: Time Series Analysis


Federal Agency for Education

Volgograd State Technical University

CONTROL WORK

in the discipline: Models and Methods in Economics

on the topic "Time Series Analysis"

Completed by: student of group EZB 291c Selivanova O.V.

Volgograd 2010

Introduction

Time series classification

Time series analysis methods

Conclusion

Introduction

Studying the dynamics of socio-economic phenomena, and identifying and characterizing the main development trends and patterns of interrelation, provides the basis for forecasting, that is, for determining the future magnitude of an economic phenomenon.

Issues of forecasting are becoming especially relevant in the context of the transition to international systems and methods of accounting and analysis of socio-economic phenomena.

Statistical methods occupy an important place in the accounting system. The application and use of forecasting assumes that the pattern of development that operates in the past remains the same in the predicted future.

Thus, the study of methods for analyzing the quality of forecasts is very relevant today. It is this topic that is chosen as the object of research in this work.

A time series is a time-ordered sequence of values of some variable. Each individual value of this variable is called an observation (reading) of the time series. Thus, a time series differs significantly from a simple data sample.

Time series classification

Time series are classified according to the following criteria.

1. According to the form of presentation of levels:

• series of absolute indicators;

• series of relative indicators;

• series of averages.

2. By the nature of the time parameter:

• momentary series, in which levels characterize the values of an indicator as of particular points in time;

• interval series, in which levels characterize the value of an indicator over particular periods of time. An important feature of interval series of absolute values is that their levels may be summed.

3. By distance between dates and time intervals:

• complete (equally spaced), when the dates of registration or the ends of periods follow one another at equal intervals;

• incomplete (not equally spaced), when the principle of equal intervals is not observed.

4. Depending on the presence of the main trend:

• stationary series, in which the mean and the variance are constant;

• non-stationary series, which contain a main development trend.

Time series analysis methods

Time series are studied for various purposes. In some cases a description of the characteristic features of the series is sufficient; in others it is necessary not only to predict future values of the time series but also to control its behavior. The method of time series analysis is determined, on the one hand, by the goals of the analysis and, on the other, by the probabilistic nature of the formation of the series' values.

The main time series analysis methods are the following.

1. Spectral analysis. Allows you to find periodic components of a time series.

2. Correlation analysis. Allows you to find significant periodic dependencies and the corresponding delays (lags), both within a single series (autocorrelation) and between several series (cross-correlation).

3. Seasonal Box-Jenkins model. It is used when the time series contains a clearly expressed linear trend and seasonal components. Allows you to predict future values ​​of a series. The model was proposed in connection with the analysis of air transportation.

4. Forecast using an exponentially weighted moving average. The simplest time series forecasting model, applicable in many cases. It covers, as a special case, a pricing model based on a random walk.

The goal of spectral analysis is to decompose the series into sine and cosine functions of various frequencies and to determine those whose presence is especially significant. One possible way to do this is to solve a linear multiple regression problem in which the dependent variable is the observed time series and the independent variables (regressors) are sine and cosine functions of all possible (discrete) frequencies. Such a linear multiple regression model can be written as:

x_t = a_0 + Σ_(k=1..q) [a_k·cos(λ_k·t) + b_k·sin(λ_k·t)]

Here λ_k ("lambda") is the circular frequency expressed in radians per unit time, i.e. λ_k = 2π·ν_k, where π = 3.1416 and ν_k = k/q. What matters is that the computational problem of fitting sine and cosine functions of different lengths to the data reduces to multiple linear regression: the coefficients a_k for the cosines and the coefficients b_k for the sines are regression coefficients indicating the degree to which the corresponding functions are correlated with the data. There are q different sines and cosines; intuitively, the number of sine and cosine functions cannot exceed the number of data points in the series. Without going into details, if n is the number of data points, then there will be n/2 + 1 cosine functions and n/2 − 1 sine functions. In other words, there will be as many different sinusoids as there are data points, and you will be able to reproduce the series completely from the basis functions.

As a result, spectral analysis determines the correlation of sine and cosine functions of various frequencies with the observed data. If the correlation found (the coefficient at a certain sine or cosine) is large, then we can conclude that there is a strong periodicity at the corresponding frequency in the data.
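A minimal sketch in Python (an assumption: the text itself is tool-agnostic). For equally spaced data the FFT yields the same sine/cosine coefficients as the regression described above, only much faster; here it recovers the dominant period of a toy series.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 144
t = np.arange(n)
x = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, n)   # 12-step cycle + noise

coef = np.fft.rfft(x - x.mean())        # complex combination of the a_k and b_k
freqs = np.fft.rfftfreq(n, d=1.0)       # frequencies in cycles per time step
power = np.abs(coef) ** 2 / n           # periodogram ordinates

peak = freqs[np.argmax(power[1:]) + 1]  # skip the zero frequency
print(f"dominant frequency: {peak:.4f} cycles/step -> period {1 / peak:.1f}")
```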

Distributed lag analysis is a special method for estimating a lagged relationship between series. Suppose, for example, that you sell computer programs and want to establish the relationship between the number of inquiries received from customers and the number of actual orders. You could record these data monthly for a year and then examine the relationship between the two variables. Since inquiries precede orders, we can expect the number of orders to depend on the number of inquiries, but with some delay. In other words, there is a time shift (lag) in the relationship between the number of inquiries and the number of sales (see also autocorrelations and cross-correlations).

Lagged dependencies of this kind arise especially often in econometrics. For example, the income from investment in new equipment does not appear immediately, but only after a certain time. Higher income changes people's choice of housing; this dependence, too, manifests itself with a delay.

In all these cases, there is an independent or explanatory variable that affects the dependent variables with some delay (lag). The distributed lag method allows one to study this kind of dependence.

General model

Let y be the dependent variable and x the independent or explanatory variable. These variables are measured repeatedly over a period of time. In some econometrics textbooks the dependent variable is also called the endogenous variable, and the independent or explanatory variable the exogenous variable. The simplest way to describe the relationship between the two is the following linear equation:

y_t = β_0·x_t + β_1·x_(t-1) + β_2·x_(t-2) + … + ε_t

In this equation, the value of the dependent variable at time t is a linear function of the variable x measured at times t, t-1, t-2, and so on; that is, the dependent variable is a linear function of x shifted by 1, 2, etc. time periods. The beta coefficients β_i can be thought of as the slope parameters of this equation, which we treat as a special case of the linear regression equation. If the coefficient of the variable at a certain lag is significant, then we can conclude that the variable y is predicted (or explained) by x with that lag.
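As a hedged sketch (the simulated data and variable names are illustrative, not from the text), such a model can be estimated by ordinary least squares on a design matrix of lagged copies of x:

```python
import numpy as np

rng = np.random.default_rng(1)
n, max_lag = 200, 2
x = rng.normal(size=n)
# simulate y depending on x at lags 0..2 with true betas 0.5, 1.5, 0.8
y = 0.5 * x + 1.5 * np.roll(x, 1) + 0.8 * np.roll(x, 2) + rng.normal(0, 0.1, n)
y = y[max_lag:]   # drop the first rows, where np.roll wrapped around

# design matrix: columns are x_t, x_{t-1}, x_{t-2}, aligned with y_t
X = np.column_stack([x[max_lag - i : n - i] for i in range(max_lag + 1)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated lag coefficients:", beta.round(3))   # ~ [0.5, 1.5, 0.8]
```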

The parameter estimation and prediction procedures described in this section assume that the mathematical model of the process is known. In real data there are often no clearly defined regular components. Individual observations contain significant error, whereas you want to not only isolate the regular components, but also make a forecast. The ARIMA methodology developed by Box and Jenkins (1976) allows this to be done. This method is extremely popular in many applications, and practice has proven its power and flexibility (Hoff, 1983; Pankratz, 1983; Vandaele, 1983). However, due to its power and flexibility, ARIMA is a complex method. It is not easy to use and requires a lot of practice to master it. Although it often produces satisfactory results, they depend on the skill of the user (Bails and Peppers, 1982). The following sections will introduce you to its main ideas. For those interested in a concise, application-oriented (non-mathematical) introduction to ARIMA, we recommend McCleary, Meidinger, and Hay (1980).

ARIMA model

The general model proposed by Box and Jenkins (1976) includes both autoregressive and moving average parameters. Namely, there are three types of model parameters: the autoregression order (p), the order of differencing (d), and the moving average order (q). In Box-Jenkins notation the model is written as ARIMA(p, d, q). For example, the model (0, 1, 2) contains 0 (zero) autoregression parameters (p) and 2 moving average parameters (q), which are computed for the series after differencing with lag 1.

As noted earlier, the ARIMA model requires that the series be stationary, meaning that its mean is constant and the sample variance and autocorrelation do not change over time. Therefore, it is usually necessary to take the differences of the series until it becomes stationary (a logarithmic transformation is often also used to stabilize the variance). The number of differences that were taken to achieve stationarity is determined by the parameter d (see previous section). In order to determine the required order of the difference, you need to examine the series graph and autocorrelogram. Large changes in level (large jumps up or down) usually require taking a first-order non-seasonal difference (lag=1). Large changes in slope require taking a second order difference. The seasonal component requires taking the appropriate seasonal difference (see below). If there is a slow decrease in sample autocorrelation coefficients depending on the lag, the first order difference is usually taken. However, it should be remembered that for some time series it is necessary to take differences of a small order or not at all. Note that an excessive number of differences taken leads to less stable coefficient estimates.
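A minimal sketch of this transformation step on a short illustrative series: a log transform to stabilize the variance, then a first-order difference (d = 1).

```python
import numpy as np

series = np.array([112., 118., 132., 129., 121., 135., 148., 148., 136., 119.])
log_series = np.log(series)      # variance stabilisation
d1 = np.diff(log_series, n=1)    # first difference with lag 1
print("first differences of the log series:", d1.round(4))
# taking more differences than needed inflates the variance of the result,
# so stop as soon as the mean and autocorrelations look stable
```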

At this stage (usually called identifying the model order, see below) you must also decide how many autoregression (p) and moving average (q) parameters should be present in an effective and parsimonious model of the process. (Parsimony means the model has the fewest parameters and the most degrees of freedom among all models fitted to the data.) In practice it is very rare for the number of parameters p or q to exceed 2 (see below for a fuller discussion).

The next step after identification (Estimation) consists of estimating the model parameters, for which loss-function minimization procedures are used (see below; more detail on minimization procedures is given in the Nonlinear Estimation section). The obtained parameter estimates are used at the last stage (Forecast) to calculate new values of the series and to construct a confidence interval for the forecast. The estimation is carried out on the transformed data (after application of the difference operator). Before building a forecast you need to perform the reverse operation (integrate the data), so that the forecast is comparable with the input data. Data integration is indicated by the letter I in the general name of the model (ARIMA = AutoRegressive Integrated Moving Average).
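For readers who want to try the cycle end to end, here is a hedged sketch using the statsmodels library (an assumption: the text describes the methodology generically and does not prescribe a tool). It fits the (0, 1, 2) model mentioned above to a simulated series and produces integrated forecasts with confidence intervals.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(0.5, 1.0, 200))   # a drifting, non-stationary series

model = ARIMA(y, order=(0, 1, 2))          # p=0, d=1, q=2, as in the text's example
fitted = model.fit()
forecast = fitted.get_forecast(steps=12)
print(forecast.predicted_mean[:3])         # point forecasts (already integrated back)
print(forecast.conf_int()[:3])             # confidence intervals for the forecast
```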

Additionally, ARIMA models may contain a constant, whose interpretation depends on the model being fitted. Namely: (1) if there are no autoregression parameters in the model, the constant is the mean value of the series; (2) if there are autoregression parameters, the constant is a free term. If the series was differenced, the constant represents the mean or free term of the transformed series. For example, if the first difference was taken and there are no autoregression parameters in the model, the constant represents the mean value of the transformed series and, therefore, the slope coefficient of the linear trend of the original series.

Exponential smoothing is a very popular method for forecasting many time series. Historically, the method was independently discovered by Brown and Holt.

Simple exponential smoothing

A simple and pragmatically clear model of a time series looks like this:

X_t = b + ε_t

where b is a constant and ε (epsilon) is a random error. The constant b is relatively stable on each time interval but may also change slowly over time. One intuitive way to extract b is moving average smoothing, in which the most recent observations are given greater weights than the next-to-last ones, the next-to-last greater weights than those before them, and so on. Simple exponential smoothing works exactly this way: exponentially decreasing weights are assigned to older observations and, unlike the moving average, all previous observations of the series are taken into account, not only those that fall within a certain window. The exact formula of simple exponential smoothing is:

S_t = α·X_t + (1-α)·S_(t-1)

When this formula is applied recursively, each new smoothed value (which is also a forecast) is computed as a weighted average of the current observation and the smoothed series. Obviously, the result of smoothing depends on the parameter α (alpha). If α equals 1, previous observations are ignored entirely. If α equals 0, current observations are ignored. Values of α between 0 and 1 give intermediate results.
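The recursion is easy to implement directly. A minimal sketch (seeding S_0 with the first observation is one common convention; see the discussion of the first smoothed value S_0 below):

```python
def exponential_smoothing(x, alpha):
    """S_t = alpha*X_t + (1 - alpha)*S_{t-1}, seeded with S_0 = X_0."""
    s = [x[0]]                        # S_0: one common choice of starting value
    for value in x[1:]:
        s.append(alpha * value + (1 - alpha) * s[-1])
    return s

data = [3.0, 4.2, 4.0, 5.1, 5.8, 5.6, 6.3]
print(exponential_smoothing(data, alpha=0.3))
```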

Empirical studies by Makridakis et al. (1982; Makridakis, 1983) have shown that quite often simple exponential smoothing gives a fairly accurate forecast.

Selecting the best value of the parameter α (alpha)

Gardner (1985) discusses various theoretical and empirical arguments for choosing a particular smoothing parameter. Obviously, from the formula above it follows that α should fall between 0 (zero) and 1 (although Brenner et al., 1968, allow 0 < α < 2 for further application in ARIMA analysis). Gardner (1985) reports that in practice it is usually recommended to take α less than 0.30. However, in the study of Makridakis et al. (1982), values of α greater than 0.30 often gave a better forecast. After a review of the literature, Gardner (1985) concludes that it is better to estimate the optimal α from the data (see below) than simply to "guess" or rely on artificial recommendations.

Estimating the best value from the data. In practice, the smoothing parameter is often found by a grid search. The possible range of parameter values is divided into a grid with a certain step; for example, consider the grid of values from α = 0.1 to α = 0.9 with step 0.1. The value of α is then chosen for which the sum of squares (or the mean of squares) of the residuals (observed values minus one-step-ahead forecasts) is minimal.
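A sketch of exactly this grid search; the toy data are illustrative, and each candidate α is scored by the sum of squared one-step-ahead errors:

```python
def sse_one_step(x, alpha):
    """Sum of squared one-step-ahead errors for simple exponential smoothing."""
    s, sse = x[0], 0.0
    for value in x[1:]:
        sse += (value - s) ** 2       # s is the forecast made at the previous step
        s = alpha * value + (1 - alpha) * s
    return sse

data = [3.0, 4.2, 4.0, 5.1, 5.8, 5.6, 6.3]
grid = [round(0.1 * k, 1) for k in range(1, 10)]   # 0.1 .. 0.9, step 0.1
best = min(grid, key=lambda a: sse_one_step(data, a))
print("best alpha on the grid:", best)
```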

Goodness of fit indices

The most direct way to evaluate the forecast obtained with a particular value of α is to plot the observed values and the one-step-ahead forecasts. This plot can also include the residuals (plotted against the right Y-axis), so it clearly shows in which regions the forecast is better or worse.

This visual check of forecast accuracy often gives the best results. There are also other measures of error that can be used to determine the optimal parameter (see Makridakis, Wheelwright, and McGee, 1983):

Mean error. The mean error (ME) is calculated by simply averaging the errors at each step. Its obvious disadvantage is that positive and negative errors cancel each other out, so it is not a good indicator of forecast quality.

Mean absolute error. The mean absolute error (MAE) is calculated as the mean of the absolute errors. If it equals 0 (zero), we have a perfect fit (forecast). Compared with the mean squared error, this measure gives less weight to outliers.

Sum of squared errors (SSE) and mean squared error. These are calculated as the sum (or mean) of the squared errors and are the most commonly used goodness-of-fit indices.

Relative error (RE). All the previous measures use actual error values. It seems natural to express goodness-of-fit indices in terms of relative errors. For example, when forecasting monthly sales that fluctuate strongly (say, seasonally) from month to month, you may be quite satisfied with a forecast accurate to within ±10%. In other words, in forecasting, the absolute error may matter less than the relative one. Several indices based on relative error have been proposed (see Makridakis, Wheelwright, and McGee, 1983). In the first, the relative error is calculated as:

RE_t = 100·(X_t − F_t)/X_t

where X_t is the observed value at time t and F_t is the forecast (smoothed value).

Mean relative error (MRE). This value is calculated as the mean of the relative errors.

Mean absolute relative error (MARE). As with the ordinary mean error, negative and positive relative errors cancel each other out. Therefore, to assess the quality of the fit as a whole (for the entire series), it is better to use the mean absolute relative error. This measure is often more expressive than the mean squared error: knowing that the forecast accuracy is ±5% is useful in itself, whereas a mean squared error of 30.8 cannot be interpreted so easily.
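The measures above are short computations; a hedged sketch collecting them (the observation/forecast pairs are illustrative):

```python
import numpy as np

def fit_measures(x, f):
    """Goodness-of-fit measures for observations x and one-step forecasts f."""
    x, f = np.asarray(x, float), np.asarray(f, float)
    e = x - f
    return {
        "mean error":                  e.mean(),              # signs cancel out
        "mean absolute error":         np.abs(e).mean(),
        "mean squared error":          (e ** 2).mean(),
        "mean relative error, %":      (100 * e / x).mean(),
        "mean abs. relative error, %": np.abs(100 * e / x).mean(),
    }

obs = [100, 110, 105, 120]
fcst = [98, 112, 108, 115]
for name, value in fit_measures(obs, fcst).items():
    print(f"{name}: {value:.2f}")
```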

Automatic search for the best parameter. To minimize the mean squared error, the mean absolute error, or the mean absolute relative error, a quasi-Newton procedure is used (the same as in ARIMA). In most cases this procedure is more efficient than a regular grid search (especially when there are several smoothing parameters), and the optimal value of α can be found quickly.

The first smoothed value S_0. Looking again at the formula of simple exponential smoothing, you will see that a value of S_0 is needed to compute the first smoothed value (forecast). Depending on the choice of α (particularly if α is close to 0), the initial value of the smoothed process can have a significant effect on the forecast for many subsequent observations. As with the other recommendations on exponential smoothing, it is best to take the initial value that gives the best forecast. On the other hand, the influence of this choice diminishes with the length of the series and becomes uncritical with a large number of observations.


Conclusion

Time series analysis is a set of mathematical and statistical methods designed to identify the structure of a time series and to forecast it. It includes, in particular, regression analysis methods. Identifying the structure of a time series is necessary in order to build a mathematical model of the phenomenon that is the source of the analyzed series. Forecasting future values of a time series is used for effective decision making.

Time series are studied for various purposes. The method of time series analysis is determined, on the one hand, by the goals of the analysis, and on the other hand, by the probabilistic nature of the formation of its values.

The main methods for studying time series are:

• spectral analysis;

• correlation analysis;

• the seasonal Box-Jenkins model;

• forecasting with an exponentially weighted moving average.


The purpose of time series analysis is usually to construct a mathematical model of the series, with the help of which one can explain its behavior and make a forecast for a certain period of time. Time series analysis includes the following main steps.

Analysis of a time series usually begins with the construction and study of its graph.

If the non-stationary nature of the time series is obvious, the first step is to isolate and remove the non-stationary component. The process of removing the trend and other components that violate stationarity may take several stages. At each stage one examines the series of residuals obtained by subtracting the fitted trend model from the original series, or the result of differencing and other transformations of the series. Besides graphs, signs of non-stationarity can be indicated by an autocorrelation function that does not tend to zero (except at very large lags).

Selection of a model for a time series. After the process has been brought as close as possible to stationarity, one can proceed to fitting various models to the resulting process. The purpose of this stage is to describe, and take into account in further analysis, the correlation structure of the process under consideration. In practice, parametric autoregressive integrated moving average models (ARIMA models) are used most often.

A model can be considered fitted if the residual component of the series is a "white noise" process, i.e. the residuals are distributed according to the normal law with a sample mean equal to 0. After fitting a model, the following are usually performed (see the sketch after this list):

    estimation of the variance of the residuals, which can later be used to construct confidence intervals for the forecast;

    analysis of residuals to check the adequacy of the model.
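A minimal sketch of such residual checks (the Ljung-Box test is my choice here as one standard test for uncorrelated residuals; the text does not name a specific one):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
residuals = rng.normal(0, 1, 200)         # stand-in for model residuals

print("residual variance:", residuals.var(ddof=1).round(3))
lb = acorr_ljungbox(residuals, lags=[10])  # H0: no autocorrelation up to lag 10
print(lb)                                  # a large p-value -> residuals look white
```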

Forecasting and interpolation. The last stage of time series analysis can be forecasting future values (extrapolation) or restoring missing values (interpolation), with an indication of the accuracy of the forecast based on the selected model. It is not always possible to select a good mathematical model for a time series. Ambiguity in model selection can arise both at the stage of isolating the deterministic component of the series and when choosing the structure of the residual series. Therefore, researchers quite often resort to making several forecasts using different models.

Methods of analysis. The following methods are commonly used in time series analysis:

    graphical methods for presenting time series and their accompanying numerical characteristics;

    methods of reduction to stationary processes: detrending, moving average models and autoregression;

    methods for studying internal connections between elements of time series.

3.5. Graphical methods for time series analysis

Why are graphical methods needed? In sample studies, the simplest numerical characteristics of descriptive statistics (mean, median, variance, standard deviation) usually provide a fairly informative picture of the sample. Graphic methods for presenting and analyzing samples play only a supporting role, allowing a better understanding of the localization and concentration of data, their distribution law.

The role of graphical methods in time series analysis is completely different. The fact is that a tabular presentation of a time series and descriptive statistics most often do not allow one to understand the nature of the process, while quite a lot of conclusions can be drawn from a time series graph. In the future, they can be checked and refined using calculations.

When analyzing the graphs, you can fairly confidently determine:

    presence of a trend and its nature;

    the presence of seasonal and cyclical components;

    the degree of smoothness or discontinuity of changes in successive values ​​of a series after the trend has been eliminated. By this indicator one can judge the nature and magnitude of the correlation between neighboring elements of the series.

Construction and study of a graph. Drawing a time series graph is not at all as simple a task as it seems at first glance. The modern level of time series analysis involves the use of one or another computer program to construct their graphs and all subsequent analysis. Most statistical packages and spreadsheets are equipped with some method of setting up the optimal presentation of a time series, but even when using them, various problems can arise, for example:

    due to the limited resolution of computer screens, the size of the displayed graphs may also be limited;

    with large volumes of analyzed series, points on the screen representing observations of the time series may turn into a solid black stripe.

Various methods are used to deal with these difficulties. A "magnifying glass" or "zoom" mode in the graphical procedure lets you enlarge a selected part of the series, although it then becomes harder to judge the behavior of the series over the whole analyzed interval; you have to print out graphs of individual parts of the series and join them together to see the overall picture. To improve the reproduction of long series, thinning is sometimes used, that is, selecting and displaying every second, fifth, tenth, etc. point of the time series. This procedure preserves a holistic view of the series and is useful for detecting trends. In practice a combination of both procedures, breaking the series into parts and thinning, is useful, since together they reveal the characteristics of the series' behavior.

Another problem in plotting graphs is created by outliers, observations several times larger in magnitude than most other values of the series. Their presence also makes fluctuations of the series indistinguishable, because the program automatically selects the image scale so that all observations fit on the screen. Choosing a different scale on the y-axis removes this problem, but the sharply different observations then remain off-screen.

Auxiliary graphics. When analyzing time series, auxiliary graphs are often used for the numerical characteristics of the series:

    graph of a sample autocorrelation function (correlogram) with a confidence zone (tube) for a zero autocorrelation function;

    plot of the sample partial autocorrelation function with a confidence zone for the zero partial autocorrelation function;

    periodogram graph.

The first two of these graphs make it possible to judge the relationship (dependence) between neighboring values of the time series; they are used in selecting parametric autoregression and moving average models. The periodogram graph allows one to judge the presence of harmonic components in the time series.
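A sketch of these three auxiliary graphs with statsmodels and SciPy (an assumption: the text is tool-agnostic, and any package with ACF/PACF and periodogram plots will do). The shaded band on the ACF/PACF plots is the confidence "tube" for a zero (partial) autocorrelation function.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from scipy.signal import periodogram

rng = np.random.default_rng(4)
t = np.arange(240)
x = np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)   # toy seasonal series

fig, axes = plt.subplots(3, 1, figsize=(8, 9))
plot_acf(x, ax=axes[0], lags=40)    # sample autocorrelation function (correlogram)
plot_pacf(x, ax=axes[1], lags=40)   # sample partial autocorrelation function
freqs, power = periodogram(x)
axes[2].plot(freqs, power)          # periodogram
axes[2].set_xlabel("frequency, cycles/step")
plt.tight_layout()
plt.show()
```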

02/16/15 Viktor Gavrilov

A time series is a sequence of values ​​that change over time. I will try to talk about some simple but effective approaches to working with such sequences in this article. There are many examples of such data - currency quotes, sales volumes, customer requests, data in various applied sciences (sociology, meteorology, geology, observations in physics) and much more.

Series are a common and important form of describing data, as they allow us to observe the entire history of changes in the value of interest to us. This gives us the opportunity to judge the “typical” behavior of a quantity and deviations from such behavior.

I was faced with the task of choosing a data set on which it would be possible to clearly demonstrate the features of time series. I decided to use international airline passenger traffic statistics because this data set is very clear and has become somewhat of a standard (http://robjhyndman.com/tsdldata/data/airpass.dat, source Time Series Data Library, R. J. Hyndman). The series describes the number of international airline passengers per month (in thousands) for the period 1949 to 1960.
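For readers without the tool used below, here is a hedged sketch that loads the same series with pandas, assuming the file still resolves at the URL above and holds one value per line, and plots it:

```python
import pandas as pd
import matplotlib.pyplot as plt

url = "http://robjhyndman.com/tsdldata/data/airpass.dat"
values = pd.read_csv(url, header=None).iloc[:, 0]   # assumption: one value per line
index = pd.date_range("1949-01", periods=len(values), freq="MS")  # monthly stamps
passengers = pd.Series(values.values, index=index, name="AirPassengers")

passengers.plot(title="International airline passengers, thousands")
plt.show()
```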

Since I always have Prognoz Platform at hand, which has a handy tool for working with time series, I will use it. Before importing, you need to add to the data file a column with dates, so that the values are tied to time, and a column with the name of the series for each observation. Below you can see what my source file looks like; I imported it into Prognoz Platform with the Import Wizard, directly from the time series analysis tool.

The first thing we usually do with a time series is plot it on a graph. Prognoz Platform allows you to build a chart by simply dragging a series into the workbook.

Time series on a chart

The symbol ‘M’ at the end of the series name means that the series has monthly dynamics (the interval between observations is one month).

Already from the graph we see that the series demonstrates two features:

  • trend – on our chart this is a long-term increase in the observed values. It can be seen that the trend is almost linear.
  • seasonality – on the graph these are periodic fluctuations in value. In the next article on the topic of time series, we will learn how to calculate the period.

Our series is quite “neat”, however, there are often series that, in addition to the two characteristics described above, demonstrate another one - the presence of “noise”, i.e. random variations in one form or another. An example of such a series can be seen in the chart below. This is a sine wave mixed with a random variable.

When analyzing series, we are interested in identifying their structure and assessing all the main components - trend, seasonality, noise and other features, as well as the ability to make forecasts of changes in value in future periods.

When working with series, the presence of noise often makes it difficult to analyze the structure of the series. To eliminate its influence and better see the structure of the series, you can use series smoothing methods.

The simplest method of smoothing a series is the moving average. The idea is to replace each point of the series by the arithmetic mean of the points in a window of odd width 2k+1 centred on it:

s_i = (1/(2k+1)) · Σ_(j=-k..k) x_(i+j)

where x_i is the initial series and s_i is the smoothed series.

Below you can see the result of applying this algorithm to our two series. By default, Prognoz Platform suggests smoothing with a window of 5 points (k in our formula above equal to 2). Note that the smoothed signal is no longer so affected by noise, but along with the noise some useful information about the dynamics of the series naturally disappears as well. It is also clear that the smoothed series lacks the first (and last) k points. This is because smoothing is performed at the central point of the window (in our case the third point), after which the window is shifted by one point and the calculation is repeated. For the second, random series I used smoothing with a window of 30 to better reveal its structure, since that series is "high-frequency", with many points.
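A minimal sketch of the same centred moving average in Python; the window 2k+1 = 5 matches the default mentioned above, and the first and last k points are lost, as described:

```python
import numpy as np

def moving_average(x, k=2):
    """Centred moving average with window 2k+1; returns len(x) - 2k points."""
    window = 2 * k + 1
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")   # only full windows are kept

x = np.array([3.0, 4.0, 8.0, 5.0, 6.0, 9.0, 7.0, 8.0])
print(moving_average(x, k=2))                     # the smoothed series s_i
```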

The moving average method has certain disadvantages:

  • A moving average is inefficient to calculate. For each point, the average must be recalculated anew. We cannot reuse the result calculated for a previous point.
  • The moving average cannot be extended to the first and last points of the series. This can cause a problem if these are the points we are interested in.
  • The moving average is not defined outside the series, and as a result, cannot be used for forecasting.

Exponential smoothing

A more advanced smoothing method that can also be used for forecasting is exponential smoothing, also sometimes called the Holt-Winters method after its creators.

There are several variations of this method:

  • single smoothing for series that have no trend or seasonality;
  • double smoothing for series that have a trend, but no seasonality;
  • triple smoothing for series that have both a trend and seasonality.

The exponential smoothing method calculates the values ​​of a smoothed series by updating the values ​​calculated in the previous step using information from the current step. Information from the previous and current steps is taken with different weights that can be controlled.

In the simplest version, single smoothing, the relation is:

S_t = α·X_t + (1-α)·S_(t-1)

The parameter α defines the mix between the raw value at the current step and the smoothed value from the previous step. With α = 1 we take only the points of the original series, i.e. there is no smoothing. With α = 0 we take only smoothed values from previous steps, i.e. the series becomes a constant.

To understand why the smoothing is called exponential, let us expand the relation recursively:

S_t = α·X_t + (1-α)·S_(t-1) = α·X_t + α·(1-α)·X_(t-1) + α·(1-α)²·X_(t-2) + …

It is clear from this expansion that all previous values of the series contribute to the current smoothed value, but their contribution fades exponentially as the power of (1-α) grows.

However, if there is a trend in the data, simple smoothing will "lag" behind it (or you will have to take values of α close to 1, but then the smoothing will be insufficient). In that case you need to use double exponential smoothing.

Double smoothing already uses two equations - one equation evaluates the trend as the difference between the current and previous smoothed values, then smoothes the trend with simple smoothing. The second equation performs smoothing as in the simple case, but the second term uses the sum of the previous smoothed value and the trend.

Triple smoothing includes one more component, seasonality, and uses one more equation. There are two variants of the seasonal component: additive and multiplicative. In the first, the amplitude of the seasonal component is constant and does not depend on the base level of the series. In the second, the amplitude changes along with the base level of the series. The latter is exactly our case, as can be seen from the graph: as the series grows, the amplitude of the seasonal fluctuations increases.

Since our first series has both a trend and seasonality, I decided to select triple smoothing parameters for it. In Prognoz Platform this is quite easy to do, because when a parameter value is updated the platform immediately redraws the graph of the smoothed series, so you can see at once how well it describes the original series. I settled on the following values:

We will look at how I calculated the period in the next article on time series.

Typically, values between 0.2 and 0.4 can be taken as first approximations. Prognoz Platform also uses a model with an additional parameter ɸ, which damps the trend so that it approaches a constant in the future. For ɸ I took the value 1, which corresponds to the ordinary (undamped) model.
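As a cross-check outside the platform, here is a hedged sketch of triple smoothing with statsmodels (an assumption: its Holt-Winters parametrisation may differ from the platform's in details such as initialization). It uses an additive trend and multiplicative seasonality, as described above; fitting without damping mirrors the choice ɸ = 1.

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(5)
t = np.arange(144)
# toy "airline-like" series: linear trend, seasonal amplitude growing with level
y = (100 + t) * (1 + 0.3 * np.sin(2 * np.pi * t / 12)) + rng.normal(0, 5, t.size)

# additive trend + multiplicative seasonality; no damping corresponds to phi = 1
model = ExponentialSmoothing(y, trend="add", seasonal="mul", seasonal_periods=12)
fit = model.fit(optimized=True)        # alpha, beta, gamma chosen by the optimizer
print(fit.forecast(24)[:6].round(1))   # forecast two years ahead, first 6 shown
```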

I also made a forecast of the series values ​​using this method for the last 2 years. In the figure below, I marked the starting point of the forecast by drawing a line through it. As you can see, the original series and the smoothed one coincide quite well, including during the forecasting period - not bad for such a simple method!

Prognoz Platform also allows you to automatically select optimal parameter values ​​using a systematic search in the space of parameter values ​​and minimizing the sum of squared deviations of the smoothed series from the original one.

The methods described are very simple, easy to apply, and provide a good starting point for analyzing the structure and forecasting of time series.

Read more about time series in the next article.

Types and methods of time series analysis

A time series is a collection of sequential measurements of a variable taken at equal time intervals. Time series analysis allows you to solve the following problems:

  • explore the structure of a time series, which, as a rule, includes a trend - regular changes in the average level, as well as random periodic fluctuations;
  • explore cause-and-effect relationships between processes that determine changes in series, which manifest themselves in correlations between time series;
  • build a mathematical model of the process represented by a time series;
  • transform the time series using smoothing and filtering tools;
  • predict the future development of the process.

A significant portion of the known methods are intended for analyzing stationary processes, whose statistical properties, characterized for a normal distribution by the mean and the variance, are constant and do not change over time.

But the series often have a non-stationary character. Non-stationarity can be eliminated as follows:

  • subtract the trend, i.e. changes in the average value, represented by some deterministic function that can be selected by regression analysis;
  • perform filtering with a special non-stationary filter.

To standardize time series and make the analysis methods uniform, it is advisable to perform general or seasonal centering by subtracting the mean value, as well as normalization by dividing by the standard deviation.

Centering a series removes a non-zero mean that can make the results difficult to interpret, for example in spectral analysis. The purpose of normalization is to avoid operations with large numbers in calculations, which can lead to a decrease in the accuracy of calculations.

After these preliminary transformations of the time series, its mathematical model can be built, from which forecasting is carried out, i.e. some continuation of the time series is obtained.

In order for the forecast result to be compared with the original data, transformations that are inverse to those performed must be made on it.

In practice, modeling and forecasting methods are used most often, while correlation and spectral analysis are considered auxiliary. This is a misconception. Methods for forecasting average trends yield estimates with significant errors, which makes it very difficult to predict the future values of a variable represented by a time series.

Methods of correlation and spectral analysis make it possible to identify various, including inertial, properties of the system in which the processes under study are developing. The use of these methods makes it possible to determine with sufficient confidence from the current dynamics of processes how and with what delay the known dynamics will affect the future development of processes. For long-term forecasting, these types of analyzes provide valuable results.

Trend analysis and forecasting

Trend analysis is intended to study changes in the average value of a time series with the construction of a mathematical model of the trend and, on this basis, forecasting future values ​​of the series. Trend analysis is performed by constructing simple linear or nonlinear regression models.

The initial data used are two variables, one of which is the values ​​of the time parameter, and the other is the actual values ​​of the time series. During the analysis process you can:

  • test several mathematical trend models and choose the one that more accurately describes the dynamics of the series;
  • build a forecast of the future behavior of the time series based on the selected trend model with a certain confidence probability;
  • remove the trend from the time series in order to ensure its stationarity, necessary for correlation and spectral analysis; for this, after calculating the regression model, it is necessary to save the residuals to perform the analysis.

Various functions and combinations are used as trend models, as well as power series, sometimes called polynomial models. The greatest accuracy is provided by models in the form of Fourier series, but not many statistical packages allow the use of such models.

Let us illustrate the derivation of a trend model for a series. We use data on US gross national product for the period 1929-1978 at current prices and build a polynomial regression model. The accuracy of the model kept increasing until the degree of the polynomial reached five:

Y = 145.6 − 35.67x + 4.59x² − 0.189x³ + 0.00353x⁴ + 0.000024x⁵,

(14.9) (5.73) (0.68) (0.033) (0.00072) (0.0000056)

where Y is GNP, in billions of dollars;

x is the year counted from the first year, 1929.

Below the coefficients are their standard errors.

The standard errors of the model coefficients are small, not reaching values ​​equal to half the values ​​of the model coefficients. This indicates the good quality of the model.

The coefficient of determination of the model, equal to the square of the reduced multiple correlation coefficient, was 99%. This means that the model explains 99% of the data. The standard error of the model turned out to be 14.7 billion, and the significance level of the null hypothesis - the hypothesis of no connection - was less than 0.1%.
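A sketch of the same kind of polynomial trend fit with NumPy; the data here are simulated for illustration, not the GNP series from the text:

```python
import numpy as np

years = np.arange(50)   # x: years counted from the first year of the sample
gnp = 103.6 * np.exp(0.06 * years) + np.random.default_rng(6).normal(0, 5, 50)

coeffs = np.polyfit(years, gnp, deg=5)   # degree-5 polynomial, highest power first
trend = np.polyval(coeffs, years)

ss_res = np.sum((gnp - trend) ** 2)
ss_tot = np.sum((gnp - gnp.mean()) ** 2)
print("R^2 =", round(1 - ss_res / ss_tot, 4))     # coefficient of determination
print("forecast for year 51:", np.polyval(coeffs, 51).round(1))
```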

Using the resulting model it is possible to produce a forecast, which is compared with the actual data in Table PZ.1.

Table PZ.1. Forecast and actual US GNP, billions of dollars

The forecast obtained using the polynomial model is not very accurate, as evidenced by the data presented in the table.

Correlation analysis

Correlation analysis is needed to identify correlations and their lags, i.e. the delays in their periodicity. Correlation within a single process is called autocorrelation, and correlation between two processes characterized by series is called cross-correlation. A high level of correlation can serve as an indicator of cause-and-effect relationships and interactions within one process or between two processes, while the lag value indicates the time delay in the transmission of the interaction.

Typically, at the k-th step of calculating the correlation function, the correlation is computed between the segment i = 1, …, (n − k) of the first series X and the segment i = k, …, n of the second series Y; the length of the segments thus changes from step to step.

The result is a value that is difficult to interpret in practice, resembling the parametric correlation coefficient but not identical to it. Therefore, the possibilities of correlation analysis as implemented in many statistical packages are limited to a narrow range of classes of time series, which are not typical of most economic processes.

In correlation analysis, economists are interested in studying lags in the transfer of influence from one process to another, or the influence of an initial disturbance on the subsequent development of the same process. To solve such problems, a modification of the known method was proposed, called "interval correlation".

Kulaichev A.P. Methods and tools for data analysis in the Windows environment. - M.: Informatics and computers, 2003.

The interval correlation function is a sequence of correlation coefficients calculated between a fixed segment of the first series, of a given size and position, and equal-sized segments of the second series, selected with successive shifts from the beginning of that series.

Two new parameters are added to the definition, the length of the shifted fragment of the series and its initial position, and the definition of the Pearson correlation coefficient accepted in mathematical statistics is used. This makes the calculated values comparable and easy to interpret.
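A minimal sketch of the interval correlation function as just defined (the function and parameter names are mine, for illustration):

```python
import numpy as np

def interval_correlation(x, y, start, length):
    """Pearson r between a fixed fragment of x and shifted fragments of y."""
    fragment = x[start : start + length]
    shifts = len(y) - length + 1
    return np.array([
        np.corrcoef(fragment, y[s : s + length])[0, 1]   # one r per shift
        for s in range(shifts)
    ])

rng = np.random.default_rng(7)
base = rng.normal(size=120)
lagged = np.concatenate([rng.normal(size=10), base[:-10]])  # y repeats x after 10 steps
r = interval_correlation(base, lagged, start=0, length=25)
print("lag with maximum correlation:", int(np.argmax(r)))   # expect about 10
```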

Typically, to perform the analysis one selects one or two variables (for autocorrelation or cross-correlation analysis, respectively) and sets the following parameters:

  • the dimension of the time step of the analyzed series, to match the results with the real timeline;

  • the length of the shifted fragment of the first series, as the number of elements of the series included in it;

  • the shift of this fragment relative to the beginning of the series.

One must, of course, also choose between the interval correlation function and another correlation function.

If one variable is selected for analysis, then the values ​​of the autocorrelation function are calculated for successively increasing lags. The autocorrelation function allows us to determine to what extent the dynamics of changes in a given fragment are reproduced in its own segments shifted in time.

If two variables are selected for analysis, then the values ​​of the cross-correlation function are calculated for successively increasing lags - shifts of the second of the selected variables relative to the first. The cross-correlation function allows us to determine to what extent changes in the fragment of the first row are reproduced in fragments of the second row shifted in time.

The results of the analysis should include estimates of the critical value of the correlation coefficient r_0 for the hypothesis "r_0 = 0" at a certain significance level; this allows statistically insignificant correlation coefficients to be ignored. The values of the correlation function should be obtained together with their lags. Graphs of the auto- or cross-correlation functions are very useful and visual.

Let us illustrate the use of cross-correlation analysis with an example. Let us evaluate the relationship between the growth rates of the GNP of the USA and the USSR over the period from 1930 to 1979. To capture long-term trends, the shifted fragment of the series was chosen to be 25 years long. As a result, correlation coefficients were obtained for different lags.

The only lag at which the correlation turns out to be significant is 28 years. The correlation coefficient at this lag is 0.67, while the threshold (minimum significant) value is 0.36. It turns out that the cyclicality of the long-term development of the USSR economy, with a lag of 28 years, was closely related to the cyclicality of the long-term development of the US economy.

Spectral analysis

A common way to analyze the structure of stationary time series is to use the discrete Fourier transform to estimate the spectral density or spectrum of the series. This method can be used:

  • to obtain descriptive statistics of one time series or descriptive statistics of dependencies between two time series;
  • to identify periodic and quasiperiodic properties of series;
  • to check the adequacy of models built by other methods;
  • for compressed data presentation;
  • to interpolate the dynamics of time series.

The accuracy of spectral analysis estimates can be increased through the use of special methods - the use of smoothing windows and averaging methods.
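Welch's method is one standard combination of exactly these two devices, a smoothing window plus averaging over shifted segments; a minimal sketch with SciPy:

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(8)
t = np.arange(2048)
x = np.sin(2 * np.pi * 0.05 * t) + rng.normal(0, 1, t.size)   # cycle buried in noise

# Hann smoothing window, 256-point segments, 50% overlap, periodograms averaged
freqs, psd = welch(x, window="hann", nperseg=256, noverlap=128)
print("peak at frequency:", freqs[np.argmax(psd)])   # ~0.05 cycles/step
```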

For analysis, you must select one or two variables, and the following parameters must be specified:

  • the dimension of the time step of the analyzed series, necessary to coordinate the results with the real time and frequency scales;
  • the length of the analyzed segment of the time series, as the number of data points included in it;
  • the shift of the next segment of the series relative to the previous one;
  • type of smoothing time window to suppress the so-called power leakage effect;
  • a type of averaging of frequency characteristics calculated over successive segments of a time series.

The results of the analysis include spectrograms - values ​​of amplitude-frequency spectrum characteristics and values ​​of phase-frequency characteristics. In the case of cross-spectral analysis, the results are also the values ​​of the transfer function and the spectrum coherence function. The results of the analysis may also include periodogram data.

The amplitude-frequency characteristic of the cross-spectrum, also called cross-spectral density, represents the dependence of the amplitude of the mutual spectrum of two interconnected processes on frequency. This characteristic clearly shows at what frequencies synchronous and corresponding in magnitude changes in power are observed in the two analyzed time series or where the areas of their maximum coincidences and maximum discrepancies are located.

Let us illustrate the use of spectral analysis with an example. Let us analyze the waves of economic conditions in Europe at the beginning of industrial development. For the analysis we use the unsmoothed time series of wheat price indices averaged by Beveridge from the data of 40 European markets over the 370 years from 1500 to 1869. We obtain the spectra of the series and of its individual segments 100 years long, taken every 25 years.

Spectral analysis allows one to estimate the power of each harmonic in the spectrum. The most powerful are the waves with a 50-year period which, as is well known, were discovered by N. Kondratiev and bear his name. The analysis establishes that they were formed not at the end of the 17th to the beginning of the 19th century, as many economists believe, but between 1725 and 1775.

Autoregressive integrated moving average (ARIMA) models are considered useful for describing and forecasting stationary time series and non-stationary series that exhibit uniform fluctuations around a changing mean.

ARIMA models are combinations of two models: autoregression (AR) and moving average (MA).

Moving average (MA) models represent a stationary process as a linear combination of successive values of "white noise". Such models are useful both as independent descriptions of stationary processes and as a complement to autoregressive models for a more detailed description of the noise component.

Algorithms for estimating the parameters of an MA model are very sensitive to an incorrect choice of the number of parameters for the specific time series, especially when that number is too large, which may result in the calculations failing to converge. It is not recommended to select moving average models with a large number of parameters in the initial stages of analysis.

Preliminary estimation is the first stage of analysis with an ARIMA model. It terminates when the hypothesis that the model is adequate to the time series is accepted, or when the permissible number of parameters is exhausted. The output of the analysis includes:

  • values ​​of parameters of the autoregressive model and the moving average model;
  • for each forecast step, the average forecast value, the standard error of the forecast, the confidence interval of the forecast for a certain level of significance are indicated;
  • statistics for assessing the significance level of the hypothesis of uncorrelated residuals;
  • time series plots indicating the standard error of the forecast.
Notes:

A significant part of the material in the PZ section is based on: Basovsky L.E. Forecasting and Planning in Market Conditions. Moscow: INFRA-M, 2008; Gilmore R. Applied Catastrophe Theory: In 2 books. Book 1. Translated from English. Moscow: Mir, 1984.

Jean Baptiste Joseph Fourier (1768-1830) was a French mathematician and physicist.

Nikolai Dmitrievich Kondratiev (1892-1938) was a Russian and Soviet economist.

TIME SERIES ANALYSIS


INTRODUCTION

CHAPTER 1. TIME SERIES ANALYSIS

1.1 TIME SERIES AND ITS BASIC ELEMENTS

1.2 AUTOCORRELATION OF TIME SERIES LEVELS AND IDENTIFICATION OF ITS STRUCTURE

1.3 TIME SERIES TREND MODELING

1.4 LEAST SQUARES METHOD

1.5 REDUCING THE TREND EQUATION TO A LINEAR FORM

1.6 ESTIMATION OF REGRESSION EQUATION PARAMETERS

1.7 ADDITIVE AND MULTIPLICATIVE TIME SERIES MODELS

1.8 STATIONARY TIME SERIES

1.9 APPLYING THE FAST FOURIER TRANSFORM TO A STATIONARY TIME SERIES

1.10 AUTOCORRELATION OF RESIDUALS. DURBIN-WATSON CRITERION

Introduction

In almost every field there are phenomena that are interesting and important to study in their development and change over time. In everyday life, for example, meteorological conditions, prices for a particular product, or certain characteristics of an individual's health may be of interest; all of them change over time. Over time, business activity, the mode of a particular production process, the depth of a person's sleep, and the perception of a television program also change. The totality of measurements of any one characteristic of this kind over a certain period of time constitutes a time series.

The set of existing methods for analyzing such series of observations is called time series analysis.

The main feature that distinguishes time series analysis from other types of statistical analysis is the importance of the order in which observations are made. Whereas in many problems the observations are statistically independent, in time series they are, as a rule, dependent, and the nature of this dependence can be determined by the position of the observations in the sequence. The nature of the series and the structure of the process generating it can predetermine the order in which the sequence is formed.

The aim of this work is to obtain a model for a discrete time series in the time domain that has maximum simplicity and a minimum number of parameters while still adequately describing the observations.

Obtaining such a model is important for the following reasons:

1) it can help us understand the nature of the system generating the series;

2) it can be used to control the process that generates the series;

3) it can be used to forecast future values of the series optimally.

Time series are best described by non-stationary models, in which trends and other pseudo-stable characteristics, possibly changing over time, are treated as statistical rather than deterministic phenomena. In addition, time series associated with the economy often have noticeable seasonal, or periodic, components; these components may vary over time and must be described by cyclic statistical (possibly non-stationary) models.

Let the observed time series be y_1, y_2, ..., y_T. This notation is understood as follows: there are T numbers representing observations of some variable at T equidistant moments in time. For convenience, these moments are numbered by the integers 1, 2, ..., T. A fairly general mathematical (statistical, or probabilistic) model is a model of the form:

y_t = f(t) + u_t,   t = 1, 2, ..., T.

In this model the observed series is regarded as the sum of a completely deterministic sequence {f(t)}, which can be called the systematic component, and a random sequence {u_t}, which obeys some probabilistic law (the terms signal and noise are sometimes used for these two components, respectively). These components of the observed series are unobservable; they are theoretical quantities. The exact meaning of this decomposition depends not only on the data themselves but partly on what is meant by repeating the experiment of which these data are the result. The so-called "frequency" interpretation is used here: it is assumed that, at least in principle, the whole situation can be repeated, yielding new sets of observations. The random component may, among other things, include errors of observation.

This paper considers a time series model in which a random component is superimposed on the trend, forming a stationary random process. In such a model it is assumed that the passage of time does not affect the random component in any way. More precisely, it is assumed that the mathematical expectation (that is, the mean value) of the random component is identically equal to zero, that its variance is equal to some constant, and that the values of u_t at different moments of time are uncorrelated. Thus any dependence on time is included in the systematic component f(t). The sequence f(t) may depend on unknown coefficients and on known quantities that change over time; in that case it is called a "regression function". Methods of statistical inference for the coefficients of a regression function prove useful in many areas of statistics. What is unique to methods devoted specifically to time series is that they study models in which the above-mentioned quantities changing over time are known functions of t.
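As a simple illustration of this model, take a linear regression function f(t) = a + b·t with uncorrelated noise u_t and estimate the coefficients by least squares. This is a minimal sketch; the coefficients and the noise level are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 120
t = np.arange(1, T + 1)

# y_t = f(t) + u_t with f(t) = 2 + 0.5 t and uncorrelated noise u_t.
y = 2.0 + 0.5 * t + rng.standard_normal(T)

# Least squares estimates of the coefficients of the regression function.
b_hat, a_hat = np.polyfit(t, y, deg=1)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")        # close to 2 and 0.5

residuals = y - (a_hat + b_hat * t)
print(f"residual mean = {residuals.mean():.3f}")  # near zero by construction
```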


Chapter 1. Time series analysis

1.1 Time series and its main elements

A time series is a collection of values ​​of any indicator for several consecutive moments or periods of time. Each level of a time series is formed under the influence of a large number of factors, which can be divided into three groups:

· factors shaping the trend of the series;

· factors that form cyclical fluctuations in the series;

· random factors.

With different combinations of these factors in the process or phenomenon under study, the dependence of the levels of the series on time can take different forms. Firstly, most time series of economic indicators have a trend that characterizes the long-term cumulative impact of many factors on the dynamics of the indicator being studied. It is obvious that these factors, taken separately, can have a multidirectional impact on the indicator under study. However, together they form an increasing or decreasing trend.

Secondly, the indicator being studied may be subject to cyclical fluctuations. These fluctuations may be seasonal, since the activities of a number of economic and agricultural sectors depend on the time of year. If large amounts of data are available over long periods of time, it is possible to identify cyclical fluctuations associated with the overall dynamics of the time series.

Some time series do not contain a trend or a cyclical component, and each subsequent level is formed as the sum of the average level of the series and some (positive or negative) random component.

In most cases the actual level of a time series can be represented as the sum or the product of trend, cyclical and random components. A model in which the time series is presented as the sum of these components is called an additive time series model; a model in which it is presented as their product is called a multiplicative time series model. The main task of a statistical study of an individual time series is to identify and quantify each of these components in order to use the information obtained to predict future values of the series.
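Both decompositions are available, for example, in the statsmodels library; the monthly series below is an assumption constructed only for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical monthly series: trend + seasonal cycle + random component.
rng = np.random.default_rng(4)
months = np.arange(96)
y = pd.Series(
    50 + 0.3 * months                           # trend component
    + 5 * np.sin(2 * np.pi * months / 12)       # seasonal (cyclical) component
    + rng.standard_normal(96),                  # random component
    index=pd.date_range("2015-01", periods=96, freq="MS"),
)

additive = seasonal_decompose(y, model="additive")              # y = T + S + E
multiplicative = seasonal_decompose(y, model="multiplicative")  # y = T * S * E
print(additive.seasonal.head(12))   # the estimated seasonal component
```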

1.2 Autocorrelation of time series levels and identification of its structure

If there is a trend and cyclical fluctuations in a time series, the values ​​of each subsequent level of the series depend on the previous ones. The correlation dependence between successive levels of a time series is called autocorrelation of series levels.

It can be measured quantitatively using a linear correlation coefficient between the levels of the original time series and the levels of this series shifted by several steps in time.

One of the working formulas for calculating the autocorrelation coefficient is:

r_xy = Σ (x_t − x̄)(y_t − ȳ) / √[ Σ (x_t − x̄)² · Σ (y_t − ȳ)² ]   (1.2.1)

As the variable x we take the series y_2, y_3, ..., y_n; as the variable y, the series y_1, y_2, ..., y_{n−1}. Then the above formula takes the form:

r_1 = Σ_{t=2..n} (y_t − ȳ_1)(y_{t−1} − ȳ_2) / √[ Σ_{t=2..n} (y_t − ȳ_1)² · Σ_{t=2..n} (y_{t−1} − ȳ_2)² ],   (1.2.2)

where ȳ_1 = (1/(n−1)) Σ_{t=2..n} y_t and ȳ_2 = (1/(n−1)) Σ_{t=2..n} y_{t−1}.

Similarly, autocorrelation coefficients of the second and higher orders can be determined. Thus, the second-order autocorrelation coefficient characterizes the closeness of the connection between the levels y_t and y_{t−2} and is determined by the formula:

r_2 = Σ_{t=3..n} (y_t − ȳ_3)(y_{t−2} − ȳ_4) / √[ Σ_{t=3..n} (y_t − ȳ_3)² · Σ_{t=3..n} (y_{t−2} − ȳ_4)² ],   (1.2.3)

where ȳ_3 = (1/(n−2)) Σ_{t=3..n} y_t and ȳ_4 = (1/(n−2)) Σ_{t=3..n} y_{t−2}.

The number of periods over which the autocorrelation coefficient is calculated is called the lag. As the lag increases, the number of pairs of values used to calculate the autocorrelation coefficient decreases. To ensure the statistical reliability of the autocorrelation coefficients, some authors consider it advisable to follow the rule that the maximum lag should be no greater than n/4.
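The coefficients are straightforward to compute directly from this definition; a minimal sketch, with the series itself an assumption for illustration:

```python
import numpy as np

def autocorr(y, lag):
    """Lag autocorrelation: the linear correlation coefficient between
    the series and the same series shifted `lag` steps back."""
    return np.corrcoef(y[lag:], y[:-lag])[0, 1]

rng = np.random.default_rng(5)
t = np.arange(100)
y = 0.2 * t + np.sin(2 * np.pi * t / 12) + rng.standard_normal(100)

max_lag = len(y) // 4                # the n/4 rule: here 25
for k in (1, 2, 12):                 # a few lags within that limit
    print(f"r_{k} = {autocorr(y, k):.3f}")
```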
