How to do approximation in excel? Approximation of experimental data in Microsoft Excel.

Subscribe
Join the “koon.ru” community!
In contact with:

Decide approximation problem experimental data means constructing a regression equation. The approximation problem arises when it is necessary to describe analytically, that is, in the form of a mathematical relationship, real phenomena, observations of which are given in the form of a table containing the values ​​of the indicator at different points in time or at different meanings independent argument. For example,

Profit indicators are known (they can be denoted Y) depending on the size of capital investments ( X);

The company's sales volumes are known ( Y) for six weeks of her work. In this case, X is a sequence of weeks.

Sometimes they say that you need to build empirical model . Empirical called a model built on the basis of real observations. If a model can be found, a forecast can be made about the behavior of the phenomenon and process under study in the future and, possibly, the optimal direction of its development can be chosen.

In general approximation problem experimental data has the following staging :

Let the data obtained by practical means be known (duringn experiments or observations) that can be represented by pairs of numbers (X i; y i) . The relationship between them is reflected in the table:

X x 1 x 2 x 3 x n
Y y 1 y 2 y 3 y n

There is a class of various functionsF . It is required to find an analytical (i.e. mathematical) expression of the relationship between these indicators, that is, it is necessary to select the function F from the set of functionsf , such that . which the best way would smooth out the experimental dependence between variables and, as accurately as possible, reflect the general trend of the dependence betweenX And Y, excluding measurement errors and random deviations.

You can find out the type of function either from theoretical considerations or by analyzing the location of points (x i; y i) on the coordinate plane.

Solve graphically approximation problem means to draw a curve whose points (x i; ŷ i) would be as close as possible to the starting points (x i; y i) , displaying experimental data.

For solutions approximation problems use least square method .

In this case, the function is considered the best approximation to , if for it the sum of the squared deviations of the “theoretical” values ​​, found using the empirical formula, from the corresponding experimental values ​​, has smallest value in comparison with other functions from which the desired approximation is selected.

Mathematical notation The least squares method has the form:

Where n - number of indicator observations.

Thus, the approximation problem is divided into two parts.

First, the type of dependence and, accordingly, the type of empirical formula are established, that is, they decide whether it is linear, quadratic, logarithmic or some other. If there are no theoretical considerations for selecting the type of formula, a functional dependence is usually selected from among the simplest ones, comparing their graphs with the graph of a given function.

After this, the numerical values ​​of the unknown parameters of the selected empirical formula are determined, for which the approximation to the given function turns out to be the best.

The simplest type of empirical model with two parameters used to approximate experimental results is linear regression, described by linear function:

Where a, b - required parameters.

For a linear regression model, the least squares method (1) will be written:

To solve (2) with respect to a and b, the partial derivatives are set to zero:

As a result, to find a and b, it is necessary to solve a system of linear algebraic equations of the form:

(3)

There are various ways to implement the least squares method in the case of linear regression in Excel.

1 way. Construct a system of linear algebraic equations, substituting all known values ​​into (3), and solve it, for example, using the matrix method (see task 4).

In formula form, an element of the calculation table is shown in Fig. 26.

Method 2. Solve the optimization problem (2) in Excel using Finding a solution(see task 5).

Note 1. It should be noted that for the objective function S it is convenient to use the built-in mathematical function SUMQVAR(array1,array2), as a result of which the sum of the squares of the differences between the two arrays is calculated. In our case, we should indicate the range of initial values ​​as array1, and “theoretical” values, calculated by the formula, as array2, where a And b – these are the addresses of cells with the desired values.

Note 2. In the dialog box of the Search for a solution command, you should set the target cell, the direction of the goal - to the minimum and the cells to be changed (Fig. 28). This problem contains no restrictions.

Note3. Nonlinear models of the form can also be used as empirical models with two parameters:


The described method for solving the least squares method is also applicable for nonlinear dependencies.

3 way. To find parameter values a And b In the case of linear regression, you can use the following statistical functions built into Excel:

SLOPE(known_Values; known_Values_X)

CUT(known_values_Y, known_values_X)

LINEST (known_values_Y; known_values_X)

Moreover, the SLOPE() function returns the parameter value A , the function CUT() returns the parameter value b. The LINEST() function returns both parameters of a linear relationship at the same time, since it is an array function. Therefore, to enter the LINEST() function into a table, you must follow following rules:

Select two adjacent cells

· enter the formula

· When finished, press the key combination Ctrl+Shift+Enter at the same time.

As a result, the left cell will contain the parameter value A , and on the right - the value of the parameter b.

To solve the approximation problem graphically in Excel you need to build a graph based on the source data, for example, scatter plot with values ​​connected by smoothing lines (see task 1). Excel can plot this chart Trend line. A trend line can be added to any data series that uses following types charts: area charts, line charts, histograms, bar charts, or scatter charts.

When creating a trend line in Excel based on chart data, one or another approximation is used. Excel allows you to select one of five fitting lines or calculate a line showing a moving average.

In addition, Excel provides the ability to select the intersection values ​​of the trend line with the Y axis, as well as add an approximation equation and an approximation confidence value (R 2) to the chart. You can also determine future and past data values ​​based on the trend line and the associated approximation equation.

Polynomial approximation of a function continuous on a segment.

Approximation (from the Latin “approximate” - “to come closer”) is an approximate expression of any mathematical objects (for example, numbers or functions) through others that are simpler, more convenient to use, or simply better known. In scientific research, approximation is used to describe, analyze, generalize and further use empirical results.

As is known, there can be an exact (functional) connection between quantities, when one specific value corresponds to one value of the argument, and a less precise (correlation) connection, when one specific value of the argument corresponds to an approximate value or a certain set of function values, to one degree or another close to each other. When conducting scientific research When processing the results of an observation or experiment, one usually has to deal with the second option. When studying the quantitative dependencies of various indicators, the values ​​of which are determined empirically, as a rule, there is some variability. It is partly determined by the heterogeneity of the studied objects of inanimate and, especially, living nature, and is partly determined by the error of observation and quantitative processing of materials. The last component cannot always be completely eliminated; it can only be minimized by careful selection of an adequate research method and careful work. Therefore, when performing any research work, the problem arises of identifying the true nature of the dependence of the indicators being studied, this or that degree masked by the failure to take into account the variability of values. For this purpose, approximation is used - an approximate description of the correlation dependence of variables by a suitable equation functional dependence, conveying the main tendency of the dependence (or its “trend”).

When choosing an approximation, one should proceed from the specific research problem. Typically, the simpler the equation used for approximation, the more approximate the resulting description of the relationship.

Therefore, it is important to read how significant and what causes the deviations of specific values ​​from the resulting trend. When describing the dependence empirically certain values Much greater accuracy can be achieved by using some more complex, multi-parametric equation. However, there is no point in striving to convey random deviations of values ​​in specific series of empirical data with maximum accuracy. It is much more important to grasp the general pattern that in this case is most logically and with acceptable accuracy expressed precisely by the two-parameter equation power function. Thus, when choosing an approximation method, the researcher always makes a compromise: he decides to what extent in this case it is advisable and appropriate to “sacrifice” details and, accordingly, how generally the dependence of the compared variables should be expressed. Along with identifying patterns masked by random deviations of empirical data from general pattern, approximation also allows you to solve many other important problems: formalize the found dependence; find unknown values ​​of the dependent variable by interpolation or, if appropriate, extrapolation.

Here we will consider polynomial approximation. This means that our task is that, based on the initial data (function and segment), it is necessary to find a polynomial whose line deviation from the graph of the initial function will be minimal.

The most popular method of polynomial approximation is the least squares method. In Excel, this is implemented using a chart and a trend line.

Let's analyze this method in Excel.

Initial data:

First, we need to split this segment using the “Chebyshev” partition, because this type splitting always gives a more accurate result.

In column I (Fig. 1) we write numbers from 0 to 8, because We divide the segment into 8 parts.

In the z column, cells are calculated using the formula: COS(3.141593*I/8). To calculate each cell, we use the corresponding I.

We find the value of each x using the formula: 2*z + 1.

In the column F(x) we calculate the value of this function for each x.


Picture 1
Next, in cells H2, I2, J2, we set the initial values ​​of the coefficients a, b and c in the desired polynomial (Fig. 2).


Figure 2
In column F from cells 2 to 10 we calculate the deviation values, i.e. the modulus of the difference between the value of the initial function and the found polynomial.

Formula: ABS((1+x^2)^0.5+2^(-x)-($H$2*x^2+$I$2*x+$J$2)).

In cell B11 the sum of deviations is calculated, and in cell B12 the average deviation (Fig. 3).


Figure 3
Using the "Chart Wizard" we build a scatter plot based on the data in the x and F(x) columns. Now in the “Diagram” tab, select “Add trend line” and check the required box to show the equation on the chart (Fig. 4).


Figure 4
Now we substitute the coefficients from the resulting equation into cells H2, I2 and J2 (Fig. 5).


Figure 5
As you can see, the average deviation is 0.117006252.

Found polynomial: 0.363*x² - 0.6901*x + 2.2203.

Let us propose another method of polynomial approximation.

Open the "Service" tab and select "Search for solutions". In the window that appears, specify F11 as the target cell, and equal to the minimum value. In the “changing cells” field we indicate H2, I2 and J2.

Click the "Run" button. After performing the procedure, we see that the results have changed (Fig. 6).


Figure 6
This time the average deviation is 0.106084329.

Found polynomial: 0.35724*x² - 0.702*x + 2.259158.

This result is significantly more accurate than the previous one, which confirms the advantage of using minimization of the sum of deviations compared to the least squares method.

DEPENDENCIES

Excel has tools that allow you to predict processes. The approximation problem arises when it is necessary to analytically describe phenomena that take place in life and are given in the form of tables containing the values ​​of the argument (arguments) and functions. If the dependence can be found, it is possible to make a prediction about the behavior of the system under study in the future and, possibly, choose the optimal direction for its development. Such an analytical function (also called a trend) can have different type and varying levels of complexity depending on the complexity of the system and the desired accuracy of representation.

10.1. Linear regression

The simplest and most popular is straight line approximation - linear regression.

Let us have actual information about profit levels Y depending on the size of X investment - Y(X). In Fig. Figure 10.1-1 shows four such points M(Y,X). Let us also have reason to assume that this dependence is linear, i.e. looks like Y=A+BX. If we were able to find the coefficients A and B and use them to construct a straight line (for example, like the one in the figure), in the future we could make informed assumptions about the dynamics of the business and the possible commercial state of the enterprise in the future. Obviously, we would be satisfied with a straight line located as close as possible to the known points M(Y,X), i.e. having a minimum sum of deviations or sum of errors (in the figure, deviations are shown by dotted lines). It is known that there is only one such line.

To solve this problem, the least squares error method is used. The difference (error) between known value Y1 of the point M1(Y1,X1) and the value of Y(X1), calculated from the equation of the straight line for the same value of X1, will be

D1 = Y1 – A – B X1.

Same difference

for X=X2 will be D2 = Y2 – A – B X2;

for X=X3 D3 = Y3 – A – B X3;

and for X=X4 D4 = Y4 – A – B X4.

Let us write an expression for the sum of squares of these errors

Ф(A,В)=(Y1–A–B X1) 2 +(Y2–A–B X2) 2 +(Y3–A–B X3) 2 +(Y4–A–B X4) 2

or abbreviated Ф(B,A) = å(Yi – A – BXi) 2.

Here we know all X and Y and unknown coefficients A and B. Let us draw the desired straight line in such a way (i.e., choose A and B such) that this sum of squared errors Ф(A,B) is minimal. The conditions for minimality are the known relations

¶Ф(A,B)/¶A=0 and ¶Ф(A,B)/¶B=0.

Let us derive these expressions (we omit the subscripts at the sum sign):

¶[å(Yi–A–B Xi) 2 ]/¶A = å(Yi–A–B Xi)(–1)

¶[å(Yi–A–B Xi) 2 ]/¶B = å(Yi–A–B Xi)(–Xi).

Let us transform the resulting formulas and equate them to zero

Solving approximation problems using Excel

Doctor of Physics – Math. Sciences, Professor Gavrilenko V.V. assistant Parokhnenko L.M.

(National Transport University)

Theoretical information. In practice, when modeling various pro-

processes, in particular economic, physical, technical, social,

One or another method of calculating approximate values ​​is widely used.

of functions based on their known values ​​at some fixed points.

This kind of function approximation problem often arises:

when constructing approximate formulas for calculating the values ​​of the characteristics

values ​​of the process under study according to the table data obtained in real-time

the result of a physical or computational experiment;

with numerical integration, numerical differentiation, numerical

new decision differential equations etc.;

if it is necessary to calculate function values ​​at intermediate points,

kah of the interval under consideration;

when determining the values ​​of characteristic quantities of a process outside the limits of

the interval being viewed, in particular, if necessary, look into

“past”), that is, when determining the values ​​of process indicators to the present day

started observing;

in forecasting, that is, when receiving preliminary estimates will-

general values ​​of the process indicators of interest (opportunity to look

to the future").

If, to simulate a certain process specified by a table,

build a function that approximately describes a given process based on me-

least squares method, then it is called the approximating function

(regression), and the task itself of constructing approximating functions is called

This is an approximation problem.

This article discusses the capabilities of the Excel package when solving

research on approximation problems, namely, methods and techniques for constructing

(creating) regressions for tabulated functions, which is the basis of regression analysis.

Excel has the following features for building regressions:

1) adding selected regressions (trend lines) to a diagram built on the basis of a data table for the studied characteristic of the process (this tool can only be used if there is a constructed

2) Using Worksheet's Built-in Statistical Functions Excel,

allowing to obtain regressions (trend lines) based on the source table

data (the use of this tool is not pre-

provided with a corresponding diagram).

Adding trend lines to a chart

For a table of data describing a certain process and represented by a diagram, Excel has effective tool regression analysis

for, allowing:

∙ build on the basis of the least squares method and add five types of regressions (trend lines) to the diagram, which, with varying degrees of accuracy,

they model the process under study;

add the constructed regression equation to the diagram;

determine the degree of correspondence of the selected regression to the data displayed on the chart.

The constructed process models—trendlines—show

trend of data changes, make it possible to determine the values ​​of the study

given characteristics at intermediate points, predict the behavior of a given process in the future (extrapolation problem), and also look into its past.

Based on chart data, Excel allows you to obtain these types of regressions:

sy or trend lines, such as linear, polynomial, logarithmic, steri-

foam, exponential, which are given by the equation y = y(x) , where x is independent

a dependent variable, which often takes the values ​​of a sequence of natural numbers (1; 2; 3; ...) and produces, for example, a countdown of the time of the process under study.

1. Linear regression is good for modeling characteristics whose values ​​increase or decrease at a constant rate. This is the simplest to construct, but the least accurate model of the process under study.

y = m x + b,

where m is the angle of inclination of linear regression to the abscissa axis; b – coordinate of the point of intersection of linear regression with the ordinate.

2. A polynomial trend line is useful for describing characteristics

having several pronounced extremes (maxima and minimums)

mov). The choice of the degree of the polynomial trend line (polynomial) is determined by the number of extrema of the characteristic under study. Thus, a polynomial of the second degree can well describe a characteristic that has only one maximum

mum or minimum; polynomial of the third degree – no more than two extrema; By-

fourth degree linear – no more than three extrema, etc.

Constructed according to the equation

y = c0 + c1 x + c2 x2 + c3 x3 + c4 x4 + c5 x5 + c6 x6,

where the coefficients c 0 , c 1 , c 2 ,...c 6 are constants.

3. Logarithmic the trend line is successfully used in modeling

characteristics, the values ​​of which initially quickly increase or decrease in value, and then gradually stabilize.

Constructed according to the equation

y = c× ln(x)+ b,

4. A power trend line gives good results if the values ​​studied

The given dependence is characterized by a constant change in the growth rate.

An example of such a dependence is the graph of uniformly accelerated motion.

tion of the car. If there are zero or negative values ​​in the data, you cannot use a power trend line.

Constructed according to the equation

y = c×xb,

where coefficients b, c are constants.

5. Exponential A trend line should be used if the rate of change in the data is continuously increasing. For data containing zero or negative values, this type of approximation is not applicable.

Constructed according to the equation

y = c×eb×x,

where coefficients b, c are constants.

When selecting a trend line, Excel automatically calculates the value of R 2 , which characterizes the reliability of the approximation: the closer the value of R 2 is to unity, the more reliably the trend line approximates the study.

my process. If necessary, the R 2 value can always be displayed on

diagram.

Determined by the formula

R 2 = 1-

Σ1 = ∑(yj − Yj )2

S2 =∑Y j 2 -

× (∑Yj )2

To add a trend line to a data series:

1. Activate a diagram constructed based on a series of data, i.e. click-

within the diagram area. The Dia- item will appear in the main menu.

2. After clicking on this item, a menu will appear on the screen in which you should select the Add trend line command.

Mouse over to a graph built on a series of data and right-click

Hover your mouse and select Add from the context menu that appears.

trend line. The Trend Line dialog box will appear on the screen with

that Type tab (Fig. 1).

Fig.1. Type tab of the Trendline Format dialog box

3. Select the required trend line type on the Type tab (the Linear type is selected by default). For the Polynomial type, in the Degree of tracking field

You can specify the degree of the selected polynomial.

4. In the Built on series field, all data series of the chart in question are listed. To add a trend line to a specific data series, select its name in the Built on series field.

5. If necessary, by going to the Parameters tab (Fig. 2), you can

set a trend following parameters:

∙ Change the name of the trend line in the Approximating name field

(smoothed) curve;

∙ Set the number of periods (forward or backward) for the forecast in the Pro-

∙ Display the equation of the trend line in the diagram area, for which you should

Check the box for the “show equation on diagram” option.

∙ Display the value of the approximation reliability R 2 in the diagram area,

for which you should check the box for the option “place on the weight diagram”

the reliability of the approximation (R^2).”

∙ Set the point of intersection of the trend line with the Y axis, for which you should set

Select the checkbox for the option “intersection of the curve with the Y axis at the point: “. 6. Press the OK key.

Fig.2. Options tab of the Trendline dialog box

To edit a trend line that has already been drawn:

1. Left-click on the trend line you want

change.

2. Press the Format key in the main menu, and the context menu that appears

Nude select the command Selected trend line.

Points 1–2 can also be easily implemented next step: send an indication-

Mouse over the trend line graph, right-click, and in the context menu that appears, select the Format trend line command.

Steps 1–2 are even easier to implement: double-click the left mouse button on the trend line graph.

3. The Trend Line Format dialog box will appear on the screen (Fig. 3), containing

There are three tabs: View, Type, Parameters, and the contents of the tabs Type,

Parameters completely coincides with similar tabs of the dialog box -

on the Trend line (Fig. 1–2).

4. If necessary, by going to the View tab (Fig. 3), you can set the line type, color and thickness for the trend line.

5. Press the OK key.

To delete a trend line that has already been drawn, select the trend line to be deleted and press the Delete key.

The advantages of this regression analysis tool are:

∙ relative ease of constructing trend lines on charts without creating

creating a data table for it;

a fairly wide list of types of proposed trend lines, and this list includes the most commonly used regressions;

the ability to predict the behavior of the process under study in production

a free (within the limits of common sense) number of steps forward and also backward;

the ability to obtain the trend line equation in analytical form;

the possibility, if necessary, of obtaining an assessment of the reliability of the test

day approximation.

The disadvantages include the following:

the construction of a trend line is carried out only if there is a diagram constructed on a series of data;

the process of generating data series for research is somewhat cluttered

my characteristics based on the equations of the training lines obtained for it

yes, since the coefficients of these equations with each change in the values ​​of the

Yes, the data is recalculated, but only within the chart area;

∙ in PivotChart reports when you change the view of the chart or associated report pivot table existing trend lines are not saved,

that is, before drawing trend lines or otherwise formatting the summary report

charts, you should ensure that the report layout satisfies the required

our requirements.

Fig.3. View tab of the Format Trendline dialog box

Trend lines can be used to supplement the data series presented in the graph.

charts, histograms, flat non-standardized diagrams with areas, lines

line, scatter, bubble and stock charts.

It is impossible to supplement trend lines with volumetric, normalized data series.

bath, radar, pie and donut charts. When replacing the type of dia-

grams to one of the above, or when you change the view of a PivotChart report or associated PivotTable report accordingly.

The trend lines associated with the data will be lost.

Using Excel's built-in functions

Excel also has a regression analysis tool for plotting trend lines outside the chart area. There are a number of statistical worksheet functions you can use for this purpose, but all of them only allow you to build linear or exponential regressions.

Excel has several options for constructing linear regression (li-

linear trend), in particular:

using the TREND function;

using the LINEST function;

using the TILT and CUT functions.

Excel also has several options for constructing an exponential trend line, in particular:

using the GROWTH function;

using the LGRFPRIBL function.

It should be noted that techniques for constructing regressions using functions

TREND and GROWTH are almost the same. The same can be said about the pair of functions LINEST and LGRFPRIBL. For all these four functions, when creating a table of values, Excel features such as array formulas are used, which somewhat clutters the process of building a regression.

this. Note also that the construction (creation) of linear regression, in our opinion, is most easily accomplished using the SLOPE and INTERCEPT functions,

where the first one defines slope linear regression, and automatically

paradise – a segment cut off by regression on the ordinate axis.

The advantages of this regression analysis tool are:

∙ a fairly simple, uniform process for generating data series using

the following characteristics for all built-in statistical functions,

giving trend lines;

∙ standard methodology for constructing trend lines based on generated data series;

∙ the ability to predict the behavior of the process under study according to the required

the required number of steps forward or backward.

The disadvantages of this tool include the fact that Excel does not have built-in functions for creating others (except linear and exponential).

different) types of trend lines. This circumstance often does not allow choosing a sufficiently accurate model using the above built-in functions.

del of the process under study, and also obtain forecasts close to reality.

In addition, when using the TREND and GROWTH functions, the equations of the trend lines are not known.

t'i - on specific examples show the capabilities of the Excel package when solving

Research of approximation problems; demonstrate how effective tools

Excel has the tools for building regressions and forecasting; proil-

illustrate how relatively easily such problems can be solved even by a user who does not have deep knowledge of regression analysis.

The methodology proposed in the article for mastering the skills of solving a remedy is

we use Excel for this kind of problems (see also, where methods for solving systems of linear algebraic equations, nonlinear equations,

optimization problems, transport problems) may be useful and interesting to users. This is due to the fact that Excel is installed on almost every modern computer, while such well-known specialists

based mathematical packages such as Mathematica, Maple, Matlab, Mathcad,

having more powerful capabilities for building regressions and pro-

forecasting are used by significantly less user audit

Below are solutions to specific problems using the listed tools in the Excel package.

Task 1. For the table of data on the profit of a motor transport enterprise for 1995–2002. you need to do the following:

(Please note the additional section dated 06/04/2017 at the end of the article.)

Accounting and control! Those over 40 should remember well this slogan from the era of building socialism and communism in our country.

But without well-established accounting, the effective functioning of neither a country, nor a region, nor an enterprise, nor household for any socio-economic formation of society! To draw up forecasts and plans for activity and development, initial data is required. Where can I get them? Only one reliable source is yours statistical records of previous periods of time.

Take into account the results of your activities, collect and record information, process and analyze data, apply the results of analysis to make decisions right decisions in the future, in my understanding, every sane person should. This is nothing more than the accumulation and rational use of one’s life experience. If you do not keep records of important data, then after a certain period of time you will forget them and, when you start dealing with these issues again, you will again make the same mistakes that you made when you first did this.

“I remember that 5 years ago we produced up to 1000 pieces of such products per month, and now we can barely assemble 700!” We open the statistics and see that 5 years ago they didn’t even produce 500 pieces...

“How much does a kilometer of your car cost, taking into account everyone costs? Let’s open the statistics – 6 rubles/km. Trip to work – 107 rubles. Cheaper than taking a taxi (180 rubles) by more than one and a half times. And there were times when it was cheaper to take a taxi...

“How long does it take to fabricate the steel structures of a 50m high corner communications tower?” We open the statistics - and in 5 minutes the answer is ready...

“How much will it cost to renovate a room in an apartment?” We pull up old records, adjust for inflation over the past years, take into account that last time we bought materials 10% cheaper market price and – we already know the estimated cost...

Keeping records of your professional activity, you will always be ready to answer the boss’s question: “When!!!???”. By keeping household records, it is easier to plan expenses for large purchases, vacations and other expenses in the future, taking appropriate measures to earn additional income or to reduce unnecessary expenses today.

In this article I simple example I will show how you can process the collected statistical data in Excel for further use in forecasting future periods.

Approximation of statistical data in Excel with an analytical function.

The production site produces building metal structures from sheet and profile metal products. The site operates stably, orders are of the same type, the number of workers fluctuates slightly. There is data on product output for the previous 12 months and on the amount of rolled metal processed during these periods of time by group: sheets, I-beams, channels, angles, round pipes, profiles rectangular section, round hire. After a preliminary analysis of the initial data, an assumption arose that the total monthly production of metal structures significantly depends on the number of angles in orders. Let's check this assumption.

First of all, a few words about approximation. We will look for a law - an analytical function, that is, a function given by the equation, which better than others describes the dependence of the total output of metal structures on the amount of angle steel in completed orders. This is an approximation, and the found equation is called the approximating function for the original function, given in the form of a table.

1. Turn on Excel and place a table with statistics data on a sheet.

2. Next, we build and format a scatter plot, in which along the X axis we set the values ​​of the argument - the number of processed corners in tons. Along the Y axis we plot the values ​​of the original function - the total production of metal structures per month, specified in the table.

3. We “point” the mouse at any of the points on the chart and right-click to bring up the context menu (as one of my good friends says - when working in an unfamiliar program, when you don’t know what to do, click the right mouse button more often...). In the drop-down menu, select “Add trend line...”.

4. In the “Trend Line” window that appears, on the “Type” tab, select “Linear”.

6. A straight line appeared on the graph, approximating our table dependence.

In addition to the line itself, we see the equation of this line and, most importantly, we see the value of the parameter R 2 - the value of the reliability of the approximation! The closer its value is to 1, the more accurately the selected function approximates the tabular data!

7. We build trend lines using power, logarithmic, exponential and polynomial approximations in the same way as we built a linear trend line.

Of all the selected functions, a polynomial of the second degree approximates our data best; it has the maximum reliability coefficient R 2 .

However, I want to warn you! If you take polynomials of higher degrees, you might get even more top scores, but the curves will have a complicated appearance…. It is important to understand here that we are looking for a function that has physical meaning. What does this mean? This means that we need an approximating function that will produce adequate results not only within the considered range of X values, but also beyond it, that is, it will answer the question: “What will be the output of metal structures if the number of angles processed per month is less than 45 and more than 168 tons! Therefore, I do not recommend getting carried away with polynomials of high degrees, and choosing a parabola (polynomial of the second degree) carefully!

So, we need to choose a function that not only interpolates well the tabular data within the range of values ​​X = 45...168, but also allows adequate extrapolation outside this range. In this case, I choose a logarithmic function, although you can also choose a linear one, as it is the simplest. In the example under consideration, when choosing a linear approximation in Excel, the errors will be greater than when choosing a logarithmic one, but not by much.

8. We remove all trend lines from the chart field, except for the logarithmic function. To do this, right-click on unnecessary lines and select “Clear” from the context menu that appears.

9. Finally, we will add error bars to the tabular data points. To do this, right-click on any of the points on the graph and select “Format data series…” in the context menu and configure the data on the “Y-errors” tab as in the figure below.

10. Then right-click on any of the error range lines, select “Format error bars…” in the context menu and in the “Format error bars” window on the “View” tab, adjust the color and thickness of the lines.

Any other diagram objects are formatted in the same way.Excel!

The final result of the chart is shown in the following screenshot.

Results.

The result of all previous actions was the resulting formula for the approximating function y=-172.01*ln (x)+1188.2. Knowing it, and the number of corners in the monthly set of works, it is possible with a high degree of probability (±4% - see error bars) to predict the total production of metal structures for the month! For example, if the plan for the month is 140 tons of angles, then the total output, all other things being equal, will most likely be 338 ± 14 tons.

To increase the reliability of the approximation, there should be a lot of statistical data. Twelve pairs of values ​​is not enough.

From practice, I will say that finding an approximating function with a reliability coefficient R 2 >0.87 should be considered a good result. Excellent result– with R 2 >0.94.

In practice, it can be difficult to identify one most important determining factor (in our example, the mass of corners processed in a month), but if you try, you can always find it in each specific task! Of course, the total output for a month really depends on hundreds of factors, taking into account which requires significant labor costs from standard-setters and other specialists. But the result will still be approximate! So is it worth incurring the costs when there is much cheaper mathematical modeling!

In this article I have only touched the tip of the iceberg called collection, processing and practical use statistical data. I hope to find out whether I succeeded or not in stirring up your interest in this topic from the comments and ratings of the article in search engines.

The raised issue of approximation of a function of one variable has a wide range of practical use V different areas life. But the solution to the function approximation problem has much greater application several independent variables... Read about this and more in the following blog articles.

Subscribe to announcements of articles in the window located at the end of each article or in the window at the top of the page.

Do not forget confirm subscribe by clicking on the link in a letter that will come to you at the specified mail (may arrive in the folder « Spam » )!!!

I will read your comments with interest, dear readers! Write!

P.S. (06/04/2017)

Highly accurate, beautiful replacement of tabular data with a simple equation.

You are not satisfied with the obtained approximation accuracy (R 2<0,95) или вид и набор функций, предлагаемые MS Excel?

Are the dimensions of the expression and the line shape of the high degree approximating polynomial not pleasing to the eye?

Please refer to the "" page to obtain a more accurate and compact result of approximation of your tabular data and to learn a simple technique for solving problems of high-precision approximation by a function of one variable.

When using the proposed algorithm of actions, a very compact function was found that provides the highest accuracy of approximation: R 2 =0.9963!!!

Return

×
Join the “koon.ru” community!
In contact with:
I am already subscribed to the community “koon.ru”