What methods are used in econometrics to measure the influencing factors of house prices?
I. Design of the theoretical model
The first step is an in-depth analysis of the economic phenomenon to be studied. According to the purpose of the study, the factors to be included in the model are selected, and appropriate variables are chosen to represent these factors according to the availability of data. Based on economic behavior theory and the relationships shown in the sample data, a mathematical expression describing the relationship between these variables is then specified; this is the theoretical model. For example, the production function in the previous section is a theoretical model. The design of a theoretical model has three main parts: selecting the variables, determining the mathematical relationship between them, and drawing up the numerical ranges of the parameters to be estimated.
1. Determine the variables contained in the model. In a single-equation model, variables fall into two categories. The object of study, the "effect" in the causal relationship, such as output in the production function, is the explained variable. The variables acting as "causes", such as capital, labor and technology in the production function, are the explanatory variables. Determining the variables contained in the model mainly means determining the explanatory variables. Several kinds of variables can serve as explanatory variables: exogenous economic variables, exogenous condition variables, exogenous policy variables and lagged explained variables. Some of these, such as policy variables and condition variables, often appear in the form of dummy variables. Strictly speaking, output, capital, labor and technology in the production function above can only be called "factors", and there is a causal relationship between these factors. To establish an econometric model, we must choose appropriate variables to represent these factors, and data must be available for these variables.
Therefore, we can use total output value to represent output, the original value of fixed assets to represent capital, the number of employees to represent labor, and a time variable to represent technology. The final model is then a mathematical expression relating total output value, the original value of fixed assets, the number of employees and the time variable. For convenience, we will set aside the distinction between "factor" and "variable" for now and use "variable" for both.
The key is how to choose the explanatory variables correctly once the explained variable has been determined. First, we must correctly understand the economic theory and the laws of economic behavior underlying the phenomenon studied. This is the basis for selecting explanatory variables correctly. For example, the production problem above explicitly assumes that supply is insufficient, so the factors affecting output are to be found among the input factors, which are generally taken to be technology, capital and labor. If instead demand is insufficient, the factors affecting output lie on the demand side, not among the input factors. In that case, if the research object is the production of consumer goods, variables such as residents' income should be chosen as explanatory variables; if it is the production of means of production, variables such as total fixed-asset investment should be chosen. Thus, even for the same kind of production model, the choice of variables differs across economic environments and industries.
Second, the availability of data must be considered when selecting variables. This requires a thorough understanding of economic statistics.
An econometric model is built on sample data, that is, on the sample observations of the variables, and its parameters are estimated by mathematical methods so as to reveal the quantitative relationship between variables. The selected variables must therefore exist in the statistical indicator system and have reliable data sources. If policy variables or condition variables that have an important influence on the explained variable need to be introduced, their sample observations are constructed as dummy variables.
Third, when selecting variables we should consider the relationships among all selected variables, so that each explanatory variable is independent of the others. This is required by the techniques of econometric modeling. It is of course difficult to achieve this at the outset; if correlated variables appear among those selected, they can be detected by testing and eliminated during modeling.
The first step of building a model thus already embodies the idea that econometrics combines economic theory, economic statistics and mathematics.
Errors easily occur when selecting variables. The following examples are all taken from published applied econometric research and represent several common errors. The first is: export value of agricultural and sideline products = -107.66 + 0.13 × total retail sales of social goods + 0.22 × purchases of agricultural and sideline products. Here an irrelevant variable has been selected: total retail sales of social goods is not directly related to exports of agricultural and sideline products and is not a cause of changes in them. Another example is: imports of means of production = 0.73 × light industry investment + 0.21 × export value + 0.18 × production consumption + 67.60 × import and export policy.
Here an unimportant variable has been selected: light industry investment does affect imports of means of production, but its influence is neither major nor complete. What matters is fixed-asset investment of the whole society, and that variable should be selected instead. A further example is: total agricultural output value = 0.78 + 0.24 × grain output + 0.05 × agricultural machinery power - 0.21 × disaster-affected area. Here the variables selected are not independent: grain output is itself affected by agricultural machinery power and by the disaster-affected area, so the explanatory variables are correlated. It is worth noting that all of the models above fit the sample data well, so goodness of fit must never be used as the main criterion for judging whether the variables were selected correctly. The selection of variables is not completed in one pass and often has to be repeated many times.
2. Determine the mathematical form of the model. Once appropriate variables have been chosen, an appropriate mathematical form must be chosen to describe the relationship between them, that is, to establish the theoretical model. The main basis for choosing the mathematical form is economic behavior theory. In mathematical economics, the mathematical forms of common models such as production, demand, consumption and investment functions have been studied extensively, and these results can be drawn on. It should be pointed out that modern economics places particular emphasis on empirical research: a theoretical model derived from given theoretical assumptions will not be accepted if it cannot explain the past well, in particular the historical statistical data.
The theoretical model therefore has to be revised repeatedly throughout parameter estimation and model testing, so as to arrive at a mathematical model that both has a sound economic interpretation and reflects well the relationships between variables that have occurred in the past. Neglecting either aspect is a mistake. One can also draw scatter plots of each explanatory variable against the explained variable from the sample data, and take the functional relationship shown in the scatter plot as the mathematical form of the theoretical model. This is also a method often used in modeling. In some cases, when the mathematical form cannot be determined in advance, several possible forms are tried and the one that fits best is chosen.
3. Draw up the theoretical expected values of the parameters to be estimated. Generally speaking, the parameters to be estimated in the theoretical model have specific economic meanings. Their values can only be determined after the model has been estimated and tested, that is, after the econometric model is complete, but their numerical ranges, that is, their theoretical expected values, can be drawn up at the outset from their economic meaning. These theoretical expected values are then used to check the estimation results. The key to drawing them up is to understand the economic meaning of each parameter. For example, the production function model above has four parameters to be estimated: α, β, γ and A. Here α is the output elasticity of capital, β is the output elasticity of labor, γ approximates the rate of technological progress, and A is the efficiency coefficient.
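For reference, one common way to write a production function with these four parameters is sketched below. The text does not reproduce the formula at this point, so the Cobb-Douglas form with exponential technical progress, the time subscript t, and the symbols Y (output), K (capital) and L (labor) are assumptions chosen to match the parameter descriptions.

```latex
Y_t = A\, e^{\gamma t} K_t^{\alpha} L_t^{\beta}
\quad\Longrightarrow\quad
\ln Y_t = \ln A + \gamma t + \alpha \ln K_t + \beta \ln L_t
```

In the log-linear form, α and β are the elasticities of output with respect to capital and labor, and γ is the growth rate of output with inputs held fixed. Under this assumed form, conventional prior expectations would be A > 0, 0 < α < 1, 0 < β < 1 and a small positive γ; these are common textbook conventions, not values taken from the text.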
Their numerical ranges should be drawn up according to these economic meanings.
Changes in economic variables over time are often slow; for example, residents' income changes by only about 5% a year. If, in a consumption function model, residents' consumption is the explained variable, residents' income is the explanatory variable, and its time series data are used as the sample data, the sample is too concentrated for the model to reflect the long-run relationship between the two variables. This is one of the main reasons why time series data are not well suited to estimating structural parameters that reflect long-run relationships. The fourth problem is serial correlation of the random error term. Using time series data as the sample easily induces serial correlation in the model's random error term. This problem will be discussed later.
Cross-sectional data are a batch of survey data observed at the same point in time. Examples are industrial census data, population census data and household survey data, which are mainly provided by statistical departments. When cross-sectional data are used as the sample for an econometric model, the following problems deserve attention. The first is consistency between the sample and the population. Mathematically, parameter estimation estimates the parameters of the population from individual samples drawn at random from it, so the population and the individuals must be consistent. For example, to estimate a production function for coal enterprises, only data on coal enterprises can be used as the sample, not data on the coal industry as a whole. It follows that some models are difficult to estimate with cross-sectional data. For example, if a production function for the coal industry is to be established, suitable cross-sectional data cannot be obtained.
The second is heteroscedasticity of the random error term. Using cross-sectional data as the sample easily induces heteroscedasticity in the model's random error term. This problem will be discussed later.
Dummy variable data, also known as binary data, generally take the value 0 or 1. Dummy variables are often used in econometric models to represent factors such as policies and conditions. For example, suppose an econometric model of grain production in China is established with grain output as the explained variable. Besides variables such as sown area, fertilizer use, total agricultural machinery power and disaster-affected area, policy factors obviously cannot be ignored. Policies differed before and after about 1980, so grain output would change greatly even if the variables above did not. A policy variable must therefore be introduced among the explanatory variables and represented by a dummy variable: its sample observation is 1 for years after 1980 and 0 for years before 1980. Values other than 0 and 1 can also be used to indicate the degree of change in the factor. For example, in an industrial production model where a dummy variable represents the influence of climate on industrial production, the degree of influence in different years can be expressed by 0, 1, -1, or even 0.5, -0.5 and so on. This practice should be used with caution, however, so as not to compromise objectivity.
2. The quality of sample data. The quality of sample data can be summarized under four headings: completeness, accuracy, comparability and consistency.
Completeness means that all variables in the model must have the same number of sample observations. This is required not only for parameter estimation but also by the nature of the economic phenomenon itself.
In practice, however, "missing data" often occur, especially in China, where the economic system and the accounting system are in transition. When data are missing, if the sample is large enough and the sample points are not closely linked, the sample points with missing data can simply be dropped. If the sample is small or the sample points are closely linked, dropping a sample point will affect the quality of estimation, and special techniques should be used to fill in the missing data.
Accuracy has two meanings. First, the data must accurately reflect the state of the economic factor they describe, that is, the statistical or survey data themselves must be accurate. Second, the data must be exactly what the model requires, that is, they must match the definition and coverage of the variable in the model. The former is obvious, while the latter is easily overlooked. For example, capital and labor as explanatory variables in a production function must be the part of each factor that is actually put into the production process and contributes to output. Labor, for instance, should be the workers who take part in production and contribute to output. So when collecting sample data we should collect the number of production employees rather than the number of all employees. Although the number of all employees may be statistically accurate, a considerable share of them have nothing to do with the production process and are not what the model requires.
Comparability, that is, the problem of statistical coverage and definition, arises almost everywhere in econometric modeling. The economic statistics that are easy to obtain are generally not comparable, because statistical scope and prices change over time, and they must be processed before they can be used to estimate model parameters.
Econometric methods seek the objective regularities of economic activity in sample data. If the data are not comparable, the regularities obtained can hardly reflect reality. Different researchers studying the same economic phenomenon, using the same variables, the same mathematical form and the same sample points, may still obtain very different parameter estimates. Why? The reason lies in the comparability of the sample data. For example, when time series data are used as the sample for a production function model, total output value calculated at constant prices is comparable across years, whereas the original value of fixed assets, which represents capital and is calculated at current prices, is not. Statistical sources report the original value of fixed assets at current prices; some researchers use it directly for estimation while others process it first, and the results will certainly differ.
Consistency means consistency between the population and the sample. It was introduced above in the discussion of cross-sectional data as sample data. Violations of consistency are common, such as using enterprise data as the sample for an industry production function, using per capita income and consumption data as the sample for an aggregate consumption function, or using data for 31 provinces as the sample for a national aggregate model.
III. Estimation of model parameters
Methods for estimating model parameters are the core content of econometrics. After the theoretical model has been established and sample data meeting its requirements have been collected, an appropriate method can be chosen to estimate the model and obtain estimators of its parameters.
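As a rough illustration of what "estimating the model" can look like for the question in the title, the sketch below fits a linear house-price equation by ordinary least squares with NumPy. Everything in it is an assumption made for illustration: the data are synthetic, and the variable names (`income`, `policy`, `price`) and the "true" coefficients are hypothetical, not taken from the text. The 0/1 `policy` column shows how a dummy variable enters the regression.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 120

# Hypothetical explanatory variables (synthetic, for illustration only).
income = rng.normal(50.0, 10.0, n)            # e.g. residents' income
policy = (np.arange(n) >= 60).astype(float)   # dummy: 0 before a policy change, 1 after

# Synthetic explained variable generated from assumed coefficients plus noise.
price = 20.0 + 1.5 * income - 8.0 * policy + rng.normal(0.0, 5.0, n)

# Design matrix: a column of ones for the intercept, then the regressors.
X = np.column_stack([np.ones(n), income, policy])

# Ordinary least squares: minimise the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, price, rcond=None)

# Conventional OLS standard errors from the residual variance.
resid = price - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

for name, b, se in zip(["intercept", "income", "policy"], beta_hat, std_err):
    print(f"{name:>9}: estimate = {b:8.3f}, std. error = {se:6.3f}")
```

With real data the same steps apply, but the estimates are only as good as the variable selection and data quality discussed earlier; ordinary least squares is used here simply because it is the most familiar estimator.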
The estimation of model parameters is a purely technical process that includes identification of the model (for simultaneous-equation models), choice of estimation method, use of software and so on. Later chapters discuss estimation at length, so it is not covered in detail here.
IV. Testing the model
Once the parameter estimates have been obtained, an econometric model can be said to have been initially established. Whether it objectively reveals the relationships among the factors in the economic phenomenon studied, and whether it can be put to use, depends on testing. Generally speaking, an econometric model must pass four kinds of test: the economic significance test, statistical tests, econometric tests and the prediction test.
1. Economic significance test. This test examines whether the parameter estimates are reasonable in economic terms. The main method is to compare the estimates with the theoretical expected values drawn up in advance, covering the sign and size of the estimates and the relationships among them. First, check the signs of the parameter estimates. For example, consider the following production model for the coal industry: coal output = -108.5427 + 0.00067 × original value of fixed assets + 0.01527 × number of employees - 0.00681 × power consumption + 0.00256 × wood consumption. In this model the estimated parameter on power consumption is negative, which cannot be explained in economic terms, so the model does not pass the test. Whenever a model fails the economic significance test, the cause must be found and the model rebuilt; however good it is in other respects, such a model has no practical value.
2. Statistical tests. Statistical tests are prescribed by statistical theory, and their purpose is to test the statistical properties of the model. The most widely used are the goodness-of-fit test and the significance tests of individual variables and of the equation as a whole.
3. Econometric tests. Econometric tests are prescribed by econometric theory, and their purpose is to test the econometric properties of the model. The most important are tests for serial correlation and heteroscedasticity of the random error term and for multicollinearity among the explanatory variables.
4. Prediction test. The prediction test mainly examines the stability of the parameter estimates and their sensitivity to changes in sample size, and determines whether the model can be used outside the range of the sample observations, that is, the model's out-of-sample properties. The specific methods are: (1) re-estimate the parameters with an enlarged sample and test whether the new estimates differ significantly from the original ones; (2) apply the model to an actual forecast for a period outside the sample and test whether the predicted values differ significantly from the observed values.
Once the model has gone through and passed the tests above, the required econometric model can be said to have been established and can be applied to its intended purpose.
V. Three elements of a successful econometric model
The steps for building an econometric model described above show that any econometric study and any econometric model rest on three elements: theory, method and data. Theory, that is, economic theory, the behavioral theory of the economic phenomenon studied, is the foundation of econometric research. Methods, mainly modeling methods and computational methods, are the tools of econometric research and the main feature that distinguishes econometrics from other branches of economics.
Data, or more broadly information, reflecting the activity level of the research object, its interrelationships and its external environment, are the raw material of econometric research. All three elements are indispensable. In econometric research, methods generally receive the most attention, and the level of the methods used often becomes the main basis for judging the quality of a piece of research. This is normal: studying the theory and methods of econometrics is the unshirkable responsibility of econometricians. But economic theory cannot be neglected. Someone who does not understand economic theory and economic behavior cannot do econometric research and cannot build even a very simple econometric model. Econometricians should therefore be economists first. By contrast, data, and especially data quality, receive too little attention. When research projects are proposed or research results are evaluated, the availability, usability and reliability of the data are seldom reviewed seriously, and when problems arise during research their causes are rarely sought in data quality. In practice, data have become a major constraint on the development of econometrics.
VI. Correlation analysis, regression analysis and causal analysis
The steps for building an econometric model described above also show that the core of classical econometric methods is to reveal causal relationships between variables by regression analysis. However, correlation between variables does not imply causality. This concept is very important when building econometric models, so correlation and causality are briefly explained first.
Correlation refers to a statistical relationship between the sample observation series of two or more variables, measured by the correlation coefficient. If the absolute value of the correlation coefficient between two series is 1, they are perfectly correlated (perfectly positively or perfectly negatively). If the absolute value is large or close to 1, they are strongly correlated. If it is 0 or close to 0, they are uncorrelated. If a variable is correlated with a linear combination of two or more other variables, the correlation coefficient between it and each of those variables is called a partial correlation coefficient. Correlation is a purely mathematical relationship between variables, and the only basis for judging whether variables are correlated is the data. Causality refers to a dependence between two or more variables that arises from a behavioral mechanism: the variable that is the result is determined by the variable that is the cause, and a change in the cause brings about a change in the result. Causality can be one-way or mutual. For example, there is one-way causality between labor and GDP: in economic behavior labor affects GDP, not the other way around. Between GDP and total consumption, however, there is mutual causality: GDP determines total consumption, and consumption in turn drives GDP. Variables that are causally related must be mathematically correlated, but correlated variables are not necessarily causally related. For example, China's GDP and India's population are strongly correlated because both have been growing relatively fast, yet there is obviously no causal relationship between them.
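The point that two trending series can be strongly correlated without any causal link can be sketched numerically. The two series below are synthetic and generated independently of each other; the names `series_a` and `series_b` and the assumed growth rates are illustrative stand-ins, not real GDP or population figures.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(30)

# Two independently generated series that merely share an upward trend.
series_a = 100.0 * 1.08 ** years * (1 + rng.normal(0, 0.02, years.size))
series_b = 900.0 * 1.015 ** years * (1 + rng.normal(0, 0.005, years.size))

# Correlation of the levels is driven almost entirely by the common trend.
level_corr = np.corrcoef(series_a, series_b)[0, 1]

# Growth rates (log differences) strip out the trend; what is left is
# two independent noise series.
growth_a = np.diff(np.log(series_a))
growth_b = np.diff(np.log(series_b))
growth_corr = np.corrcoef(growth_a, growth_b)[0, 1]

print(f"correlation of levels:       {level_corr:.3f}")
print(f"correlation of growth rates: {growth_corr:.3f}")
```

The level correlation comes out high even though neither series influences the other, which is why a correlation coefficient on its own says nothing about the behavioral mechanism.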
Correlation analysis is a mathematical method for judging whether variables are correlated, carried out by calculating the correlation coefficients between them. Regression analysis is also a mathematical method for judging whether variables are correlated; its focus is on whether a random variable is related to one or more controllable variables. Because of this particular role, it is also used to analyze causal relationships between variables. Regression analysis alone, however, cannot settle whether a relationship is causal; it must be combined with qualitative analysis of economic behavior. This reflects the three elements of econometric modeling emphasized above.
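One way to see why regression alone cannot settle the direction of causality is that a simple regression fits equally well in both directions. The sketch below uses synthetic data in which `labor` is assumed to cause `output`; the helper `simple_ols_r2`, the variable names and the coefficients are hypothetical choices made for illustration.

```python
import numpy as np

def simple_ols_r2(x, y):
    """Regress y on x with an intercept; return (slope, R-squared)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef[1], r2

rng = np.random.default_rng(1)
labor = rng.uniform(50.0, 150.0, 200)              # assumed "cause"
output = 3.0 * labor + rng.normal(0.0, 20.0, 200)  # assumed "result"

slope_fwd, r2_fwd = simple_ols_r2(labor, output)   # output explained by labor
slope_rev, r2_rev = simple_ols_r2(output, labor)   # labor "explained" by output
r = np.corrcoef(labor, output)[0, 1]

print(f"R^2, output on labor: {r2_fwd:.4f}")
print(f"R^2, labor on output: {r2_rev:.4f}")
print(f"squared correlation:  {r ** 2:.4f}")
```

Both regressions have the same R-squared, equal to the squared correlation coefficient, so the fit cannot tell which variable is the cause. That judgment has to come from the economic behavior being modeled.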