Difference between single model and fusion model
The stable operation of the futures market has long been a focus of the exchange and is essential for futures varieties to perform their functions. The trading volume and positions (open interest) of contracts are important indicators of how the futures market is operating, and they are also basic indicators for using futures to manage risk. To better understand market trends and improve the ability to predict market operation, this paper analyzes the historical operating patterns of contracts together with the parameters of risk control measures, extracts historical data and risk control parameters as input features, builds a fusion model from several machine learning algorithms, and uses grid search to set the optimal parameters in order to predict the trading volume and positions of futures contracts over the next five days. The experimental results show that the average accuracy of the trading volume forecast is close to 70%, and the average accuracy of the position forecast is 83%. Case analysis also shows that the fusion model and grid search contribute significantly to the improvement in prediction accuracy.

I. Project background

Futures trading is a barometer of the spot market: it provides a benchmark for the forward pricing of commodities and is of great significance in safeguarding the steady operation of the real economy. Futures prices are formed jointly by the quotations of different participants. Hedgers use the market to lock in profits and manage the risk of price fluctuations, while speculators try to judge the market and make profits. When trading overheats, the futures price becomes distorted and deviates from the spot price, which may bring losses to investors and hedging enterprises; conversely, when liquidity is insufficient, futures prices cannot accurately reflect the consensus of market participants. Stable market participation is therefore an important basis for the reasonable pricing of futures. Predicting market heat is very important for regulating market sentiment and for using risk control measures reasonably to stabilize the market. In this paper, the core problem is defined as forecasting market heat, that is, forecasting the trading volume and positions of varieties. Generally speaking, the trend of trading volume and positions in the futures market is influenced by many factors, such as price changes of the underlying, sudden public opinion events and policy influences. With so many factors at work, a simple rule-based algorithm can hardly predict trading volume and positions effectively. This paper therefore extracts features from multiple dimensions and uses three independent machine learning models to capture the different relationships in the data. Finally, grid search is used to fuse the results of the three models and output the final prediction.
The rest of this paper is organized as follows. Chapter 2 analyzes the historical operation data and studies the relationship among risk control parameters, settlement price, and trading volume and positions. Chapter 3 introduces the feature extraction and the construction of the three single models in detail. Chapter 4 introduces model fusion and the grid search over fusion weights. Chapter 5 designs experiments to verify the effectiveness of the model and to explain the model results. Chapter 6 describes the method used to interpret the model. Chapter 7 gives the summary and outlook.

II. Historical operation analysis

In machine learning problems, data analysis is the foundation of the whole modeling process; it determines the quality of feature extraction and the final performance of the model. Data analysis also plays an important role in understanding the target problem and guiding the iterative construction of the model. The data analysis in this paper covers multiple dimensions; the following four aspects are selected for a brief analysis of the exchange's historical data.

(A) the life cycle of the main contract (bimodal phenomenon)

Looking back at the historical data, every contract goes through a process in which trading volume and positions first gradually increase and then decrease over the whole cycle from listing to delisting. A large proportion of main contracts (close to 40%) show a "double peak" phenomenon over this cycle: after the contract becomes the main contract, trading volume and positions go through two peaks, and the pattern is especially pronounced in trading volume. Figure 1 and Figure 2 show the trading volume and position trends of glass futures 1705 and cotton futures 1801 respectively. As the figures show, although glass and cotton belong to the non-agricultural and agricultural categories respectively, the trading volume of both presents a typical bimodal pattern. There are two possible reasons for this phenomenon. First, when a contract becomes the main contract, trading funds flow in quickly, leading to a rapid expansion of trading volume and positions. Second, when the former main contract enters the delivery and delisting stage, the current main contract reaches its second peak. The regularity of the bimodal phenomenon helps in grasping the operating patterns of varieties and in predicting trading volume and positions.

Figure 1. Glass 1705 Contract Trading Position Trend Chart

Figure 2. Cotton 1801 Contract Trading Position Trend Chart

(B) the relationship between variety trading positions and prices

To explore the factors that influence trading volume and positions, this paper focuses on the relationship between price fluctuation and trading volume and positions. There is a consensus in the trading market that price fluctuations cause trading volume and positions to grow. This paper therefore calculates the Pearson correlation coefficient between price fluctuation and the change in trading volume and positions, in order to study, over the long term, whether price fluctuation actually causes trading volume and positions to grow. We define the following three indicators:
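Consistent with the definitions that follow, the three indicators can be written as below. This is a reconstruction under one assumption: each ratio is taken relative to the value on day t-n.

```latex
\begin{aligned}
P_{\text{delta}} &= \left| \frac{P_t - P_{t-n}}{P_{t-n}} \right|, \\
V_{\text{delta}} &= \frac{V_t - V_{t-n}}{V_{t-n}}, \\
H_{\text{delta}} &= \frac{H_t - H_{t-n}}{H_{t-n}}.
\end{aligned}
```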

Here t represents the current date and n the time difference in days; P_t represents the settlement price on day t, V_t the trading volume on day t, and H_t the position on day t; correspondingly, P_{t-n}, V_{t-n} and H_{t-n} represent the corresponding values on day t-n. Pdelta represents the absolute value of the price fluctuation ratio between day t and day t-n, Vdelta represents the actual (signed) value of the corresponding trading volume fluctuation ratio, and Hdelta represents the actual (signed) value of the position fluctuation ratio. For n from 1 to 5 days, we calculate the Pearson coefficients between Pdelta and Vdelta and between Pdelta and Hdelta. In the experiment, we selected all varieties listed on the Zhengzhou Commodity Exchange from 2016 to 2018 and aggregated all listed contracts under each variety. The details are provided in the table below.
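The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: it assumes each fluctuation ratio is taken relative to the value on day t-n, the helper names `change_ratio` and `pearson` are hypothetical, and the data are made up.

```python
from math import sqrt


def change_ratio(series, n, absolute=False):
    """Ratio (x_t - x_{t-n}) / x_{t-n} for every t >= n."""
    out = []
    for t in range(n, len(series)):
        r = (series[t] - series[t - n]) / series[t - n]
        out.append(abs(r) if absolute else r)
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative data, not exchange data.
price = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]
volume = [1000.0, 1300.0, 1250.0, 1700.0, 1600.0, 2100.0]
position = [5000.0, 5200.0, 5150.0, 5500.0, 5450.0, 5800.0]

n = 1
p_delta = change_ratio(price, n, absolute=True)   # absolute price change ratio
v_delta = change_ratio(volume, n)                 # signed volume change ratio
h_delta = change_ratio(position, n)               # signed position change ratio

pd_vd = pearson(p_delta, v_delta)  # corresponds to the PD1-VD1 entry
pd_hd = pearson(p_delta, h_delta)  # corresponds to the PD1-HD1 entry
```

Repeating the last block for n = 2 to 5 would give the remaining entries for one variety.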

Table 1. Table of correlation coefficient between price fluctuation and trading position change

In Table 1, PD1-HD1 indicates the correlation coefficient between Pdelta and Hdelta when n is 1; PD1-VD1 indicates the correlation coefficient between Pdelta and Vdelta when n is 1, and so on. The table shows that the correlation coefficients of all varieties are positive for every value of n, so there is indeed a positive correlation between the change in trading volume and positions and the absolute value of price fluctuations. However, it is generally accepted in academic circles that when the correlation coefficient satisfies |r| > 0.8, the two variables are highly correlated; when 0.6

(C) the long-term relationship between the variety trading position and the risk control measures

Besides price fluctuation, this paper also studies the long-term influence of risk control parameters on trading volume and positions. Risk control parameters are set in order to adjust market heat and stabilize market changes. Because parameters such as margin and handling fee are measured on a different scale from trading volume and positions, this paper uses the coefficient of variation to measure how trading volume and positions behave under different risk control parameters when analyzing the correlation between these parameters and the fluctuation of trading volume and positions. The specific calculation method is as follows.
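The coefficient of variation is conventionally defined as the ratio of the standard deviation to the mean. Assuming the paper uses this standard form, with \sigma and \mu denoting the standard deviation and mean of trading volume (or position) over the observation period:

```latex
CV = \frac{\sigma}{\mu}
```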

The greater the coefficient of variation, the greater the fluctuation of trading volume and positions relative to their average value. Based on the data from 2016 to 2018, the coefficients of variation of different varieties were calculated, and the relationship between risk control parameters and the coefficients of variation was measured with the Pearson coefficient. This paper focuses on the margin and the close-today handling fee (the fee for closing a position opened on the same day). The specific results are shown in Table 2 and Table 3. Because calculating the correlation coefficient requires that the relevant risk control parameter has been adjusted several times, only the varieties that meet this requirement within the selected interval are kept in Table 2 and Table 3.
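As a minimal sketch of this measure, the snippet below computes a coefficient of variation for each level of a risk control parameter. It assumes the coefficient is the population standard deviation divided by the mean; the margin levels and volumes are made up, and the name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical daily trading volumes observed under three margin levels.
volumes_by_margin = {
    0.05: [900.0, 1500.0, 600.0, 1800.0],
    0.07: [1000.0, 1300.0, 800.0, 1400.0],
    0.10: [1000.0, 1100.0, 950.0, 1050.0],
}

# One coefficient of variation per margin level.
cv_by_margin = {
    margin: coefficient_of_variation(volumes)
    for margin, volumes in volumes_by_margin.items()
}
```

Correlating the margin levels with the resulting coefficients, for example with a Pearson coefficient, then gives one entry per variety of the kind reported in Table 2.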

Table 2. Margin and coefficient of variation of long-term fluctuation of varieties

Table 3. Close-today handling fee and coefficient of variation of long-term fluctuation of varieties

Table 2 and Table 3 show that, on the whole, the margin and the handling fee are negatively correlated with the coefficient of variation of trading volume and positions. When the margin or the handling fee is raised, the fluctuation of trading volume and positions of the corresponding variety is relatively small, although the specific values differ considerably across varieties. The tables also show that the correlation coefficient of some varieties is positive, which may be because their adjustment points cluster in one-sided markets or highly volatile markets. Based on the above analysis, we decided to introduce the parameters of risk control measures into the feature sequence as a basis for prediction.

(D) the short-term relationship between the variety trading position and the parameters of risk control measures

In addition to the long-term dimension, this paper uses the data from 2016 to 2018 to explore the impact of short-term changes in risk control parameters on market operation. The data show that, from the perspective of the whole market, the short-term impact of changes in margin and handling fees on trading volume and positions is random and follows no significant pattern. This paper therefore divides customers into four categories according to their transaction characteristics: long-term customers, short-term customers, large customers and small customers. On the whole, the margin has an obvious influence on positions, while the closing fee has an obvious influence on trading volume. See Figure 3 and Figure 4 for details. The four panels in Figure 3 show the relationship between the margin adjustment range and positions for the four customer groups. The x-axis represents the size of the change in margin before and after the adjustment, and the y-axis represents the change in the average position over the five days before and after the margin adjustment. Every point in the chart represents a real historical adjustment. In Figure 4, the x-axis represents the adjustment range of the closing fee, and the y-axis represents the change in the average trading volume over the five days before and after the adjustment. Figure 3 shows that when the margin is raised, the positions of large customers and long-term customers trend downward; when the margin is lowered, the positions of small customers and short-term customers rise. Figure 4 shows that raising the closing fee clearly reduces the trading volume of short-term customers, which is in line with general understanding; correspondingly, lowering the fee promotes the trading volume of short-term small customers to some extent.

In the short term, the parameters of risk control measures have some influence on the trading volume and positions of different customer groups. This paper therefore also introduces the changes in risk control parameters into the feature vector.

Figure 3. The relationship between the margin adjustment range and the average position of five days before and after the adjustment point under different customer groups.

Figure 4. The relationship between the adjustment range of the closing fee and the change of average transaction volume in five days before and after the adjustment point under different customer groups.

III. Feature engineering and model construction

The analysis above shows that trading volume and positions are affected by multiple factors, each to a different degree. This paper uses multi-model fusion to capture the different relationships in the data, mine their underlying value and predict future trading volume and positions. The problem is defined as follows: for any contract, after the close of day T, estimate the trading volume and positions over the next five trading days from the current risk control parameters and the historical operation data as of that day.

In numerical prediction problems, feature selection is an important basis for model construction and determines the performance of the model. After data analysis and experimental iteration, this paper selects four categories of features with 317 dimensions in total: settlement parameters, market characteristics, customer characteristics and contract characteristics. Settlement parameters include historical price fluctuation, price differences between contracts and other multidimensional features; market characteristics include features related to historical trading volume and positions; customer characteristics include features of customers with different attributes and the transaction features of different customer groups; contract characteristics mainly capture contract operation features and contract stages, and are used to constrain the forecast results.

Table 4. Data feature table

Among the above 317 dimensions, 7 are contract constraint features and 310 are time series features related to historical transactions. After feature extraction, this paper constructs three machine learning models, as detailed below.

(A) autoregressive integrated moving average model (ARIMA)

In statistics and economics, the ARIMA (Autoregressive Integrated Moving Average) model is a commonly used time series prediction algorithm. It is usually applied to stationary time series, or to series whose non-stationarity in the mean can be removed by differencing. The autoregressive (AR) part is a statistical method for time series that measures the correlation between values of the same series at different times; it predicts the value at the current time from the values at previous times and assumes that the relationship is linear. This method is widely used to model correlation in financial series. The moving average (MA) part is another method for modeling a single variable in a time series. Because the problem in this paper fits the common application scenarios of ARIMA, we use it to capture the time series pattern of trading volume and positions.
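The differencing and autoregressive steps can be illustrated with a deliberately simplified sketch. It is an ARIMA(1,1,0)-style forecaster fitted by ordinary least squares, with no moving average term; the order, the fitting method, the function names and the data are all assumptions made for illustration and are not taken from the paper.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    xs = series[:-1]
    ys = series[1:]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    var_x = sum((x - mx) ** 2 for x in xs)
    phi = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var_x
    return my - phi * mx, phi


def forecast_arima_110(series, steps):
    """Difference once, fit AR(1) on the differences, then integrate back."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # AR(1) step on the differenced series
        level += last_diff               # undo the differencing
        forecasts.append(level)
    return forecasts


# Made-up daily trading volumes for illustration only.
volume = [1000.0, 1040.0, 1100.0, 1130.0, 1190.0, 1260.0, 1300.0, 1370.0]
next_five_days = forecast_arima_110(volume, steps=5)
```

In practice a library implementation with maximum likelihood estimation and order selection would normally be preferred; the sketch only shows how differencing and autoregression combine.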

(B) regression model based on support vector machine

The ARIMA model directly predicts future trading volume by capturing the correlation within the time series. We also want to use more information to obtain better predictions, so we use a support vector machine to predict the fluctuation of future trading volume and positions as a supplement. The support vector machine (SVM) is a machine learning model widely used in classification and regression problems. Its core idea is to use a "kernel function" to handle nonlinear low-dimensional features effectively by mapping them into a high-dimensional feature space, where the data are classified or regressed by finding a hyperplane.
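The role of the kernel function can be shown with a small worked example. The sketch below uses a degree-2 polynomial kernel, chosen only because its feature map is easy to write out; the paper does not state which kernel it uses. It checks that the kernel value computed in the original 2-dimensional space equals an inner product in an explicit 3-dimensional feature space.

```python
from math import sqrt


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def explicit_map(x):
    """Explicit 3-dimensional feature map matching poly2_kernel."""
    return (x[0] ** 2, sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain inner product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


x, y = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly2_kernel(x, y)                        # computed in 2 dimensions
via_mapping = dot(explicit_map(x), explicit_map(y))    # computed in 3 dimensions
```

The two values agree up to floating-point rounding, which is why an SVM can work in the high-dimensional space without ever constructing the mapped features.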

(C) sequence-to-sequence model (Seq2Seq)

The Seq2Seq (Sequence to Sequence) model is a deep neural network proposed by the Google Brain team and Yoshua Bengio's team, and it is widely used in translation, automatic text summarization and some regression prediction problems. Seq2Seq was at first mainly used for natural language processing problems. However, because of its strong ability to mine relationships in time series, it has gradually been applied to the prediction of numerical series in recent years. As shown in Figure 5, the network used in this paper takes the market characteristics of the past ten days as the input sequence and maps it to the trading volume or position sequence of the next five days through two stages: an encoder and a decoder. The encoder uses a nonlinear function to compress the input sequence into a hidden vector in the hidden layer; this vector represents the information in the input sequence and its latent relationships. The decoder decodes the hidden vector it receives and predicts the trading volume and positions of days T+1 to T+5 day by day, in combination with the input market characteristics of T+4 day.
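The ten-day input and five-day output described above imply a sliding-window layout for the training samples. The sketch below only shows that data preparation step, not the network itself; `make_windows` is a hypothetical helper, and a single made-up series stands in for the full set of market features.

```python
def make_windows(series, input_len=10, output_len=5):
    """Split a daily series into (past input_len days, next output_len days) pairs."""
    pairs = []
    last_start = len(series) - input_len - output_len
    for start in range(last_start + 1):
        split = start + input_len
        encoder_input = series[start:split]
        decoder_target = series[split:split + output_len]
        pairs.append((encoder_input, decoder_target))
    return pairs


# 20 made-up days of trading volume for illustration only.
daily_volume = [float(v) for v in range(100, 120)]
samples = make_windows(daily_volume)
```

Each pair gives the encoder one ten-day window and the decoder the five days that follow it.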

Figure 5. Flowchart of the Seq2Seq model

IV. Construction of the multi-algorithm fusion model

The futures market changes rapidly, and the trading and position behavior of customer groups is influenced by many factors, so a single model tends to fit the historical data too closely.
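One way to fuse three single-model forecasts with a grid search over weights is sketched below. The weighting scheme, the 0.1 grid step, the error metric (mean absolute percentage error) and all names and numbers are assumptions made for illustration; the paper's actual fusion procedure may differ.

```python
from itertools import product


def mean_abs_pct_error(pred, actual):
    """Mean absolute percentage error between two equal-length sequences."""
    return sum(abs(p - a) / a for p, a in zip(pred, actual)) / len(actual)


def grid_search_weights(model_preds, actual, step=0.1):
    """Try every weight triple on a grid summing to 1; return (weights, error)."""
    steps = round(1 / step)
    best_weights, best_error = None, float("inf")
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue  # the third weight would be negative
        weights = (i * step, j * step, (steps - i - j) * step)
        fused = [
            sum(w * p for w, p in zip(weights, preds))
            for preds in zip(*model_preds)
        ]
        error = mean_abs_pct_error(fused, actual)
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error


# Made-up validation data: three models' forecasts and the observed values.
actual = [100.0, 110.0, 120.0, 130.0, 140.0]
arima_pred = [90.0, 100.0, 110.0, 120.0, 130.0]
svm_pred = [110.0, 120.0, 130.0, 140.0, 150.0]
seq2seq_pred = [105.0, 112.0, 118.0, 133.0, 139.0]

weights, error = grid_search_weights([arima_pred, svm_pred, seq2seq_pred], actual)
```

Because each single model corresponds to one corner of the grid, the fused error on the validation data can never be worse than that of the best single model; whether the gain carries over to unseen data has to be checked on a separate test period.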