How to apply machine learning to quantitative trading (1)
A friend once asked whether machine learning sees so little use in quantitative trading in China because it performs worse than simple strategies. Applying machine learning to quantitative trading does run into difficulties, but they are not insurmountable. In many cases the problem is not that machine learning doesn't work, but that too few people know how to use it with sound, scientific statistical thinking.
Machine learning covers a complete modeling process: data preprocessing, feature selection, feature engineering, model selection, and the verification and analysis of results. Viewed broadly, it is not just a matter of choosing a model. So if you think that studies such as "Using support vector machines to successfully predict the rise and fall of stocks" are what applying machine learning to quantitative trading means, that narrow view is like keeping the casket and handing back the pearl: it ignores the pearls scattered all over the field of machine learning. Seen in historical context, the rise of machine learning is simply the continuation of a trend: the vague, uncertain experience of the past can now be tested through systematic data analysis, and machine learning algorithms can bring previously unnoticed patterns to the surface.
In my opinion, there are two directions for future development:
1. Statistical learning algorithms designed specifically for quantitative trading will be proposed, suited to analyzing financial data with heavy noise and unstable distributions;
2. Enthusiasm for machine learning will return to rationality, shifting from tool-oriented to problem-oriented.
Here are some ideas on how to choose appropriate machine learning tools in a problem-oriented way.
1. Calculation of factor weights for multi-factor models
After we have built a multi-factor model and selected a set of factors, how do we adjust each factor's weight as market conditions change? Previous studies found that, compared with other algorithms, random forests give better results on training sets that are nonlinear, noisy, and whose independent variables are highly collinear. So, to set the weights of the multi-factor model, we currently run a random forest regression of the current period's returns on the previous period's factor values, and use the result to determine the factor weights for the next period's model.
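The regression step described above could be sketched roughly as follows. This is a minimal illustration, not the original implementation: it assumes scikit-learn is available, uses synthetic data in place of real factor exposures and returns, and assumes that normalized feature importances are what gets used as factor weights (the text does not say exactly how the regression output is mapped to weights). The function name factor_weights and its parameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def factor_weights(prev_factors, curr_returns, n_trees=200, seed=0):
    """Regress this period's returns on last period's factor values and
    return normalized feature importances as candidate factor weights.

    Assumption: importances stand in for weights. They are unsigned, so
    the direction of each factor has to be determined separately.
    """
    model = RandomForestRegressor(
        n_estimators=n_trees, min_samples_leaf=5, random_state=seed
    )
    model.fit(prev_factors, curr_returns)
    importances = model.feature_importances_
    return importances / importances.sum()


# Synthetic cross-section for illustration only: 500 stocks, 3 factors.
rng = np.random.default_rng(0)
n_stocks = 500
prev_factors = rng.normal(size=(n_stocks, 3))
# Only the second factor drives returns, and it does so nonlinearly.
curr_returns = 0.05 * np.tanh(2 * prev_factors[:, 1]) + 0.01 * rng.normal(size=n_stocks)

weights = factor_weights(prev_factors, curr_returns)
print(weights)
```

In a rolling setup, the same function would be refit each period on the most recent factor values and returns, so the weights follow changing market conditions.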
2. Missing value processing
Handling missing values is an unavoidable problem in quantitative financial analysis. The right method depends on the characteristics of the data, the pattern of missingness, the economic meaning of the data, and the calculations the data will be used for. When building a multi-factor model, we chose two replacement methods: (1) use the expectation-maximization algorithm to obtain maximum likelihood estimates of the missing values from the known data of the same variable; (2) treat all factors in the model as feature variables with equal weight, then use the K-nearest neighbor algorithm to find the most similar targets, so that replacing missing values does not strengthen the influence of any particular factor.
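Method (2) can be sketched in plain Python as below. This is only an illustration under stated assumptions: "equal weight" is interpreted as standardizing every factor before computing distances, distances use only the factors observed in both rows, and a missing value is filled with the mean of that factor over the k nearest rows. The names zscore_columns, distance and knn_impute are hypothetical, and the sample rows are made up.

```python
import math


def zscore_columns(rows):
    """Standardize each factor using observed values only, so that every
    factor contributes equally to the distance."""
    scaled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        mean = sum(observed) / len(observed)
        var = sum((v - mean) ** 2 for v in observed) / len(observed)
        std = math.sqrt(var) or 1.0  # guard against constant columns
        for r in scaled:
            if r[j] is not None:
                r[j] = (r[j] - mean) / std
    return scaled


def distance(a, b):
    """Root-mean-square difference over the factors present in both rows."""
    shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum(shared) / len(shared))


def knn_impute(rows, k=2):
    """Fill each None with the mean of that factor over the k nearest rows."""
    scaled = zscore_columns(rows)
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows where factor j is observed.
            candidates = [
                (distance(scaled[i], scaled[m]), rows[m][j])
                for m in range(len(rows))
                if m != i and rows[m][j] is not None
            ]
            candidates = [c for c in candidates if math.isfinite(c[0])]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Made-up factor table: one row per stock, None marks a missing value.
rows = [
    [1.0, 10.0, 100.0],
    [1.1, 11.0, 110.0],
    [5.0, 50.0, 500.0],
    [5.2, 52.0, None],
    [1.02, None, 102.0],
]
filled = knn_impute(rows, k=1)
print(filled)
```

Standardizing first matters here: without it, the factor with the largest numeric scale would dominate the neighbor search, which is exactly the kind of unintended strengthening the equal-weight rule is meant to avoid.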
In quantitative finance, machine learning addresses inherent shortcomings of linear models, which is why it remains deeply involved in the field. Setting aside applications such as convex optimization and dimensionality reduction (extracting market characteristics), the two major drawbacks of linear models today are that they are static and that they are linear. Financial relationships are not static, and in many cases they are not linear either. This is where statistical learning shows its advantages: such methods can adapt quickly to the market, or describe it in a more "accurate" way.
In China, how much machine learning is used in quantitative trading depends heavily on the asset class and the trading frequency. For example, it is probably used more in CTA strategies than in equities. The data a CTA strategy processes has far fewer dimensions than equity data, and it is better suited to capturing the length and dynamics of the market. Momentum in the stock market is weaker than in the futures market, where trends are more pronounced and the noise is lower. These characteristics make it easier for machine learning to work.
It is very likely that some domestic trade execution algorithms draw on machine learning. By learning the characteristics of the order book, we can make probabilistic predictions about the next change in the quotes. After training on a sufficient sample, the algorithm's performance can improve significantly.
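As a rough illustration of what a probabilistic prediction from order book features might look like, the sketch below fits a logistic regression on a single feature, top-of-book volume imbalance. Everything here is an assumption made for the example: the feature choice, the model, and the synthetic data (labels are generated so that up-moves are more likely when bid volume dominates). It assumes scikit-learn is available and does not describe any actual execution algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def imbalance(bid_size, ask_size):
    """Top-of-book imbalance in [-1, 1]; positive means more resting bid volume."""
    return (bid_size - ask_size) / (bid_size + ask_size)


# Synthetic order book snapshots, for illustration only.
rng = np.random.default_rng(1)
n = 2000
bid_size = rng.integers(1, 100, size=n).astype(float)
ask_size = rng.integers(1, 100, size=n).astype(float)
x = imbalance(bid_size, ask_size)

# Assumed data-generating process: the next quote change is more likely
# to be upward when the imbalance is positive.
p_up = 1.0 / (1.0 + np.exp(-3.0 * x))
y = (rng.random(n) < p_up).astype(int)

model = LogisticRegression()
model.fit(x.reshape(-1, 1), y)


def prob_up(bid, ask):
    """Estimated probability that the next quote change is upward."""
    return model.predict_proba([[imbalance(bid, ask)]])[0, 1]


print(prob_up(90.0, 10.0), prob_up(10.0, 90.0))
```

An execution algorithm could use such a probability to decide, for example, whether to wait at the passive side or cross the spread, though real systems would use far richer features than a single imbalance ratio.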
The reason I remain cautiously optimistic about machine learning methods such as deep learning is that they understand the market along a dimension that most current methods do not share. This advantage lets them capture more return than other methods. In other words, a new perspective on the market can produce alpha.