The main research contents and key problems to be solved in financial mathematics include:
(1) Pricing theory of securities and security portfolios
Develop the pricing theory of marketable securities and security portfolios, especially derivatives such as futures and options. The main mathematical method is to propose a suitable stochastic differential equation or stochastic difference equation model and form the corresponding backward equation. The corresponding nonlinear Feynman-Kac formula is then established, and a very general generalized Black-Scholes pricing formula is derived from it. The resulting backward equation will be a high-dimensional, nonlinear, constrained singular equation.
Study the pricing of portfolios of securities with different maturities and yields. This requires a mathematical model that combines pricing and optimization. In terms of mathematical tools, research on stochastic programming, fuzzy programming and optimization algorithms may be needed.
Under incomplete market conditions, introduce a pricing theory that depends on preferences.
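For orientation, the classical Black-Scholes formula for a European call option, which the generalized pricing formula mentioned above extends, can be sketched as follows. This is only the textbook special case (constant interest rate and volatility, no dividends), not the generalized formula itself; the function and parameter names are illustrative assumptions.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(spot: float, strike: float, rate: float,
                  vol: float, maturity: float) -> float:
    """Classical Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (
        vol * math.sqrt(maturity)
    )
    d2 = d1 - vol * math.sqrt(maturity)
    # Discounted expected payoff under the risk-neutral measure.
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Hypothetical inputs: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
price = bs_call_price(100.0, 100.0, 0.05, 0.20, 1.0)
print(round(price, 2))
```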
(2) Incomplete Market Economic Equilibrium Theory (GEI)
It is planned to conduct research in the following aspects:
1. Infinite-dimensional space, infinite horizon, and infinite states
2. Stochastic economy, arbitrage-free equilibrium, economic structure parameter variation, non-linear asset structure
3. Innovation and Design of Asset Securities
4. Economy with Friction
5. Corporate behavior and production, bankruptcy and bad debt
6. Securities market games.
(3) Application of the GEI plate balancing algorithm and the Monte Carlo method to the calculation of economic equilibrium points; application of GEI theory to macroeconomic control in finance and fiscal policy; and, under a theoretical framework of sustainable development with incomplete markets, study of the pricing of natural resource assets and the sustainable utilization of natural resources.
1. What are association rules
Before describing some details about association rules, let's first look at an interesting story: the story of "diapers and beer".
In one supermarket there was a curious arrangement: diapers and beer were placed on sale together. Strange as it seems, the move increased sales of both diapers and beer. This is not a joke but a real case from the American Wal-Mart supermarket chain, and merchants have talked about it ever since. Wal-Mart has the world's largest data warehouse system. To understand customers' purchasing habits in its stores accurately, Wal-Mart performs shopping basket analysis on customers' shopping behavior to learn which products are often bought together. Wal-Mart's data warehouse collects detailed raw transaction data from each of its stores, and Wal-Mart applies data mining methods to analyze these data. One unexpected discovery was that the product most often purchased together with diapers is actually beer! After extensive field investigation and analysis, a behavioral pattern of Americans hidden behind "diapers and beer" was revealed: in the United States, some young fathers often go to the supermarket after work to buy baby diapers, and 30% to 40% of them also buy some beer for themselves. The reason is that American wives often ask their husbands to buy diapers for the children after work, and the husbands, having bought the diapers, bring back their favorite beer as well. By conventional thinking, diapers and beer have nothing to do with each other. Without data mining technology to mine and analyze a large amount of transaction data, Wal-Mart could not have discovered this valuable pattern in the data.
Data association is an important type of discoverable knowledge in databases. When there is a certain regularity among data values, it is called an association. Associations can be divided into simple associations, temporal associations, and causal associations. The purpose of association analysis is to find the association network hidden in the database. Sometimes the association function of the data in the database is not known, and even when it is known it is uncertain, so the rules generated by association analysis carry a credibility measure. In 1993, Agrawal et al. first posed the problem of mining association rules between itemsets in customer transaction databases. Since then, many researchers have studied association rule mining extensively. Their work includes optimizing the original algorithm, for example by introducing random sampling and parallelism to improve the efficiency of rule mining, and extending the applications of association rules.
Association rule mining is an important topic in data mining and has been widely studied by the industry in recent years.
2. Association rule mining process, classification and related algorithms
2.1 Process of association rule mining
The association rule mining process mainly includes two stages: in the first stage, all frequent itemsets (Frequent Itemsets) must be found in the data collection; in the second stage, association rules (Association Rules) are generated from these frequent itemsets.
The first stage of association rule mining must find all frequent itemsets (also called Large Itemsets) in the original data collection. Frequent means that the frequency of occurrence of an itemset relative to all records must reach a certain level. The frequency of occurrence of an itemset is called its support. Taking a 2-itemset containing the two items A and B as an example, the support of {A, B}, that is, the fraction of all transactions that contain both A and B, can be obtained through formula (1). If the support is greater than or equal to the chosen minimum support (Minimum Support) threshold, then {A, B} is called a frequent itemset. A k-itemset that satisfies the minimum support is called a frequent k-itemset (Frequent k-itemset), generally written Large k or Frequent k. The algorithm then generates Large k+1 from the itemsets of Large k, until no longer frequent itemsets can be found.
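The support computation described above can be sketched as follows. This is a minimal illustration using the standard definition of support (the fraction of transactions containing every item of the itemset); the function names and the toy transactions are assumptions made for the example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """True if the itemset reaches the minimum support threshold."""
    return support(itemset, transactions) >= min_support


# Hypothetical toy database of five transactions.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]

print(support({"A", "B"}, transactions))           # 3 of 5 transactions
print(is_frequent({"A", "B"}, transactions, 0.5))  # frequent at a 50% threshold
```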
The second stage of association rule mining generates the association rules (Association Rules). Rules are generated from the frequent k-itemsets found in the previous step. Given a minimum confidence (Minimum Confidence) threshold, a rule whose confidence satisfies the minimum confidence is called an association rule. For example, the confidence of the rule A => B generated from the frequent itemset {A, B}, that is, the fraction of transactions containing A that also contain B, can be obtained through formula (2). If this confidence is greater than or equal to the minimum confidence, A => B is called an association rule.
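As a rough sketch of this second stage, the code below enumerates every rule that can be formed from one frequent itemset and keeps those that reach the minimum confidence, using the standard definition confidence(X => Y) = count(X and Y) / count(X). The helper names and the toy data are illustrative assumptions.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def rules_from_itemset(itemset, transactions, min_confidence):
    """Return (antecedent, consequent, confidence) for each rule that passes."""
    items = frozenset(itemset)
    whole = support_count(items, transactions)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(sorted(items), size):
            lhs = frozenset(antecedent)
            lhs_count = support_count(lhs, transactions)
            if lhs_count == 0:
                continue  # the antecedent never occurs, so no rule can be formed
            confidence = whole / lhs_count
            if confidence >= min_confidence:
                rules.append((lhs, items - lhs, confidence))
    return rules


# Hypothetical toy database: A occurs 4 times, B 5 times, {A, B} 3 times.
transactions = [frozenset(t) for t in (
    {"A", "B"}, {"A", "B"}, {"A", "B"}, {"A"}, {"B", "C"}, {"B"},
)]

for lhs, rhs, conf in rules_from_itemset({"A", "B"}, transactions, 0.7):
    print(sorted(lhs), "=>", sorted(rhs), conf)
```

With these data only A => B is kept (confidence 3/4), while B => A (confidence 3/5) falls below the 70% threshold.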
In the Wal-Mart case, using association rule mining to mine records in the transaction database requires first setting two thresholds: minimum support and minimum confidence. Assume here that the minimum support is min_support = 5% and the minimum confidence is min_confidence = 70%. The association rules that meet the needs of this supermarket must then satisfy both conditions at the same time. If the association rule "diapers => beer" found through the mining process meets these conditions, it is accepted. In formula form: Support(diapers, beer) >= 5% and Confidence(diapers, beer) >= 70%. In this example, Support(diapers, beer) >= 5% means that, among all transaction records, at least 5% show diapers and beer being purchased together. Confidence(diapers, beer) >= 70% means that, among all transaction records that contain diapers, at least 70% also contain beer. Therefore, if a consumer purchases diapers in the future, the supermarket can recommend that the consumer also purchase beer. This product recommendation is based on the "diapers => beer" association rule, because the supermarket's past transaction records support the consumer behavior that "most purchases of diapers are accompanied by the purchase of beer."
It can also be seen from the above that association rule mining is usually better suited to records whose indicators take discrete values. If the indicator values in the original database are continuous, appropriate data discretization should be performed before association rule mining (in practice, each interval of values is mapped to a single value). Data discretization is an important step before data mining, and whether it is done reasonably directly affects the mining results of association rules.
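The two threshold checks in this example amount to simple arithmetic, sketched below. The transaction counts are hypothetical numbers chosen only for illustration; they are not Wal-Mart figures.

```python
# Hypothetical counts, chosen only to illustrate the two threshold checks.
n_transactions = 1000   # all transactions in the database
n_diapers = 80          # transactions that contain diapers
n_both = 60             # transactions that contain both diapers and beer

min_support = 0.05
min_confidence = 0.70

support = n_both / n_transactions   # share of all transactions with both items
confidence = n_both / n_diapers     # share of diaper transactions that also have beer
accepted = support >= min_support and confidence >= min_confidence

print(f"support={support:.0%}, confidence={confidence:.0%}, accepted={accepted}")
```

With these counts the support is 6% and the confidence is 75%, so the rule "diapers => beer" would be accepted.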
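A minimal sketch of such a discretization step is shown below: a continuous value is mapped to an interval label, which can then be treated as a discrete item. The cut points, the field names, and the label format are assumptions made for the example; choosing good cut points is a separate problem.

```python
from bisect import bisect_right


def discretize(value, edges, name):
    """Return an interval label for `value` given sorted cut points `edges`."""
    idx = bisect_right(edges, value)
    if idx == 0:
        return f"{name}<{edges[0]}"
    if idx == len(edges):
        return f"{name}>={edges[-1]}"
    return f"{name}=[{edges[idx - 1]},{edges[idx]})"


# Hypothetical cut points and records.
edges = [2000, 4000]
records = [
    {"gender": "Female", "income": 2300},
    {"gender": "Male", "income": 1500},
    {"gender": "Female", "income": 4000},
]

# Each record becomes a set of discrete items suitable for rule mining.
itemized = [
    {f"gender={r['gender']}", discretize(r["income"], edges, "income")}
    for r in records
]
for items in itemized:
    print(sorted(items))
```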
2.2 Classification of association rules
According to different situations, association rules can be classified as follows:
1. Based on the categories of variables processed in the rules, association rules can be divided into Boolean and numerical types.
The values processed by Boolean association rules are discrete and categorical, and the rules show the relationships between these variables. Numerical association rules can be combined with multi-dimensional or multi-layer association rules to process numeric fields, either segmenting them dynamically or processing the original data directly. Numerical association rules can of course also include categorical variables. For example: Gender = "Female" => Occupation = "Secretary" is a Boolean association rule; Gender = "Female" => avg(income) = 2300 involves income, which is numeric, so it is a numerical association rule.
2. Based on the abstraction level of the data in the rules, it can be divided into single-layer association rules and multi-layer association rules.
In single-layer association rules, the variables do not take into account that the actual data has multiple levels; in multi-layer association rules, the multi-layer nature of the data is fully taken into account. For example: IBM desktop => Sony printer is a single-layer association rule on detailed data; desktop => Sony printer is a multi-layer association rule between a higher level and a detail level.
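One simple way to obtain multi-layer rules is to replace detailed items with their higher-level category before mining, as sketched below. The taxonomy is a hypothetical example ("Dell desktop" and the "printer" category are invented for illustration), and this is only one possible approach.

```python
# Hypothetical item taxonomy: detailed item -> higher-level category.
parent_of = {
    "IBM desktop": "desktop",
    "Dell desktop": "desktop",
    "Sony printer": "printer",
}


def generalize(transaction, parent_of, lift_categories):
    """Replace items whose parent is in `lift_categories` by that parent."""
    out = set()
    for item in transaction:
        parent = parent_of.get(item)
        out.add(parent if parent in lift_categories else item)
    return frozenset(out)


detailed = {"IBM desktop", "Sony printer"}

# Lift only desktops, so rules such as desktop => Sony printer can be mined.
mixed_level = generalize(detailed, parent_of, {"desktop"})
print(sorted(mixed_level))
```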
3. Based on the dimensionality of the data involved in the rules, association rules can be divided into single-dimensional and multi-dimensional.
Single-dimensional association rules involve only one dimension of the data, such as the items purchased by the user; in multi-dimensional association rules, the data to be processed involves several dimensions. In other words, single-dimensional association rules deal with relationships within a single attribute, while multi-dimensional association rules deal with relationships between different attributes. For example: Beer => Diapers involves only the items purchased by the user; Gender = "Female" => Occupation = "Secretary" involves information in two fields and is therefore an association rule in two dimensions.
2.3 Related algorithms for association rule mining
1. Apriori algorithm: Use candidate item sets to find frequent item sets
The Apriori algorithm is one of the most influential algorithms for mining frequent itemsets of Boolean association rules. Its core is a recursive algorithm based on the two-stage frequency set idea. The association rules it mines are classified as single-dimensional, single-level, Boolean association rules. Here, all itemsets whose support is greater than the minimum support are called frequent itemsets, or frequency sets for short.
The basic idea of this algorithm is as follows. First, find all frequency sets, that is, itemsets whose frequency of occurrence is at least the predefined minimum support. Strong association rules are then generated from the frequency sets; these rules must satisfy both the minimum support and the minimum confidence. To do so, the frequency sets found in the first step are used to generate all rules that contain only items of the set and have exactly one item on the right-hand side. The definition of the medium rule is used here. Once these rules are generated, only those whose confidence is greater than the minimum confidence given by the user are retained. All frequency sets are generated recursively.
The possibility of generating a large number of candidate sets and the possibility of repeatedly scanning the database are two major shortcomings of the Apriori algorithm.
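The level-wise search described above can be sketched as follows. This is a compact, unoptimized illustration of the frequent-itemset stage only (candidate generation, pruning, and one database scan per level); the function name and the toy transactions are assumptions, and it uses ">=" for the support threshold.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # One database scan: count each candidate, keep those meeting min_support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    singletons = {frozenset([i]) for t in transactions for i in t}
    current = frequent_among(singletons)
    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent_among(candidates)
        result.update(current)
        k += 1
    return result


# Hypothetical toy database.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(transactions, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The repeated call to `frequent_among` makes the two shortcomings visible: each level builds a candidate set and scans the whole database again.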
2. Algorithm based on partitioning: Savasere et al. designed an algorithm based on partitioning. This algorithm first logically divides the database into several disjoint blocks, considers each block separately and generates all frequency sets for it, and then merges the generated frequency sets to generate all possible frequency sets. Finally, the support of these itemsets is calculated. The size of the chunks here is chosen so that each chunk can fit into main memory and only needs to be scanned once per stage. The correctness of the algorithm is guaranteed by the fact that every possible frequency set is a frequency set in at least one block. The algorithm is highly parallelizable and can assign each block to a processor to generate a frequency set. After each cycle of generating frequency sets ends, the processors communicate with each other to generate global candidate k-itemsets. Usually the communication process here is the main bottleneck in the algorithm execution time; on the other hand, the time for each independent processor to generate the frequency set is also a bottleneck.
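The two phases of the partitioning idea can be sketched as follows. This is a sequential illustration, not the algorithm of Savasere et al. as published: the local mining step is a brute-force enumeration that is only practical for small blocks with short transactions, and the names and toy data are assumptions.

```python
from itertools import combinations


def local_frequent(block, min_support):
    """Brute-force all itemsets that are frequent within one block."""
    counts = {}
    for t in block:
        for size in range(1, len(t) + 1):
            for combo in combinations(sorted(t), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, c in counts.items() if c / len(block) >= min_support}


def partition_mine(transactions, min_support, n_blocks):
    """Return {itemset: global support} using a two-phase partition scheme."""
    transactions = [frozenset(t) for t in transactions]
    # Phase 1: split into disjoint blocks and mine each block independently.
    blocks = [transactions[i::n_blocks] for i in range(n_blocks)]
    candidates = set()
    for block in blocks:
        if block:
            candidates |= local_frequent(block, min_support)
    # Phase 2: one more scan computes the global support of every candidate.
    n = len(transactions)
    result = {}
    for c in candidates:
        sup = sum(1 for t in transactions if c <= t) / n
        if sup >= min_support:
            result[c] = sup
    return result


# Hypothetical toy database.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = partition_mine(transactions, min_support=0.6, n_blocks=2)
print(len(frequent))
```

Phase 1 may produce candidates that are frequent only locally; the global scan in phase 2 removes them, while no globally frequent itemset can be missed because it must be frequent in at least one block.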
3. FP-Tree frequency set algorithm: In view of the inherent shortcomings of the Apriori algorithm, J. Han et al. proposed a mining method that does not generate candidate frequent itemsets: the FP-Tree frequency set algorithm.
It adopts a divide-and-conquer strategy. After the first scan, the frequency sets in the database are compressed into a frequent pattern tree (FP-tree) that still retains the association information. The FP-tree is then divided into several conditional databases, each associated with a frequency set of length 1, and these conditional databases are mined separately. When the amount of original data is large, the partitioning method can also be applied so that each FP-tree fits in main memory. Experiments show that FP-growth adapts well to rules of different lengths and is much more efficient than the Apriori algorithm.
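The tree construction and the extraction of one conditional database can be sketched as follows. This illustration assumes the usual two-pass construction (one pass to count items, one to insert transactions) and covers only those two steps, not the full recursive FP-growth mining; class and function names are assumptions.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build an FP-tree; return (root, header) where header maps item -> nodes."""
    # Pass 1: count items and keep those that reach the minimum count.
    freq = Counter(i for t in transactions for i in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = FPNode(None, None)
    header = {i: [] for i in freq}
    # Pass 2: insert each transaction with items in descending frequency order,
    # so that transactions sharing frequent items share a path prefix.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(item, header):
    """Prefix paths that lead to `item`, each with the count of the item node."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base


# Hypothetical toy database.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

root, header = build_fp_tree(transactions, min_count=3)
for path, count in conditional_pattern_base("beer", header):
    print(path, count)
```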
3. Application of this field at home and abroad
3.1 Application of association rule mining technology at home and abroad
At present, association rule mining technology is widely used by companies in the Western financial industry, where it can successfully predict the needs of bank customers. Once this information is obtained, banks can improve their marketing. Banks are now developing new ways to communicate with customers every day. Each bank bundles information about products that may interest customers on its own ATMs, so that users of those ATMs can learn about them. If the database shows that a customer with a high credit limit has changed address, it is likely that this customer has recently bought a larger home and may therefore need a higher credit limit, a new higher-end credit card, or a home improvement loan; these products can be mailed to the customer with the credit card statement. When customers call for consultation, the database can effectively help telephone sales representatives: the representative's computer screen can display the characteristics of the customer, as well as the products the customer would be interested in.
At the same time, some well-known e-commerce sites have also benefited from powerful association rule mining. These online shopping sites mine association rules and then set up bundles of products that users tend to purchase together. Some shopping websites also use the rules to set up cross-selling, in which customers who buy a certain product are shown advertisements for another, related product.
In our country, however, "massive data but a lack of information" is a common embarrassment that commercial banks face after the large-scale concentration of data. Most databases currently deployed in the financial industry provide only lower-level functions such as data entry, query, and statistics, and cannot discover the useful information that exists in the data. They cannot, for example, analyze the data to discover its patterns and characteristics, and from there discover the financial and business interests of a particular customer, consumer group or organization, or observe changing trends in financial markets. It can be said that the research and application of association rule mining technology in our country are not yet very extensive or in-depth.
3.2 Some research on association rule mining technology in recent years
Since many application problems are more complex than the supermarket purchasing problem, a large number of studies have extended association rules from different perspectives, integrating more factors into the association rule mining method. This enriches the application fields of association rules and broadens the range of management decisions they can support. Examples include taking into account category hierarchies between attributes, temporal relationships, and multi-table mining. In recent years, research on association rules has focused mainly on two aspects: expanding the range of problems that classic association rules can solve, and improving the efficiency of classic association rule mining algorithms and the interestingness of the rules.