Data preprocessing revealed the following notable patterns:
1. The overall fraud rate across transactions is 2%-3%.
2. Most fraudulent transactions are initiated by non-cardholders, who use a large number of cards within a short period of time for large total amounts.
3. Merchants with 120-150 transactions had a fraud rate of 50%. As the number of transactions grows, especially for merchants with 2,000 or more transactions, the likelihood of merchant fraud falls, so merchants with fewer than 150 transactions should be monitored closely.
4. The data still contains a few merchants with thousands of transactions whose fraudulent transactions number more than 1,000, which shows that current fraud detection methods are not mature enough to identify suspicious merchants in time.
5. In terms of fraud rate, the fewer transactions a merchant has, the higher its fraud rate and the easier it is to commit fraud.
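The per-merchant observations above can be reproduced with a simple aggregation. The following is a minimal sketch, not the platform's actual preprocessing: the `(merchant_id, is_fraud)` record layout, the function names, and the default thresholds (fewer than 150 transactions, fraud rate of at least 50%, taken from the figures above) are assumptions for illustration.

```python
from collections import defaultdict


def merchant_fraud_rates(transactions):
    """Return {merchant_id: (n_transactions, fraud_rate)}.

    `transactions` is an iterable of (merchant_id, is_fraud) pairs;
    this record layout is assumed for the example.
    """
    counts = defaultdict(lambda: [0, 0])  # merchant -> [total, fraudulent]
    for merchant_id, is_fraud in transactions:
        counts[merchant_id][0] += 1
        counts[merchant_id][1] += int(is_fraud)
    return {m: (total, fraud / total) for m, (total, fraud) in counts.items()}


def flag_low_volume_merchants(rates, max_transactions=150, min_fraud_rate=0.5):
    """List merchants with few transactions and a high fraud rate."""
    return sorted(
        m
        for m, (total, rate) in rates.items()
        if total < max_transactions and rate >= min_fraud_rate
    )


# Toy data only; real input would be the preprocessed transaction log.
sample = [
    ("m1", True), ("m1", False),
    ("m2", False), ("m2", False), ("m2", False),
    ("m3", True),
]
rates = merchant_fraud_rates(sample)
flagged = flag_low_volume_merchants(rates)
```

On the toy data, `m1` and `m3` are flagged because both have few transactions and at least half of them are fraudulent, while `m2` has no fraud and is left out.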
Based on these findings, the top image intelligent analysis platform derived several thousand important features, split the data into training and validation sets in a set proportion, and used a suitable machine learning model to reach an AUC (area under the curve) above 0.75. The training data was then adjusted by resampling before model training, so that at an accuracy above 90% the model covers 70% of fraudulent transactions.
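The split, resampling, and AUC steps can be sketched in plain Python. The text does not state the split proportion, the resampling method, or the model used, so the 80/20 split, the random undersampling of legitimate transactions, and all function names below are assumptions chosen only to illustrate the idea.

```python
import random


def split_train_validation(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def undersample_majority(rows, ratio=1.0, seed=0):
    """Keep every fraud row and at most ratio * n_fraud legitimate rows.

    Each row is a (features, label) pair with label 1 for fraud.
    Undersampling is an assumed choice; the source only says the
    samples were adjusted before training.
    """
    rng = random.Random(seed)
    fraud = [r for r in rows if r[1] == 1]
    legit = [r for r in rows if r[1] == 0]
    keep = min(len(legit), int(len(fraud) * ratio))
    return fraud + rng.sample(legit, keep)


def auc(labels, scores):
    """Probability that a random fraud case is scored above a random
    legitimate case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 3 fraud rows out of 100, roughly the 2%-3% rate reported above.
rows = [((i,), 1 if i < 3 else 0) for i in range(100)]
train, validation = split_train_validation(rows)
balanced = undersample_majority(rows)
example_auc = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

With fraud at only 2%-3% of transactions, a model trained on the raw data sees very few positive examples, which is one common reason to rebalance the training set while leaving the validation set at its natural class ratio.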
After model training, the records with the highest prediction rankings are selected as the final prediction list, which achieves a very high hit rate. The top image intelligent analysis platform covers data transmission, data storage, data management, data ETL, AI modeling, and other functions, which greatly lowers the barrier for enterprises to adopt artificial intelligence. Enterprises only need to provide the relevant data to carry out all operations such as data ETL and modeling, and can accelerate data analysis and the adoption of the latest AI technology through offline scheduling and real-time analysis and decision-making.
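Selecting the top-ranked predictions and measuring their hit rate can be sketched as follows. The function name, the cutoff `k`, and the definition of hit rate as the share of fraud within the selected list are assumptions for illustration; the source does not say how the list size was chosen or how the hit rate was computed.

```python
def top_k_hit_rate(labels, scores, k):
    """Rank records by score, keep the top k, and return (hit_rate, coverage).

    hit_rate: fraction of the top-k list that is actually fraud.
    coverage: fraction of all fraud cases captured by the top-k list.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    hits = sum(label for _, label in top)
    total_fraud = sum(labels)
    hit_rate = hits / len(top) if top else 0.0
    coverage = hits / total_fraud if total_fraud else 0.0
    return hit_rate, coverage


# Toy scores only; label 1 marks a fraudulent transaction.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
hit_rate_2, coverage_2 = top_k_hit_rate(labels, scores, k=2)
hit_rate_3, coverage_3 = top_k_hit_rate(labels, scores, k=3)
```

A shorter list generally raises the hit rate and lowers coverage, so the cutoff is a trade-off between how many flagged transactions are truly fraudulent and how much fraud is caught.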