RFM model analysis and customer segmentation

According to research by Arthur Hughes of the American Database Marketing Institute, a customer database contains three "magic" elements, and these three elements are the best indicators for data analysis: the most recent purchase (Recency), purchase frequency (Frequency), and purchase amount (Monetary).

RFM model: R (Recency) indicates how long ago the customer's last purchase was, F (Frequency) indicates the number of purchases the customer made in the most recent period, and M (Monetary) indicates the amount the customer spent in that period. The raw data generally has three fields: customer ID, purchase time (date format), and purchase amount. Data mining software processes these fields and applies weights to obtain the RFM score, which can then be used for customer segmentation, customer tier classification, customer value score ranking, and so on, in support of database marketing!

Here again I borrow the RFM customer classification chart from @data mining and data analysis.

The software tools used for this analysis: IBM SPSS Statistics 19, IBM SPSS Modeler 14.1, Tableau 7.0, Excel, and PPT.

RFM analysis is only a small part of the project, but it still has to cope with massive data, which places demands on computer memory and hard disk capacity.

First, some lessons from mining and processing massive data (these apply only to personal computer platforms):

Generally, the data we receive are compressed text files that must be decompressed, with sizes measured in gigabytes or more. It is best to store them on an externally powered portable hard disk. If the customer does not tell you, you probably will not know how many records and fields there are;

A default installation of the Modeler mining software generally exchanges data through the C drive. Reserve at least 100 GB of free space there, otherwise you will run out of space while reading data;

Be patient when processing massive data. Waiting more than 30 minutes for a run to produce results is common, especially for sampling, merging data, data restructuring, and neural network modeling. You must persevere, otherwise it is a tragedy to interrupt a run that was one minute from finishing, haha;

Data preparation and preprocessing are said to account for 70% of the time in a data mining project. I would say that for a very large data set they may account for more than 90%. On the one hand, the processing itself is time-consuming; on the other hand, the data may only be processable on one computer, and the work cannot be spread across several computers at the same time;

Working across multiple computers introduces discrepancies, which is a lesson I have always emphasized. Massive data therefore calls for sampling, so that you can inspect the data and trial-run the operations on a sample first. Remember: even if the sample data behaves normally, the full data set may still have problems;

It is recommended to store the data with "|" as the field delimiter;

It cannot be overemphasized that a data mining project, and its mining engineers, need an understanding of the industry and real business insight. Good data mining must be market-oriented, and it also requires a good communication mechanism between IT staff and marketing staff;

Data mining requires understanding the data dictionary and the meaning of the semantic layer. Effort spent on metadata management and understanding yields twice the result with half the effort. Otherwise you may only discover problems after the data restructuring is complete and have to start over, which is a tragedy;

Every time I mine massive data is also when I spend the most time on Weibo. The computer really cannot calculate as fast as I would like, so I browse Weibo while waiting for it, haha!
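The sampling tip and the "|" delimiter recommendation above can be illustrated with a small sketch. It is not part of the original Modeler workflow: it assumes a pipe-delimited text file whose first line is a header, and uses reservoir sampling so that only the sample is held in memory. The function name `sample_rows` and the column names are hypothetical.

```python
import csv
import io
import random

def sample_rows(lines, k, delimiter="|", seed=0):
    """Reservoir-sample k data rows from an iterable of text lines.

    Assumption: the first line is a header. Only k rows are kept in
    memory, no matter how many rows the input contains.
    """
    rng = random.Random(seed)
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing row with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = row
    return header, reservoir

# Tiny in-memory stand-in for a multi-gigabyte file (made-up values).
demo = io.StringIO(
    "customer_id|purchase_time|amount\n"
    + "".join(f"C{n:03d}|2012-03-{n % 28 + 1:02d}|{10 + n}\n" for n in range(100))
)
header, sample = sample_rows(demo, k=5)
print(header)
print(sample)
```

For a real file, the same function could be called on an open file handle, for example `with open(path, newline="", encoding="utf-8") as f: sample_rows(f, 10000)`, where `path` and the encoding are placeholders. As the text warns, a clean sample does not guarantee that the full data set is clean.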

The main thinking behind adapting traditional RFM analysis to the telecommunications business:

The RFM model and the customer segmentation that follows are only a small part of the data mining project. Assume we have obtained one month of customer recharge behavior data (in practice there were six months of data). We first used IBM Modeler to build an analysis stream:

The data structure fully meets the requirements of RFM analysis. One month of data contains 30 million transaction records!

We first use the mining tool's RFM nodes (RFM Aggregate and RFM Analysis) to generate R (Recency), F (Frequency), and M (Monetary);
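As a rough illustration of what this aggregation step produces, here is a minimal sketch in plain Python. It is not Modeler's implementation: the transactions, the reference date, and the function name `rfm_aggregate` are made up, and recency is measured in days before an assumed reference date.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("A", date(2012, 3, 1), 50.0),
    ("A", date(2012, 3, 20), 30.0),
    ("B", date(2012, 3, 5), 100.0),
    ("C", date(2012, 3, 28), 20.0),
    ("C", date(2012, 3, 29), 20.0),
    ("C", date(2012, 3, 30), 10.0),
]

def rfm_aggregate(rows, reference_date):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in rows:
        previous = last_purchase.get(customer_id)
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date  # keep the latest date
        frequency[customer_id] += 1                     # count transactions
        monetary[customer_id] += amount                 # sum amounts
    return {
        cid: (
            (reference_date - last_purchase[cid]).days,
            frequency[cid],
            monetary[cid],
        )
        for cid in last_purchase
    }

# Assumed reference date: the last day of the data month.
rfm = rfm_aggregate(transactions, reference_date=date(2012, 3, 31))
for cid, values in sorted(rfm.items()):
    print(cid, values)
```

Each customer ends up with one row of three values, which is the shape the later scoring and clustering steps work on.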

Then we use the RFM Analysis node to restructure and organize the base data for the RFM model;

Now we have the Recency_Score, Frequency_Score, Monetary_Score, and RFM_Score of the RFM model. Here each of R, F, and M was cut into five equal bins, and the three bin scores were weighted by 100, 10, and 1 to obtain the RFM score, which gives 125 RFM cells (5 × 5 × 5).
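The scoring described above can be sketched as follows. This is an approximation under stated assumptions rather than a reproduction of the Modeler node: bins are assigned by rank so that they are roughly equal in size, ties are not treated specially, a more recent purchase (fewer days) gets a higher recency score, and the ten customers are invented.

```python
def quintile_scores(values, higher_is_better=True, bins=5):
    """Score each value from 1 to `bins` by rank, with roughly equal-sized bins.

    Simplification: tied values may land in different bins.
    """
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not higher_is_better,
    )
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * bins // len(values) + 1
    return scores

# Hypothetical per-customer aggregates for ten customers.
recency_days = [1, 3, 5, 8, 12, 15, 20, 25, 28, 30]
frequency = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
monetary = [500, 450, 400, 350, 300, 250, 200, 150, 100, 50]

r_score = quintile_scores(recency_days, higher_is_better=False)  # fewer days is better
f_score = quintile_scores(frequency)
m_score = quintile_scores(monetary)

# Weights of 100, 10, and 1 turn the three scores into one three-digit code.
rfm_score = [100 * r + 10 * f + m for r, f, m in zip(r_score, f_score, m_score)]
print(rfm_score)
```

Because each digit runs from 1 to 5, the combined score identifies one of the 125 cells directly: 555 is the best cell and 111 the worst.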

The traditional RFM model is complete at this point, but 125 segments are too many for targeted marketing. Customer characteristics and behaviors still need to be identified, so the customer base has to be segmented further;

In addition: the RFM model is really just a data processing method, and it can also be carried out with ordinary data restructuring techniques. The packaged RFM module used here is simply more direct. Conversely, this module can also be used for data restructuring tasks that follow the RFM pattern even when the goal is not RFM analysis.

We can import the resulting data into Tableau for descriptive analysis (data mining software is very weak at descriptive and tabular output, haha):

We can also compare different cells: analysis of means, analysis by cell category, and so on.

This is where the convenience of a visualization tool like Tableau shows.

Next, we continue with the mining tool and run cluster analysis on the three fields R, F, and M. The cluster analysis mainly uses the Kohonen, K-means, and TwoStep algorithms:

At this point, we have to consider whether to use the three variables R (Recency), F (Frequency), and M (Monetary) directly or to transform them first, because the three fields are measured on different scales. It is best to standardize the three variables, for example with Z-scores (in practice you could also choose linear interpolation, the comparative method, benchmarking, and so on)! Another consideration is how to weight the three indicators R, F, and M. In real marketing, these three indicators are clearly not equally important!
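A minimal sketch of the Z-score option, assuming the population standard deviation is used (the text does not say which variant was applied) and using made-up values:

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant field carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical raw fields on very different scales.
recency_days = [1, 5, 12, 20, 30]
monetary = [500.0, 50.0, 120.0, 80.0, 1000.0]

z_recency = z_scores(recency_days)
z_monetary = z_scores(monetary)
print([round(z, 2) for z in z_recency])
print([round(z, 2) for z in z_monetary])
```

After this step a difference of one unit means "one standard deviation" in every field, so no field dominates the distance calculation in clustering merely because its raw numbers are larger.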

Regarding the weights of the RFM variables, the literature shows two views. Arthur Hughes holds that the three RFM variables carry equal weight in measuring a customer, so he does not assign them different weights.

Based on an empirical analysis of credit card data, Bob Stone holds that the indicators should not be weighted equally: frequency should receive the highest weight, recency the next highest, and monetary value the lowest;

Here we use a simple weighting scheme of WR=2, WF=3, and WM=5 (in practice the weights should be determined by experts or marketers). The choice of clustering method and number of clusters requires repeated testing and evaluation, and the three methods should be compared to see which gives the better result!
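To make the weighting and clustering idea concrete, here is a small sketch that uses a hand-written K-means as a stand-in for the Modeler clustering nodes (it is not Kohonen or TwoStep). How the weights enter the model is not stated in the text; multiplying each standardized field by its weight before clustering is an assumption, and the six data points are invented.

```python
import random

WEIGHTS = (2, 3, 5)  # WR, WF, WM from the text

def weight_point(point, weights=WEIGHTS):
    """Scale each standardized field by its weight (assumed weighting scheme)."""
    return tuple(w * x for w, x in zip(weights, point))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    return min(range(len(centers)), key=lambda c: squared_distance(point, centers[c]))

def k_means(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    labels = [nearest(p, centers) for p in points]  # final assignment
    return labels, centers

# Hypothetical standardized (R, F, M) values for six customers.
standardized = [
    (-1.0, -1.0, -1.1), (-0.9, -1.1, -1.0), (-1.1, -0.9, -0.9),
    (1.0, 1.0, 1.1), (0.9, 1.1, 1.0), (1.1, 0.9, 0.9),
]
weighted = [weight_point(p) for p in standardized]
labels, centers = k_means(weighted, k=2)
print(labels)
```

Scaling a field by w multiplies its contribution to the squared distance by w squared, so with these weights differences in M matter far more than differences in R. Whether that matches the intended business emphasis is exactly the kind of question the text leaves to experts or marketers.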

The following figure shows the result of K-means (quick) clustering:

And the clustering result of the Kohonen neural network algorithm:

Next we need to interpret the meaning of the clustering results and analyze the clusters. Here we can use C5.0 rules to identify the characteristics of the different clusters:

Among them, the feature chart for TwoStep clustering:

We used the Analysis node to judge how well the C5.0 rules recognize the clusters:

The results are not bad. We can adopt each of the three clustering methods in turn, or choose the clustering result that is easiest to interpret. Here we choose the Kohonen clustering result and write it into the data set, so that we can import the data into SPSS for analysis of means and output it to Excel!

After outputting the results, import the data into Excel, compare each cluster's values for the three fields R, F, and M with the overall mean of each field, and use Excel's conditional formatting to show the trend relative to the mean! Then combine this with the RFM cube classification to identify customer types. Through RFM analysis the customer base is divided into six levels: important customers to maintain, important customers to develop, important customers to retain, generally important customers, general customers, and worthless customers (a given level may not exist);
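The comparison that Excel's conditional formatting performs here can be sketched in a few lines. The cluster labels and values below are invented, and "+" and "-" stand in for the up and down indicators that conditional formatting would show.

```python
from statistics import mean

# Hypothetical rows: (cluster_label, recency_days, frequency, monetary)
rows = [
    (0, 2, 12, 600.0),
    (0, 4, 10, 500.0),
    (1, 25, 2, 80.0),
    (1, 30, 1, 40.0),
    (2, 3, 2, 700.0),
    (2, 5, 3, 900.0),
]
FIELDS = ("R", "F", "M")

def cluster_trends(rows):
    """Mark each field of each cluster '+' if its mean exceeds the overall mean, else '-'."""
    overall = [mean(r[i + 1] for r in rows) for i in range(len(FIELDS))]
    trends = {}
    for label in sorted({r[0] for r in rows}):
        members = [r for r in rows if r[0] == label]
        marks = ""
        for i in range(len(FIELDS)):
            cluster_mean = mean(m[i + 1] for m in members)
            marks += "+" if cluster_mean > overall[i] else "-"
        trends[label] = marks
    return trends

trends = cluster_trends(rows)
for label, marks in trends.items():
    print(label, dict(zip(FIELDS, marks)))
```

Note that for R measured in days, "-" (below-average days since the last purchase) is the favorable direction. In this made-up data, cluster 0 is recent, frequent, and high-spending, while cluster 1 is the opposite; mapping such patterns onto the six customer levels remains a judgment call.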

Another option is to take the standardized scores of the three indicators R, F, and M, compute a weighted composite score for each cluster from the clustering results, and then rank the composite scores to identify the customer value level of each cluster;
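A minimal sketch of this composite scoring, reusing the weights WR=2, WF=3, and WM=5. The cluster names and values are hypothetical, and the sketch assumes the standardized R value has already been oriented so that higher means more recent (for example, the negated Z-score of days since the last purchase).

```python
from statistics import mean

WEIGHTS = {"R": 2, "F": 3, "M": 5}  # WR, WF, WM from the text

# Hypothetical standardized (R, F, M) values of cluster members.
# Assumption: R is oriented so that a higher value means a more recent purchase.
members = {
    "cluster_1": [(0.9, 1.2, 1.0), (1.1, 0.8, 1.2)],
    "cluster_2": [(-1.0, -0.9, -1.1), (-0.8, -1.1, -0.9)],
    "cluster_3": [(0.5, -0.6, 1.4), (0.7, -0.4, 1.6)],
}

def composite_score(points):
    """Weighted average of the cluster's mean R, F, and M values."""
    total_weight = sum(WEIGHTS.values())
    means = [mean(p[i] for p in points) for i in range(3)]
    return sum(w * m for w, m in zip(WEIGHTS.values(), means)) / total_weight

# Rank clusters from highest to lowest composite score.
ranking = sorted(members, key=lambda name: composite_score(members[name]), reverse=True)
for name in ranking:
    print(name, round(composite_score(members[name]), 2))
```

Dividing by the total weight only rescales the scores and does not change the ranking; it is included so the composite stays on the same scale as the standardized inputs.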

At this point, if we are satisfied with the customer analysis and segmentation obtained through the RFM model, the analysis can end here! If we also have a database of customer background information, we can use the clustering results and RFM scores as independent variables in other data mining modeling work!
