Label source
By source, labels can be broadly divided into three types:
① Labels added by the item's owner.
② Labels added by experts.
③ Labels added by ordinary users.
Owners usually label items when they publish them, while expert labeling is generally initiated by the platform, which organizes people to do the labeling. These two methods suit labels that describe the objective attributes of items. For example, PGC content publishers can choose whether their content belongs to the entertainment or military category; merchants listing goods on an e-commerce platform select attributes such as the color and size of clothes; and a music platform has dedicated staff label music with information such as the author, release time, and style.
User labels generally describe users' subjective feelings and impressions after consuming an item. For example, after reading an article on an information platform, users can mark whether the article was good or not; after listening to a song, users can label it as sad or quiet.
Label-based recommendation method
As mentioned above, tags can help us make better and more accurate recommendations; in essence, this approach combines tags with collaborative recommendation. The general ideas are as follows:
① If a user likes label A, recommend items with label B, which is similar to label A.
② If User A and User B have similar interest models, recommend to User A the items with label A that User B likes.
③ If the items User A likes contain label A, recommend other items containing label A.
④ If the items User A likes contain label A, recommend items with label B, which is similar to label A.
⑤ Combine the above methods, giving each method a different weight.
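As a rough illustration of method ⑤, the sketch below combines a direct tag match (method ③) with a similar-tag match (methods ① and ④) using fixed weights. All data, the similarity value, the function names, and the default weights `w_direct` and `w_similar` are hypothetical and chosen only for illustration; a real system would learn or tune them.

```python
from collections import defaultdict

# Hypothetical toy data: how often each user interacted with each tag,
# and which tags each item carries.
user_tags = {
    "user_a": {"rock": 3, "sad": 1},
    "user_b": {"rock": 2, "quiet": 2},
}
item_tags = {
    "song_1": {"rock"},
    "song_2": {"sad", "quiet"},
    "song_3": {"pop"},
}
# Hypothetical symmetric tag-to-tag similarity.
tag_similarity = {("sad", "quiet"): 0.8}


def similar_tags(tag):
    """Yield (other_tag, similarity) pairs for tags similar to `tag`."""
    for (t1, t2), sim in tag_similarity.items():
        if t1 == tag:
            yield t2, sim
        elif t2 == tag:
            yield t1, sim


def score_direct(user):
    """Method 3: score items by the tags the user already likes."""
    scores = defaultdict(float)
    for item, tags in item_tags.items():
        for tag in tags:
            scores[item] += user_tags[user].get(tag, 0)
    return scores


def score_similar(user):
    """Methods 1 and 4: score items carrying tags similar to the user's tags."""
    scores = defaultdict(float)
    for tag, count in user_tags[user].items():
        for other, sim in similar_tags(tag):
            for item, tags in item_tags.items():
                if other in tags:
                    scores[item] += count * sim
    return scores


def recommend(user, w_direct=0.7, w_similar=0.3, top_n=3):
    """Method 5: weighted combination of the two scores, best first."""
    direct, similar = score_direct(user), score_similar(user)
    combined = {
        item: w_direct * direct[item] + w_similar * similar[item]
        for item in item_tags
    }
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return [(item, score) for item, score in ranked if score > 0][:top_n]


print(recommend("user_a"))
```

With this toy data, `song_1` ranks first for `user_a` because of the direct "rock" match, and `song_2` follows thanks to its "sad" tag plus the similar "quiet" tag. Raising `w_similar` increases diversity at the cost of precision.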
Different usage scenarios call for different emphases, so the first step is to decide which method suits which scenario. For example, on a news content platform, reading news involves no purchase, and news readers need a certain degree of diversity. If the third method is adopted directly, the content will inevitably become monotonous and users will soon get bored. In other scenarios, such as labels that describe a particular group (for example, a "female" label), this method can be used. When determining the recommendation strategy, we should consider both the user group and the scenario in which recommendations are shown; even so, achieving the desired effect is a long-term optimization process. After an algorithm is adjusted, there is generally a period of data fluctuation lasting about 7 days. Following the principle of A/B testing (building an environment in which only one variable changes), a relatively accurate evaluation of the effect can be obtained by looking at the data after those 7 days.
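The evaluation step described above can be sketched roughly as follows: compare the click-through rate of each A/B group while ignoring the fluctuation period after launch. The record layout, the group names, the figures, and the choice of click-through rate as the metric are all assumptions made for illustration.

```python
from datetime import date, timedelta


def ctr_after_warmup(daily_stats, launch_day, warmup_days=7):
    """Aggregate click-through rate per group, skipping the warm-up period.

    `daily_stats` is a list of (day, group, impressions, clicks) tuples.
    """
    cutoff = launch_day + timedelta(days=warmup_days)
    totals = {}
    for day, group, impressions, clicks in daily_stats:
        if day < cutoff:
            continue  # data from the fluctuation period is not evaluated
        imp, clk = totals.get(group, (0, 0))
        totals[group] = (imp + impressions, clk + clicks)
    return {g: clk / imp for g, (imp, clk) in totals.items() if imp > 0}


# Hypothetical daily statistics for a control group and a treatment group.
launch = date(2024, 1, 1)
stats = [
    (date(2024, 1, 3), "control", 1000, 50),
    (date(2024, 1, 3), "treatment", 1000, 90),  # inside the warm-up period
    (date(2024, 1, 9), "control", 1000, 50),
    (date(2024, 1, 9), "treatment", 1000, 60),
]
print(ctr_after_warmup(stats, launch))
```

In this made-up data the treatment group looks much better during the first week than it does afterwards, which is the kind of early fluctuation the 7-day wait is meant to exclude.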
Optimization of labels
Besides adjusting the label recommendation strategy, optimizing the labels themselves is also an important way to improve label-based recommendation. We can improve the accuracy of labels and how clearly they express preferences in the following ways:
① Try to offer users labels that reflect their views on and preferences for items. For example, among a song's labels, singer, release year, and album are objective, while "quiet" and "sad" reflect users' views of the item. Collecting such subjective tags helps us build a more accurate user interest model.
② Improve the accuracy of tag-based interest. Modeling with raw tags gives popular tags very large weights, and the accuracy of the user interest model may decline for long-tail interests. TF-IDF can be used to reduce the weight of popular labels.
③ Expand tags according to tag similarity. Without tag similarity, the tags associated with User A are only the tags A has directly collected. Once tag similarity is computed, User A's set of favorite tags can also include similar tags.
④ Clean up useless labels. Removing stop words that have high word frequency and merging synonyms that differ only in wording improves the accuracy of labeling.
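Item ③ above can be sketched as follows. The text does not say how tag similarity is computed; this sketch assumes a cosine similarity over the sets of items on which two tags co-occur, which is one common choice. The data, the function names, and the `min_sim` threshold are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical data: the tags attached to each item.
item_tags = {
    "song_1": {"sad", "quiet"},
    "song_2": {"sad", "quiet", "piano"},
    "song_3": {"rock"},
    "song_4": {"sad"},
}


def tag_cooccurrence_similarity(item_tags):
    """Cosine similarity between tags, based on the items they share."""
    tag_items = defaultdict(set)
    for item, tags in item_tags.items():
        for tag in tags:
            tag_items[tag].add(item)
    sims = defaultdict(dict)
    for a in tag_items:
        for b in tag_items:
            if a == b:
                continue
            shared = len(tag_items[a] & tag_items[b])
            if shared:
                norm = math.sqrt(len(tag_items[a]) * len(tag_items[b]))
                sims[a][b] = shared / norm
    return sims


def expand_tags(user_tag_weights, sims, min_sim=0.5):
    """Add similar tags to a user's tag set, discounted by similarity."""
    expanded = dict(user_tag_weights)
    for tag, weight in user_tag_weights.items():
        for other, sim in sims.get(tag, {}).items():
            if sim >= min_sim and other not in user_tag_weights:
                expanded[other] = max(expanded.get(other, 0.0), weight * sim)
    return expanded


sims = tag_cooccurrence_similarity(item_tags)
print(expand_tags({"sad": 1.0}, sims))
```

Here a user who has only collected "sad" also picks up "quiet" and "piano" with reduced weights, while "rock", which never co-occurs with "sad", is left out.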
Note: TF-IDF: If a word or phrase appears frequently in one article but rarely in other articles, it is considered to have good discriminating power and to be suitable for classification. TF-IDF is simply TF × IDF, where TF is the term frequency and IDF is the inverse document frequency. A word that is frequent in a specific document but rare across the whole document collection receives a high TF-IDF weight. Therefore, TF-IDF tends to filter out common words and keep important ones.
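Applied to tags, one possible reading of the note above is to treat each user as a "document" and each tag as a "word". The sketch below uses the plain textbook form, TF = a tag's share of the user's tag uses and IDF = log(number of users / number of users who used the tag). The data and function name are hypothetical, and smoothed IDF variants are equally valid.

```python
import math

# Hypothetical data: how many times each user applied each tag.
user_tag_counts = {
    "user_a": {"pop": 5, "jazz": 2},
    "user_b": {"pop": 4, "rock": 1},
    "user_c": {"pop": 3, "jazz": 1, "folk": 2},
}


def tf_idf_weights(user_tag_counts):
    """Return per-user tag weights computed as TF * IDF."""
    n_users = len(user_tag_counts)

    # Number of users who used each tag (the "document frequency").
    users_per_tag = {}
    for counts in user_tag_counts.values():
        for tag in counts:
            users_per_tag[tag] = users_per_tag.get(tag, 0) + 1

    weights = {}
    for user, counts in user_tag_counts.items():
        total = sum(counts.values())
        weights[user] = {
            tag: (count / total) * math.log(n_users / users_per_tag[tag])
            for tag, count in counts.items()
        }
    return weights


weights = tf_idf_weights(user_tag_counts)
print(weights["user_a"])
```

Because every user in this toy data used "pop", its IDF is log(1) = 0 and its weight drops to zero, so the rarer "jazz" tag dominates `user_a`'s profile even though its raw count is lower. A smoothed IDF would keep a small non-zero weight for such universal tags.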
Link: /p/43a76f 1784da