In machine learning, a random forest is a classifier consisting of multiple decision trees: it trains each tree on the data and outputs the class that is the mode of the classes predicted by the individual trees. The algorithm was developed by Leo Breiman and Adele Cutler, and "Random Forests" is their registered trademark.
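For illustration, the majority vote over the trees can be written as a minimal sketch in Python, assuming a hypothetical list of already-trained trees that each expose a predict(sample) method returning a class label:

```python
from collections import Counter

def forest_predict(trees, sample):
    # Collect one class vote from each tree in the ensemble
    # (the trees and their predict() interface are assumed, not given here).
    votes = [tree.predict(sample) for tree in trees]
    # The forest outputs the mode (most common class) of the individual votes.
    return Counter(votes).most_common(1)[0][0]
```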
The term comes from the random decision forests proposed by Tin Kam Ho of Bell Labs in 1995.
The method combines Breiman's idea of bootstrap aggregating ("bagging") with Ho's random subspace method to build an ensemble of decision trees.
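As a usage illustration (not the original authors' own code), scikit-learn's RandomForestClassifier exposes both ideas: n_estimators sets the number of bagged trees and max_features the size of the random feature subset considered at each split. The dataset and parameter values below are arbitrary examples:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

clf = RandomForestClassifier(
    n_estimators=100,    # number of bootstrap-aggregated trees
    max_features="sqrt"  # random feature subset considered at each split
)
clf.fit(X, y)
print(clf.predict(X[:5]))
```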
Learning algorithm:
Each tree is built according to the following algorithm:
1. Let N be the number of training cases (samples) and M the number of features.
2. Choose m, the number of features used to determine the decision at a node of the tree; m should be much smaller than M.
3. Draw N samples with replacement from the N training cases to form a bootstrap sample, which serves as this tree's training set; the cases not drawn (the out-of-bag samples) are used to estimate the tree's prediction error.
4. At each node, randomly select m of the M features; the decision at that node is based on these m features, and the best split among them is computed.
5. Each tree is grown to its full depth and is not pruned (pruning may be applied when building an ordinary tree classifier). A code sketch of these steps follows the list.
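Below is a minimal sketch of building a single tree of the forest along these steps, assuming NumPy and scikit-learn's DecisionTreeClassifier are available; passing max_features=m makes the tree consider m randomly chosen features at each split, which approximates step 4:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_tree(X, y, m, rng):
    n = X.shape[0]                      # step 1: N training cases
    # Step 3: draw N samples with replacement (bootstrap sample).
    idx = rng.integers(0, n, size=n)
    oob_mask = np.ones(n, dtype=bool)
    oob_mask[idx] = False               # cases never drawn = out-of-bag

    # Steps 4-5: consider m random features per split, grow fully, no pruning.
    tree = DecisionTreeClassifier(max_features=m)
    tree.fit(X[idx], y[idx])

    # Estimate the error on the out-of-bag samples.
    oob_error = None
    if oob_mask.any():
        oob_error = 1.0 - tree.score(X[oob_mask], y[oob_mask])
    return tree, oob_error

# Example usage (hypothetical training data X_train, y_train):
# rng = np.random.default_rng(0)
# tree, oob_error = build_tree(X_train, y_train, m=2, rng=rng)
```

Repeating this procedure for each tree and combining the trees with the majority vote shown earlier yields the complete forest.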