Python or Node.js: which is better suited for writing web crawlers?
Simple, targeted crawling:

Python + urllib2 + regular expressions + bs4 (BeautifulSoup)

or

Node.js + co, plus any DOM framework or HTML parser + Request + regular expressions, is just as convenient.

For me, the two options above are roughly equivalent, but since I am more familiar with JS, I currently lean toward the Node platform.
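The Python option can be sketched roughly as below. The page content and the regular expression are invented for illustration, and note that `urllib2` is Python 2; in Python 3 the same call lives in `urllib.request`. A canned page is used here so the sketch runs without network access:

```python
import re

# In Python 2 you would fetch with urllib2.urlopen(url).read();
# in Python 3, urllib.request.urlopen(url).read(). A canned page
# stands in for the fetch step here.
SAMPLE_HTML = """
<html><body>
  <a href="/item/1">First item</a>
  <a href="/item/2">Second item</a>
  <a href="https://example.com/about">About</a>
</body></html>
"""

def extract_links(html):
    """Pull (href, text) pairs out of anchor tags with a regular expression.

    For anything beyond trivial pages, a real parser (bs4's BeautifulSoup)
    is more robust than a regex; this just mirrors the RegExp step above.
    """
    return re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)

links = extract_links(SAMPLE_HTML)
print(links)
```

In practice you would swap the canned string for a real fetch and, for messy real-world HTML, hand the response to BeautifulSoup instead of a regex.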

Crawling at whole-site scale:

Python + Scrapy

If the DIY spiders in the two schemes above are "millet plus rifles" (a Chinese idiom: rudimentary but serviceable weapons), then Scrapy is heavy artillery. It is extremely capable: customizable crawling rules, HTTP error handling, XPath, RPC, a pipeline mechanism, and so on. And because Scrapy is built on Twisted, it is also very efficient. The only relative drawback is that installation is troublesome and the dependencies are heavy; on a fairly fresh OS X install I could not install Scrapy directly with pip.
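The pipeline mechanism mentioned above passes each scraped item through a chain of processing stages in order. This is a stand-alone sketch of that idea in plain Python; the item fields and stage classes are invented for illustration and are not Scrapy's actual API (in real Scrapy, a stage signals rejection by raising `DropItem` rather than returning `None`):

```python
class StripWhitespace:
    """Normalize string fields, analogous to a cleaning pipeline stage."""
    def process_item(self, item):
        return {k: v.strip() if isinstance(v, str) else v
                for k, v in item.items()}

class DropMissingTitle:
    """Discard items without a title, analogous to a validation stage."""
    def process_item(self, item):
        if not item.get("title"):
            return None  # Scrapy would raise DropItem here
        return item

def run_pipelines(items, stages):
    """Feed each item through every stage in order; dropped items vanish."""
    for item in items:
        for stage in stages:
            item = stage.process_item(item)
            if item is None:
                break
        if item is not None:
            yield item

raw = [{"title": "  Hello  "}, {"title": ""}]
cleaned = list(run_pipelines(raw, [StripWhitespace(), DropMissingTitle()]))
print(cleaned)  # only the first item survives, whitespace stripped
```

The design point is that each stage does one job and knows nothing about the others, so crawling rules and post-processing stay decoupled.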

In addition, if you use XPath in your spider and install an XPath plug-in in Chrome, the parsing paths become clear at a glance and development efficiency is very high.
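To illustrate the XPath idea without installing Scrapy, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports a limited XPath subset (Scrapy's selectors, built on lxml, support full XPath 1.0). The document below is made up:

```python
import xml.etree.ElementTree as ET

DOC = """
<catalog>
  <book category="python"><title>Learning Scrapy</title></book>
  <book category="js"><title>Node Crawlers</title></book>
</catalog>
"""

root = ET.fromstring(DOC)

# ElementTree's XPath subset covers tag paths and attribute predicates.
titles = [b.find("title").text for b in root.findall(".//book")]
python_books = root.findall(".//book[@category='python']")

print(titles)
print([b.find("title").text for b in python_books])
```

The same path expressions you prototype in a browser's XPath plug-in can usually be pasted into a spider's selector with little or no change.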