What is a web crawler? What exactly do you want to learn?
Simply put, a crawler is a probing machine. Its basic operation is to imitate human behavior: it visits websites, clicks buttons, looks up data, or memorizes the information it sees. It is like a bug crawling tirelessly around a building.

You can think of each crawler as an avatar of yourself. It is like the Monkey King plucking a tuft of hair and blowing it into a troop of monkeys.
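
To make "visiting a page and memorizing what it sees" concrete, here is a minimal sketch using only Python's standard library. The sample page, the `LinkAndTextCollector` class and the `read_page` helper are made up for illustration. A real crawler would also download pages over the network and queue the links it finds, which this sketch deliberately leaves out.

```python
from html.parser import HTMLParser


class LinkAndTextCollector(HTMLParser):
    """Collects link targets and visible text, the way a crawler 'reads' a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Every <a href="..."> is a place the crawler could visit next.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep non-empty text so it can be stored or searched later.
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def read_page(html):
    """Return (links, text) found in one HTML document."""
    collector = LinkAndTextCollector()
    collector.feed(html)
    collector.close()
    return collector.links, collector.text


# Hypothetical page content, invented for this example.
SAMPLE_PAGE = """
<html><body>
  <h1>Train tickets</h1>
  <a href="/tickets/today">Today</a>
  <a href="/tickets/tomorrow">Tomorrow</a>
</body></html>
"""

links, text = read_page(SAMPLE_PAGE)
print(links)
print(text)
```

Run the same loop over millions of pages and you have the troop of monkeys.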

Baidu, which you use every day, relies on exactly this kind of crawler technology: every day it sends countless crawlers out to websites to fetch their information, and then lines it all up, neatly dressed, waiting for you to search.

Ticket-grabbing software works by sending out countless avatars, each of which keeps refreshing the train ticket listings on the 12306 website for you. As soon as one finds a ticket, it grabs it and shouts back to you: come and pay, big spender.

So how frightening is crawler technology once it is used for evil?

Just last weekend, my hacker friend Ryan mysteriously sent me a copy of a "China Crawler Picture Book". His main job is working overtime at Tencent Yunding Lab, and on the side he and his colleagues have developed a lot of "black technology". For example, they built a threat intelligence system that they claim can detect what crawlers around the world are doing.

I whistled as I opened the picture book, but a minute later I was thoroughly unsettled.

I saw another "parallel world":

On the networks around us there are many kinds of web crawlers. Some are good and some are evil, and each has its own agenda. And the closer something is to people's interests, the denser the crawlers.

In the end I realized that this is not a picture book of China's crawlers but a picture book of China's anxieties.

What we are going to talk about today is related to these apps.

A. Crawlers and their "slick moves"

Crawlers can be divided into good and evil.

Search engine crawlers such as Google's scan the whole web every few days so that everyone can find its pages, and most scanned websites are very happy about it. This species is defined as a "benign crawler".

A crawler like ticket-grabbing software, however, would happily hit 12306 tens of thousands of times per second, and Tie Zong (the railway operator) is never happy about it. This species is defined as a "malicious crawler". (Note that it does not matter how happy you are to get your ticket. If the scanned website is unhappy, the crawler is malicious.)
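
One common way a website tells crawlers what it is happy with is a `robots.txt` file, and a well-behaved crawler checks it before fetching anything. The sketch below uses Python's standard `urllib.robotparser`. The rules, the `ExampleBot` name and the example.com URLs are invented for illustration, and nothing here claims to reflect what 12306 or any real site actually publishes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /booking/
Crawl-delay: 10
"""


def make_policy(robots_txt):
    """Parse robots.txt text (no network access) into a queryable policy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = make_policy(ROBOTS_TXT)

# A benign crawler asks before every fetch and respects the answer.
news_allowed = policy.can_fetch("ExampleBot", "https://example.com/news/today")
booking_allowed = policy.can_fetch("ExampleBot", "https://example.com/booking/seat")
delay_seconds = policy.crawl_delay("ExampleBot")

print(news_allowed, booking_allowed, delay_seconds)
```

A malicious crawler simply never asks, which is why sites end up needing harder defenses than a polite text file.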

Let me show you a picture:

This picture shows how much each industry is "harassed" by crawlers. Note that it covers the whole world, not just China. Behind every color block is a real and powerful chain of interests.

Next, Brother Zhong will walk you through the slick moves behind it.

1. First place: travel.

The travel industry has the highest share of crawler traffic (20.87%). Among travel crawlers, 89.02% of the traffic is aimed at 12306. That is no surprise: it is the only seller of train tickets in China.

Do you still remember the Wang Luodan and Bai Baihe puzzle on 12306, the "most infuriating image CAPTCHA in history"?

These puzzles are not there to embarrass honest ticket buyers but to stop crawlers (that is, ticket-grabbing software) from clicking through. As I said just now, crawlers only click mechanically. They cannot recognize Bai Baihe, so a large number of them are kept out.

You may say: no way, I can still get tickets with ticket-grabbing software today.

That's right. The ticket-grabbing tools are no pushovers either. They fight back against Tie Zong.

There is something called a "CAPTCHA-solving platform" (literally a "coding platform"). It is worth getting to know.

Such a platform employs many "uncles and aunts" who do nothing but sit in front of a screen and identify verification codes for others. When ticket-grabbing software runs into a CAPTCHA, the system automatically forwards it to these workers, who manually pick which picture is Bai Baihe and which is Wang Luodan and then send the result back. The whole process takes only a few seconds.

Of course, such a platform also has a memory. If a worker has labeled a picture as a spatula, the system will directly answer "spatula" the next time that picture appears. Over time, every picture in the 12306 system gets labeled, the machine can answer on its own, and the uncles and aunts can sit back and play Dou Dizhu.
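
The "memory" described above is essentially a lookup table in front of the human workers. Here is a minimal sketch under a strong assumption: that a repeated picture arrives as exactly the same bytes, so a hash of the bytes can serve as the key. How real platforms match pictures is not known from this article. `LabelCache` and `fake_worker` are hypothetical names.

```python
import hashlib


class LabelCache:
    """Asks a human only the first time a given picture is seen."""

    def __init__(self, ask_human):
        self._ask_human = ask_human  # callable: image bytes -> label
        self._labels = {}            # hash of image bytes -> remembered label
        self.human_calls = 0         # how often a person actually had to look

    def label(self, image_bytes):
        # Assumption: identical pictures are byte-identical, so hashing works.
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._labels:
            self.human_calls += 1
            self._labels[key] = self._ask_human(image_bytes)
        return self._labels[key]


def fake_worker(image_bytes):
    # Hypothetical stand-in for a person looking at the picture.
    return {b"img-spatula": "spatula", b"img-kettle": "kettle"}[image_bytes]


cache = LabelCache(fake_worker)
answers = [
    cache.label(b"img-spatula"),
    cache.label(b"img-kettle"),
    cache.label(b"img-spatula"),  # repeat: answered from memory
]
print(answers, cache.human_calls)
```

Three pictures were answered, but a person only had to look twice. The larger the remembered set grows, the less the humans are needed.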

You may ask: why is 12306 so stingy? Would it kill it to generously let crawlers crawl freely?

Answer: yes, it really would.

Do you know what kind of traffic 12306 sees every year before Chinese New Year? According to public data, "on the peak day, page views reached 81.34 billion, the highest hourly hit count was 5.93 billion, and the average was 1.648 million hits per second." And that is the traffic that got through with the CAPTCHA in place. You can imagine how many crawlers were blocked outside.
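
As a quick sanity check on those figures, the per-second number follows from the hourly one. Using the rounded inputs quoted above, the result is about 1.647 million per second, which matches the quoted 1.648 million up to rounding (presumably the original was computed from unrounded data). The variable names are just for illustration.

```python
# Figures as quoted in the article (rounded).
peak_day_page_views = 81.34e9   # page views on the peak day
peak_hour_hits = 5.93e9         # hits in the busiest hour
seconds_per_hour = 60 * 60

# Average rate during the busiest hour.
hits_per_second = peak_hour_hits / seconds_per_hour

# Share of the whole day's traffic that fell into that single hour.
peak_hour_share = peak_hour_hits / peak_day_page_views

print(f"{hits_per_second / 1e6:.3f} million hits per second")
print(f"{peak_hour_share:.1%} of the day's traffic in one hour")
```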

And that is before we even discuss whether it is fair that people like our parents, who do not know how to use ticket-grabbing software, lose their tickets to it.

The railway being "singled out" by crawlers is bad enough, but it has a fellow sufferer: aviation.

In aviation, the hardest hit is not Air China, Hainan Airlines or China Eastern Airlines, but AirAsia.

Distribution of crawler traffic among airlines.

Many people may never have flown AirAsia. It is a Malaysian low-cost airline that mostly flies from cities across China to tourist destinations in Southeast Asia. Even bottled water on board has to be bought at your own expense, which makes it the first choice for broke "diaosi" holidaymakers.

Why do crawlers like AirAsia so much? Because it is cheap. More precisely, because it often releases cheap tickets.

AirAsia's original intention was to release some cheap tickets at random to attract travelers, but scalpers saw a profit in it.

As far as I know, this is how they play it:

The tech-savvy scalper uses crawlers to keep refreshing AirAsia's booking interface. As soon as a cheap ticket appears, the crawler reserves it first, no questions asked.

AirAsia has a rule that if you reserve a ticket and do not pay within half an hour (I cannot remember the exact time), the ticket automatically returns to the pool to be sold again. But the scalper writes that exact time into the crawler script. At the half-hour mark, not a millisecond late, the script reserves the ticket again, and so on in a loop. Only when someone orders the ticket from the scalper does the scalper's program release it in the AirAsia system and then, 0.00001 seconds later, book it again in the buyer's name.

"I am the middleman, and I want to pocket the price difference!" This slick move is close to perfect.

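To see why a fixed hold window is so easy to abuse, and what trace the abuse leaves behind, here is a toy simulation with a fake clock. Everything in it is an assumption made for illustration: the 30-minute window (the article itself is unsure of the exact time), the `Seat` class, the account name, and a customer who checks every 7 minutes. It models no real booking system.

```python
HOLD_SECONDS = 30 * 60  # assumed hold window; the real value is not known


class Seat:
    """One cheap seat that can be reserved ("held") without paying."""

    def __init__(self):
        self.held_by = None
        self.hold_expires_at = 0
        self.holds_per_account = {}  # the trace a site operator could inspect

    def is_available(self, now):
        return self.held_by is None or now >= self.hold_expires_at

    def hold(self, account, now):
        if not self.is_available(now):
            return False
        self.held_by = account
        self.hold_expires_at = now + HOLD_SECONDS
        self.holds_per_account[account] = self.holds_per_account.get(account, 0) + 1
        return True


def simulate(hours, with_scalper):
    """Return (times a normal customer saw the seat free, holds per account)."""
    seat = Seat()
    times_seen_available = 0
    for minute in range(hours * 60):
        now = minute * 60
        if with_scalper:
            # The script retries constantly, so it wins the instant a hold expires.
            seat.hold("scalper-script", now)
        # A normal customer looks at the page every 7 minutes.
        if minute % 7 == 0 and seat.is_available(now):
            times_seen_available += 1
    return times_seen_available, seat.holds_per_account


print(simulate(3, with_scalper=False))
print(simulate(3, with_scalper=True))
```

Without the script, the customer finds the seat free on every visit. With it, the customer never does, while one account piles up hold after hold without ever paying. That repeated-hold count is exactly the kind of signal a site could watch for.
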
2. Second place: social.

The hardest-hit area for social crawlers is Weibo, which you all know and love.

Let me show you a picture:

These are the Weibo addresses that crawlers visit most often.