After 17 years of hard work, Knight Capital had become one of the leading trading firms on Wall Street. Then a disaster destroyed it in less than an hour.
August 1st, 2012 was a nightmare for Knight Capital. A seemingly simple but hard-to-find human error wiped out 17 years of hard work. A software error cost the company $440 million in direct trading losses within one hour. As a result, Knight Capital was acquired by its competitor Getco LLC the following summer.
Knight Capital Group, established in 1995, was a well-known securities firm on Wall Street. At its peak, its trading volume accounted for 17.3% of NYSE volume and 16.9% of NASDAQ volume.
In addition to its normal brokerage business, the company also provided customers with a trading platform for high-frequency trading. This electronic trading platform, driven by embedded quantitative models, not only gave customers decision support based on market data and related information, but also helped them place orders automatically and at high speed through the exchange interface (the automatic order placing interface has been closed in China).
The highly automated quantitative trading system was the basic platform that let Knight Capital serve customers efficiently and with high quality. Simply put, the software automatically executes "buy" and "sell" transactions on behalf of customers according to the trading rules they have set. The trading system is highly intelligent: it splits a large order into small orders so that individual stocks do not suffer large price swings.
First, one term needs explaining: the "dark pool". Suppose I hold 2 million shares of stock A and want to sell them in one day. If I sell them directly at the market price, the share price will certainly fall because of the heavy selling. For a trade like this, concealment matters more than speed. If I post the order through ordinary trading software, however, public investors can see the huge order, and the final transaction price will certainly be lower than the price I posted. To reduce this negative impact, I send the broker an instruction to "sell the block dark". This instruction tells the broker to send the order to a number of hidden counterparties for execution. This "dark" strategy gives brokers a crucial advantage in anonymity and market impact, similar to "non-displayed" block trades on the exchange floor. It helps stabilize stock prices and promotes liquidity.
In October 2011, the New York Stock Exchange asked brokers to support the Retail Liquidity Program (RLP), which is the dark pool function.
In early July 2012, the NYSE received approval from the US Securities and Exchange Commission to offer the RLP function, and announced that it would be launched on August 1, 2012.
That left brokers only about 30 days to debug their systems and go live.
Most of Knight Capital's clients were securities brokers, and many financial services giants worked with it. Market pressure meant the company could not afford to stay out of this new trading market.
Knight Capital's software development team had only one month to develop, test and go live, and they worked flat out. The core trading module they needed to modify was called SMARS (Smart Market Access Routing System).
SMARS executed thousands of orders every second and could compare prices across dozens of different exchanges within a few milliseconds. It received orders from upstream users, split them, and sent them to the exchanges for matching.
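The splitting step can be pictured with a minimal Python sketch. The function name `split_parent_order` and the fixed maximum sub-order size are hypothetical; the article does not describe how SMARS actually sized or routed its sub-orders.

```python
def split_parent_order(total_shares: int, max_sub_order_size: int) -> list[int]:
    """Split a parent order into sub-order sizes that sum to total_shares.

    Illustrative only: a real router would also choose venues and prices.
    """
    if total_shares < 0 or max_sub_order_size <= 0:
        raise ValueError("total_shares must be >= 0 and max_sub_order_size > 0")
    full_chunks, remainder = divmod(total_shares, max_sub_order_size)
    sizes = [max_sub_order_size] * full_chunks
    if remainder:
        sizes.append(remainder)  # the last sub-order carries what is left
    return sizes


sub_orders = split_parent_order(2_000_000, 300_000)
# The sub-order sizes always add up to the parent quantity.
assert sum(sub_orders) == 2_000_000
```

The invariant that matters is the final one: however the order is split, the sub-orders together should never exceed the parent quantity.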
However, the trading system still contained refactoring and test code left over from earlier version upgrades. One order algorithm, named "Power Peg", was a test program written by an engineer at the time, and its instructions implemented a test strategy of buying high and selling low. Keeping such "dead code" in place is very common in large systems.
Whether through a documentation error or an engineer's mistake, the activation flag used by the new RLP code in this upgrade was the same flag bit that had once switched on the "Power Peg" algorithm. Once that flag was set after the upgrade, any server still running the old code would activate the buy-high, sell-low algorithm.
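The flag collision can be pictured with a small Python sketch. The flag name `SHARED_FLAG` and the function `handle_order` are hypothetical stand-ins, not identifiers from the real SMARS code; the sketch only assumes, as described above, that old and new builds interpret the same flag differently.

```python
def handle_order(flags: set[str], has_new_rlp_code: bool) -> str:
    """Return the code path a server takes for an incoming order."""
    if "SHARED_FLAG" in flags:
        # New build: the flag means "handle this as an RLP order".
        # Old build: the same flag still wakes up the dormant Power Peg logic.
        return "rlp" if has_new_rlp_code else "power_peg"
    return "normal"


# The same order, carrying the same flag, behaves differently per build.
upgraded = handle_order({"SHARED_FLAG"}, has_new_rlp_code=True)
stale = handle_order({"SHARED_FLAG"}, has_new_rlp_code=False)
assert (upgraded, stale) == ("rlp", "power_peg")
```

Giving the new feature a fresh flag, or deleting the dead code before repurposing its flag, would remove the ambiguity.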
As August 1st approached, engineers manually deployed the new RLP code in SMARS to its eight servers, one week before going live. At this point an engineer made a fatal mistake and failed to copy the new code to one of the servers. The upgrade process had no review step, no automated alert to flag the discrepancy, and no regression testing. The release went live in a hurry.
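A post-deployment check of the kind that was missing can be sketched in a few lines of Python. The function `find_stale_servers` and the server names are hypothetical, and the artifacts are passed in as in-memory bytes to keep the sketch self-contained; a real check would read the deployed files from each host.

```python
import hashlib


def find_stale_servers(deployed: dict[str, bytes], release: bytes) -> list[str]:
    """Return the names of servers whose deployed artifact differs from the release."""
    expected = hashlib.sha256(release).hexdigest()
    return sorted(
        name
        for name, artifact in deployed.items()
        if hashlib.sha256(artifact).hexdigest() != expected
    )


# Seven servers received the new build; the eighth was missed.
servers = {f"server{i}": b"new-rlp-build" for i in range(1, 8)}
servers["server8"] = b"old-build-with-power-peg"
stale = find_stale_servers(servers, b"new-rlp-build")
assert stale == ["server8"]
```

Run as a gate before the market opens, a non-empty result would block the release instead of relying on someone noticing the difference by hand.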
On August 1st, at 8:01 am Eastern Time, an internal system named BNET automatically generated 97 alert emails and sent them to Knight Capital's engineers. But these automated emails did not get the staff's attention, and Knight Capital missed its last chance to repair the system.
At 9:30 am, the New York Stock Exchange opened for trading. The trading system began to receive RLP orders from brokers, and SMARS distributed the incoming orders to its servers. The seven servers with the new RLP code processed the orders correctly. On the eighth server, however, the orders triggered the defective Power Peg code, activated by the reused flag. That server began sending sub-orders continuously for each incoming parent order, regardless of how many executions Knight had already had confirmed by the trading venues.
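The difference between a router that tracks confirmed executions and one that does not can be shown with a small simulation. This is a sketch under simplifying assumptions: every sub-order fills completely, and the names `next_sub_order_quantity` and `work_parent_order` are hypothetical. The `max_sub_orders` cap exists only so the broken variant terminates.

```python
def next_sub_order_quantity(parent_qty: int, filled_qty: int, sub_order_size: int) -> int:
    """Shares the next sub-order should carry; 0 means the parent is done."""
    remaining = parent_qty - filled_qty
    if remaining <= 0:
        return 0
    return min(sub_order_size, remaining)


def work_parent_order(parent_qty: int, sub_order_size: int,
                      track_fills: bool = True, max_sub_orders: int = 10) -> int:
    """Simulate working one parent order and return the total shares sent."""
    filled = 0
    sent = 0
    for _ in range(max_sub_orders):
        # The defective path never sees its own fills, so it never stops.
        seen_fills = filled if track_fills else 0
        qty = next_sub_order_quantity(parent_qty, seen_fills, sub_order_size)
        if qty == 0:
            break
        sent += qty
        filled += qty  # assumption: each sub-order fills completely
    return sent


assert work_parent_order(1000, 300) == 1000                      # stops at the parent size
assert work_parent_order(1000, 300, track_fills=False) == 3000   # runs until the cap
```

With fill tracking the router sends exactly the parent quantity; without it, the only limit is whatever external cap happens to exist.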
The disastrous trading had begun. For the 212 incoming parent orders processed by the defective Power Peg code, SMARS sent thousands of sub-orders every second, buying high and selling low. In about 45 minutes this produced 4 million executions in 154 stocks, totaling more than 397 million shares. For 75 of those stocks, Knight Capital's executions moved prices by more than 5% and accounted for more than 20% of trading volume; for 37 of them, prices moved by more than 10% and Knight Capital accounted for more than 50% of trading volume.
At 9:34, a computer analyst at the New York Stock Exchange noticed that market trading volume was twice the normal level, traced the surge back to Knight Capital, and immediately informed its chief information officer.
Knight Capital quickly called in its top IT staff, but it still took about 20 minutes to find the cause of the problem.
At 9:50, the NYSE's circuit breakers were triggered and trading in multiple stocks was automatically suspended.
At 9:58, Knight Capital's engineers identified the root cause and shut down SMARS on all servers. However, the damage had been done. Knight had executed more than 4 million trades in 154 stocks, totaling more than 397 million shares; it held a net long position of about $3.5 billion in 80 stocks and a net short position of about $3.15 billion in 74 stocks.
The incident left Knight Capital in shock. Its share price dropped from $10.33 on August 1st to $2.58. Moreover, heavyweight customers TD Ameritrade, Pioneer Fund and Wells Fargo Fund all announced that they would stop sending orders to Knight Capital. According to figures compiled afterwards, Knight Capital had bought about $7 billion worth of stock within one hour of trading. Under securities settlement rules, Knight Capital had to pay that $7 billion within three days. Of course, it could not afford to.
Knight Capital asked for these trades to be cancelled, but the chairman of the US Securities and Exchange Commission, following the rules, allowed the trades in only six stocks to be cancelled and refused the rest.
If Knight Capital had dumped the stock at lower prices the next day, the market might have plunged again. To stabilize the market, Goldman Sachs agreed to buy the positions accumulated by the faulty software at a discount, which cost Knight Capital about $440 million.
A week later, Knight Capital received $400 million in capital support. The following summer, it was acquired by a competitor.
Years later, when the former CEO of Knight Capital was interviewed, he still maintained that Knight Capital was not a technology company but a brokerage firm that used technology. Clearly, he regarded technology as a supporting function rather than the company's core competence.
For complex systems in which many subsystems are coupled together, the strategies for reducing catastrophic failures are as follows:
1. Awareness
Catastrophic failures do not come from outside; they come from internal technical faults or from the wrong combination of our own components.
2. Tools and methods
If Knight Capital had strictly followed modern software development and operations practices, the incident might not have happened. Examples include version control, unit tests, code review, automated testing, automated deployment, a distributed deployment process, and risk management.
3. Time management
The schedule was another reason Knight Capital failed to deliver its RLP solution. IT project managers and CIOs should push back on overly aggressive delivery plans and confront their business leaders with phased alternatives. Spending only 30 days to implement, test and deploy a major change to an algorithmic trading system that makes markets in billions of dollars of securities every day is impulsive, naive and reckless.
4. Encourage different opinions
The warnings were issued but ignored, so the last hour in which the mistake could have been found was wasted. Companies must build effective reward mechanisms that encourage people to voice dissenting opinions.
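The risk management mentioned under "Tools and methods" can be made concrete with a minimal pre-trade guard. This is a sketch, not a description of any control Knight Capital had: the class name `ExposureGuard`, its limit, and the dollar figures are all hypothetical.

```python
class ExposureGuard:
    """Reject orders once cumulative gross exposure would exceed a fixed limit."""

    def __init__(self, max_gross_exposure: float) -> None:
        self.max_gross_exposure = max_gross_exposure
        self.gross_exposure = 0.0
        self.halted = False

    def allow(self, order_value: float) -> bool:
        """Return True if the order may be sent; latch into a halt on a breach."""
        if self.halted:
            return False
        if self.gross_exposure + abs(order_value) > self.max_gross_exposure:
            self.halted = True  # kill switch: stays off until a human resets it
            return False
        self.gross_exposure += abs(order_value)
        return True


guard = ExposureGuard(max_gross_exposure=1_000.0)
decisions = [guard.allow(600.0), guard.allow(-500.0), guard.allow(10.0)]
assert decisions == [True, False, False]
```

The guard counts buys and sells alike toward gross exposure and, once tripped, refuses even small orders, so a runaway algorithm cannot resume on its own.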
One year later, on October 16th, 2013, the US Securities and Exchange Commission fined Knight Capital $12 million over the trading violations of August 1st.