A brief analysis of the working principle of the CPU
A complete microcomputer system consists of two parts: a hardware system and a software system. Computer hardware refers to the physical devices that make up a computer; these tangible components are the material basis on which the computer works. The most important component of a computer hardware system is the central processing unit (CPU).
(1) Basic concepts and composition of the CPU
The central processing unit (CPU) is the core of the computer system and consists mainly of two parts: the arithmetic unit and the controller. If a computer is compared to a person, the CPU is its heart, which shows how important it is. Internally, the CPU can be divided into three parts: the control unit, the logic unit and the storage unit. These three parts work together so that the CPU can analyze, judge and calculate, and can coordinate the work of all the other parts of the computer.
Every action that takes place in the computer is controlled by the CPU. The arithmetic unit mainly performs arithmetic operations (such as addition, subtraction, multiplication and division) and logical operations (such as logical addition, that is OR; logical multiplication, that is AND; and NOT). The controller has no arithmetic function. It only reads instructions, analyzes them and issues the corresponding control signals. The CPU also usually contains several registers, which can take part in operations directly and hold intermediate results.
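The following minimal Python sketch shows a controller driving an arithmetic unit, which makes the division of labor concrete. The instruction format `(opcode, dest, src1, src2)`, the opcode names, the register names and the 8-bit word are all hypothetical choices for this illustration. They are not features of any real X86 CPU.

```python
WORD_MASK = 0xFF  # assumption: an 8-bit word, so results wrap around at 256


def alu(op, a, b=0):
    """Arithmetic unit: arithmetic and logical operations on 8-bit values."""
    if op == "ADD":
        return (a + b) & WORD_MASK
    if op == "SUB":
        return (a - b) & WORD_MASK
    if op == "MUL":
        return (a * b) & WORD_MASK
    if op == "DIV":
        return a // b              # integer division
    if op == "OR":                 # "logical addition"
        return a | b
    if op == "AND":                # "logical multiplication"
        return a & b
    if op == "NOT":
        return ~a & WORD_MASK
    raise ValueError(f"unknown opcode: {op}")


def run(program, registers):
    """Controller: fetch each instruction, decode it and drive the ALU.

    The controller itself does no arithmetic; it only reads instructions,
    selects operands from the registers and stores the ALU result.
    """
    for op, dest, src1, src2 in program:           # fetch + decode
        a = registers[src1]
        b = registers[src2] if src2 is not None else 0
        registers[dest] = alu(op, a, b)            # execute, keep result in a register
    return registers


# Hypothetical program: r2 = r0 * r1, then r3 = NOT r2
result = run([("MUL", "r2", "r0", "r1"), ("NOT", "r3", "r2", None)],
             {"r0": 6, "r1": 7, "r2": 0, "r3": 0})
print(result)  # {'r0': 6, 'r1': 7, 'r2': 42, 'r3': 213}
```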
The CPUs we usually talk about are X86-series and compatible CPUs. The i8088 (a simplified version of the i8086), which was the CPU in the first IBM PC, already used X86 instructions. The X87-series math coprocessors were added to computers to improve floating-point performance, and they use additional X87 instructions. From here on, the X86 and X87 instruction sets are referred to collectively as the X86 instruction set. As CPU technology advanced, Intel successively developed the i80386, the i80486 and today's Pentium III series. However, every CPU Intel produces continues to use the X86 instruction set. This lets computers keep running applications developed in the past, which protects and carries forward a rich base of software. Besides Intel, manufacturers such as AMD and Cyrix have also produced CPUs that use the X86 instruction set. Because these CPUs can run all the software developed for Intel CPUs, the computer industry classifies them as Intel-compatible CPUs. Since Intel's X86 series and the compatible CPUs all use the X86 instruction set, the huge lineup of X86-series and compatible CPUs we see today has taken shape.
(2) Main technical parameters of the CPU
The quality of the CPU directly determines the class of a computer system, and the CPU's main technical parameters reflect its general performance.
1. Word length
The number of binary bits a CPU can process at once is one of its most important quality indicators. When people speak of 16-bit or 32-bit computers, they mean that the microcomputer's CPU can process 16 or 32 bits of binary data at a time. The early IBM PC/XT, IBM PC/AT and 286 machines are 16-bit machines, and 386 and 486 machines are 32-bit machines. The 586 (Pentium) has 32-bit registers but a 64-bit external data bus, which is why the 586 machine is often described as a 64-bit high-end microcomputer.
CPUs can be classified by the word length of the data they process as 8-bit, 16-bit, 32-bit and 64-bit microprocessors.
Bit: Digital circuits and computers use binary, whose only digits are "0" and "1". Each "0" or "1" handled by the CPU is one bit.
Byte and word length: In computer technology, the number of binary digits a CPU can process at one time is called its word length. A CPU that processes data 8 bits wide is therefore called an 8-bit CPU. Likewise, a 32-bit CPU processes 32 bits of binary data at a time.
Each commonly used English character can be represented by 8 binary bits, so 8 bits are usually called a byte. The length of a byte is fixed, whereas word length differs from one CPU to another. An 8-bit CPU can process only one byte at a time, and a 32-bit CPU can process 4 bytes at a time. Likewise, a 64-bit CPU can process 8 bytes at a time.
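A few lines of Python can check the relationship between word length and bytes. This sketch assumes the word length is a whole number of 8-bit bytes. The function name `word_info` is hypothetical.

```python
def word_info(word_bits):
    """Bytes per word and largest unsigned value for a given CPU word length."""
    if word_bits % 8 != 0:
        raise ValueError("word length is assumed to be a multiple of 8 bits")
    bytes_per_word = word_bits // 8          # a byte is always 8 bits
    max_unsigned = (1 << word_bits) - 1      # largest value that fits in one word
    return bytes_per_word, max_unsigned


for bits in (8, 16, 32, 64):
    print(bits, word_info(bits))
```

An 8-bit word holds one byte (values up to 255), a 32-bit word holds 4 bytes and a 64-bit word holds 8 bytes, which matches the text.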
2. CPU external clock
The external clock is the CPU bus frequency listed in common specification tables. It is the base clock that the motherboard supplies to the CPU, and the CPU's core (main) frequency is obtained by multiplying the external clock by the multiplier. In the Pentium era the external clock was generally 60/66MHz, and starting with the Pentium II 350 it rose to 100MHz. The CPU bus frequency normally equals the memory bus frequency, so raising the external clock also speeds up data exchange with memory. This does much to improve the overall speed of the computer.
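The multiplier relationship is simple arithmetic. In this sketch the function names are hypothetical. The Pentium II 350 runs at 100MHz × 3.5 and the Pentium II 450 at 100MHz × 4.5, which are the standard configurations for those chips.

```python
def core_frequency_mhz(external_mhz, multiplier):
    """Core (main) frequency = external clock x multiplier."""
    return external_mhz * multiplier


def required_multiplier(core_mhz, external_mhz):
    """Multiplier needed to reach a given core frequency from an external clock."""
    return core_mhz / external_mhz


print(core_frequency_mhz(100, 3.5))   # 350.0, the Pentium II 350
print(required_multiplier(450, 100))  # 4.5, the Pentium II 450
```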
3. Front-side bus (FSB) frequency
The front-side bus is what was previously called the CPU bus. On current motherboards the front-side bus frequency equals the memory bus frequency. It is therefore also the working clock for data exchange among the CPU, memory and L2 Cache (the L2 Cache applies to Socket 7 motherboards only). The maximum data-transfer bandwidth depends on both the width of the data transferred at once and the transfer frequency: data bandwidth = (bus frequency × data width) / 8. For example, Intel's PII 333 uses a 66MHz front-side bus, so its peak bandwidth for data exchange with memory is 528MB/s = (66×64)/8. The PII 350 uses a 100MHz front-side bus, which gives a peak of 800MB/s = (100×64)/8. The front-side bus speed therefore governs how fast the CPU exchanges data with memory (and L2 Cache) while the computer runs, and so it affects the computer's overall speed. For this reason Intel is now moving the front-side bus of its PIII from 100MHz to 133MHz. AMD's newly launched K7 is rated at a 200MHz front-side bus. However, published data show that the clock for data exchange between the K7 core and memory is still 100MHz, and its core frequency is likewise a multiple of 100MHz.
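The bandwidth formula above can be written as a one-line function. The sketch below (function name hypothetical) reproduces the PII 333 and PII 350 figures and the 133MHz target. Like the text, it treats 1MB as 10^6 bytes and uses the nominal 66MHz rather than the exact 66.67MHz.

```python
def peak_bandwidth_mb_s(bus_mhz, width_bits=64):
    """Peak bandwidth in MB/s = bus frequency (MHz) x data width (bits) / 8."""
    return bus_mhz * width_bits / 8


for name, bus_mhz in (("PII 333", 66), ("PII 350", 100), ("133MHz bus", 133)):
    print(name, peak_bandwidth_mb_s(bus_mhz))
# PII 333 528.0
# PII 350 800.0
# 133MHz bus 1064.0
```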
4. CPU main frequency
The CPU main frequency, also called the operating frequency, is the actual clock frequency at which the CPU core circuits (the integer and floating-point units) run. Before the 486DX2, the main frequency was equal to the external clock. Starting with the 486DX2, essentially every CPU's main frequency equals the external clock multiplied by a multiplier. Because the main frequency is the clock of the CPU core, it directly affects the CPU's computing speed.
A Pentium can execute up to two instructions in one clock cycle. A Pentium running at 100MHz can therefore in principle execute 200 million instructions per second, and one running at 200MHz can execute 400 million. Other things being equal, the higher the CPU frequency, the faster the computer runs.
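The same estimate can be expressed in code: peak instruction rate = clock frequency × instructions per cycle. This is an idealized upper bound, and the name `peak_instructions_per_second` is hypothetical. Real programs rarely keep both Pentium pipelines busy on every cycle.

```python
def peak_instructions_per_second(clock_mhz, instructions_per_cycle=2):
    """Upper bound: every cycle completes `instructions_per_cycle` instructions."""
    return clock_mhz * 1_000_000 * instructions_per_cycle


print(peak_instructions_per_second(100))  # 200000000
print(peak_instructions_per_second(200))  # 400000000
```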
Note that Cyrix CPUs are labeled with a PR (Performance Rating) value. This value indicates that the CPU's performance is equivalent to that of an Intel CPU at a given main frequency, and it does not match the chip's actual clock. For example, the MⅡ-300 actually runs at 233MHz (66MHz × 3.5), yet its PR rating is 300MHz, meaning it is claimed to be equivalent to Intel's PⅡ-300. In fact, only its Business Winstone score (integer performance) is comparable to that of the PⅡ-300.
5. The capacity and speed of L1 and L2 Cache
The capacity and speed of the L1 and L2 Cache play a key role in computer speed. The L2 Cache in particular significantly speeds up business software that does a lot of 2D graphics processing.
L2 Cache was first introduced in the 486 era. It makes up for the limited capacity of the L1 (first-level) Cache and so minimizes the delay that slow main memory imposes on the CPU.
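One common way to quantify the delay an L2 Cache hides is the average memory access time. This is the hit time plus the miss rate multiplied by the cost of a miss. The sketch below applies this textbook model to a one-level and a two-level cache. All cycle counts and miss rates are hypothetical figures chosen for illustration. They are not measurements of any real CPU.

```python
def average_access_cycles(l1_hit, l1_miss_rate, memory_latency,
                          l2_hit=None, l2_miss_rate=None):
    """Average cycles per memory access for a one- or two-level cache."""
    if l2_hit is None:
        # every L1 miss goes straight to main memory
        return l1_hit + l1_miss_rate * memory_latency
    # L1 misses are served by L2; only L2 misses reach main memory
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory_latency)


# Hypothetical figures: 1-cycle L1 hit, 10% L1 misses, 100-cycle main memory,
# 10-cycle L2 hit, 20% of L1 misses also miss in L2
without_l2 = average_access_cycles(1, 0.10, 100)
with_l2 = average_access_cycles(1, 0.10, 100, l2_hit=10, l2_miss_rate=0.20)
print(without_l2, with_l2)
```

Under these assumed numbers, adding an L2 Cache cuts the average access from about 11 cycles to about 4 cycles, because most L1 misses no longer wait for main memory.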
A CPU's L2 Cache can be internal or external. L2 Cache on the CPU die runs at the full main frequency. The PII, however, places its L2 Cache outside the CPU chip, where it generally runs at half the main frequency and so is less efficient than on-chip L2 Cache. The Celeron has only 128KB of on-chip L2 Cache, while the PII has 512KB of off-chip L2 Cache clocked at half the main frequency. This is an important reason why the Celeron can nearly match, or even outperform, a PII at the same main frequency.
(3) A brief analysis of the main technical terms of the CPU
1. Pipeline technology
Intel first adopted pipelining in the 486 chip. A pipeline works like an assembly line in a factory. In the CPU, an instruction-processing pipeline is built from 5 or 6 circuit units with different functions. Each X86 instruction is split into 5 or 6 steps that these units execute in turn, so that one instruction can be completed per CPU clock cycle, which raises the CPU's computing speed. The 486 CPU has only one pipeline. Its five circuit units (instruction fetch, decode, address generation, execution and write-back) work simultaneously on five different instructions, each at a different one of the five steps. This achieves the 486 designers' goal of completing one instruction per clock cycle. In the author's view, the CPU actually reaches this rate only from the fifth clock cycle onward. In the Pentium era, designers gave the CPU two pipelines with independent circuit units, so that the CPU can execute two instructions at once. In theory, it can therefore complete two instructions per clock cycle.
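The timing described above can be modeled for an ideal pipeline with no stalls. The first instruction needs one cycle per stage, and after that each pipeline completes one instruction per cycle. This Python sketch is a simplification under that assumption. Real 486 and Pentium pipelines do stall, and the Pentium's two pipes cannot pair every instruction. The stage names follow the five units listed in the text, and the function names are hypothetical.

```python
import math

STAGES = ("fetch", "decode", "address", "execute", "write-back")


def pipeline_cycles(n_instructions, stages=len(STAGES), pipelines=1):
    """Cycles to finish n instructions on ideal, stall-free pipelines."""
    if n_instructions == 0:
        return 0
    # the first group fills every stage; each later group finishes one cycle apart
    groups = math.ceil(n_instructions / pipelines)
    return stages + groups - 1


def schedule(n_instructions, stages=STAGES):
    """For each cycle, which instruction number sits in each stage (one pipeline)."""
    table = []
    for cycle in range(len(stages) + n_instructions - 1):
        row = {}
        for position, name in enumerate(stages):
            instr = cycle - position       # instruction i enters stage s at cycle i + s
            if 0 <= instr < n_instructions:
                row[name] = instr
        table.append(row)
    return table


print(pipeline_cycles(1))                  # 5: the first result appears in cycle 5
print(pipeline_cycles(100))                # 104 (500 if each instruction ran alone)
print(pipeline_cycles(100, pipelines=2))   # 54 with two Pentium-style pipelines
for cycle, row in enumerate(schedule(3), start=1):
    print(cycle, row)
```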
2. Super pipeline and superscalar technology
Super pipelining refers to a pipeline with more than the usual 5 or 6 stages. The Pentium Pro's pipeline, for example, has as many as 14 stages. The more stages an instruction is divided into, the less work each stage has to do and the sooner each stage finishes, so the design suits CPUs with higher operating frequencies. Superscalar technology means the CPU has more than one pipeline and can complete more than one instruction per clock cycle.
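A simple first-order model shows why deeper pipelines suit higher clock rates. Assume the total logic delay of an instruction is split evenly across the stages and every stage adds a fixed latch overhead. The clock period is then the per-stage delay plus that overhead. The delays below are hypothetical numbers chosen for illustration.

```python
def max_clock_mhz(stages, logic_delay_ps=10_000, latch_overhead_ps=500):
    """Clock rate when the total logic delay is split evenly over `stages`.

    Hypothetical defaults: 10,000ps of logic per instruction, 500ps of
    latch overhead added by every stage.
    """
    cycle_ps = logic_delay_ps / stages + latch_overhead_ps
    return 1_000_000 / cycle_ps            # a 1MHz clock has a 1,000,000ps period


def instruction_latency_ps(stages, logic_delay_ps=10_000, latch_overhead_ps=500):
    """Time one instruction spends travelling through the whole pipeline."""
    return stages * (logic_delay_ps / stages + latch_overhead_ps)


for n in (5, 14, 20):
    print(n, round(max_clock_mhz(n)), instruction_latency_ps(n))
```

Under these assumed numbers, 5 stages allow 400MHz and 20 stages allow 1000MHz. However, one instruction then spends 20 × 1000ps = 20ns in the pipeline instead of 5 × 2500ps = 12.5ns. Super pipelining raises the clock rate and throughput, not the speed of a single instruction.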
3. Out-of-order execution technology
Out-of-order execution is a technique in which the CPU sends instructions to the appropriate circuit units for processing in an order other than the one the program specifies. For example, suppose a section of a program contains 7 instructions. The CPU analyzes which circuit units are idle and which instructions can safely be executed early, and it immediately sends those instructions to the corresponding circuits. After the units have executed instructions out of order, dedicated circuitry must put the results back in the order specified by the original program before they are returned to the program. This way of separating instructions and executing them out of order is called out-of-order execution. Its purpose is to keep the CPU's internal circuits fully busy and thereby speed up program execution.
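A toy scheduler in Python can illustrate the idea. It issues any instruction whose inputs are ready, up to a fixed number of execution units per cycle, and then retires results in program order. The instruction format, latencies and unit count are hypothetical. The sketch also assumes register renaming, so only true data dependencies (reading a value another instruction writes) delay an instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int


def producers(program):
    """For each instruction, the earlier instructions whose results it reads."""
    deps = []
    for i, ins in enumerate(program):
        found = set()
        for src in ins.srcs:
            for j in range(i - 1, -1, -1):      # nearest earlier writer of src
                if program[j].dest == src:
                    found.add(j)
                    break
        deps.append(found)
    return deps


def execute_out_of_order(program, units=2):
    """Return (issue order, retire cycle of each instruction in program order)."""
    deps = producers(program)
    issued = {}                                 # instruction index -> issue cycle
    cycle = 0
    while len(issued) < len(program):
        free_units = units
        for i in range(len(program)):
            if free_units == 0:
                break
            if i in issued:
                continue
            # ready once every producer has issued and finished its latency
            if all(j in issued and issued[j] + program[j].latency <= cycle
                   for j in deps[i]):
                issued[i] = cycle
                free_units -= 1
        cycle += 1
    issue_order = sorted(issued, key=lambda i: (issued[i], i))
    # retire strictly in program order, never before the result is complete
    retire_cycles, latest = [], 0
    for i, ins in enumerate(program):
        latest = max(latest, issued[i] + ins.latency)
        retire_cycles.append(latest)
    return issue_order, retire_cycles


program = [
    Instr("load", "r1", ("mem",), 3),           # slow memory load
    Instr("add", "r2", ("r1", "r1"), 1),        # needs the load's result
    Instr("mul", "r3", ("r4", "r5"), 1),        # independent
    Instr("sub", "r6", ("r4", "r5"), 1),        # independent
]
order, retire = execute_out_of_order(program)
print([program[i].name for i in order])  # ['load', 'mul', 'sub', 'add']
print(retire)                            # [3, 4, 4, 4]
```

The multiply and subtract do not depend on the slow load, so they run while the add waits for its input. The retire cycles never decrease, which corresponds to putting the results back into program order as described above.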
4. Branch prediction and speculative execution technology
Branch prediction and speculative execution are the main components of CPU dynamic execution, one of the advanced techniques used in current CPUs. Their main purpose is to increase the CPU's computing speed. Speculative execution builds on branch prediction. Once the CPU has predicted which way a program will branch, it continues working along the predicted path before the branch outcome is known, and this work is speculative execution.
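Real predictors vary between designs. A classic textbook scheme is the saturating counter. A small counter per branch moves toward "taken" or "not taken" as outcomes arrive, and the upper half of its states predicts taken. The sketch below uses that generic scheme, not the predictor of any specific CPU, and the loop pattern is a hypothetical example. On a misprediction, the instructions executed speculatively along the wrong path would be discarded.

```python
class SaturatingCounterPredictor:
    """n-bit saturating counter; the upper half of its states predicts 'taken'."""

    def __init__(self, bits=2):
        self.max_state = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)
        self.state = self.threshold            # start weakly predicting 'taken'

    def predict(self):
        return self.state >= self.threshold

    def update(self, taken):
        # move one step toward the actual outcome, saturating at the ends
        if taken:
            self.state = min(self.max_state, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def prediction_accuracy(outcomes, bits=2):
    predictor = SaturatingCounterPredictor(bits)
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:      # speculative work would be kept
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then not taken once, repeated 10 times
loop_branch = ([True] * 9 + [False]) * 10
print(prediction_accuracy(loop_branch, bits=2))  # 0.9
print(prediction_accuracy(loop_branch, bits=1))  # 0.81
```

In this loop pattern, the 2-bit counter mispredicts only the loop exit. The 1-bit counter also mispredicts the first iteration of each new pass, because a single exit flips its prediction.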
5. Special instruction extension technology
Even on the simplest computers, a sequence of instructions fetches operands and performs calculations on them.
On most computers, an instruction performs only one calculation at a time, so a group of parallel operations must be carried out as a series of consecutive calculations. Such a computer uses a "Single Instruction Single Data" (SISD) processor. Descriptions of CPU performance often mention "extended instructions" or "special extensions", and both refer to whether the CPU adds instructions to the X86 instruction set. The first such extension was Intel's own "MMX", followed by AMD's "3DNow!" and, most recently, "SSE" in the Pentium III.
MMX and SSE: MMX, generally taken to stand for "MultiMedia eXtensions", has 57 instructions. It is Intel's first extension to the X86 instruction set since that set was finalized in 1985. MMX is mainly intended to improve the CPU's handling of multimedia information such as 3D graphics, video and audio. However, it optimizes only integer operations and does nothing for floating-point performance. As 3D graphics spread and 3D web pages on the Internet become more common, MMX is no longer adequate. MMX instructions perform SIMD operations on integers such as -40, 0, 1, 469 or 32766. SSE adds SIMD operations on floating-point numbers such as -40.2337, 1.4355 or 877343226.012. With MMX and SSE, a single instruction can operate on two or more data streams. In the earlier audio example, only 264,600 instructions per second are needed instead of 529,200, because one instruction can act on the left and right channels at the same time. For display, only 23,592,960 instructions per second are needed instead of 70,778,880, because the same instruction can handle the red, green and blue channels together.
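The packed-integer idea behind MMX can be mimicked in Python by treating one 64-bit value as four 16-bit lanes, which is the layout MMX uses for packed words. The Python loop below only models the result. Real MMX hardware computes all four lanes with a single instruction. The helper names are hypothetical, and `paddw` here is a simulation inspired by the MMX packed-add instruction of that name, not Intel's definition of it. The instruction-count helper reproduces the ratios in the text, where one instruction covers two audio channels or three color channels.

```python
MASK16 = 0xFFFF


def pack4x16(values):
    """Pack four 16-bit lanes (lane 0 in the low bits) into one 64-bit integer."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & MASK16) << (16 * lane)
    return word


def unpack4x16(word):
    return [(word >> (16 * lane)) & MASK16 for lane in range(4)]


def paddw(a, b):
    """Simulated packed add: each 16-bit lane is added separately, with wraparound."""
    result = 0
    for lane in range(4):
        x = (a >> (16 * lane)) & MASK16
        y = (b >> (16 * lane)) & MASK16
        result |= ((x + y) & MASK16) << (16 * lane)   # no carry into the next lane
    return result


def instructions_needed(operations, lanes):
    """Instructions needed when each instruction handles `lanes` operations."""
    return -(-operations // lanes)                     # ceiling division


print(unpack4x16(paddw(pack4x16([1, 2, 3, 65535]), pack4x16([10, 20, 30, 1]))))
# [11, 22, 33, 0]: the last lane wraps around instead of carrying
print(instructions_needed(70_778_880, lanes=3))        # 23592960
```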
SSE: SSE stands for "Internet Streaming SIMD Extensions". Intel first used it in the Pentium III. The extensions originally rumored as MMX2 were later called KNI (Katmai New Instructions), and Katmai is the code name of what is now the Pentium III. SSE has 70 instructions. They cover all the functions of the original MMX and 3DNow! instruction sets and, in particular, strengthen SIMD floating-point processing. In response to the rapid growth of the Internet, SSE also specifically strengthens the CPU's handling of 3D web pages and other audio and video information. A special extended instruction set only helps when applications support it. When the most advanced Pentium III 450 and a Pentium II 450 both run applications without extended-instruction support, there is little difference in speed between them.
In addition to keeping the original MMX instructions, SSE adds 70 new instructions. Besides speeding up floating-point operations, it uses memory more efficiently, which makes memory appear faster, and it improves game performance markedly. According to Intel, SSE has a particularly noticeable effect in the following areas: 3D geometry and animation; image processing (such as Photoshop); video editing, compression and decompression (such as MPEG and DVD); speech recognition; and audio compression and synthesis.
3DNow!: A multimedia extension instruction set developed by AMD, with a total of 21 instructions. It targets MMX's weakness of not improving floating-point processing and focuses on improving the 3D graphics performance of AMD's K6-series CPUs. Because it has few instructions, however, it is used mainly in 3D games and offers little support for other commercial graphics applications.
(4) CPU production process and product architecture
1. CPU production process
Specifications describing CPU performance often include a "process technology" figure such as "0.35µm" or "0.25µm". Generally, the smaller this figure, the more advanced the CPU manufacturing technology. Today CPUs are mainly produced with CMOS technology.
CMOS is the English abbreviation of "Complementary Metal Oxide Semiconductor". When CPUs are produced with this technology, photolithography (a "light knife") is used to form the circuits and components. Metal aluminum is deposited on the silicon, and the light knife then carves out the wires that connect the components. The precision of photolithography is generally expressed in microns (µm), and the finer the precision, the more advanced the process. Finer precision lets more components be fabricated on the same area of silicon and makes the connecting lines thinner, so the resulting CPU can run at a higher frequency. This is why the first-generation Pentium, made when only a 0.8µm process was available, ran at just 60/66MHz. As processes advanced to 0.35µm and 0.25µm, main frequencies rose accordingly, producing the Pentium MMX at up to 266MHz and Pentium II CPUs clocked at up to 500MHz. Because of current technological limits, CPU production processes have so far reached only 0.25µm. Intel, AMD, Cyrix and other companies are therefore working toward 0.18µm processes and copper interconnects, which deposit copper on the silicon instead of the original aluminum. It is estimated that once processes reach 0.18µm, CPUs with a main frequency of 1000MHz will become common.
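The link between feature size and component count can be estimated to first order. Shrinking both dimensions of every feature by a factor k fits about k² as many components into the same area. This idealized sketch ignores real design rules and layout constraints, and the function name is hypothetical.

```python
def relative_density(old_feature_um, new_feature_um):
    """First-order gain in components per unit area after a process shrink."""
    linear_shrink = old_feature_um / new_feature_um
    return linear_shrink ** 2            # both width and length shrink


print(round(relative_density(0.35, 0.25), 2))  # 1.96
print(round(relative_density(0.25, 0.18), 2))  # 1.93
```

By this rough estimate, each of the process steps mentioned in the text nearly doubles the number of components that fit on the same silicon area.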
To keep competing with Intel for leadership in microprocessor development in the next century, AMD has signed a seven-year technology cooperation agreement with Motorola. Under this agreement, Motorola will license its newly developed copper interconnect process technology (Copper Interconnect) to AMD. AMD plans to manufacture K7 microprocessors running at up to 1000MHz (1GHz) in 2000. CPUs will move toward faster, 64-bit architectures, and manufacturing processes will become finer, moving from the current 0.25 micron to 0.18 micron. By mid-2000, most CPU manufacturers will use a 0.18-micron process, and after 2001 many will switch to a 0.13-micron copper process. Better manufacturing technology means smaller size, higher integration and lower power consumption. The advantages of copper are clear. Copper conducts better than the aluminum commonly used today, and its low resistance means less heat, which keeps the processor reliable over a wider range of conditions. Compared with traditional aluminum technology, copper chip manufacturing at 0.13 micron and below will effectively raise chip operating frequencies and shrink die area. In the long run, copper processes will eventually replace aluminum.
Each CPU from each manufacturer has a name (trade name), a code name (development code) and a logo (special pattern). Intel's early products were named in the i80x86 pattern, giving the familiar 286, 386, 486 and so on. When Intel developed its fifth-generation 586 product, trademark registration problems led it to rename the chip Pentium, and Intel also registered a Chinese trademark name for it. The Pentium Pro, Pentium II, Pentium III and Celeron followed. At present, a CPU's name does not reveal the specifications of CPUs of the same type. This should start to change once Intel officially launches the PIII with a 133MHz front-side bus, after which the name alone will indicate a CPU's general technical characteristics.
Manufacturers also give each type of CPU a development code name, including products with the same name but different technical specifications. For example, Intel's PⅡ chips made on the 0.35 and 0.25 micron processes have the code names Klamath and Deschutes respectively. Each Intel CPU name also has a special trademark pattern as its logo. The situation for AMD and Cyrix is similar. Each of their CPUs has a name, a code name and a logo, but they do not yet have official Chinese names.
2. Internal structure of the CPU
The internal structure of the CPUs in use today falls into two types: single bus and dual bus. Because a CPU's internal structure determines its packaging and installation specifications, a brief introduction is given here.
Before Intel developed the Pentium Pro, CPUs from the 486 onward, such as the classic Pentium, consisted of a main processor, a math coprocessor, a controller, various registers and an L1 Cache. Many CPUs are still produced with this internal structure, such as AMD's K6-2, Cyrix's MⅡ and the IDT C6. Starting with the P6 (the development code name of the Pentium Pro), Intel moved the cache control circuitry and the L2 Cache (secondary cache) from the motherboard into the CPU itself to speed up data exchange between the CPU and the L2 Cache. Data exchange between the CPU core and the cache then no longer passes over the external bus. Instead, it goes directly over a cache bus inside the CPU. The path between the CPU core and memory is now separate from the path between the CPU core and the cache, which forms the original P6 dual-bus architecture (see Figure 1). Judging from the Pentium Pro's real-world results, this measure was very successful and a major advance in CPU design. Given the advantages of the P6 dual-bus structure, all CPUs with an internal L2 Cache and cache controller have moved from the traditional single-bus mode to the dual-bus mode. Examples include Intel's PⅡ, new Celeron and PⅢ, and AMD's K6-III and K7.
3. CPU architecture and packaging method
A CPU's architecture is defined by the type and specifications of its installation socket. By socket specification, the CPUs in common use today fall into two architectures: Socket x and Slot x.
Socket x CPUs come in two types, Socket 7 and Socket 370, which install in the 321-pin Socket 7 and the 370-pin Socket 370 respectively. Socket 7 and Socket 370 look very similar and are the same size, but Socket 370 has more pin holes than Socket 7 (370 versus 321). Slot x CPUs come in three types, Slot 1, Slot 2 and Slot A, which install in the corresponding slots. Slot 1 and Slot A both have 242 contacts, but they differ in mechanical and electrical standards and are therefore incompatible. Slot 2 is a larger slot used specifically for the Xeon versions of the PⅡ and PⅢ. Xeon is a CPU designed for workgroup servers.
Packaging is the last step in CPU production. It encases the CPU chip or CPU module in specific materials to protect it from damage. A CPU must generally be packaged before it can be delivered to users.
A CPU's packaging depends on how it is installed and how its components are integrated. CPUs installed in a Socket use PGA (Pin Grid Array) packaging, while CPUs installed in a Slot x slot use SEC (Single Edge Contact) cartridge packaging. CPUs currently in PGA packages mainly include Intel's Celeron, AMD's K6-2 and K6-Ⅲ, and Cyrix's MⅡ. The Celeron previously used SEC packaging but has gradually moved to PGA (see Figure 4). CPUs in SEC packages include Intel's PⅡ and PⅢ and AMD's K7. Intel's Slot-architecture CPUs actually use three kinds of single-edge package: SEPP, SECC and SECC2.
Among these CPUs, the Celeron and K6-III integrate 128KB and 256KB of L2 Cache, respectively, together with a cache controller. The CPU core, L2 Cache and cache controller are all manufactured on the same piece of silicon, so these chips are small enough to be packaged as PGA. However, the main reason the Celeron uses PGA packaging is to cut production costs. The main reason the K6-III uses it is that Intel holds patents on the Slot 1, Slot 2 and Socket 370 interfaces it developed, so AMD could produce the K6-III only for the Socket 7 architecture in PGA packaging.
There are currently two ways of manufacturing Slot-architecture CPUs. In the first, the separately manufactured CPU core chip, cache controller chip and L2 Cache chips are mounted on a PCB (circuit board). The board is then fitted with the single-edge cartridge and fan to complete the CPU. CPUs built this way include Intel's PⅡ and PⅢ and AMD's K7. In the second, a complete CPU chip (containing the CPU core, cache controller and L2 Cache) is mounted on the circuit board, which then serves only to provide the Slot interface. Finally, the single-edge cartridge and fan are installed to form the complete CPU. The only CPUs made this way are some of Intel's Celerons.