Surreptitious software, also called covert software, is a branch of computer security research that has emerged over the past decade. Research on covert software draws not only on computer security techniques but also on many other areas of computer science, such as cryptography, steganography, digital watermarking, software metrics, reverse engineering, and compiler optimization. We use these techniques to meet the need to store secret information securely in computer programs, although that need takes widely varying forms. The word "secret" has a broad meaning in this book. The techniques it introduces (code obfuscation, software watermarking and fingerprinting, tamper-proofing, software "birthmarks", and so on) are used to keep others from stealing the intellectual property contained in software. For example, fingerprinting can be used to trace pirated copies of a program, code obfuscation makes it harder for attackers to reverse engineer the program, and tamper-proofing makes it harder for others to create cracked versions of it.
Okay, now let’s talk about why you need to read this book, who uses covert software, and what this book will cover.
Why read this book
Unlike traditional security research, covert software is not concerned with protecting computers from viruses. It is concerned with how virus authors keep others from analyzing their viruses! Similarly, we do not care whether software has security vulnerabilities. What we care about is how to covertly add code to a program that executes only when the program has been tampered with. In cryptography, the security of encrypted data depends on keeping the encryption key secret, and what we study is precisely how to hide the key. Software engineering offers many software metrics for checking that programs are well structured, and in this book we use the same metrics to make programs complex and hard to read. Many of the techniques described in this book are based on algorithms developed in compiler optimization research. The purpose of compiler optimization, however, is to generate programs that are as small and as fast as possible, whereas programs produced with some of the techniques in this book are large and slow. Finally, traditional digital watermarking and steganography hide information in images, audio, video, and even plain text files, while covert software hides it in computer code.
So why should you read this book? Why learn about a security technology that does not protect your computer from viruses or worms? Why learn compiler optimization techniques that only make code larger and slower? Why spend effort on a branch of cryptography that violates the basic premise of cryptography, namely that the attacker cannot obtain the key?
The answer is that the results of traditional computer security and cryptography research sometimes cannot solve the urgent security problems that arise in practice. For example, this book will show how to use software watermarking to combat software piracy. A software watermark is a unique identifier (similar to a credit card number or a copyright notice) embedded in a program that links a copy of the program to you (its author) or to a customer. If you find pirated CDs of your software on sale, you can extract the watermark from the pirated copy and trace it back to the original copy that was bought from you. You can also watermark the beta versions of a newly developed game before handing them to partners. If you suspect that someone has leaked your code, you can then identify the culprit among your many partners and take him to court.
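As a rough illustration of linking each copy of a program to a customer, the toy sketch below hides a customer number in the order of program pieces that could be arranged in any order without changing behavior. The names embed_fingerprint and extract_fingerprint, the block names, and the permutation encoding are assumptions made for this illustration only; a practical watermark would also have to survive deliberate attempts to remove it.

```python
from math import factorial


def embed_fingerprint(blocks, customer_id):
    """Return the blocks reordered so that their order encodes customer_id.

    Assumes the blocks are distinct and independent, so any order is valid.
    """
    pool = sorted(blocks)  # canonical order that the extractor can recompute
    if not 0 <= customer_id < factorial(len(pool)):
        raise ValueError("customer_id does not fit in this many blocks")
    ordered = []
    for remaining in range(len(pool), 0, -1):
        # Factorial-base digit: which of the remaining blocks comes next.
        index, customer_id = divmod(customer_id, factorial(remaining - 1))
        ordered.append(pool.pop(index))
    return ordered


def extract_fingerprint(blocks):
    """Recover the customer_id encoded in the order of the blocks."""
    pool = sorted(blocks)
    customer_id = 0
    for block in blocks:
        index = pool.index(block)
        customer_id += index * factorial(len(pool) - 1)
        pool.pop(index)
    return customer_id


# Hypothetical independent pieces of a program (for example, function names).
BLOCKS = ["draw", "load", "save", "score", "sound"]

if __name__ == "__main__":
    copy_for_customer = embed_fingerprint(BLOCKS, 42)
    print(copy_for_customer, extract_fingerprint(copy_for_customer))
```

With five blocks there are 5! = 120 distinct orderings, so this sketch can tell 120 customers apart. An attacker who simply reorders the blocks destroys the mark, which is why a scheme this simple would not be enough in practice.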
As another example, suppose a new version of your program includes a new algorithm, and you certainly do not want competitors to extract that algorithm and add it to their own software. You can obfuscate the program, making it as complex and hard to understand as possible, so that reverse engineering it is not worth your competitors' effort. And if you do suspect that someone has plagiarized your code, this book will also show you how to use software "birthmarks" to confirm your suspicions.
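To give a flavor of what making a program "complex and hard to understand" can mean, here is a minimal sketch of one obfuscating transformation: hiding the real computation behind a condition whose outcome is fixed but not obvious at a glance. The function names and the particular condition are hypothetical choices for this illustration, and a determined analyst would see through something this small.

```python
def discount(price, percent):
    """Original, readable version: price after a percentage discount."""
    return price - price * percent // 100


def discount_obfuscated(a, b):
    """Same result as discount(), written to be harder to follow."""
    t = a * b
    # Opaque predicate: t * (t + 1) is a product of two consecutive
    # integers, so it is always even and this branch is always taken.
    if (t * (t + 1)) % 2 == 0:
        r = t // 100
    else:
        r = (t ^ a) + b  # dead code, never executed
    return a - r


if __name__ == "__main__":
    print(discount(200, 15), discount_obfuscated(200, 15))
```

The two functions return the same value for every pair of integers, but the second gives a reader (or an analysis tool) an extra branch to reason about. Obfuscation in practice combines many such transformations.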
As yet another example, suppose your program contains a piece of code that must stay secret and intact, and you want to make sure that the program cannot run normally unless it does. For example, you do not want a hacker to be able to modify your license-checking code, or the key used to decrypt MP3 files in a digital rights management system. Chapter 7 discusses various tamper-proofing techniques that make a tampered program stop working properly.
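The idea of a program that refuses to work once it has been modified can be sketched as follows. This is only an assumed, simplified illustration, not one of the algorithms from Chapter 7: the function names, the license key, and the choice of hashing a Python function's bytecode with SHA-256 are all hypothetical, and a real attacker could simply remove a check this visible.

```python
import hashlib


def check_license(key):
    """Hypothetical license check that an attacker would like to bypass."""
    return key == "VALID-KEY"


def digest(func):
    """Hash the compiled bytecode and constants of a function."""
    code = func.__code__
    data = code.co_code + repr(code.co_consts).encode()
    return hashlib.sha256(data).hexdigest()


# In this sketch the expected digest is computed at start-up; a real
# protection scheme would precompute it and hide it inside the program.
EXPECTED = digest(check_license)


def run(key):
    """Run the program only if the license check has not been modified."""
    if digest(check_license) != EXPECTED:
        raise RuntimeError("integrity check failed")
    return "running" if check_license(key) else "license rejected"


if __name__ == "__main__":
    print(run("VALID-KEY"))
```

If an attacker replaces the body of check_license with code that always returns True, the digest no longer matches and run refuses to continue.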
"But wait," I hear you say, "you put the key in the executable file? What a terrible idea! Experience tells us that any approach resembling security through obscurity eventually fails, and no matter how the key is hidden in the program, it will not escape a sufficiently tenacious reverse engineer." And we must admit that you are right. None of the techniques described in this book guarantees that your software will always be safe from hackers. They cannot guarantee that a secret will stay secret forever, that a program will never be tampered with, or that code will never be plagiarized. Unless there is a major breakthrough in this research field, all we can hope for is to delay the adversary. Our goal is to slow the attacker down so much that he finds attacking your software too painful or too costly and gives up. It is also possible that a patient attacker spends a long time and does break through your defenses, but by then you have already made enough money from the software, or you have moved on to a newer version of the code (so that what he obtained is worthless).
For example, suppose you are a pay-TV operator whose users watch your programs through set-top boxes. Each set-top box is tagged: somewhere in its code is a unique identifier (ID) assigned to each user, so that you can allow or deny a specific user access to the channel based on payment status. But now a hacker group has found and disassembled this code, worked out the algorithm for computing user IDs, and is selling a method for modifying user IDs cheaply online. What should you do? You may think of using a tamper-resistant smart card, but these are not as hard to crack as they seem, as Chapter 11 will explain. Or you might obfuscate the code to make it harder to analyze. Or you might use tamper-proofing so that the program stops running as soon as it is modified. Most likely, you will combine these techniques. But despite all this technology, you have to accept that your code can still be cracked and your secrets can still be revealed (in this case, the user ID in the set-top box can still be tampered with). How can this be? Simply because the idea of security through obscurity is fundamentally flawed. But if none of the techniques in this book can give you a perfect, long-term security guarantee, why should you use them, and why should you buy a book like this one? The answer is simple: the longer your code withstands attack, the more customers will subscribe to the channel and the less often you will have to upgrade the set-top boxes, so the more money you will make and save.
It’s that simple.
Who uses covert software
Many well-known companies have a strong interest in covert software. It is hard to know how widely these techniques are actually used in practice (most companies are extremely tight-lipped about how they protect their code), but we can make a reasonable guess at their level of interest from the patents they have applied for and hold. Microsoft holds several patents on software watermarking, code obfuscation, and software "birthmarks". Apple has a patent on code obfuscation, presumably to protect its iTunes software. Convera, a company spun off from Intel, focuses on tamper-proofing technology for digital rights management, that is, on hiding encryption algorithms and keys in program code.
In December 2007, Cloakware was acquired for US$72.5 million by Irdeto, a Dutch company specializing in pay TV. Even the relative newcomer Sun Microsystems has filed several patent applications in the field of code obfuscation.
Skype's VoIP client also uses code obfuscation and tamper-proofing techniques, similar to those from Arxan [24], Intel [27], and [89] discussed later in this book, to prevent reverse engineering. For Skype, protecting the integrity of its client is extremely important: once someone reverse engineers the client and works out the network protocol Skype uses, anyone can write cheap software that communicates normally with Skype clients (so that people no longer have to use Skype itself). Keeping the network protocol private therefore helps Skype retain a large user base, which is probably why eBay acquired Skype for US$2.6 billion in 2005. In fact, covert software techniques also bought Skype enough time to become a leader in VoIP technology. Even if the Skype protocol is analyzed now (hackers have indeed done this; see Section 7.2.4 for details), hackers will not be able to produce similar software that can shake Skype's market position.
Academic researchers have studied covert software from various angles. Researchers with a background in compilers and programming languages, such as ourselves, are naturally drawn to this field, because most algorithms that transform code involve static analysis, a problem all too familiar to researchers in compiler optimization. Although most cryptography researchers used to disdain problems that smacked of security through obscurity, some have recently begun to apply cryptographic techniques to software watermarking and to explore the limits of code obfuscation. Researchers in multimedia watermarking, computer security, and software engineering have also published extensively on covert software. Unfortunately, progress in this field has been held back by the lack of specialized journals and conferences where researchers can exchange ideas. Researchers have worked hard, and are still working hard, to get their results accepted by traditional conferences and journals. Venues that have published covert software research include the ACM Symposium on Principles of Programming Languages (POPL), the Information Hiding Workshop, the IEEE Software Engineering Symposium, Advances in Cryptology (CRYPTO), the Information Security Conference (ISC), and various conferences on digital rights management. As covert software becomes more mainstream in academic research, we hope to see journals, symposiums, and even seminars dedicated to it, but unfortunately this has not happened so far.
The military also spends a great deal of effort (and taxpayer money) on covert software. For example, the patent on Cousot's software watermarking algorithm [95] is held by the French Thales Group, the world's ninth-largest defense contractor. The following is quoted from a recent (2006) US military bidding document [303] on AT (anti-tamper) technology research:
All US military Program Executive Offices (PEOs) and Program Managers (PMs) must now apply the AT strategies developed by the military and the Department of Defense when designing and implementing relevant systems. Embedded software is at the core of modern weapons systems and is one of the most important technologies to protect. AT technology can effectively prevent these technologies from being reverse engineered by other countries or parties. Code that has merely been compiled by a standard compiler, without AT protection, is easy to reverse engineer.
When analyzing software, reverse engineers use a combination of tools such as debuggers, decompilers, and disassemblers, together with various static and dynamic analysis techniques. The purpose of AT technology is to make reverse engineering more difficult and thereby keep other countries from stealing the United States' technological advantages. In the future, Army PEOs and PMs will need to be provided with a more useful, effective, and diverse AT tool set... The purpose of developing AT technology is to provide a strong shell that can withstand reverse engineering, thereby delaying enemy attacks on the protected software as long as possible. This gives the United States the opportunity to maintain its advantages in high-technology fields or to slow the leakage of its weapons technology. Ultimately, the US military will be able to maintain its technological superiority and thus its absolute superiority in armaments.
This bidding document comes from the US military's Missile and Space Program (Design Department) and focuses on the protection of real-time embedded systems. We have reason to believe that it was prompted by the concern that a missile fired at the enemy might, for whatever reason, fail to explode on landing, giving the enemy the opportunity to access the embedded control software that guides the missile to its target.
The following is another quote, this one from the U.S. Department of Defense [115]:
The Software Protection Initiative (SPI) is a Department of Defense undertaking to develop and deploy protection technologies that secure computer programs containing information critical to defense weapons systems. SPI offers a new approach to protection: rather than securing the computer or the network, as traditional security technologies do, it strengthens the security of the computer program itself. This approach could significantly improve the Department of Defense's information security posture. SPI is widely applicable: programs running on anything from desktop computers to supercomputers can be protected with SPI technology. It forms a complete layer of software protection and is an example of "defense in depth". SPI technology complements traditional security measures such as network firewalls and physical security, but it does not depend on them. SPI technology is now deployed in selected HPC centers and at more than 150 Department of Defense sites and military facilities built and maintained by commercial contractors. Widespread deployment of SPI technology will strengthen the protection of critical application technologies by the United States and the Department of Defense.
What does this passage tell us? It shows that the U.S. Department of Defense worries not only about missiles falling into enemy territory but also about the safety of software running in its own secure, high-performance computing centers. In fact, theft and counter-theft are perennial concerns of intelligence and counterintelligence agencies. For example, if a program on a fighter jet needs to be updated, someone will most likely connect a laptop to the jet to perform the update. But what happens if the laptop is accidentally lost or, as so often happens in movies, falls under the control of another government? The other side will immediately reverse engineer the code and use the results to improve the software in its own fighter jets. Worse still, it may quietly add a Trojan horse to your software that makes the plane fall from the sky at a chosen moment. If we cannot guarantee with absolute certainty that this scenario will never occur, covert software can at least serve as a last line of defense (at the very least, it makes it possible to hold someone accountable afterwards). For example, the software in an airplane could be fingerprinted with the ID of each person who has access to it. If the code is one day found in another country's fighter jets, the fingerprint can be extracted to determine who was responsible for the leak.
"So what?" I hear you say. "Why should I care how governments and business giants protect their secrets? If hackers crack their software, the hackers merely earn a modest profit from their own labor." Even so, these protection techniques ultimately benefit you more than they benefit the business giants.
The reason is that legal forms of protection (such as patents, trademarks, and copyrights) work for you only if you have the financial resources to beat the other party in court. In other words, even if you believe that a large company has cracked your code and stolen a very lucrative idea, you cannot take on a company like Microsoft in a marathon lawsuit unless you have the financial strength to survive such a contest of resources. The protection techniques discussed in this book, such as code obfuscation and tamper-proofing, are cheap and easy to use, and they are available to small and medium-sized enterprises as well as to commercial giants. And if you do decide to sue the large company, you can use watermarks or software "birthmarks" to present solid evidence in court that your code was plagiarized.
Finally, we have to mention briefly another group that is extremely good at using covert software: the bad guys. Virus authors have been very successful at using code obfuscation to disguise virus code so that it evades detection by anti-virus software. It is worth noting that these techniques are routinely broken by hackers when the good guys use them (to protect DVDs, games, and cable TV, for example), yet they seem much harder to defeat when the bad guys use them (to build malware, for example).
Contents of this book
The goal of covert software research is to invent algorithms that delay the adversary's reverse engineering as much as possible while adding as little computational overhead as possible to the protected program. We also need evaluation techniques that let us say, "After applying algorithm A, the new program takes a hacker T more units of time to break than the original, and the added performance overhead is O," or at the very least, "Code protected with algorithm A is harder to break than code protected with algorithm B." It should be emphasized that covert software research is still in its infancy. Although this book presents all the relevant protection and evaluation algorithms, the state of the art is not yet ideal, so do not be too optimistic or you may be disappointed.
In this book, we try to organize all current research results on covert software and present them to readers systematically. We aim to cover one technique per chapter, describing its application areas and the algorithms currently available. Chapter 1 introduces the basic concepts of covert software. Chapter 2 uses an attack-and-defense demonstration to introduce the tools and techniques hackers commonly use to reverse engineer software, and then shows how to defend against attacks based on them. Chapter 3 details the techniques that both hackers and software defenders use to analyze computer programs. Chapters 4, 5, and 6 cover algorithms for code obfuscation. Chapter 7 covers algorithms for tamper-proofing. Chapters 8 and 9 cover algorithms for watermarking. Chapter 10 covers algorithms for software "birthmarks". Chapter 11 describes hardware-based software protection.
If you are a business manager who is only interested in the current state of covert software research and how these techniques can be applied to your projects, just read Chapters 1 and 2. If you are a researcher with a background in compiler design, we recommend jumping straight to Chapter 3. The subsequent chapters, however, are best read in order, because, to give one example, the chapters on watermarking rely on material introduced in the chapters on code obfuscation. That said, we have tried to make each chapter self-contained, so (if you have some background knowledge) it does no harm to skip a chapter or two occasionally. If you are an engineer who wants to use these techniques to harden your software, we strongly recommend reading all of Chapter 3 carefully and, if possible, a few compiler textbooks as well to strengthen your knowledge of static program analysis. Then you can jump to the chapters that interest you. If you are a college student reading this book as a course textbook, you should read it from cover to cover, and do not forget to review it at the end of the semester.
We hope this book can do two things.
First, we hope to show you, dear reader, that code obfuscation, software watermarking, software "birthmarks", and tamper-proofing contain many wonderful ideas that are worth your time to learn, and that these techniques can be used to protect software. Second, we hope that this book brings together all the useful information currently available in this field and thereby provides a good starting point for in-depth research on covert software.
Christian Collberg and Jasvir Nagra
February 2, 2009 (Groundhog Day)
P.S. There is actually a third purpose in writing this book. If, while reading it, you suddenly have a brilliant idea that inspires you to devote yourself to covert software research, then, dear reader, our third goal will have been achieved. Please tell us about your new algorithm, and we will add it to the next edition of this book!