Results 1 - 10
of
140
Automatic Analysis of Malware Behavior using Machine Learning
, 2009
"... Malicious software—so called malware—poses a major threat to the security of computer systems. The amount and diversity of its variants render classic security defenses ineffective, such that millions of hosts in the Internet are infected with malware in form of computer viruses, Internet worms and ..."
Abstract
-
Cited by 51 (4 self)
- Add to MetaCart
(Show Context)
Malicious software—so called malware—poses a major threat to the security of computer systems. The amount and diversity of its variants render classic security defenses ineffective, such that millions of hosts in the Internet are infected with malware in form of computer viruses, Internet worms and Trojan horses. While obfuscation and polymorphism employed by malware largely impede detection at file level, the dynamic analysis of malware binaries during run-time provides an instrument for characterizing and defending against the threat of malicious software. In this article, we propose a framework for automatic analysis of malware behavior using machine learning. The framework allows for automatically identifying novel classes of malware with similar behavior (clustering) and assigning unknown malware to these discovered classes (classification). Based on both, clustering and classification, we propose an incremental approach for behavior-based analysis, capable to process the behavior of thousands of malware binaries on a daily basis. The incremental analysis significantly reduces the run-time overhead of current analysis methods, while providing an accurate discovery and discrimination of novel malware variants.
Loop-extended symbolic execution on binary programs
- In Newsome, McCamant, and Song: Measuring Channel Capacity to Distinguish Undue Influence 12 2009/6/8 Symposium on Software Testing and Analysis (ISSTA
, 2009
"... Mixed concrete and symbolic execution is an important technique for finding and understanding software bugs, including securityrelevant ones. However, existing symbolic execution techniques are limited to examining one execution path at a time, in which symbolic variables reflect only direct data de ..."
Abstract
-
Cited by 49 (6 self)
- Add to MetaCart
(Show Context)
Mixed concrete and symbolic execution is an important technique for finding and understanding software bugs, including securityrelevant ones. However, existing symbolic execution techniques are limited to examining one execution path at a time, in which symbolic variables reflect only direct data dependencies. We introduce loop-extended symbolic execution, a generalization that broadens the coverage of symbolic results in programs with loops. It introduces symbolic variables for the number of times each loop executes, and links these with features of a known input grammar such as variable-length or repeating fields. This allows the symbolic constraints to cover a class of paths that includes different numbers of loop iterations, expressing loop-dependent program values in terms of properties of the input. By performing more reasoning symbolically, instead of by undirected exploration, applications of loop-extended symbolic execution can achieve better results and/or require fewer program executions. To demonstrate our technique, we apply it to the problem of discovering and diagnosing buffer-overflow vulnerabilities in software given only in binary form. Our tool finds vulnerabilities in both a standard benchmark suite and 3 real-world applications, after generating only a handful of candidate inputs, and also diagnoses general vulnerability conditions.
Secure Content Sniffing for Web Browsers, or How to Stop Papers from Reviewing Themselves
"... Cross-site scripting defenses often focus on HTML documents, neglecting attacks involving the browser’s contentsniffing algorithm, which can treat non-HTML content as HTML. Web applications, such as the one that manages this conference, must defend themselves against these attacks or risk authors up ..."
Abstract
-
Cited by 45 (6 self)
- Add to MetaCart
(Show Context)
Cross-site scripting defenses often focus on HTML documents, neglecting attacks involving the browser’s contentsniffing algorithm, which can treat non-HTML content as HTML. Web applications, such as the one that manages this conference, must defend themselves against these attacks or risk authors uploading malicious papers that automatically submit stellar self-reviews. In this paper, we formulate content-sniffing XSS attacks and defenses. We study contentsniffing XSS attacks systematically by constructing highfidelity models of the content-sniffing algorithms used by four major browsers. We compare these models with Web site content filtering policies to construct attacks. To defend against these attacks, we propose and implement a principled content-sniffing algorithm that provides security while maintaining compatibility. Our principles have been
Measuring Channel Capacity to Distinguish Undue Influence
"... The channel capacity of a program is a quantitative measure of the amount of control that the inputs to a program have over its outputs. Because it corresponds to worst-case assumptions about the probability distribution over those inputs, it is particularly appropriate for security applications whe ..."
Abstract
-
Cited by 42 (2 self)
- Add to MetaCart
(Show Context)
The channel capacity of a program is a quantitative measure of the amount of control that the inputs to a program have over its outputs. Because it corresponds to worst-case assumptions about the probability distribution over those inputs, it is particularly appropriate for security applications where the inputs are under the control of an adversary. We introduce a family of complementary techniques for measuring channel capacity automatically using a decision procedure (SAT or #SAT solver), which give either exact or narrow probabilistic bounds. We then apply these techniques to the problem of analyzing false positives produced by dynamic taint analysis used to detect control-flow hijacking in commodity software. Dynamic taint analysis is based on the principle that an attacker should not be able to control values such as function pointers and return addresses, but it uses a simple binary approximation of control that commonly leads to both false positive and false negative errors. Based on channel capacity, we propose a more refined quantitative measure of influence, which can effectively distinguish between true attacks and false positives. We use a practical implementation of our influence measuring techniques, integrated with a dynamic taint analysis operating on x86 binaries, to classify tainting warnings produced by vulnerable network servers, such as those attacked by the Blaster and SQL Slammer worms. Influence measurement correctly distinguishes real attacks from tainting false positives, a task that would otherwise need to be done manually.
Binary code extraction and interface identification for security applications
- In ISOC NDSS’10
, 2010
"... Binary code reuse is the process of automatically identifying the interface and extracting the instructions and data dependencies of a code fragment from an executable program, so that it is self-contained and can be reused by external code. Binary code reuse is useful for a number of security appli ..."
Abstract
-
Cited by 41 (7 self)
- Add to MetaCart
(Show Context)
Binary code reuse is the process of automatically identifying the interface and extracting the instructions and data dependencies of a code fragment from an executable program, so that it is self-contained and can be reused by external code. Binary code reuse is useful for a number of security applications, including reusing the proprietary cryptographic or unpacking functions from a malware sample and for rewriting a network dialog. In this paper we conduct the first systematic study of automated binary code reuse and its security applications. The main challenge in binary code reuse is understanding the code fragment’s interface. We propose a novel technique to identify the prototype of an undocumented code fragment directly from the program’s binary, without access to source code or symbol information. Further, we must also extract the code itself from the binary so that it is self-contained and can be easily reused in another program. We design and implement a tool that uses a combination of dynamic and static analysis to automatically identify the prototype and extract the instructions of an assembly function into a form that can be reused by other C code. The extracted function can be run independently of the rest of the program’s functionality and shared with other users. We apply our approach to scenarios that include extracting the encryption and decryption routines from malware samples, and show that these routines can be reused by a network proxy to decrypt encrypted traffic on the network. This allows the network proxy to rewrite the malware’s encrypted traffic by combining the extracted encryption and decryption functions with the session keys and the protocol grammar. We also show that we can reuse a code fragment from an unpacking function for the unpacking routine for a different sample of the same family, even if the code fragment is not a complete function. 1
Emulating Emulation-Resistant Malware
- In Proceedings of the Workshop on Virtual Machine Security (VMSec
, 2009
"... The authors of malware attempt to frustrate reverse engineering and analysis by creating programs that crash or otherwise behave differently when executed on an emulated platform than when executed on real hardware. In order to defeat such techniques and facilitate automatic and semi-automatic dynam ..."
Abstract
-
Cited by 36 (2 self)
- Add to MetaCart
(Show Context)
The authors of malware attempt to frustrate reverse engineering and analysis by creating programs that crash or otherwise behave differently when executed on an emulated platform than when executed on real hardware. In order to defeat such techniques and facilitate automatic and semi-automatic dynamic analysis of malware, we propose an automated technique to dynamically modify the execution of a whole-system emulator to fool a malware sample’s anti-emulation checks. Our approach uses a scalable trace matching algorithm to locate the point where emulated execution diverges, and then compares the states of the reference system and the emulator to create a dynamic state modification that repairs the difference. We evaluate our technique by building an implementation into an emulator used for in-depth malware analysis. On case studies that include real samples of malware collected in the wild and an attack that has not yet been exploited, our tool automatically ameliorates the malware sample’s anti-emulation checks to enable analysis, and its modifications are robust to system changes.
Inspector Gadget: Automated extraction of proprietary gadgets from malware binaries
- In Proceedings of the IEEE Symposium on Security and Privacy
, 2010
"... Abstract—Unfortunately, malicious software is still an unsolved problem and a major threat on the Internet. An important component in the fight against malicious software is the analysis of malware samples: Only if an analyst understands the behavior of a given sample, she can design appropriate cou ..."
Abstract
-
Cited by 33 (3 self)
- Add to MetaCart
(Show Context)
Abstract—Unfortunately, malicious software is still an unsolved problem and a major threat on the Internet. An important component in the fight against malicious software is the analysis of malware samples: Only if an analyst understands the behavior of a given sample, she can design appropriate countermeasures. Manual approaches are frequently used to analyze certain key algorithms, such as downloading of encoded updates, or generating new DNS domains for command and control purposes. In this paper, we present a novel approach to automatically extract, from a given binary executable, the algorithm related to a certain activity of the sample. We isolate and extract these instructions and generate a so-called gadget, i.e., a stand-alone component that encapsulates a specific behavior. We make sure that a gadget can autonomously perform a specific task by including all relevant code and data into the gadget such that it can be executed in a self-contained fashion. Gadgets are useful entities in analyzing malicious software: In particular, they are valuable for practitioners, as understanding a certain activity that is embedded in a binary sample (e.g., the update function) is still largely a manual and complex task. Our evaluation with several real-world samples demonstrates that our approach is versatile and useful in practice. I.
Combining Controlflow Integrity and Static Analysis for Efficient and Validated Data Sandboxing
- In Proceedings of the 18th ACM Conference on Computer and Communications Security (CCS) (2011
"... In many software attacks, inducing an illegal control-flow transfer in the target system is one common step. Control-Flow Integrity (CFI [1]) protects a software system by enforcing a pre-determined control-flow graph. In addition to providing strong security, CFI enables static analysis on lowlevel ..."
Abstract
-
Cited by 31 (2 self)
- Add to MetaCart
(Show Context)
In many software attacks, inducing an illegal control-flow transfer in the target system is one common step. Control-Flow Integrity (CFI [1]) protects a software system by enforcing a pre-determined control-flow graph. In addition to providing strong security, CFI enables static analysis on lowlevel code. This paper evaluates whether CFI-enabled static analysis can help build efficient and validated data sandboxing. Previous systems generally sandbox memory writes for integrity, but avoid protecting confidentiality due to the high overhead of sandboxing memory reads. To reduce overhead, we have implemented a series of optimizations that remove sandboxing instructions if they are proven unnecessary by static analysis. On top of CFI, our system adds only 2.7% runtime overhead on SPECint2000 for sandboxing memory writes and adds modest 19 % for sandboxing both reads and writes. We have also built a principled data-sandboxing verifier based on range analysis. The verifier checks the safety of the results of the optimizer, which removes the need to trust the rewriter and optimizer. Our results show that the combination of CFI and static analysis has the potential of bringing down the cost of general inlined reference monitors, while maintaining strong security.
Detecting Passive Content Leaks and Pollution in Android Applications
- In Proceedings of the 20th Annual Symposium on Network and Distributed System Security, NDSS ’13
, 2013
"... In this paper, we systematically study two vulnerabili-ties and their presence in existing Android applications (or “apps”). These two vulnerabilities are rooted in an unpro-tected Android component, i.e., content provider, inside vul-nerable apps. Because of the lack of necessary access con-trol en ..."
Abstract
-
Cited by 24 (3 self)
- Add to MetaCart
(Show Context)
In this paper, we systematically study two vulnerabili-ties and their presence in existing Android applications (or “apps”). These two vulnerabilities are rooted in an unpro-tected Android component, i.e., content provider, inside vul-nerable apps. Because of the lack of necessary access con-trol enforcement, affected apps can be exploited to either passively disclose various types of private in-app data or inadvertently manipulate certain security-sensitive in-app settings or configurations that may subsequently cause se-rious system-wide side effects (e.g., blocking all incoming phone calls or SMS messages). To assess the prevalence of these two vulnerabilities, we analyze 62, 519 apps collected in February 2012 from various Android markets. Our re-sults show that among these apps, 1, 279 (2.0%) and 871 (1.4%) of them are susceptible to these two vulnerabilities, respectively. In addition, we find that 435 (0.7%) and 398 (0.6%) of them are accessible from official Google Play and some of them are extremely popular with more than 10, 000, 000 installs. The presence of a large number of vulnerable apps in popular Android markets as well as the variety of private data for leaks and manipulation reflect the severity of these two vulnerabilities. To address them, we also explore and examine possible mitigation solutions. 1
Statically-Directed Dynamic Automated Test Generation
, 2011
"... We present a new technique for exploiting static analysis to guide dynamic automated test generation for binary programs, prioritizing the paths to be explored. Our technique is a three-stage process, which alternates dynamic and static analysis. In the first stage, we run dynamic analysis with a sm ..."
Abstract
-
Cited by 20 (4 self)
- Add to MetaCart
(Show Context)
We present a new technique for exploiting static analysis to guide dynamic automated test generation for binary programs, prioritizing the paths to be explored. Our technique is a three-stage process, which alternates dynamic and static analysis. In the first stage, we run dynamic analysis with a small number of seed tests to resolve indirect jumps in the binary code and build a visibly pushdown automaton (VPA) reflecting the global control-flow of the program. Further, we augment the computed VPA with statically computable jumps not executed by the seed tests. In the second stage, we apply static analysis to the inferred automaton to find potential vulnerabilities, i.e., targets for the dynamic analysis. In the third stage, we use the results of the prior phases to assign weights to VPA edges. Our symbolic-execution based automated test generation tool then uses the weighted shortest-path lengths in the VPA to direct its exploration to the target potential vulnerabilities. Preliminary experiments on a suite of benchmarks extracted from real applications show that static analysis allows exploration to reach vulnerabilities it otherwise would not, and the generated test inputs prove that the static warnings indicate true positives.