JACY: a JVM-Based Intrusion Detection and Security Analysis System

(1)

HAL Id: hal-03168756

https://hal.archives-ouvertes.fr/hal-03168756v3

Preprint submitted on 2 May 2021

HAL

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire

HAL, est

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

JACY: a JVM-Based Intrusion Detection and Security Analysis System

Mohammadreza Ashouri, Christoph Kreitz, Thomas Austin, Jacir Luiz Bordim

To cite this version:

Mohammadreza Ashouri, Christoph Kreitz, Thomas Austin, Jacir Luiz Bordim. JACY: a JVM-Based

Intrusion Detection and Security Analysis System. 2021. �hal-03168756v3�

(2)

and Security Analysis System

Mohammadreza Ashouri^?, Chrsitoph Kreitz^??, Thomas H. Austin^{? ? ?}, and Jacir Luiz Bordim^†

Abstract. This paper introduces a practical approach to identify input sanitization errors in Java applications. Our introduced technique analyzes the bytecode of given Java applications based on a successful combination of call graph backward slicing and dynamic taint tracking. As a result, our analysis technique allows overcoming common restrictions in previous work such as unexpected runtime errors in the Java Virtual Machine (JVM) or applications altering the normal behavior of target programs under analysis and the lack of source code. Our approach can be deployed without special firmware modifications or root privileges on different standard operating systems supporting the JVM, e.g., Linux, Windows, Mac OS. Moreover, we evaluated our technique with a new Java benchmark suite called Orbitz Security Benchmark Suite. Orbitz includes8 programs written in modern Java compilers (e.g., Java SE 9, 11, 14, and 16), comprises1,201,934 lines of code with different workloads for various application domains. As a result, we could identify 130 security violations out of 349 suspicious runtime data flows, demonstrating our proposed mechanism’s aptitude to identify real-world security issues in the Java ecosystem.

Introduction

The lack of proper input sanitization in software systems can lead to serious security issues and financial damages, such as leakage of sensitive information and data loss. Consequently, Java applications that contain such bugs are vulnerable to input sanitization errors, which adversaries can exploit to hijack confidential data from target machines, damage system resources, and provide additional support for further attacks. For instance, internet worms often exploit sanitization errors and infect hundreds of thousands of machines in a short amount of time, causing hundreds of millions of dollars in losses [17, 57].

Nevertheless, it seems that there is still a significant deficiency in the existence of practical systems and approaches to tackle this fundamental coding issue. Naturally, a practical solution should allow identifying zero-day vulnerabilities, exploits, and runtime attacks in both free and open-source (FOSS) and commercial off-the- shelf programs (COTS) before being exploited by adversaries. Furthermore, such

?Department of Cyber Security, Saint P¨olten University of Applied Sciences

?? Department of Computer Science, Cornell University

? ? ? Department of Computer Science, San Jos´e State University

†Department of Computer Science, University of Bras´ılia

(3)

mechanisms should be easy to deploy, cost-effective, and result in few false positives and false negatives.

This paper introduces an effective mechanism to identify input sanitization errors in real-world Java COTS and FOSS in the JVM ecosystem, swiftly extending to the Android ecosystem and other JVM-based ecosystems such as Scala, Clojure, and Kotlin. Our work introduces the following contributions:

i Tracking only actual runtime data flows by handling real environment inputs ii Non-reliance on source code

iii Providing a safe runtime environment for vulnerable applications by blocking the exploitation of sanitization errors and reporting adversaries

iv Introducing a modern benchmark suite for security evaluation in Java based on new Java Development Kits (JDKs)

v Preventing runtime intrusion in Java bytecode by intercepting runtime attacks and vulnerability exploitation

1 Java Virtual Machine

The Java Virtual Machine (JVM) is a virtual machine that allows a computer to run Java programs as well as programs written in other languages (e.g., Scala, Kotlin, Ceylon) that are also compiled to Java bytecode [47]. Hence, we can describe JVM as an abstract computing machine that has its own instruction set, byte codes, and manages multiple memory blocks. When the compilation of a Java program takes place, it generates a sequence of bytecode as an array of bytes. These bytecode instructions are described in a class file, which also holds the program data [50].

1.1 Java Class File

A class file is a file that holds bytecode and has a.class extension that can be executed by the JVM. A Java compiler creates a class file from.javafiles as a result of the successful compilation. The Java class file is a precisely defined format for compiled Java. The program source code is compiled into class files that can be loaded and executed by any standard Java Virtual Machine [18]. In addition, the class files can travel across a network before being loaded by the JVM. JVMs are available for common platforms, and a class file compiled on one platform will execute on another platforms’ JVM. This makes Java applications platform-independent [35].

Listing 1.1 represents the structure of a sample class and how to initiate it in Java.

A class can also be defined as an element in object-oriented programming that aggregates attributes (fields), which can be publicly accessible or not, and methods (functions), which can also be public or private, and usually write/read those attributes.

Figure 1 illustrates instantiating of a class in the Java ecosystem.

(4)

1 public class Sample{

2

3 private String property1 ;

4 private int property2 ;

5

6 void SetMethod(intval) {

7 this . property2=val;

8 }

9

10 int GetMethod2(){

11 return this . property2 ;

12 }

13 }

14

15 class TestClass {

16 public static void main (String [] args ) {

17

18 // create an instance of the class .

19 Sample sample = new Sample();

20 sample.SetMethod(2019);

21 System.out. println (sample.GetMethod2());

22 }

23 }

Listing 1.1: Java class instancing at runtime.

Fig. 1: Instantiating is the creation of a new instance of a class and is part of object-oriented programming, which is when an object is an instance of a class

(5)

1.2 Java Bytecode

Since the JVM bytecode is the instruction set for the JVM, it acts similar to an assembler [21]. As soon as a program is compiled, Java bytecode is generated.

Required resources for running the bytecode are made available by the JVM, which calls the processor to allocate the required resources. JVM’s are stack-based, so they stack implementation to read the codes. Diagram 2 outlines the compilation process. Listing 1.2 and Listing 1.3 respectively show the bytecode of the class after the compilation.

Fig. 2: Overview of the compilation process in the Java compiler

1 for (int i = 2; i <1000; i++){

2 for (int j = 2; j <i ; j ++){

3 if ( i % j == 0)

4 continue outer ;

5 }

6 System.out. println ( i ) ;

7 }

Listing 1.2: Sample Java code before the compilation

1.3 Bytecode Portability

Bytecode is essentially the machine level language which runs on the JVM. Hence, when a class is being loaded, it gets a stream of bytecode per class method. Also,

(6)

1 Code:

2 0: iconst 2

3 1: istore 1

4 2: iload 1

5 3: sipush 1000

6 6: if icmpge 44

7 9: iconst 2

8 10: istore 2

9 11: iload 2

10 12: iload 1

11 13: if icmpge 31

12 16: iload 1

13 17: iload 2

14 18: irem # remainder

15 19: ifne 25

16 22: goto 38

17 25: iinc 2, 1

18 28: goto 11

19 31: getstatic #84;//Field java/lang/System.out:Ljava/io/PrintStream;

20 34: iload 1

21 35: invokevirtual #85;//Method java/io/PrintStream.println :( I )V

22 38: iinc 1, 1

23 41: goto 2

24 44: return

Listing 1.3: Manifesting the bytecode of the compiled class after the compilation process in Java

if this method was called during the program execution, the bytecode for that method gets invoked. This flexibility results in portability, which is lacking in other programming languages, such as C and C++ [43]. Portability guarantees that a program can be implemented and executed on a wide range of platforms, such as web, desktop, mobile devices, and mainframes.

1.4 JVM Internals

In more detail, bytecodes will be executed by the Java Runtime Environment (JRE) [42]. JRE is the implementation of JVM, which analyzes, interprets and executes bytecodes [20]. The JVM includes three main subsystems as follows:

– Class Loader Subsystem – Runtime Data Area – Execution Engine

Classloader.

The classloader is responsible for loading program class files. Any program must be first loaded by the classloader. There exist three built-in classloaders in the Loading phase: BootStrap ClassLoader, Extension ClassLoader, and Application ClassLoader.

(7)

– Bootstrap ClassLoader: It is the first classloader, which is the superclass of Extension classloader. It loads thert.jar file, which contains all class files of Java Standard Edition, such as java.lang package classes, java.net package classes, java.util package classes, java.io package classes, java.sql package classes.

– Extension ClassLoader: This is the child classloader of Bootstrap and the parent classloader of System classloader. It is responsible for loading the jar files located inside of the compiler library folder.

– System/Application ClassLoader: This sub-component, which is recognized as the child classloader of the Extension classloader, is responsible for loading the class files fromClasspath. It is also called ”Classloader”.

Listing 1.4 presents how to interact with the JVM classLoader in Java.

1 public class ClassLoaderExample

2 {

3 public static void main(String [] args )

4 {

5 // Let's print the classloader name of current class .

6 //Application/System classloader will load this class

7 Class c=ClassLoaderExample.class;

8 System.out. println (c . getClassLoader() ) ;

9 // If we print the classloader name of String , it will print null because it is an

10 //in−built class which is found in rt . jar , so it is loaded by Bootstrap classloader

11 System.out. println ( String . class . getClassLoader() ) ;

12 }

13 }

14 // output : sun.misc.Launcher$AppClassLoader@4e0e2f2a

15 // null

Listing 1.4: Working with ClassLoader at runtime

1.5 Bytecode Instrumentation in Java

Bytecode instrumentation (BCI) is an important capability of the JVM to modify the program execution at runtime without knowing or changing its source code [13, 9, 27]. In more concrete terms, this feature allows utilizing the instrumentation API, which is provided by the JVM to alter existing loaded bytecodes before the execution [26].

Common use cases for BCI are dynamic analysis, event logging, memory leakages detection, and performance monitoring [6, 24]. Using this technique makes it possible to introduce almost any changes to an already deployed Java application by operating on its bytecode level which is interpreted by the JVM at runtime, without modifying the application’s source code (since there is no need for re-compilation, re-assembly and re-deployment) [27]. It is conducted by implementing an agent that makes it

(8)

possible to transform every class loaded by the JVM classloader before being used for the first time [25]. Since the manual bytecode manipulation is relatively complicated and error-prone, there are some BCI libraries bundled in the JRE to help developers in the BCI implementation, e.g., BCEL [6], Java Assist [29], ASM [16], and Byte Buddy [52].

Operational Code.Java programs are compiled into a generic intermediate format, which is called the JVM bytecode. A method in the bytecode is a series of instructions.

Each instruction consists of a one-byte operational code defining the operation and it is followed by one or more arguments. An instruction can be expressed in the following format:

1 <index><opcode>[<operand1>[<operand2>...]][<comment>]

Note that<index>is the index of the operational code in the array that holds the bytecode. It can be considered as a bytecode offset from the beginning of the method. The following code shows a simple constructor from a MIDlet application with a simple methodString.valueOf():

1 public MainMidlet(){

2 String number = String.valueOf(”H”);

3 }

The corresponding bytecode is shown in Listing 1.5, which comprises the index and the instruction code in each line. The opcodeaload 0[this] takes the reference of the class from the stack. The line 5 and line 7 are two instructions of method calls.invokespecial invokes the method of an instance of class.invokestatic invokes the static method of a class. Opcode ldcpushes the value of the String ’H’ onto the stack. Theastore 1 stores the value returned from the method call into the variable number.

1 // Method descriptor #10 ()V

2 // Stack: 1, Locals : 2

3 public MainMidlet();

4 0 aload 0 [ this ]

5 1 invokespecial javax . microedition . midlet .MIDlet() [12]

6 4 ldc <String H >[14]

7 6 invokestatic java . lang . String .valueOf(java . lang .Object) : java . lang . String [16]

8 9 astore 1 [number]

9 10 return

Listing 1.5: Showing corresponding bytecode associated with the simple MIDlet method.

(9)

Methodology

In this research, we seek to identify input sanitization errors in Java applications, which lead to various code injection attacks such as SQL injection, cross site scripting (XSS), and remote code execution [33]. In other words, in our threat model, adversaries can launch successful attacks through input data sanitization errors and exploit sensitive coding methods in a target Java applications.

Furthermore, we assume that cyberattacks not exploiting code-level vulnerabilities are outside the scope of our research. For instance, hijacking external service security tokens due to the weak security of these external services is considered a separate topic.

Additionally, the denial-of-service (DoS) behavior of interrupting service is not in our research scope [45], even though, in some cases, it can be identified by JACY through a proper set of source-sink specifications. Henceforth, in this work, we aim to identify runtime attacks and taint-style vulnerabilities caused by code-level sanitization errors in a target program.

Sanitization: Sanitization is the process of assuring that data adapts to the program conditions (security-related requirements regarding leakage or disclosure of sensitive data). Sanitization may involve eliminating unwanted characters from the input by eliminating, substituting, or terminating the characters. Sanitization functions can often neglect particular characters or unknown complexities in the code or they may not be sufficiently maintained when new features are appended to the software.

Security violation:We assume that there exists at least one security vulnerability (violation) in a given application, which involves at least one input that is introduced by an untrusted source (e.g., web forms, program arguments, file, network, external process, keyboard). This input can reach at least one sensitive method (e.g., console, database, file, network) without being appropriately sanitized. Consequently, if we can identify such an unsafe data flow during the execution time of a program under analysis, we can detect the security vulnerability associated with that particular runtime data flow.

1.6 Problem Scope

We aim to raise the bar for the aforementioned code-level attacks by providing a conservative monitoring system. To this end, our technique performs the following stages:

1. Statically analyzing the binary files of a given Java applications to generate a call graph

2. Performing backward slicing to recognize potential exploitable paths 3. Instrumenting the extracted paths

4. Dynamically monitoring runtime behavior of the instrumented paths our dynamic taint analysis (DTA) engine in order to report the propagation of non-properly sanitized input arriving from external sources during runtime

(10)

Fig. 3: Presenting the abstract architecture of JACY

Our approach can also be used as a complementary tool for program debugging, auditing, or penetration testing for software developers and security experts. The abstract architecture of our analysis mechanism is shown in Figure 3.

Call Graph Generation.JACY builds a dependency call graph [28, 44, 31] for a given application class in order to represent all potential sensitive paths that can be exploited by potential vulnerabilities. In other words, the generated graph call, which is called ”Code Property Graph” (CPGs) [53] illustrates a graph in which we can discover critical methods (we call them sinks). Given a source method (i.e., the program receives input) and a sink method, generated CPGs can be effectively utilized to find data dependency paths between source and sink arguments, and therefore potential attacking runtime data flows.

Backward Slicing. Because these sinks nodes can be exploited by adversaries, it is vital to control them in order to determine whether they are vulnerable to input sanitization errors. It shall be noted that, in this paper, an entry point means an external source of arbitrary input data (e.g., user inputs, web forms, network packages, files). Accordingly, a specific ”backward slicing” algorithm [7, 34] (Figure 4) is leveraged to find an actual entry point to introduce arbitrary input to the program.

The backward slicing method we used is shown in Algorithm 1, and it is expected to return the list of potential critical (i.e. exploitable) paths.

(11)

Fig. 4: Backward slicing

Procedure 1 Backward slicing

Input :Sink Methods

Result:Critical Paths

1 initialization SensitiveNodes←SearchSensitiveNodes(SensitiveMethods)

2 foreachSensitiveNode∈SensitiveNodesdo

3 SensitivePaths=AnalyzeNode(SensitiveNode)

4 end

5 returnSensitivePaths

6 FunctionAnalyzeNode(node):

7 SensitivePaths←[]

8 ExtractedPaths=Backward(SensitiveNode)

9 foreachpath∈ExtractedPathsdo

10 ifpathhasasourcethen

11 SensitivePaths←path;

12 else

13 invokePaths=NodeAnalysis(invokeNode) SensitivePaths←path+invokePaths

14 end

15 end

16 returnSensitivePaths;

17 FunctionBackward(node):

18 IntraPaths←[]

19 whilenode is not a source node is not a method argumentdo

20 AddNodes=ExtractNextNode(node) UnNodes=FILTERNodes(AddNodes) node←UnNodes

21 end

22 IntraPaths=ReachPathToSources(node) returnIntraPaths;

This algorithm starts scanning nodes for any call-site (invocation) to a sink method.

Each node represents an instruction in the graph. Therefore, the data dependency edges for all variables used in that sink instruction is traversed backwards by using the function NodeAnalysis. This function also calls theBackward function to gain all data dependency paths from an instruction node to either a source or a method argument (if the instruction is inside a method). If a path ends at a method argument, NodeAnalysis will be invoked recursively over the nodes. Furthermore, theBackward function classifies paths between sources and sinks. It also controls protection methods (e.g., sanitization functions) in paths and prunes non-exploitable ones. Finally, the ReachPathtoSources function returns critical paths in the graph.

(12)

2 Bytecode Instrumentation

While the source code of synthetic and educational software projects is often available on the internet (places like GitHub), the source code of commercial off-the- shelf (COTS) applications is usually non-public for various understandable reasons, such as protecting intellectual property [39, 6]. This limitation represents a challenge when conducting practical vulnerability analysis for real-world COTS.

In the effort to overcome this obstacle, we created a JVM agent [22], named

”Instrumentor” to manipulate the bytecode of a target application and implement our dynamic taint tracking and analysis. Therefore, in contrast to previous work, such as [39, 38, 30, 48, 41, 54], in this work we do not require source code for the instrumentation and analysis. In other words, the BCI agent allowed us to track the propagation of input data during execution time even in the absence of source code.

Listing 1.6 represents the standard initiation of a Java application alongside the BCI agent.

1 java −javaagent:agentA.jar −javaagent:agentB.jar TargetApp

Listing 1.6: Launching a Java application carried along with 2 JVM agents.

Instrument agent. In this work, we created our BCI agent with the help of the ByteBuddy [36] library due to its simplicity in comparison with other libraries in Java [6, 8] It shall be noted that the agent loads the target bytecode to the ClassLoader of the local JVM (see Listing 1.6). Subsequently, it injects logging code into the bytecode. This injected code is a prerequisite for the later performance of dynamic taint tracking during the target application’s execution time. Figure 5 illustrates the BCI agent in our approach.

Fig. 5: Instrumenting Java application bytecode

3 Dynamic Taint Tracking

The dynamic taint analysis module (DTA) is the main component for monitoring and analyzing the propagation of untrusted input data introduced by external sources in a target program. In other words, the DTA is responsible for monitoring the

(13)

dynamic execution of an instrumented bytecode and identifying sanitization errors at runtime. Hence, if tainted data can reach a sensitive sink without proper sanitization, the DTA module will mitigate the attack and report the associated security violation.

The DTA module comprises the following components:

Sources:Source methods identify where untrusted data originate from.

Sinks:These are programming locations (i.e., functions) used to control the state of tainted data in order to identify any malicious activity in the code. For example, executeQuery(SQLquery) andprocessBuilder.command(”bash”, ”-c”, ”ls /home/”);

are associated with SQL injection and RCE attacks, respectively. Table 1 presents some of the sink methods in Java.

Table 1: Some of the J2EE sink methods

Sink Description

tomcat.util.net.NioBufferHandler.getReadBuffer Reads the buffer of Apache server

javax.servlet.jsp.JspWriter.println(C) Prints objects/strings/booleans on web documents javax.servlet.http.Part.getHeaderNames Gets the header names

org.eclipse.jetty.server.Request.getReader Shows a Post form item

java.lang.Runtime.exec Executes the string command in a process java.sql.Statement.executeQuery Executes a given SQL statement with JDBC java.net.URL.openConnection Returns an HttpURLConnection object javax.servlet.jsp.JspWriter Writes characters to stream or console org.apache.struts.action.ActionForward.setPath Sets URI to which control should be forwarded ognl.OgnlReflectionProvider.getValue Evaluates the provided OGNL expression util.TextParseUtil.ParsedValueEvaluator Evaluates the value of OGNL value stack turbine.om.peer.BasePeer.executeQuery Executes a given query in Apache Turbin org.hibernate.Session.createSQLQuery Executes a given SQL statement in Hibernate

Intrusion Terminator:After identifying a runtime attack based on a non-sanitized data flow, this component protects the end-user by terminating the vulnerable Java process.

3.1 Shadow Objects

To track the propagation of tainted data introduced by triggered sources in the bytecode, we utilized the ”memory shadowing” technique in our analysis. It shall be noted that this technique has been introduced in the previous research work [46, 23, 2, 12]. However, in this paper we optimized it in order to track the potential exploitable path introduced by the call graph analysis only. This extension enables our DTA system to log or tracks only necessary tainted objects at runtime by making a shadow instance of them in the runtime bytecode without creating a separate structure, which requires extra resources such as memory.

As a result, we could reduce runtime overhead and unexpected errors associated with dynamic taint tracking.

Our DTA module performs the following operations during the execution time of a target application:

(14)

1. Labeling program various inputs from untrusted source methods as ”tainted”.

2. Identifying intrusions by tracking suspicious data flows

3. Using regular expressions to analyze the lack of proper sanitization on target data flows

3.2 Intrusion Termination

IF the DTA found a regex match, it provides a report associated with the vulnerable data flow in a target program. This YAML report introduces necessary information regarding the vulnerability and its associated sink. Listing 1.7 presents a generated report for SQL injection vulnerabilities in a data flow on one of our benchmark programs.

1

2 −app: ZooBank

3 JVM: Java HotSpot(TM) 64−Bit Server VM

4 user : root

5 violation :

6 −login . class

7 −StrEmail

8 −org.apache.http . client .methods.HttpGet

9 −java. sql . Statement.executeQuery

10 − false

11 −202452139

12 tainted : ;/∗∗/SELECT sleep(25)

13 inside : wool

14 −1615106908

15 −true

16 −true

17 − false

Listing 1.7: Showing a generated report by JACY, which comprises necessary information regarding a runtime traced attack

After identifying a non-sanitized data flow, i.e., a vulnerability, JACY block successful attacks by terminating the vulnerable Java process. To end an instrumented Java program under the dynamic taint tracking, the JACY agent injects theexit() method of the System class. It is the most popular way to end a program in Java.

System. exit() terminates the Java Virtual Machine (JVM) that exits in the current program that we are running (Listing 1.8).

Even though this approach can also interrupt the normal behavior of a Java program, it protects the end-users from potential exploitation and further damage by exploiting a hidden security vulnerability introduced in the target application. It shall be noted that the Termination occurs only when there is a successful attack. On the other hand, JACY does not disturb application behavior under normal circumstances.

(15)

Our method, along with a vulnerability report, allows for the protection of users until a proper bug fix for the vulnerability is implemented.

1 protected void VulnerableMethod(HttpServletRequest request,

2 HttpServletResponse response)

3 throws ServletException , IOException{

4 boolean success = false ;

5 String username = request.getParameter(”username”);

6 String password = request.getParameter(”password”);

7 // Unsafe query which uses string concatenation

8 String query =”select ∗from tblAccounts where username='”+

9 username +”'and password ='”+ password +”'”;

10 Connection conn = null;

11 Statement stmt = null;

12 try {

13 conn = DriverManager.getConnection(”jdbc:mysql://0.0.0.0:3306/user”,

14 ”root”, ”toor”) ;

15 stmt = conn.createStatement();

16

17 //−−−−−−Termination−−−−−−

18 System.err . println (”attack intercepted !”) ;

19 System.exit (1) ;

20 //−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

21

22 ResultSet rs = stmt.executeQuery(query);

23 if ( rs . next() ) {

24 // Login Successful if match is found

25 success = true;

26 }

27 }catch (Exception e) {

28 e . printStackTrace () ;

29 } finally {

30 try {

31 ...

Listing 1.8: JACY injected the exit() method to the bytecode of a given intrumented application in order to circumvent a SQL injection attack during the execution time

Evaluation

Even though there are a number of standard Java-based benchmark suites (e.g., Secure Micro-Bench [40], DaCapo [14]), they are mostly relatively old, and therefore, do not reflect the modern features introduced by recent Java Development Kits (JDKs). Consequently, in this research, we built and introduced our own Java benchmark suite, which includes modern features of JDKs for both security and performance measurements, which we called benchmark suite ORBIT. The Orbit benchmark suite aggregates various application domains, such as database, web, network, and file management. It also includes programming classes associated with multi-thread programming, image processing, and message passing. To challenge

(16)

the usefulness of JACY, we also deliberately introduced different security issues such as cross-site scripting, SQL injection, code injection, insecure deseriazalition, and directory traversal. Table 2 represents our benchmark suite, which comprises 1,201,934 lines of code (LoC).

Table 2: Summary of the security collection of ORBIT.

Name Description LoC

DNA String functions 7011

Duke Browser Web browsing functions 48295

Icy Files File management 3724

Memory Leaks Using various methods and loops for Garbage Collector security issues 2387 Net Messenger Network messaging using various network APIs with the core encryption libraries 129057

Picture-Pro Image processing 8700

RAT Remote screen control with the core libraries, no encryption used 825098 Zoo-Bank Database application with common functionalities used in database systems 177662

3.3 Analysis Results

We performed our evaluation on a mac-OS Sierra machine (version 12) with 128 GB memory and Intel XEON W CPU, using OpenJDK 11 as the runtime environment.

As Table 3 indicates, JACY could successfully report a total of301 security violations in our ORBIT benchmark suite, in which 248 are actual vulnerabilities, while53 are false positives.

Table 3:Actual Vuln.indicates true vulnerabilities after our validation process, andFP introduces false positives.

Project Tainted Flows Reported Vuln. Actual Vuln. FP

DNA 7 0 0 0

Duke Browser 3 1 1 0

Icy Files 80 31 28 3

Memory Leaks 8 7 3 4

Net Messenger 43 41 27 14

Picture-Pro 67 43 37 6

RAT 59 11 9 2

Zoo-Bank 82 16 15 1

Total Results 349 150 120 30

3.4 Attack Simulation

In order to perform various runtime attacks, we went through the following stages:

1. We specified a remote access to the benchmark applications (e.g., by network requests)

(17)

Table 4: Various identified vulnerabilities

Benchmark DNA DUKEBROWSER ICYFILES MEMORYLEAKS NETMESSENGER PICTURE-PRO RAT ZOO-BANK

SQL INJECTION 7 7 7 7 7 7 7 3

CROSS-SITE SCRIPTING (XSS) 7 7 7 7 3 7 7 3 REMOTE CODE EXECUTION 7 3 7 7 7 7 3 3 INSECURE DESERIALIZATION 7 3 7 7 7 3 7 7 XML EXTERNAL ENTITIES (XXE)3 7 7 7 3 7 7 3 OGNL INJECTION 7 7 7 7 7 7 7 3

2. We directed exploits’ payloads to the sources in target applications. To do so, we used Burp Intruder [4], which is a standard and configurable security tool for executing different sets of exploit strings.

3.5 Result Verification

We manually examined each reproduced report by comparing its detected sources, sinks, and tainted values with our pre-knowledge about the vulnerabilities intentionally installed in the benchmark suite to verify our reports’ accuracy. In some cases, we also imported the benchmark source code into the IntelliJ IDEA. Then, we started to debug and monitor corresponding data flows with runtime values to intercept potential hidden vulnerabilities that we were unaware of. We often employedCurl to direct exploit payloads to the benchmarks’ sinks.

False Positives.JACY mistakenly reported53 data flows. We observed that this is due to two cases:

1. Lack of proper regex specifications in the DTA module.

2. The sink intervention problem, i.e., having a group of nested sink methods that interfere the taint tracing process.

4 Related Work

Static code analysis is one of the popular techniques for discovering programming bugs and security issues in software systems [15]. The main advantage of this method is that performing program analysis does not require the compilation and the execution of a given source code. Developers mostly use this technique during the development stages before releasing the program to the market [15].

For instance, FindBugs [10] is a popular and free, open-source static analysis tool, which can report various bugs and security errors. It works based on static specifications of target bugs and allows developers to review and correct their projects’

code during popular IDEs such as Eclipse [1] and NetBeans [3]. However, this tool’s main downside and similar static-based approaches are demanding source code, which is often unavailable for real-world COTS and introduces substantial false-positive results which occur due to lack of access to actual runtime data.

(18)

Other research tools manipulate the runtime environments (e.g., operating system, hardware [56]) to perform conservative dynamic taint tracking. In this regard, HiS- tar [55] introduced a tagging system to show the taint level of operating system (OS) abstractions and control information flow from sensitive objects to natural objects without using a trusted agent. PRECIP [51] is also proposed as a lightweight tool for blocking information leaks by intercepting system calls and monitoring output channels (e.g., network, file, process, database) in which sensitive input data, e.g., files, user inputs, are existed. To prevent running malicious programs, e.g., malware, from gaining access to the sensitives resources, PRECIP tracks runtime information flow at the object level at the kernel level. Although this tool works similarly to TaintEraser [56], it does not track taint propagation within the target programs’

code. Hence, PRECIP’s security policies have to stop all system calls when a program receives sensitive information. As a well-known taint analyzer for Linux, Dytan [19]

is a generic dynamic taint analysis framework for the Linux platform that supports configurable taint-propagation policies. However, Dytan requires significant execution overhead due to its complicated internals.

Moreover, Vachharajani et al. introduced RIFLE [49] as a runtime mechanism for implementing information-flow security policies. The proposed solution tracks information flow by using new hardware extensions based on the devised binary translation. RIFLE could also handle conditional dependencies and loops, but it requires significant hardware support, not instantly applicable to existing systems.

Similarly, Ho et al. [32] introduced a page-granularity taint analyzer built on the Xen virtual machine that monitors and switches dynamically from virtualization to hardware-based emulation to identify security violations if the processor accessed a tainted page.

However, the common drawbacks of the work mentioned above are the substantial runtime overhead stemming from the extensive monitoring system, lack of flexibility, and unwanted errors to target binaries. Furthermore, the restriction of using a specific lab-made runtime environment only prevents a wide range of real-world apps from benefiting from these approaches.

This paper aims to introduce a practical approach for performing in-depth security analysis for the JVM ecosystem by considering real-world conditions. Thus, we tried to leverage the useful ideas introduced in previous work, particularly in [6, 8, 7, 5, 11, 40, 29] and reduce the cons. We mainly focused on the JVM ecosystem in order to enhance the overall safety and security of real-world Java applications.

Please note that ub this paper we aim to introduce a practical software-level intrusion detection and attack prevention system that is able to identify a wide range of input sanitization errors in the real-world JVM COTs, where source code is often unavailable, overhead is critical, and punctuality is vital. In more concrete terms, our approach works based on a successful combination of backward slicing, bytecode instrumentation, and dynamic taint tracking. Moreover, as a software level intrusion detection system, JACY supports various JVM-based programming languages such as Java, Scala, Kotlin, and Clojure.

Finally, we implemented a guided bytecode instrumentation engine that only instruments potential executable paths in a target binary because of our call graph

(19)

algorithm. In contrast to previous work in the JVM [30, 12, 48, 39, 40, 37, 41], which instruments the whole of statements in target applications and imposes substantial runtime overhead and unexpected error, our binary instrumentation technique reduces the instrumentation time, runtime overhead, and unexpected errors.

In order to highlight the advantages of JACY in comparison with related work, we compared the critical features in our approach with the essential contributions in related work; the results of this comparison are shown in Table 5.

Table 5: Comparison between the advancements in JACY and related work.

JACY FindBugs Phosphor TaintDroid Heldar ScalaStyle Scapegoat WASP Juturna Trishul Chandra

Bytecode instrumentation D X D D D X X D D D D

Real-world case studies D D X D D - - - - - -

Compatibility on standard JVMs D - - - X - - X X X X

String propagation D D D D D - - D D D D

Byte array propagation D - - D X - - - - - -

All primitives data type propagation D - D - X - - - - - -

Supporting reflection D D - D D X X - - - -

Attack (vulnerability) detection D D X X D D X - - - -

Key concept Hybrid Static Dynamic Dynamic Dynamic Static Static Dynamic Hybrid Dynamic Dynamic

Adaptable source/sink specification (by users) D X X X D D X - D - -

False-positive filtration D X X X X X X - - - -

According to our comparison results, the number of reported true and false positives indicates that our hybrid analysis is more effective in detecting actual vulnerable tainted flows and pruning false-positive flows that do not involve runtime attacks and vulnerabilities present in the code. These precise results were gained by accessing the actual runtime information, controlling the content of introduced untrusted inputs, and pruning false positives data flows from our analysis.

Future Work

Improving specification completeness.While our evaluation proves our security technique’s soundness and application, we could analyze only those security issues, which are well-specified for our DTA engine. However, if a particular source is missed, potential vulnerabilities provoked by the source will be missed as well.

Although inferring blueprints in the general case is a task for future research, we used a hybrid strategy to find corresponding sources in a given application call graph. For example, we found the sources by backtracking sensitive nodes (i.e., sinks) in the generated call graph. However, we acknowledge that using sophisticated code obfuscation tools can interrupt our approach. Thus, tackling code obfuscation techniques for the sake of data flow specification would be one of the future works.

Auto-sanitize malicious input.By giving write privilege to JACY for restricting (overwriting) vulnerable tainted methods, we should be able to fix sanitization errors and protect target applications against runtime threats at the bytecode level. By the implementation of these features, instead of terminating the whole vulnerable

(20)

Duk e-Bro

wser

Icy-Files

Picture-Pro

Net-Messenger

RAT

DNA Zoo-B

ank

Memo ry-Leaks

0 10 20 30 40

1

28

37

27

9

0

15

3 1

16

29

11

2

0

12

2 Numberofactualvulnerabilitiesreported

JACY FindBugs

Fig. 6: The number of true positives on ORBIT benchmark suite analyzed by JACY and FindBugs.

Duk e-Bro

wser

Icy-Files

Picture-Pro

Net-Messenger

RAT

DNA Zoo-B

ank

Memo ry-Leaks

0 5 10 15 20

0

3

6

14

2

0 1

4 2

7

10

19

2

0

8

4

Numberoffalsepositivesreported

JACY FindBugs

Fig. 7: The number of false positives on ORBIT benchmark suite analyzed by JACY and FindBugs.

(21)

process, we can better mitigate and control runtime attacks in order to protect vulnerable applications against attacks (e.g., internet worms) to create an additional opportunity for security teams to deliver proper security patches.

Multi-layer sanitization.In real-world applications, there are conditions in which tainted data must be sanitized to be used, for instance, as a SQL command (e.g., searching database), as well as if the data is used again, for example, on a web page (e.g., as a result of a search box). The current version of JACY does not include the analysis for cases where ”multi-layer sanitization” is needed. Hence, we would consider implementing this feature as a future extension.

Conclusion

In this paper, we introduced a JVM-based mechanism for detecting and blocking sanitization-error-based vulnerabilities and runtime exploitation. We name our mechanism ”JACY”, which is an intrusion detection and protection system designed specifically for real-world Java COTS and FOSS. Our technique works based on a combination of static call graph analysis and bytecode instrumentation, which is tailored to a dynamic taint tracking system to trace and analyze the propagation of runtime data and circumvent malicious activities inside of instrumented software.

References

1. Enabling open innovation & collaboration — the eclipse foundation.

https://www.eclipse.org/. (Accessed on 01/11/2020).

2. [sonarjava-1709] fn in s2077: support extra methods such as springframework sql- related methods - sonarsource. https://jira.sonarsource.com/browse/SONARJAVA- 1709. (Accessed on 07/16/2019).

3. Welcome to netbeans. https://netbeans.org/. (Accessed on 11/05/2019).

4. Ivan Andrianto, MM Inggriani Liem, and Yudistira Dwi Wardhana Asnar. Web application fuzz testing. In2017 International Conference on Data and Software Engineering (ICoDSE), pages 1–6. IEEE, 2017.

5. Mohammadreza Ashouri. Detecting input sanitization errors in scala. In2019 Seventh International Symposium on Computing and Networking Workshops (CANDARW), pages 313–319. IEEE, 2019.

6. Mohammadreza Ashouri. Practical dynamic taint tracking for exploiting input sanitization error in java applications. InAustralasian Conference on Information Security and Privacy, pages 494–513. Springer, 2019.

7. Mohammadreza Ashouri. Kaizen: a scalable concolic fuzzing tool for scala. InPro- ceedings of the 11th ACM SIGPLAN International Symposium on Scala, pages 25–32, 2020.

8. Mohammadreza Ashouri and Christoph Kreitz. Scalayzer: a portable tool for vulnerability analysis in scala. InProceedings of the 4th ACM/IEEE Symposium on Edge Computing, pages 371–376. ACM, 2019.

9. Arra E Avakian, Randolph G Hudson, Martha S Borkan, Rowan H Maclaren, Raymond M Bloom, and Jeffrey Rees. Instrumenting java code by modifying bytecodes, January 27 2009. US Patent 7,484,209.

10. Nathaniel Ayewah, William Pugh, David Hovemeyer, J David Morgenthaler, and John Penix. Using static analysis to find bugs. IEEE software, 25(5):22–29, 2008.

11. Jonathan Bell and Gail Kaiser. Phosphor: Illuminating dynamic data flow in commodity jvms. ACM Sigplan Notices, 49(10):83–101, 2014.

(22)

12. Jonathan Bell and Gail Kaiser. Dynamic taint tracking for java with phosphor. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, pages 409–413. ACM, 2015.

13. Walter Binder, Jarle Hulaas, and Philippe Moret. Advanced java bytecode instrumentation. InProceedings of the 5th international symposium on Principles and practice of programming in Java, pages 135–144. ACM, 2007.

14. Stephen M Blackburn, Robin Garner, Chris Hoffmann, Asjad M Khang, Kathryn S McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z Guyer, et al. The dacapo benchmarks: Java benchmarking development and analysis.

InACM Sigplan Notices, volume 41, pages 169–190. ACM, 2006.

15. Dan Boxler and Kristen R Walcott. Static taint analysis tools to detect information flows. InProceedings of the International Conference on Software Engineering Research and Practice (SERP), pages 46–52. The Steering Committee of The World Congress in Computer Science, Computer . . . , 2018.

16. Eric Bruneton. Asm 3.0 a java bytecode engineering library. URL: http://download.

forge. objectweb. org/asm/asmguide. pdf, 2007.

17. Ben Buchanan. The Hacker and the State: Cyber Attacks and the New Normal of Geopolitics. Harvard University Press, 2020.

18. Patrick Chan and Rosanna Lee. The Java class libraries: An annotated reference.

Addison-Wesley Longman Publishing Co., Inc., 1996.

19. James Clause, Wanchun Li, and Alessandro Orso. Dytan: a generic dynamic taint analysis framework. InProceedings of the 2007 international symposium on Software testing and analysis, pages 196–206. ACM, 2007.

20. Zack Coker, Michael Maass, Tianyuan Ding, Claire Le Goues, and Joshua Sunshine.

Evaluating the flexibility of the java sandbox. In Proceedings of the 31st Annual Computer Security Applications Conference, pages 1–10. ACM, 2015.

21. Markus Dahm. Byte code engineering. InJIT’99, pages 267–277. Springer, 1999.

22. Markus Dahm, J van Zyl, and E Haase. The bytecode engineering library (bcel), 2003.

23. Manuel Egele and Kruegel. Dynamic spyware analysis. 2007.

24. William Enck, Peter Gilbert, Seungyeop Han, Vasant Tendulkar, Byung-Gon Chun, Landon P Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N Sheth. Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems (TOCS), 32(2):5, 2014.

25. Nan Fan, Allan Bradley Winslow, Ting Bin Wu, and Jean Xu Yu. Automatic deployment of java classes using byte code instrumentation, March 12 2013. US Patent 8,397,227.

26. Marco Gagliardi. Injection of updated classes for a java agent, October 24 2017. US Patent 9,798,557.

27. Allen Goldberg and Klaus Haveland. Instrumentation of java bytecode for runtime analysis. 2003.

28. David Grove, Greg DeFouw, Jeffrey Dean, and Craig Chambers. Call graph construction in object-oriented languages. InProceedings of the 12th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 108–124, 1997.

29. Vivek Haldar, Deepak Chandra, and Michael Franz. Dynamic taint propagation for java.

InComputer Security Applications Conference, 21st Annual, pages 9–pp. IEEE, 2005.

30. Vivek Haldar, Deepak Chandra, and Michael Franz, editors. Dynamic taint propagation for Java., December 2005.

31. Mary W Hall and Ken Kennedy. Efficient call graph analysis. ACM Letters on Programming Languages and Systems (LOPLAS), 1(3):227–242, 1992.

(23)

32. Alex Ho, Michael Fetterman, Christopher Clark, Andrew Warfield, and Steven Hand.

Practical taint-based protection using demand emulation. InACM SIGOPS Operating Systems Review, volume 40, pages 29–41. ACM, 2006.

33. Yunhan Jack Jia, Qi Alfred Chen, Shiqi Wang, Amir Rahmati, Earlence Fernandes, Zhuoqing Morley Mao, Atul Prakash, and Shanghai JiaoTong Unviersity. Contexlot:

Towards providing contextual integrity to appified iot platforms. InNDSS, 2017.

34. Yu Kashima, Takashi Ishio, and Katsuro Inoue. Comparison of backward slicing techniques for java. IEICE TRANSACTIONS on Information and Systems, 98(1):119–

130, 2015.

35. Douglas Kramer. The java platform. White Paper, Sun Microsystems, Mountain View, CA, 1996.

36. Eugene Kuleshov. Using the asm framework to implement common java bytecode transformation patterns. Aspect-Oriented Software Development, 2007.

37. Yue Li, Tian Tan, and Jingling Xue. Understanding and analyzing java reflection. ACM Transactions on Software Engineering and Methodology (TOSEM), 28(2):7, 2019.

38. Benjamin Livshits, Michael Martin, and Monica S Lam. Securifly: Runtime protection and recovery from web application vulnerabilities. Technical Report, 2006.

39. V Benjamin Livshits and Monica S Lam. Tracking pointers with path and context sensitivity for bug detection in c programs. InProceedings of the 9th European software engineering conference held jointly with 11th ACM SIGSOFT international symposium on Foundations of software engineering, pages 317–326, 2003.

40. V Benjamin Livshits and Monica S Lam. Finding security vulnerabilities in java applications with static analysis. InUSENIX Security Symposium, volume 14, pages 18–18, 2005.

41. Florian D Loch, Martin Johns, Martin Hecker, Martin Mohr, and Gregor Snelting.

Hybrid taint analysis for java ee. InProceedings of the 35th Annual ACM Symposium on Applied Computing, pages 1716–1725, 2020.

42. Mauro Marinilli. Java deployment. Sams Publishing, 2002.

43. Lutz Prechelt. An empirical comparison of c, c++, java, perl, python, rexx and tcl.

IEEE Computer, 33(10):23–29, 2000.

44. Michael Reif, Michael Eichberg, Ben Hermann, Johannes Lerch, and Mira Mezini. Call graph construction for java libraries. InProceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 474–486, 2016.

45. Eyal Ronen and Adi Shamir. Extended functionality attacks on iot devices: The case of smart lights. In2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 3–12. IEEE, 2016.

46. Dawn Song, D Brumley, H Yin, J Caballero, I Jager, MG Kang, Z Liang, J Newsome, P Poosankam, and P Saxena. Bitblaze: Binary analysis for computer security, 2013.

47. Robert F St¨ark, Joachim Schmid, and Egon B¨orger.Java and the Java virtual machine:

definition, verification, validation. Springer Science & Business Media, 2012.

48. Omer Tripp, Marco Pistoia, Stephen J Fink, Manu Sridharan, and Omri Weisman. Taj:

effective taint analysis of web applications. InACM Sigplan Notices, volume 44, pages 87–97. ACM, 2009.

49. Neil Vachharajani, Matthew J Bridges, Jonathan Chang, Ram Rangan, Guilherme Ottoni, Jason A Blome, George A Reis, Manish Vachharajani, and David I August.

Rifle: An architectural framework for user-centric information-flow security. In37th International Symposium on Microarchitecture (MICRO-37’04), pages 243–254. IEEE, 2004.

50. Bill Venners. The java virtual machine. McGraw-Hill, New York, 1998.

(24)

51. XiaoFeng Wang, Zhuowei Li, Ninghui Li, and Jong Youl Choi. Precip: Towards practical and retrofittable confidential information protection. InNDSS, 2008.

52. Rafael Winterhalter. Byte buddy.(2017). Retrieved April, 15:2017, 2017.

53. Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. Modeling and discovering vulnerabilities with code property graphs. In2014 IEEE Symposium on Security and Privacy, pages 590–604. IEEE, 2014.

54. Jeong Yang, Young Lee, Amanda Fernandez, and Joshua Sanchez. Evaluating and securing text-based java code through static code analysis. Journal of Cybersecurity Education, Research and Practice, 2020(1):3, 2020.

55. Nickolai Zeldovich, Silas Boyd-Wickizer, Eddie Kohler, and David Mazi`eres. Making information flow explicit in histar. InProceedings of the 7th symposium on Operating systems design and implementation, pages 263–278. USENIX Association, 2006.

56. David Yu Zhu, Jaeyeon Jung, Dawn Song, Tadayoshi Kohno, and David Wetherall.

Tainteraser: Protecting sensitive data leaks using application-level taint tracking. ACM SIGOPS Operating Systems Review, 45(1):142–154, 2011.

57. Aaron Zimba and Mumbi Chishimba. Understanding the evolution of ransomware:

paradigm shifts in attack structures. International Journal of computer network and information security, 11(1):26, 2019.