Introduction

The purpose of this assignment is to get you acquainted with dynamic analysis. In particular, it will help you get familiar with dynamic bytecode instrumentation using ASM, a popular Java bytecode manipulation and analysis library used in many software products.

From a high level view, ASM provides a simple API for decomposing, modifying, and recomposing Java bytecode. When used together with Java Agent, it provides a powerful way to dynamically manipulate Java classes as they're being loaded into the Java Virtual Machine (JVM).

Part 0: Getting Started

In this section, we’ll get started by downloading ASM and setting up your development environment.

Installing ASM

Follow this page to install the Eclipse ASM plugin. The plugin helps you automatically generate Java bytecode from source code and visualize it.

In your Eclipse workspace, create a new Java project "a2". Download asm-7.1.jar into the directory "a2/lib/" and add it to the build path of the project.

Quick Start with Java Agent

Java provides a cool option, called "javaagent", to dynamically modify Java classes as they're being loaded into the JVM. To use this option, we simply need to write an agent - a piece of Java code that respects some format, and execute the command java -javaagent:jarpath[=options] program, where jarpath is the path to the agent, options is the agent options, and program is any Java program you want to run.

Specifically, the agent is a JAR file containing a folder "META-INF", which contains a file named "MANIFEST.MF". The file "MANIFEST.MF" must contain the attribute Premain-Class, whose value is the name of the agent class. The agent class implements a premain method, public static void premain(String agentArgs, Instrumentation inst), which is similar to the main application entry point. After the JVM has initialized, the premain method is called first, followed by the real application's main method. You can find a more detailed description of Java agents on the official documentation page.

Let's first create an agent file "iagent.jar" containing only the folder "META-INF" and file "MANIFEST.MF", and name the agent class as Transformer, that is, add "Premain-Class: Transformer" in "MANIFEST.MF".

$ cd a2/lib
$ mkdir META-INF
$ echo "Premain-Class: Transformer" > META-INF/MANIFEST.MF
$ jar cfm iagent.jar META-INF/MANIFEST.MF

Next, we create the agent class Transformer in the project source folder. The code is shown below:

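import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class Transformer implements ClassFileTransformer {
    // Called by the JVM before the application's main method.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new Transformer());
    }

    @Override
    public byte[] transform(ClassLoader loader, String cname, Class<?> c, ProtectionDomain d, byte[] cbuf) throws IllegalClassFormatException {
        System.out.println("transforming class " + cname);
        return cbuf; // the class bytes are returned unchanged for now
    }
}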

The agent class Transformer implements the method transform of the interface ClassFileTransformer, which is called whenever a Java class is loaded. The method transform may transform the supplied class file in cbuf and return a new replacement. We will implement a few interesting transformations using ASM later. For now, we simply print the loaded class name and keep cbuf unchanged.
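Because transform is invoked for JDK classes as well as application classes, an agent often wants to skip the former. The following is a minimal sketch of such a filter, not part of the provided code: the class name FilteringTransformer, the helper shouldInstrument, and the prefix list are all assumptions made for illustration.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical variant of the agent class that ignores JDK classes.
class FilteringTransformer implements ClassFileTransformer {
    // Assumed list of internal-name prefixes to skip; adjust as needed.
    private static final String[] SKIPPED_PREFIXES = {"java/", "javax/", "sun/", "jdk/", "com/sun/"};

    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new FilteringTransformer());
    }

    // Class names arrive in internal form, e.g. "java/lang/Void".
    static boolean shouldInstrument(String cname) {
        if (cname == null) {
            return false; // defensive: treat a missing name as not instrumentable
        }
        for (String prefix : SKIPPED_PREFIXES) {
            if (cname.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    @Override
    public byte[] transform(ClassLoader loader, String cname, Class<?> c,
                            ProtectionDomain d, byte[] cbuf) {
        if (!shouldInstrument(cname)) {
            return null; // null tells the JVM that no transformation was performed
        }
        System.out.println("transforming class " + cname);
        return cbuf;
    }
}
```

With this filter, only names such as HelloThread and HelloThread$TestThread would be printed for the test program.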

We provide iagent.jar and Transformer.java. You can run a test on HelloThread.java with the following commands:

$ cd a2
$ java -javaagent:lib/iagent.jar -cp bin/ HelloThread

The following log will be printed, showing the loaded classes during the execution of HelloThread.

transforming class java/lang/invoke/MethodHandleImpl
transforming class java/lang/invoke/MemberName$Factory
transforming class java/lang/invoke/LambdaForm$NamedFunction
transforming class java/lang/invoke/MethodType$ConcurrentWeakInternSet
transforming class java/lang/invoke/MethodHandleStatics
transforming class java/lang/invoke/MethodHandleStatics$1
transforming class java/lang/invoke/MethodTypeForm
transforming class java/lang/invoke/Invokers
transforming class java/lang/invoke/MethodType$ConcurrentWeakInternSet$WeakEntry
transforming class java/lang/Void
transforming class java/lang/IllegalAccessException
transforming class sun/misc/PostVMInitHook
transforming class sun/launcher/LauncherHelper
transforming class HelloThread
transforming class sun/launcher/LauncherHelper$FXHelper
transforming class java/lang/Class$MethodArray
transforming class HelloThread$TestThread
transforming class java/lang/Shutdown
transforming class java/lang/Shutdown$Lock

Quick Start with ASM

We are going to use four important classes provided in ASM to perform instrumentation: ClassReader, ClassWriter, ClassVisitor, and MethodVisitor. Below is a typical usage of them:

ClassReader cr = new ClassReader(cbuf);
ClassWriter cw = new ClassWriter(cr, 0);
ClassVisitor cv = new ClassAdapter(cw);
cr.accept(cv, 0);
cbuf = cw.toByteArray();

class ClassAdapter extends ClassVisitor {
    ClassAdapter(ClassVisitor cv) {
        super(Opcodes.ASM5, cv);
    }
    @Override
    public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
        MethodVisitor mv = cv.visitMethod(access, name, desc, signature, exceptions);
        return new MethodAdapter(mv);
    }
}

ClassVisitor is a visitor to visit a Java class, often subclassed to implement specific functionality. For example, here we use ClassAdapter which extends ClassVisitor and overrides its visit methods. Similarly, MethodVisitor is a visitor to visit a Java method and is also often subclassed (for example, by MethodAdapter here) to implement instrumentation. ClassReader parses the Java byte array cbuf and calls the appropriate visit methods of a given class visitor for each field, method and bytecode instruction encountered. Finally, ClassWriter generates a new byte array taking the effect of visit methods into consideration.

Example: suppose we want to trace the thread start operation Thread.start() in HelloThread.java. We would write code in MethodAdapter similar to the following:

class MethodAdapter extends MethodVisitor {
    MethodAdapter(MethodVisitor mv) {
        super(Opcodes.ASM5, mv);
    }
    @Override
    public void visitMethodInsn(int opcode, String owner, String name, String desc, boolean itf) {
        switch (opcode) {
            case Opcodes.INVOKEVIRTUAL:
                // check if it is "Thread.start()" (isThreadClass is a helper not shown here)
                if (isThreadClass(owner) && name.equals("start") && desc.equals("()V")) {
                    mv.visitInsn(Opcodes.DUP);
                    mv.visitMethodInsn(Opcodes.INVOKESTATIC, "Log", "logStart", "(Ljava/lang/Thread;)V", false);
                }
                // intentional fall-through: the original instruction is always emitted
            default:
                mv.visitMethodInsn(opcode, owner, name, desc, itf);
        }
    }
}

In the code above, we override the method visitMethodInsn and add instrumentation that calls Log.logStart(t) before every Thread.start() instruction, where "t" is the thread object. The method Log.logStart(t) is shown below:

public static void logStart(final Thread t) {
    String name = Thread.currentThread().getName();
    String name_t = t.getName();
    System.out.println("Thread "+name+" start new Thread "+name_t);
}
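To see why a single DUP is enough before the call to Log.logStart, it helps to follow the operand stack. The sketch below is only a simulation with a java.util.Deque and does not use ASM; the class name OperandStackDemo is made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Simulates the operand stack around an instrumented Thread.start() call.
final class OperandStackDemo {
    // Returns the stack depth before DUP, after DUP, after INVOKESTATIC, and after INVOKEVIRTUAL.
    static int[] simulate(Object threadRef) {
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(threadRef);      // state just before the call: [t]
        int before = stack.size();
        stack.push(stack.peek());   // DUP copies the top reference: [t, t]
        int afterDup = stack.size();
        stack.pop();                // INVOKESTATIC Log.logStart pops its argument, returns void: [t]
        int afterLog = stack.size();
        stack.pop();                // INVOKEVIRTUAL Thread.start pops the receiver: []
        int afterStart = stack.size();
        return new int[] {before, afterDup, afterLog, afterStart};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulate("t"))); // prints [1, 2, 1, 0]
    }
}
```

The depth after the logging call equals the depth before DUP, so the original Thread.start() instruction finds the stack exactly as the compiler left it.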

We provide full sample code ClassAdapter.java, MethodAdapter.java, and Log.java. To test on HelloThread.java, download them into your project source folder, and uncomment the code in the transform method in Transformer.java, then run:

$ java -javaagent:lib/iagent.jar -cp lib/asm-7.1.jar:bin/ HelloThread

It will print the following message:

Thread main start new Thread Thread-0

Google to find more ASM tutorials. This presentation might be interesting to you.

Part 1: Instrumenting Thread Synchronizations

In this section, we'll continue to add instrumentation to trace more thread synchronization operations. In addition to Thread.start(), we are also interested in Thread.join(), Object.wait(), Object.wait(long timeout), Object.wait(long timeout, int nanos), Object.notify(), Object.notifyAll(), as well as lock and unlock operations on synchronized methods and blocks.

Your task: complete the code in MethodAdapter.java to add all these instrumentations. You are also provided with all the logging methods in Log.java; your goal is to invoke these methods at the proper bytecode locations with the correct arguments.

public static  void logStart(final Thread t)
public static  void logJoin(final Thread t)
public static  void logLock(final Object lock)
public static  void logUnlock(final Object lock) 
public static  void logWait(final Object o)
public static  void logNotify(final Object o)
public static  void logNotifyAll(final Object o) 

Test your implementation with HelloThread1.java. It should print logging messages similar to the following:

$ java -javaagent:lib/iagent.jar -cp lib/asm-7.1.jar:bin/ HelloThread1
Thread main start new Thread Thread-0
Thread Thread-0 lock object 272890728
Thread Thread-0 wait signal on object 272890728
Thread main start new Thread Thread-1
Thread main lock object 854453928
Thread main unlock object 854453928
Thread Thread-1 lock object 272890728
Thread Thread-1 notify signal on object 272890728
Thread Thread-1 unlock object 272890728
Thread Thread-0 unlock object 272890728

Hints

  • In Java bytecode, the beginning and end of a synchronized block correspond to MONITORENTER and MONITOREXIT. In MethodAdapter, you can override the method visitInsn to add your instrumentation. Synchronized methods have no corresponding MONITORENTER and MONITOREXIT instructions, so you need to instrument after the beginning and before the end of a synchronized method. See the sample code below. It implements most parts, but is not complete.
    @Override
    public void visitInsn(int opcode) {
        switch (opcode) {
    	case Opcodes.MONITORENTER:
    	    mv.visitInsn(Opcodes.DUP);
    	    mv.visitMethodInsn(Opcodes.INVOKESTATIC, "Log", "logLock","(Ljava/lang/Object;)V",false);
    	    break;
    	case Opcodes.MONITOREXIT:
    	    mv.visitInsn(Opcodes.DUP);
    	    mv.visitMethodInsn(Opcodes.INVOKESTATIC, "Log", "logUnlock","(Ljava/lang/Object;)V",false);
    	    break;
    	case Opcodes.IRETURN:
    	case Opcodes.LRETURN:
    	case Opcodes.FRETURN:
    	case Opcodes.DRETURN:
    	case Opcodes.ARETURN:
    	case Opcodes.RETURN:
    	case Opcodes.ATHROW:
    	{
    	    if(isSynchronized){
    		if(isStatic){
    		    mv.visitInsn(Opcodes.ACONST_NULL);
    		    mv.visitMethodInsn(Opcodes.INVOKESTATIC, "Log", "logUnlock","(Ljava/lang/Object;)V",false);
    		}
    		else{
    		    mv.visitVarInsn(Opcodes.ALOAD, 0);
    		    mv.visitMethodInsn(Opcodes.INVOKESTATIC, "Log", "logUnlock","(Ljava/lang/Object;)V",false);
    		}
    	    }
    	}
    	default:break;
        }
        mv.visitInsn(opcode);
    }
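The sample above reads two flags, isSynchronized and isStatic, without showing where they come from. One possibility, offered as an assumption rather than as the provided implementation, is to derive them from the access argument that ClassAdapter.visitMethod receives and pass them on to MethodAdapter. The sketch below uses java.lang.reflect.Modifier so that it runs without ASM; the constants Modifier.SYNCHRONIZED (0x0020) and Modifier.STATIC (0x0008) have the same values as the class-file flags ACC_SYNCHRONIZED and ACC_STATIC. The class name AccessFlags is hypothetical.

```java
import java.lang.reflect.Modifier;

// Hypothetical helper that decodes method access flags.
final class AccessFlags {
    // True if the method was declared synchronized.
    static boolean isSynchronized(int access) {
        return (access & Modifier.SYNCHRONIZED) != 0;
    }

    // True if the method was declared static.
    static boolean isStatic(int access) {
        return (access & Modifier.STATIC) != 0;
    }

    public static void main(String[] args) {
        int access = Modifier.PUBLIC | Modifier.STATIC | Modifier.SYNCHRONIZED;
        System.out.println(isSynchronized(access) + " " + isStatic(access));
    }
}
```

Inside an ASM visitor, the equivalent tests would use Opcodes.ACC_SYNCHRONIZED and Opcodes.ACC_STATIC.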
    

Part 2: Instrumenting Field and Array Accesses

In this section, we use dynamic instrumentation to trace another kind of operation: heap accesses. In addition to tracing field accesses, which we have done with Soot in Assignment 1, we are also interested in tracing array accesses. Unlike the Jimple code in Soot, the bytecode level has four different field access instructions (GETSTATIC, GETFIELD, PUTSTATIC, PUTFIELD) and 16 array access instructions (AALOAD, BALOAD, CALOAD, DALOAD, FALOAD, IALOAD, LALOAD, SALOAD, AASTORE, BASTORE, CASTORE, DASTORE, FASTORE, IASTORE, LASTORE, SASTORE). If you are interested in knowing all the 256 Java bytecodes (198 are currently in use) for Java 8, here is a full list of them. Wiki also has a good summary of them.

We also provide the logging methods for field and array accesses in Log.java:

public static void logFieldAcc(final Object o, String name, final boolean isStatic, final boolean isWrite)
public static void logArrayAcc(final Object a, int index, final boolean isWrite)

Your task: write code in MethodAdapter.java to instrument every field and array access to print out the access information. For example, on HelloThread2.java, it should print similar logging messages as below:

Thread main wrote instance field y of object HelloThread2@4a248a0a
Thread main wrote instance field b of object HelloThread2@4a248a0a
Thread main wrote instance field c of object HelloThread2@4a248a0a
Thread main wrote static field x
Thread main wrote static field z
Thread main wrote instance field y of object HelloThread2@4a248a0a
Thread main read instance field c of object HelloThread2@4a248a0a
Thread Thread-0 read instance field val$hello of object HelloThread2$1@5f8a8ae7
Thread main wrote array [index] [C@a574b2 [0]
Thread Thread-0 read instance field c of object HelloThread2@4a248a0a
Thread main read instance field c of object HelloThread2@4a248a0a
Thread Thread-0 read array [index] [C@a574b2 [0]
Thread main wrote array [index] [C@a574b2 [1]
Thread Thread-0 read instance field val$hello of object HelloThread2$1@5f8a8ae7

Hints

  • To instrument field accesses, you need to override the method visitFieldInsn:
    @Override
    public void visitFieldInsn(int opcode, String owner, String name, String desc) {
        switch (opcode) {
            case Opcodes.GETSTATIC:
                // your code here
                break;
            case Opcodes.PUTSTATIC:
                // your code here
                break;
            case Opcodes.GETFIELD:
                // your code here
                break;
            case Opcodes.PUTFIELD:
                // your code here
                // this part is slightly more complicated
                break;
            default:
                break;
        }
        mv.visitFieldInsn(opcode, owner, name, desc);
    }
    
  • To instrument array accesses, you need to handle the following cases in the method visitInsn:
    case Opcodes.AALOAD: case Opcodes.BALOAD: case Opcodes.CALOAD: case Opcodes.SALOAD:
    case Opcodes.IALOAD: case Opcodes.FALOAD: case Opcodes.DALOAD: case Opcodes.LALOAD:
        // your code here
        break;
    case Opcodes.AASTORE: case Opcodes.BASTORE: case Opcodes.CASTORE: case Opcodes.SASTORE:
    case Opcodes.IASTORE: case Opcodes.FASTORE:
        // your code here
        // this part is slightly more complicated
        break;
    case Opcodes.DASTORE: case Opcodes.LASTORE:
        // your code here
        // this part is slightly more complicated
        break;
    
  • Be careful with double and long values: each occupies two slots on the operand stack and in the local variables, which makes things slightly more complicated.
  • You would need to store and load data from stack to local variables. Make sure the maximal stack size and the maximum number of local variables are properly set by overriding the method visitMaxs:
    @Override
    public void visitMaxs(int maxStack, int maxLocals) {
        mv.visitMaxs(X, Y); // set X and Y to a proper value
    }
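The hints about double and long come down to slot counts: a long or double value takes two operand-stack or local-variable slots, and every other value takes one. For field accesses, the desc argument of visitFieldInsn says which case applies. The helper below is a minimal sketch with a made-up name (SlotSize); ASM's own Type.getType(desc).getSize() should give the same answer when ASM is on the class path.

```java
// Hypothetical helper: how many slots a value with the given field descriptor occupies.
final class SlotSize {
    static int of(String desc) {
        char kind = desc.charAt(0);
        // 'J' is long and 'D' is double; references, arrays, and other primitives use one slot.
        return (kind == 'J' || kind == 'D') ? 2 : 1;
    }

    public static void main(String[] args) {
        System.out.println(of("J") + " " + of("I") + " " + of("[D"));
    }
}
```

A value of size 2 is duplicated with DUP2 rather than DUP and stored with LSTORE or DSTORE, and a local variable holding it uses two consecutive indices, which matters when choosing the values passed to visitMaxs.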
    

Bonus: can you also instrument the value of each heap access? That is, call the following methods upon each field and array access, where v is the value read or written by the access:

public static void logFieldAcc(final Object o, String name, final boolean isStatic, final boolean isWrite, final Object v)
public static void logArrayAcc(final Object a, int index, final boolean isWrite, final Object v)

Part 3: Dynamic Data Race Detection

Data races, or race conditions, are an important class of concurrency errors that have caused many serious problems in real-world software. In this section, we are going to develop a dynamic race detection tool based on the instrumentation in the first two parts.

By definition, a data race occurs when there exist (1) two concurrent accesses to the same data by different threads, (2) at least one of them is a write, and (3) the two accesses are not properly protected by synchronizations, that is, not protected by the same lock or there is no happens-before relation between them.
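Conditions (1) and (2), together with the same-lock half of condition (3), can be sketched as two small predicates over recorded accesses. This is an illustration only: the classes Access and ConflictCheck, their fields, and the use of a string to name the accessed location are assumptions, not the layout of the provided Log.java.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical record of one heap access.
final class Access {
    final String thread;         // name of the accessing thread
    final String location;       // e.g. "HelloThread3.x" or an array element key
    final boolean isWrite;
    final Set<Object> locksHeld; // locks held by the thread at the time of the access

    Access(String thread, String location, boolean isWrite, Set<Object> locksHeld) {
        this.thread = thread;
        this.location = location;
        this.isWrite = isWrite;
        this.locksHeld = locksHeld;
    }
}

final class ConflictCheck {
    // Same data, different threads, at least one write.
    static boolean conflicting(Access a, Access b) {
        return a.location.equals(b.location)
                && !a.thread.equals(b.thread)
                && (a.isWrite || b.isWrite);
    }

    // True if some lock is held during both accesses.
    // Note: Set membership uses equals(); an identity-based set would be safer for real lock objects.
    static boolean protectedBySameLock(Access a, Access b) {
        return !Collections.disjoint(a.locksHeld, b.locksHeld);
    }
}
```

A pair that is conflicting and not protected by a common lock is still only a race candidate until the happens-before check has been applied.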

Your task is to implement a simple algorithm that checks the three conditions above. Specifically, the algorithm runs in five steps: 1. find shared data; 2. find accesses to shared data; 3. find pairs of conflicting accesses by different threads; 4. for each pair, check if the two accesses are protected by the same lock; 5. for each pair, check if one of the two accesses must happen before the other. The must-happen-before relation (→) is defined as follows: for two accesses e1 and e2, e1 → e2 holds if any of the following three conditions is satisfied:

  • (i) e1 and e2 are by the same thread and e1 is executed before e2;
  • (ii) e1 starts a new thread t and e2 is the first event of t;
  • (iii) there exists another access e3 such that e1 → e3 and e3 → e2.
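One way to realize these three rules is to record events in a graph and answer e1 → e2 by reachability. The sketch below is an assumption-laden illustration, not the required design: the class HappensBefore, its method names, and the use of integer event ids are all hypothetical. Rule (i) becomes an edge between consecutive events of a thread, rule (ii) an edge from a start event to the first event of the started thread, and rule (iii) is the graph search itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HappensBefore {
    private final Map<Integer, List<Integer>> edges = new HashMap<>();
    private final Map<String, Integer> lastEvent = new HashMap<>(); // thread -> latest event id
    private final Map<String, Integer> startedBy = new HashMap<>(); // child thread -> start event id
    private int nextId = 0;

    // Records an event executed by the given thread and returns its id.
    int record(String thread) {
        int id = nextId++;
        Integer prev = lastEvent.put(thread, id);
        if (prev != null) {
            addEdge(prev, id);            // rule (i): program order within a thread
        } else if (startedBy.containsKey(thread)) {
            addEdge(startedBy.get(thread), id); // rule (ii): start event -> first event of t
        }
        return id;
    }

    // Records that parent starts child; the start itself is an event of parent.
    int recordStart(String parent, String child) {
        int id = record(parent);
        startedBy.put(child, id);
        return id;
    }

    // Rule (iii): e1 -> e2 if e2 is reachable from e1 through recorded edges.
    boolean happensBefore(int e1, int e2) {
        Deque<Integer> work = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        work.push(e1);
        while (!work.isEmpty()) {
            int cur = work.pop();
            for (int next : edges.getOrDefault(cur, Collections.emptyList())) {
                if (next == e2) return true;
                if (seen.add(next)) work.push(next);
            }
        }
        return false;
    }

    private void addEdge(int from, int to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }
}
```

A graph search per query is simple but slow on long traces; it is used here only because it mirrors the three rules directly.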

For each pair of conflicting accesses by different threads, if neither the check in step 4 nor the check in step 5 succeeds, that is, the two accesses are not protected by the same lock and neither must happen before the other, we report a race formed by the two accesses. For example, in HelloThread3.java, line 10 x =1 and line 27 r2 =x form a race because they both access x, line 10 is a write, and line 27 is not protected by a lock. Most of your code will be written in Log.java. To test your implementation, you can run:

$ java -javaagent:lib/iagent.jar -cp lib/asm-7.1.jar:bin/ HelloThread3

It is expected to print the following message or similar:

Data race detected: 
Thread main wrote static field x
Thread Thread-0 read static field x

As a final evaluation, you should be able to run your tool on a real program, ftpserver.jar:

$ java -javaagent:lib/iagent.jar -cp lib/asm-7.1.jar:bin/:ftpserver.jar driver.FTPMainDriver

Bonus: the race report above is not very informative because it gives no information about the location of the racing accesses. Can you also get the class name and the line number of each access? That is, report something like the following:

Data race detected: 
Thread main wrote static field HelloThread3.x at line 10
Thread Thread-0 read static field HelloThread3.x at line 27

Hints

  • You can get the line number of a bytecode instruction by overriding the method visitLineNumber in class MethodAdapter.