The purpose of this assignment is to get you acquainted with dynamic analysis. In particular, it will help you get familiar with dynamic bytecode instrumentation using ASM, a popular Java bytecode manipulation and analysis library used in many software products.
At a high level, ASM provides a simple API for decomposing, modifying, and recomposing Java bytecode. When used together with a Java agent, it provides a powerful way to dynamically manipulate Java classes as they are loaded into the Java Virtual Machine (JVM).
In this section, we’ll get started by downloading ASM and setting up your development environment.
Follow this page to install the Eclipse ASM plugin. The plugin helps you automatically generate and visualize the bytecode of your Java source code.
In your Eclipse workspace, create a new Java project "a2".
Download asm-7.1.jar into the directory "a2/lib/" and add it to the build path of the project.
Java provides a handy option, called "javaagent", to dynamically modify Java classes as they are loaded into the JVM. To use this option, we simply need to write an agent (a piece of Java code that follows a specific format) and execute the command java -javaagent:jarpath[=options] program, where jarpath is the path to the agent JAR, options is the agent options string, and program is any Java program you want to run.
Specifically, the agent is a JAR file containing a folder "META-INF", which contains a file named "MANIFEST.MF". The file "MANIFEST.MF" must contain the attribute Premain-Class, whose value is the name of the agent class.
The agent class implements a premain method, public static void premain(String agentArgs, Instrumentation inst), which is similar to the main application entry point. After the JVM has initialized, the premain method is called first, and then the application's main method.
You can find a more detailed description of Java agents on the official documentation page.
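The options part of -javaagent:jarpath[=options] reaches premain unparsed, as the single agentArgs string (it may be null or empty when no options are given). A minimal sketch of how an agent might interpret it, assuming a comma-separated key=value format of our own choosing (the JVM imposes no format); the class name OptionsAgent is hypothetical:

```java
import java.lang.instrument.Instrumentation;
import java.util.HashMap;
import java.util.Map;

// Hypothetical agent: shows how the "[=options]" suffix of
// -javaagent:jarpath[=options] arrives in premain as one raw string.
class OptionsAgent {
    // Parses "key=value,flag" options; this format is our own convention,
    // not something the JVM imposes.
    static Map<String, String> parse(String agentArgs) {
        Map<String, String> opts = new HashMap<>();
        if (agentArgs == null || agentArgs.isEmpty()) {
            return opts; // no options were given
        }
        for (String part : agentArgs.split(",")) {
            int eq = part.indexOf('=');
            if (eq < 0) {
                opts.put(part, "true"); // bare flag such as "verbose"
            } else {
                opts.put(part.substring(0, eq), part.substring(eq + 1));
            }
        }
        return opts;
    }

    // Invoked by the JVM before main() if this class is the Premain-Class.
    public static void premain(String agentArgs, Instrumentation inst) {
        System.out.println("agent options: " + parse(agentArgs));
    }
}
```

If OptionsAgent were named in Premain-Class, a command line ending in .jar=verbose,out=log.txt would hand premain the string "verbose,out=log.txt", which parse turns into two entries.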
Let's first create an agent file "iagent.jar" containing only the folder "META-INF" and the file "MANIFEST.MF", and name the agent class Transformer; that is, add "Premain-Class: Transformer" to "MANIFEST.MF".
$ cd a2/lib
$ mkdir META-INF
$ echo "Premain-Class: Transformer" > META-INF/MANIFEST.MF
$ jar cfm iagent.jar META-INF/MANIFEST.MF
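Next, we create the agent class Transformer in the project source folder. The code is shown below:
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class Transformer implements ClassFileTransformer {
    // Called by the JVM before main(); registers this transformer
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new Transformer());
    }

    // Called as each class is loaded; cbuf holds the class-file bytes
    public byte[] transform(ClassLoader loader, String cname, Class<?> c, ProtectionDomain d, byte[] cbuf) throws IllegalClassFormatException {
        System.out.println("transforming class " + cname);
        return cbuf;
    }
}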
The agent class Transformer implements the method transform of the interface ClassFileTransformer, which is called whenever a Java class is first loaded. The method transform may transform the supplied class file in cbuf and return a replacement.
We will implement a few interesting transformations using ASM later. For now, we simply print the loaded class name and return cbuf unchanged.
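Since cbuf is just a raw class file, you can sanity-check it before handing it to ASM. The sketch below (ClassFileHeader is a hypothetical helper that uses only the JDK) reads the fixed class-file header: the 0xCAFEBABE magic number and the major version (52 for Java 8). It loads a class by its internal, slash-separated name, the same form transform receives in cname:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical helper for inspecting the bytes transform() receives as cbuf.
class ClassFileHeader {
    // Every valid class file starts with the magic number 0xCAFEBABE.
    static boolean hasMagic(byte[] cbuf) {
        return cbuf.length >= 8 && ByteBuffer.wrap(cbuf).getInt(0) == 0xCAFEBABE;
    }

    // Bytes 6-7 (big-endian) hold the major version, e.g. 52 for Java 8.
    static int majorVersion(byte[] cbuf) {
        return ByteBuffer.wrap(cbuf).getShort(6) & 0xFFFF;
    }

    // Reads a class file from the class path, e.g. "java/lang/String".
    static byte[] readClassBytes(String internalName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(internalName + ".class")) {
            if (in == null) {
                throw new IllegalArgumentException("not found: " + internalName);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Inside transform, a guard such as if (!ClassFileHeader.hasMagic(cbuf)) return null; would leave unexpected input alone, because returning null tells the JVM that no transformation was performed.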
We provide iagent.jar and Transformer.java. You can run a test on HelloThread.java with the following commands:
$ cd a2
$ java -javaagent:lib/iagent.jar -cp bin/ HelloThread
A log similar to the following will be printed, showing the classes loaded during the execution of HelloThread:
transforming class java/lang/invoke/MethodHandleImpl
transforming class java/lang/invoke/MemberName$Factory
transforming class java/lang/invoke/LambdaForm$NamedFunction
transforming class java/lang/invoke/MethodType$ConcurrentWeakInternSet
transforming class java/lang/invoke/MethodHandleStatics
transforming class java/lang/invoke/MethodHandleStatics$1
transforming class java/lang/invoke/MethodTypeForm
transforming class java/lang/invoke/Invokers
transforming class java/lang/invoke/MethodType$ConcurrentWeakInternSet$WeakEntry
transforming class java/lang/Void
transforming class java/lang/IllegalAccessException
transforming class sun/misc/PostVMInitHook
transforming class sun/launcher/LauncherHelper
transforming class HelloThread
transforming class sun/launcher/LauncherHelper$FXHelper
transforming class java/lang/Class$MethodArray
transforming class HelloThread$TestThread
transforming class java/lang/Shutdown
transforming class java/lang/Shutdown$Lock
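The log mixes JDK classes with application classes such as HelloThread and HelloThread$TestThread. Once you add real instrumentation, it is often wise to skip JDK classes and the logging class itself, since otherwise the inserted Log calls may end up inside code that Log itself depends on. A sketch, assuming the prefix list below suits your JDK (it is not exhaustive) and that the logging class is named Log; AppOnlyTransformer and shouldInstrument are hypothetical names:

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

// Hypothetical variant of Transformer that only reports classes worth instrumenting.
class AppOnlyTransformer implements ClassFileTransformer {
    // Internal-name prefixes of JDK classes; an assumed, non-exhaustive list.
    private static final String[] JDK_PREFIXES = {"java/", "javax/", "sun/", "jdk/", "com/sun/"};

    static boolean shouldInstrument(String cname) {
        if (cname == null) {
            return false; // the JVM may pass null for some generated classes
        }
        if (cname.equals("Log")) {
            return false; // assumed name of the logging class: never instrument it
        }
        for (String prefix : JDK_PREFIXES) {
            if (cname.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    @Override
    public byte[] transform(ClassLoader loader, String cname, Class<?> c,
                            ProtectionDomain d, byte[] cbuf) throws IllegalClassFormatException {
        if (shouldInstrument(cname)) {
            System.out.println("transforming class " + cname);
        }
        return null; // null means "no transformation" for every class
    }
}
```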
We are going to use four important classes provided by ASM to perform instrumentation: ClassReader, ClassWriter, ClassVisitor, and MethodVisitor. Below is a typical usage of them:
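// Inside Transformer.transform: parse, instrument, and re-serialize the class
ClassReader cr = new ClassReader(cbuf);
ClassWriter cw = new ClassWriter(cr, 0);
ClassVisitor cv = new ClassAdapter(cw);
cr.accept(cv, 0);
cbuf = cw.toByteArray();

// ClassAdapter.java
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

class ClassAdapter extends ClassVisitor {
    ClassAdapter(ClassVisitor cv) {
        super(Opcodes.ASM5, cv);
    }

    @Override
    public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
        MethodVisitor mv = cv.visitMethod(access, name, desc, signature, exceptions);
        // wrap each method so that MethodAdapter can instrument its instructions
        return new MethodAdapter(mv);
    }
}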
ClassVisitor is a visitor for a Java class and is often subclassed to implement specific functionality. For example, here we use ClassAdapter, which extends ClassVisitor and overrides its visitMethod method. Similarly, MethodVisitor is a visitor for a Java method and is also often subclassed (for example, by MethodAdapter here) to implement instrumentation.
ClassReader parses the byte array cbuf and calls the appropriate visit methods of a given class visitor (and of the method visitors it returns) for each field, method, and bytecode instruction encountered. Finally, ClassWriter generates a new byte array that reflects the effects of the visit methods.
Example: suppose we want to trace the thread start operation Thread.start() in HelloThread.java. We would write code in MethodAdapter similar to the following:
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

class MethodAdapter extends MethodVisitor {
    MethodAdapter(MethodVisitor mv) {
        super(Opcodes.ASM5, mv);
    }

    @Override
    public void visitMethodInsn(int opcode, String owner, String name, String desc, boolean itf) {
        switch (opcode) {
            case Opcodes.INVOKEVIRTUAL:
                // check if it is "Thread.start()"; isThreadClass is a helper defined elsewhere (not shown)
                if (isThreadClass(owner) && name.equals("start") && desc.equals("()V")) {
                    mv.visitInsn(Opcodes.DUP); // copy the Thread receiver
                    mv.visitMethodInsn(Opcodes.INVOKESTATIC, "Log", "logStart", "(Ljava/lang/Thread;)V", false);
                }
                // fall through: the original call is always emitted
            default:
                mv.visitMethodInsn(opcode, owner, name, desc, itf);
        }
    }
}
In the code above, we override the method visitMethodInsn and add instrumentation that calls Log.logStart(t) before every Thread.start() instruction, where t is the thread object.
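Why the DUP? INVOKEVIRTUAL Thread.start() pops the thread object from the top of the operand stack, and the inserted INVOKESTATIC Log.logStart pops its argument too, so the receiver must be duplicated first. The toy model below (plain Java, not ASM; an ArrayDeque stands in for the operand stack, and DupDemo is a hypothetical name) traces that sequence:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model (not ASM): an ArrayDeque plays the JVM operand stack around an
// instrumented "t.start()" call site.
class DupDemo {
    static List<String> run() {
        Deque<Object> stack = new ArrayDeque<>();
        List<String> trace = new ArrayList<>();
        Thread t = new Thread(() -> { }, "Thread-0"); // created, never started

        stack.push(t);                      // ALOAD t: the receiver is on top
        stack.push(stack.peek());           // DUP (inserted): copy the receiver
        Thread arg = (Thread) stack.pop();  // INVOKESTATIC Log.logStart pops the copy
        trace.add("logStart(" + arg.getName() + ")");
        Thread recv = (Thread) stack.pop(); // INVOKEVIRTUAL Thread.start pops the original
        trace.add("start(" + recv.getName() + ")");
        trace.add("stack empty: " + stack.isEmpty());
        return trace;
    }
}
```

Without the DUP, logStart would consume the only reference and start() would have no receiver, so the instrumented class would be expected to fail bytecode verification.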
The method Log.logStart(t) is shown below:
public static void logStart(final Thread t) {
    String name = Thread.currentThread().getName();
    String name_t = t.getName();
    System.out.println("Thread " + name + " start new Thread " + name_t);
}
We provide the full sample code: ClassAdapter.java, MethodAdapter.java, and Log.java. To test on HelloThread.java, download them into your project source folder, uncomment the code in the transform method in Transformer.java, and run:
$ java -javaagent:lib/iagent.jar -cp lib/asm-7.1.jar:bin/ HelloThread
It will print the following message:
Thread main start new Thread Thread-0
Search online for more ASM tutorials. This presentation might also interest you.
In this section, we'll continue adding instrumentation to trace more thread synchronization operations. In addition to Thread.start(), we are also interested in Thread.join(), Object.wait(), Object.wait(long timeout), Object.wait(long timeout, int nanos), Object.notify(), and Object.notifyAll(), as well as lock and unlock operations on synchronized methods and blocks.
Your task: complete the code in MethodAdapter.java to add all these instrumentations. All the logging methods are already provided in Log.java; your goal is to invoke them at the proper bytecode locations with the correct arguments.
public static void logStart(final Thread t)
public static void logJoin(final Thread t)
public static void logLock(final Object lock)
public static void logUnlock(final Object lock)
public static void logWait(final Object o)
public static void logNotify(final Object o)
public static void logNotifyAll(final Object o)
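Matching these calls in visitMethodInsn comes down to the invoked method's name and JVM descriptor. A sketch of that lookup (SyncCallMatcher is a hypothetical name). It deliberately ignores owner: start and join still need an owner check such as isThreadClass, while wait and notify are final methods of Object and can appear with any owner. Note also that for wait(J)V and wait(JI)V the receiver is not on top of the stack (the arguments sit above it), so a single DUP does not reach it:

```java
// Hypothetical helper: which Log method (if any) a call site should trigger,
// based only on the invoked method's name and JVM descriptor.
class SyncCallMatcher {
    static String logMethodFor(String name, String desc) {
        switch (name + desc) {
            case "start()V":     return "logStart";  // still check owner is a Thread
            case "join()V":      return "logJoin";   // still check owner is a Thread
            case "wait()V":
            case "wait(J)V":                          // receiver lies below a long
            case "wait(JI)V":    return "logWait";   // receiver lies below a long and an int
            case "notify()V":    return "logNotify";
            case "notifyAll()V": return "logNotifyAll";
            default:             return null;        // not a synchronization call
        }
    }
}
```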
Test your implementation with HelloThread1.java; it should print logging messages similar to the following:
$ java -javaagent:lib/iagent.jar -cp lib/asm-7.1.jar:bin/ HelloThread1
Thread main start new Thread Thread-0
Thread Thread-0 lock object 272890728
Thread Thread-0 wait signal on object 272890728
Thread main start new Thread Thread-1
Thread main lock object 854453928
Thread main unlock object 854453928
Thread Thread-1 lock object 272890728
Thread Thread-1 notify signal on object 272890728
Thread Thread-1 unlock object 272890728
Thread Thread-0 unlock object 272890728
Hints
synchronized blocks correspond to the MONITORENTER and MONITOREXIT instructions.
In MethodAdapter, you can override the method visitInsn to add your instrumentation.
For synchronized methods, there are no corresponding MONITORENTER and MONITOREXIT instructions. You need to instrument after the beginning and before the end of a synchronized method. See the sample code below, which implements most parts but is not complete.
@Override
public void visitInsn(int opcode) {
    switch (opcode) {
        case Opcodes.MONITORENTER:
            mv.visitInsn(Opcodes.DUP);
            mv.visitMethodInsn(Opcodes.INVOKESTATIC, "Log", "logLock", "(Ljava/lang/Object;)V", false);
            break;
        case Opcodes.MONITOREXIT:
            mv.visitInsn(Opcodes.DUP);
            mv.visitMethodInsn(Opcodes.INVOKESTATIC, "Log", "logUnlock", "(Ljava/lang/Object;)V", false);
            break;
        case Opcodes.IRETURN:
        case Opcodes.LRETURN:
        case Opcodes.FRETURN:
        case Opcodes.DRETURN:
        case Opcodes.ARETURN:
        case Opcodes.RETURN:
        case Opcodes.ATHROW:
        {
            if (isSynchronized) {
                if (isStatic) {
                    mv.visitInsn(Opcodes.ACONST_NULL);
                    mv.visitMethodInsn(Opcodes.INVOKESTATIC, "Log", "logUnlock", "(Ljava/lang/Object;)V", false);
                }
                else {
                    mv.visitVarInsn(Opcodes.ALOAD, 0);
                    mv.visitMethodInsn(Opcodes.INVOKESTATIC, "Log", "logUnlock", "(Ljava/lang/Object;)V", false);
                }
            }
        }
        default: break;
    }
    mv.visitInsn(opcode);
}
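The sample relies on isSynchronized and isStatic, which a MethodAdapter can derive from the access argument that visitMethod receives. The sketch below (SyncFlags is a hypothetical class) tests those bits; 0x0008 and 0x0020 are the JVM's ACC_STATIC and ACC_SYNCHRONIZED flags, which ASM's Opcodes.ACC_STATIC/ACC_SYNCHRONIZED and java.lang.reflect.Modifier.STATIC/SYNCHRONIZED share, so reflection is used here to check them. Keep in mind that the lock of a static synchronized method is the Class object of its class, and that ASM calls MethodVisitor.visitCode at the start of a method's code:

```java
// Hypothetical helper: read the two flags the visitInsn sample needs from the
// access value that visitMethod(access, name, ...) receives.
class SyncFlags {
    static final int ACC_STATIC = 0x0008;       // JVM flag; same as Opcodes.ACC_STATIC
    static final int ACC_SYNCHRONIZED = 0x0020; // JVM flag; same as Opcodes.ACC_SYNCHRONIZED

    static boolean isStatic(int access) {
        return (access & ACC_STATIC) != 0;
    }

    static boolean isSynchronized(int access) {
        return (access & ACC_SYNCHRONIZED) != 0;
    }

    // Sample methods whose flags we inspect through reflection; Modifier uses
    // the same bit values for STATIC and SYNCHRONIZED.
    static synchronized void staticSync() { }
    synchronized void instanceSync() { }

    static int flagsOf(String methodName) {
        try {
            return SyncFlags.class.getDeclaredMethod(methodName).getModifiers();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

One possible design, assumed here rather than taken from the provided sample, is for ClassAdapter.visitMethod to compute both booleans and pass them to an extended MethodAdapter constructor.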
In this section, we use dynamic instrumentation to trace another kind of operation: heap accesses. In addition to tracing field accesses, as we did with Soot in Assignment 1, we are also interested in tracing array accesses. Unlike the Jimple code in Soot, at the bytecode level there are four different field access instructions (GETSTATIC, GETFIELD, PUTSTATIC, PUTFIELD) and 16 array access instructions (AALOAD, BALOAD, CALOAD, DALOAD, FALOAD, IALOAD, LALOAD, SALOAD, AASTORE, BASTORE, CASTORE, DASTORE, FASTORE, IASTORE, LASTORE, SASTORE).
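ASM hands these instructions to visitInsn and visitFieldInsn as plain opcode numbers, so it can help to classify them in one place. A sketch (HeapOps is a hypothetical name) using the JVM opcode values, which ASM's Opcodes constants share (for example, Opcodes.IALOAD is 46). It also records why LASTORE and DASTORE are special: just before an array store the stack holds arrayref, index, value, and a long or double value occupies two slots, so arrayref and index sit deeper than for the other stores:

```java
// Hypothetical classifier over raw JVM opcodes for heap-access instructions.
class HeapOps {
    static boolean isFieldAccess(int op) { return op >= 178 && op <= 181; } // GETSTATIC..PUTFIELD
    static boolean isArrayLoad(int op)   { return op >= 46 && op <= 53; }   // IALOAD..SALOAD
    static boolean isArrayStore(int op)  { return op >= 79 && op <= 86; }   // IASTORE..SASTORE

    static boolean isWrite(int op) {
        return op == 179 /* PUTSTATIC */ || op == 181 /* PUTFIELD */ || isArrayStore(op);
    }

    // Operand-stack slots taken by the stored value: long and double use two.
    static int storedValueSlots(int op) {
        return (op == 80 /* LASTORE */ || op == 82 /* DASTORE */) ? 2 : 1;
    }
}
```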
If you are interested in all 256 Java bytecodes (198 of which are currently in use) for Java 8, here is a full list of them.
Wikipedia also has a good summary of them.
We also provide the logging methods for field and array accesses in Log.java:
public static void logFieldAcc(final Object o, String name, final boolean isStatic, final boolean isWrite)
public static void logArrayAcc(final Object a, int index, final boolean isWrite)
Your task: write code in MethodAdapter.java to instrument every field and array access and print out the access information. For example, on HelloThread2.java, it should print logging messages similar to the following:
Thread main wrote instance field y of object HelloThread2@4a248a0a
Thread main wrote instance field b of object HelloThread2@4a248a0a
Thread main wrote instance field c of object HelloThread2@4a248a0a
Thread main wrote static field x
Thread main wrote static field z
Thread main wrote instance field y of object HelloThread2@4a248a0a
Thread main read instance field c of object HelloThread2@4a248a0a
Thread Thread-0 read instance field val$hello of object HelloThread2$1@5f8a8ae7
Thread main wrote array [index] [C@a574b2 [0]
Thread Thread-0 read instance field c of object HelloThread2@4a248a0a
Thread main read instance field c of object HelloThread2@4a248a0a
Thread Thread-0 read array [index] [C@a574b2 [0]
Thread main wrote array [index] [C@a574b2 [1]
Thread Thread-0 read instance field val$hello of object HelloThread2$1@5f8a8ae7
Hints
visitFieldInsn:
// The unqualified opcode constants need, e.g., import static org.objectweb.asm.Opcodes.*;
@Override
public void visitFieldInsn(int opcode, String owner, String name, String desc) {
    switch (opcode) {
        case GETSTATIC:
            //your code here
            break;
        case PUTSTATIC:
            //your code here
            break;
        case GETFIELD:
            //your code here
            break;
        case PUTFIELD:
            //your code here
            //this part is slightly more complicated
        default: break;
    }
    mv.visitFieldInsn(opcode, owner, name, desc);
}
visitInsn:
case AALOAD: case BALOAD: case CALOAD: case SALOAD: case IALOAD: case FALOAD: case DALOAD: case LALOAD:
    //your code here
    break;
case AASTORE: case BASTORE: case CASTORE: case SASTORE: case IASTORE: case FASTORE:
    //your code here
    //this part is slightly more complicated
    break;
case DASTORE: case LASTORE:
    //your code here
    //this part is slightly more complicated
    break;
visitMaxs:
@Override
public void visitMaxs(int maxStack, int maxLocals) {
    mv.visitMaxs(X, Y); // set X and Y to a proper value
}
Bonus: can you also instrument the value of each heap access?
That is, call the following methods upon each field and array access, where v is the value read or written by the access:
public static void logFieldAcc(final Object o, String name, final boolean isStatic, final boolean isWrite, final Object v)
public static void logArrayAcc(final Object a, int index, final boolean isWrite, final Object v)
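For the bonus, note that both methods take the value as an Object, so a primitive value on the stack has to be boxed before the call. A sketch of the lookup from a field's type descriptor to the standard valueOf method you could call with INVOKESTATIC (Boxing and boxingCallFor are hypothetical names):

```java
// Hypothetical helper: returns {owner, name, descriptor} of the boxing
// method for a primitive type descriptor, or null for reference types.
class Boxing {
    static String[] boxingCallFor(String typeDesc) {
        switch (typeDesc.charAt(0)) {
            case 'Z': return new String[] {"java/lang/Boolean", "valueOf", "(Z)Ljava/lang/Boolean;"};
            case 'B': return new String[] {"java/lang/Byte", "valueOf", "(B)Ljava/lang/Byte;"};
            case 'C': return new String[] {"java/lang/Character", "valueOf", "(C)Ljava/lang/Character;"};
            case 'S': return new String[] {"java/lang/Short", "valueOf", "(S)Ljava/lang/Short;"};
            case 'I': return new String[] {"java/lang/Integer", "valueOf", "(I)Ljava/lang/Integer;"};
            case 'J': return new String[] {"java/lang/Long", "valueOf", "(J)Ljava/lang/Long;"};
            case 'F': return new String[] {"java/lang/Float", "valueOf", "(F)Ljava/lang/Float;"};
            case 'D': return new String[] {"java/lang/Double", "valueOf", "(D)Ljava/lang/Double;"};
            default:  return null; // "L...;" or "[...": already an Object
        }
    }
}
```

With an int value on top of the stack, you might emit mv.visitMethodInsn(Opcodes.INVOKESTATIC, "java/lang/Integer", "valueOf", "(I)Ljava/lang/Integer;", false). For array instructions the element type comes from the opcode instead; note that BALOAD and BASTORE serve both byte and boolean arrays.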
Data races, or race conditions, are an important class of concurrency errors that have caused many serious problems in real-world software. In this section, we are going to develop a dynamic race detection tool based on the instrumentation from the first two parts.
By definition, a data race occurs when (1) there are two concurrent accesses to the same data by different threads, (2) at least one of them is a write, and (3) the two accesses are not properly protected by synchronization, that is, they are not protected by the same lock and there is no happens-before relation between them.
Your task is to implement a simple algorithm that checks the three conditions above.
Specifically, the algorithm runs in five steps:
1. find shared data;
2. find accesses to shared data;
3. find pairs of conflicting accesses by different threads;
4. for each pair, check if the two accesses are protected by the same lock;
5. for each pair, check if either of the two accesses must happen before the other.
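Steps 1 to 4 can be checked offline over a recorded trace. A minimal sketch, assuming your Log hooks record each access with its thread name, a string identifying the datum (for example a field name or an array element), whether it is a write, and the set of locks the thread held at that moment (maintaining that set from logLock/logUnlock is up to you). Steps 1 and 2 fall out implicitly, since only data touched by more than one thread can form a pair. LocksetCheck and its members are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Sketch of steps 1-4 over a recorded trace of heap accesses.
class LocksetCheck {
    static final class Access {
        final String thread;         // name of the accessing thread
        final String location;       // the datum, e.g. "HelloThread3.x"
        final boolean isWrite;
        final Set<Object> locksHeld; // locks held by the thread at the access

        Access(String thread, String location, boolean isWrite, Set<Object> locksHeld) {
            this.thread = thread;
            this.location = location;
            this.isWrite = isWrite;
            this.locksHeld = locksHeld;
        }
    }

    // Steps 3-4: same datum, different threads, at least one write, no common lock.
    static boolean unprotectedConflict(Access a, Access b) {
        return a.location.equals(b.location)
                && !a.thread.equals(b.thread)
                && (a.isWrite || b.isWrite)
                && Collections.disjoint(a.locksHeld, b.locksHeld);
    }

    // Compares every pair once; only data touched by two threads can match (steps 1-2).
    static List<Access[]> candidatePairs(List<Access> trace) {
        List<Access[]> pairs = new ArrayList<>();
        for (int i = 0; i < trace.size(); i++) {
            for (int j = i + 1; j < trace.size(); j++) {
                if (unprotectedConflict(trace.get(i), trace.get(j))) {
                    pairs.add(new Access[] {trace.get(i), trace.get(j)});
                }
            }
        }
        return pairs;
    }
}
```

The pairwise loop is quadratic in the trace length; for a larger program such as ftpserver.jar, grouping accesses by location first would likely help. The surviving pairs still need the step 5 check.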
The must-happen-before relation (→) is defined as follows: for two accesses e1 and e2, e1 → e2 holds if any of the following three conditions is satisfied:
(1) e1 and e2 are by the same thread and e1 is executed before e2;
(2) e1 starts a new thread t and e2 is the first event of t;
(3) there exists an e3 such that e1 → e3 and e3 → e2.
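The relation above can be represented as a graph whose edges come from program order and thread start; transitivity then becomes graph reachability. A sketch, assuming events are numbered in the order they are logged and that the first event logged by a started thread is its "first event"; HappensBefore and its methods are hypothetical names:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Must-happen-before as a graph over event ids.
class HappensBefore {
    private final Map<Integer, List<Integer>> succ = new HashMap<>();
    private final Map<String, Integer> lastEventOf = new HashMap<>();
    private final Map<String, Integer> pendingStart = new HashMap<>(); // child -> start event
    private int nextId = 0;

    // Records an event by `thread` and returns its id.
    int event(String thread) {
        int id = nextId++;
        Integer prev = lastEventOf.get(thread);
        if (prev != null) addEdge(prev, id);   // program order within a thread
        Integer start = pendingStart.remove(thread);
        if (start != null) addEdge(start, id); // start event -> first event of the child
        lastEventOf.put(thread, id);
        return id;
    }

    // Records that `parent` starts thread `child`; returns the start event id.
    int start(String parent, String child) {
        int id = event(parent);
        pendingStart.put(child, id);
        return id;
    }

    private void addEdge(int from, int to) {
        succ.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Transitivity: e1 -> e2 iff e2 is reachable from e1.
    boolean happensBefore(int e1, int e2) {
        Deque<Integer> work = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        work.push(e1);
        while (!work.isEmpty()) {
            int cur = work.pop();
            for (int next : succ.getOrDefault(cur, Collections.<Integer>emptyList())) {
                if (next == e2) return true;
                if (seen.add(next)) work.push(next);
            }
        }
        return false;
    }
}
```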
For each pair of conflicting accesses by different threads, if both checks 4 and 5 fail (the two accesses are not protected by the same lock, and neither must happen before the other), we report a race formed by the two accesses.
For example, in HelloThread3.java, the write x = 1 on line 10 and the read r2 = x on line 27 form a race: they both access x, line 10 is a write, and line 27 is not protected by a lock.
Most of your code will be written in Log.java.
To test your implementation, you can run:
$ java -javaagent:lib/iagent.jar -cp lib/asm-7.1.jar:bin/ HelloThread3
It is expected to print the following message or similar:
Data race detected:
Thread main wrote static field x
Thread Thread-0 read static field x
Finally, to evaluate your tool, you should be able to run it on a real program, ftpserver.jar:
$ java -javaagent:lib/iagent.jar -cp lib/asm-7.1.jar:bin/:ftpserver.jar driver.FTPMainDriver
Bonus: the race report above is not very informative because it gives no location for the racing accesses. Can you also report the class name and the line number of each access? That is, report something like the following:
Data race detected:
Thread main wrote static field HelloThread3.x at line 10
Thread Thread-0 read static field HelloThread3.x at line 27
Hints
visitLineNumber in class MethodAdapter.
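As an alternative to the hinted visitLineNumber approach (a different technique, swapped in here for comparison), a Log method can recover its caller's class and line at run time from the current stack trace. A sketch, assuming the class to skip is your logging class, e.g. "Log"; CallerLocation is a hypothetical name. Line numbers are only available when classes are compiled with line-number debug information (javac emits it by default), and the JVM is allowed to omit stack frames, so treat the result as best effort:

```java
// Hypothetical helper: finds the first stack frame outside the logging code.
class CallerLocation {
    // Returns "Class.method:line" for the nearest frame not in Thread,
    // this helper, or skipClass (e.g. "Log").
    static String of(String skipClass) {
        for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
            String cls = e.getClassName();
            if (cls.equals("java.lang.Thread")
                    || cls.equals(CallerLocation.class.getName())
                    || cls.equals(skipClass)) {
                continue; // frames of getStackTrace, this helper, or the logger
            }
            return cls + "." + e.getMethodName() + ":" + e.getLineNumber();
        }
        return "unknown";
    }
}
```

Walking the stack on every heap access is slow; for a larger program such as ftpserver.jar, passing the line number as a constant captured through visitLineNumber is likely much cheaper.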