Monday, October 3, 2011

JavaOne 2011: JVM Bytecode for Dummies

Charles Nutter's "JVM Bytecode for Dummies (and for the rest of you, as well)" was my last technical session of Monday, held early in the evening in the Hilton San Francisco Yosemite Ballroom A/B/C. When Nutter asked who had done something with bytecode, ten to twenty percent of the audience responded affirmatively. Nutter stated that he has been working full time on JRuby for nearly five years and has had to work closely with JVM bytecode in that role. He explained that this presentation is one of two parts at JavaOne 2011, though he has previously presented both as a single presentation. Today's part covered how to inspect bytecode, how to generate it, and how it works; his other presentation will go even deeper into bytecode. The room was packed, so this is clearly a popular topic.

Nutter's "Bytecode Definition" slide consists of two bullets from Wikipedia. He followed it with a more detailed slide on "Byte Code," meaning one-byte instructions: a single byte allows 256 possible "opcodes," of which about 200 are currently used. He cited Microsoft's CLR as another example; it uses two-byte "Wordcodes" but has operations similar to the JVM's.
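To make the one-byte-opcode idea concrete, here is a toy stack interpreter in Java. The opcode values (for example 0x60 for iadd and 0xac for ireturn) are the real ones from the JVM specification, but the interpreter itself, its class name TinyInterpreter, and its small set of supported instructions are illustrative assumptions. A real JVM also verifies code, types its stack slots, and JIT-compiles hot methods rather than looping over bytes like this.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyInterpreter {
    // Real JVM opcode values; each fits in one unsigned byte.
    static final int ICONST_1 = 0x04;
    static final int ICONST_2 = 0x05;
    static final int BIPUSH   = 0x10; // followed by a signed one-byte operand
    static final int DUP      = 0x59;
    static final int SWAP     = 0x5f;
    static final int IADD     = 0x60;
    static final int ISUB     = 0x64;
    static final int IMUL     = 0x68;
    static final int IRETURN  = 0xac;

    // Executes int-only bytecode on an operand stack and returns the ireturn value.
    public static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc++] & 0xff; // opcodes are unsigned bytes
            switch (op) {
                case ICONST_1: stack.push(1); break;
                case ICONST_2: stack.push(2); break;
                case BIPUSH:   stack.push((int) code[pc++]); break; // operand is sign-extended
                case DUP:      stack.push(stack.peek()); break;
                case SWAP: {
                    int top = stack.pop();
                    int next = stack.pop();
                    stack.push(top);
                    stack.push(next); // the former second value is now on top
                    break;
                }
                // Binary ops pop value2 (top) first, then value1.
                case IADD: { int b = stack.pop(), a = stack.pop(); stack.push(a + b); break; }
                case ISUB: { int b = stack.pop(), a = stack.pop(); stack.push(a - b); break; }
                case IMUL: { int b = stack.pop(), a = stack.pop(); stack.push(a * b); break; }
                case IRETURN: return stack.pop();
                default:
                    throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(op));
            }
        }
        throw new IllegalStateException("reached end of code without ireturn");
    }
}
```

Feeding it the bytes 0x10 0x0a 0x05 0x60 0x59 0x68 0xac (bipush 10, iconst_2, iadd, dup, imul, ireturn) computes 12, duplicates it, multiplies, and returns 144.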

Nutter stated that learning bytecode is useful for better understanding the platform and (his words) "it's fun to play with." From a more practical perspective, Nutter pointed out that there may come a day when a developer needs to read bytecode (rather than doing it just for fun). A fourth and less practical purpose is to write one's own JVM language.

Nutter showed a slide with Java's "Hello World" (which he called "the longest 'Hello World' in the world"). He then discussed javap and several of its options, showing the different bytecode representations that each option produces.
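For anyone who wants to follow along, here is the classic Java "Hello World" with some javap invocations and, in comments, the instructions javac typically emits. The listing omits the constant-pool indices (the #N values javap prints) because those vary between compiler versions, so treat it as a sketch of the output rather than an exact transcript.

```java
// Compile and inspect with standard JDK tools:
//   javac HelloWorld.java
//   javap -c HelloWorld          (disassemble method bodies)
//   javap -c -v HelloWorld       (verbose: constant pool, flags, stack and locals sizes)
//   javap -c -l -p HelloWorld    (line number and local variable tables, private members too)
// Local variable tables only appear if the class was compiled with javac -g.
public class HelloWorld {
    // The implicit default constructor compiles to roughly:
    //   aload_0                                        push "this" from local slot 0
    //   invokespecial java/lang/Object."<init>":()V    call the superclass constructor
    //   return
    public static void main(String[] args) {
        // Roughly:
        //   getstatic     java/lang/System.out:Ljava/io/PrintStream;
        //   ldc           "Hello, world"
        //   invokevirtual java/io/PrintStream.println:(Ljava/lang/String;)V
        //   return
        System.out.println("Hello, world");
    }
}
```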

Another approach Nutter discussed for emitting bytecode was the ASM library for manipulating Java bytecode. Nutter also talked about BiteScript ["a (J)Ruby DSL for emitting JVM bytecode"] and Mirah. For developers who want only a Java dependency, Nutter introduced JiteScript. Nutter had slides on the JVM stack, basic operations, stack operations, stack juggling, typed opcodes, constant values, local variable tables, arrays, math operations, conversions, flow control, classes/types, invokedynamic, and exceptions and synchronization.
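invokedynamic, new in Java 7, hands linkage of a call to a user-supplied bootstrap method, which returns a CallSite wrapping a MethodHandle. Java source at the time had no syntax that compiles to invokedynamic (the language only started using it with Java 8 lambdas), so the sketch below does not emit the instruction itself. Instead it builds the same runtime pieces by hand with the standard java.lang.invoke API to show what a bootstrap returns and how a call site can be relinked. The IndyDemo class and its method names are hypothetical.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class IndyDemo {
    public static int add(int a, int b) { return a + b; }
    public static int mul(int a, int b) { return a * b; }

    // Same shape as an invokedynamic bootstrap method:
    // (Lookup, String name, MethodType type) -> CallSite
    public static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws NoSuchMethodException, IllegalAccessException {
        MethodHandle target = lookup.findStatic(IndyDemo.class, name, type);
        return new MutableCallSite(target);
    }

    // Links a call site to add, calls it, relinks it to mul, and calls it again.
    public static int[] callThenRelink(int a, int b) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType binary = MethodType.methodType(int.class, int.class, int.class);
            CallSite site = bootstrap(lookup, "add", binary);
            MethodHandle invoker = site.dynamicInvoker(); // always follows the current target
            int first = (int) invoker.invokeExact(a, b);
            site.setTarget(lookup.findStatic(IndyDemo.class, "mul", binary));
            int second = (int) invoker.invokeExact(a, b);
            return new int[] { first, second };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Calling callThenRelink(2, 3) goes through the same invoker twice but gets 5 the first time and 6 the second, which is the kind of relinking dynamic languages such as JRuby rely on invokedynamic for.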

There were several bits of bytecode wisdom from Nutter that I want to list. Booleans in bytecode are treated simply as integers (reminds me of my C/C++ days!), and boolean and bitwise operators likewise operate on ints. Indeed, Nutter stated that integers get special treatment in Java bytecode. He also explained that 64-bit values (longs and doubles) require two slots. In instance methods, "this" is at index zero in the local variable table. The "iinc" instruction is what i++ produces for an int local variable, and it takes two operands: which local variable to increment and by how much. Nutter also showed that Java does have a GOTO at the bytecode level! Finally, he noted that signatures of classes and types are probably the most difficult part.
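Several of these points are easy to see in a small Java class. The comment block shows the bytecode javac typically produces for sumBelow (offsets can differ between compilers), including the single iinc for i++ and the goto that closes the loop; the other methods annotate local-variable slot layout and boolean handling. The class and method names are illustrative, not taken from Nutter's slides.

```java
public class LoopShapes {
    // Static method, so there is no "this": n is in slot 0, sum in slot 1, i in slot 2.
    public static int sumBelow(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }
    // Typical `javap -c` output for sumBelow:
    //    0: iconst_0
    //    1: istore_1          sum = 0
    //    2: iconst_0
    //    3: istore_2          i = 0
    //    4: iload_2
    //    5: iload_0
    //    6: if_icmpge 19      leave the loop when i >= n
    //    9: iload_1
    //   10: iload_2
    //   11: iadd
    //   12: istore_1          sum += i
    //   13: iinc 2, 1         i++ : local slot 2, increment by 1
    //   16: goto 4            the GOTO that Java source never shows
    //   19: iload_1
    //   20: ireturn

    // Instance method: "this" occupies slot 0, so x is in slot 1,
    // the long "wide" takes two slots (2 and 3), and flag lands in slot 4.
    public int slots(int x, long wide, boolean flag) {
        return flag ? x + (int) wide : x;
    }

    // In bytecode a boolean is just an int: this method ends in ireturn of 0 or 1.
    public static boolean isOdd(int x) {
        return (x & 1) != 0;
    }
}
```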

Nutter showed an example of Fibonacci in bytecode using BiteScript syntax before moving on to real-world uses of bytecode manipulation, specifically mentioning JRuby, Groovy, Hibernate, and java.lang.reflect.Proxy. He reviewed his tools, including BiteScript, JiteScript, and ASM, which underlies both of them. He then previewed his part 2, which covers tracking JVM bytecode and analyzing its performance. It is scheduled for 10 am on Wednesday in this same room (Hilton Yosemite A/B/C).
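java.lang.reflect.Proxy is the one item on that list that ships with the JDK itself: Proxy.newProxyInstance generates the bytecode for a new class implementing the requested interfaces at runtime and routes every call through an InvocationHandler. A minimal sketch follows; ProxyDemo, Greeter, and the logging behavior are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;

public class ProxyDemo {
    // Hypothetical interface to proxy.
    public interface Greeter {
        String greet(String name);
    }

    // Wraps a plain Greeter in a runtime-generated proxy class that records
    // each invoked method name before delegating to the real implementation.
    public static Greeter loggingGreeter(List<String> log) {
        Greeter target = name -> "Hello, " + name;
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(method.getName());
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }
}
```

Calling loggingGreeter(log).greet("JavaOne") returns "Hello, JavaOne" and leaves "greet" in the log, and Proxy.isProxyClass confirms that the object's class was generated at runtime rather than compiled from source.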

The version of this presentation Nutter gave at OSCON 2011 is available on SlideShare. It appears to be very similar to the version he presented at JavaOne (in fact, I found it helpful to look at it while listening to his presentation because it was hard to see the bottom of the screens in that room), but the JavaOne version adds slides on invokedynamic. Nutter made some simple but highly effective animations on many of his slides, so I highly recommend looking at this online version. Nutter also stated that you can paste the code in these animated slides into a BiteScript file and run it. Note that the OSCON version has 209 slides (many of them near-duplicates used for "animation"), so it evidently covers both parts that Nutter mentioned at the beginning.

I believe that Nutter achieved his stated goal of presenting Java bytecode in an approachable manner. His slides are a great (and rare) combination of being useful for a live presentation and useful as reference material. The "animation" (duplicate slides with changed, new, or removed values) certainly took a significant investment of his time, but the result is extremely helpful. Nutter mixes reference tables of opcodes and their meanings with animated charts showing those opcodes in action. This is definitely a presentation to use both for learning and for reference when trying to understand Java bytecode better. Even with Nutter splitting his overall presentation into two parts, this first part was still loaded with details. It felt like I was drinking from a fire hose (and, for me, that's a good thing!). It wasn't an easy presentation to end the day with (no being lazy and checking out here), but Nutter's dynamic and energetic style helped keep the energy level reasonable. I even started to wonder whether this might be just a little "fun" after all.
