Wednesday, July 3, 2013

Comparing JARs with Groovy

It can sometimes be useful to compare the contents of two JARs. In this blog post, I demonstrate a Groovy script that acts like a simple "diff" tool for comparing two JAR files.

The Groovy script shown here, jarDiff.groovy, can undoubtedly be improved upon, but does perform what I wanted it to. The script compares two provided JARs in the following ways:

  • Shows path, name, and size of both JARs regardless of whether they are identical or different.
  • Shows entries in each JAR that do not exist in the other JAR.
  • Shows entries that are in common (by name) in each JAR but have different attributes (CRC, size, or modification date).

Given this output, identical JARs produce only each JAR's path/file name and size. For JARs that differ, the script shows those same attributes along with entries that exist in only one of the JARs and entries common to both JARs that differ in CRC, size, or modification date. An important caveat is that this script compares only metadata. It does not report differences at the level of methods/APIs (as javap can show for individual classes) or at the source code level (which would require a decompiler). The script identifies that differences exist; those other tools can then be used to investigate the details.
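
The comparisons described above boil down to set operations on entry names. The following is a minimal sketch of that idea only, not the script itself: the entry names and CRC values are made up for illustration, and plain maps of name to CRC stand in for real JAR contents.

```groovy
// Hypothetical entry data (entry name -> CRC); real JARs would supply these values.
def jar1 = [
   'META-INF/MANIFEST.MF': 111L,
   'dustin/A.class'      : 222L,
   'dustin/B.class'      : 333L
]
def jar2 = [
   'META-INF/MANIFEST.MF': 111L,
   'dustin/B.class'      : 999L,
   'dustin/C.class'      : 444L
]

// Entries present in only one of the two maps (set difference on the keys).
def onlyInJar1 = jar1.keySet() - jar2.keySet()
def onlyInJar2 = jar2.keySet() - jar1.keySet()

// Entries present in both maps whose recorded values differ.
def changed = jar1.keySet().intersect(jar2.keySet()).findAll { jar1[it] != jar2[it] }

println "Only in first:  ${onlyInJar1}"
println "Only in second: ${onlyInJar2}"
println "Changed:        ${changed}"
```

The unchanged manifest entry appears in none of the three results, which is why identical JARs produce no per-entry output.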

#!/usr/bin/env groovy

/**
 * jarDiff.groovy
 *
 * jarDiff.groovy <first_jar_file> <second_jar_file>
 *
 * Script that compares two JAR files, reporting basic characteristics of each
 * along with differences between the two JARs.
 */

if (args.length < 2)
{
   println "\nUSAGE: jarDiff.groovy <first_jar_file> <second_jar_file>\n"
   System.exit(-1)
}

TOTAL_WIDTH = 180
COLUMN_WIDTH = TOTAL_WIDTH / 2 - 3
ROW_SEPARATOR = "-".multiply(TOTAL_WIDTH)

import java.util.jar.JarFile

def file1Name = args[0]
def jar1File = new JarFile(file1Name)
def jar1 = extractJarContents(jar1File)
def file2Name = args[1]
def jar2File = new JarFile(file2Name)
def jar2 = extractJarContents(jar2File)

def entriesInJar1ButNotInJar2 = jar1.keySet() - jar2.keySet()
def entriesInJar2ButNotInJar1 = jar2.keySet() - jar1.keySet()

println ROW_SEPARATOR
println "| ${file1Name.center(COLUMN_WIDTH)} |${file2Name.center(COLUMN_WIDTH)} |"
// JarFile.size() returns the number of entries, so read the byte count from the file itself
print "| ${(new File(file1Name).length() + " bytes").center(COLUMN_WIDTH)} |"
println "${(new File(file2Name).length() + " bytes").center(COLUMN_WIDTH)} |"
println ROW_SEPARATOR

if (jar1File.manifest != jar2File.manifest)
{
   // A JAR is not required to contain a manifest, so guard against null
   def manifestPreStr = "# Manifest Entries: "
   def manifest1Str = manifestPreStr + (jar1File.manifest?.mainAttributes?.size() ?: 0)
   print "| ${manifest1Str.center(COLUMN_WIDTH)} |"
   def manifest2Str = manifestPreStr + (jar2File.manifest?.mainAttributes?.size() ?: 0)
   println "${manifest2Str.center(COLUMN_WIDTH)} |"
   println ROW_SEPARATOR
}

entriesInJar1ButNotInJar2.each
{ entry1 ->
   print "| ${entry1.center(COLUMN_WIDTH)} |"
   println "${" ".center(entry1.size() > COLUMN_WIDTH ? 2 * COLUMN_WIDTH - entry1.size() : COLUMN_WIDTH)} |"
   println ROW_SEPARATOR
}
entriesInJar2ButNotInJar1.each
{ entry2 ->
   print "| ${" ".center(entry2.size() > COLUMN_WIDTH ? 2 * COLUMN_WIDTH - entry2.size() : COLUMN_WIDTH)}"
   println "| ${entry2.center(COLUMN_WIDTH)} |"
   println ROW_SEPARATOR
}

jar1.each 
{ key, value ->
   if (!entriesInJar1ButNotInJar2.contains(key))
   {
      def jar2Entry = jar2.get(key)
      if (value != jar2Entry)
      {
         println "| ${key.center(COLUMN_WIDTH)} |${jar2Entry.name.center(COLUMN_WIDTH)} |"
         if (value.crc != jar2Entry.crc)
         {
            def crc1Str = "CRC: ${value.crc}"
            def crc2Str = "CRC: ${jar2Entry.crc}"
            print "| ${crc1Str.center(COLUMN_WIDTH)} |"
            println "${crc2Str.center(COLUMN_WIDTH)} |"
         }
         if (value.size != jar2Entry.size)
         {
            def size1Str = "${value.size} bytes"
            def size2Str = "${jar2Entry.size} bytes"
            print "| ${size1Str.center(COLUMN_WIDTH)} |"
            println "${size2Str.center(COLUMN_WIDTH)} |"
         }
         if (value.time != jar2Entry.time)
         {
            def time1Str = "${new Date(value.time)}"
            def time2Str = "${new Date(jar2Entry.time)}"
            print "| ${time1Str.center(COLUMN_WIDTH)} |"
            println "${time2Str.center(COLUMN_WIDTH)} |"
         }
         println ROW_SEPARATOR
      }
   }
}

/**
 * Provide mapping of JAR entry names to characteristics about that JAR entry
 * for the provided JAR file.
 *
 * @param jarFile JAR file from which to extract contents.
 * @return JAR entries and their characteristics.
 */
TreeMap<String, JarCharacteristics> extractJarContents(JarFile jarFile)
{
   def jarContents = new TreeMap<String, JarCharacteristics>()
   // Declare locally rather than leaking an undeclared variable into the script binding
   def entries = jarFile.entries()
   entries.each
   { entry ->
      jarContents.put(entry.name, new JarCharacteristics(entry.name, entry.crc, entry.size, entry.time))
   }
   return jarContents
}

UPDATE: The above script references a class called JarCharacteristics. This class is a simple data holder made really easy in Groovy thanks to Groovy's implicit property support and the utility of the @Canonical annotation.

JarCharacteristics.groovy
@groovy.transform.Canonical
class JarCharacteristics
{
   String name
   long crc
   long size
   long time
}

I did not need to write get/set methods because Groovy provides them out of the box. Using @Canonical also gives me overridden equals(Object) and hashCode() implementations "for free," along with a tuple constructor and a toString() implementation (which the script does not use).
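
To make the generated members concrete, here is a small sketch. The class EntryInfo is a hypothetical stand-in with the same shape as JarCharacteristics (renamed so it does not clash with the real class), and the values are invented.

```groovy
import groovy.transform.Canonical

// Hypothetical stand-in with the same properties as JarCharacteristics.
@Canonical
class EntryInfo
{
   String name
   long crc
   long size
   long time
}

// Tuple constructor generated by @Canonical: arguments follow property order.
def first  = new EntryInfo('dustin/A.class', 12345L, 1024L, 0L)
def second = new EntryInfo('dustin/A.class', 12345L, 1024L, 0L)
def third  = new EntryInfo('dustin/A.class', 54321L, 1024L, 0L)

// Groovy's == and != delegate to the generated equals(Object).
println "first == second: ${first == second}"
println "first == third:  ${first == third}"

// Generated getter, reached through property syntax.
println "third.crc: ${third.crc}"
```

This generated equals(Object) is what lets the script write value != jar2Entry to detect that any one of CRC, size, or time has changed.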

Like all Groovy scripts, the above could be written in Java, but Groovy is better suited to script writing than Java. The above Groovy script makes use of Groovy features that I have covered in previous blog posts such as Scripted Reports with Groovy (for formatting output of differences) and Searching JAR Files with Groovy (for perusing and reading JAR files).
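
The two-column layout relies on the GDK's String.center and String multiplication. The sketch below isolates that formatting with a narrower, assumed width of 40 columns; the row closure is a hypothetical helper, not something the script defines.

```groovy
// Assumed widths, smaller than the script's 180 columns so the output stays compact.
int totalWidth = 40
int columnWidth = totalWidth.intdiv(2) - 3
def separator = '-' * totalWidth

// Hypothetical helper: center each value in its column between pipe delimiters.
def row = { String left, String right ->
   "| ${left.center(columnWidth)} |${right.center(columnWidth)} |"
}

println separator
println row('first.jar', 'second.jar')
println row('CRC: 111', 'CRC: 999')
println separator
```

Each row is exactly as wide as the separator as long as neither value exceeds the column width, because center does not truncate longer strings.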

There are several potential enhancements for this script. One is to show how the MANIFEST.MF files differ by comparing the contents of one manifest to the other, rather than only flagging the manifest as one more changed entry. Another is to compare, via reflection, the methods defined on the classes, interfaces, and enums contained in the JARs. For now, however, I am content to use javap or javac -Xprint to see the method changes once the above script identifies differences in a particular class, enum, or interface.
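
As a rough sketch of the manifest enhancement, the main attributes of two manifests can be compared name by name. The manifest text below is invented for illustration and parsed in memory; a real version would presumably obtain each Manifest from JarFile.getManifest() and would also need to handle per-entry sections.

```groovy
import java.util.jar.Manifest

// Hypothetical manifest text; not taken from any real JAR.
def text1 = 'Manifest-Version: 1.0\nCreated-By: 1.7.0_25\nBuilt-By: dustin\n'
def text2 = 'Manifest-Version: 1.0\nCreated-By: 1.6.0_45\nMain-Class: dustin.Main\n'

// Collect a manifest's main attributes into a sorted map of name -> value.
def mainAttributes = { String text ->
   def manifest = new Manifest(new ByteArrayInputStream(text.getBytes('UTF-8')))
   def result = new TreeMap<String, String>()
   manifest.mainAttributes.each { key, value -> result[key.toString()] = value.toString() }
   result
}

def attrs1 = mainAttributes(text1)
def attrs2 = mainAttributes(text2)

// Keep every attribute that is missing on one side or has a different value.
def differences = (attrs1.keySet() + attrs2.keySet()).findAll { attrs1[it] != attrs2[it] }
differences.each { name ->
   println "${name}: ${attrs1[name] ?: '(absent)'} | ${attrs2[name] ?: '(absent)'}"
}
```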

Being able to quickly identify differences between two JARs can be beneficial in a variety of circumstances, such as comparing versions of one's own generated JARs for changes or comparing JARs of provided libraries and frameworks that are not named in a way that makes their differences obvious. The Groovy script demonstrated in this post identifies high-level differences between two JARs and at the same time shows off some nice Groovy features.

6 comments:

@DustinMarx said...

I added the code listing for the JarCharacteristics class used by the Groovy script to the above blog post along with a paragraph before and paragraph after the code listing for JarCharacteristics.groovy.

Dustin

Unknown said...

Dustin,

Your script was exactly what I needed. I also used it to compare two WAR files.

Thanks!

I added some CliBuilder stuff to allow it to ignore timestamp, size and/or CRC differences to cut down on the noise.

Do you want the diff?

Rick

@DustinMarx said...

Rick, Yes, please do post the diff. I was thinking about using CliBuilder for a verbose option to show more on manifest differences. If you don't mind, I might merge your additions with my manifest verbosity option based upon my latest post and provide the entire script in a new post.

Dustin

Unknown said...

--- jarDiff.groovy.orig 2013-07-12 11:33:17.484919500 -0400
+++ jarDiff.groovy 2013-07-12 11:43:14.135477300 -0400
@@ -3,28 +3,50 @@
/**
* jarDiff.groovy
*
- * jarDiff.groovy <first_jar_file> <second_jar_file>
+ * jarDiff.groovy -htsc
*
* Script that compares two JAR files, reporting basic characteristics of each
* along with differences between the two JARs.
*/

-if (args.length < 2)
-{
- println "\nUSAGE: jarDiff.groovy <first_jar_file> <second_jar_file>\n"
- System.exit(-1)
-}
-
TOTAL_WIDTH = 180
COLUMN_WIDTH = TOTAL_WIDTH / 2 - 3
ROW_SEPARATOR = "-".multiply(TOTAL_WIDTH)

import java.util.jar.JarFile

-def file1Name = args[0]
+// Set up the CLI options
+//
+def cli = new CliBuilder( usage: 'jarDiff.groovy -h -tsc ')
+cli.with {
+ h longOpt: 'help', 'usage information'
+ t longOpt: 'ignoreTime', args: 0, required: false, type: Boolean, 'Ignore time differences'
+ s longOpt: 'ignoreSize', args: 0, required: false, type: Boolean, 'Ignore size differences'
+ c longOpt: 'ignoreCrc', args: 0, required: false, type: Boolean, 'Ignore CRC differences'
+}
+
+def opt = cli.parse(args)
+if (!opt) return
+if (opt.h) {
+ cli.usage()
+ System.exit(-1)
+}
+
+def ignoreTime = opt.t
+def ignoreSize = opt.s
+def ignoreCrc = opt.c
+
+if (opt.arguments().size < 2)
+{
+ println "Two JAR files required\n"
+ cli.usage()
+ System.exit(-1)
+}
+
+def file1Name = opt.arguments()[0]
def jar1File = new JarFile(file1Name)
def jar1 = extractJarContents(jar1File)
-def file2Name = args[1]
+def file2Name = opt.arguments()[1]
def jar2File = new JarFile(file2Name)
def jar2 = extractJarContents(jar2File)

@@ -60,36 +82,42 @@
println ROW_SEPARATOR
}

-jar1.each
+jar1.each
{ key, value ->
if (!entriesInJar1ButNotInJar2.contains(key))
{
def jar2Entry = jar2.get(key)
if (value != jar2Entry)
{
- println "| ${key.center(COLUMN_WIDTH)} |${jar2Entry.name.center(COLUMN_WIDTH)} |"
- if (value.crc != jar2Entry.crc)
- {
- def crc1Str = "CRC: ${value.crc}"
- def crc2Str = "CRC: ${jar2Entry.crc}"
- print "| ${crc1Str.center(COLUMN_WIDTH)} |"
- println "${crc2Str.center(COLUMN_WIDTH)} |"
- }
- if (value.size != jar2Entry.size)
- {
- def size1Str = "${value.size} bytes"
- def size2Str = "${jar2Entry.size} bytes"
- print "| ${size1Str.center(COLUMN_WIDTH)} |"
- println "${size2Str.center(COLUMN_WIDTH)} |"
- }
- if (value.time != jar2Entry.time)
- {
- def time1Str = "${new Date(value.time)}"
- def time2Str = "${new Date(jar2Entry.time)}"
- print "| ${time1Str.center(COLUMN_WIDTH)} |"
- println "${time2Str.center(COLUMN_WIDTH)} |"
+ boolean crcDiff = (!ignoreCrc && value.crc != jar2Entry.crc)
+ boolean sizeDiff = (!ignoreSize && value.size != jar2Entry.size)
+ boolean timeDiff = (!ignoreTime && value.time != jar2Entry.time)
+
+ if(crcDiff || sizeDiff || timeDiff) {
+ println "| ${key.center(COLUMN_WIDTH)} |${jar2Entry.name.center(COLUMN_WIDTH)} |"
+ if (crcDiff)
+ {
+ def crc1Str = "CRC: ${value.crc}"
+ def crc2Str = "CRC: ${jar2Entry.crc}"
+ print "| ${crc1Str.center(COLUMN_WIDTH)} |"
+ println "${crc2Str.center(COLUMN_WIDTH)} |"
+ }
+ if (sizeDiff)
+ {
+ def size1Str = "${value.size} bytes"
+ def size2Str = "${jar2Entry.size} bytes"
+ print "| ${size1Str.center(COLUMN_WIDTH)} |"
+ println "${size2Str.center(COLUMN_WIDTH)} |"
+ }
+ if (timeDiff)
+ {
+ def time1Str = "${new Date(value.time)}"
+ def time2Str = "${new Date(jar2Entry.time)}"
+ print "| ${time1Str.center(COLUMN_WIDTH)} |"
+ println "${time2Str.center(COLUMN_WIDTH)} |"
+ }
+ println ROW_SEPARATOR
}
- println ROW_SEPARATOR
}
}
}
@@ -111,4 +139,3 @@
}
return jarContents
}
-

Unknown said...

That may not have come out very nicely.

Can I email my version of the script to you?

Rick

@DustinMarx said...

siom79/japicmp is a project that provides a "comparison of two versions of a jar archive."