Tuesday, March 3, 2015

What's the diff?

Hello everyone!


Have you ever had to compare two files?  It's pretty easy on Linux: there is a nice standard tool called 'diff' which shows you the differences between two text files.  You can also use vimdiff to get a nicer view of the differences, one that highlights what changed inside each line instead of just flagging whole lines...

But what do you do if you need to diff two jar files?  Or if you want to diff two binary files while also displaying any strings that happen to be present?

Behold, a nice script that will let you use almost any tool on your path to output a representation of, well, ANYTHING, and then diff it with the tool of your choice.  I may end up developing a version that could take different command line arguments to change things up, but right now I just edit the file and save it with different names, like jardiff.pl, binarydiff.pl, etc...

jardiff.pl
#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;

my $left = shift();
my $right = shift();

die unless ((-e $left) and (-e $right));

my $vis_tool = "jar -tf";
my $diff_tool = "vimdiff";

my $cmd="bash -c \"$diff_tool <($vis_tool $left) <($vis_tool $right)\"";
exit system($cmd);
Let's go over what is going on with each line...

#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
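A good way to start out Perl scripts.  The first line tells your shell what program to execute, and the rest of the lines are good ways of checking yourself if you make a mistake: 'use strict' enforces stricter syntax (for example, variables must be declared), 'use warnings' reports questionable constructs, and 'use diagnostics' expands those messages into longer, more useful explanations.  The -w switch on the first line also turns on warnings, so with 'use warnings' present it is largely redundant.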

my $left = shift();
my $right = shift();
This takes the first and second arguments off the command line and stores them in the $left and $right variables.

die unless ((-e $left) and (-e $right));
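This line exits the script unless both the left and right arguments exist on your system, so a single missing file is enough to stop it.  Because die is called without a message, all you see is a bare "Died at ..." line.  I should add a usage statement to make things easier.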
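
One way such a usage statement might look is sketched below. The helper name check_args and the wording of the messages are made up for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns an error
# message describing what is wrong with the arguments, or undef if both
# paths were given and exist.
sub check_args {
    my ($left, $right) = @_;
    return "usage: $0 LEFT_FILE RIGHT_FILE"
        unless defined $left and defined $right;
    for my $file ($left, $right) {
        return "$0: no such file: $file" unless -e $file;
    }
    return;
}

# In the script, the bare `die unless ...` line could then become:
#   my $problem = check_args($left, $right);
#   die "$problem\n" if defined $problem;
```

Ending the die message with a newline keeps Perl from appending the "at jardiff.pl line N" suffix, which is just noise for someone who mistyped a file name.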

my $vis_tool = "jar -tf";
my $diff_tool = "vimdiff";
Here is where our choices come in.  In this instance, our visualization tool is 'jar -tf', which lists the contents of the jar file you pass to the command.  If you also care about sizes, dates and timestamps, you can add the v flag (jar -tvf) to display that additional information.

The $diff_tool is pretty self explanatory... you can use diff, vimdiff, or whatever diff program you have available.

my $cmd="bash -c \"$diff_tool <($vis_tool $left) <($vis_tool $right)\"";
exit system($cmd);
And this simply puts all the parts together, and runs it in the shell. Let's work left to right.

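To make sure we execute this with bash, we use bash -c \" $CMD_HERE \" to execute the command.  This matters because Perl's system() hands a single command string to /bin/sh, which is not always bash, and the <( ... ) syntax in the command is a bash feature rather than part of plain sh.  We need to escape the quotes because we quoted the string to assign it to my $cmd.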
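
If the escaped quotes bother you, or your file names contain spaces, the list form of system() is one possible alternative. The sketch below is an assumption about how that could be done, not what the script above does: the helper name bash_diff_args is made up, and it relies on bash -c assigning the arguments after the script text to $0, $1 and $2.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): builds the argument
# list for system() in list form, so Perl never has to quote anything.
sub bash_diff_args {
    my ($diff_tool, $vis_tool, $left, $right) = @_;
    # \$1 and \$2 stay literal here; bash expands them to the two file
    # names, and the double quotes keep names with spaces in one piece.
    my $script = qq{$diff_tool <($vis_tool "\$1") <($vis_tool "\$2")};
    # The extra 'bash' fills $0; the file names become $1 and $2.
    return ('bash', '-c', $script, 'bash', $left, $right);
}

# Possible use in place of building $cmd:
#   system(bash_diff_args($diff_tool, $vis_tool, $left, $right));
```

The tool names are still pasted into the script text, which is fine as long as they come from you and not from untrusted input.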

The $diff_tool is pretty simple, we just invoke the command we specified earlier.  In most cases, your diff tool expects to get two files to compare (you can use a tool like diff3 to do a three-way diff).

This brings us to the <( ... ) operator, which bash calls process substitution.  It executes the given command in a subshell, and hands the original command a file name (something like /dev/fd/63) from which that command's output can be read.  If we didn't do this, we would have to dump the output of each of the individual $vis_tool commands to a temp file, and then pass each file to the diff tool, and then clean up our temp files.
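
For comparison, here is a rough sketch of that temp file route using the File::Temp module that ships with Perl. The helper name capture_to_tempfile is made up for this example, and this is only one way to do it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not in the original script): runs a command,
# saves its output in a temp file, and returns the name of that file.
sub capture_to_tempfile {
    my ($command) = @_;
    # UNLINK => 1 asks File::Temp to delete the file when the script exits.
    my ($fh, $filename) = tempfile(UNLINK => 1);
    my $output = `$command`;
    die "could not run '$command'\n" unless defined $output;
    print {$fh} $output;
    close($fh) or die "could not close $filename: $!\n";
    return $filename;
}

# Possible use in place of the <( ... ) version:
#   my $left_listing  = capture_to_tempfile("$vis_tool $left");
#   my $right_listing = capture_to_tempfile("$vis_tool $right");
#   system("$diff_tool $left_listing $right_listing");
```

It works without bash, but it is more code, and the diff tool shows temp file names instead of anything meaningful, which is a good argument for sticking with <( ... ).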

Finally, we actually execute the command with
exit system($cmd);
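
One caveat if you plan to call this from other scripts: system() returns the raw wait status, not the exit code itself, which lives in the upper byte. An exit code of 1 (what plain diff uses for "files differ") comes back as 256, and passing 256 to exit typically ends up as status 0 on Unix. A small sketch of a conversion follows; the helper name is made up, and the 127 and 128 + signal values follow the usual shell convention, not anything the original script does.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): converts the value
# returned by system() into a plain exit code.
sub exit_code_from_status {
    my ($status) = @_;
    return 127 if $status == -1;                     # command could not be started
    return 128 + ($status & 127) if $status & 127;   # command was killed by a signal
    return $status >> 8;                             # normal exit: code is in the upper byte
}

# Possible use in place of `exit system($cmd);`:
#   exit exit_code_from_status(system($cmd));
```

With vimdiff this rarely matters, since you are looking at the result yourself anyway.
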
The other really useful variant of this script is my 'binary' diff version.  Most people would use hexdump to get a more diff-able version of a binary file, but I actually prefer xxd: its output shows the bytes in hex, and it also attempts to 'stringify' as much of the binary as it can and shows that off to the side.

daryl$ xxd xpp3_min-1.1.4c.jar | head 
0000000: 504b 0304 0a00 0000 0000 12a5 6a35 0000  PK..........j5..
0000010: 0000 0000 0000 0000 0000 0900 0000 4d45  ..............ME
0000020: 5441 2d49 4e46 2f50 4b03 040a 0000 0008  TA-INF/PK.......
0000030: 0011 a56a 35e5 823f 7b5e 0000 006a 0000  ...j5..?{^...j..
0000040: 0014 0000 004d 4554 412d 494e 462f 4d41  .....META-INF/MA
0000050: 4e49 4645 5354 2e4d 46f3 4dcc cb4c 4b2d  NIFEST.MF.M..LK-
0000060: 2ed1 0d4b 2d2a cecc cfb3 5230 d433 e0e5  ...K-*....R0.3..
0000070: 72cc 4312 712c 484c ce48 5500 8a01 25cd  r.C.q,HL.HU...%.
0000080: f48c 78b9 9c8b 5213 4b52 5374 9d2a 41ea  ..x...R.KRSt.*A.
0000090: 4df5 0ce2 0d8c 7593 0ccc 1534 824b f314  M.....u....4.K..
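
For the binary version, the only change to the script is setting $vis_tool to "xxd". The command line driven version I mentioned at the top could look roughly like the sketch below, using Getopt::Long, which ships with Perl. The option names --vis and --diff and the helper name command_from_args are placeholders I made up, so treat this as a starting point and not a finished tool.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not in the original script): parses the options
# and returns the command string to run, or undef if the arguments are
# not usable.
sub command_from_args {
    my @args = @_;
    my $vis_tool  = 'jar -tf';    # default: behave like jardiff.pl
    my $diff_tool = 'vimdiff';
    GetOptionsFromArray(\@args,
        'vis=s'  => \$vis_tool,   # e.g. --vis xxd for the binary version
        'diff=s' => \$diff_tool,  # e.g. --diff diff
    ) or return;
    return unless @args == 2;     # exactly two files must be left over
    my ($left, $right) = @args;
    return qq{bash -c "$diff_tool <($vis_tool $left) <($vis_tool $right)"};
}

# Possible use at the end of a script:
#   my $cmd = command_from_args(@ARGV)
#       or die "usage: $0 [--vis TOOL] [--diff TOOL] LEFT RIGHT\n";
#   exit system($cmd);
```

With something like this in place, one script could cover both jardiff.pl and binarydiff.pl, with the jar listing as the default.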

This diff technique has been very helpful for me comparing all kinds of files, and I hope it can help you too!