The CompLearn Command Shell

Command Names
  ncd - computes the Normalized Compression Distance.
  maketree - generates a best-fitting unrooted binary tree from a given
             distance matrix.

QuickStart

By default, ncd uses the blocksort compressor and file input mode. Two
filenames are passed as command-line arguments; the contents of both files
are compressed and the NCD between the two files is returned.
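NCD is usually defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
where C(s) is the compressed length of s and xy is the concatenation of x and
y. The Python sketch below applies that formula to two files. It uses Python's
bz2 module (bzip2 is itself a block-sorting compressor) as a stand-in for
ncd's default blocksort compressor, so it illustrates the idea rather than
reproducing CompLearn's implementation or its exact numbers.

```python
import bz2


def compressed_size(data: bytes) -> int:
    """C(data): length in bytes of the bzip2-compressed data."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_files(path1: str, path2: str) -> float:
    """Mirror file mode: read both files as raw bytes and compare them."""
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        return ncd(f1.read(), f2.read())
```

Here ncd_files("filename1", "filename2") plays the role of
"ncd filename1 filename2": values near 0 indicate similar contents, values
near 1 indicate unrelated contents.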

Example:

  ncd filename1 filename2

Selecting a Compressor

The ncd command-line tool currently supports three compressors: bzip, zlib and
google. Select one with the -C or --compressor option, followed by the
compressor name. Note: the google compressor works only after you obtain a
GoogleKey and place it in your CompLearn configuration file. See Creating a
Configuration File.

Option:

  -C, --compressor=[ bzip | zlib | google ]

Example:

  ncd -C zlib filename1 filename2
  ncd --compressor=google filename1 filename2
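Choosing a compressor changes only the function used for C(s) in the NCD
formula. The Python sketch below mirrors the bzip and zlib choices with
Python's bz2 and zlib modules; COMPRESSORS and ncd_with are hypothetical names
used for illustration. The google option is left out because it needs a
GoogleKey to query Google rather than compressing data locally.

```python
import bz2
import zlib

# Hypothetical table mapping -C/--compressor names to compression functions.
# Only the locally computable compressors are mirrored here.
COMPRESSORS = {
    "bzip": bz2.compress,
    "zlib": zlib.compress,
}


def ncd_with(name: str, x: bytes, y: bytes) -> float:
    """NCD using the named compressor; raises KeyError for unsupported names."""
    compress = COMPRESSORS[name]
    cx, cy, cxy = (len(compress(d)) for d in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Because each compressor yields different compressed lengths, the same pair of
inputs generally gets a somewhat different NCD under -C zlib than under -C bzip.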


Selecting an Input Mode

The selected input mode determines how a DataBlock Enumeration is created.
File mode is the default; adding a mode option on the command line switches to
a new mode. Each such option is followed by one or more arguments, depending
on the mode selected.

File Mode
Takes as an argument a filename whose contents are to be compressed.

String Literal Mode
Takes as an argument a string whose contents are to be compressed. By default,
string literals are separated by whitespace; to pass a literal that contains
whitespace, enclose it in double quotes.
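Assuming ncd is run from a POSIX-style shell such as sh or bash, the shell
itself splits the command line on whitespace and removes the double quotes
before ncd sees the arguments. Python's shlex module follows the same quoting
rules, so it can show how a literal-mode command line is split:

```python
import shlex

# shlex.split applies POSIX shell quoting rules: unquoted whitespace separates
# arguments, and double quotes keep embedded spaces inside one argument.
argv = shlex.split('ncd -l string1 "s t r i n g 2"')
print(argv)  # ['ncd', '-l', 'string1', 's t r i n g 2']
```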

Plain List Mode
Takes as an argument a filename which contains a list of filenames whose
contents are individually compressed. The list has one filename per line.
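A minimal Python sketch of what plain list mode reads. It assumes one filename
per line, ignores blank lines, and resolves relative names against the current
working directory; these details are assumptions rather than documented ncd
behavior, and read_plain_list is a hypothetical name.

```python
from pathlib import Path
from typing import List


def read_plain_list(list_path: str) -> List[bytes]:
    """Return the contents of each file named in list_path (one name per line)."""
    names = Path(list_path).read_text().splitlines()
    # Assumption: blank lines are skipped rather than treated as filenames.
    return [Path(name).read_bytes() for name in names if name.strip()]
```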

Term List Mode
Takes as an argument a filename whose contents are a list of string literals
to be individually compressed. The list has one string literal per line.
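Because the separator in a term list is the line break, a term may contain
spaces without any quoting. A minimal Python sketch, assuming UTF-8 text and
skipping empty lines (both assumptions; read_term_list is a hypothetical name):

```python
from pathlib import Path
from typing import List


def read_term_list(list_path: str) -> List[bytes]:
    """Return each line of list_path as a separate string literal (bytes)."""
    lines = Path(list_path).read_text(encoding="utf-8").splitlines()
    # splitlines() drops the line breaks; empty lines are skipped.
    return [line.encode("utf-8") for line in lines if line]
```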

Directory Mode
Takes as an argument the name of a directory; the contents of each file in
that directory are compressed individually.
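A minimal Python sketch of a directory-mode enumeration. Whether ncd descends
into subdirectories and in what order it visits files is not stated here, so
this sketch assumes regular files only, no recursion, and sorted name order;
read_directory is a hypothetical name.

```python
from pathlib import Path
from typing import List, Tuple


def read_directory(dir_path: str) -> List[Tuple[str, bytes]]:
    """Return (name, contents) for each regular file directly in dir_path."""
    # Sorting makes the enumeration order deterministic; subdirectories
    # are skipped by the is_file() check.
    return [(p.name, p.read_bytes())
            for p in sorted(Path(dir_path).iterdir()) if p.is_file()]
```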

Windowed Mode
Takes as an argument a filename plus options that define how the file is
enumerated into "windows." The windows are defined by four values, all in
bytes: the starting position, the step size, the window width and the last
position. The resulting windows are then individually compressed.
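The windowing can be sketched in Python as follows. The exact conventions are
assumptions, not confirmed ncd behavior: positions are treated as 0-based byte
offsets, window start positions run from firstpos up to and including lastpos
in steps of stepsize, and a window that runs past the end of the file is
truncated. The function name windows is hypothetical.

```python
from typing import List


def windows(data: bytes, firstpos: int, stepsize: int, width: int,
            lastpos: int) -> List[bytes]:
    """Cut data into windows of `width` bytes.

    Assumed conventions: 0-based offsets, start positions
    firstpos, firstpos + stepsize, ... up to lastpos inclusive,
    and slicing truncates any window that extends past the data.
    """
    return [data[start:start + width]
            for start in range(firstpos, lastpos + 1, stepsize)]
```

Under these assumptions, "-w filename1,1,25,50,100" would yield 50-byte windows
starting at offsets 1, 26, 51 and 76.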

Options:
  -f, --file-mode=FILE
  -l, --literal-mode=STRING
  -p, --plainlist-mode=FILE
  -t, --termlist-mode=FILE
  -d, --directory-mode=DIR
  -w, --windowed-mode=FILE,firstpos,stepsize,width,lastpos

Examples:

  ncd filename1 -l string1
    - computes the NCD between the contents of a file and a string literal

  ncd -l string1 -f filename1
    - computes the NCD between a string literal and the contents of a file

  ncd -l string1 "s t r i n g 2"
    - computes the NCD between two string literals

  ncd -p filename1 -f filename2
    - computes a list of NCDs between each file named in a plain list and a
      single file

  ncd -t filename1 -d directory1
    - computes a matrix of NCDs for string literals in a term list and the
      files found in a directory

  ncd -w filename1,1,25,50,100 filename1,10,25,50,110
    - computes a matrix of NCDs for windows created from a single file against
      windows from the same file but with different starting and ending
      positions
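The examples return a single NCD, a list or a matrix depending on how many
data blocks each argument contributes; in effect, each block from the first
argument appears to be compared with each block from the second. A minimal
Python sketch of that pairing, assuming bz2 as the compressor (ncd_matrix is a
hypothetical name, and the real tool's output layout is not reproduced):

```python
import bz2
from typing import List


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) with bzip2 compressed lengths."""
    cx, cy, cxy = (len(bz2.compress(d)) for d in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(rows: List[bytes], cols: List[bytes]) -> List[List[float]]:
    """One NCD per (row block, column block) pair.

    A single row against many columns gives a one-row "list"; many rows
    against many columns gives a full matrix.
    """
    return [[ncd(r, c) for c in cols] for r in rows]
```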



