= Instructions to use splitMixed =

Please note:

 This is a preliminary implementation.
 Questions to: rolZand@cebitec.uni-bieleZfeld.de (without the two Zs!)

Command:

 java -jar splitMixed.jar ARGS


ARGS:

 String - file containing the clusters of the normal data set (see *)
 int - minimal fragment length for normal data set (including the reads)
 int - maximal fragment length for normal data set (including the reads)
 int - minimal size for clusters; smaller clusters are discarded
 String - file containing the mappings of the normal data set (see **)
 int - minimal fragment length for mixed data set (including the reads)
 int - maximal fragment length for mixed data set (including the reads)
 String - file containing the mappings of the mixed data set (see **)
 String - file name for output (see ***)
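
 Because the nine arguments are positional, it is easy to swap two of them. The following is a minimal sketch (not part of splitMixed) of a Python helper that assembles the call in the order listed above. The function name, the parameter names and the file names in the usage note are hypothetical; only the order of the arguments is taken from the list.

```python
from typing import List


def build_split_mixed_command(
    normal_clusters: str,
    normal_min_len: int,
    normal_max_len: int,
    min_cluster_size: int,
    normal_mappings: str,
    mixed_min_len: int,
    mixed_max_len: int,
    mixed_mappings: str,
    output_name: str,
    jar: str = "splitMixed.jar",
) -> List[str]:
    """Return the argument vector for splitMixed in the documented ARGS order."""
    # Sanity check added by this sketch; splitMixed itself may behave differently.
    if normal_min_len > normal_max_len or mixed_min_len > mixed_max_len:
        raise ValueError("minimal fragment length exceeds maximal fragment length")
    return [
        "java", "-jar", jar,
        normal_clusters,
        str(normal_min_len),
        str(normal_max_len),
        str(min_cluster_size),
        normal_mappings,
        str(mixed_min_len),
        str(mixed_max_len),
        mixed_mappings,
        output_name,
    ]
```

 With placeholder values, build_split_mixed_command("normal.clusters", 100, 300, 4, "normal.map", 120, 320, "mixed.map", "out") yields a list that can be passed to subprocess.run(cmd, check=True).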

File formats:

 (*) cluster file:
 One cluster per line, given as a list of the mate pairs it contains; the end of the cluster is marked by END.
 Each mate pair is represented by the following elements, separated by whitespace:
  ID (arbitrary)
  chromosome left mate maps to (ignored)
  F (for forward) (ignored)
  position of right end of left mate
  chromosome right mate maps to (ignored)
  R (for reverse) (ignored)
  position of left end of right mate
  quality value

 Example:
  SOLEXA5_31:4:17:63:1196 chr21 F 286282 chr21 R 286482 36 30.0 SOLEXA5_51:5:5:1346:1531 chr21 F 286342 chr21 R 286561 36 64.0 END
  SOLEXA5_31:4:17:63:1617 chr21 F 634287 chr21 R 634515 36 41.0 SOLEXA5_51:6:93:471:1971 chr21 F 634283 chr21 R 634517 36 43.0 END

  To transform the output of GASV to this format, use the converter "convert.jar" as follows:
  java -jar convert.jar mappingsFile(see **) GASVclusterFile > newClusterFile
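
  The cluster format described above can be read with a few lines of Python. This is a minimal sketch, not the parser used by splitMixed. It assumes nine whitespace-separated tokens per mate pair, as in the example lines; the element list names eight, so everything after the seventh token is kept as uninterpreted text rather than being labelled as the quality value. The names MatePair and parse_cluster_line are hypothetical.

```python
from typing import List, NamedTuple


class MatePair(NamedTuple):
    pair_id: str
    left_pos: int        # position of right end of left mate
    right_pos: int       # position of left end of right mate
    trailing: List[str]  # tokens after the seventh one, left uninterpreted


def parse_cluster_line(line: str, tokens_per_pair: int = 9) -> List[MatePair]:
    """Split one cluster line into its mate pairs.

    tokens_per_pair=9 is an assumption taken from the example lines.
    """
    tokens = line.split()
    if not tokens or tokens[-1] != "END":
        raise ValueError("cluster line must end with END")
    body = tokens[:-1]
    if len(body) % tokens_per_pair != 0:
        raise ValueError("token count does not match tokens_per_pair")
    pairs = []
    for start in range(0, len(body), tokens_per_pair):
        chunk = body[start:start + tokens_per_pair]
        # chunk[1], chunk[2], chunk[4], chunk[5] are the ignored
        # chromosome and F/R elements.
        pairs.append(MatePair(chunk[0], int(chunk[3]), int(chunk[6]), chunk[7:]))
    return pairs
```

  The length of the returned list is the cluster size, which is the quantity compared against the minimal cluster size argument.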


 (**) mapping file:
 output format of MAQ mapview


 (***) output:
  In the ".clusters" file, there is one line per cluster with the following columns:

  ID - one ID per deletion; the normal and refined entries of a deletion share the same ID
  TYPE - normal, refined or somatic
  HET - hom, het or ---; hom means there are indications that the deletion is homozygous (preliminary and ignored)
  LL - left end of the leftmost left mate of a mate pair in the cluster
  L - right end of the rightmost left mate of a mate pair
  R - left end of the leftmost right mate
  RR - right end of the rightmost right mate
  (the breakpoint region is (L...R))
  RANMIN - minimal deletion length
  RANMAX - maximal deletion length
  CSIZE - cluster size = number of mappings in cluster
  MINQUAL - minimum of the quality of the mappings
  AVGQUAL - average of the quality of the mappings
  (the remaining columns will be empty)
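
  The column list above maps naturally onto a small record parser. The following is a minimal sketch under several assumptions that the description does not confirm: columns are separated by whitespace, the position, length and size columns are integers, and the two quality columns are decimal numbers. The function name and the sample line in the usage note are hypothetical.

```python
from typing import Dict, Union

COLUMNS = ["ID", "TYPE", "HET", "LL", "L", "R", "RR",
           "RANMIN", "RANMAX", "CSIZE", "MINQUAL", "AVGQUAL"]
INT_COLUMNS = {"LL", "L", "R", "RR", "RANMIN", "RANMAX", "CSIZE"}  # assumed integer
FLOAT_COLUMNS = {"MINQUAL", "AVGQUAL"}                             # assumed decimal


def parse_clusters_record(line: str) -> Dict[str, Union[str, int, float]]:
    """Map the first twelve columns of a ".clusters" line to their names."""
    fields = line.split()  # assumption: whitespace-separated columns
    if len(fields) < len(COLUMNS):
        raise ValueError("expected at least %d columns" % len(COLUMNS))
    record: Dict[str, Union[str, int, float]] = {}
    for name, value in zip(COLUMNS, fields):
        if name in INT_COLUMNS:
            record[name] = int(value)
        elif name in FLOAT_COLUMNS:
            record[name] = float(value)
        else:
            record[name] = value
    return record
```

  For a made-up line such as "1 normal --- 286200 286342 286482 286600 140 400 2 30.0 47.0", the breakpoint region is then (record["L"], record["R"]).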

  In the ".clusters.mappings" file, the corresponding mappings for each cluster are listed. Because the normal and refined entries of a deletion share the same ID, pay attention to the order in the file: the normal cluster comes first, the refined one second. For each mapping, the positions of the reads and the quality are also given.

