MIGEC log structure
Below is the description of log files produced by various MIGEC routines.
Checkout
De-multiplexing, barcode extraction and overlapping:
INPUT_FILE_1 - first input file containing R1 reads
INPUT_FILE_2 - second input file containing R2 reads
SAMPLE - sample name
MASTER - number of reads where the primary (master) barcode was detected
SLAVE - number of reads where the secondary (slave) barcode was detected
MASTER+SLAVE - number of reads where both barcodes were detected
OVERLAPPED - number of successfully overlapped reads
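
As a rough illustration of how these counters relate, the sketch below reads a Checkout log and reports, per sample, the share of master-barcode reads in which the slave barcode was also detected. It assumes the log is a tab-separated table whose header row uses the column names listed above (a leading `#` on the header is tolerated). The example row is invented, and `both_barcode_rate` is a hypothetical helper, not part of MIGEC.

```python
import csv
import io

# Invented example content; the tab-separated layout is an assumption.
EXAMPLE_LOG = (
    "#INPUT_FILE_1\tINPUT_FILE_2\tSAMPLE\tMASTER\tSLAVE\tMASTER+SLAVE\tOVERLAPPED\n"
    "R1.fastq.gz\tR2.fastq.gz\tS1\t1000\t900\t800\t600\n"
)

def both_barcode_rate(log_text):
    """Return {sample: MASTER+SLAVE / MASTER} for each row of the log."""
    # Drop a leading '#' so the header cells match the documented names.
    reader = csv.DictReader(io.StringIO(log_text.lstrip("#")), delimiter="\t")
    rates = {}
    for row in reader:
        master = int(row["MASTER"])
        both = int(row["MASTER+SLAVE"])
        rates[row["SAMPLE"]] = both / master if master else 0.0
    return rates

print(both_barcode_rate(EXAMPLE_LOG))
```

A low ratio would suggest that many reads carry the master barcode but lack a detectable slave barcode. What counts as "low" depends on the library design and is not something this sketch can decide.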
Histogram
The routine produces a number of histograms for UMI coverage, i.e. statistics of the number of reads tagged with a given UMI:
overseq.txt - contains sample id and sample type (single/paired/overlapped) in the header, followed by UMI coverage (MIG size). Each row has total read counts for UMIs corresponding to a given UMI coverage
overseq-units.txt - same as overseq.txt, but lists numbers of unique UMIs, not total read counts
estimates.txt - contains sample id, sample type, total number of reads (TOTAL_READS) and UMIs (TOTAL_MIGS) in the sample, and the selected thresholds:
    OVERSEQ_THRESHOLD - UMI coverage threshold
    COLLISION_THRESHOLD - if greater than or equal to OVERSEQ_THRESHOLD, the routine will search for UMIs that differ by a single mismatch and have a large count difference, and treat them as the same UMI
    UMI_QUAL_THRESHOLD - threshold for minimum UMI sequence quality
    UMI_LEN - UMI length
collision1.txt - same as overseq.txt, but lists only UMIs that are likely to be erroneous (i.e. have a 1-mismatch UMI neighbour with a substantially higher count)
collision1-units.txt - same as collision1.txt, but lists numbers of unique UMIs, not total read counts
pwm.txt and pwm-units.txt - a position weight matrix (PWM) representation of all UMI sequences
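
The difference between the read-count and unique-UMI histograms, and the idea of a 1-mismatch "collision", can be sketched on toy data. This is only an illustration of the concepts described above: it does not reproduce MIGEC's binning or file layout, and the `min_ratio` cut-off for a "substantially higher count" is an assumption chosen for the example.

```python
from collections import Counter

def coverage_histograms(umi_counts):
    """Return (reads_by_size, umis_by_size), keyed by UMI coverage (MIG size)."""
    reads_by_size = Counter()
    umis_by_size = Counter()
    for count in umi_counts.values():
        reads_by_size[count] += count  # overseq.txt-style: total reads
        umis_by_size[count] += 1       # overseq-units.txt-style: unique UMIs
    return dict(reads_by_size), dict(umis_by_size)

def one_mismatch_neighbours(umi):
    """Yield every sequence that differs from `umi` at exactly one position."""
    for i, base in enumerate(umi):
        for alt in "ACGT":
            if alt != base:
                yield umi[:i] + alt + umi[i + 1:]

def likely_erroneous(umi_counts, min_ratio=10):
    """UMIs with a 1-mismatch neighbour that has >= min_ratio times more reads.

    `min_ratio` is an illustrative assumption, not MIGEC's actual criterion.
    """
    flagged = set()
    for umi, count in umi_counts.items():
        for neighbour in one_mismatch_neighbours(umi):
            if umi_counts.get(neighbour, 0) >= min_ratio * count:
                flagged.add(umi)
                break
    return flagged

# Toy data: AAAT is a low-count 1-mismatch neighbour of AAAA.
counts = {"AAAA": 50, "AAAT": 2, "CCGG": 2, "GGTT": 8}
reads_hist, umis_hist = coverage_histograms(counts)
print(reads_hist)
print(umis_hist)
print(likely_erroneous(counts))
```

In this toy input two UMIs have coverage 2, so the unique-UMI histogram records 2 at that coverage while the read-count histogram records 4. Only `AAAT` is flagged, because `CCGG` has the same low count but no high-count neighbour.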
Assemble
Statistics of MIG (group of reads tagged with the same UMI) consensus sequence assembly. Note that the log also contains a summary of the pre-filtering steps, e.g. UMIs with low coverage are filtered at this stage:
SAMPLE_ID - sample name
SAMPLE_TYPE - sample type (single/paired/overlapped)
INPUT_FASTQ1 - first input file containing R1 reads
INPUT_FASTQ2 - second input file containing R2 reads
OUTPUT_ASSEMBLY1 - first output file containing R1 consensuses
OUTPUT_ASSEMBLY2 - second output file containing R2 consensuses
MIG_COUNT_THRESHOLD - UMI coverage threshold used in the assemble procedure
MIGS_GOOD_FASTQ1 - number of successfully assembled consensuses from R1
MIGS_GOOD_FASTQ2 - same for R2
MIGS_GOOD_TOTAL - number of successfully assembled consensuses that have both R1 and R2 parts
MIGS_TOTAL - total number of input UMIs prior to coverage filtering
READS_GOOD_FASTQ1 - number of reads in successfully assembled consensuses from R1
READS_GOOD_FASTQ2 - same for R2
READS_GOOD_TOTAL - number of paired reads in successfully assembled consensuses that have both R1 and R2 parts. If a given assembled consensus contains an unequal number of reads in R1 and R2, the average of the two is added to this statistic
READS_TOTAL - total number of input reads prior to coverage filtering
READS_DROPPED_WITHIN_MIG_1 - number of reads dropped during consensus assembly because they had a high number of mismatches to the consensus in R1
READS_DROPPED_WITHIN_MIG_2 - same for R2
MIGS_DROPPED_OVERSEQ_1 - number of UMIs dropped due to insufficient coverage in R1
MIGS_DROPPED_OVERSEQ_2 - same for R2
READS_DROPPED_OVERSEQ_1 - number of reads in UMIs dropped due to insufficient coverage in R1
READS_DROPPED_OVERSEQ_2 - same for R2
MIGS_DROPPED_COLLISION_1 - number of UMIs dropped due to being an erroneous (1-mismatch) variant of some UMI with a higher count in R1
MIGS_DROPPED_COLLISION_2 - same for R2
READS_DROPPED_COLLISION_1 - number of reads in UMIs dropped due to being an erroneous (1-mismatch) variant of some UMI with a higher count in R1
READS_DROPPED_COLLISION_2 - same for R2
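
The averaging rule for READS_GOOD_TOTAL and a simple yield calculation can be sketched as follows. Both helpers are hypothetical and the numbers are invented. Whether MIGEC rounds the average is not stated above, so the sketch keeps it as a floating-point value, and it assumes non-zero totals.

```python
def paired_read_support(consensus_read_counts):
    """Sum read support over consensuses that have both R1 and R2 parts.

    Each item is (reads_in_R1, reads_in_R2); when the two differ, their
    average is added, as described for READS_GOOD_TOTAL.
    """
    return sum((r1 + r2) / 2 for r1, r2 in consensus_read_counts)

def assembly_yield(row):
    """Fractions of input UMIs and reads that ended up in paired consensuses."""
    return (
        int(row["MIGS_GOOD_TOTAL"]) / int(row["MIGS_TOTAL"]),
        float(row["READS_GOOD_TOTAL"]) / int(row["READS_TOTAL"]),
    )

# Invented numbers for illustration only.
print(paired_read_support([(10, 10), (7, 5), (4, 4)]))

example_row = {
    "MIGS_GOOD_TOTAL": "300",
    "MIGS_TOTAL": "1200",
    "READS_GOOD_TOTAL": "9000",
    "READS_TOTAL": "12000",
}
print(assembly_yield(example_row))
```

In the invented row only a quarter of the UMIs survive while three quarters of the reads do, which is the pattern to expect when most discarded UMIs have low coverage.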
CdrBlast
Statistics of V(D)J mapping with the BLAST algorithm:
SAMPLE_ID - sample name
DATA_TYPE - raw reads (raw) or assembled consensuses (asm)
OUTPUT_FILE - output file name
INPUT_FILES - list of input files
EVENTS_GOOD - number of MIGs (groups of reads tagged with the same UMI; equal to the number of reads for raw data) that were V(D)J mapped and passed the quality threshold
EVENTS_MAPPED - number of MIGs that were V(D)J mapped
EVENTS_TOTAL - number of input MIGs
READS_GOOD - number of reads that were V(D)J mapped and passed the quality threshold
READS_MAPPED - number of reads that were V(D)J mapped
READS_TOTAL - number of input reads
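
The definitions above imply two consistency properties that a log row should satisfy: GOOD cannot exceed MAPPED, which cannot exceed TOTAL, and for raw data the MIG counters should equal the read counters. A minimal sketch of such a check is shown below. `check_cdrblast_row` is a hypothetical helper, the rows are invented, and the row is assumed to be available as a dictionary keyed by the column names above.

```python
def check_cdrblast_row(row):
    """Return a list of human-readable inconsistencies found in one log row."""
    problems = []
    for level in ("EVENTS", "READS"):
        good = int(row[level + "_GOOD"])
        mapped = int(row[level + "_MAPPED"])
        total = int(row[level + "_TOTAL"])
        if not good <= mapped <= total:
            problems.append(level + ": expected GOOD <= MAPPED <= TOTAL")
    if row["DATA_TYPE"] == "raw":
        # For raw data every read is its own event, so the counters should match.
        for suffix in ("_GOOD", "_MAPPED", "_TOTAL"):
            if int(row["EVENTS" + suffix]) != int(row["READS" + suffix]):
                problems.append("raw data: EVENTS%s != READS%s" % (suffix, suffix))
    return problems

# Invented rows for illustration only.
asm_row = {
    "DATA_TYPE": "asm",
    "EVENTS_GOOD": "90", "EVENTS_MAPPED": "95", "EVENTS_TOTAL": "100",
    "READS_GOOD": "900", "READS_MAPPED": "950", "READS_TOTAL": "1000",
}
raw_row = {
    "DATA_TYPE": "raw",
    "EVENTS_GOOD": "90", "EVENTS_MAPPED": "95", "EVENTS_TOTAL": "100",
    "READS_GOOD": "90", "READS_MAPPED": "95", "READS_TOTAL": "101",
}
print(check_cdrblast_row(asm_row))
print(check_cdrblast_row(raw_row))
```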
FilterCdrBlastResults
Statistics of the second round of TCR/Ig clonotype filtering that considers the number of supporting reads before and after consensus assembly:
SAMPLE_ID - sample name
OUTPUT_FILE - output file name
INPUT_RAW - input file containing CdrBlast results for raw reads
INPUT_ASM - input file containing CdrBlast results for assembled consensuses
CLONOTYPES_FILTERED - number of clonotypes (unique TCR/Ig V + CDR3 nucleotide + J combinations) that were filtered
CLONOTYPES_TOTAL - number of input clonotypes
EVENTS_FILTERED - number of MIGs in filtered clonotypes
EVENTS_TOTAL - number of input MIGs
READS_FILTERED - number of reads in filtered clonotypes
READS_TOTAL - number of input reads
NON_FUNCTIONAL_CLONOTYPES - number of non-functional clonotypes that contain a stop codon or frameshift in CDR3
NON_FUNCTIONAL_EVENTS - number of MIGs in non-functional clonotypes
NON_FUNCTIONAL_READS - number of reads in non-functional clonotypes