.. _describingFigure:

Describing the figure
==================================

The figure is characterized by a config file in JSON format, which is made up of 5 sections: :ref:`General`, :ref:`Output`, :ref:`Regions`, :ref:`Highlights` and :ref:`Tracks`.

General
-------

* ``layout``:

  * ``horizontal``: Draw all regions one after the other horizontally.
  * ``circular``: Draw all regions one after the other in a circle.
  * ``symmetrical``: Draw the regions in two rows, such that the tracks are symmetric: the bottom row has the tracks in the normal order, but the top row has its tracks in the reverse order. This is mainly intended to show copy numbers and breakpoints, with sv as the topmost track.
  * ``stacked``: Draw each region horizontally, but stack the regions vertically instead of placing them next to each other.

* ``reference``: Reference genome used. Files for hg19, hg38 and mm10 are provided. You can also choose a custom reference genome, but then you will need to provide the required files:

  * ``genes_file``: only required if you use a "genes" track, or if you want to highlight genes in a "copynumber" track. Should be in NCBI RefSeq format, which you can download for any reference genome from the `UCSC table browser `_ (Group: Genes and Gene Predictions, track: NCBI RefSeq, table: RefSeq All, output field separator: tsv, file type returned: gzip compressed). See `example here `_. Alternatively, gtf and gff3 files are also supported (but a bit slower).
  * ``cytobands_file``: only required if you use a "chr_axis" track with the "ideogram" style, or if you do not specify the end of a region (in which case the region is assumed to go until the end of the chromosome), or if you want to use the "add all chromosomes" functionality.
    The file can be downloaded for any genome from the `UCSC table browser `_ (Group: Mapping and Sequencing, track: Chromosome Band (Ideogram), table: cytoBandIdeo, output field separator: tsv, file type returned: plain text). See `example here `_. Alternatively, you can provide a fasta index (.fai) in order to provide the chromosome lengths, but this will not be sufficient to plot ideograms.

Output
-------

* ``file``: Path to the output file. The format will be inferred from the file extension.
* ``dpi``: Higher values will result in higher resolution but larger file size. Even if the figure is saved as vector graphics, this parameter still matters for the hic and alignments tracks because they are rasterized.
* ``width``: Width of the figure in mm. The default is 180mm, which is a standard full-page figure.

Regions
-------

Regions are defined by chr, start and end. If end`_). Bedmethyl files must be bgzip-compressed and indexed with tabix, which can be done with ``bgzip sample.bedmethyl`` and ``tabix -p bed sample.bedmethyl.gz``.

* bedgraph file: tab-separated file with no header and four columns: chr pos pos+1 basemodPercentage
* 3-column tsv file: tab-separated file with no header and three columns: chr pos basemodPercentage

.. image:: images/figure_basemod.png

Parameters:

* ``style``: "lines" (default) will link all data points with a line. "dots" will show one dot per data point, which may be better for sparse data.
* ``smooth``: only applicable if style is "lines". If the value is 0, will simply show the raw base modification frequency at each position, which might result in ragged lines. If the value is x>0, the base modification frequency will be averaged over the next x and previous x positions (but only if they are within 100bp of the original position). Default: 4.
* ``gap_frac``: only applicable if style is "lines". If two positions are separated by more than this value multiplied by the length of the region, the line will be split.
  This is to avoid long straight lines in places where there is no data. If you set this value to 1, there will always be a continuous line. Default: 0.1.
* ``ymin``: minimum value for the y axis (default: 0).
* ``ymax``: maximum value for the y axis (default: 1).
* ``bams``: list of dictionaries with the following keys:

  * ``file``: path to a bam file with MM/ML tags.
  * ``base``: base for the base modification, e.g. C for cytosine.
  * ``mod``: modification, e.g. m for methylation or h for hydroxymethylation.
  * ``min_coverage``: minimum coverage at a position for the methylation frequency to be reported.
  * ``linewidth``: Width of the line showing the basemod frequency.
  * ``opacity``: Opacity of the line showing the basemod frequency.
  * ``fix_hardclip``: see fix_hardclip_basemod for alignments.
  * ``split_by_haplotype``: Whether or not to split by haplotype.
  * ``colors``: list of one (if split_by_haplotype is False) or two (if split_by_haplotype is True) colors for the lines showing the basemod frequency.

* ``bedmethyls``: list of dictionaries with the following keys:

  * ``file``: path to a bedmethyl file, a bedgraph file, or a 3-column tsv file (see above for details about these formats).
  * ``mod``: modification, e.g. m for methylation or h for hydroxymethylation.
  * ``min_coverage``: minimum coverage at a position for the methylation frequency to be reported.
  * ``linewidth``: Width of the line showing the basemod frequency.
  * ``opacity``: Opacity of the line showing the basemod frequency.
  * ``color``: Color of the line showing the basemod frequency.

sv
^^^^^^^^

Track with arcs showing structural variants. If only one of the two breakends is within a displayed region, then only a line will be drawn, with a label indicating the chromosome of the other breakend.

Parameters:

* ``file``: file containing the SV information. Can be a vcf, a bedpe file, or a tsv file with at least the four columns: "chr1", "pos1", "chr2" and "pos2".
  For a tsv file, you can also provide the "strand1" and "strand2" columns in order to color SVs according to SV type (see below), or directly a "color" column; otherwise all SVs will be black.
* ``lw``: line width for the arcs showing the SVs.
* ``color_del``, ``color_dup``, ``color_T2T``, ``color_H2H``, ``color_trans``: color of the arc representing the SV, depending on the SV type (respectively: deletion, duplication, tail-to-tail inversion, head-to-head inversion, translocation). If the input file contains neither colors nor orientation, then all arcs will be colored as ``color_del``.
* ``min_sv_height``: when a circular layout is used, minimum height of the SVs, between 0 and 1 (default: 0.1). Increasing this value gives short SVs (where both breakends are close) a larger height.

copynumber
^^^^^^^^^^

Track showing copy numbers, for WGS data. There is no standard format to represent this data, so currently figeno accepts as input different formats produced by CNA callers:

* `Control-FREEC `_, in which case you need to provide a freec_ratios file, which contains the copy number ratios in each bin, and optionally the called CNAs in freec_CNAs.
* `purple `_, in which case you need to provide a purple_cn file, which contains the segmented copy numbers.
* `delly `_, in which case you need to provide a delly_cn file, which contains the copy number in each bin, and optionally the called CNAs in delly_CNAs. The called CNAs will be used to color the dots based on the called copy number; otherwise the dots will be colored based on their own copy number (which might be noisy).

You can of course call CNAs with a different program, but you will then need to convert the files to one of the accepted formats (or you can suggest the addition of a new supported format).

Parameters:

* ``input_type``: "freec", "purple" or "delly".
* ``freec_ratios``: tsv file containing at least the three columns: "Chromosome", "Start", "Ratio" (other columns will be ignored).
  Chromosome and start indicate the genomic position of the bin, and the copy number is the ratio multiplied by the ploidy. Rows with a ratio <0 will be ignored. See `example file `_.
* ``freec_CNAs``: tsv file containing 5 columns without header. Each row indicates a copy number variant, where the columns indicate: chromosome, start, end, copy number, and the type of CNV ("gain" or "loss"). The called CNAs will be used to color the dots based on the called copy number; otherwise the dots will be colored based on their own copy number (which might be noisy). See `example file `_.
* ``purple_cn``: tsv file with the following columns: "chromosome", "start", "end", "copyNumber". The columns "bafCount" and "baf" are optional; if they are provided, CNLOH will be shown. See `example file `_.
* ``delly_cn``: tsv file (can be gzip-compressed) containing 6 columns: chr, start, end, binsize, counts, copy_number. The column names in the header will be ignored, but the order of the columns is important. Only chr, start and copy_number will actually be used, so you can put placeholders for the other columns. See `example file `_.
* ``delly_CNAs``: tsv file (without header) containing 5 columns: chr, start, end, id, copy_number (the column id is optional). Such a file can be created from the bcf file generated by delly with: ``bcftools query -f "%CHROM\t%POS\t%INFO/END\t%ID[\t%RDCN]\n" out.bcf > seg.bed``. The called CNAs will be used to color the dots based on the called copy number; otherwise the dots will be colored based on their own copy number (which might be noisy). See `example file `_.
* ``genes``: comma-separated list of genes to highlight.
* ``ploidy``: ploidy for the sample (default: 2), only used if freec_ratios is used.
* ``min_cn``, ``max_cn``: minimum and maximum copy number to display. If not provided, these values are set automatically to fit all copy numbers in the displayed regions.
* ``marker_size``: size of the markers, if freec_ratios is provided (default: 0.7).
* ``color_normal``, ``color_loss``, ``color_gain``, ``color_cnloh``: colors for the dots or segments depending on CNV status. CNLOH (copy neutral loss of heterozygosity) is only used if purple_cn is provided.
* ``grid``: if True, display horizontal and vertical grid lines (see below), depending on which of the other options are set.
* ``grid_major``, ``grid_minor``: whether or not to display vertical lines for major and minor ticks, respectively.
* ``grid_cn``: whether or not to display horizontal lines for each integer copy number.

ase
^^^^^^^^

Allele-specific expression: track showing the variant allele frequency of SNPs in RNA-seq and DNA-seq data. This can be useful to show when only one allele is expressed, for example for imprinted genes or genes activated by enhancer hijacking. The input data required for this track is a tsv file which can be generated by `fast_ase `_ and which has the following columns: contig, position, variantID, refAllele, altAllele, refCount, altCount, refCount_DNA, and altCount_DNA (where refCount and altCount are the counts of reads supporting the reference and alternative alleles in the RNA-seq data).

.. image:: images/figure_ase.png
   :width: 500

Parameters:

* ``file``: Path to the tsv file generated by `fast_ase `_.
* ``vcf_DNA``: Path to a vcf file containing the read counts in the DNA-seq data. This is only required if, instead of using fast_ase, you used GATK HaplotypeCaller and GATK ASEReadCounter for preprocessing the bam files.
* ``color1``, ``color2``: Colors for the minor and major allele frequencies.
* ``min_depth``: Minimum number of reads covering a variant in the RNA-seq data to show it.
* ``max_bar_width``: Maximum width of the bars, in mm. If too many variants are shown, the actual width might be narrower, to avoid overlaps.
* ``lw``: Line width for the line connecting each bar to the position of the corresponding SNP.
* ``only_exonic``: If true, will only show variants located in exons (default: false).
* ``grid``: if true, will display horizontal lines at VAFs of 0.25, 0.50 and 0.75 (default: false).
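
As an illustration, the parameters above could be combined into a single ``ase`` entry of the tracks section roughly as follows. This is only a sketch: the ``"type"`` key used to select the track, the file path, the colors and the numeric values are assumptions chosen for the example, not documented defaults (only the ``false`` values for ``only_exonic`` and ``grid`` are the defaults stated above).

.. code-block:: json

   {
     "type": "ase",
     "file": "/path/to/sample_ase.tsv",
     "color1": "#e55039",
     "color2": "#4a69bd",
     "min_depth": 6,
     "max_bar_width": 10.0,
     "lw": 0.1,
     "only_exonic": false,
     "grid": false
   }

``vcf_DNA`` is left out here because it is only needed when the read counts come from GATK rather than from fast_ase.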