Visualization

We can use STAR to generate the RNA-Seq "signal" files using the coordinate-sorted BAM file (Aligned.sortedByCoord.out.bam). It is calculated as the number of reads mapped to each genomic position. The output file can be visualized in genome browsers such as the UCSC genomic browser or IGV browser.

Execute

cd ~/LSLNGS2015/

STAR --runMode inputAlignmentsFromBAM --runThreadN 6 \ 
--inputBAMfile RNASEQ_data/star_GM12878_rep1/Aligned.sortedByCoord.out.bam --outWigType bedGraph \ 
--outWigStrand Stranded --outWigNorm RPM --outFileNamePrefix RNASEQ_data/GM1287_rep1_

STAR --runMode inputAlignmentsFromBAM --runThreadN 6 \ 
--inputBAMfile RNASEQ_data/star_GM12878_rep2/Aligned.sortedByCoord.out.bam --outWigType bedGraph \ 
--outWigStrand Stranded --outWigNorm RPM --outFileNamePrefix RNASEQ_data/GM1287_rep2_

Resource usage

Sample ALPS Queue Name CPU Time Max Memory Duration
GM12878 16G 2223.97 sec. 7 GB 37 minutes 8 seconds
K562 16G 2703.06 sec. 7 Gb 45 minutes 8 seconds

Options

--runThreadN number of threads to run STAR

--inputBAMfile path to coordinate-sorted BAM file

--outWigType type of signal output (bedGraph or wiggle)

--outWigStrand strandedness of wiggle/bedGraph output (Stranded or Unstranded)

--outWigNorm type of normalization for the signal (RPM or none)

--outFileNamePrefix output files name prefix

Take a look at the generated signal files

ls -la ~/LSLNGS2015/RNASEQ_data/*_Signal.*.bg

-rw------- 1 u00ycl06 u00ycm02 813667524 2015-11-16 00:03 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep1_Signal.UniqueMultiple.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 848273078 2015-11-16 00:03 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep1_Signal.UniqueMultiple.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 778409908 2015-11-16 00:03 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep1_Signal.Unique.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 817934615 2015-11-16 00:03 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep1_Signal.Unique.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 787034556 2015-11-16 00:20 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep2_Signal.UniqueMultiple.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 820179772 2015-11-16 00:20 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep2_Signal.UniqueMultiple.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 753637183 2015-11-16 00:20 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep2_Signal.Unique.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 791210099 2015-11-16 00:20 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep2_Signal.Unique.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 315665648 2015-11-16 00:04 /work3/LSLNGS2015/RNASEQ_data/K562_rep1_Signal.UniqueMultiple.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 319934522 2015-11-16 00:04 /work3/LSLNGS2015/RNASEQ_data/K562_rep1_Signal.UniqueMultiple.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 295364571 2015-11-16 00:04 /work3/LSLNGS2015/RNASEQ_data/K562_rep1_Signal.Unique.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 300931098 2015-11-16 00:04 /work3/LSLNGS2015/RNASEQ_data/K562_rep1_Signal.Unique.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 444649222 2015-11-16 00:28 /work3/LSLNGS2015/RNASEQ_data/K562_rep2_Signal.UniqueMultiple.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 450522698 2015-11-16 00:28 /work3/LSLNGS2015/RNASEQ_data/K562_rep2_Signal.UniqueMultiple.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 415331427 2015-11-16 00:28 /work3/LSLNGS2015/RNASEQ_data/K562_rep2_Signal.Unique.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 423833653 2015-11-16 00:28 /work3/LSLNGS2015/RNASEQ_data/K562_rep2_Signal.Unique.str2.out.bg

How to visualize on genome browsers

You can load bedGraph into IGV or convert the bedGraph to bigWig using the bedGraphToBigWig and load bigWig to IGV or UCSC Genome Browser.

bedGraph to bigWig

Create the chrom.sizes file from FASTA sequence

samtools faidx Homo_sapiens.GRCh38.dna.primary_assembly.fa

bedGraphToBigWig Usage

bedGraphToBigWig in.bedGraph chrom.sizes out.bw

Unfortunately, the bedGraph might not be sorted properly. If you encounter error when executing bedGraphToBigWig, such as:

GM1287_rep1_Signal.Unique.str1.out.bg is not case-sensitive sorted at line 28093639.
Please use "sort -k1,1 -k2,2n" with LC_COLLATE=C, or bedSort and try again.

You can use sort to sort the file, such as:

LC_COLLATE=C sort -k1,1 -k2,2n -o GM1287_rep1_Signal.Unique.str1.out.bg GM1287_rep1_Signal.Unique.str1.out.bg

Then re-run the conversion

bedGraphToBigWig GM1287_rep1_Signal.Unique.str1.out.bg \ 
../GENOME_data/Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai \ 
GM1287_rep1_Signal.Unique.str1.out.bw

View in UCSC Genome Browser (Requires access to webserver) a bug regarding chr name

Firts, you need to upload/place the bigWig files at a location accessible online, i.e. in a web server, for UCSC Genome Browser to use and display it.

To view the bigWig as custom tracks, we create a text file (e.g. mytracks.txt) with following information.

track type=bigWig name=GM1287_rep1_str1 smoothingWindow=4 color=255,0,0 autoScale=on viewLimits=1:200 visibility=full windowingFunction=maximum bigDataUrl=https://domain_name_or_ip/folder_name/GM1287_rep1_Signal.UniqueMultiple.str1.out.bw

track type=bigWig name=GM1287_rep1_str2 smoothingWindow=4 color=0,0,255 autoScale=on viewLimits=1:200 visibility=full windowingFunction=maximum bigDataUrl=https://domain_name_or_ip/folder_name/GM1287_rep1_Signal.UniqueMultiple.str2.out.bw

Then, in your web browser, use the below URL to call your file:

https://genome.ucsc.edu/cgi-bin/hgTracks?db=GRCh38&position=chr6:122657583-122663796&hgct_customText=https://domain_name_or_ip/folder_name/mytracks.txt

results matching ""

    No results matching ""