NGS file formats
1. Suppose chromosome 2 is the following sequence: ACACGACTAA
• For the genomic region in chromosome 2 containing the DNA sequence ACGAC, if we describe it using BED format, what are the chromosome, start, and end?
• In SAM, what is the alignment position of the DNA sequence ACGAC?
• In BAM, what is the alignment position of the DNA sequence ACGAC?
Solution:
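• In BED format, the chromosome is chr2, the start is 2 and the end is 7. ACGAC occupies 1-based positions 3 to 7, and BED uses a 0-based start with an exclusive end.
• SAM uses 1-based coordinates, so the alignment position is 3.
• BAM uses 0-based coordinates, so the alignment position is 2.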
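The coordinate bookkeeping above can be checked with a small shell sketch. It assumes the sequence is given as a plain string and reports only the first exact occurrence of the pattern (awk's index() returns a 1-based position, or 0 when there is no match). The function name locate is hypothetical.

```shell
#!/bin/sh
# Report the coordinates of the first occurrence of PATTERN in SEQUENCE.
# Usage: locate CHROM SEQUENCE PATTERN   (hypothetical helper)
locate() {
  awk -v c="$1" -v s="$2" -v p="$3" 'BEGIN {
    pos1 = index(s, p)            # 1-based start of the first match (0 = absent)
    if (pos1 == 0) { print "not found"; exit 1 }
    start0 = pos1 - 1             # 0-based start
    end0 = start0 + length(p)     # exclusive 0-based end = inclusive 1-based end
    printf "BED: %s\t%d\t%d\n", c, start0, end0
    printf "SAM POS: %d\n", pos1
    printf "BAM pos: %d\n", start0
  }'
}

locate chr2 ACACGACTAA ACGAC
```

Running it prints the tab-separated BED fields chr2, 2 and 7, then SAM POS 3 and BAM pos 2. Note that the BED end and the 1-based end coincide, which is why only the start differs between the two conventions.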
2. Please perform the following conversions.
(a) Convert the following set of intervals from the 0-based coordinate format to the 1-based coordinate format: 3 100, 0 89 and 1000 2000.
(b) Convert the following set of intervals from the 1-based coordinate format to the 0-based coordinate format: 3..100, 1..89 and 1000..2000.
Solution:
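• (a) 4..100, 1..89 and 1001..2000 (add 1 to the start; the exclusive 0-based end equals the inclusive 1-based end).
• (b) 2 100, 0 89 and 999 2000 (subtract 1 from the start; the end is unchanged).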
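A minimal sketch of both conversions with awk, assuming one interval per line, written as "start end" in the 0-based form and "start..end" in the 1-based form, as in the exercise. The function names zero_to_one and one_to_zero are hypothetical.

```shell
#!/bin/sh
# 0-based half-open "start end" -> 1-based closed "start..end":
# only the start moves; the exclusive end equals the inclusive 1-based end.
zero_to_one() {
  awk '{ printf "%d..%d\n", $1 + 1, $2 }'
}

# 1-based closed "start..end" -> 0-based half-open "start end".
# The field separator is the literal two-character string "..".
one_to_zero() {
  awk -F '[.][.]' '{ printf "%d %d\n", $1 - 1, $2 }'
}

printf '3 100\n0 89\n1000 2000\n' | zero_to_one          # (a)
printf '3..100\n1..89\n1000..2000\n' | one_to_zero       # (b)
```

Only the start coordinate changes in either direction, which is why the ends 100, 89 and 2000 carry over unchanged.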
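3. Given a BAM file input.bam, we want to find all alignments with MAPQ > 0 using samtools. What should be the command?
Solution: samtools view -q 1 input.bam. The option -q 1 skips alignments whose MAPQ is smaller than 1, so only alignments with MAPQ > 0 are kept.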
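To make the meaning of -q 1 concrete, the sketch below applies the same MAPQ threshold to SAM text with awk; MAPQ is the fifth tab-separated field of a SAM record. The function name mapq_positive and the two toy records are hypothetical, and in practice samtools view -q 1 is the simpler choice.

```shell
#!/bin/sh
# Keep SAM records whose MAPQ (field 5) is greater than 0, i.e. at least 1,
# the same threshold as "samtools view -q 1". Header lines (@...) are dropped.
mapq_positive() {
  awk -F '\t' '!/^@/ && $5 > 0'
}

# Two toy records: r1 has MAPQ 0, r2 has MAPQ 60; only r2 is printed.
printf 'r1\t0\tchr2\t3\t0\t5M\t*\t0\t0\tACGAC\t*\nr2\t0\tchr2\t3\t60\t5M\t*\t0\t0\tACGAC\t*\n' | mapq_positive

# With samtools installed (not run here):
#   samtools view input.bam | mapq_positive
```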
Solution handbook for Algorithms for Next-Generation Sequencing
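4. Given two BED files input1.bed and input2.bed, we want to find all genomic regions in input1.bed that overlap with some genomic regions in input2.bed. What should be the command?
Solution: bedtools intersect -a input1.bed -b input2.bed -wa. With -wa, a region of input1.bed is reported once for every region of input2.bed it overlaps; use -u instead to report each overlapping region of input1.bed only once.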
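For intuition about what bedtools computes here, the following naive awk sketch reports each region of the first BED file that overlaps at least one region of the second, once per region (the behavior of -u rather than -wa). It relies on BED intervals being half-open, so [s1, e1) and [s2, e2) on the same chromosome overlap exactly when s1 < e2 and s2 < e1. It compares every pair of regions, so it only suits small files; overlapping_regions and the toy regions are hypothetical.

```shell
#!/bin/sh
# Usage: overlapping_regions A.bed B.bed   (hypothetical helper)
# Prints each line of A.bed that overlaps at least one region of B.bed, once.
overlapping_regions() {
  awk -F '\t' '
    FILENAME == ARGV[1] {             # first file read: the regions of B
      n++; chr[n] = $1; st[n] = $2 + 0; en[n] = $3 + 0
      next
    }
    {                                 # second file: the regions of A
      for (i = 1; i <= n; i++)
        if ($1 == chr[i] && $2 + 0 < en[i] && st[i] < $3 + 0) { print; break }
    }
  ' "$2" "$1"
}

a=$(mktemp); b=$(mktemp)
printf 'chr1\t0\t10\tA1\nchr1\t20\t30\tA2\nchr2\t5\t15\tA3\n' > "$a"
printf 'chr1\t9\t12\tB1\nchr1\t8\t25\tB2\nchr2\t15\t20\tB3\n' > "$b"
overlapping_regions "$a" "$b"    # prints A1 and A2 once each; A3 only touches B3
rm -f "$a" "$b"
```

A1 overlaps both B1 and B2 but is printed once, which is the difference between -u and -wa; A3 ends at 15 where B3 starts, so the half-open intervals do not overlap.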
5. For the following wiggle file, can you compute coverage(3, 8), mean(3, 8), minVal(3, 8), maxVal(3, 8) and stdev(3, 8)?
fixedStep chrom=chr1 start=1 step=1 span=1
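Solution: coverage(3, 8) = 1 (every position from 3 to 8 has a value), mean(3, 8) = (15 + 30 + 20 + 25 + 30 + 20)/6 = 140/6 ≈ 23.33, minVal(3, 8) = 15, maxVal(3, 8) = 30 and stdev(3, 8) = sqrt(((15 - m)^2 + (30 - m)^2 + (20 - m)^2 + (25 - m)^2 + (30 - m)^2 + (20 - m)^2)/6) ≈ 5.53, where m = mean(3, 8) = 70/3 (dividing by 5 instead of 6, i.e., the sample standard deviation, gives ≈ 6.06).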
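The statistics can be computed mechanically with the awk sketch below. It assumes a single fixedStep track, treats coverage as the fraction of positions in [s, e] that carry a value, and reports the population standard deviation (dividing by n); variableStep sections are not handled, and wig_stats is a hypothetical name. Because the data lines of the exercise's wiggle file are not reproduced here, the usage example feeds a hypothetical track that holds only the six values quoted in the solution, at positions 3 to 8.

```shell
#!/bin/sh
# Usage: wig_stats S E < track.wig   (hypothetical helper, fixedStep only)
wig_stats() {
  awk -v s="$1" -v e="$2" '
    NF == 0 || /^(track|#)/ { next }
    /^fixedStep/ {                      # parse key=value pairs of the header line
      split("", p)
      for (i = 2; i <= NF; i++) { split($i, kv, "="); p[kv[1]] = kv[2] }
      pos = p["start"] + 0
      step = ("step" in p) ? p["step"] + 0 : 1
      span = ("span" in p) ? p["span"] + 0 : 1
      next
    }
    {                                   # one value, covering [pos, pos + span - 1]
      v = $1 + 0
      for (x = pos; x < pos + span; x++)
        if (x >= s + 0 && x <= e + 0) {
          n++; sum += v; sq += v * v
          if (n == 1 || v < mn) mn = v
          if (n == 1 || v > mx) mx = v
        }
      pos += step
    }
    END {
      if (n == 0) { print "coverage=0.00"; exit }
      mean = sum / n
      var = sq / n - mean * mean        # population variance (divide by n)
      if (var < 0) var = 0              # guard against rounding error
      printf "coverage=%.2f mean=%.2f min=%g max=%g stdev=%.2f\n",
             n / (e - s + 1), mean, mn, mx, sqrt(var)
    }'
}

printf 'fixedStep chrom=chr1 start=3 step=1 span=1\n15\n30\n20\n25\n30\n20\n' | wig_stats 3 8
# coverage=1.00 mean=23.33 min=15 max=30 stdev=5.53
```

Querying a wider window on the same hypothetical track, such as wig_stats 3 10, lowers coverage to 0.75 because positions 9 and 10 carry no value.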
6. Can you propose a script to convert a BAM file into a bigWig file?
Solution: The input is a BAM file file.bam and the output is a bigWig file file.bw. The conversion requires a file containing the sizes of the chromosomes of the human genome, hg.chrom.sizes. It can be done in 3 steps: convert the alignments into BED intervals sorted by chromosome, compute the per-base coverage as a bedGraph file, and convert the bedGraph file into bigWig.
    bamToBed -i file.bam | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4}' | sort -k1,1 > file.bed
    genomeCoverageBed -i file.bed -g hg.chrom.sizes -bg > file.cov
    bedGraphToBigWig file.cov hg.chrom.sizes file.bw