Data-analysis-and-management-of-high-throughput-sequencing-data
Pages: 33
Uploaded: 2019-08-06
Data Analysis & Management of High-throughput Sequencing Data
Quoclinh Nguyen
Research Informatics
Genomics Core / Medical Research Institute

Current Issues

The "QSEQ" file
Number of files per run: 120 x 8 x (1|2) = 960|1920 files
Number of reads per run: ~40M x 8 x (1|2) = ~320M|640M
Total nucleotides (bp): 320M x 150 x (1|2) = ~48|96 billion

What am I going to do with my sequencing data?
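The per-run totals on the "QSEQ" slide can be reproduced with a few multiplications. The sketch below is only an illustration: the constant and function names are made up here, and reading 8 as the number of flow-cell lanes and (1|2) as one or two reads per fragment is an assumption, since the slide gives only the bare factors.

```python
# Per-run totals from the "QSEQ" slide. All names are illustrative.
FILES_PER_LANE = 120          # first factor in the slide's file count
LANES = 8                     # second factor; assumed to be flow-cell lanes
READS_PER_LANE = 40_000_000   # "~40M" on the slide
READ_LENGTH = 150             # factor used in the slide's nucleotide total


def run_totals(reads_per_fragment):
    """Return file, read, and base counts for one run.

    reads_per_fragment is the slide's (1|2) factor
    (assumed here to mean single vs. paired reads).
    """
    files = FILES_PER_LANE * LANES * reads_per_fragment
    reads = READS_PER_LANE * LANES * reads_per_fragment
    bases = reads * READ_LENGTH
    return {"files": files, "reads": reads, "bases": bases}


if __name__ == "__main__":
    for n in (1, 2):
        print(n, run_totals(n))
```

With these inputs the function gives 960 or 1920 files, 320M or 640M reads, and 48 or 96 billion bases, matching the slide.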
Overview Infrastructure
So…
[Figure: infrastructure diagram. Recoverable labels: Cufflinks; MACS; Archives; RTA; Very Fast NAS; RTA / drag-and-drop; Gigabit network; /rcgenomics (20 TBs); NAS/SAN (400 TBs); MySQL; Oracle; SSH, FTP; http://lims.csms.edu; VM Server (Linux), LAMP; SMB, HTTP; Analysis / Mining; LIMS (NGS/Microarray); Linux (VM & Physical)]

Overview Infrastructure
So… TBs
>300 nodes x 2 CPU = >600 CPUs
>300 nodes x 4G = >1200 Gs
Fast NAS storage capacity ~100 TBs
Other storage devices ~400 TBs

RNA-Seq Analysis Pipeline
Phase 1: NGS data processing
Phase 2: Basic Analysis
Phase 3: Downstream Analysis
[Figure: pipeline flowchart. Recoverable labels: input; Binary Data; Sample 1 … Sample N; genes.expr]
[Figure labels, continued: transcripts.expr; BCL files; BAM; BAM; hits.bam; transcripts.gtf; Raw Data; BAM -> FASTQ; Cuffcompare; QSEQ files; TopHat; Cuffdiff; Mapped Reads; SAM -> BAM; Mapped reads; UCSC; BAM file; QC by samtools; CLC bio; Cufflinks; clean BAM files; genes/express output files; deliver results]

Data Processing
Phase 1: NGS data processing
Flow: Binary Data (BCL files) -> Raw Data (QSEQ files) -> Mapped Reads (SAM -> BAM) -> QC by samtools -> clean BAM files

QC/Data filtering
• Bin and remove indexes
• Remove adapters (if any)
• Remove duplicates
• Remove non-mapped sequences
• Filter out reads mapping to ribosomal RNA
• Percentage of ribosomal?

QSEQ files -> Aligned files (BAM)
• Parallel processing (MPI)
• What is in the alignment files?

The "BAM/SAM" file
BAM (Binary Alignment Mapp…
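Three of the steps under "QC/Data filtering" (remove duplicates, remove non-mapped sequences, filter out reads mapping to ribosomal RNA) and the "Percentage of ribosomal?" question can be sketched on plain SAM text records. This is a minimal illustration, not the pipeline from the slides, which performs QC with samtools. The function name `filter_sam` and the `rrna_refs` set of ribosomal reference names are hypothetical, and the duplicate check assumes an upstream tool has already set the duplicate bit in the FLAG field.

```python
UNMAPPED = 0x4      # SAM FLAG bit: segment unmapped
DUPLICATE = 0x400   # SAM FLAG bit: PCR or optical duplicate


def filter_sam(lines, rrna_refs):
    """Drop unmapped, duplicate, and rRNA-mapped records from SAM text lines.

    lines: iterable of SAM lines (header lines start with '@').
    rrna_refs: hypothetical set of reference names that are ribosomal RNA.
    Returns (kept_lines, counts, rrna_percent_of_mapped).
    """
    kept = []
    counts = {"total": 0, "unmapped": 0, "duplicate": 0, "rrna": 0}
    for line in lines:
        if line.startswith("@"):
            kept.append(line)          # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])          # column 2: bitwise FLAG
        rname = fields[2]              # column 3: reference sequence name
        counts["total"] += 1
        if flag & UNMAPPED:
            counts["unmapped"] += 1
            continue
        is_dup = bool(flag & DUPLICATE)
        is_rrna = rname in rrna_refs
        counts["duplicate"] += is_dup
        counts["rrna"] += is_rrna
        if is_dup or is_rrna:
            continue
        kept.append(line)
    mapped = counts["total"] - counts["unmapped"]
    rrna_percent = 100.0 * counts["rrna"] / mapped if mapped else 0.0
    return kept, counts, rrna_percent
```

The ribosomal percentage is reported relative to mapped reads, which is one reasonable reading of the slide's question; other denominators (all reads, or non-duplicate reads) are equally possible.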