Binary Classifier for Computing Posterior Error Probabilities in MetaMorpheus - Shortreed et al. - 2021

pubs.acs.org/jpr
Technical Note

Binary Classifier for Computing Posterior Error Probabilities in MetaMorpheus

Michael R. Shortreed, Robert J. Millikin, Lei Liu, Zach Rolfs, Rachel M. Miller, Leah V. Schaffer, Brian L. Frey, and Lloyd M. Smith*

J. Proteome Res. 2021, 20, 1997−2004
https://dx.doi.org/10.1021/acs.jproteome.0c00838
Special Issue: Software Tools and Resources 2021
Received: October 23, 2020
Published: March 8, 2021
© 2021 American Chemical Society

ABSTRACT: MetaMorpheus is a free, open-source software program for the identification of peptides and proteoforms from data-dependent acquisition tandem MS experiments. There is inherent uncertainty in these assignments for several reasons, including the limited overlap between experimental and theoretical peaks, the m/z uncertainty, and noise peaks or peaks from coisolated peptides that produce false matches. False discovery rates provide only a set-wise approximation for incorrect spectrum matches. Here we implemented a binary decision tree calculation within MetaMorpheus to compute a posterior error probability, which provides a measure of uncertainty for each peptide-spectrum match. We demonstrate its utility for increasing identifications and resolving ambiguities in bottom-up, top-down, proteogenomic, and non-specific digestion searches.

KEYWORDS: MetaMorpheus, proteomics, DDA, binary decision tree, search engine, bottom-up, top-down, posterior error probability, proteogenomics, open source

■ INTRODUCTION

The Morpheus search algorithm [1] was originally created in 2013 to accommodate the increased prevalence of high-resolution tandem mass spectra (MS/MS) in proteomics. The algorithm took advantage of the specificity provided by high mass accuracy to assign charge states and remove non-monoisotopic peaks, but with minimal loss of sensitivity. The scoring algorithm considered only the number of matching products plus the fraction of spectrum intensity assigned to matching products. This modest program, remarkable in its simplicity, yielded excellent results.

Our group developed an interest in identifying a multiplicity of post-translational modifications (PTMs) within a single search on data acquired from unenriched samples. At that time, identifying PTMs was primarily performed on samples where the PTM of interest was enriched and the modification was set as variable within the search engine. This strategy has been used conventionally for many years, but the variable modification strategy fails to yield results with high confidence when the number of modified peptides represents a small fraction of the total. Our idea was to allow variable modifications strategically only at annotated positions within the proteome and nowhere else. This worked remarkably well, permitting the analysis of dozens of different PTM types within a single search and yielding identifications with high confidence. We joined forces with the Morpheus team and released an updated version with this new capability [2]. Subsequently, we extended this work to cover previously unannotated modifications through a two-pass search algorithm dubbed global post-translational modification discovery (G-PTM-D) [3].

These early successes spawned many new ideas and eventually the need to release our own software program, MetaMorpheus [4], to accommodate the growing functionality. MetaMorpheus now has capacity for mass calibration, label-free quantification [5], top-down searches [6], cross-link searches, the discovery of O-glycosylated peptides [7], and nonspecific searches [8]. One can also conduct a single search with multiple proteases [9], improving the protein inference over single-protease approaches. However, until recently, the scoring algorithm had evolved little, and the only statistical metric provided was a group-wise false discovery rate (FDR) reported in MetaMorpheus as a q value [10,11]. One important value that was greatly needed was an individual confidence measure for each peptide-spectrum match (PSM) or proteoform-spectrum match (PrSM), peptide or proteoform identification. This information is valuable as one begins the process of validating and interpreting proteomics results. Early approaches to this were reported by Keller [12] and then by Anderson [13]. To obtain individual confidence metrics for MetaMorpheus identifications, one could manually calculate a posterior error probability (PEP) [11] by determining the local FDR for each set of matches with the same MetaMorpheus score, or one could postprocess results using software created by other groups (e.g., Percolator [14−18] or PeptideProphet [12]).

Here we implement a binary decision tree (BDT) [19] in MetaMorpheus that computes the PEP for each spectrum match. The PEP of an individual PSM represents the probability that the identification is incorrect. The PEP is effectively an optimized scoring metric when arrived at using the BDT algorithm. This optimization allows greater discrimination between correct and incorrect matches. The essence of a BDT is to ask a series of true/false questions, one for each attribute considered, to assign each candidate to one of two groups. (See Figure 1.) In this work, the candidates are spectrum matches, and the attributes include the fraction of matched intensity, the longest uninterrupted sequence of matched fragment ions, the number of missed cleavages, and so on. (See Table 1.) The two groups are correct and incorrect matches. Each question can be asked only once along a path, and the order in which the questions are asked is automatically optimized for efficiency and accuracy. The BDT is trained on a subset (75%) of spectrum matches from a target/decoy search and then applied to and validated on the remaining 25% of spectrum matches. This process is repeated four times so that no spectrum match is scored using a training set in which it was included.

A major advantage of the BDT is that a new model can be quickly generated for each search. MetaMorpheus is a flexible search engine capable of analyzing many different data types using any number of proteases, fragmentation types/energies, instrument resolutions, and other parameters. Therefore, it needed an approach to computing spectrum match PEPs with comparable flexibility. BDTs can be rapidly trained using a wide variety of different attributes. Later, we describe our implementation of the BDT in MetaMorpheus and provide results for searches of bottom-up, top-down, nonspecific, and proteogenomic data.

Figure 1. Binary decision trees are created to classify subjects according to various attributes into one of two groups (e.g., true "T" or false "F"). There are three stages in the process: creation of the BDT, using training data; testing of the BDT performance, using test data; and finally, application of the BDT to the complete set of data. There is no overlap in membership between the training and test sets. The simplified example of a BDT shown in this figure uses three different attributes to classify all of the incoming subjects into eight groups. The three attributes provide a total of seven different gates that effectively shuttle subjects into different groups (leaves). Each subject (e.g., a PSM) is evaluated with respect to each attribute. In this figure, classification is shown as a 1 or 0, which works for yes or no attributes (e.g., Is this a variant peptide sequence?). However, this is an oversimplification for attributes that are continuous variables (e.g., intensity), and in these cases, the classification involves regression. The order of the attributes is chosen to maximize separation between the two groups at each stage using training data. The fraction of false subjects in the leaves of each branch of the tree provides the probability of being false (posterior error probability) for all subjects in that leaf once the binary decision tree has been applied to the data. Known false test subjects are PSMs marked as decoys. Attribute "A" is automatically selected to maximize the separation between true and false. Attribute "B" is chosen next using the same guiding principle, which is maximization of the separation between true and false. The second attribute can be different between different branches of the tree. Each attribute can be used only once along a branch. The branch may terminate if 100% purity is achieved at any level. Once the tree is constructed, it is evaluated using a similarly sized, separate set of labeled test data. At this point, the BDT is trained and ready to assign group membership to each data point and to compute the posterior error probability.

■ EXPERIMENTAL SECTION

The MetaMorpheus search software, which includes the BDT functionality, is coded in the C# programming language. This software is open-source and freely available with a permissive MIT license (https://github.com/smith-chem-wisc/MetaMorpheus). MetaMorpheus is also available as a Docker container (https://hub.docker.com/r/smithchemwisc/metamorpheus). The MetaMorpheus Windows Graphical User Interface (GUI) requires a 64-bit operating system and .NET Core 3.1. The command-line version of MetaMorpheus supports any operating system that supports .NET Core, including Windows, MacOS, and Linux. MetaMorpheus supports parallelization, using n-1 available logical processors by default. Users are free to select the number of logical processors used. A minimum of 8 GB of RAM is recommended, but higher amounts of RAM will speed up the performance. A simple search of a conventional bottom-up run with a single processor can be finished in a matter of a few minutes. New users are encouraged to test the installation of MetaMorpheus using a variety of test data sets available with instructions on the MetaMorpheus GitHub page. An extensive Wiki is also provided there that covers the typical usage and a glossary of terminology. Users with questions or experiencing problems can contact us via the Issues tab of the GitHub page or at our email address (mm_support@chem.wisc.edu).

MetaMorpheus uses a FastTree Binary Classifier (https://www.nuget.org/packages/Microsoft.ML.FastTree/1.3.1) included via the NuGet package. FastTree's binary-classification-boosting framework's natural probabilistic interpretation is explained in ref 20. All analyses were performed on a computer running Microsoft Windows 10.0.19041 with a 64-bit Intel Xeon CPU E5-2690 v4 @ 2.60 GHz processor with 28 threads and 128 GB installed RAM.
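The 75%/25% train-and-test scheme described in the Introduction, repeated four times so that no spectrum match is scored by a model trained on it, can be sketched as a four-way random partition. This is only an illustrative sketch in Python (MetaMorpheus itself is written in C#); the function names `four_fold_assignments` and `train_test_splits` are hypothetical, and the assumption that the four test sets are equal-sized, non-overlapping quarters is ours rather than something stated in the text.

```python
import random


def four_fold_assignments(n_items, n_folds=4, seed=0):
    """Randomly place each item index into exactly one held-out fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [indices[k::n_folds] for k in range(n_folds)]


def train_test_splits(n_items, n_folds=4, seed=0):
    """Yield (train, test) index lists; every item is tested exactly once."""
    folds = four_fold_assignments(n_items, n_folds, seed)
    for k, test in enumerate(folds):
        # The training set is everything that is not in the held-out fold.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Hypothetical example: 20 spectrum matches, identified by index.
splits = list(train_test_splits(20))
```

With four folds, each model sees 75% of the matches during training and is applied to the remaining 25%, which is consistent with the proportions quoted above.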

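The Figure 1 caption states that the fraction of false subjects (decoys) in a leaf provides the posterior error probability for every subject placed in that leaf. A minimal single-tree sketch of that bookkeeping follows. It assumes leaf membership has already been determined; the leaf labels and the helper name `leaf_error_probabilities` are hypothetical. The FastTree classifier used by MetaMorpheus is an ensemble of boosted trees, so its reported probability is not computed this literally.

```python
from collections import defaultdict


def leaf_error_probabilities(leaf_ids, is_decoy):
    """Return {leaf_id: fraction of that leaf's members labeled as decoys}."""
    totals = defaultdict(int)
    decoys = defaultdict(int)
    for leaf, decoy in zip(leaf_ids, is_decoy):
        totals[leaf] += 1
        decoys[leaf] += int(decoy)
    return {leaf: decoys[leaf] / totals[leaf] for leaf in totals}


# Hypothetical example: ten labeled matches that landed in three leaves.
leaf_ids = ["a", "a", "a", "a", "b", "b", "c", "c", "c", "c"]
is_decoy = [False, False, False, True, True, True, False, False, False, False]
pep_by_leaf = leaf_error_probabilities(leaf_ids, is_decoy)
```

In this toy data, leaf "a" holds one decoy among four members, leaf "b" holds only decoys, and leaf "c" holds none.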
Table 1. Definitions for Attributes Used in the Binary Decision Tree

attribute | definition
Absolute Average Fragment Mass Error from Median | Difference between the average fragment error (ppm) for a given PSM and the average fragment error for all PSMs
Ambiguity | Count of PSMs matching a single spectrum with identical MetaMorpheus score (±1 × 10^−9)
Complementary Ion Count | Count of complementary fragment ion pairs where N- and C-terminal peptide fragments from the same backbone cleavage are observed
Delta Score | Difference in MetaMorpheus scores between the current PSM and the next best scoring PSM
Hydrophobicity Z-Score | The number of standard deviations by which the computed hydrophobicity/mobility of a PSM differs from that of other PSMs eluting within 2 min
Fraction of Spectrum Intensity from Matched Peaks | Normalized fraction of spectrum intensity assigned to the matched fragment ions of the PSM
Peptide Contains Amino Acid Variant | If the matched peptide contains a designated amino acid variant
Longest Fragment Ion Series | Count of consecutive peptide backbone cleavages annotated by either an N- or C-terminal fragment
Missed Cleavages Count | Count of missed proteolytic cleavage events for the peptide matched in the PSM
Count of Modifications | Count of peptide post-translational chemical modifications
Monoisotopic Mass Error | Degree of missed monoisotopic error between experimental parent mass and computed theoretical mass (deconvolution error)
Precursor Charge Difference to Precursor Charge Mode | Integer difference between the charge state of the observed PSM and the mode for all PSMs
Peptide Spectral Match Count | Count of PSMs for the same full peptide sequence including any modifications
Total Matching Fragment Count | Count of all matched fragment ions

Figure 2. Vertical axis reports the fraction of true-positives across the range of observed values for the attributes used in the BDT. The units and ranges for the x axes are arbitrarily chosen to allow the full range of fractions to be shown and should not be interpreted. (See the Supporting Information for an explanation of the axes of each attribute.) Note: A graph for the peptide variant feature is not shown.

■ RESULTS AND DISCUSSION

Binary Search

Binary classifiers divide a collection of objects into two groups (e.g., True and False). This is accomplished by applying a series of challenges that maximize separation between the groups. Each challenge is applied only once along the branch of the tree. The order of challenges is chosen automatically by the algorithm to maximize the group separation at each step. Here group one is confident target matches, whereas group two is decoy matches. The distribution of the two groups in the leaves of the tree provides the group assignment probability. In this work, the fraction of group two matches in each of the leaves provides the probability that any member placed in that leaf by application of the model is an incorrect match (Figure 1).

Parameters

Fourteen different attributes are used in the process. A list of attributes and their definitions is provided in Table 1. Plots (Figure 2) show how the fraction of true-positive target PSMs to total PSMs varies across the range of respective values observed in one bottom-up search. (See the Bottom-Up Vignette section.) The fraction of true-positive target matches varies for the 14 different attributes. The fraction varies strongly for Total Matching Fragment Count, Intensity, PSM Count, Complementary Ion Count, and Delta Score. The fraction varies weakly but measurably for the remaining attributes.

Construction of the Model and Placement within the Workflow

The creation and application of the BDT for computing the PEP for each PSM occurs after the search is completed and all PSMs have been assigned. Each PSM is assigned a separate q value depending on its MetaMorpheus score rank. The q value is the FDR for all PSMs with a MetaMorpheus score at or above a specified score threshold. This q value is used to select the members of the training sets (see later). We compute a PEP-derived q value [16], referred to hereafter as the PEP q value, after training is completed and the models have been applied to the PSMs. The PEP q value is the average of all individual PEP values for a group of PSMs or peptides down to and including the current PSM or peptide after sorting all PSMs and peptides in descending order by the PEP [21]. The PEP q value is a comparable metric to the traditional q value.

Training and Testing

We employed a cross-validation approach [22] for training and testing of the model. Training and testing sets of PSMs are chosen randomly. (Each training set contains 75% of the total PSMs, up to 1 million.) The trained model is then applied to and evaluated on the remaining 25% of PSMs. This process is repeated a total of four times such that no PSM is evaluated using a model that included it with the training. Target hits with q value < 0.01 are used as correct matches, and decoy hits are used as incorrect matches. There is no overlap between the members of the training and testing sets. Training occurs in a single round with 400 trees in the ensemble, which is the default for the FastTree Binary Classifier.

Model Performance Metrics

The model developed during the training phase is applied to the test set. The performance of the model is reported using several different metrics (Table 2). The Count of Ambiguous Peptides Removed is a special feature of MetaMorpheus requiring further explanation. The MetaMorpheus search compares all spectra against all theoretical peptides/proteoforms whose intact masses agree within some specified tolerance (e.g., 10 ppm). The MetaMorpheus score for each potential match is computed. The integer value of the MetaMorpheus score is the count of matched fragment ions. The decimal portion of the MetaMorpheus score is the spectrum intensity fraction accounted for by the matched fragment ions. It is not unusual for multiple unique theoretical peptides to have the same MetaMorpheus score (±1 × 10^−9) for a single spectrum. We refer to this as an ambiguous PSM, and we report all theoretical peptide sequences in the output for the PSM separated by the "|" character. PSM ambiguities can arise from target or decoy peptides. We use the BDT model to resolve many of these ambiguities. A separate PEP is computed for each peptide possibility in the ambiguous assignment. Whenever the PEP for a peptide possibility in an ambiguous assignment is at least 5% lower than the other possibilities, that possibility is chosen as the most likely assignment, and the other possibilities are removed. Thus whereas the Count of Ambiguous Peptides Removed is a metric of the model, the actual resolving of ambiguities is a valuable feature of the BDT classification. Several example vignettes are described as follows. Numerical performance metrics for these examples are collectively reported in Table 3.

Table 2. Metrics of Model Performance (a)

metric | definition | what to look for
Accuracy | The proportion of correct predictions with a test data set. It is the ratio of the number of correct predictions to the total number of input samples | The closer to 1.00, the better
Area under Curve | Measures the area under the curve created by sweeping the true-positive rate versus the false-positive rate | The closer to 1.00, the better
Area under Precision−Recall Curve | Area under the curve of a precision−recall curve, a measure of the success of prediction when the classes are imbalanced | The closer to 1.00, the better
F1 Score | F1 score is the harmonic mean of the precision and recall | The closer to 1.00, the better
Log Loss | Logarithmic loss measures the performance of a classification model where the prediction input is a probability value between 0.00 and 1.00 | The closer to 0.00, the better
Log Loss Reduction | The advantage of the classifier over a random prediction | Ranges from −inf to 1.00, where 1.00 is perfect predictions and 0.00 indicates mean predictions
Positive Precision | The proportion of correctly predicted positive instances among all of the positive predictions | The closer to 1.00, the better
Positive Recall | The proportion of correctly predicted positive instances among all of the positive instances | The closer to 1.00, the better
Negative Precision | The proportion of correctly predicted negative instances among all of the negative predictions | The closer to 1.00, the better
Negative Recall | The proportion of correctly predicted negative instances among all of the negative instances | The closer to 1.00, the better
Count of Ambiguous Peptides Removed | Peptide assignments with the same MetaMorpheus score resolved through application of the BDT | Higher numbers are better

(a) Portions of this text were adapted from https://github.com/dotnet/docs/blob/master/docs/machine-learning/resources/metrics.md, the original source of the BDT algorithm used in MetaMorpheus BDT.

Table 3. Figures of Merit for Search Vignettes

metric | bottom-up | top-down | nonspecific (HLA) | proteogenomic
Accuracy | 0.9941 | 0.9959 | 0.9993 | 0.9919
Area under the Curve | 0.9995 | 0.9997 | 0.9995 | 0.9967
Area under Precision−Recall Curve | 0.9993 | 0.9999 | 0.9990 | 0.9973
F1 Score | 0.9936 | 0.9970 | 0.9973 | 0.9911
Log Loss | 0.0297 | 0.0305 | 0.0077 | 0.0475
Log Loss Reduction | 0.9702 | 0.9664 | 0.9861 | 0.9523
Positive Precision | 0.9942 | 0.9963 | 0.9970 | 0.9926
Positive Recall | 0.9931 | 0.9976 | 0.9977 | 0.9897
Negative Precision | 0.9939 | 0.9949 | 0.9997 | 0.9913
Negative Recall | 0.9950 | 0.9923 | 0.9996 | 0.9938
Count of Ambiguous Peptides Removed | 210 | 506 | 115 | 786

Bottom-Up Vignette

The experimental procedure for the generation of the data set for the bottom-up vignette was previously reported [9]. Data were derived from a trypsin digest of 10^7 human Jurkat cells. Peptides were fractionated off-line by high-pH reverse-phase liquid chromatography prior to the LC−MS/MS analysis on a nanoACQUITY LC system (Waters, Milford, MA) interfaced with a Thermo Scientific LTQ Orbitrap Velos mass spectrometer. All mass spectrometry raw files are freely available on the MassIVE platform (https://massive.ucsd.edu; ID: MSV000083304; Files: 12-18-17_fract1−10).

The data analysis was performed using MetaMorpheus version 0.0.313. The following search settings were used: protease = trypsin; maximum missed cleavages = 2; minimum peptide length = 7; maximum peptide length = unspecified; initiator methionine behavior = variable; fixed modifications = carbamidomethyl on C, carbamidomethyl on U; variable modifications = oxidation on M; max mods per peptide = 2; max modification isoforms = 1024; precursor mass tolerance = ±5.0000 PPM; product mass tolerance = ±20.0000 PPM; report PSM ambiguity = true. The combined search database contained 20379 nondecoy protein entries, including 0 contaminant sequences. The database was obtained in XML format from UniProt, downloaded 2021-01-12, and contained annotated PTMs, which are automatically detected with MetaMorpheus. The total time to perform the Search task on 10 spectra file(s) was 9.0 min. The time to perform the BDT analysis was 67 s. The final search tallies were 88484 target PSMs and 32621 peptides at q value < 0.01. PEP q values were then computed. The search yielded 92802 PSMs and 34506 peptides at PEP q value < 0.01, increases of 4318 (4.9%) and 1885 (5.8%), respectively. A total of 210 ambiguous peptides were disambiguated.

A second analysis was performed using an in-house-created entrapment database (20379 protein entries) in addition to the human database. This entrapment database was created by fixing the position of lysine and arginine residues and randomizing the remaining amino acids on a protein-by-protein basis. The N-terminal methionine was also preserved when present. Annotated PTMs found in the original database were shifted to new positions along with their corresponding amino acid. Using a randomized version of the target database for entrapment is preferred over the use of a database for another organism [23]. This entrapment analysis was used to evaluate the performance, as all PSMs assigned to entrapment peptides are presumably false-positives. The second search yielded 85180 PSMs, including 380 false-positive entrapment PSMs (0.45%), and 31426 peptides, including 155 false-positive entrapment peptides (0.49%). After the BDT analysis was performed, these values changed to 90524 PSMs, including 294 false-positive entrapment PSMs (0.32%), and 33668 peptides, including 185 false-positive entrapment peptides (0.55%). Please note that evaluations here and in the vignettes that follow were performed using a single entrapment database. Therefore, the results do not represent the average results that would have been obtained had we repeated the experiment 10 or more times, each using a separately crafted and unique entrapment database.

HLA Vignette

The following vignette demonstrates the application of the BDT to peptides identified in a nonspecific search of human HLA peptides, obtained from a study performed by Bassani-Sternberg and colleagues [24]. The data set used here can be obtained from the PRIDE repository using the identifier PXD004894. The data files used here include 20141208- and 20141210_QEp7_MiBa_SA_HLA-I-p_MM15 samples 1−4, A and B (14 files). The following search settings were used: protease = nonspecific; maximum missed cleavages = 19; minimum peptide length = 8; maximum peptide length = 20; initiator methionine behavior = variable; fixed modifications = carbamidomethyl on C, carbamidomethyl on U; variable modifications = oxidation on M; max mods per peptide = 2; max modification isoforms = 1024; precursor mass tolerance = ±6.0000 PPM; product mass tolerance = ±20.0000 PPM; report PSM ambiguity = true. The human search database contained 20379 nondecoy protein entries downloaded from UniProt on 2021-01-08 in FASTA format, including 0 contaminant sequences.

The total time to perform the Search task on 14 spectra file(s) was 219.52 min. The time to perform the BDT analysis was 186 s. The final search tallies were 138450 target PSMs and 17789 peptides at q value < 0.01. PEP q values were then computed. The search yielded 127958 PSMs and 21313 peptides at PEP q value < 0.01, a decrease of 7.6% and an increase of 19.8%, respectively. A total of 115 ambiguous peptides were disambiguated.

In this nonspecific search, the number of PSMs at 1% FDR decreased upon using the BDT, whereas the number of unique peptides increased. There are a number of possible explanations for this behavior. In a nonspecific search, the search space of theoretical peptides in both the forward target database and the reverse decoy database is very large compared with a typical search with specific proteolytic cleavage sites. This significantly lowers the sensitivity, which we observe as a high-cutoff MetaMorpheus score at 1% peptide FDR. Such large databases are also often prone to a high false-positive rate. We hypothesize that several medium- to low-scoring PSMs are, in fact, false-positives, which the BDT filters out; the BDT takes into consideration many additional facets of the PSM compared with simply ranking by the MetaMorpheus score. In contrast, the peptides, which here all have a high MetaMorpheus score and so presumably are not false-positives, are not filtered out by the BDT algorithm.

As previously described, a second analysis was performed using an entrapment database constructed similarly to the aforementioned entrapment database; however, no PTMs were included in the search. Therefore, no PTMs were included in the entrapment database. The second search yielded 117109 PSMs, including 584 false-positive entrapment PSMs (0.50%), and 15090 peptides, including 73 false-positive entrapment peptides (0.48%). After the BDT analysis was performed, these values changed to 100433 PSMs, including 193 false-positive entrapment PSMs (0.19%), and 17459 peptides, including 72 false-positive entrapment peptides (0.41%). The BDT analysis decreased the number of identified PSMs by 16676 (14.2%) and increased the number of identified peptides by 2369 (15.7%) while reporting fewer entrapped false-positive PSMs and a similar number of entrapped false-positive peptides.

Top-Down Vignette

Data for the top-down vignette are from a study of mouse mitochondria [25]. All mass-spectrometry raw files are freely available on the MassIVE platform (https://massive.ucsd.edu; ID: MSV000082366). The files included 08-02- and 08-03-17_B9_myoblast_A fractions 1−12, reps 1 and 2 (12 files). The data analysis was performed using MetaMorpheus version 0.0.313.
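The PEP q value defined under "Construction of the Model and Placement within the Workflow" is a running average of individual PEPs over a ranked list. A minimal sketch is given below. It assumes the list is ranked from most confident (lowest PEP) to least confident, which is the reading under which the running average behaves like a q value; the function name `pep_q_values` is hypothetical and this is not the MetaMorpheus source code.

```python
def pep_q_values(peps):
    """Running mean of PEPs after ranking from lowest to highest PEP.

    Returns a list aligned with the input order, so q_values[i]
    corresponds to peps[i].
    """
    # Assumption: rank 1 is the most confident match (lowest PEP).
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    q_values = [0.0] * len(peps)
    running_sum = 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += peps[i]
        q_values[i] = running_sum / rank
    return q_values


# Hypothetical PEPs for four spectrum matches.
peps = [0.30, 0.00, 0.10, 0.02]
q = pep_q_values(peps)
```

Because each value is the mean of the PEPs at or above the current rank, filtering at PEP q value < 0.01 keeps a set whose expected fraction of incorrect matches is below 1%, which is why the text treats it as comparable to the traditional q value.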

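The entrapment database construction described in the Bottom-Up Vignette (lysine and arginine fixed in place, N-terminal methionine preserved, remaining residues randomized protein by protein) can be sketched for a single sequence as follows. This is a hedged illustration only: the function name `entrapment_sequence`, the seed handling, and the example sequence are hypothetical, and the shifting of annotated PTMs along with their residues is not modeled.

```python
import random


def entrapment_sequence(protein, seed=0):
    """Shuffle residues while pinning K, R, and an N-terminal M in place."""
    rng = random.Random(seed)
    # Positions that must not move: every lysine and arginine.
    fixed = {i for i, aa in enumerate(protein) if aa in "KR"}
    # Preserve the N-terminal methionine when present.
    if protein.startswith("M"):
        fixed.add(0)
    movable_positions = [i for i in range(len(protein)) if i not in fixed]
    movable_residues = [protein[i] for i in movable_positions]
    rng.shuffle(movable_residues)
    shuffled = list(protein)
    for pos, aa in zip(movable_positions, movable_residues):
        shuffled[pos] = aa
    return "".join(shuffled)


# Hypothetical toy sequence, not a real protein entry.
target = "MASKDELRGHTK"
entrapment = entrapment_sequence(target)
```

Keeping K and R in place means a tryptic digest of the entrapment sequence yields peptides of the same lengths and amino acid compositions as the target, so the entrapment peptides have the same masses as the real ones but scrambled sequences.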
The following search settings were used: protease = top-down; maximum missed cleavages = 2; minimum peptide length = 7; maximum peptide length = unspecified; initiator methionine behavior = variable; fixed modifications = 0; variable modifications = 0; max mods per peptide = 2; max modification isoforms = 1024; precursor mass tolerance = ±10.0000 PPM; product mass tolerance = ±20.0000 PPM; report PSM ambiguity = true. The mouse search database, downloaded from UniProt in XML format on 2021-01-12, contained 17051 nondecoy protein entries, including 0 contaminant sequences. The total time to perform the Search task on 12 spectra file(s) was 14.7 min. The time to perform the BDT analysis was 20 s. The final search tallies were 11365 target PrSMs and 809 proteoforms at q value < 0.01. PEP q values were then computed. The search yielded 11724 PrSMs and 873 proteoforms at PEP q value < 0.01, increases of 359 (3.2%) and 64 (7.9%), respectively. A total of 506 ambiguous proteoforms were disambiguated.

As previously described, a second analysis was performed using an in-house-created entrapment database constructed in a similar fashion to the human entrapment database, except using the mouse protein sequence database as input. The second search yielded 11077 PrSMs, including 35 false-positive entrapment PrSMs (0.32%), and 808 proteoforms, including 1 false-positive entrapment proteoform (0.12%). After the BDT analysis was performed, these values increased to 11463 PrSMs, including 26 false-positive entrapment PrSMs (0.23%), and 861 proteoforms, including 9 false-positive entrapment proteoforms (1.05%). The BDT analysis increased the number of identifications by 386 PrSMs (3.5%) and 53 proteoforms (6.6%) while reporting fewer entrapped PrSMs and more entrapped proteoforms, although the latter still amount to only ∼1% false-positives.

Proteogenomics Vignette

The bottom-up vignette data were used for this analysis. A proteogenomic database was created with Spritz [26] (https://github.com/smith-chem-wisc/Spritz). Input for Spritz was obtained from www.ncbi.nlm.nih.gov using the following identifiers: SRR791578, SRR791579, SRR791580, SRR791581, SRR791582, SRR791583, SRR791584, SRR791585, and SRR791586. Sequences were compared against Ensembl Archive Release 82. The proteomics data analysis was performed using the Spritz-generated sample-specific proteogenomic database with MetaMorpheus version 0.0.313.

The Jurkat proteogenomic database was constructed using Spritz version 0.1.3. The paired-end RNA sequencing data used for database construction was previously obtained and accessed using GSE45428 in GEO SRA [27]. The workflow for database creation using Spritz has been described in detail [28]. In brief, genomic references including the human genome and gene model files from Ensembl version 82 and known human variation sites are downloaded from dbSNP [29]. Next, Skewer [30] is used to remove adapter sequences from the RNA and filter out low-quality reads. The reads are then aligned to the human reference genome using HISAT2 [31] before the variant analysis is performed using the Genome Analysis Toolkit (GATK) [32,33] version 4.0.11.0. SnpEff [34] has been adapted to enable the annotation of discovered variants in UniProt XML-formatted databases. Following variant annotation, post-translational modifications are transferred to the proteogenomic database from the human UniProt database (downloaded 2020-06-30).

The following search settings were used: protease = trypsin; maximum missed cleavages = 2; minimum peptide length = 7; maximum peptide length = unspecified; initiator methionine behavior = variable; fixed modifications = carbamidomethyl on C, carbamidomethyl on U; variable modifications = oxidation on M; max mods per peptide = 2; max modification isoforms = 1024; precursor mass tolerance = ±5.0000 PPM; product mass tolerance = ±20.0000 PPM; report PSM ambiguity = true. The combined search database contained 77534 nondecoy protein entries, including 0 contaminant sequences. The total time to perform the Search task on 10 spectra file(s) was 18.72 min. The time to perform the BDT analysis was 2.22 min. The final search tallies were 88849 target PSMs and 32671 peptides at q value < 0.01. PEP q values were then computed. The search yielded 93331 PSMs and 34777 peptides at PEP q value < 0.01, increases of 4482 (5.0%) and 2106 (6.4%), respectively. A total of 786 ambiguous peptides were disambiguated. Because this is a proteogenomic search, we were interested in identifying peptides with amino acid variants. Here we found 449 variant PSMs at q < 0.01. After applying the BDT, we found 455 variant PSMs at PEP q value < 0.01 with an overlap between the sets of 431. In terms of variant-containing peptides, we found 190 at q < 0.01 and 193 after using the BDT at PEP q value < 0.01 with 183 peptides overlapping the two sets.

As previously described, a second analysis was performed using an additional human entrapment XML format database. The second search yielded 85584 PSMs, including 376 false-positive entrapment PSMs (0.44%), and 31641 peptides, including 150 false-positive entrapment peptides (0.47%). After the BDT analysis was performed, these values increased to 91692 PSMs, including 219 false-positive entrapment PSMs (0.24%), and 34181 peptides, including 132 false-positive entrapment peptides (0.39%). Therefore, the BDT analysis increased the number of identifications by 6108 PSMs (7.1%) and 2540 peptides (8.0%) while reporting a decrease in the number of entrapped false-positives.

Effect of Individual Components

Example results from bottom-up, nonspecific peptide, and top-down searches can be seen in the vignettes above. The models constructed for the bottom-up and nonspecific peptide searches used 13 attributes, skipping the variant attribute, as no variant peptides were included in the database. We were interested in determining the effects of individual parameters on the complete model. This was accomplished by performing 13 separate bottom-up searches using 12 attributes and skipping the one under examination. A bar plot reporting the Accuracy for all 13 searches, labeled with the missing attribute, is shown in Figure 3A.

Next, we performed 12 more searches of the same data from the bottom-up vignette, wherein we constructed the model one feature at a time, beginning with the attribute that had the highest impact (Delta Score). One attribute was added at a time in the order shown in the bar plot (Figure 3B). Each additional attribute improves the accuracy of the model.

Figure 3. (top) Accuracy of BDT models trained when missing the labeled attribute. When compared with the full accuracy, this demonstrates the loss in value when not including the labeled attribute. (middle) Accuracy results of BDTs constructed one attribute at a time. The Delta Score had the largest impact on accuracy, as shown in the top bar chart. Therefore, it was the first feature added to generate the data for the middle bar chart. The PSM count had the second largest impact on accuracy in the top bar chart. Therefore, it was the second feature added to generate the data for the middle bar chart. This was repeated for all attributes. Each additional added attribute improves the accuracy. (bottom) The number of PSMs found at PEP q value < 0.01 increases with each added attribute. Here the BDT training began only with the Delta Score attribute. Then, one-by-one, moving top to bottom, we added the labeled attribute and recompleted the search.

Comparison to Percolator

The Percolator algorithm (v. 3.0.4, http://percolator.ms/) is a support vector machine used to rerank PSM and peptide identifications using user-supplied parameters. Percolator also reports peptide posterior error probabilities. It can perform a similar role as the BDT. The ability to perform the comparison was enabled by adding a new output to MetaMorpheus, designed specifically to meet the needs of the Percolator input format. This Percolator input contains all of the same features and values for each PSM and peptide used by the BDT.

We repeat here the values observed for the bottom-up vignette search with the additional entrapment database for ease of readability. The search yielded 85180 PSMs, including 380 false-positive entrapment PSMs (0.45%), and 31426 peptides, including 155 false-positive entrapment peptides (0.49%). Percolator analysis of the results yielded 91593 PSMs, including 380 false-positive entrapment PSMs (0.41%), and 34715 peptides, including 276 false-positive entrapment peptides (0.80%). The BDT computation time was 68 s, and the Percolator computation time (using flags -U and --search-input concatenated) was 52 s. These two results demonstrate that for this standard type of search, both the Percolator algorithm and the BDT perform comparably well.

■ CONCLUSIONS

The addition of a BDT to MetaMorpheus provides much needed statistical support for individual peptide and proteoform identifications. The computation time and performance are competitive with existing stand-alone programs such as Percolator. In most cases, the BDT increases the number of PSM, peptide, and proteoform identifications beyond the numbers reported using only the traditional q value. In addition, it resolves many peptide assignment ambiguities that could not have been resolved using only the MetaMorpheus score. In future studies, we plan to integrate the computed peptide posterior error probabilities into MetaMorpheus' protein inference, which should further improve the confidence in the protein identification.

■ ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00838.
Supplement 1: Explanation of the axes of Figure 2 (PDF)

■ AUTHOR INFORMATION

Corresponding Author
Lloyd M. Smith − Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; Phone: 1-608-263-2594; Email: smith@chem.wisc.edu

Authors
Michael R. Shortreed − Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; orcid.org/0000-0003-4626-0863
Robert J. Millikin − Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; orcid.org/0000-0001-7440-3695
Lei Liu − Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; orcid.org/0000-0001-7097-1505
Zach Rolfs − Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; orcid.org/0000-0002-4372-7133
Rachel M. Miller − Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; orcid.org/0000-0003-1461-6386
Leah V. Schaffer − Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; orcid.org/0000-0001-6339-9141
Brian L. Frey − Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; orcid.org/0000-0002-0397-7269

Complete contact information is available at: https://pubs.acs.org/10.1021/acs.jproteome.0c00838

Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS

This work was supported by NIH−NIGMS grant R35GM126914. R.M.M. was supported in part by the NIH Chemistry−Biology Interface Training Grant (T32 GM008505). R.J.M. was supported by an NHGRI training grant to the Genomic Sciences Training Program 5T32HG002760. We also thank Austin V. Carr for his assi
stanceinthepreparationofthefigures.2003https://dx.doi.org/10.1021/acs.jproteome.0c00838J.ProteomeRes.2021,20,1997−2004

■ REFERENCES

(1) Wenger, C. D.; Coon, J. J. A proteomics search algorithm specifically designed for high-resolution tandem mass spectra. J. Proteome Res. 2013, 12 (3), 1377−86.
(2) Shortreed, M. R.; Wenger, C. D.; Frey, B. L.; Sheynkman, G. M.; Scalf, M.; Keller, M. P.; Attie, A. D.; Smith, L. M. Global Identification of Protein Post-translational Modifications in a Single-Pass Database Search. J. Proteome Res. 2015, 14 (11), 4714−20.
(3) Li, Q.; Shortreed, M. R.; Wenger, C. D.; Frey, B. L.; Schaffer, L. V.; Scalf, M.; Smith, L. M. Global Post-Translational Modification Discovery. J. Proteome Res. 2017, 16 (4), 1383−1390.
(4) Solntsev, S. K.; Shortreed, M. R.; Frey, B. L.; Smith, L. M. Enhanced Global Post-translational Modification Discovery with MetaMorpheus. J. Proteome Res. 2018, 17 (5), 1844−1851.
(5) Millikin, R. J.; Solntsev, S. K.; Shortreed, M. R.; Smith, L. M. Ultrafast Peptide Label-Free Quantification with FlashLFQ. J. Proteome Res. 2018, 17 (1), 386−391.
(6) Lu, L.; Millikin, R. J.; Solntsev, S. K.; Rolfs, Z.; Scalf, M.; Shortreed, M. R.; Smith, L. M. Identification of MS-Cleavable and Noncleavable Chemically Cross-Linked Peptides with MetaMorpheus. J. Proteome Res. 2018, 17 (7), 2370−2376.
(7) Lu, L.; Riley, N. M.; Shortreed, M. R.; Bertozzi, C. R.; Smith, L. M. O-Pair Search with MetaMorpheus for O-glycopeptide characterization. Nat. Methods 2020, 17 (11), 1133−1138.
(8) Rolfs, Z.; Millikin, R. J.; Smith, L. M. An Algorithm to Improve the Speed of Semi- and Non-Specific Enzyme Searches in Proteomics. Curr. Bioinf. 2021, 15 (9), 1065−1074.
(9) Miller, R. M.; Millikin, R. J.; Hoffmann, C. V.; Solntsev, S. K.; Sheynkman, G. M.; Shortreed, M. R.; Smith, L. M. Improved Protein Inference from Multiple Protease Bottom-Up Mass Spectrometry Data. J. Proteome Res. 2019, 18 (9), 3429−3438.
(10) Storey, J. D.; Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U. S. A. 2003, 100 (16), 9440−5.
(11) Kall, L.; Storey, J. D.; MacCoss, M. J.; Noble, W. S. Posterior error probabilities and false discovery rates: two sides of the same coin. J. Proteome Res. 2008, 7 (1), 40−4.
(12) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74 (20), 5383−92.
(13) Anderson, D. C.; Li, W.; Payan, D. G.; Noble, W. S. A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J. Proteome Res. 2003, 2 (2), 137−46.
(14) Kall, L.; Canterbury, J. D.; Weston, J.; Noble, W. S.; MacCoss, M. J. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods 2007, 4 (11), 923−5.
(15) Kall, L.; Storey, J. D.; MacCoss, M. J.; Noble, W. S. Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J. Proteome Res. 2008, 7 (1), 29−34.
(16) Kall, L.; Storey, J. D.; Noble, W. S. Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. Bioinformatics 2008, 24 (16), i42−8.
(17) The, M.; MacCoss, M. J.; Noble, W. S.; Kall, L. Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0. J. Am. Soc. Mass Spectrom. 2016, 27 (11), 1719−1727.
(18) Halloran, J. T.; Zhang, H.; Kara, K.; Renggli, C.; The, M.; Zhang, C.; Rocke, D. M.; Kall, L.; Noble, W. S. Speeding Up Percolator. J. Proteome Res. 2019, 18 (9), 3353−3359.
(19) Holzinger, A. Data Mining with Decision Trees: Theory and Applications. Online Inf. Rev. 2015, 39 (3), 437−438.
(20) Burges, C. J. C. From RankNet to LambdaRank to LambdaMART: An Overview; Microsoft Research Technical Report MSR-TR-2010-82; Microsoft Research: Redmond, WA, 2010. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/MSR-TR-2010-82.pdf (accessed 2021-02-19).
(21) Choi, H.; Ghosh, D.; Nesvizhskii, A. I. Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling. J. Proteome Res. 2008, 7 (1), 286−92.
(22) Granholm, V.; Noble, W. S.; Kall, L. A cross-validation scheme for machine learning algorithms in shotgun proteomics. BMC Bioinf. 2012, 13 (Suppl 16), S3.
(23) Granholm, V.; Noble, W. S.; Kall, L. On using samples of known protein content to assess the statistical calibration of scores assigned to peptide-spectrum matches in shotgun proteomics. J. Proteome Res. 2011, 10 (5), 2671−8.
(24) Bassani-Sternberg, M.; Braunlein, E.; Klar, R.; Engleitner, T.; Sinitcyn, P.; Audehm, S.; Straub, M.; Weber, J.; Slotta-Huspenina, J.; Specht, K.; Martignoni, M. E.; Werner, A.; Hein, R.; Busch, D. B.; Peschel, C.; Rad, R.; Cox, J.; Mann, M.; Krackhardt, A. M. Direct identification of clinically relevant neoepitopes presented on native human melanoma tissue by mass spectrometry. Nat. Commun. 2016, 7, 13404.
(25) Schaffer, L. V.; Rensvold, J. W.; Shortreed, M. R.; Cesnik, A. J.; Jochem, A.; Scalf, M.; Frey, B. L.; Pagliarini, D. J.; Smith, L. M. Identification and Quantification of Murine Mitochondrial Proteoforms Using an Integrated Top-Down and Intact-Mass Strategy. J. Proteome Res. 2018, 17 (10), 3526−3536.
(26) Cesnik, A. J.; Miller, R. M.; Ibrahim, K.; Lu, L.; Millikin, R. J.; Shortreed, M. R.; Frey, B. L.; Smith, L. M. Spritz: A Proteogenomic Database Engine. J. Proteome Res. 2020, DOI: 10.1021/acs.jproteome.0c00407.
(27) Sheynkman, G. M.; Shortreed, M. R.; Frey, B. L.; Smith, L. M. Discovery and mass spectrometric analysis of novel splice-junction peptides using RNA-Seq. Mol. Cell Proteomics 2013, 12 (8), 2341−53.
(28) Cesnik, A. J.; Miller, R. M.; Ibrahim, K.; Lu, L.; Millikin, R. J.; Shortreed, M. R.; Frey, B. L.; Smith, L. M. Spritz: A Proteogenomic Database Engine. bioRxiv 2020, 2020.06.08.140681. DOI: 10.1101/2020.06.08.140681.
(29) Sherry, S. T.; Ward, M.; Sirotkin, K. dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. 1999, 9 (8), 677−679.
(30) Jiang, H.; Lei, R.; Ding, S. W.; Zhu, S. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinf. 2014, 15, 182.
(31) Kim, D.; Paggi, J. M.; Park, C.; Bennett, C.; Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 2019, 37 (8), 907−915.
(32) DePristo, M. A.; Banks, E.; Poplin, R.; Garimella, K. V.; Maguire, J. R.; Hartl, C.; Philippakis, A. A.; del Angel, G.; Rivas, M. A.; Hanna, M.; McKenna, A.; Fennell, T. J.; Kernytsky, A. M.; Sivachenko, A. Y.; Cibulskis, K.; Gabriel, S. B.; Altshuler, D.; Daly, M. J. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 2011, 43 (5), 491−8.
(33) McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, K.; Kernytsky, A.; Garimella, K.; Altshuler, D.; Gabriel, S.; Daly, M.; DePristo, M. A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20 (9), 1297−303.
(34) Cingolani, P.; Platts, A.; Wang, L. L.; Coon, M.; Nguyen, T.; Wang, L.; Land, S. J.; Lu, X.; Ruden, D. M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 2012, 6 (2), 80−92.
