Simulating Text With Markov Chains in Python – Towards Data Science
Ben Shaver
Data Science Immersive student @GADC
Dec 22 · 3 min read

In my last post, I introduced Markov chains in the context of Markov chain Monte Carlo methods. This post is a small addendum to that one, demonstrating one fun thing you can do with Markov chains: simulate text.

[Image: Pixabay]

A Markov chain is a simulated sequence of events. Each event in the sequence comes from a set of outcomes that depend on one another. In particular, each outcome determines which outcomes are likely to occur next. In a Markov chain, all of the information needed to predict the next event is contained in the most recent event. That means that knowing the full history of a Markov chain doesn’t help you predict the next outcome any better than only knowing what the last outcome was.

Markov chains aren’t generally reliable predictors
of events in the near term, since most processes in the real world are more complex than Markov chains allow. Markov chains are, however, used to examine the long-run behavior of a series of events that are related to one another by fixed probabilities.

For any sequence of non-independent events in the world, and where a limited number of outcomes can occur, conditional probabilities can be computed relating each outcome to one another. Often this simply takes the form of counting how often certain outcomes follow one another in an observed sequence.

...

To generate a simulation based on a certain text, count up every word that is used. Then, for every word, store the words that are used next. This is the distribution of words in that text conditional on the preceding word.

In order to simulate some text from Donald Trump, let’s use a collection of his speeches from the 2016 campaign available here.

First import numpy and the text file containing Trump’s speeches:

    import numpy as np
    trump = open('speeches.txt', encoding='utf8').read()

Then, split the text file into single words. Note we’re keeping all the punctuation in, so our simulated text has punctuation:

    corpus = trump.split()

Then, we define a function to give us all the pairs of words in the speeches. We’re using lazy evaluation, and yielding a generator object instead of actually filling up our memory with every pair of words:

    def make_pairs(corpus):
        for i in range(len(corpus)-1):
            yield (corpus[i], co
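
The excerpt breaks off partway through make_pairs. As a rough, self-contained sketch of the procedure described above (pair up adjacent words, store the followers of each word, then walk the chain), something like the following could work. It rests on several assumptions: the truncated line is taken to pair each word with its immediate successor; build_followers and simulate are hypothetical helper names, not from the article; a short made-up sentence stands in for the speeches file; and the standard library's random module is used in place of numpy so the block runs on its own.

```python
import random
from collections import defaultdict


def make_pairs(corpus):
    # Lazily yield each word together with the word that follows it.
    # Assumption: the truncated line pairs corpus[i] with corpus[i + 1].
    for i in range(len(corpus) - 1):
        yield (corpus[i], corpus[i + 1])


def build_followers(corpus):
    # Hypothetical helper: map each word to the list of words seen right after it.
    # Repeats are kept, so a uniform draw from the list reproduces the
    # observed frequencies conditional on the preceding word.
    followers = defaultdict(list)
    for first, second in make_pairs(corpus):
        followers[first].append(second)
    return dict(followers)


def simulate(followers, start, n_words, rng=random):
    # Hypothetical helper: walk the chain, where each next word depends
    # only on the current word (the Markov property).
    chain = [start]
    while len(chain) < n_words:
        options = followers.get(chain[-1])
        if not options:  # e.g. the final word of the corpus has no successor
            break
        chain.append(rng.choice(options))
    return chain


# Made-up stand-in for the speeches file, so the sketch is runnable as-is.
sample_text = "the cat sat on the mat and the dog sat on the rug"
corpus = sample_text.split()
followers = build_followers(corpus)
print(" ".join(simulate(followers, "the", 8, random.Random(0))))
```

With the real corpus, the same functions could be called on trump.split() instead of the sample sentence, and np.random.choice could replace rng.choice if numpy is preferred. Keeping duplicate followers in each list is a simple way to encode the conditional distribution without computing explicit probabilities.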