Python Data Analysis: Analyzing Textual Data
Analyzing Textual Data and Social Media

In the previous chapters, we focused on the analysis of structured data, mostly in tabular format. In reality, plain text is the predominant form of data available today. Text analysis draws on word frequency distributions, pattern recognition, tagging, link and association analysis, sentiment analysis, and visualization. We will analyze text with the Python Natural Language Toolkit (NLTK) library. NLTK comes with a collection of sample texts called corpora. A small example of network analysis will also be covered. The following topics will be discussed in this chapter:

• Installing NLTK
• Filtering out stopwords, names, and numbers
• The bag-of-words model
• Analyzing word frequencies
• Naive Bayes classification
• Sentiment analysis
• Creating word clouds
• Social network analysis

Installing NLTK

NLTK is a Python API for the analysis of texts written in natural languages, such as English. NLTK was created in 2001 and was originally intended as a teaching tool. Install NLTK and confirm the installed version with the following commands:

$ sudo pip install nltk
$ pip freeze | grep nltk
nltk==2.0.4

As usual, we will check the installation with a new version of the pkg_check.py file. The following import statement is required:

import nltk

If everything works, we should get a result similar to the following (each description is cut off after a fixed number of characters):

nltk version 2.0.4
nltk.app DESCRIPTION chartparser: Chart Parser chunkparser: Regular-Expression Chunk Parser collocations: Find collocations in text concordance: Part
nltk.ccg DESCRIPTION For more information see nltk/doc/contrib/ccg/ccg.pdf PACKAGE CONTENTS api chart combinator lexicon DATA BackwardApplication
These perform simple pattern matching on sentences typed by users, and respond with automatically g
nltk.chunk DESCRIPTION Classes and interfaces for identifying non-overlapping linguistic groups (such as base noun phrases) in unrestricted text. This
nltk.classify DESCRIPTION Classes and interfaces for labeling tokens with category labels (or "class labels"). Typically, labels are represented with stri
nltk.cluster DESCRIPTION This module contains a number of basic clustering algorithms. Clustering describes the task of discovering groups of similar ite
nltk.corpus
nltk.draw
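The pkg_check.py script itself is not reproduced in this excerpt. The following is only a minimal sketch of how such a check could be written with the standard library: it prints the package version and a truncated docstring for each subpackage. The names describe_package, print_report, and max_chars are hypothetical, and the output format only approximates the listing above.

```python
import importlib
import pkgutil


def describe_package(name, max_chars=60):
    """Return (version, [(subpackage_name, truncated_docstring), ...])."""
    package = importlib.import_module(name)
    # Not every package defines __version__, so fall back to a placeholder.
    version = getattr(package, "__version__", "unknown")
    rows = []
    # Only packages have __path__; iter_modules lists their direct children.
    for info in pkgutil.iter_modules(getattr(package, "__path__", [])):
        if not info.ispkg:
            continue
        full_name = f"{name}.{info.name}"
        try:
            module = importlib.import_module(full_name)
        except Exception:
            # Some subpackages need optional dependencies; skip those.
            continue
        # Collapse whitespace so each description fits on one line.
        doc = " ".join((module.__doc__ or "").split())
        rows.append((full_name, doc[:max_chars]))
    return version, rows


def print_report(name):
    """Print the version line followed by one line per subpackage."""
    version, rows = describe_package(name)
    print(name, "version", version)
    for full_name, summary in rows:
        print(full_name, summary)


if __name__ == "__main__":
    try:
        print_report("nltk")
    except ImportError:
        print("nltk is not installed")
```

Because describe_package takes the package name as a parameter, the same check can be pointed at any other installed library, for example print_report("xml").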