Research on Chinese Spam Filtering Algorithms Based on Bayesian Theory (基于贝叶斯理论的中文垃圾邮件过滤算法研究)
Lanzhou Jiaotong University Master's Thesis

Abstract

With the rapid popularization of the Internet, e-mail has become one of the primary means of communication. The flood of spam, however, has also attracted much attention: spam not only wastes users' time and energy and consumes large amounts of network bandwidth and storage, it also poses potential security problems for networks and information. Spam filtering is therefore a subject of considerable practical significance.

Content-based spam filtering is an important anti-spam technology, which at present relies mainly on word filtering, rule-based techniques, and statistical learning methods. The Naïve Bayes algorithm, which is based on probability and statistics, has been widely used in spam filtering for its simplicity, efficiency, and accuracy. However, it also has shortcomings: it cannot be applied well to Chinese e-mail filtering, it does not take the risk of misclassification into account, and it does not support incremental learning.

This thesis analyzes the differences between classifying English and Chinese e-mails and discusses Chinese e-mail pre-processing, including e-mail analysis, Chinese word segmentation, and feature selection. It then applies the Naïve Bayes algorithm to Chinese e-mail filtering.

Misclassifying legitimate mail as spam causes users a greater loss than the reverse error, but the traditional Bayesian algorithm does not take this difference into account. By introducing the idea of minimizing loss, a least-risk Bayesian algorithm is proposed; the algorithm can serve the user's purpose by adjusting the value of the loss weight.

Because it has little stored information, a Bayesian classifier can easily misclassify new e-mails, and if these incorrectly labeled e-mails are added to the classifier early, its performance will degrade. In addition, a traditional Bayesian classifier spends a great deal of time relearning all e-mails. To resolve these problems, an incremental learning algorithm based on user feedback is put forward. The algorithm builds on the least-risk Bayesian classifier, learns new samples in order to modify the classifier, and gives the calculation formula for incremental learning.

The algorithms proposed in this thesis are implemented in Java; the experiment obtains a set of preferable parameter…
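
The abstract does not give the thesis's own formulas, so the following is only a minimal Java sketch of the two ideas it names, a least-risk decision with a loss weight and an incremental update driven by user feedback. It assumes messages have already been segmented into word tokens, uses Laplace smoothing, and treats the loss weight as the cost of flagging legitimate mail relative to a cost of 1 for missing spam. The class name `LeastRiskBayesSketch`, its methods, and the sample tokens are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: all names and the exact formulas are illustrative
// assumptions, not taken from the thesis.
class LeastRiskBayesSketch {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int spamDocs, hamDocs, spamTokens, hamTokens;
    // Assumed loss weight: cost of flagging legitimate mail as spam,
    // relative to a cost of 1 for letting spam through.
    private final double lossWeight;

    LeastRiskBayesSketch(double lossWeight) {
        this.lossWeight = lossWeight;
    }

    // Incremental step: fold one labeled (or user-corrected) message into
    // the running counts, without relearning earlier messages.
    void learn(List<String> tokens, boolean spam) {
        Map<String, Integer> counts = spam ? spamCounts : hamCounts;
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            vocabulary.add(token);
        }
        if (spam) {
            spamDocs++;
            spamTokens += tokens.size();
        } else {
            hamDocs++;
            hamTokens += tokens.size();
        }
    }

    // log P(spam | tokens) - log P(ham | tokens), with Laplace smoothing.
    double logOdds(List<String> tokens) {
        int v = Math.max(vocabulary.size(), 1);
        double score = Math.log((spamDocs + 1.0) / (hamDocs + 1.0));
        for (String token : tokens) {
            double pSpam = (spamCounts.getOrDefault(token, 0) + 1.0) / (spamTokens + v);
            double pHam = (hamCounts.getOrDefault(token, 0) + 1.0) / (hamTokens + v);
            score += Math.log(pSpam / pHam);
        }
        return score;
    }

    // Least-risk rule: flag as spam only when
    // P(spam | tokens) > lossWeight * P(ham | tokens).
    boolean isSpam(List<String> tokens) {
        return logOdds(tokens) > Math.log(lossWeight);
    }

    public static void main(String[] args) {
        LeastRiskBayesSketch filter = new LeastRiskBayesSketch(1.0);
        filter.learn(Arrays.asList("free", "prize", "winner"), true);
        filter.learn(Arrays.asList("meeting", "schedule", "tomorrow"), false);

        List<String> mail = Arrays.asList("free", "prize");
        System.out.println(filter.isSpam(mail)); // prints "true"

        // User feedback: the message was legitimate, so learn it as such.
        filter.learn(mail, false);
        System.out.println(filter.isSpam(mail)); // prints "false"
    }
}
```

In this sketch a loss weight of 1 reduces to the ordinary Naïve Bayes decision, while a larger value makes the filter more reluctant to flag mail: with a loss weight of 9 the first test message above would not be flagged, because its posterior odds of 4 fall below the threshold.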