Improved Suffix Tree Clustering for Uyghur Text

Zhai Xian-min1*, Tian Sheng-wei2, Yu Long3, Feng Guan-jun4

1. College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China; 2. College of Software, Xinjiang University, Urumqi, Xinjiang 830008, China; 3. Network Center, Xinjiang University, Urumqi, Xinjiang 830046, China; 4. College of Humanities, Xinjiang University, Urumqi, Xinjiang 830046, China

Abstract: When suffix tree clustering (STC) selects base clusters, the base-cluster phrases can be irregular, repeated, or redundant. To address these problems, an improved suffix tree clustering algorithm is proposed. First, a phrase mutual information algorithm improves base-cluster selection by choosing base-cluster phrases that conform to Uyghur grammar. Second, a phrase merging algorithm based on Uyghur grammar merges the repeated base-cluster phrases that were selected. Third, building on the first two steps, a phrase redundancy-removal algorithm based on Uyghur grammar removes the redundant base-cluster phrases. Experimental results show that, compared with traditional STC, the improved algorithm achieves higher recall and precision, which indicates that it effectively improves clustering performance.

Keywords: Uyghur; suffix tree; mutual information; merging; redundancy
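As a rough illustration of the first step named in the abstract, the sketch below scores adjacent word pairs by pointwise mutual information (PMI) and keeps those above a threshold. The excerpt does not give the paper's actual formula, grammar rules, or threshold, so the PMI definition, the function names `bigram_pmi` and `select_phrases`, the pre-tokenized input, and the default threshold are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def bigram_pmi(documents):
    """Score each adjacent word pair by pointwise mutual information.

    `documents` is a list of token lists. PMI here is the textbook
    log2(p(x, y) / (p(x) * p(y))); this definition is an assumption,
    since the excerpt does not state the paper's formula.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in documents:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def select_phrases(documents, threshold=0.0):
    """Keep word pairs whose PMI exceeds a (hypothetical) threshold."""
    scores = bigram_pmi(documents)
    return sorted(pair for pair, score in scores.items() if score > threshold)
```

A pair whose words co-occur more often than chance receives a positive score. In a real pipeline the candidates would presumably also be checked against Uyghur grammar rules before being used as base-cluster phrases, which this sketch does not attempt.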