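Research on Sequential Pattern Mining and Its Applications

Abstract

Sequential pattern mining is an important research area in data mining. Mature sequential pattern mining algorithms currently fall into three main classes:
- Apriori-based algorithms that generate and test candidate sequences.
- Vertical-format-based algorithms that generate and test candidate sequences.
- Pattern-growth algorithms based on projected databases.

In recent years, applying sequential pattern mining in distributed environments has become an active research topic, and various algorithms have been proposed. This thesis introduces sequential pattern mining algorithms, their respective strengths and weaknesses, and their application in distributed environments. On this basis, it identifies a problem in how local pattern subtrees are transmitted between sites in a distributed environment.

The thesis proposes LMSP (leaf-based mining of sequential patterns), a sequential pattern mining method for distributed environments based on leaf-node transmission. While the global L2 sequential patterns are being generated, each site transmits its local L2 subtree by sending only the sequences of the subtree's leaf nodes, together with the support counts of all nodes. The polling site then reconstructs the local L2 subtree from the received subtree information. The thesis then briefly proposes transmitting a reduced tree structure, in which every node except the root records only its suffix sequence relative to its parent. Experimental results show that LMSP outperforms the FDMSP algorithm. Finally, the thesis briefly introduces practical applications of sequential pattern mining.

Keywords: data mining; sequential pattern; distributed algorithm; data transmission

Abstract

Sequential pattern mining is an important domain of data mining. There are currently three types of mature sequential pattern mining algorithms:
- Apriori-based algorithms using candidate sequence generate-and-test.
- Vertical-format-database-based algorithms using candidate sequence generate-and-test.
- Projected-database-based algorithms using pattern growth.

In recent years, mining sequential patterns in distributed environments has become a hot topic, and several algorithms have been proposed. This paper introduces the three types of sequential pattern mining algorithms with their advantages and disadvantages, followed by the applications of sequential pattern mining algorithms in distributed environments. From this, we identify a problem in transmitting local pattern subtrees from one site to another in a distributed environment.

We propose LMSP (leaf-based mining of sequential patterns), a leaf-based algorithm for distributed environments. When each site transmits its local L2 subtree to the polling site during the generation of the global L2 patterns, it sends only the leaf-node sequences and the support counts of all nodes of the subtree. At the polling site, the local L2 subtree is rebuilt from the received subtree message. We also briefly propose transmitting a reduced subtree, in which every node except the root records only its suffix relative to its parent instead of the entire sequence. Experiments show that LMSP outperforms the FDMSP algorithm. In the last part of this paper, we briefly introduce applications of sequential pattern mining.

Keywords: data mining; sequential pattern; distributed algorithm; data transmission

Contents
1. Introduction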
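The leaf-based transmission and the reduced subtree described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation and rests on several assumptions:
- A pattern is modeled as a tuple of items, ignoring itemsets inside sequence elements.
- Each node's sequence extends its parent's by exactly one item.
- The subtree has a single 1-sequence root.
- Sender and polling site agree on a preorder traversal with children sorted by item.

Under these assumptions, every node's sequence is a prefix of some leaf sequence. The polling site can therefore recover all nodes from the leaves alone and attach the support counts in traversal order. The names `Node`, `build_subtree`, `encode_leaves`, `decode_leaves`, `encode_reduced` and `decode_reduced` are hypothetical. So is the `(suffix, support, child_count)` layout of the reduced message; the actual LMSP message format may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: a pattern is a tuple of items; itemsets inside sequence elements are ignored.
Seq = tuple[str, ...]


@dataclass
class Node:
    """Hypothetical node of a local subtree: a pattern and its local support count."""
    seq: Seq
    support: int
    children: dict[str, Node] = field(default_factory=dict)


def build_subtree(supports: dict[Seq, int]) -> Node:
    """Build the tree; a node's parent is its sequence minus the last item.

    Assumes exactly one 1-sequence (the root) and that every prefix is present.
    """
    ordered = sorted(supports, key=len)
    root = Node(ordered[0], supports[ordered[0]])
    index = {root.seq: root}
    for seq in ordered[1:]:
        node = Node(seq, supports[seq])
        index[seq[:-1]].children[seq[-1]] = node
        index[seq] = node
    return root


def encode_leaves(root: Node) -> tuple[list[Seq], list[int]]:
    """Sending site: full sequences of leaves only, plus every node's support in preorder."""
    leaves: list[Seq] = []
    counts: list[int] = []

    def visit(node: Node) -> None:
        counts.append(node.support)
        if not node.children:
            leaves.append(node.seq)
        for item in sorted(node.children):  # agreed order: children sorted by item
            visit(node.children[item])

    visit(root)
    return leaves, counts


def decode_leaves(leaves: list[Seq], counts: list[int]) -> Node:
    """Polling site: every node is a prefix of some leaf, so rebuild nodes from prefixes."""
    seqs = {leaf[:k] for leaf in leaves for k in range(1, len(leaf) + 1)}
    supports = iter(counts)

    def make(seq: Seq) -> Node:
        node = Node(seq, next(supports))  # counts arrive in the same preorder
        for item in sorted(s[-1] for s in seqs if len(s) == len(seq) + 1 and s[:-1] == seq):
            node.children[item] = make(seq + (item,))
        return node

    (root_seq,) = [s for s in seqs if len(s) == 1]  # exactly one 1-sequence root assumed
    return make(root_seq)


def encode_reduced(root: Node) -> list[tuple[Seq, int, int]]:
    """Reduced tree: the root keeps its full sequence, every other node only its suffix.

    The child count in each entry is an added assumption for recovering the tree shape.
    """
    out: list[tuple[Seq, int, int]] = []

    def visit(node: Node, parent_len: int) -> None:
        out.append((node.seq[parent_len:], node.support, len(node.children)))
        for item in sorted(node.children):
            visit(node.children[item], len(node.seq))

    visit(root, 0)
    return out


def decode_reduced(entries: list[tuple[Seq, int, int]]) -> Node:
    """Rebuild each node's full sequence as parent sequence + recorded suffix."""
    stream = iter(entries)

    def make(prefix: Seq) -> Node:
        suffix, support, n_children = next(stream)
        node = Node(prefix + suffix, support)
        for _ in range(n_children):
            child = make(node.seq)
            node.children[child.seq[-1]] = child
        return node

    return make(())


if __name__ == "__main__":
    tree = build_subtree({("a",): 9, ("a", "b"): 6, ("a", "c"): 4, ("a", "c", "d"): 3})
    leaves, counts = encode_leaves(tree)
    print(leaves)  # [('a', 'b'), ('a', 'c', 'd')]
    print(counts)  # [9, 6, 4, 3]
    assert decode_leaves(leaves, counts) == tree
    assert decode_reduced(encode_reduced(tree)) == tree
```

In this sketch, sending only leaves avoids repeating the prefixes that internal nodes share with their descendants. The preorder convention is what lets a bare list of counts be matched back to nodes. The reduced variant goes further by sending each non-root node's one-item suffix. Here that costs an explicit child count per entry to recover the tree shape, which is a design choice of this sketch rather than something the abstract specifies.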