资源描述:
《text classification using string kernelsnew》由会员上传分享,免费在线阅读,更多相关内容在教育资源-天天文库。
1、TextClassificationusingStringKernelsHumaLodhiJohnShawe-TaylorNelloCristianiniChrisWatkinsDepartmentofComputerScienceRoyalHolloway,UniversityofLondonEgham,SurreyTW200EX,UKfhuma,john,nello,chriswg@dcs.rhbnc.ac.ukJune,2000AbstractWeintroduceanovelkernelforcompa
2、ringtwotextdocuments.Thekernelisaninnerproductinthefeaturespaceconsistingofallsubsequencesoflengthk.Asubsequenceisanyorderedsequenceofkcharactersoccur-ringinthetextthoughnotnecessarilycontiguously.Thesubsequencesareweightedbyanexponentiallydecayingfactorofth
3、eirfulllengthinthetext,henceemphasisingthoseoccurrenceswhichareclosetocontiguous.Adi-rectcomputationofthisfeaturevectorwouldinvolveaprohibitiveamountofcomputationevenformodestvaluesofk,sincethedimensionofthefeaturespacegrowsexponentiallywithk.Thepaperdescrib
4、eshowdespitethisfacttheinnerproductcanbeefficientlyevaluatedbyadynamicprogrammingtechnique.Apreliminaryexperimentalcomparisonoftheperformanceofthekernelcomparedwithastandardwordfeaturespacekernel[4]ismadeshowingencouragingresults.1IntroductionStandardlearning
5、systems(likeneuralnetworksordecisiontrees)operateonin-putdataaftertheyhavebeentransformedintofeaturevectorsx;::;x2Xfrom1`ReceivedMay15,20001anndimensionalspace.Therearecases,however,wheretheinputdatacannotbereadilydescribedbyexplicitfeaturevectors:forexampl
6、ebiosequences,images,graphsandtextdocuments.Forsuchdatasets,theconstructionofafeatureextrac-tionmodulecanbeascomplexandexpensiveassolvingtheentireproblem.Aneffectivealternativetoexplicitfeatureextractionisprovidedbykernelmethods.Kernel-basedlearningmethodsus
7、eanimplicitmappingoftheinputdataintoahighdimensionalfeaturespacedefinedbyakernelfunction,i.e.afunctionreturningtheinnerproductbetweentheimagesoftwodatapointsinthefeaturespace.Thelearningthentakesplaceinthefeaturespace,providedthelearningalgorithmcanbeentirely
8、rewrittensothatthedatapointsonlyappearinsidedotproductswithotherdatapoints.Severallinearalgorithmscanbeformulatedinthisway,forclustering,classi-ficationandregression.Themosttypicalexampleofkernel