An Adaptive k-Nearest Neighbor Text Categorization Strategy

LI BAOLI, Peking University
LU QIN, The Hong Kong Polytechnic University
and
YU SHIWEN, Peking University
________________________________________________________________________

k is the most important parameter in a text categorization system based on the k-nearest neighbor algorithm (kNN). To classify a new document, the k-nearest documents in the training set are determined first. The prediction of categories for this document can then be made according to the category distribution among the k nearest neighbors. Generally speaking, the class distribution in a training set is not even; some classes may have more samples than others. The system’s performance is very sensitive to the choice of the parameter k. And it is very likely that a fixed k value will result in a bias for large categories, and will not make full use of the information in the training set. To deal with these problems, an improved kNN strategy, in which different numbers of nearest neighbors for different categories are used instead of a fixed number across all categories, is proposed in this article. More samples (nearest neighbors) will be used to decide whether a test document should be classified in a category that has more samples in the training set. The numbers of nearest neighbors selected for different categories are adaptive to their sample size in the training set. Experiments on two different datasets show that our methods are less sensitive to the parameter k than the traditional ones, and can properly classify documents belonging to smaller classes with a large k. The strategy is especially applicable and promising for cases where estimating the parameter k via cross-validation is not possible and the class distribution of a training set is skewed.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Information filtering; H.3.4 [Information Storage and Retrieval]: Systems and Software – Performance evaluation (efficiency and effectiveness); I.2.6 [Artificial Intelligence]: Learning – Analogies; I.5.1 [Pattern Recognition]: Models – Statistical; I.5.4 [Pattern Recognition]: Applications – Text processing;

General Terms: Algorithms, Experimentation, Measurement, Performance

Additional Key Words and
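
The idea described in the abstract, using more nearest neighbors for categories with more training samples, can be illustrated with a small Python sketch. This is not the authors' implementation: the excerpt does not state the exact rule, so the per-category neighbor count used here (k scaled by the category's size relative to the largest category, rounded up, at least 1) and the similarity-weighted score are assumptions made purely for illustration. The function names `cosine` and `knn_scores` and the toy data are likewise hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_scores(query, train, k, adaptive=True):
    """Score each category for `query`.

    train: list of (term-weight dict, category label) pairs.
    Returns {label: share of similarity mass held by that label
             among the neighbors inspected for that label}.
    """
    sizes = Counter(label for _, label in train)
    largest = max(sizes.values())
    # Rank every training document by similarity, most similar first.
    ranked = sorted(
        ((cosine(query, vec), label) for vec, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    scores = {}
    for label, n in sizes.items():
        if adaptive:
            # ASSUMPTION (not taken from the paper): neighbor count is
            # ceil(k * n / largest), with a floor of one neighbor.
            k_c = max(1, (k * n + largest - 1) // largest)
        else:
            k_c = k  # traditional kNN: same k for every category
        top = ranked[:k_c]
        total = sum(sim for sim, _ in top)
        hits = sum(sim for sim, lab in top if lab == label)
        scores[label] = hits / total if total else 0.0
    return scores


# Toy skewed training set: six documents in "A", two in "B".
train = [({"y": 1.0, "x": 0.5}, "A") for _ in range(6)]
train += [({"x": 1.0}, "B"), ({"x": 1.0, "y": 0.1}, "B")]
query = {"x": 1.0}

fixed = knn_scores(query, train, k=8, adaptive=False)
adapted = knn_scores(query, train, k=8, adaptive=True)
print("fixed k:   ", max(fixed, key=fixed.get), fixed)
print("adaptive k:", max(adapted, key=adapted.get), adapted)
```

On this toy data with k = 8, the fixed-k rule favors the larger category A even though the two most similar documents belong to B, while the adaptive rule inspects only three neighbors for B and ranks B first. The example is meant only to show the kind of large-category bias the abstract refers to; it says nothing about the experimental results reported in the article.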