SIAM REVIEW
© 1995 Society for Industrial and Applied Mathematics
Vol. 37, No. 4, pp. 573-595, December 1995
005

USING LINEAR ALGEBRA FOR INTELLIGENT INFORMATION RETRIEVAL*

MICHAEL W. BERRY†, SUSAN T. DUMAIS‡, AND GAVIN W. O'BRIEN†

Abstract. Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users' requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents by determining the SVD of large sparse term by document matrices. Terms and documents represented by 200-300 of the largest singular vectors are then matched against user queries. We call this retrieval method latent semantic indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely automatic yet intelligent indexing method, widely applicable, and a promising way to improve users' access to many kinds of textual materials, or to documents and services for which textual descriptions are available. A survey of the computational requirements for managing LSI-encoded databases as well as current and future applications of LSI is presented.

Key words. indexing, information, latent, matrices, retrieval, semantic, singular value decomposition, sparse, updating

AMS subject classifications. 15A18, 15A48, 65F15, 65F50, 68P20

1. Introduction. Typically, information is retrieved by literally matching terms in documents with those of a query. However, lexical matching methods can be inaccurate when they are used to match a user's query. Since there are usually many ways to express a given concept (synonymy), the literal terms in a user's query may not match those of a relevant document. In addition, most words have multiple meanings (polysemy), so terms in a user's query will literally match terms in irrelevant documents. A better approach would allow users to retrieve information on the basis of a conceptual topic or meaning of a document. Latent semantic indexing (LSI) [4] trie