Journal of Artificial Intelligence Research 37 (2010) 141-188
Submitted 10/09; published 02/10

From Frequency to Meaning: Vector Space Models of Semantics

Peter D. Turney
peter.turney@nrc-cnrc.gc.ca
National Research Council Canada
Ottawa, Ontario, Canada, K1A 0R6

Patrick Pantel
me@patrickpantel.com
Yahoo! Labs
Sunnyvale, CA, 94089, USA

Abstract

Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.

1. Introduction

One of the biggest obstacles to making full use of the power of computers is that they currently understand very little of the meaning of human language. Recent progress in search engine technology is only scratching the surface of human language, and yet the impact on society and the economy is already immense. This hints at the transformative impact that deeper semantic technologies will have. Vector space models (VSMs), surveyed in this paper, are likely to be a part of these new semantic technologies.

In this paper, we use the term semantics in a general sense, as the meaning of a word, a phrase, a sentence, or any text in human language, and the study of such meaning. We are not concerned with narrower senses of semantics, such as the semantic web or approaches to semantics based on formal logic. We present a survey of VSMs and their relation with the distributional hypothesis as an approach to representing some aspects of natural language semantics. The