Foundations and Trends® in Information Retrieval
Vol. 2, No. 3 (2008) 137–213
© 2008 C. Zhai
DOI: 10.1561/1500000008

Statistical Language Models for Information Retrieval: A Critical Review

ChengXiang Zhai
University of Illinois at Urbana-Champaign, 201 N. Goodwin, Urbana, IL 61801, USA, czhai@cs.uiuc.edu

Abstract

Statistical language models have recently been successfully applied to many information retrieval problems. A great deal of recent work has shown that statistical language models not only lead to superior empirical performance, but also facilitate parameter tuning and open up possibilities for modeling nontraditional retrieval problems. In general, statistical language models provide a principled way of modeling various kinds of retrieval problems. The purpose of this survey is to systematically and critically review the existing work in applying statistical language models to information retrieval, summarize their contributions, and point out outstanding challenges.

1 Introduction

The goal of an information retrieval (IR) system is to rank documents optimally given a query so that relevant documents would be ranked above nonrelevant ones. In order to achieve this goal, the system must be able to score documents so that a relevant document would ideally have a higher score than a nonrelevant one. Clearly the retrieval accuracy of an IR system is directly determined by the quality of the scoring function adopted. Thus, not surprisingly, seeking an optimal scoring function (retrieval function) has always been a major research challenge in information retrieval. A retrieval function is based on a retrieval model, which formalizes the notion of relevance and enables us to derive a retrieval function that can be computed to score and rank documents.

Over the decades, many different types of retrieval models have been proposed and tested. A great diversity of approaches and methodology has developed, but no single unified retrieval model has proven to be most effective. Indeed, finding the single optimal retrieval model has been and remains a long-standing challenge in information retrieval research.

The field has progressed in two different ways. On the one hand, theoretical models have been proposed often to model relevance through inferences; representative models include the logic models [27,1