CS276A Text Information Retrieval, Mining, and Exploitation
Lecture 7
24 Oct 2002

Standard Probabilistic IR
[Figure: information need, query, and documents d1, d2, …, dn in the document collection; label: matching]

IR based on LM
[Figure: information need, query, and documents d1, d2, …, dn in the document collection; label: generation]

One night in a hotel, I saw this late night talk show where Sergey Brin popped on suggesting the web search tip that you should think of some words that would likely appear on pages that would answer your question and use those as your search terms. Let's exploit that idea!

Formal Language (Model)
Traditional generative model: generates strings
Finite state machines or regular grammars, etc.
Example:
    I wish
    I wish I wish
    I wish I wish I wish
    I wish I wish I wish I wish
    …
    *wish I wish

Stochastic Language Models
Models probability of generating strings in the language (commonly all strings over ∑)

    Model M
    0.2     the
    0.1     a
    0.01    man
    0.01    woman
    0.03    said
    0.02    likes
    …

    the     man     likes   the     woman
    0.2     0.01    0.02    0.2     0.01      (multiply)

$P(s \mid M) = 0.00000008$

Stochastic Language Models
Model probability of generating any string

    Model M1              Model M2
    0.2     the           0.2     the
    0.01    class         0.0001  class
    0.0001  sayst         0.03    sayst
    0.0001  pleaseth      0.02    pleaseth
    0.0001  yon           0.1     yon
    0.0005  maiden        0.01    maiden
    0.01    woman         0.0001  woman

            maiden  class   pleaseth  yon     the
    M1:     0.0005  0.01    0.0001    0.0001  0.2
    M2:     0.01    0.0001  0.02      0.1     0.2

$P(s \mid M_2) > P(s \mid M_1)$

Stochastic Language Models
A statistical model for generating text
Probability distribution over strings in a given language

$P(w_1 w_2 w_3 w_4 \mid M) = P(w_1 \mid M)\,P(w_2 \mid M, w_1)\,P(w_3 \mid M, w_1 w_2)\,P(w_4 \mid M, w_1 w_2 w_3)$

Unigram and higher-order models
$P(w_1 w_2 w_3 w_4) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2)\,P(w_4 \mid w_1 w_2 w_3)$
Unigram Language Models
    $P(w_1)\,P(w_2)\,P(w_3)\,P(w_4)$
    Easy. Effective!
Bigram (generally, n-gram) Language Models
    $P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_2)\,P(w_4 \mid w_3)$
Other Language Models
    Grammar-based models (PCFGs), etc.
    Probably not the first thing to try in IR

Using Language Models in IR
Treat each document
as the basis for a model (e.g., unigram sufficient statistics)
Rank document d based on $P(d \mid q)$
$P(d \mid q) = P(q \mid d) \times P(d) / P(q)$
    P(q) is the same for all documents, so ignore
    P(d) [the prior] is often treated as the same for all d
        But we could use criteria like authority, length, genre
    P(q | d) is the probability of q given d's model
Very general formal approach

The fundamental problem of LMs
Usually we don't know the model M
    But have a sample of text representative of that model
Estimate a language model from a sample
Then compute the observation probability $P(\cdot \mid M(\cdot))$
MLa
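
The unigram scoring and ranking steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lecture's own implementation: the model is estimated by relative frequency (a maximum-likelihood estimate, which the slides do not name explicitly), the prior P(d) is taken to be uniform, and the function names and the two-document collection are hypothetical. The first check reuses the word probabilities of Model M from the slides.

```python
from collections import Counter
from math import prod


def string_probability(tokens, model):
    """P(s | M) under a unigram model: multiply the per-word probabilities.

    Words missing from the model get probability 0.0, so any string
    containing one scores zero.
    """
    return prod(model.get(tok, 0.0) for tok in tokens)


def estimate_unigram(sample_tokens):
    """Assumption: estimate the model by relative frequency in the sample."""
    counts = Counter(sample_tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def rank_by_query_likelihood(query_tokens, docs):
    """Order document ids by P(q | d), highest first.

    Assumption: P(d) is uniform and P(q) is constant across documents,
    so ranking by P(q | d) gives the same order as ranking by P(d | q).
    """
    scores = {
        doc_id: string_probability(query_tokens, estimate_unigram(tokens))
        for doc_id, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Model M from the slides (only the words listed there).
slide_model = {"the": 0.2, "a": 0.1, "man": 0.01,
               "woman": 0.01, "said": 0.03, "likes": 0.02}
p_slide = string_probability("the man likes the woman".split(), slide_model)

# Hypothetical toy collection, not taken from the lecture.
docs = {
    "d1": "the man likes the woman".split(),
    "d2": "the woman said the man likes a maiden".split(),
}
ranking = rank_by_query_likelihood("man likes".split(), docs)
```

Here `p_slide` reproduces the slide's product of 0.2, 0.01, 0.02, 0.2 and 0.01, and `ranking` places `d1` first because the two query words make up a larger share of that shorter document.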