Organizing Encyclopedic Knowledge based on the Web and its Application to Question Answering

Atsushi Fujii
University of Library and Information Science
1-2 Kasuga, Tsukuba 305-8550, Japan
CREST, Japan Science and Technology Corporation
fujii@ulis.ac.jp

Tetsuya Ishikawa
University of Library and Information Science
1-2 Kasuga, Tsukuba 305-8550, Japan
ishikawa@ulis.ac.jp

Abstract

We propose a method to generate large-scale encyclopedic knowledge, which is valuable for much NLP research, based on the Web. We first search the Web for pages containing a term in question. Then we use linguistic patterns and HTML structures to extract text fragments describing the term. Finally, we organize extracted term descriptions based on word senses and domains. In addition, we apply an automatically generated encyclopedia to a question answering system targeting the Japanese Information-Technology Engineers Examination.

1 Introduction

Reflecting the growth in utilization of the World Wide Web, a number of Web-based language processing methods have been proposed within the natural language processing (NLP), information retrieval (IR) and artificial intelligence (AI) communities. A sample of these includes methods to extract linguistic resources (Fujii and Ishikawa, 2000; Resnik, 1999; Soderland, 1997), retrieve useful information in response to user queries (Etzioni, 1997; McCallum et al., 1999) and mine/discover knowledge latent in the Web (Inokuchi et al., 1999).

In this paper, mainly from an NLP point of view, we explore a method to produce linguistic resources. Specifically, we enhance the method proposed by Fujii and Ishikawa (2000), which extracts encyclopedic knowledge (i.e., term descriptions) from the Web.

In brief, their method searches the Web for pages containing a term in question, and uses linguistic expressions and HTML layouts to extract fragments describing the term. They also use a language model to discard non-linguistic fragments. In addition, a clustering method is used to divide descriptions into a specific number of groups.

On the one hand, their method is expected to enhance existing encyclopedias, where vocabulary size is relatively limited, and therefore the quantity problem has been resolved.

On the other hand, encyclopedias extracted from the Web are not comparable with existing ones in terms of quality. In hand-crafted encyclopedias, term descriptions are carefully organized based on domains and word senses, which are especially effective for human usage. However, the output of Fujii's method is simply a set of unorganized term descriptions. Although clustering is optionally performed, resultant clusters are not necessarily related to explicit criteria, such as word senses and domains.

To sum up, our belief is that by combining extraction and organization methods, we can enhance both quantity and quality of Web-based encyclopedias.

Motivated by this background, we introduce an organization model to Fujii's method and reformalize the whole framework. In other words, our proposed method is not only extraction but generation of encyclopedic knowledge.

Section 2 explains the overall design of our encyclopedia generation system, and Section 3 elaborates on our organization model. Section 4 then explores a method for applying our resultant encyclopedia to NLP research, specifically, question answering. Section 5 performs a number of experiments to evaluate our methods.

2 System Design

2.1 Overview

Figure 1 depicts the overall design of our system, which generates an encyclopedia for input terms. Our system, which is currently implemented for Japanese, consists of three modules: “retrieval,” “extraction” and “organization,” among which the organization module is newly introduced in this paper. In principle, the remaining two modules (“retrieval” and “extraction”) are the same as proposed by Fujii and Ishikawa (2000).

In Figure 1, terms can be submitted either on-line or off-line. A reasonable method is that while the system periodically updates the encyclopedia off-line, terms unindexed in the encyclopedia are dynamically processed in real-time usage. In either case, our system processes input terms one by one.

We briefly explain each module in the following three sections, respectively.

Figure 1: The overall design of our Web-based encyclopedia generation system.

The first rule is based on Japanese linguistic patterns typically used for term descriptions, such as “X toha Y dearu (X is Y).” Following the method proposed by Fujii and Ishikawa (2000), we semi-automatically produced 20 patterns based on the Japanese CD-ROM World Encyclopedia (Heibonsha, 1998), which includes approximately 80,000 entries related to various fields. It is expected that a region including the sentence that matched with one of those patterns can be a term description.

The second rule is based on HTML layout. In a typical case, a term in question is highlighted as a heading with tags such as

3. itemization tagged with
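
The first extraction rule described above (matching a definitional pattern such as “X toha Y dearu”) can be illustrated with a minimal sketch. It assumes a single hypothetical regular expression that writes the pattern in Japanese script as 「XとはYである」. The 20 patterns the authors produced are not listed in this text, so the regex, the function names and the naive sentence splitting are all illustrative assumptions rather than the authors' implementation.

```python
import re
from typing import List


def make_pattern(term: str) -> "re.Pattern[str]":
    # Hypothetical stand-in for one of the paper's 20 patterns:
    # "<term> toha <description> dearu", i.e. "<term> is <description>".
    return re.compile(re.escape(term) + r"とは(.+?)である")


def extract_descriptions(term: str, text: str) -> List[str]:
    """Return the sentences of `text` that match the definitional pattern."""
    pattern = make_pattern(term)
    # Naive split on the Japanese full stop (an assumption for illustration).
    sentences = [s for s in text.split("。") if s]
    # Keep the whole matching sentence, since the paper treats the region
    # including the matched sentence as a candidate term description.
    return [s + "。" for s in sentences if pattern.search(s)]


if __name__ == "__main__":
    page = "XMLとはマークアップ言語である。今日は晴れです。"
    for description in extract_descriptions("XML", page):
        print(description)
```

In a fuller system the list of patterns would be iterated over, and the matched region would probably extend beyond a single sentence; both points are left out here to keep the sketch short.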