Journal of Computer Research and Development (计算机研究与发展), ISSN 1000-1239, CN 11-1777/TP, 45(7): 1221-1231, 2008

Semantic Annotation of Chinese Web Pages: From Sentences to RDF Representations

Jing Tao¹,², Zuo Wanli¹, Sun Jigui¹, and Che Haiyan¹

¹ (College of Computer Science and Technology, Jilin University, Changchun 130012)
² (Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012)
(jingtaocst@email.jlu.edu.cn)

Abstract  The Semantic Web aims to turn the World Wide Web into a Web of data, where machines are able to process annotations and relations between resources, and where implicit information can be derived by utilizing ontologies and shared vocabularies. To fulfill the vision of the Semantic Web, a method of automatic semantic annotation is needed. This paper proposes a methodology for the semantic annotation of Chinese Web pages that is guided by a domain ontology. Statistical methods and natural language processing techniques are employed, and the mapping from sentences to RDF representations is realized through an identification phase and a grouping phase. The major technical contributions are: a domain lexicon constructed by statistical methods, rather than a linguistic ontology, is used as the external domain knowledge; an explicit property type tagging algorithm is used to recognize both the instances and the properties contained in sentences, which facilitates relation extraction; and, after the dependency trees or dependency forests of the sentences are built, the identified instances and properties are grouped into RDF statements according to the dependency relationships among Chinese words. Experimental results show that this method is significantly more effective than a semantic annotation method based on the subject-verb-object grammatical relationship.

Keywords  natural language processing; dependency relationship; type tagging; relation extraction; ontology

Abstract (Chinese version)  Realizing the vision of the Semantic Web requires automated semantic annotation methods. This paper proposes a semantic annotation method for Chinese Web pages that is guided by a domain ontology. Using statistical methods and natural language processing techniques, it takes the sentences of a document as its processing objects and maps sentences to RDF representations in two phases, identification and grouping. It has the following characteristics: domain-related vocabulary is obtained by statistical methods and used to construct a domain vocabulary tagging list that serves as external domain knowledge, which reduces the dependence on general-purpose linguistic ontologies; an explicit property type tagging method identifies the words in a sentence that express relations