To Appear in ACL '94, Las Cruces, NM

MULTI-PARAGRAPH SEGMENTATION OF EXPOSITORY TEXT

Marti A. Hearst
Computer Science Division, 571 Evans Hall
University of California, Berkeley
Berkeley, CA 94720
and
Xerox Palo Alto Research Center
marti@cs.berkeley.edu

Abstract

This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to produce segmentation that corresponds well to human judgments of the major subtopic boundaries of thirteen lengthy texts.

INTRODUCTION

The structure of expository texts can be characterized as a sequence of subtopical discussions that occur in the context of a few main topic discussions. For example, a popular science text called Stargazers, whose main topic is the existence of life on earth and other planets, can be described as consisting of the following subdiscussions (numbers indicate paragraph numbers):

... demarcation. This paper presents fully-implemented algorithms that use lexical cohesion relations to partition expository texts into multi-paragraph segments that reflect their subtopic structure. Because the model of discourse structure is one in which text is partitioned into contiguous, nonoverlapping blocks, I call the general approach TextTiling. The ultimate goal is to not only identify the extents of the subtopical units, but to label their contents as well. This paper focusses only on the discovery of subtopic structure, leaving determination of subtopic content to future work.

Most discourse segmentation work is done at a finer granularity than that suggested here. However, for lengthy written expository texts, multi-paragraph segmentation has many potential uses, including the improvement of computational tasks that make use of distributional information. For example, disambiguation algorithms that train on arbitrary-size text windows, e.g., Yarowsky (1992) and Gale et al. (1992b), and algorithms that use lexical co-occurrence to determine semantic relatedness, e.g., Schütze (1993), might benefit from using w