Extracting Statistical Data Frames from Text

Jisheng Liang, Krzysztof Koperski, Thien Nguyen, and Giovanni Marchisio
Insightful Corporation
1700 Westlake Ave N, Suite 500
Seattle, WA 98109
[jliang, krisk, thien, giovanni]@insightful.com

ABSTRACT
We present a framework that bridges the gap between natural language processing (NLP) and text mining. Central to this is a new approach to text parameterization that captures many interesting attributes of text usually ignored by standard indices, like the term-document matrix. By storing NLP tags, the new index supports a higher degree of knowledge discovery and pattern finding from text. The index is relatively compact, enabling dynamic search of arbitrary relationships and events in large document collections. We can export search results in formats and data structures that are transparent to statistical analysis tools like S-PLUS®. In a number of experiments, we demonstrate how this framework can turn mountains of unstructured information into informative statistical graphs.

Keywords
Text mining, natural language processing, NLP, statistical data frames, data analysis, visualization

1. INTRODUCTION
[...] a time and be rather computationally expensive. It includes techniques like lexical analysis, multiword phrase grouping, sense disambiguation, part-of-speech tagging, anaphora resolution, and role determination. The arguments against NLP are that it is error-prone, and NLP output (i.e. parse trees) contains too much linguistic detail, noise and uncertainty to provide a working knowledge base for data analysis or mining. Failure to account for semantic and syntactic variations across a document collection has led to disappointing results when trying to use fine indexing structures derived from a linguistic parser.

Information Extraction (IE) is a methodology employed as a precursor to text mining especially in bioinformatics [4][5]. IE applies NLP techniques to extract predefined sets of entities, relationships, and patterns of interest from documents. IE systems, like those developed in the MUC [6] and ACE [7] programs, are limited in their power of information discovery. First, they employ pre-determined templates or rule sets. Second, they do not index everything in a corpus, but only what they are preprogrammed to find. The aim of text mining should be to find