Adaptive Filtering for Efficient Record Linkage

Lifang Gu
CSIRO ICT Centre
GPO Box 664
Canberra ACT 2601, Australia
Lifang.Gu@csiro.au

Rohan Baxter
CSIRO ICT Centre
GPO Box 664
Canberra ACT 2601, Australia
Rohan.Baxter@csiro.au

Abstract

The process of identifying record pairs that represent the same real-world entity in multiple databases, commonly known as record linkage, is one of the important initial steps in many data mining applications. Record linkage of millions of records is a computationally expensive task. Various blocking methods have been used in record linkage systems to reduce the number of record pairs for comparison. A good blocking key is critical to the success of a blocking method and will ideally result in a lot of small blocks. However, in practice, there are almost always large blocks no matter how good the blocking key is. For example, when blocking on surname for an Anglo-Celtic population, Smith and Taylor are populous and result in very large block sizes. The efficiency of a blocking method is hindered by these large blocks since the resulting number of record pairs is dominated by the sizes of these large blocks. In this paper, we present a filtering algorithm to post-process large blocks to enhance the blocking efficiency. Experimental results show that our filtering algorithm can reduce the number of record pairs produced by the standard blocking method by 88% on a small real-world data set. The algorithm also reduces the number of record pairs generated by a 3-pass standard blocking method by 50% on several synthetic test data

Figure 1: Information flow diagram of a record linkage system. (Diagram labels: Data, Records, Standardisation, Blocking/Searching, Record Pairs, Comparison, Comparison Vectors, Decision Model, Matching Status, Measurement.)

and government administration. These applications can be classed as ‘administrative’, because record linkage is used to make decisions and take actions regarding an individual entity.

In many data mining projects it is necessary to collate information about an entity from more than one data source. If a unique identifier or key of the entity of interest is available in all of the data sources to be linked, conventional ‘join’ operations can be used for record linkage, which assumes error-free identifying fields and links records that exactly match on these identifying fie
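
The abstract's point that large blocks dominate the number of record pairs can be illustrated with a small sketch. This shows only standard blocking on a surname key within a single data set; it is not the filtering algorithm of the paper, which is not described in this excerpt. The toy records, the block sizes, and the helper names block_records, pair_count and candidate_pairs are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def block_records(records, key):
    """Group records into blocks that share the same blocking-key value."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key(record)].append(record)
    return dict(blocks)


def pair_count(block_size):
    """Number of unordered record pairs among n records: n * (n - 1) / 2."""
    return block_size * (block_size - 1) // 2


def candidate_pairs(blocks):
    """Yield every within-block record pair, i.e. the pairs passed on for comparison."""
    for members in blocks.values():
        yield from combinations(members, 2)


# Hypothetical toy data (record id, surname): two populous surnames
# and 25 rare surnames with two records each.
records = (
    [(i, "Smith") for i in range(100)]
    + [(100 + i, "Taylor") for i in range(50)]
    + [(150 + i, f"Rare{i // 2}") for i in range(50)]
)

blocks = block_records(records, key=lambda r: r[1])
pairs_per_block = {name: pair_count(len(members)) for name, members in blocks.items()}

total_pairs = sum(pairs_per_block.values())  # pairs left after blocking
all_pairs = pair_count(len(records))         # pairs without any blocking
large_share = (pairs_per_block["Smith"] + pairs_per_block["Taylor"]) / total_pairs

print(f"pairs without blocking: {all_pairs}")
print(f"pairs with blocking:    {total_pairs}")
print(f"share from the two large blocks: {large_share:.1%}")
```

Because the pair count grows quadratically with block size, the two large blocks in this toy example account for nearly all remaining comparisons even though they are only 2 of 27 blocks, which is the situation a post-processing step on large blocks is meant to address.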