Journal of Computer Research and Development (计算机研究与发展), Vol. 39, No. 7, July 2002

A Novel Method Based on Integrating Characters with Words for Contextual Processing of Chinese Character Recognition

LI Yuan-xiang 1,2, DING Xiao-qing 1, and WU You-shou 1
1 (State Key Laboratory of Intelligent Technology and Systems, Department of Electronic Engineering, Tsinghua University, Beijing 100084)
2 (Meteorology College, PLA University of Science and Technology, Nanjing 211101)
(Lyx@ocrserv.ee.tsinghua.edu.cn)

Abstract  Exploiting the complementarity between Chinese characters and Chinese words, a novel contextual processing method is put forward that integrates a character-based language model with a word-based language model. On the basis of isolated character recognition, character-based bigram post-processing using forward-backward search is first executed on large candidate sets, which improves both the recognition rate of the document (RRD) and the efficiency of the candidate sets (the accumulated recognition rate of the top ten candidates is greatly boosted). Then, word-based bigram post-processing is executed on small candidate sets to further improve the RRD. The method effectively improves the RRD while keeping processing speed in check. Experimental results on off-line handwritten Chinese documents (about 66,000 characters) demonstrate its effectiveness: character-based bigram post-processing raises the RRD from 81.58% to 94.50%, and the accumulated recognition rate of the top ten candidates rises from 94.33% to 98.25%. Word-based bigram post-processing then raises the RRD further to 95.75%.

Key words  Chinese character recognition, language model, contextual processing, forward-backward search algorithm, candidate set efficiency

CLC number  TP391

Manuscript received: 2001-05-15; revised manuscript received: 2002-03-14
This work was supported by the National Natural Science Foundation of China (
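To make the character-level stage concrete, the following is a minimal sketch of one way to re-rank candidate sets with a character bigram model, using a forward pass and a backward pass over the candidate lattice. It is an illustration under stated assumptions and not the authors' implementation: the function names `rerank_candidates` and `prune`, the log-score inputs, the constant `floor` used for unseen bigrams, and the toy data are all hypothetical. The word-based bigram stage is not shown.

```python
import math


def rerank_candidates(lattice, bigram_logp, floor=math.log(1e-6)):
    """Re-rank each candidate by the best full-path score passing through it.

    lattice:     list of positions; each position is a list of
                 (char, log_score) pairs from an isolated-character recognizer.
    bigram_logp: dict mapping (prev_char, char) -> log P(char | prev_char).
    floor:       log-probability for unseen bigrams (an assumed, crude
                 stand-in for proper smoothing).
    """
    def lp(prev, cur):
        return bigram_logp.get((prev, cur), floor)

    n = len(lattice)

    # Forward pass: best score of any partial path ending at each candidate.
    fwd = [[score for _, score in lattice[0]]]
    for t in range(1, n):
        row = []
        for cur, score in lattice[t]:
            best_prev = max(fwd[t - 1][i] + lp(prev, cur)
                            for i, (prev, _) in enumerate(lattice[t - 1]))
            row.append(best_prev + score)
        fwd.append(row)

    # Backward pass: best score of any continuation after each candidate.
    bwd = [[0.0] * len(lattice[t]) for t in range(n)]
    for t in range(n - 2, -1, -1):
        for i, (cur, _) in enumerate(lattice[t]):
            bwd[t][i] = max(lp(cur, nxt) + score + bwd[t + 1][j]
                            for j, (nxt, score) in enumerate(lattice[t + 1]))

    # fwd + bwd is the score of the best complete path through a candidate,
    # so sorting by it reorders the whole candidate set, not only the top choice.
    reranked = []
    for t in range(n):
        scored = [(ch, fwd[t][i] + bwd[t][i])
                  for i, (ch, _) in enumerate(lattice[t])]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        reranked.append(scored)
    return reranked


def prune(reranked, k):
    """Keep the top-k candidates per position for a later, costlier stage."""
    return [position[:k] for position in reranked]


# Toy example (invented numbers): the recognizer prefers the wrong first character.
lattice = [
    [("申", math.log(0.5)), ("中", math.log(0.4)), ("由", math.log(0.1))],
    [("国", math.log(0.6)), ("囚", math.log(0.4))],
]
bigram_logp = {("中", "国"): math.log(0.5), ("申", "国"): math.log(0.01)}

reranked = rerank_candidates(lattice, bigram_logp)
top1 = "".join(position[0][0] for position in reranked)
small_sets = prune(reranked, 2)
print(top1)
```

In this toy lattice the bigram context promotes 中 above 申 in the first position, so `top1` becomes 中国. Because every candidate receives a context-aware score, the correct character tends to move up inside the candidate set even when it does not reach first place, which is what allows a later stage to work on the smaller sets returned by `prune`.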