ICTCLAS Word Segmentation API (C++)

1 Character encoding

    enum eCodeType {
        CODE_TYPE_UNKNOWN, // type unknown, system will automatically detect
        CODE_TYPE_ASCII,   // ASCII
        CODE_TYPE_GB,      // GB2312, GBK, GB18030
        CODE_TYPE_UTF8,    // UTF-8
        CODE_TYPE_BIG5     // BIG5
    };

In JNI the type is defined as int, with the following values:
    0: encoding unknown; the system detects it automatically
    1: ASCII
    2: GB2312, GBK, GB18030
    3: UTF-8
    4: BIG5

2 ICTCLAS_Init

    bool ICTCLAS_Init(const char* pszInitDir = NULL);

Parameters
    pszInitDir: The initialization path. It should contain the configuration file (Configure.xml), the dictionary directory (Data), and the license file (user.lic). If these files and the directory are in the current working directory, this parameter may be NULL.

Example:
    if (!ICTCLAS_Init()) {
        printf("Init fails\n");
        return -1;
    } else {
        printf("ok\n");
    }

3 ICTCLAS_Exit

    bool ICTCLAS_Exit();

Return Value
    Returns true on success; otherwise returns false.

Example:
    ICTCLAS_Exit();

4 ICTCLAS_ImportUserDict

    unsigned int ICTCLAS_ImportUserDict(const char* sFilename, eCodeType eCT);

Return Value
    The number of lexical entries imported successfully.

Parameters
    sFilename: Text file name of the user dictionary.
    eCT: Character encoding type.

Remarks
    ICTCLAS_ImportUserDict works properly only if ICTCLAS_Init has succeeded. For the text dictionary file format, see User-defined Lexicon. You only need to call this function when you use the lexicon for the first time or after you change your customized lexicon. Once the lexicon has been imported and is left unchanged, ICTCLAS loads it automatically if UserDict is set to "on" in the configuration file. When UserDict is set to "off", the user-defined lexicon is not applied.

Example:
    unsigned int nItems = ICTCLAS_ImportUserDict("userdict.txt", CODE_TYPE_GB);

5 ICTCLAS_ParagraphProcess

    int ICTCLAS_ParagraphProcess(const char* sParagraph, int nPaLen, eCodeType eCt, int bPOStagged, char* sResult);

Return Value
    The length of the result. The result itself is written to the buffer that sResult points to.

Parameters
    sParagraph: The source paragraph.
    nPaLen: The length of the paragraph.
    eCt: The character encoding type of the string.
    bPOStagged: Whether POS tagging is needed: 0 for no tagging, 1 for tagging; default: 1.
    sResult: The buffer that receives the processing result.

Example:
    const char* sSentence = "当西班牙人捧起大力神杯时,西班牙首相萨帕特罗也激动不已";
    int nPaLen = strlen(sSentence);
    char* sRst = 0; // the caller allocates the buffer that holds the result
    sRst = (char*)malloc(nPaLen * 6); // recommended size: 6 times the input length
    int nRstLen = 0;
    nRstLen = ICTCLAS_ParagraphProcess(sSentence, nPaLen, CODE_TYPE_GB, 1, sRst);
    printf("The result is: %s\n", sRst);
    free(sRst); // release the caller-allocated buffer

6 ParagraphProcessA

    t_pstRstVec ICTCLAS_Paragrap
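Looking back at sections 1 and 5, two details lend themselves to a small sketch: the integer values that JNI callers pass for eCodeType, and the recommended size of the result buffer for ICTCLAS_ParagraphProcess. The helpers below are hypothetical (codeTypeFromJni and resultBufferSize are not part of the ICTCLAS API), and the enum is redeclared locally only so that the snippet compiles without the ICTCLAS headers. The extra byte for a terminating NUL is an assumption; the text above only recommends 6 times the input length.

```cpp
#include <cassert>
#include <cstddef>

// Local copy of the enum from section 1 (normally provided by the ICTCLAS header).
enum eCodeType {
    CODE_TYPE_UNKNOWN, // 0: unknown, detected automatically
    CODE_TYPE_ASCII,   // 1
    CODE_TYPE_GB,      // 2: GB2312, GBK, GB18030
    CODE_TYPE_UTF8,    // 3
    CODE_TYPE_BIG5     // 4
};

// Compile-time check that the enum order matches the JNI values 0..4.
static_assert(CODE_TYPE_UNKNOWN == 0 && CODE_TYPE_BIG5 == 4,
              "eCodeType must match the JNI integer mapping");

// Hypothetical helper: convert a JNI int to eCodeType.
// Out-of-range values fall back to CODE_TYPE_UNKNOWN (auto-detection).
eCodeType codeTypeFromJni(int value) {
    if (value < CODE_TYPE_UNKNOWN || value > CODE_TYPE_BIG5) {
        return CODE_TYPE_UNKNOWN;
    }
    return static_cast<eCodeType>(value);
}

// Hypothetical helper: buffer size for the sResult argument.
// Uses the recommended factor of 6, plus one byte for a terminating NUL
// (the extra byte is an assumption, not something the API text states).
std::size_t resultBufferSize(int nPaLen) {
    assert(nPaLen >= 0);
    return static_cast<std::size_t>(nPaLen) * 6 + 1;
}
```

A caller could then write malloc(resultBufferSize(nPaLen)) in place of malloc(nPaLen * 6), and pass codeTypeFromJni(value) wherever an eCodeType argument is expected.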