MapReduce Data Analysis in Practice, by Li Lisong (李立松)

Single-machine testing and program execution

I. Single-machine test

    head test.log | python map.py | python red.py

II. Upload the file to the cluster

    /bin/hadoop fs -copyFromLocal test.log /hdfs/

III. Run mapred

    /bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
        -file /path/map.py -file /path/red.py \
        -mapper /path/map.py -reducer /path/red.py \
        -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat \
        -input /path/test.log -output /path/

    hadoop jar hadoop-streaming-1.0.3.jar -numReduceTasks 0 \
        -file rand.pl -mapper rand.pl \
        -input /opt/test/in/ -output /opt/chepai/

Implementing distinct

I. Log format:

    {0E3AAC3B-E705-4915-9ED4-EB7B1E963590}
    {FB11E363-6D2B-40C6-A096-95D8959CDB92}
    {06F7CAAB-E165-4F48-B32C-8DD1A8BA2562}
    {B17F6175-6D36-44D1-946F-D748C494648A}
    {06F7CAAB-E165-4F48-B32C-8DD1A8BA2562}
    {B17F6175-6D36-44D1-946F-D748C494648A}

distinct output:

    B11E363-6D2B-40C6-A096-95D8959CDB92
    17F6175-6D36-44D1-946F-D748C494648A
    E3AAC3B-E705-4915-9ED4-EB7B1E963590
    6F7CAAB-E165-4F48-B32C-8DD1A8BA2562

count output:

    4

(distinct, count) -- map

    import sys

    debug = True
    # debug = False
    if debug:
        lzo = 0
    else:
        lzo = 1

    for line in sys.stdin:
        try:
            # drop the first 1 + lzo characters and the last two
            flags = line[1 + lzo:-2]
            out = flags + '\t' + '1'
            print(out)
        except Exception as e:
            print(e)

(distinct) -- red

    #!/usr/bin/python
    import sys

    res = {}  # dictionary of the keys seen so far
    for line in sys.stdin:
        try:
            flags = line[:-1].split('\t')
            if len(flags) != 2:
                continue
            field_key = flags[0]
            if field_key not in res:
                res[field_key] = [0]  # each key holds a one-element list
            res[field_key][0] = 1  # line count for this key (set to 1; distinct only needs the key)
        except Exception as e:
            pass

    for key in res:
        print(key)

(count) -- red

    #!/usr/bin/python
    import sys

    # input must be sorted by uid so that equal keys are adjacent
    lastuid = ""
    num = 1
    for line in sys.stdin:
        uid, count = line[:-1].split('\t')
        if lastuid == "":
            lastuid = uid
        if lastuid != uid:
            num += 1
            lastuid = uid
    print(num)

Implementing group by

    log: stat_date,version,ip        Result:
    20120206,2.12,192.168.1.1
    20120201,2.16,192.168.1.1        20120201,2.12,192.168.1.1    1
    20120201,2.16,192.168.1.1        20120201,2.12,192.168.1.3    1
    20120201,2.15,192.168.1.1        20120201,2.12,192.168.1.2    1
    20120201,2.12,192.168.1.3        20120201,2.16,192.168.1.1    4
    20120206,2.12,192.168.1.1        20120206,2.12,192.168.1.1    4
    20120201,2.16,192.168.1.1        20120207,2.12,192.168.1.1    1
    20120201,2.12,192.168.1.2        20120201,2.15,192.168.1.1    1
    20120207,2.12,192.168.1.1        20120206,2.12,192.168.1.3    1
    20120201,2.16,192.168.1.1
    20120206,2.12,192.168.1.1
    2012
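As a rough illustration of the group-by count shown in the log and result data above, here is a minimal Python 3 sketch. It is not the author's script. It assumes each log row has the form stat_date,version,ip, uses the whole row as the grouping key, and stands in for Hadoop's shuffle with a local sorted() call. The function names map_lines, reduce_lines and group_count are hypothetical.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit 'stat_date,version,ip<TAB>1' for each well-formed row."""
    for line in lines:
        fields = line.strip().split(',')
        if len(fields) != 3:
            continue  # skip rows that do not have exactly three fields
        yield ','.join(fields) + '\t1'


def reduce_lines(sorted_lines):
    """Reducer step: sum the counts of adjacent identical keys.

    The input must already be sorted by key, as it would be after
    the shuffle phase of a streaming job.
    """
    pairs = (line.split('\t') for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(int(count) for _, count in group)


def group_count(lines):
    """Run map, sort and reduce locally (assumption: sorted() replaces the shuffle)."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == '__main__':
    # Hypothetical sample rows in the stat_date,version,ip format.
    sample = [
        '20120201,2.16,192.168.1.1',
        '20120206,2.12,192.168.1.3',
        '20120201,2.16,192.168.1.1',
    ]
    for key, total in group_count(sample):
        print(key + '\t' + str(total))
```

Keeping the mapper and reducer as separate generator functions mirrors the two-script layout of a streaming job, so each half could be moved into its own stdin-reading script with little change.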