java-httpurlconnection crawler program (0913)
Android Summary, 孙沛林 (Sun Peilin)

Java HttpURLConnection: Fetching Web Data (2016-9-13)

Project: JavaSpiderDemo
Environment: MyEclipse 8.5
Libraries: import the jsoup JAR into the project (the code uses org.jsoup).

Source code: MyConn.java

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.*;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Parses a single CSDN blog post on its own,
 * extracting the title, category, and content.
 * @author 孙沛林
 */
public class MyConn extends Thread {

    public MyConn() {}

    public MyConn(String surl) {
        this.surl = surl;
    }

    private String surl; // URL of the article

    @Override
    public void run() {
        getHTML();
    }

    // Client browser types (User-Agent strings)
    public static String[] UserAgent = {
        "Mozilla/5.0 (jsoup)", // desktop (PC) browser
        // All of the following are mobile browsers
        "Mozilla/5.0 (Linux; U; Android 2.2; en-us; Nexus One Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.2",
        "Mozilla/5.0 (iPad; U; CPU OS 3_2_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B500 Safari/531.21.11",
        "Mozilla/5.0 (SymbianOS/9.4; Series60/5.0 NokiaN97-1/20.0.019; Profile/MIDP-2.1 Configuration/CLDC-1.1) AppleWebKit/525 (KHTML, like Gecko) BrowserNG/7.1.18121",
        // http://blog.csdn.net/yjflinchong
        "Nokia5700AP23.01/SymbianOS/9.1 Series60/3.0",
        "UCWEB7.0.2.37/28/998",
        "NOKIA5700/UCWEB7.0.2.37/28/977",
        "Openwave/UCWEB7.0.2.37/28/978",
        "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/989"
    };

    /**
     * Fetches the HTML source of the page whose address is stored in surl.
     */
    public void getHTML() {
        try {
            // 1. Create the URL object
            URL url = new URL(surl);
            // 2. Obtain the HttpURLConnection object
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // 2.5 Disguise the request as a browser
            conn.setRequestProperty("User-Agent",
                    UserAgent[1]); // browser type
            // 3. Get the server's response code
            int responseCode = conn.getResponseCode();
            System.out.println("responseCode=" + responseCode);
            // 4. Check that the response is OK (HttpURLConnection.HTTP_OK == 200)
            if (responseCode == HttpURLConnection.HTTP_OK) {
                StringBuilder sb = new StringBuilder(); // holds the HTML
                String readLine; // temporary holder for each line
                // 5. Wrap the response stream; the charset must match the
                //    encoding the server uses for its response ("UTF-8")
                BufferedReader responseReader = new BufferedReader(
                        new InputStreamReader(conn.getInputStream(), "UTF-8"));
                // 6. Read the stream line by line
                while ((readLine = responseReader.readLine()) != null) {
                    // readLine() strips the line terminator, so add one back
                    sb.append(readLine).append("\n");
                } // while
                // 7. Close the stream
                responseReader.close();
                // ... (remainder of the original listing not preserved)
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
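For comparison, below is a minimal, self-contained sketch of steps 1 to 7 in more recent Java (8 or later). It is not the author's code. The class `FetchSketch` and the methods `fetchHtml`, `charsetOf` and `startDemoServer` are hypothetical names, and the timeout values are illustrative assumptions. Compared with the listing above, it makes three changes:

- try-with-resources closes the reader on every path.
- The charset is read from the response's `Content-Type` header, falling back to UTF-8, instead of being hard-coded. This follows the step 5 remark that the reader's encoding must match the server's.
- A throwaway local server from the JDK's `com.sun.net.httpserver` package stands in for a real blog page, so the example runs without network access.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the fetch steps; not the original MyConn code.
public class FetchSketch {

    /** Returns the charset named in a Content-Type value such as "text/html; charset=GBK", else UTF-8. */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    try {
                        return Charset.forName(p.substring(8).replace("\"", "").trim());
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: fall back to UTF-8 below.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Fetches url as text with the given User-Agent; returns null unless the server answers 200 OK. */
    static String fetchHtml(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestProperty("User-Agent", userAgent); // step 2.5: pose as a browser
            conn.setConnectTimeout(5_000); // milliseconds; illustrative value
            conn.setReadTimeout(10_000);   // milliseconds; illustrative value
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) { // steps 3 and 4
                return null;
            }
            Charset cs = charsetOf(conn.getContentType()); // step 5: match the server's encoding
            StringBuilder sb = new StringBuilder();
            // try-with-resources closes the reader (step 7) even if readLine() throws.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), cs))) {
                String line;
                while ((line = in.readLine()) != null) { // step 6
                    sb.append(line).append('\n'); // readLine() strips the line terminator
                }
            }
            return sb.toString();
        } finally {
            conn.disconnect();
        }
    }

    /** Hypothetical demo server: "/" echoes the User-Agent in a tiny page, "/missing" answers 404. */
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            byte[] body = ("<html>\n<title>" + ua + "</title>\n</html>").getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.createContext("/missing", exchange -> {
            exchange.sendResponseHeaders(404, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startDemoServer();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            System.out.print(fetchHtml(base + "/", "Mozilla/5.0 (jsoup)"));
            System.out.println(fetchHtml(base + "/missing", "Mozilla/5.0 (jsoup)"));
        } finally {
            server.stop(0);
        }
    }
}
```

`main` fetches the echoed page and then the 404 path, for which `fetchHtml` returns null. Returning null on a non-200 answer mirrors the original's `if (responseCode == HTTP_OK)` check. Throwing an exception instead would be an equally reasonable design.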