Date: 2018-08-08
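Design of a Social Network Data Collection Algorithm (Software Engineering Course Design Report)

Abstract

With the growth of the Internet, people now live in an age of information explosion. Social network data is large in volume and strongly topical, has considerable value for data mining, and is an important part of Internet big data. Some social platforms, such as Twitter, Sina Weibo, and Renren, let users apply for permission to collect platform data and provide API interfaces for doing so. Data is then obtained by registering on the platform, applying for API authorization, and calling the API methods. However, the application for collection permission is strict, and data collection remains restricted even after the application is approved. This report therefore takes a web crawler approach: it uses a social account to simulate a login to the platform, visits the platform's web pages, and returns the task result promptly once the crawl task has finished. In contrast to the information scarcity of the past, today's massive volume of data means that how well a system screens and filters information has become an important measure of its quality. This report applies a crawler and a collaborative filtering algorithm to collect social network data.

Keywords: software engineering; social network; crawler; collaborative filtering algorithm

Contents

Abstract
Contents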
1 Purpose of the Study
1.1 Research Background
2 Priority Crawling Strategy: PageRank
2.1 Introduction to PageRank
2.2 PageRank Workflow
3 Crawler
3.1 Introduction to Crawlers
3.1.1 Overview of Crawlers
3.1.2 Workflow
3.1.3 Crawling Strategies
3.2 Tools
3.2.1 Eclipse
3.2.2 The Python Language
3.2.3 BeautifulSoup
3.3 Implementation
3.4 Results
4 Algorithms
4.1 Three Ways of Obtaining Data