Shanghai Jiao Tong University MBA Thesis

Compiling a Sentiment Index for the TMT Industry Based on Big Data Mining

Abstract

The growing penetration of Internet applications has caused the volume of online information to grow explosively. At the same time, text mining of Internet content has produced a new framework that differs from established methods of investment analysis, and investment strategies that use market sentiment indicators have emerged as a result. This thesis first compares the analytical frameworks used for traditional industries and for emerging industries, and reviews and summarizes existing methods of market sentiment analysis. Second, it studies crawler algorithms and web page data structures and designs a web crawler that collects user discussion data from the Internet, providing the data foundation for the index. Third, it takes four parameters as proxy variables for sentiment: the number of topic discussions, the volume of mainstream media news coverage, the share of capital inflows, and the number of limit-up stocks within the sector. These are processed and combined into a sentiment indicator model, on which an investment strategy is then designed. Finally, the completed indicator is applied in an empirical analysis of two TMT sectors, smart home and 3D printing, which verifies its practical value in investment decision-making.

Keywords: text mining, quantitative investment, web crawler, sentiment indicator

RESEARCH OF SENTIMENT INDICATOR IN TMT INDUSTRY BASED ON BIG DATA

ABSTRACT

As the penetration rate of the Internet continues to grow, the information generated on the Internet increases explosively. At the same time, text mining technology provides a new method of investment analysis that differs substantially from the investment framework currently in use. With this technology, strategies that use a market sentiment index have been developed. First, this paper compares the analysis frameworks of traditional industries and emerging industries, and summarizes the existing frameworks for analyzing market sentiment. Secondly, we analyze web crawler algorithms and web data structures and design a web crawler program to capture Internet data, which provides the basic data for constructing the indicator. Thirdly, we process four proxy variables (the quantity of UGC discussion, the number of news reports, the capital inflows of the sector, and the number of stocks that hit the upper price limit) and combine them into a new sentiment indicator. Based on this indicator, we design an investment strategy. Finally, we apply the sentiment indicator to two TMT sectors, the smart home sector and the 3D printing sector, to verify its practical effect. The empirical results demonstrate the effectiveness of the sentiment indicator.

KEY WORDS: text mining, quantitative investment, web crawler, sentiment indicator

Contents

Chapter 1 Introduction
1.1 Background and significance of the analysis
1.2 Content and methods of the analysis
1.3 Structure of the thesis