ANALYZING DATA WITH PYTHON
Sarah Guido
@sarah_guido
Reonomy
OSCON 2014

ABOUT ME
Data scientist at Reonomy
University of Michigan graduate
NYC Python organizer
PyGotham organizer

ABOUT THIS TALK
Bird’s-eye overview: not comprehensive explanation of these tools!
Take data from start-to-finish
Preprocessing: Pandas
Analysis: scikit-learn
Analysis: nltk
Data pipeline: MRjob
Visualization: matplotlib
What next?

WHY PYTHON?
So many tools
Preprocessing, analysis, statistics, machine learning, natural language processing, network analysis, visualization, scalability
Community support
“Easy” language to learn
Both a scripting and production-ready language

FROM POINT A TO POINT…X?
How to find the best tool(s)?
The 90/10 rule
Simple is better than complex

WHY I CHOSE THESE TOOLS
Available resources
Documentation, tutorials, books, videos
Ease of use (with a grain of salt)
Community support and continuous development
Widely used

PREPROCESSING
The importance of data preprocessing
AKA wrangling, munging, manipulating, and so on
Preprocessing is also getting to know your data
Missing values? Categorical/continuous? Distribution?

PANDAS
Data analysis and modeling
Similar to R and Excel
Easy-to-use data structures
DataFrame
Data wrangling tools
Merging, pivoting, etc.

PANDAS
Keep everything in Python
Community support/resources
Use for preprocessing
File I/O, cleaning, manipulation, etc.
Combinable with other modules
NumPy, SciPy, statsmodels, matplotlib

PANDAS
File I/O

PANDAS
Finding missing values

PANDAS
Removing missing values

PANDAS
Pivoting

PANDAS
Other things
Statistical methods
Merge/join like SQL
Time series
Has some visualization functionality

MACHINE LEARNING
Application of algorithms that learn from examples
Representation and generalization
Useful in everyday life
Especially useful in data analysis

MACHINE LEARNING
Supervised learning
Classification and regression
Unsupervised learning
Clustering and dimensionality reduction

SCIKIT-LEARN
Machine learning module
Open-source
Built-in datasets
Good resources for learning

SCIKIT-LEARN
Scikit-learn: your data has to be continuous
Here’s what one observation/label looks like:

SCIKIT-LEARN
Transform categorical values/labels

SCIKIT-LEARN
Classifi
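
The code shown on the Pandas and scikit-learn slides did not survive in this text, so the following is only a minimal sketch of the steps those slides name: finding missing values, removing them, pivoting, and transforming categorical values/labels into numbers. The toy DataFrame and its column names (`city`, `year`, `price`, `label`) are hypothetical, and `LabelEncoder` is an assumed choice rather than the one used in the talk. scikit-learn documents `LabelEncoder` for target labels, and `OrdinalEncoder` or `OneHotEncoder` are the more usual choice for feature columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the talk's actual dataset is not shown in this text.
df = pd.DataFrame({
    "city": ["NYC", "Boston", "NYC", None, "Boston"],
    "year": [2013, 2013, 2014, 2014, 2014],
    "price": [10.0, np.nan, 12.0, 9.0, 11.0],
    "label": ["high", "low", "low", "low", "high"],
})

# Finding missing values: count nulls per column.
missing_counts = df.isnull().sum()

# Removing missing values: drop every row that has at least one null.
clean = df.dropna()

# Pivoting: mean price with cities as rows and years as columns.
pivot = clean.pivot_table(index="city", columns="year",
                          values="price", aggfunc="mean")

# Transforming categorical values/labels into integers for scikit-learn.
# Each encoder assigns integers to the sorted unique values it sees.
city_encoder = LabelEncoder()
label_encoder = LabelEncoder()
X = pd.DataFrame({
    "city": city_encoder.fit_transform(clean["city"]),
    "price": clean["price"],
})
y = label_encoder.fit_transform(clean["label"])

print(missing_counts)
print(pivot)
print(X)
print(y, list(label_encoder.classes_))
```

Keeping the fitted encoders around makes it possible to map predictions back to the original strings later with `label_encoder.inverse_transform`.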