《Clustering and connectivity problems in complex networks.pdf》由会员上传分享,免费在线阅读,更多相关内容在应用文档-天天文库。
ThePennsylvaniaStateUniversityTheGraduateSchoolCLUSTERINGANDCONNECTIVITYPROBLEMSINCOMPLEXNETWORKSADissertationinIndustrialEngineeringandOperationsResearchbyUshanandiniRaghavan°c2008UshanandiniRaghavanSubmittedinPartialFul¯llmentoftheRequirementsfortheDegreeofDoctorofPhilosophyAugust2008 333611333361132008 ThedissertationofUshanandiniRaghavanwasreviewedandapproved¤bythefollowing:SoundarKumaraAllenE.Pearce/AllenM.PearceProfessorofIndustrialEngineeringThesisCo-Advisor,Co-ChairofCommitteeR¶ekaAlbertAssociateProfessorofPhysicsThesisCo-Advisor,Co-ChairofCommitteeDennisLinDistinguishedProfessorofStatisticsandSupplyChainManagementTaoYaoAssistantProfessorofIndustrialEngineeringArvindRangaswamyJonasH.AnchelProfessorofMarketingRichardKoubekPeter&AngelaDalPezzoProfessorofIndustrialEngineeringHeadoftheDepartmentofIndustrialEngineering¤Signaturesareon¯leintheGraduateSchool. AbstractNetworkedsystemspervadethisworld.Fromsocialrelationshipsbetweenpeopletochemicalinteractionsbetweenbio-moleculestowiredorwirelesscommunicationsbetweentechnologicaldevices,networksplayarole.Inrecentyearstherehasbeenawidespreadinterestinthestudyofthestructuralorganizationinsuchcomplexnetworks.Thefocushereisontheorganizationoflinks(whoisconnectedtowhom)inthenetworkanditse®ectsondynamicsandbehaviorofthenetworkedsystem.Traditionallycomplexnetworksweremodeledasrandomgraphsthatwerein-troducedbyPaulErd}osandAlfr¶edR¶enyi(E-Rrandomgraphs)[1].Herepairsofnodesinanetworkarelinkedwithauniformprobabilityp.Howeverinthere-centyearsempiricalobservationsfromawiderangeoflarge-scalenetworkssuchasthemovieactorcollaborationnetwork,scienti¯cco-authorshipnetworks,protein-proteininteractionmapsofcellsandorganisms,neuralnetworks,theWorldWideWeb(WWW)andothersrevealedpropertiesthatdeviatefromthepropertiesofE-Rrandomgraphs.Forexample,manyreal-worldnetworkshaveapower-lawde-greedistribution,highdegreeoflocalorder,assortativeordis-assortativemixingpatternsandtightlyinterlinkedgroupscalledclustersasopposedtoE-RrandomgraphsthathaveaPoissondegreedistribution,noparticularlocalorderormix-ingpatternsorclusteredstructures.Thisimpliesthatinmanysocial,biologicalandtechnologicalnetworkstheorganizationoflinks(whoisconnectedtowhom)followspeci¯cprinciples(whichisnotrandom)leadingtotheobservednetworkproperties.Alotofresearchhasbeendoneinrecentyearstounderstandvari-ousstructuralpropertiesofcomplexnetworksandinidentifyingtheorganizingprinciplesde¯ningthem.Inthisthesiswefocusontwoproblemsofstructuralorganizationinnetworks,namelyclusteringandconnectivity.Thepresenceofclustershasbeenobservedinmanyreal-worldnetworks(forexample,thepresenceofgroupsorcommunitiesinsocialnetworks,functionalmod-iii ulesinbiologicalnetworksandwebpagesonsimilartopicsintheWWW)Therehasbeenalotofe®ortrecentlyinde¯ning,identifyingandextractingtheseclus-tersfromcomplexnetworkstobetterunderstandtheirstructuralorganization.Toidentifyclusteredstructuresinnetworkswedevelopanalgorithmthatusesaconsensusformationmechanismtoextractdenselyinterlinkedgroups.Toourknowledgesuchamechanismhasnotbeenpreviouslyusedtoidentifyclustersinnetworksandtherebycontributingtoanotherwayofde¯ningthem.Oneofthemainadvantagesofthisalgorithmisitsabilitytoruninanear-lineartimeandhenceitsapplicabilitytoverylargenetworks.Alsounlikemostoftheexistingalgorithmswedonotrequireanyaprioriinformationsuchasthenumberandsizesoftheclusterspresentinthenetwork.Byapplyingthealgorithmonac-torcollaboration,scienti¯cco-authorship,protein-proteininteractionandWorldWideWebnetworkswehighlightthatthereexistsmultiplesigni¯cantclusteredstructuresrevealingtherichnessoftheorderspresentinthem.Further,thisresultsupportsthepresenceofoverlappingclustersinsocialandbiologicalnetworks[2].Aconnectednetworkisoneinwhichitispossibletotravelfromanynodetoanyothernodeviathelinks.Fortheconnectivityproblemweconsideraclassofnetworksthatinitiallycontainnolinks.Ourgoalistocreateandorganizelinksinthenetwork,whichislimitedbycertainconstraints,toobtainnetworkconnectivity.WesolvetheconnectivityprobleminthedomainofWirelessSensorNetworks(WSNs).Thatis,theconstraintsthatcurbthecreationoflinksaregivenbythephysicallimitsoftinywirelesssensordevices.InWSNsthenodesaresensordevicesandthelinksarethewirelesscommunicationsbetweenthesedevices.Theconstrainthereisthatthedegreeofanynode(numberofcommunicationsofanode)shouldbestrictlyboundedandcannotincreasewithanincreasingnetworksize(N).Unlikepreviousresultsthatinsistedthatthesmallestdegreerequired,toobtainconnectivity,isafunctionofNandincreaseswithN,weshowthatitispossibletoputaboundonthedegree.Inspeci¯c,weshowthatifwecantoleratecertainnodesbeingdisconnectedtheneverynodeisrequiredtocommunicateonlytoits¯veclosestneighbors(organizationprinciple)toobtainagiantconnectedcomponentinanetworkofanysizeN.iv TableofContentsListofFiguresviiiListofTablesxAcknowledgmentsxiChapter1Introduction11.1Networks.................................31.1.1Naturalnetworks........................31.1.2Engineerednetworks......................51.1.3Sizeandcomplexity.......................61.1.4Scienti¯candEngineeringinterests..............81.2Researchobjectivesandcontributions.................91.2.1Problem1:Findingclusteredstructuresinnaturalcomplexnetworks.............................91.2.1.1Approachandcontributions.............111.2.1.2Equivalenceofclustersandcommunities......131.2.2Problem2:ConnectivityofWirelesssensornetworks....131.2.2.1Approachandcontributions.............151.3Thesisorganization...........................16Chapter2Propertiesandmodelsofcomplexnetworks-Anoverview182.1Networks:De¯nition..........................192.2Propertiesofnaturalcomplexnetworks................202.2.1Averageshortestpathlength..................21v 2.2.1.1Connectedcomponents................222.2.2Localorder...........................232.2.3Degreedistribution.......................242.2.4Mixingpatterns.........................262.2.5Clusters.............................272.3Modelsofnaturalcomplexnetworks.................292.3.1Randomgraphs.........................292.3.1.1Transitionprobabilitiesandgiantconnectedcom-ponents........................302.3.1.2Non-uniformrandomgraphs.............332.3.2Small-worldnetworks......................342.3.3Evolutionarymodelsofscale-freenetworks..........352.4Dynamicalprocessesonnaturalnetworks...............362.4.1Networkresilience........................362.4.2Spreadingprocessanddi®usion................392.4.3Searchandinformationretrieval................402.5Clusteredstructures..........................412.6Wirelesssensornetworks........................412.6.1Powere±ciency,reducedinterferenceandconnectivity...432.6.2Spatialdistributionofsensornodes..............452.6.3Modelsofwirelesssensornetworks..............462.6.3.1PoissonBooleanmodel................462.6.3.2Poissonrandomconnectionmodel..........472.6.3.3Nearestk-neighborsmodel..............472.7Summary................................48Chapter3Findingclusteredstructuresinnaturalcomplexnetworks503.1De¯nitionsandpreviouswork.....................513.2Communitydetectionusinglabelpropagation............573.2.1Multipleclusteredstructures..................623.2.2Robustnessofthesolutions..................633.2.3Aggregate............................663.3Validationoftheclusterdetectionalgorithm.............683.4Timecomplexity............................693.5Summaryanddiscussion........................69Chapter4ConnectivityofWSNs734.1BackgroundontheconnectivityprobleminWSNs..........73vi 4.1.1Criticaltransmissionradius..................744.1.2Criticalnumberofneighbors..................744.1.3Giantcomponentsandconnectivity..............754.2Backgroundontopologycontrolprotocols..............774.3ConnectivityofWSNs.........................784.3.1Criticalnumberofneighbors..................794.3.2Simulationresults........................814.3.3Energyconsumptioninthenearestneighborsmodel.....814.3.4Stronglyconnectedcomponents................834.4Summaryanddiscussion........................83Chapter5Conclusionsandfuturework855.1ProblemI:Findingclusteredstructuresinnaturalnetworks.....865.1.1Resultsandcontributions...................865.1.2Futurework...........................875.1.2.1Networkmodeling..................875.1.2.2Opinionformationinnetworks...........885.1.2.3Clustersandusergroupsinonlinesocialnetworks.895.1.2.4Citationnetworksandimpactfactors........915.1.2.5ClustersinCallgraphs................925.2ProblemII:ConnectivityofWSNs..................955.2.1Resultsandcontributions...................955.2.2Futurework...........................965.2.2.1WSNarchitectureandclustering..........965.3Closingremarks.............................97Bibliography109vii ListofFigures1.1Networktopologies...........................71.2Clusteredstructureinafriendshipnetwork..............101.3Identifyingclustersandclusteredstructuresinagivennetwork...111.4Connectivityandnetworkconstruction................141.5ApictorialillustrationofWSNs....................152.1Typesofnetworks............................202.2EvolutionofE-Rrandomgraphs...................212.3DegreedistributioninE-Rrandomgraphs..............242.4Degreedistributioninnaturalnetworks................252.5GiantconnectedcomponentsinE-Rrandomgraphs.........322.6Small-worldnetworks..........................352.7Networkresiliencetorandomandtargetedattacks..........382.8Componentsofasensornode.....................422.9TopologycontrolinWSNs.......................443.1Anillustrationofadendrogram....................533.2Consensusformationindenselyconnectednetworks.........573.3Oscillationsinbi-partitegraphs....................583.4Clusteredstructuresinafriendshipnetwork.............603.5ClusteredstructuresinaUScollegefootballnetwork........613.6Similaritiesbetweenmultiplesolutions................643.7Robustnessoftheclusteredstructures.................653.8Aggregatinglabelsfromtwosolutions.................663.9Similaritiesbetweenaggregatesolutions................673.10Distributionofclustersizesinnaturalnetworks...........714.1GiantcomponentsinWSNs......................824.2GiantstronglyconnectedcomponentsinWSNs...........835.1RelationshipbetweenclustersandusergroupsintheYoutubesocialnetwork..................................91viii 5.2Sparselyinterconnectedstar-likeclusters...............935.3Star-likeclustersinGcall........................94ix ListofTables2.1Averagepathlengthsinnaturalnetworks...............222.2Localordercoe±cientsinnaturalnetworks..............232.3Mixingpatternsinnaturalnetworks..................262.4Modularityinnaturalnetworks....................28x AcknowledgmentsIamgreatlyindebtedtomanypeoplewhohavemademystayatPennStateanenrichingexperience.Ithankthesupportofmyco-advisorDr.SoundarKumarawhoseconstantencouragementandunderstandinghascarriedmethroughthegraduateschool.Ihavegreatlyadmiredtheideasthathebringstothetable,andatthesametimeIhaveenjoyedthefreedomheo®ershisstudentstopursuetheirownideas.Ialsothankthesupportofmyco-advisorDr.R¶ekaAlbert,whohasbeenanexemplaryresearcher.Herpatientanswerstomyignorantquestionsmademyresearche®ortscomfortable.IamalsogratefultothesupportofmycommitteemembersDr.DennisLin,Dr.ArvindRangaswamyandDr.TaoYaoandIhavegainedmuchfromtheirthoughtfulsuggestionsonmyresearchwork.Ialsogreatlyenjoyedandlearnedfrommywork/conversationswithDr.AmitNanavatiandDr.SougataMukherjeaofIBMIndiaResearchLabs.Thecommitment,integrityandcheerfulnessofstudentsinDr.Kumara'sLISQlabandinDr.Albert'sgroupatthePhysicsdepartmenthasalwaysleftmeinawe.Ifeelluckytohaveworkedbythesideofsuchwonderfulpeople,andthelessonslearntwillstickwithmeforlife.Yunho'sandSeogchan'scommitmenttowork,Hari'screativesolutionstoanyproblem(academicorotherwise),Nathan'sandSeokchan'senterprise,Changsoo'sandSeunKi'shardwork,thecommitmentanddisciplineshownbyClaire,Juilee,RanranandSongintheirresearch,abreathoffreshairandcommitmentfromthenewsetofstudents(Behedad,Rodrigo,Jie,JungWoon,Supreet,Prashanth,Keerati,Satama...)andthemanyfriendlychatswithAtishaandManinithatIenjoyedinthelabhaveallbeenagreatlearningexperience.IwouldliketothankShak,Anu,Malliga,PP,TolaniandHariformakingsurethattherewasneveradullmomentinmylifeatPennState.TheovernighttripstoNJ,themanytripstocinema5,premierandIP,playinghideandseekatBellefontinthenight,chain-cutatBryceJordonwiththefungultgang(Vissu,Sai,Potineni,...),animpromptuhikedownthevistapoint,campingatBaldEagle'spark,thedesi-at-homedancepartieswithAnu'staporisongcollections,Malliga'spassionxi fortennis(Ste±Grafinparticular),Shak'svigor,independenceandinabilitytounderstandthegameofcards,Tolani'stiramisuandHari'strustedfriendshiphavekeptmesaneandsafefarawayfrommyhomeinChennai,India.IamindebtedtomyfriendsVidhya,Padma,Daisy,Barghavie,SrilakshmiandSuganthifromthegoodolddaysofmystudentlifeatIIT-Madras.Iamextremelygratefultothemforopeninganeartomyproblemsandpatientlyencouragingmeevenintimesoftheirownstruggles.IhavedrawngreatstrengthsfromtheexemplarylivesofmyparentsT.S.RaghavanandPadmaRaghavan.Theirsimplicity,humilityandunwaveringdisciplinehasplayedabigpartinshapingmylifeatPennStateandforthisandmoreIamalwaysindebtedtothem.Iwouldalsoliketothankmylategrandmotherkuppu"patti,mybrotherSudarshan,mysister-in-lawKavithaandmylittlenieceNee-lashawhohaveshownloveanda®ectionandalwaysmanagedtokeepmeingoodspirits.Ialsogreatlyenjoyedtheenthusiasmandsupportshownbymyparents-in-lawMrs.&Mr.Parthasarathy.LastbutnottheleastIwouldliketothankmyhusbandVijaywhohasamongotherthingsstressedtheimportanceofwritingtheAcknowledgementsectiontothethesis.Thishasindeedbeenoneofthemostenjoyableandreminiscingpartofwritingthedissertation.Thankyou.xii Tomyparentsxiii Chapter1IntroductionWhateveryoudomayseeminsigni¯cant,butitismostimportantthatyoudoit."|MahatmaGandhiWhydoessomeinnovationcapturetheimaginationofasocietywhileothersdonot?Howdopeopleformopinionsandhowdoesconsensusemergeinanorgani-zation?Howtocapturetheopinionsandvotesofpeopleduringelectionyears?Whatarethefundamentalsofnatureandhowdocellsandorganismsevolveandsurvive?Whatmakesacell'sfunctionsrobustandadaptabletoitsenvironment?HowcanwemakeresourcesharingthroughtheInternetsecure?Inthisinfor-mationage,howdoweasusersquickly¯ndrelevantinformationfromtheWorldWideWeb?Howdoweguardtechnologicalinfrastructuresthatformthebackboneofourdaytodaybusiness,frommaliciousattacks?Howcanwesenseandpreventforest¯resatanearlystage?Howdoweputtousesensordevicestodetectforest¯res?Howcanweuseautonomoussensornodestomonitordangerousterrainsandlargechemicalplants?Theseareonlyafewquestionstheanswerstowhichwilla®ectthelivesofpeopleandthesocietyweliveinsigni¯cantly.Scienceandengineeringintheir 2overalle®orttoaddresstheseissueshavecreatedmanydi®erentavenuesofre-search;NetworkSciencebeingoneamongthem.Networkscienceisthestudyofsystemsmainlyusingtheirnetworkstructureortopology.Thenodes(vertices)andlinks(edges)ofsuchnetworksaretheentities(people,bio-molecules,web-pagesandsensordevices)andtheinteractions(friendships,chemicalreactions,hyperlinksandcommunicationsrespectively)betweenentitiesrespectively.Peoplehaveopinionsoftheirown,buttheyalsoshapeopinionsbyinteract-ingandexchangingviewswiththeirfriendsandneighbors.Sociologistshavelongunderstoodthatanindividual'sbehaviorissigni¯cantlya®ectedbytheirsocialin-teractions[3,4].Itisnowwidelybelievedthatbiologicalfunctionsofcellsandtherobustnessofcellularprocessesariseduetotheinteractionsthatexistbetweenthecomponentsofvariouscells[5].Webpageswithcontentsandinformationrelatetootherwebpagesbymeansofhyperlinkscreatingacomplexweblikestructure;theWWW.Miniaturizedwirelesssensornodes,whichindividuallyhavelimitedcapa-bilities,achieveanoverallsensingtaskbycommunicatingandsharinginformationwithothernodes[6,7].Avastamountofresearchinrecentyearshasshownthatorganizationoflinks(whoisconnectedtowhom)inanetworkandthetopologicalpropertiescarrysigni¯cantinformationaboutthebehaviorsofthesystemitrepresents[8,4].Fur-thermore,thetopologicalpropertieshaveahugeimpactontheperformancesofprocessessuchasinformationdi®usion,opinionformation,search,navigationandothers.Organizationoflinksinlarge-scalenaturalnetworkswasoriginallyconsideredtoberandom[8,4,1].Butempiricalobservationsintherecentpasthavere-vealedtopologicalpropertiesinawiderangeofsocial,biologicalandtechnologicalnetworksthatdeviatefromrandomness[9,8,4,10].Thatis,naturalnetworksthatappearinnatureandwhoseevolutionislargelyuncontrolled(self-organized)havespeci¯corganizingprinciplesleadingtovariouspropertiesorordersintheirtopology.Thisobservationhassparkedaninterestinthescienti¯cstudyofnet-worksandnetworkmodeling,includingthedesiretoengineerman-madesystemstomimicthebehaviorsofnature.Inthisthesiswecontributetothescienti¯candengineeringaspectsinthestudyofnetworks. 31.1NetworksAsexplainedabove,complexsystemsaremodeledasnetworkstounderstandandoptimizeprocessessuchasformationofopinions,resourcesharing,informationretrieval,robustnesstoperturbationsetc.Thefollowingaresomeoftheexamplesofsystemsandtheirnetworkrepresentations.1.1.1NaturalnetworksAnaturalnetworkisarepresentationofasystemthatispresentinnatureorhasevolvedoveraperiodoftimewithoutanycentralizedcontrol.Examplesinclude;1.Movieactorcollaborations:Thisnetworkconsistsofmovieactorsasnodesandedgesrepresenttheappearanceofpairsofactorsinthesamemovie.Itisagrowingnetworkthathadabout225,226nodesand13,738,786edgesin1998[11].Interestsinthisnetworkincludethestudyofsuccessfulcollaborations(whatkindofcastingmakesamoviesuccessful?)[12]andthefamousBaconnumberexperimenttostudyhowotheractorsarelinkedtoKevinBaconthroughtheircastingroles[13].2.Scienti¯cco-authorship:Inthisnetwork,thenodesarescientistsorre-searchersandanedgeexistsbetweenscientistsiftheyhavecollaboratedtogetherinwritingapaper.Newman[14,15,16]studiedscienti¯cco-authorshipnetworksfromfourdi®erentareasofresearch.Theinformationwasobtainedinanautomatedwayfromfourdi®erentdatabasesMEDLINE,PhysicsE-printarchive,SPIRESandNCSTRLthathasacollectionofallthepapersandtheirauthorsinareasofbiomed,physics,high-energyphysicsandcomputersciencerespectively.Oneofthesenetworks,formedfromtheMedlinedatabasefortheperiodfrom1961to2001,had1,520,251nodesand2,163,923edges.Developingmetricstoquantifythescienti¯cproductivityorcumulativeimpactofascientistgivenhis/hercollaborationsisoneproblemofinterestinco-authorshipnetworks[17,18].TheErd}osNumberproject,whichmotivatedtheBaconnumber,isapopularexperimentthatisusedinthestudyofoptimalco-authorshipstructuresofsuccessfulscientists[19]. 43.TheInternet:TheInternetisanetworkofcomputersanddevicesconnectedbywiredorwirelesslinks.ThestudyoftheInternetiscarriedoutattwodi®erentlevels,namely,attherouterlevelandatthelevelofautonomoussystems[8,20].Attherouterlevel,eachrouterisrepresentedasanodeandthephysicalconnectionsbetweenthemastheedgesinthenetwork.Intheautonomoussystemslevel,everydomain(InternetProviderSystem)isrepresentedasanodeandtheinter-domainconnectionsarerepresentedbytheedges.Thenumberofnodesattherouteranddomainlevelwere150,000in2000[20]and4000in1999[21]respectively.Theproblemofidentifyingandsharing¯lese±cientlyoverpeer-to-peernetworks(suchasGnutella[22])thatarebuiltovertheInternethasreceivedsigni¯cantattentioninrecentyears[23,24].4.WWW:TheWWWisanetworkofwebpageswherethehyperlinksbetweenthewebpagesarerepresentedbytheedgesinthenetwork.Itisagrowingnetworkthathadaboutonebillionnodesin1999[25]witharecentstudyestimatingthesizetobeabout11.5billioninJanuary2005[26].InformationretrievalfromWWWisaproblemofimmenseinterest.AlgorithmssuchasPageRank[27]ortheonesproposedbyKleinbergin[28],usethenetworkstructuretoextractwebpagesintheorderofrelevancetouserrequests.5.Neuralnetworks:Herethenodesareneuronsandanedgeconnectstwoneuronsifthereisachemicalorelectricalsynapsebetweenthem.WattsandStrogatz[8,11]studiedtopologicalpropertiesoftheneuralnetworkofnematodewormC.elegansconsistingof282neuronswithpairsofneuronsconnectedbythepresenceofeitherasynapseoragapjunction.Studyofneuralnetworksisimportantforunderstandinghowthebrainstoresandprocessesinformation[10].Whilewecanobservethatthisisdoneinanoptimalandrobustwayinneuralnetworkswearestillatlossinquantifyingthismechanism[10].6.Cellularnetworks:Herethesubstratesormoleculesthatconstituteacellarerepresentedasnodesandthepresenceofbio-chemicalinteractionsorregulatoryrelationshipsbetweenthemoleculesarerepresentedasedges[8].Amongothers,theinteractionsbetweenproteinmoleculesareimportantfor 5manybiologicalfunctions[29,5].Jeongetal.[5]havestudiedthetopologyofprotein-proteininteractionmapoftheyeastS.cerevisiathatconsistsof1870nodesasproteinsandconnectedby2240identi¯edinteractions.Usingthenetworkstructuretopredictpossible(previouslyunidenti¯ed)interac-tionsbetweenproteinmoleculeshasreceivedwidespreadattentionfromresearchers[30,31].1.1.2EngineerednetworksEngineerednetworksarethoseinwhichthenodesofthenetworkfollowapre-speci¯edsetofprotocolsbywhichthelinksareformed.Whetherthecontroliscentralizedorde-centralized,theorganizationisengineeredtoachievedesiredtopologicalproperties.Someexamplesfollow.1.Agent-basedsupplychainnetworks:Heresoftwareagentsthatareresponsibleforfunctionssuchassupplier,manufacturer,distributorandretailerarethenodesandthedirect°owofinformation/tasks/commoditiesbetweenentitiesarerepresentedbytheedgesinthenetwork.Thadakamallaetal[32]studiedthetopologicalpropertiesofamilitarysupply-chain(with10,000nodes[33])andproposedmechanismsbywhichthenodescanorganizeunderfunctionalconstraintstoproviderbetterperformances.2.WSNs:Herethenodesrepresentminiaturizedwirelesssensordevicesthatconsistofashort-rangedradiotranceiverandlimitedcomputationalcapa-bilities[7,6].Thoughindividualsensorshavelimitedcapacities,thetruevalueofthesystemisachievedbysharingresponsibilitiesandinformationthroughacommunicationinfrastructure[7].ThusanedgeinaWSNrepre-sentsthepresenceofacommunicationbetweentwonodes.ThenumberofnodesinaWSNcanvaryanywherebetweenafewhundredsorthousandstoevenmillionsdependingontheapplicationscenario.Thesensornodeswhendeployedinasensingregionwillself-organizetoestablishacommunicationtopology.Thereisconsiderableinterestindevelopingtopologycontrolpro-tocolsthatwillguidethisorganizationprocesstosupporttheglobalsensingtasks[6,34,35]. 61.1.3SizeandcomplexityAswecanseefromtheaboveexamples,thesizeofmanynetworksislarge-scale.Itisthusdi±culttocharacterizeanetworkbythestatesofindividualentities.Forexample,itiscumbersometorepresentaWWWasasetof11.5billionnodeswhere25nodeshave15hyperlinkseach,32nodeshave7hyperlinkseachandsoon.Insteadwerepresentthenetworkbyitsstatisticalproperties,thatis,theWWWisanetworkwith11.5billionnodesandadegree(numberoflinksoredgesincidentonanode)distributionP(k).P(k)istheprobabilitydistributionforadegreekonanygivennodeinthenetwork.Figure1.1showsexamplesoftwodi®erentnetworks,bothwith100nodesandapproximatelythesamenumberofedges,thatarecharacterizedbytwodi®erentdistributions,namelyPoisson(P(k)»e¡¸¸kwithparameter¸=2:5)andPower-law(P(k)»k¡°withparameter°=3).Aswecansee,aPoissonnetworklookstopologicallydi®erentfromapower-lawnetwork.InaPoissonnetworkalmostallnodeshaveadegreeclosetothemean(whichis2.5in¯gure1.1(a)),whileapower-lawnetworkontheotherhandischaracterizedbyafewnodeshavingaverylargedegreeandtherestofthenodesinthenetworkhavingarelativelysmalldegree.Similarlyonecancharacterizelarge-scalenetworksbymanyotherstatisticalpropertiessuchasaveragepathlengths(whichisanaverageoftheshortestpathsbetweenallpairsofnodes),averagenumberoftriangles(whichisanaverageofthenumberoftrianglestowhichagivennodebelongs)andothers[8,10].Inadditiontolargesizes,mostnaturalnetworksdisplayemergentpropertiesorsignsoforderatthemacroscopiclevelthatcannotbeobservedwhenthefocusisrestrictedtothepropertiesoftheindividualentities.Forexample,acommonlyobservedpropertyisthehighlyheterogeneousright-skeweddegreedistributionfoundinawiderangeofsocial,biologicalandtechnologicalnetworks.Thatis,thewayinwhichthesenetworksevolveorself-organize,onlyafewnodesgrabmostoftheedges(andhencehaveverylargedegrees)whilealargenumberofnodeshaveonlyafewedgesincidentonthem(asmalldegree).LetusconsidertheWWWforinstance.Whenanewwebpageiscreated,onaspeci¯ctopic,itismorelikelytohavehyper-referencestoalreadypopularwebpagesonthesametopic.Andsuchadisposition,collectivelyfromallwebpages,willenablealreadypopularwebpagestoreceivemoreandmorehyper-referencesleadingtohighlevelsofheterogeneity 7Figure1.1.(a)and(b)aretwodi®erentnetworkson100nodeseachandbothwithapproximatelythesamenumberofedges.Thenetworkin(a)ischaracterizedbyaPoissondegreedistribution(P(k)»e¡2:52:5k),whilethenetworkin(b)ischaracterizedbyapower-lawdegreedistribution(P(k)»k¡3)inthedegreedistribution.Inmostcasesthisheterogeneityinnodedegreeiswellcapturedbyapower-lawdistribution,P(k)»k¡°,withtheparameter°usuallybetween2and3inmanynaturalnetworks[8].Anotherfeaturecommonlyobservedatthemacroscopiclevelisshortaveragepathlengthsorsmall-worldness.Small-world,acommonoccurrenceinsocialnetworks,impliesthatdespitetheirlargesizes,onanaverageonlyafewstepsarerequiredtotravelfromanynodetoanyothernodeusingtheedgesinthenetwork.Themovieactorcollaborationnetworkwithasizeof212,250nodesstudiedbyAlbertandBarabasi[36]hasanaveragepathlengthof4.54andapower-lawexponent°=2:3.ThemapoftheInternet 8attherouterlevelstudiedbyGovindan[21]has150,000nodeswithanaveragepathlengthof11and°=2:4.ThemetabolicnetworkoftheyeastE.ColistudiedbyJeongetal.,[5]has778nodeswithanaveragepathlengthof3.2and°=2:2.Seechapter2foramorethoroughreviewofvariousemergentpropertiesinnaturalnetworks.Amongmanyreasons,duetothepresenceofemergentpropertiesandtheself-organizingnatureofnaturalnetworks,theyaretermedascomplexnetworksintheNetworkScienceliterature[8,37].1.1.4Scienti¯candEngineeringinterestsInterestinthestudyofnaturalcomplexnetworkscanbebroadlyclassi¯edintotwoclasses,namelyscienti¯candengineering.Thescienti¯cinterestliesinun-derstandingthestructure,evolution,andpropertiesofnetworks,withaneventualgoalofengineeringmoree±cientprocessesonthesenetworks.Theengineeringinterest,ontheotherhand,liesindevelopingmoree±cientalgorithmsand¯ndingoptimalparameterstobettercontroltheprocessestakingplaceonsuchnetworks[8,10,4].Withanincreasingunderstandingonthestructuralorganizationleadingtoemergentproperties,arichliteratureofcomplexnetworkmodelsthatcanmimicsuchpropertieshasdeveloped[8,10,4,11].Thesenetworkmodelsthenformthebasisonwhichprocessessuchasdiseasepropagation,informationdi®usion,search,navigationandothersarestudiedandanalyzed.Someoftheinterestingquestionsthatcanbeansweredusingacombinationofbothaspectsofthisresearchinclude1)howtocontrolthespreadofdiseasesinalargeclassofpeopleinterconnectedbyphysicalcontacts,2)howtostudy,maintain,andcontrolthedi®usionofinforma-tioninWWW,and3)howtobetteridentifytargetsfordrugdiscoveryinmetabolicnetworks?Inparallelthereisalsoconsiderableinterestinengineeringnetworkssuchassupplychainsandminiaturizedwirelesssensors,where,bycontrollingtheinteractionsbetweenentitiesdesiredbehaviorsareachieved[32,6,38,39]. 91.2ResearchobjectivesandcontributionsInthisthesiswecontributetoboththeabovediscussedaspectsofinterestinthestudyofcomplexnetworks.Wesolvetwodi®erentproblems,wherein1)theobjectiveistodetectandidentifyclustersinnaturalcomplexnetworks,therebyaidingthescienti¯cstudyoftheorganizingprinciplesinnaturalnetworksandin2)theobjectiveistoengineertheconnectivityofaWSNbycontrollingthecommunicationsatindividualsensornodes.Researchintherecentpastfocusedonunderstandingtheevolutionandorgani-zationofnaturalnetworksandthee®ectofnetworktopologyonthedynamicsandbehaviorsofthesystem[8,4,10].Findingclusteredstructuresinnetworksisan-othersteptowardsunderstandingthecomplexsystemstheyrepresent.Clusteredstructuresariseinnaturalnetworksbecauseofthewaylinksareorganized.Inspe-ci¯c,itiscommonto¯ndgroupsofnodesbeingdenselyinterconnectedwithinthegroupandsparselyinterconnectedtootherpartsinthenetwork(see¯gure1.2).Inthisthesiswedevelopaformalmechanismbywhichonecanextractclustersfromcomplexnetworksinane±cientmanner.Wefurtheranalyzeandhighlightsomeoftheinterestingfeaturesoftheclusteredstructuresobtainedfromawiderangeoftestnetworks.Connectivity,thoughnotasurprisingemergentpropertyinnaturalnetworks,isneverthelessanimportantone.Inalmostallnaturalnetworksalargefractionofnodesformwhatistermedasthegiantconnectedcomponentorsimplygiantcomponent[8,1].Thatis,mostpairsofnodes(ifnotall)inthenetworkarecon-nectedeitherdirectlybyanedgeorthroughintermediarynodesformingchainsofedges.Hence,allauthors,whenconstructingnetworkmodelstorepresentnaturalnetworksgivecarefulconsiderationstoensurethepresenceofagiantcomponent[8,10].Usingthescienti¯cstudyandanalysisofnodeinteractionsleadingtogiantcomponentsinnetworks,weengineertheconnectivityofWSNs.1.2.1Problem1:FindingclusteredstructuresinnaturalcomplexnetworksAclusterinanetworkisagroupofsimilarnodesthatisdissimilarfromtherestofthenetwork.Itisusuallythoughtofasagroupwithinwhichthenodesare 10Figure1.2.Clusters(identi¯edbynodeswiththesamecolor)presentinafriendshipnetworkofmembersinaKarateclub.Herealinkconnectingtwonodesrepresentsfriendshipbetweenthetwomembers.Whentheclubsplitintotwofactionsduetoleadershipissues,eachmember(nodes)joinedoneofthetwofactions.Clustersinthisnetworkhighlightsthesetwofactions.denselyinter-connectedandsparselyconnectedtootherpartsofthenetwork(see¯gure1.3)[40,4,3].Thereisnouniversallyacceptedde¯nitionforacluster,butitiswellknownthatmostnaturalnetworksdisplayclusteredstructures(see¯gure1.2).Clusterdetectioninnaturalnetworkshasdiverseapplications.AclusterintheWWWnetworkindicatesasimilarityamongnodesinthegroup.Hence,ifweknowthecharacteristicsofthecontentsinasmallnumberofwebpages,thenitcanbeextrapolatedtootherwebpagesinthesamecluster.Clustersinsocialnetworkscanprovideinsightsaboutcommoncharacteristicsorbeliefsamongpeoplewithinacommunitythatmakesthemdi®erentfromotherclusters(orcommunities).Inbio-molecularinteractionnetworks,segregatingnodesintofunctionalmodulescanhelpidentifytherolesorfunctionsofindividualmolecules[41].Further,inmanylarge-scalenaturalnetworks,clusterscanhavedistinctpropertieswhicharelostintheircombinedanalysis[8]. 11Figure1.3.Anillustrationofclustersandclusteredstructuresinagivennetwork.Thegoalofaclusterdetectionalgorithmis,givenanetwork,topartitionthenodesintodenselyinterlinkedgroupscalledclusters.Suchapartitioniscalledtheclusteredstructureofthegivennetwork.1.2.1.1ApproachandcontributionsTherehasbeenconsiderablee®ortrecentlyinde¯ning,detectingandidentifyingclustersinnaturalnetworks[42,43,44,40,41,45,46,28,47,2,48].Aclusterdetectionalgorithmisexpectedtodothefollowing(see¯gure1.3):²Given:AnetworkG(N;M)onNnodesandMedges.²Output:1.Thenumberofclusters(c)presentinthenetwork2.Apartitioningofnodesintocclusters.Inthisthesisweinvestigateasimplelabelpropagationalgorithmthatusesthenetworkstructurealoneasitsguideandrequiresneitheroptimizationofapre-de¯nedobjectivefunctionnorpriorinformationabouttheclusters.Inouralgorithmeverynodeisinitializedwithauniquelabelandateverystepeachnodeadoptsthelabelthatmostofitsneighborscurrentlyhave.Inthisiterativeprocess 12denselyconnectedgroupsofnodesformaconsensusonauniquelabeltoformclusters.Uniquenessandcontributionsofthisclusterdetectionmethodareasfollows,1.Weproposeanovelwayofde¯ningandidentifyingclustersinnetworks.Inparticularweidentifyconsensusgroupsinagivennetworkastheclustersofthenetworkandtheclusteredstructureasthepartitionofnodesintoconsensusgroups.Toourknowledgesuchaconsensusformationmechanismhasnotbeenpreviouslyusedforclusteridenti¯cationandanalysis.2.Wevalidatethealgorithmbyapplyingittonetworkswhoseclusteredstruc-turesareknown.3.Wefurthershowthatthealgorithmidenti¯es,notjustone,butmultiplesigni¯cantclusteredstructuresinvarioussocial,biologicalandtechnologicalnetworks.Thisisunlikemostoftheexistingalgorithmsthat¯ndonlythesinglebestsolution(clusteredstructure)ofagivennetwork.4.Weshowthatallthemultipleclusteredstructuresidenti¯edbytheproposedalgorithmaresigni¯cantlymodularandrobusttoperturbations.Thisimpliesthatthesolutionsarenotanartifactofthealgorithmbutre°ectthetruenatureoftheclusteredstructurespresentinthenetworks,supportingtheexistenceofoverlappingclusters[2].5.Alsounlikemostoftheexistingalgorithmsourproposedalgorithmdoesnotrequireapriorinformationregardingthenumberandsizesoftheclus-terspresentinthenetwork.Inparticularwewillshowthattheclustersizedistributionsinsocial,biologicalandtechnologicalnetworksarehighlyhet-erogeneousandhaveapower-lawlikebehavior.Thispower-lawdistributionofclustersizesisingoodagreementwithpreviousresults[49,50].6.Wedemonstratethatourlabelpropagationalgorithmtakesanearlineartimeandhenceitiscomputationallylessexpensivethanmostofthealgo-rithmsdesignedtodate.Inspeci¯c,weusethisalgorithmto¯ndclusteredstructuresinverylargesocialnetworkssuchasthemovieactorcollaboration 13network(with374,511nodesand15,014,850edges)andanonlinesocialnet-workYoutube(with1,157,827nodesand2,990,443edges).Toourknowledgenootherclusterdetectionmethodhasbeensuccessfullyappliedonsuchlargenetworks.1.2.1.2EquivalenceofclustersandcommunitiesAclusterinanetwork,asde¯nedinthisthesis,istermedasacommunityintheNetworkScienceliterature.Thatis,thede¯nitionofaclusterandacommunityisoneandthesame.Thetermcommunityisusedtoavoidconfusionwithanothertermclusteringcoe±cientthatiscommonlyusedinthisareaofresearch[47].Theclusteringcoe±cientisameasureofcliquishnessde¯nedatanode[51].Supposeagivennodexisconnectedtokothernodesinthenetwork.Then,therecanexistk(k¡1)atmost(=K)possibleedgesbetweenallpairsofx'sneighbors.Andamong2thesesupposethatonlyEpairsofedgesexistinthenetwork.Then,theclusteringcoe±cientatnodexistheratioE.Thustheclusteringcoe±cientisameasureKofhowdenselytheneighborsofxareconnected.Acommunityontheotherhandrepresentsadenselyconnectedgroupofnodesthatcanbeofvaryingsizesandnotfocusedatthelocalneighborhoodofanindividualnode(see¯gure1.2).Inthisthesis,wewillusethetermclustertostandforadenselyconnectedgroupofnodesinagivennetworkthatcanbeofvaryingsize.Itisinthismannerthatthetermclusteriscommonlyandwidelyusedinthe¯eldofIndustrialEngineeringandOperationsResearch.Thetermclusteringcoe±cientasde¯nedinthepreviousparagraphwillbecalledaslocalordercoe±cientandsparinglyusedinthisthesis.1.2.2Problem2:ConnectivityofWirelesssensornetworksConnectivityandnetworkconstructionareimportantproblemsincomplexnet-worksandWSNsinparticular.Firstlyconnectivityisoneoftheparametersusedindeterminingtheperformanceofanetworktonavigationandinformationdi®u-sionamongothers.In¯gure1.4(a)wecanseethatitispossibletonavigatefromanynodetoanyothernodeusingtheedgesofthenetworkunlike¯gure1.4(b)and¯gure1.4(c).Thereforeinformationdi®usestotheentirenetworkin¯gure1.4(a)unlike¯gure1.4(b)and¯gure1.4(c). 14Figure1.4.Examplesofnetworkswithdi®erentconnectivitypropertiesandrobustness.Thenetworkin(a)isconnectedandhenceaMsgoriginatingatanynodecanreachanyothernodeinthenetwork.Whereasin(b)and(c)aMsgcanonlydi®usetoapartofthenetworkindicatedbydashedcircles.Removalofnodexinthenetworkin(b)leadstotheremovalofalledgesincidentonx.Asaresultthenetworkbreaksdownintomanydisconnectedcomponents.Howeverforthenetworkin(c)failureofanynodewillstillmaintaintheconnectivityintherestofthenetwork.Secondly,whennodexfails,thenetworkstructurein¯gure1.4(b)breaksdownintomanydisconnectedcomponentsandlosesitsconnectivityproperty.Whereasin¯gure1.4(c)thefailureofanynodewillcontinuetomaintaintheconnectivityintherestofthenetwork.Similarlyweveryoftencomeacrossproblemswhereweneedtodeterminewhichnetworkstructuresarerobustandwhicharevulnerabletofailures[8,32].Whichtopologyisbetter?WhichnetworkisbestsuitedforWSNoperationsandcommunications?[6].Toengineersuchoptimalnetworktopologies,networkconstructionhasemergedasanimportanttool.WewillsolvetheproblemofengineeringnetworkconnectivityinthedomainofWSNs.Connectivityisoneofthemostbasicandstandardrequirementsforacommunicationtopology.AconnectedWSNconsistsofcommunicationpathsconnectinganypairofnodesinthenetwork,eitherdirectlyorinmultiplehops.Thesensornodesarespatiallydistributedandagivensensorcancommunicate(havelinks)tonodesthatarewithinitscommunicationrange(see¯gure1.5).While 15Figure1.5.Apictorialrepresentationofspatiallydistributedsensornodesandtheircommunicationlinks.Eachnodecancontrolitscommunicationradiusorrangeandhencehaslinkstoonlythosenodeswiththerange.connectivityistheobjective,thedesignofsensornodesposeconstraintsthatcurbthecreationofcommunicationlinks[6].Twosuchconstraintsarelimitedpowerandreducedinterference.Theseconstraintsnecessitatethatthenumberofcommunicationlinks(orde-gree)ateachnodetobeminimal[38,6].1.2.2.1ApproachandcontributionsGivenasetofNsensornodes,weposetheconnectivityproblemasfollows:²Objective:minfmaxidig²subjectto:Networkconnectivitywherediisthedegreeonnodei.Inthisthesisweassumethesensornodesareuniformlyrandomlydistributedinatwodimensionalunitsquareandeverynodecanadjustitstransmissionrange(CISCOIEEE802.11a/b/gwirelesscardscanvarytheirtransmitpowerbetween1mWto100mW)toaccommodateonlytherequireddegree(see¯gure1.5).Weshowthatifwecantoleratesomenodesbeingdisconnected,itispossibletogetverye±cientboundsonthedegreesofthenodes. 16Thatis,everynodeisrequiredtocommunicatewithjust¯veclosestsensorstoobtainagiantconnectedcomponentintheWSN.BysolvingtheoptimizationproblemwecontributetotheliteratureonWSNsinparticularandtoNetworkScienceingeneral.Ourcontributionsareasfollows,1.Thevalueoftheoptimaldegree(=5)hassigni¯cantimplicationsinthedesignoftheWSNnetworktopologyandprotocols.Unlikepreviousresultswhichinsistedthattheoptimaldegreeisafunctionofthenumberofnodes(N)andincreaseswithN[52,53,54,55,56,57],ourresultshowsthatwecaneasilyboundthedegreetoascalarvalue5whichisindependentofN.2.Weachievethisoptimalboundeddegreebyweakeningtheconnectivityre-quirement.Thatis,weconstructnetworksthathavegiantconnectedcom-ponentsandarenotrequiredtobeentirelyconnected.3.Thisimpliesthattheinterferenceatanynodeinthenetworkisboundedanddoesnotgrowwithincreasingnetworksizes[58,38,6].4.Oursisthe¯rstanalyticalproofontheexistenceofaboundednodedegreetomaintainconnectivityinWSNs.5.Further,protocolsdesignedforWSNmaintenanceandcommunicationscanbestandardizedtoapplicationsofvaryingsizes.1.3ThesisorganizationTherestofthethesisisorganizedasfollows.Chapter2broadlydiscussestheconceptofnetworks,networkpropertiesandnetworkmodels.Webeginthischap-terwithade¯nitionofnetworksandtheircomponents.Thisisfollowedbyareviewofpropertiesofnaturalcomplexnetworksinsection2.2,modelsofnaturalnetworksinsection2.3andabriefreviewofvariousdynamicalprocessestakingplaceonthesenetworksinsection2.4.Intheendofthisdiscussionwereiteratethesigni¯canceof¯ndingclusteredstructuresinnaturalcomplexnetworks.Inthesecondhalfofchapter2wefocusspeci¯callyonWSNs.Insection2.6weintroducethecomponentsofsensornodesandhowWSNsareformed.Insection2.6.1we 17elaborateonpowere±ciencyandinterferencesinWSNsandhowthisa®ectstheproblemofconnectivity.Wefollowthiswithadetailedlookatsensornetworkmodelsinsections2.6.2and2.6.3andidentifywhichamongthesenetworkmodelswewillusetosolveourproblemofconnectivityinWSNs.Chapter3isdedicatedtoanin-depthanalysisof¯ndingclusteredstructuresinnaturalcomplexnetworks.Webegin(section3.1)withacomprehensivere-viewofpriorworkrelatedto¯ndingclusteredstructures.Insection3.2wegivedetailsoftheproposedlabelpropagationalgorithmandanalyzethequalityofresultsobtainedfromtestnetworks.Insection3.3wevalidatethelabelpropaga-tionalgorithmusingnetworkswheretheclustersarealreadyknown.Insection3.4weevaluatethealgorithm'stimecomplexity.Insection3.5wecomparetheperformanceandqualityofresultswithalgorithmsalreadyexistinginliterature.Chapter4isdedicatedtoanin-depthanalysisoftheconnectivityprobleminWSNs.Hereagainwebeginthechapter(section4.1)withacomprehensivereviewofpriorworkonnetworkconnectivityinWSNs.Insection4.2wereviewsomeoftheprotocolsusedinsensornetworkstomaintainitstopology/connectivity.Insection4.3wederivetheconditionsforoptimalconnectivityandsupportitwithsimulationresults.Wealsoevaluatethepowere±ciencyofoptimalnetworkanddiscusshowtheoptimalparameterschangeforstrongconnectivity.We¯nallyconcludeinchapter5withasummaryofresults,contributionsanddetailsofalistofproblemsidenti¯edforfutureresearch. Chapter2Propertiesandmodelsofcomplexnetworks-AnoverviewIwanttogoonlivingevenaftermydeath!AndthereforeIamgratefultoGodforgivingmethisgift,thispossibilityofdevelopingmyselfandofwriting,ofexpressingallthatisinme.Icanshakeo®everythingifIwrite;mysorrowsdisappear,mycourageisreborn."|AnneFrankManynaturalnetworksarealreadyorganizedeveniftheirorganizationisnotaprioriknownorspeci¯ed.Becauseoftheexistingorganization,thesenetworksdis-playspeci¯cproperties.Whilewecanobservethesepropertiesinnaturalnetworks,thescienti¯cinterestisinidentifyingtheprinciplesbehindsuchanorganization.Forthisidenti¯cation,weconsidervariousclassesofnetworkmodelsinwhichwealreadyknowtheprinciplesoforganizationandstudyhowwelltheirpropertiesmatchwiththoseofnaturalnetworks.Aswewilldiscussinthischapter,theorganizationoflinksbetweennodesinnaturalnetworkswasoriginallyconsideredtoberandom[8,4].However,recentempirical¯ndingsshownetworkfeatures(suchasaright-skewedheterogeneousdegreedistribution,small-worldness,highdegreeoflocalorder)thatdeviatefromrandomness[9,11,47].Thischapterbeginswithareviewofmanystatisticalpropertiesusedtoidentifyordersandcharacterizenetworks[8,4].Wewillthenreviewsomeofthenetworkmodelsandevaluatethembasedonhowwelltheir 19propertiesmatchwiththoseofnaturalnetworks.Thiswillbefollowedbyabriefoverviewofnetworkdynamicsandprocessesthatarestudiedonthesenetworkmodels.Intheendofthisdiscussionwewillclarifytheimportanceof¯ndingclusteredstructuresinnetworksandhowthey¯tintothescienti¯cstudyofnaturalcomplexsystems.Inthesecondpartofthischapter,wewillnarrowdownourfocustoWSNs.Here,unlikenaturalnetworks,theorganizationoflinkshastobeengineered.Thatis,ourprimarilygoalistohelpsensornodestoidentifytheircommunicationsinordertoachievecertainfundamentaltopologicalproperties,namelyconnectivity.Webeginthispartofthechapterwithabriefintroductiontosensornodesandtheircomponents.Wewillthenreviewtwofactors,namelypowerandinterferencethata®ectourchoiceofnetworkmodels.ThiswillbefollowedbyareviewofcurrentlyusedWSNnetworkmodelsandinspeci¯c,thenearestk-neighborsnetwork.Wewillfurtherdiscusshowweachieveourobjectivebyoptimizingtheparameterkinthismodel.Beforewebeginwiththedetailswewillbrie°yintroducethede¯nitionofanetworkanditscomponents.2.1Networks:De¯nitionAnetworkconsistsofasetofnodesVandasetofedgesE.Anedgee2Econnectsexactlytwonodesve;ue2V.ueisthensaidtobeaneighborofveandvice-versa.Ingeneralitisnotrequiredforve6=ueandinthecasewhenve=ue,theedgeeiscalledaloop.Also,therecanbetwoedgese1;e22E(ormore)thatconnectthesamepairsofnodesinV.Inthiscasethenetworkissaidtocontainmultipleedges(see¯gure2.1).Ifedgeshavedirectionsassociatedwiththem,thenthenetworkiscalledadirectednetwork.LetDbethesetofedgesinadirectednetwork,thenad2D,connectingud;vd2V,issaidtobeoriginatingfromudanddirectedtowardsvd(see¯gure2.1).Anetworkingeneralcanhavebothundirectedanddirectededgesalongwithmultipleedgesandloops.Inthisthesis,wewillonlyconsidernetworksthatdonotcontainmultipleedgesorloops.Inadditionallnetworksweconsiderwillhaveonlyundirected 20Figure2.1.Anillustrationofnetworkswithdi®erentkindsofedges.(a)representsanetworkthatcontainsdirected,undirectedandmultipleedgesandloops.Inthisthesisweassumeallnetworkstobeundirectedasdepictedin(b)unlessotherwisementioned.Theonlyotherkindofnetworksthatweconsiderisadirectednetworkasshownin(c).edgesunlessotherwisementioned.InafewcasessuchastheWWWonecanconsiderdirectionsassociatedwithedges,originatingfromawebpageanddirectedtowardsanotherwebpagethatisbeinghyper-referenced.Aswewillsee,directededgesformanintegralpartofWSNnetworkmodelsandwewillconsideradirectednetworkmodeltosolvetheconnectivityproblemonWSNs.2.2PropertiesofnaturalcomplexnetworksErd}osandR¶enyiproposedarandomgraphmodeltorepresentnetworksthathavenoapparentstructureorpatternintheirorganization(E-Rrandomgraphs)[8,1].Thatis,givenasetofNnodesandaparameterplyingbetween0and1,everypairofnodesinthenetworkisconnectedwithaprobabilityp(see¯gure2.2).Thismodel,proposedinthe1950swasconsideredagoodrepresentationofnaturalcomplexnetworksandmanyresultsexistthatcancapturethecharacteristicsandpropertiesofthismodelfordi®erentvaluesofp[1].However,withanincreasingavailabilityofdataonlarge-scalenetworkssuchasactorcollaborations,scienti¯cco-authorship,WWWandidenti¯edinteractionsbetweenbio-moleculesinvariousorganisms,empiricalevidenceshowordersinnet-worksthatdeviatefromrandomness[8,51].Inspeci¯c,themacroscopicpropertiesofnaturalnetworksshowsigni¯cantdeviationsfromthepropertiesofE-Rrandomgraphs. 21Figure2.2.Evolutionofrandomgraphsastheconnectionprobabilitypincreasesfrom0.Notethataspincreasesthenumberofedgesinthenetworkalsoincreases.Inthelastfewyearsmanystatisticalpropertieshavebeenusedtoidentifyordersandcharacterizecomplexnetworks.Thefollowingareafewsuchpropertiesthatareprominentlyusedintheliterature.2.2.1AverageshortestpathlengthGivenapairofnodesu;vinanetwork,apathexistsbetweenthem,ifwecan¯ndnodesu=x0;x1;x2;:::;xn¡1;xn=vsuchthatthereisanedgebetweenxiandxi+1,andxi6=xj8i6=j.Thelengthofsuchapathisn.Ashortestpathbetweenuandvisanypathwiththesmallestnamongallsuchpossiblepaths.Forexample,v3;v5;v4;v7isapathoflength3betweenv3andv7inthenetworkin¯gure2.1(b)andv3;v4;v7isthepathofshortestlength2betweenv3andv7.Theaverageshortestpathlengthofanetworkisde¯nedastheaverageofallsuchshortestpathlengthsbetweeneverypairnodesinthenetwork.Letd(u;v)denotethelengthoftheshortestpathbetweennodesuandv.Thentheaverageshortestpathlengthloftheentirenetworkisde¯nedas2Xl=d(u;v)N(N¡1)u6=v2VInthecasewhenthereisnopathbetweentwonodes(forexample,betweenv1andv3in¯gure2.1(b))theshortestpathlengthisconsideredtobe1.Anetworkissaidtobesmall-worldif,despiteitslargesize,theaverageshortestpathlengthisrelativelysmall[51].Thepopularnotionofsixdegreesofseparationinsocialacquaintancenetworksisaparticularinstanceofthesmall-worldconcept. 22Table2.1.ComparisonofaveragepathlengthslrealinnaturalnetworkswithlrandofE-Rrandomgraphsofthesamesizes.NetworkSize 5:1774log(N).Whileif'N<0:074log(N),thenetworkisalmostsurelydisconnectedasN!1.Thisimpliesthattomaintaintheconnectivityofanetworkthenumberofneighborscannotbeboundedasthenumberofnodesincreases.4.1.3GiantcomponentsandconnectivityTheproblemofconnectivityhasalsobeenwellresearchedfromthepointofviewofgiantcomponents[133,134,112,135,69,111,110,114,136].Thisisunlikealltheworksdescribedabovewhichrequireeverynodetobepresentinonecom-ponent,thatis,anentirelyconnectednetwork.TheworksbyMeesterandRoy[69],Penrose[111],Quantanilla[110,114]andothersisconcernedwiththekind 76ofgiantcomponentthatarisesbykeepingtheradiusr¯xedandvaryingtheden-sityofnodes¸(InallthesecasesthenodedistributionfollowsaPoissonpointprocessX).Theyshowthatasthedensityofthenodesincreases,foragivenr,thereexistsa¯nitethreshold¸cbeyondwhichauniquegiantcomponentalwaysexist.Similarly,inad-dimensionalunitcube,giventhenumberofnodesN,itispossibleto¯ndcriticalthresholdsNc(r)fortheappearanceofgiantcomponents[113].Furtherthecriticalexpectednumberofneighbors®ccanalsobecalculated[110,114,113]fortheappearanceofgiantconnectedcomponents.Inthiscasehow-ever,thenumberofnodesNis¯xedandtheaveragedegree®=r2NÁincreasesuntilagiantcomponentemerges.Theresultssofarsuggestthat®c¼4:51whend=2and®c¼2:7whend=3[113,110]inthe¯xedradiusmodels.Also,given1®cd+21N,thecriticaltransmissionradiusrc(N)=pf¡()gdfortheappearanceof¼N2pauniquegiantcomponent.Whend=2,r(N)issimply®c[113,136].c¼NItwasalsoshownthatthereexistsimilarcriticaldensitythresholds(¸c(g))fortherandomconnectionmodelswithconnectionfunctiong[69].Further,Franceschettietal[112]showedthatsquish-squashingaconnectionfunctiongintopanotherconnectionfunctionh=p:g(px)forsome0 k®thereexistsagiantcomponentalmostsurelyinthenearestk-neighborsmodel.Thenifkcexistsitmustbe·k®.Letrc(¸)bethecriticalradiusforconnectivityofa¯xedradiusnetworkwhosenodesaredistributedaccordingtoaPoissonpointprocessXofintensity¸inR2[113,136].Supposeeachnodeadjustsitstransmissionradiustoaccommodateadesirednumberofneighborsk.Thenadirectededgefromanodetowardsitskneighborsisformed.Letk®=inffkjinfi2Xfrijoutdegreeatallnodesinthenetwork=kg¸rc(¸)gk®isthenthesmallestksuchthatthesmallesttransmissionradiusrequiredatanynodetohavekoutgoingneighbors,isatleastrc(¸).Notethateventhoughkisindependentof¸,k®mightnotbe.Ifeachnodeadjustsitstransmissionradiustobeabletosendmessagestoatleastk®neighbors,therewillbeedgesinbothdirectionsbetweennodesthatarenomorethanadistancerc(¸)apart.Hencetherandomconnectionmodelwithaconnectionfunctiongde¯nedas8<1;8kxk·rc(¸)g(x)=:0;8kxk>rc(¸)becomesasubgraphofthenearestk®-neighborsnetwork.Thisisbecauseinnearestk®-neighborsnetworkeachnodehasatransmissionradiusofatleastrc(¸).Also,sincethe¯xedradiusnetworkhasagiantcomponent(bythede¯nitionofrc(¸))sodoesthenearestk®-neighborsnetwork.Therefore,foragivenX,and8k¸k®,thenearestk-neighborsmodelhasagiantconnectedcomponent.Todeterminethevalueofk®,weknowfrom[140]thatifWkistherandomvariableforthedistanceofthekthnearest-neighbor(k¸1)fromapointinX,thenk2k¡1¡¼¸w2theprobabilitydensityfunctionofWkisgivenby,f(wk)=2(¼¸)wkek=(k¡k211)!.ItimmediatelyfollowsthatE(Wk)=k(2k)!=(2k!)¸2.Inordertoobtaink®, 80letus¯rstcalculatetheP(Wk¸rc(¸)).Z122k¡¼¸w2dwkP(Wk¸rc(¸))=(¼¸wk)ek(4.1)rc(¸)(k¡1)!wkBychangingthevariableasx=¼¸w2,wegetkZ1xk¡1e¡xy=Xk¡1(¸¼r2(¸))ye¡¸¼rc2(¸)cP(Wk¸rc(¸))=dx=(4.2)¼¸rc2(¸)(k¡1)!y=0y!(Referto[65]fortherighthandsideequation.)Butwealsoknowthat,thecriticalaveragedegreeinthenetworkfora¯xedradiusmodeltopercolateisapproximately4.51[113,110].Whichmeansthatinordertoobtainanaveragedegreeof4.51weqqmustsetr(¸)=4:51.Notethatifwe¯xr=dinthe¯xedradiusnetworkc¼¸¼¸model,thentheexpecteddegreeonanodeinthenetworkisd.Substitutingforrc(¸)inequation(4.2),weget,y=Xk¡1y¡4:51(4:51)P(Wk¸rc(¸))=e(4.3)y!y=0andthisprobabilityisindependentof¸.Thusk®whichisnowthesmallestksuchthattheprobabilityinequation(4.3)is1isinfactindependentof¸.However,onlyask!1theaboveprobabilitytendsto1.Butweseethatevenforkaround10,thisprobabilityismorethan0.99andforkabout15itisarbitrarilycloseto1.Hencewecansafelyassumethatk=15yieldsanetworkinwhicheachnodehasatransmissionradiusofatleastrc(¸).Thusbyde¯nitionofrc(¸)thisnetworkwillhaveagiantconnectedcomponent.Requiringeverynodetohaveatleastatransmissionradiusofrc(¸)givesapessimisticestimateofkc.Ifontheotherhandwe¯ndthesmallestvalueforksuchthattheexpectedtransmissionradiionthenodesinthenetworkisatleastrc(¸),thenweneed,rk(2k)!4:51E(Wk)=1¸=rc(¸)(4.4)(2kk!)2¸2¼¸ 81andthisimpliesrk(2k)!4:51¸(4.5)(2kk!)2¼andweseethatk=5isthesmallestvalueforwhichtheaboveinequalityissatis¯ed(see¯gure4.1).Thisalsoimpliesthatk=4isthelargestvalueforwhichtheaboveinequalityisnotsatis¯ed.Henceforvaluesofkupto4,thenearestk-neighborsmodeldoesnothaveagiantcomponent(bythede¯nitionofrc(¸)).Furtherthisistrueirrespectiveofthedensityofnodesinthenetwork.Simulationresultsalsoagreethatfork=5andabovethenearestk-neighborsmodelofanydensity¸hasanunboundedconnectedcomponent.Whilefork<5thereexistsnogiantcomponents(see¯gure4.1).4.3.2SimulationresultsToverifytheaboveanalysisusingsimulation,we¯xedthesizeofthesensingregiontobeaunitsquare.We¯xedthesizeofthenetworktobeNandthenumberofneighborsaskineachsimulation.WevariedNfrom50to5000andkfrom3to6.Weran10experimentseachfora¯xedNandk.Theresultsfromtheseexperimentsareplottedasgraphsin¯gure4.1.Notethatfork=5approximately95%ofthenodesareinthegiantcomponent.Fork=4thesizeofthelargestcomponentscalessub-linearlywithN.Itwouldnotbepossibletoobtainboundednodedegreesifwechangetherequirementsforconnectivity.Thatis,insteadofgiantcomponentswelookforafullyconnectednetworkthenkshouldbeatleast5:1774log(N)[38,?].4.3.3EnergyconsumptioninthenearestneighborsmodelThesumoftransmissionradiusatallthenodesinaWSNisameasureoftheenergyconsumedinthenetwork.Inthissectionwewillshowthatforthesamenumberofneighbors,boththe¯xedradiusmodelandthenearestneighborsmodelconsumesapproximatelythesameamountofenergy.Toshowthis,weneedthefollowing.SupposewerestrictthePoissonpointprocessXofintensity¸inR2toa¯niteregion,sayaunitsquare.Then,letthenumberofnodesinthis¯niteregionbe 82Figure4.1.Foreach¯xedk,thegraphsshowhowthesizeofthelargestconnectedcomponentgrowsasthenumberofnodesNincreases.Notethatwhilethereexistsnogiantcomponentfork=3;4itappearssuddenlyfork=5.N.Thelengthofagraphisthesumofthelengthofitsedges.Henceforthekthnearestneighborgraphinwhicheachnodeisadjacenttoitskthclosestneighbor,thelengthLk;Nisgivenby[133],Xj=k1Lk;N1¡(j¡2)limN!1E(1)=1(4.6)N22¼2(j¡1)!j=1Thereforetheexpectedsumofthetransmissionradiionthesensornodesinthenearestk-neighborsmodelisthesameastheexpectedlengthofthekthnearestneighborgraph.Also,thesumofthetransmissionradii(qLr;N)ina¯xedradiusmodelisdNwheredisthedesiredconnectivity.Takingk=5andNsu±ciently¼N1largeinequation(4.6),weget,E(Lk;N)¼2:1809()2.Alsoforthesameconnec-¼N1tivity,thatistakingd=5,wehaveE(Lr;N)¼2:2361()2.Oncomparisonwe¼seethatfora¯xedNandk,Lr;N¼Lk;N. 83Figure4.2.Foreach¯xedk,thegraphsshowhowthesizeofthelargeststronglyconnectedcomponentgrowsasthenumberofnodesNincreases.Notethatwhilethereexistsnogiantcomponentforkupto3itsuddenlyappearsfork=4.4.3.4StronglyconnectedcomponentsInthenearestk-neighborsmodelweonlyconsideredthebi-directionaledgesandignoredthepresenceofunidirectionalones.Henceifwestudythenetworkfortheappearanceofgiantstronglyconnectedcomponents,thenthecriticalvalueforkis4(see¯gure4.2).Byastronglyconnectednetworkwemeanthatforanypairofnodesxandy,thereexistsadirectedpathbothfromxtoyandfromytox.4.4SummaryanddiscussionInthischapterweconsiderede±cientmaintenanceofconnectivityofawirelesssen-sornetwork.Inadditiontoenergye±cientvaluesforcriticaldegreeonthenodes,wealsoconsideredtheconstraintsandrequirementsfromtheviewoftopologycontrolprotocols.Wehaveshownthatwhennodesadjusttheirtransmissionradiustomaintaina¯xedout-degreek,then5isthecriticalthresholdbeyondwhichagiantcomponent 84existsalmostsurelyinthenetwork.Further,thisistrueirrespectiveofthenumberofnodesN.ThisimpliesthatthedegreesonthenodescanbeboundednomatterhowlargeNmaybe,enablingtheuseofalargenumberofwirelesssensornodestoformaWSNwithboundedinterferences.Wewillnowproceedtosummarizetheresultsandcontributionsofouroverallwork(clusteringandconnectivity)inthenextchapter. Chapter5ConclusionsandfutureworkWhenwetrytopickoutanythingbyitself,we¯ndithitchedtoeverythingelseintheuniverse."|JohnMuirInthisthesiswehavesolvedtwoproblemsonlarge-scalenetworks,clusteringandconnectivity.Inclusterdetection,givenasetofnodesandtheirinterconnections,theobjectivewastominethroughthenetworktoextractdenselyconnectedsub-groups.Inchapter3wehaveproposed,discussed,analyzedandveri¯edourclusterdetectionmethodonexamplesofnaturalnetworks.Fortheproblemofconnectivity,givenasetofnodes,wecreatedinterconnec-tionsbetweennodessothattheresultingnetworkisconnected,subjecttocertainconstraints.Inchapter4wherewediscussedtheproblemofconnectivity,withafocusonWSNs,weweakentheconnectivityrequirementtoobtaindesirablepa-rameters.Inspeci¯c,aboundednodedegreewasarequirementinWSNsandweshowedthatsuchaboundeddegreeinindeedobtainableforagiantconnectedcomponentinthenetwork.Inthischapterwehighlightsomeofthekeyresultsandcontributionsfromourresearchandalsodiscussdirectionsforfuturework. 865.1ProblemI:FindingclusteredstructuresinnaturalnetworksTheproposedmethodisaniterativelabelpropagationprocess.Hereeverynodeisinitializedwithauniquelabel.Ateachiteration,nodes,byobservingtheirneigh-borsadoptlabelsthatamaximumnumberofthemhave.Asaresultconsensusonauniquelabelemergesbetweendenselyconnectedgroupsofnodes.5.1.1Resultsandcontributions²Inthisthesiswehaveinvestigatedanewwayofidentifyingandde¯ningclustersasconsensusgroupsinnetworks.²Thetimecomplexityoftheproposedalgorithmisnearlinear(withrespecttothenumberofedgesM)andhenceitsapplicabilitytolargenetworks.Weshowedthatwecan¯ndclusteredstructuresfromverylargenetworkssuchasthemovieactorcollaborations(with374,511nodesand15,014,850edges)withamodularityindexQ¼0:5.Inaddition,wewillshowinsection5.1.2.3thatwewereabletosuccessfullyapplyouralgorithmto¯ndclusteredstructuresinalargeonlinesocialnetworkYoutube(1,157,827nodesand2,990,443edges)withthemodularityindexQ¼0:66.Furtherweshowhowonecanusetheseidenti¯edclustersinYoutubeandcomparethemwithUserGroupspresentinYoutube.Toourknowledge¯ndingandanalyzingclustersinsuchlargegraphswasnotpossiblebefore.²Thelabelpropagationprocessfallsundertheclassofrandomizedalgorithms.Thisisbecause,ifanode,whenobservingthelabelsonitsneighbors,¯ndsnoclearmajority(or¯ndsequalnumberofneighborsineachlabel),adoptsarandomlychosenlabel.²Consequently,di®erentrunsofthealgorithmonthesamenetworkyieldsdi®erentsolutions.²Inourtestnetworks(Friendshipnetwork,collegefootballnetwork,protein-proteininteraction,co-authorshipnetwork,WWWandmovieactorcollabo-rations),thealgorithmfoundmultiplepartitionsthatareallsimilarinnature. 87Inspeci¯c,weusedtheJaccard'sindextotestthesimilaritybetweentwosolutions/partitionsinanetworkandshowedthattheyhaveahighdegreeofsimilarity.²Inaddition,themodularitymeasureQwassigni¯cantlyhighwithatightrangeindicatingthesigni¯canceofthemultiplesolutionsfoundbythealgo-rithmonthesamenetwork.²Wefurthershowedthat,inourtestnetworks,thesolutionsfoundbythealgorithmremainsstrongtosmallperturbations.Thisimpliesthatthesesolutionsarenotanartifactofthealgorithm,butre°ectthenatureofthetrueclusterspresentinthesenetworks.²Multiplesolutionsinnaturalnetworksalsosupportsthepresenceofover-lappingclusters[2].Whilemostclusteringmethodsfocusonasinglebestsolution,wehaveshownthatinnaturalnetworksthereexistsmultiplesolu-tionsthataresigni¯cantlymodularandstrongagainstsmallperturbations.5.1.2Futurework5.1.2.1NetworkmodelingAsdiscussedinChapter2,naturalnetworksarealreadyorganized.Themannerofself-organizationinsocialorbiologicalnetworksortheWWWgivesrisetoaclusteredstructure.Understandingtheprinciplesbehindsuchanorganizationanditse®ectsondynamicalprocesssuchasdi®usion,searchandnavigation,functionalrequirementsetcisanunsolvedproblem[47].Theproblemliesintheinabilitytocharacterizeaself-organizationmechanismthatwillgiverisetoaclusteredstructureinlarge-scalenetworks.Forexample,wedonotyetknowwhycertainnetworkshaveamodularityindexQintherangeof0.4to0.5,whilesomehaveQintherangeof0.5to0.6andsoon.Researchinthisareaisstillinitsinfancyanditwillbeworthwhiletoidentifyhowaconsensusformationmechanismcancontributetonetworkmodelingprinciples. 885.1.2.2OpinionformationinnetworksOpinionformationisadynamicalprocesstakingplaceonanetworkandmorespeci¯callystudiedinsocialnetworks.Thegeneralideaofopiniondynamicsisthateachindividualcanhaveopinionsthatwillform/changebasedontheirinteractions[10].Inthesimplestform,anodecanhaveoneofthetwoopinions0or1andineachstepanodeadoptsanopinionofanyoneofitsneighbor.ThisiscalledastheVoter'smodel[10,141].Hencegivenanetworkandarandominitializationofopinions,theideaistostudyhowandwhenthe°uctuationsleadtoaconsensusofaopinionintheentirenetwork[141].Whileinitiallyopiniondynamicswasstudiedonregularnetworksorlattices,recentlythefocushasshiftedtowardsothernetworkmodelssuchassmall-worldandpower-lawnetworks.MorerecentlyWuandHuberman[141]studiedthevoter'smodelongeneralizedrandomgraphsofagivendegreedistribution[71].Theyshowedthatonecanobserveasocalledlocalitye®ectorane®ectinwhichopinionsareorganizedintosmallergroupswithouta®ectingthewholesocietyornetwork.Thelabelpropagationprocessandconsensusformationthatwehaveusedforourclusteringmethodissimilarinnature(thoughnotthesame)tothestudyofvoter'smodeldynamicsandopinionformationinnetworks.Unlikethecasewherethereareonlytwopossibleopinions,webeginwithacasewhereeverynodecanhaveitsownuniqueopinion.Alsounlikethevoter'smodel,weforceeachnodetoadopttheopinion(label)foundmostcommonlyamongitsneighbors.Wemakethefollowingobservations.²Opiniondynamicshasbeenwidelystudiedinvariouskindsofnetworkmod-els,includinginrandomizedpower-lawnetworks[141].However,noworksofarhaveaddressedopinionformationinnetworksthathasclusteredstruc-tures.Whilewedonotyethavenetworkmodelsthatcanmimicthecharac-teristicsofclustersinsocialnetworks,wecanneverthelessformtoymodels.Forexample,onecanconsideranetworkonNnodesandkclusters,witheachclustercontainingNnodes.IfwehaveMedges,thentheycanbekequallyandrandomlydistributedwithinthesekgroupsandaverysmallfractionoftheseMedgesinterconnectingnodesbetweengroups[47,66].²Conversely,opiniondynamicscanalsobeusedtoidentifyclustersinnet- 89works.Wehaveshownthroughourlabelpropagationprocessthatconsensus(oropinionformation)occursamongdenselyconnectedgroupsofnodesandhencee®ectivelyextractstheclusterspresentinnetworks.5.1.2.3ClustersandusergroupsinonlinesocialnetworksOnlinesocialnetworkingsitessuchasYoutube,Orkut,Flickr,FacebookandLive-JournalareamongthemostpopularsitesontheInternet.Thesesitesarearepositoryofsomeofthelargestdatasetsonsocialnetworks[142].IfweconsiderYoutubeforexample,anodeinthenetworkisaregistereduserofYoutubeandagivenusercanhaveregisteredlinks(friendships)withotheruserswithinYoutube.Oneofthemostinterestingfeatureofthesesocialnetworkingsitesisthattheyprovideadditionalinformationonauser'sinterest.Thatis,eachusercanasso-ciatethemselveswithwhatistermedasusergroupswithinonlinesocialnetworks[142].AgainletusconsiderYoutubeasanexample.Youtubeprimarilypromotesthebroadcastingofvideosandeachusercancreateandbroadcastvideosoftheirchoice.Alsoanyusercancreateand/orbelongtospecialinterestvideogroups(calledusergroups).Supposethatuserxhasspecialinterestinthetopicglobalwarming",thenxcancreateand/orjointheusergrouponglobalwarmingvideos.Afterjoiningthegroup,xcanaddhis/hercollectionofglobalwarmingvideostothegroupandholddiscussionswithothermemberswithinthegroup.AlsoanyuserofYoutubeirrespectiveofwhetherarenottheyarefriendsofxcanjointhisspecialinterestusergroup.ThevideosbroadcastedandtheusergroupsagivenuserisassociatedwithcanbeseenbyanyotheruserwithintheYoutubenetwork.AsimilarmechanismtakesplaceinallotheronlinesocialnetworksitessuchasOrkut,Flickr,FaceBookandLiveJournal[142].Asmentionedaboveoneoftheuniquefeatureofthedataononlinesocialnet-worksistheavailabilityofinformationonagivenuser'sinterests.Ourinterestinthisdatasetistoseeifthereareanysimilaritiesbetweenclustersandusergroups?Canwesaythatdenselyinterlinkedclustersofusershavesimilarinterests?Toanswerthesequestionswe¯rstappliedourproposedclusteringalgorithmontheYoutubedatasetconsistingof1,157,827usersand2,990,443links.WeobtainedclusteredstructuresfromtheYoutubenetworkwithamodularityindexQ¼0:66.ThishighvalueofQindicatesthatthepartitionofthenetworkobtainedbyour 90clusteralgorithmissigni¯cantandithasindeedfounddenselyinterlinkedgroups.Withtheinformationonclustersinthenetworkandusergroupswedidthefollowing,1.Withineachclusterweidenti¯edthesizeofthelargestusergroup.Thatis,amongthevarioususergroupsthateachusercanbelongto,weidenti¯edthesizeofthelargestusergroupwithinagivencluster.LetCU(i)bethefractionofnodesinclusterithatbelongtothelargestusergroup.(Notethatoneusercanbelongtomultipleusergroups).2.Withineachusergroupweidenti¯edthesizeofthelargestcluster.Thatis,wepartitionedthemembersofausergroupintotheirrespectiveclustersandidenti¯edthesizeofthelargestclusteramongthem.LetUC(j)bethefractionofnodesinusergroupjthatbelongthelargestcluster.InYoutubethereareabout30,087di®erentusergroupswithnumberofusersineachusergroupvaryingbetween1and7591.Thenumberofclustersidenti¯edbythealgorithmwasapproximatelyaround61,000andtheclustersizesvaryiedbetween3and250,428.Asmentionedaboveagivenusercanbelongtomultipleusergroupsornone.Therefore,tocalculateCU(i)inaclusteri,we¯rst¯lteredoutthoseusersintheclusterwhodonotbelongtoanyusergroup.WethencalculateCU(i)onthisreducedclusterthatonlycontainsuserswhobelongtoatleastoneusergroup.InYoutube,amongthe61,000clustersonlyabout11,177clusterscontainanon-zeronumberofuserswhobelongtousergroups.Theremaining49,823clustershadnomemberwhobelongedtousergroupsandarehencediscardedfromthecalculationofCU(i).Figure5.1showstherelationshipbetweenusergroupsandclusters.Atthispointwecannotconclusivelysayiftheyareoneandthesameortheyaretwodi®erentconcepts.Wecanseethatthereexistssomeclusterswithinwhichallusershaveatleastonecommoninterest(usergroup)whiletherearealsosomeclustersinwhichCU(i)isverysmallindicatingthediversityofinterestwithinthecluster.Similarlyifwelookatusergroups,therearesomeinwhichallmembersarefromthesamecluster,whiletherealsoexistsusergroupsinwhichmembersarefrommanydi®erentclusters.Howeverwewouldliketopointoutthatfromvisual 91Figure5.1.ThegraphsshowthefractionUC(j)andCU(i)intheYoutubenetworkacrossvarioususergroupsandclustersrespectively.inspectionwefoundalargenumberofusergroupshavingmembersfromthesamecluster,whiletheconverseisnottrue.Establishingtherelationshipbetweenclustersandusergroupsisoneoftheinterestingproblemsforfutureresearch.Itcanhelpusunderstandifclusters(communities)ofpeopleinsocialnetworksaresimilar?Itcanhelpustostudyifpeoplewithinclustershavesimilarinterests?Ifnot,howcanweidentifygroupsofpeoplewhowillhavesimilarinterests?5.1.2.4CitationnetworksandimpactfactorsAcitationnetworkisanetworkofpapers/articlesrepresentedasnodesanddi-rectededgesconnectingcitingpapersdirectedtowardsthecitedpapers[143,17].Citationshavebeenusedasatooltoidentifyresearchpapersandtoevaluatetheirintellectualmerit.TheScienceCitationIndex(SCI)usesthenumberofcitations 92apaperreceivesasameasureofitsintellectualmerit[143].Morerecently[144]aComprehensiveCitationIndex(CCI)hasbeenproposedthatmeasurestheimpactofapapernotjustbyitsimmediatecitationsbutasacumulativeimpactthroughpathwaysofcitations.Authorsin[144]assumethateverypaperequallydistributesaportionofitsin°uencetothepapersthatarebeingcited.Andbyaniterativeprocesstheseweightsaredistributedtoallnodesinthenetworkthroughdirectedpathways.In[144],theSCIandCCIconceptswereappliedtoacitationnetworkconsistingofpapersmostlyfromManagementSciencecollectedfromhttp://scholar.google.com.Thenetworkconsistedof288,404papersandtheircitationinterconnections.LetuscallthisnetworkGMS.InGMSamongthetop10papersrankedbyManagementScienceexpertsCCIperformedsuperiortoSCI.EvenwithabetterperformancefromCCIthereisstillroomforimprovement.OneoftheassumptionsofCCIistheequaldistributionofapaper'sin°uencetoallitsreferences.Itmaybeahardtaskjustifyingsuchadistributionandmaynotbetrueinreality.WeperformedasimpleexperimentinwhichwereplacedallthedirectededgesinGMSbyundirectedones.InthisversionofGMSweappliedthelabelpropagationalgorithmandfoundthatthenetworkwassigni¯cantlymodularwithQ¼0:69.Wealsoobservedthatingeneralpapersonsimilartopicswereclusteredtogether.Thiscouldimplythatsomeofthein°uencesapaperdistributestoitsreferencescanbebiasedtowardswithinclusterreferencesasopposedtoreferencesoutsideoftheclusterorvice-versa.Howtheseweightsshoulddi®er,byhowmuchandhowitwillimpacttheintellectualmeritofpapersisatopicforfutureresearch.5.1.2.5ClustersinCallgraphsDuringtheauthor'sinternshipatIBM-IndiaResearchLab(summer2007),thefocuswason¯ndingclustersparticularlyincallgraphs.Acallgraphisanetworkofphonenumbersrepresentedasnodesandinterconnectedbycompletedcallsbetweentwophonenumbers[82,145].Callgraphsaretakenoverdi®erenttimeintervalsandhencewhenanalyzingsuchgraphsitisimportanttomentioniftheedges(calls)arerecordedwithinadayorweekormonthandsoon.Oneoftheobjectivesduringtheinternshipwastoidentifytargetsegments.Thatis,identify 93Figure5.2.Examplesofstar-likegraphsformedbythecentralnodesu;vandwandtheirneighbors.Thelocalordercoe±cientsLu=0=Lv,whileLw=0:044.HoweverL0u>L0vsincedegreeofvishigherthanthedegreeofu.Alsonotethatdespitefewedgesrunningbetweenneighborsofw,L0w
此文档下载收益归作者所有