graph theory and complex networks

ID: 30285917

Size: 4.96 MB

Pages: 299

Date: 2018-12-28

Uploader: U-14522
Graph Theory and Complex Networks
An Introduction
Maarten van Steen

Copyright © 2010 Maarten van Steen
Published by Maarten van Steen
ISBN: 978-90-815406-1-2
Edition: 1. Printing: 01 (April 2010)

All rights to text and illustrations are reserved by Maarten van Steen. This work may not be copied, reproduced, or translated in whole or part without written permission of the publisher, except for brief excerpts in reviews or scholarly analysis. Use with any form of information storage and retrieval, electronic adaptation or whatever, computer software, or by similar or dissimilar methods now known or developed in the future is strictly forbidden without written permission of the publisher.

To Mariëlle, Max, and Elke

CONTENTS

Preface
1 Introduction
  1.1 Communication networks
    Historical perspective
    From telephony to the Internet
    The Web and Wikis
  1.2 Social networks
    Online communities
    Traditional social networks
  1.3 Networks everywhere
  1.4 Organization of this book
2 Foundations
  2.1 Formalities
    Graphs and vertex degrees
    Degree sequence
    Subgraphs and line graphs
  2.2 Graph representations
    Data structures
    Graph isomorphism
  2.3 Connectivity
  2.4 Drawing graphs
    Graph embeddings
    Planar graphs
3 Extensions
  3.1 Directed graphs
    Basics of directed graphs
    Connectivity for directed graphs
  3.2 Weighted graphs
  3.3 Colorings
    Edge colorings
    Vertex colorings
4 Network traversal
  4.1 Euler tours
    Constructing an Euler tour
    The Chinese postman problem
  4.2 Hamilton cycles
    Properties of Hamiltonian graphs
    Finding a Hamilton cycle
    Optimal Hamilton cycles
5 Trees
  5.1 Background
    Trees in transportation networks
    Trees as data structures
  5.2 Fundamentals
  5.3 Spanning trees
  5.4 Routing in communication networks
    Dijkstra's algorithm
    The Bellman-Ford algorithm
    A note on algorithmic performance
6 Network analysis
  6.1 Vertex degrees
    Degree distribution
    Degree correlations
  6.2 Distance statistics
  6.3 Clustering coefficient
    Some effects of clustering
    Local view
    Global view
  6.4 Centrality
7 Random networks
  7.1 Introduction
  7.2 Classical random networks
    Degree distribution
    Other metrics for random graphs
  7.3 Small worlds
  7.4 Scale-free networks
    Fundamentals
    Properties of scale-free networks
    Related networks
8 Modern computer networks
  8.1 The Internet
    Computer networks
    Measuring the topology of the Internet
  8.2 Peer-to-peer overlay networks
    Structured overlay networks
    Random overlay networks
  8.3 The World Wide Web
    The organization of the Web
    Measuring the topology of the Web
9 Social networks
  9.1 Social network analysis: introduction
    Examples
    Historical background
    Sociograms in practice: a teacher's aid
  9.2 Some basic concepts
    Centrality and prestige
    Structural balance
    Cohesive subgroups
    Affiliation networks
  9.3 Equivalence
    Structural equivalence
    Automorphic equivalence
    Regular equivalence
Conclusions
Mathematical notations
Index
Bibliography
PREFACE

When I was appointed Director of Education for the Computer Science department at VU University, I became partly responsible for revitalizing our CS curriculum. At that point in time, mathematics was generally experienced by most students as difficult, but even more important, as being irrelevant for successfully completing your studies. Despite numerous efforts from my colleagues from the Mathematics department, this view on mathematics has never really changed.

I myself obtained a master's degree in Applied Mathematics (and in particular Combinatorics) before switching to Computer Science and gradually moving into the field of large-scale distributed systems. My own research is by nature highly experimental, and being forced to handle large systems, bumping into the theory and practice of complex networks was almost inevitable. I also never quite quit enjoying material on (combinatorial) algorithms, so I decided to run another type of experiment.

The experiment that eventually led to this text was to teach graph theory to first-year students in Computer Science and Information Science. Of course, I needed to explain why graph theory is important, so I decided to place graph theory in the context of what is now called network science. The goal was to arouse curiosity in this new science of measuring the structure of the Internet, discovering what online social communities look like, obtaining a deeper understanding of organizational networks, and so on. While doing so, teaching graph theory was just part of the deal.

No appropriate book existed, so I started writing lecture notes. As with most experiments that I participate in (the hard work is actually done by my students), things got a bit out of hand and I eventually found myself writing another book. Considering that my other textbooks are really on (distributed) computer systems and barely contain any mathematical symbols (as, in fact, is also the case for most of my research papers), this book is to be considered as somewhat exceptional. In fact, because I do not consider
myself to be a mathematician anymore, I'm not quite sure how this book should be classified. Is it math? Is it computer science? Does it matter?

The goal is to provide a first introduction into complex networks, yet in a more or less rigorous way. After studying this material, a student should have a pretty good idea of what makes real-world networks complex instead of complicated, and can do a lot more than just handwaving when it comes to explaining real-world phenomena. While getting to that point, I also hope to have achieved two other goals: successfully teaching the foundations of graph theory, and even more important, lowering the threshold for studying mathematical material.

The latter may not be obvious when skimming through the text: it is full of mathematical symbols, theorems, and proofs. I have deliberately chosen this approach, feeling confident that if enough and targeted attention is paid to the language of mathematics in the first chapters, a student will become aware of the fact that mathematical language is sometimes only intimidating: mathematicians' barks are often worse than their bites. Students who have so far followed my classes have indeed confirmed that they were surprised at how much easier it was to access the math once they got over the notations. I hope that this approach will last for long, making it at least easier for many students to not immediately pull back when encountering mathematical language in other texts.

Intended readership

This book has been written for first- or second-year undergraduates who have taken the usual courses in mathematics as taught in high school. However, although I claim that the material is not inherently difficult, it will certainly require serious studying by most students, and certainly those for whom math does not come naturally. As mentioned, I have deliberately chosen to use the language of math because it is not only precise and comprehensive, but above all because I believe that at the level of this book, it will lower the threshold for other mathematical texts. It should be clear that the lecturer using this material may need to pay some special effort to encourage students. For most students, the language will turn out to be the hard part, not the content.

Supplementary material

As said, this book is part of a course on graph theory and complex networks. Although it can be used for self-study, I encourage students and their instructors to visit the accompanying Web site:

http://www.distributed-systems.net/gtcn/

where lots of extra material can be found, including, most importantly, a huge collection of exercises (with solutions). My goal is to expand this set of exercises continuously. This is the most important reason not to have included any exercises in the book: they can be readily obtained from the site, and are always up-to-date.

To make the material more accessible (and fun), but also to allow students to do some basic analysis of larger graphs and networks, we have been using Mathematica in combination with Combinatorica. All material, including Mathematica notebooks and data on graphs, is available through the Web site. The site also has some extra tools for generating graphs. Of course, slides and handouts are available (all originating from LaTeX sources), as well as all the figures from the book. Perhaps most importantly, an electronic version of the book itself is also available.

All material is freely accessible

Sometimes when you write a book, it makes a lot of sense to think big and act commercially. Thinking big in this sense means you expect many people to have access to your book. Acting commercially means that you try to successfully market and sell your book. Sometimes, it's enough to just think big, knowing that acting commercially will certainly keep everything small.

When you write a book containing mathematical symbols, thinking big and acting commercially doesn't seem the right combination. I merely hope to see the material used by many students and instructors everywhere and to receive a lot of constructive feedback that will lead to improvements. Acting commercially has never been one of my strong points anyway.

However, freely accessible doesn't mean that everyone has the right to copy and spread the material, which I would find quite offensive. For this reason, when requesting an electronic copy, the book will be watermarked with your e-mail address. The watermark is part of the LaTeX source, so it's pretty difficult to remove, although I do not have the illusion that removal is impossible.

Finally, for those who still prefer to (also) have a hard-copy version of the book (of course, without a watermark), such can be realized by placing an order through the Web site. Further information can be found there. The price is comparable to printing it yourself.

Acknowledgments

There are a few people who deserve to be mentioned. Spyros Voulgaris has been responsible for creating homework assignments, preparing Mathematica notebooks, and setting up all the exercise classes. Albana Gaba has a gifted talent to provide very constructive feedback (next to the fact that she has been working like a dog to process all the student assignments). Achraf Belmokadem has done a terrific job on setting up a Web-based subsystem for letting students self-assess their abilities for solving graph problems. Finally, I would like to thank the students who have undergone my teaching for the past two years and who have, despite all the mistakes, continued to claim that they enjoyed it.

Maarten van Steen
Amsterdam, April 2010

CHAPTER 1
INTRODUCTION

On 11 September 2001 there was a malicious attack on the WTC towers in New York City, eventually leading to the two buildings collapsing. What is not known to many people is that there were three transatlantic Internet cables coming ashore close to the WTC and that an important Internet switching station was damaged, along with two other important Internet resource centers.

Peter Salus and John Quarterman [2002] had long been measuring the performance of the Internet by checking the reachability of a fairly large collection of servers. In effect, they simply sent messages from different locations on the Internet to these special computers and recorded whether or not servers would be responding. If reachability was 100%, this meant that all servers were up and running. If reachability was less, this could mean that servers were either out-of-order, or that the communication paths to some of the servers were broken. Immediately after the attack reachability dropped by about 9%. Within 30 minutes it had almost reached its old value again.

This example illustrates two important properties of the Internet. First, even when disrupting what would seem a vital location in the Internet, such a disruption barely affects the overall communication capabilities of the network. Second, the Internet has apparently been designed in such a way that it takes almost no time to recover from a big disaster. This recovery is even more remarkable when you consider that no manual repairs had even started, but also that no designer had ever really anticipated such attacks (although robustness was definitely a design criterion for the Internet). The Internet demonstrated emergent self-healing behavior. [Footnote 1: As we'll encounter in later chapters, there's no magic here: so-called routing algorithms simply adjust their decisions when paths break.]

The Internet is an example of what is now commonly referred to as a complex network, which we can informally define as a large collection of interconnected nodes. A node can be anything: a person, an organization, a computer, a biological cell, and so forth. Interconnected means that two nodes may be linked, for example, because two people know each other, two organizations exchange goods, two computers have a cable connecting the two of them, or because two neurons are connected by means of a synapse for passing signals. What makes these networks complex is that they are generally so huge that it is impossible to understand or predict their overall behavior by looking into the behavior of individual nodes or links.

As it turns out, complex networks are everywhere. Or, to be more precise, it turns out that if we model real-world situations in terms of networks, we often discover new things. What is striking is that many real-world networks look alike: the structure of the Internet resembles the organization of our brain, but also the organization of online social communities. Where
these similarities come from is still a mystery, just as it is often very difficult to understand how certain networks were actually structured. Before we go deeper into what complex networks actually entail, let's first consider a few general areas where networks play a vital role, starting with communication networks.

1.1 Communication networks

Not even so long ago, setting up a phone call to someone on the other side of the world required the intervention of a human operator. Moreover, an established connection was no guarantee for being able to understand each other as the quality could be pretty bad. Many will recall such situations from the 70s and 80s of the previous century, really not that long ago.

Today, cell phones allow us to be contacted virtually anywhere and anytime, and coverage continues to expand to even the most remote areas. Setting up a high-quality voice connection over the Internet with peers anywhere around the world is plain simple. Along these lines, we need merely wait a while until it is also possible to have cheap, high-quality video connections allowing us to experience our remote friends as being virtually in the same room.

The world appears to be becoming smaller, and people are becoming ever more connected. Obviously, telecommunication has played a crucial role in establishing this connected world as it is commonly known, but with the convergence of telecommunication and data networks (and notably the Internet), it is difficult not to be connected anymore. Being connected has profound effects for the dissemination of information. And as we shall see, how we are connected plays a crucial role when it comes to the speed and robustness of such dissemination, among many other issues.

Historical perspective

To have a connected world it is obvious that we need to communicate. If we want this world to have significant coverage, long-distance communication is obviously important. Unlike what many tend to believe, networks that facilitate such communication have a long history, as described by Holzmann and Pehrson [1995]. Apart from well-known means of communication such as sending messengers or using pigeons, long-distance communication without the need to physically transport a message has always caught the attention of mankind. Typically, such telegraphic communication used to be done through
fire beacons, mirrors (i.e., heliographic communication), drums, and flags. Communication paths set up using such methods, for example by having communication posts organized at line-of-sight distances, are known from Greek and Roman history.

However, it wasn't until the end of the 18th Century that a systematic approach was developed to establish telegraphic communication networks. Such networks would consist of communication posts, of which pairs would lie in each other's line-of-sight. Typically, for these optical telegraphs, distances between two posts would be in the order of tens of kilometers, which was realistic given that high-quality telescopes could be used. An important aspect in the design of these networks was the communication protocol, which would prescribe the encoding of letters, but also what to do if there was a transmission error. To make matters more concrete, consider Figure 1.1 which shows a model of a shutter telegraph.

[Figure 1.1: (a) A model of a shutter station with six (open) shutters and (b) a few examples of how letters were encoded. Letters shown: B, P, N, E.]

As shown in Figure 1.1(b), letters are represented by specific combinations of open and closed shutters. In this way, it became possible to transmit messages over long distances. Of course, it became equally important to think about encryption of messages, handling transmission errors, synchronization between transmitter and reader (i.e., sender and receiver), and so on. In other words, these seemingly primitive communication networks had to deal with virtually the same issues as modern systems. Conceptually, there is really no difference.
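The encoding idea can be made concrete with a small Python sketch. Six shutters that are each either open or closed give 2^6 = 64 distinct patterns, enough for an alphabet plus some control signals. The codebook below is purely hypothetical (it does not reproduce the historical letter assignments of Figure 1.1(b)), and the names CODEBOOK, encode, decode, and show are illustrative only.

```python
from itertools import product
from string import ascii_uppercase

NUM_SHUTTERS = 6  # the model station in Figure 1.1 has six shutters

# Every open/closed combination: True = open, False = closed.
ALL_PATTERNS = list(product((False, True), repeat=NUM_SHUTTERS))

# HYPOTHETICAL codebook: the k-th letter of the alphabet gets pattern k+1,
# so the all-closed pattern (index 0) stays unused. The real historical
# assignments are not known from the text.
CODEBOOK = {letter: ALL_PATTERNS[i + 1] for i, letter in enumerate(ascii_uppercase)}
DECODEBOOK = {pattern: letter for letter, pattern in CODEBOOK.items()}


def encode(message):
    """Translate a message into a list of shutter patterns (non-letters are dropped)."""
    return [CODEBOOK[ch] for ch in message.upper() if ch in CODEBOOK]


def decode(patterns):
    """Translate shutter patterns back into letters; unknown patterns become '?'."""
    return "".join(DECODEBOOK.get(tuple(p), "?") for p in patterns)


def show(pattern):
    """Render one pattern as text: 'O' for an open shutter, '#' for a closed one."""
    return "".join("O" if is_open else "#" for is_open in pattern)


if __name__ == "__main__":
    for letter, pattern in zip("BPNE", encode("BPNE")):
        print(letter, show(pattern))
```

Decoding a pattern that is not in the codebook yields "?", a crude stand-in for the error handling that a real protocol would have to prescribe.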
By the middle of the 19th Century, Europe had optical telegraphic networks installed in the Scandinavian countries, France, England, Germany, and others. Concerning topology, these networks were relatively simple: there were only relatively few nodes (i.e., communication posts), and cycles did not exist. That is, between any two nodes messages could travel only through a unique path. Such networks are also known as trees.

Matters became serious when the electrical telegraph system emerged. Instead of using vision, communication paths were realized through electrical cables. The medium proved to be successful: by the middle of the 19th Century the electrical telegraph spanned more than 30,000 kilometers in the United States, making it more than just a serious competitor to optical telegraph systems. In fact, by then it was clear to most people that the optical networks were heading towards a dead end. In 1866, networks in the United States and Europe were successfully connected through a transatlantic cable (where earlier attempts had failed). Gradually, the concept of a worldwide network was becoming reality.

From telephony to the Internet

The impact of a worldwide telephony network can only be underestimated. From an end user's perspective, it really didn't matter anymore where you were, but only that the other party was simultaneously online. In other words, telecommunication networks realized location independency. This independency could be realized only because it was possible to establish a circuit between the two communicating parties: a communication path from one party to the other with intermediate nodes operating as switches. In most cases, these switches had fixed locations and every switch was physically linked to a few other switches. The combination of switches and links forms a communication network, which can be represented mathematically by what is known as a graph, the object of study in this book.

As we already discussed, telecommunication networks were well established when people began to think about connecting computers and thus establishing data communication networks. Of course, the many existing networks already made it possible to send data, for example, as a telegram. The new challenge was to connect these separate networks into logically a single one that could be used by computers using the same protocol. This led to the idea of building a communication system in which possibly large messages were split into smaller units called packets. Each packet would be tagged with the address of its destination and subsequently routed through the various networks. It is important to note that packets from the same message could each follow their own route to the destination, where they would then be used to reassemble the original message.

When a switch received a packet, it would only then decide to which next switch the packet would be forwarded. This packet switching approach contrasts sharply with telecommunication networks in which two end points would first establish a path and then subsequently let all communication pass through that path, also referred to as circuit switching.

The first packet-switching network was established in 1969, called the ARPANET (Advanced Research Projects Agency Network). It formed the starting point of the present Internet. Key to this network were the interface message processors (IMPs), special computers that provided a system-independent interface for communication. In this way, any computer that wanted to hook up to the ARPANET needed only to conform to the interface of an IMP. IMPs would then further handle the transfer of packets. They formed the first generation of network switches, or routers. To give an impression of what this network looked like, Figure 1.2 shows a logical map of IMPs and their connected computers as of April 1971.

[Figure 1.2: A map of the ARPANET as of April 1971. Rectangles represent IMPs; ovals are computers. Site labels in the map: SRI, Utah, Illinois, MIT, Lincoln, CASE, UCSB, Stanford, SOC, CMU, Harvard, Burroughs, UCLA, RAND, BBN.]

The ARPANET of 1971 constituted a network with 15 nodes and 19 links. It is so small that we can easily draw it. We've passed that stage for the Internet. (In fact, it is far from trivial to determine the size of today's Internet.) Of course, that network was also connected: it is possible to route a packet from any source to any destination. In fact, connectivity could still be established if a randomly selected single link broke. An important design criterion for communication networks is how many links need to fail before the network is partitioned into several parts. For our example network of Figure 1.2, it is clear that this number is 2. Rest assured
that for the present-day Internet, this number is much higher.

Likewise, we can ask ourselves how many nodes (i.e., switches or IMPs) need to fail before connectivity is affected. Again, it can be seen that we need to remove at least 2 nodes before the network is partitioned. Surprisingly, in the present-day Internet we need not remove that many nodes to establish the same effect. This is caused by the structure of the Internet: researchers have discovered that there are relatively few nodes with very many links. These nodes essentially form an Achilles' heel of the Internet. In subsequent chapters, you will learn why.

The Web and Wikis

Next to the importance of e-mail and other Internet messaging systems, there is little discussion about the impact of the World Wide Web. The Web is an example of a digital information space: a collection of units of information, linked together into a network. The Web is perhaps the biggest information space that we know of today: by the end of January 2005, it was estimated to have at least 11.5 billion indexable pages [Gulli and Signorini, 2005], that is, pages that could be found and indexed by the major search engines such as Google. Three years later, different studies (using different metrics) indicate that we may be dealing with 30-50 billion pages. In any case, we are clearly dealing with a phenomenal growth.

What makes information spaces such as the Web interesting for our studies is that again these spaces form a network. In the case of the Web, each page may (and generally will) contain links to other pages and corresponds to a node in the network. What becomes interesting are questions such as:

• If we take the number of links pointing to a page as a measure of that page's popularity, what can we say about the number and intensity of page popularity (i.e., what is the distribution of page popularity)?

• Does the Web also share characteristics with what are known as small world networks: is it possible to navigate to any other page through only a few links?

As we shall discuss extensively in Chapter 8, the Web indeed has its own characteristics, some of which correspond to those in small worlds. However, there are also important differences. For example, it turns out that the distribution of page popularity is very skewed: there are relatively few, but extremely popular pages. In contrast, by far most pages are not popular, yet there are many of such unpopular pages, which makes the collection of unpopular pages by itself an interesting subject for study.

An information space related to the Web is that of the online encyclopedia Wikipedia. By the end of 2007, over 7.5 million pages were counted, written in more than 250 different languages. The English Wikipedia is by far the largest, with more than 2 million articles. It is also the most popular one when measuring the number of page requests: 45% of all Wikipedia traffic is directed towards the English version [Urdaneta et al., 2009]. Again, Wikipedia forms a network with its pages as nodes and references to other pages as links. Like the Web, it turns out that there are few very popular pages, and many unpopular ones (but so many that they cannot be ignored) [Voss, 2005].

1.2 Social networks

Next to communication networks, networks that are built around people have long been a subject of study. We first consider modern social networks that have come into play as online communities facilitated by the Internet.

Online communities

In their landmark essay, Licklider and Taylor [1968] foresaw that computers would form a major communication device between people, leading to online communities much like the ones we know today. Indeed, perhaps one of the biggest successes of the Internet has been the ability to allow people to exchange information with each other by means of user-to-user messaging systems [Wams and van Steen, 2004]. The best known of these systems is e-mail, which has been around ever since the Internet came to life. Another well-known example is network news, through which users can post messages at electronic bulletin boards, and to which others may subsequently react, leading to discussion threads of all sorts and lengths. More recently instant messaging systems have become popular, allowing users to directly and interactively exchange messages with each other, possibly enhanced with information on various states of presence.

It is interesting to observe that from a technological point of view, most of these systems are really not that sophisticated and are still built with technology that has been around for decades. In many ways, these systems are simple, and have stayed simple, which allowed them to scale to sizes that are difficult to imagine. For example, it has been estimated that in 2006 almost 2 million e-mail messages were sent every second, by a total of more than 1 billion users. Admittedly, more than 70% of these messages were spam or contained viruses, but even then it is obvious that a lot of online communication took place. These numbers continue to rise.

More than the technology, it is interesting to see what these communication facilities do to the people who use them. What we are witnessing today is the rise of online communities in which people who have never met each other physically are sharing ideas, opinions, feelings, and so on. In fact, Dodds et al. [2003] have shown that also for online communities we are dealing with what is known as a small world. To put it simply, a small world is characterized by the fact that every two people can reach each other through a chain of just a handful of messages. This phenomenon is also known as the "six degrees of separation" [Watts, 2003] to which we will return extensively later.

Dodds et al. were interested to see whether e-mail users were capable of sending a message to a specific person without knowing that person's address. In that case, the only thing you can do is send the message to one of your acquaintances, hoping that he or she is "closer" to the target than you are. With over 60,000 users participating in the experiment, they found that 384 out of the approximately 24,000 message chains made it to designated target people (there were 18 targets from 13 different countries all over the world). Of these 384 chains, 50% had a length smaller than 5–7, depending on whether the target was located in the same country as where the chain started.

What we have just described is the phenomenon of messages traveling through a network of e-mail users. Users are linked by virtue of knowing each other, and the resulting network exhibits properties of small worlds, effectively connecting every person to the others through relatively small chains of such links. Describing and characterizing these and other networks forms the essence of network science.

Traditional social networks

Long before the Internet started to play a role in many people's lives, sociologists and other researchers from the humanities have been looking at the structure of groups of people. In most cases, relatively small groups were considered, necessarily because analysis of large groups was often not feasible. An important contribution to social network analysis came from Jacob Moreno who introduced sociograms in the 1930s. A sociogram can be seen as a graphical representation of a network: people are represented by dots (called vertices) and their relationships by lines connecting those dots (called edges). An example we will come across in Chapter 9 is one in which a class of children are asked who they like and dislike. It is not hard to imagine that we can use a graphical representation to represent who likes whom, as shown in Figure 1.3.

[Figure 1.3: The representation of a sociogram expressing affection between people. The absence of a link indicates neutrality. Links in the figure are labeled + or -.]

Decades later, under the influence of mathematicians, sociograms and such were formalized into graphs, our central object of study. As mentioned, graphs are mathematical objects, and as such they come along with a theoretical framework that allows researchers to focus on the structure of networks in order to make statements about the behavior of an entire social group.

Social network analysis has been important for the further development of graph theory, for example with respect to introducing metrics for identifying importance of people or groups. For example, a person having many connections to other people may be considered relatively important. Likewise, a person at the center of a network would seem to be more influential than someone at the edge. What graph theory provides us are the tools to formally describe what we mean by relatively important, or having more influence. Moreover, using graph theory we can easily come up with alternatives for describing importance and such. Having such tools has also facilitated being more precise in statements regarding the position or role that a person has within a community. We will come across such formalities in Chapter 9.

1.3 Networks everywhere

Communication networks and social networks are two classes of networks that many people are aware of. However, there are many more networks as shown in Figure 1.4. What should immediately become clear is that networks occur in very different scientific disciplines: economics, organizational studies, social sciences, biology, logistics, and so forth. What's more, the terminology that is used to describe the different networks in each discipline is largely the same, which makes it relatively easy for members of different communities to cooperate in understanding the foundations of complex networks. What is even more striking is the fact that networks from very different disciplines often look so much alike. This common terminology and the strong resemblance of networks across scientific disciplines has been instrumental in boosting network science.

Network | Vertices | Edges | Description
Airline transportation | airports | flights | Consider the scheduled flights (of a specific carrier) between two airports.
Street plans | junctions | road segment | A road segment extends exactly between two junctions. A variation is to distinguish between one-way and two-way segments.
Train transportation | stations | connection | Two stations are connected only if there is a train connection scheduled that does not pass (possibly without stopping) any intermediate stations.
Railway network | junctions | track segment | Consider the actual railway tracks. Where track segments merge or cross, we have junctions.
Brain | neurons | synapses | Each neuron can be considered to consist of inputs (called dendrites) and outputs (called axon). Synapses carry electrical signals between neurons.
Genetic networks | genes | transcription factor | In genetic (regulatory) networks we model how genes influence each other, in particular, how the product of one gene determines the rate at which another gene is transcribed (i.e., at which rate it produces its own output).
Ant colonies | junctions | pheromone trails | In order for ants to tell each other where sources of food are, they produce pheromones, chemicals that can be picked up by other ants. Pheromones jointly constitute paths.
Citation networks | authors | citation | In scientific literature, it is common practice to (extensively) refer to related published work and sources of statements, in turn leading to citation networks.
Telephone calls | number | call | Networks of phone calls reflect (mostly) pairs of people exchanging information, thus forming a social network technically represented by phone numbers and actual calls.
Reputation networks | people | rating | In electronic trading networks such as eBay, buyers rate transactions. As buyers in turn can also be sellers, we obtain a network in which rates reflect the reputation between people.

Figure 1.4: Examples of networks.

Understanding complex networks requires the right set of tools. In our case, the tools we need come from a field of mathematics known as graph theory. In this book, you'll learn about the essential elements of graph theory in order to obtain insight into modern networks. Next to that, we discuss a number of concepts that are normally not found in traditional textbooks on graph theory, such as random networks and various metrics for characterizing graphs.

1.4 Organization of this book

In the following chapters we'll go through the foundations of graph theory and move on into parts that are normally discussed in more advanced textbooks on networks. The goal of this text is to provide only an awareness and basic understanding of complex networks, for which reason none of the advanced mathematics that accompany complex networks is discussed.

To make matters easier, special notes are included that generally provide further information, such as the following:

Note 1.1 (More information)
This is an example of how additional side notes are presented. Text in such notes can always be skipped as notes do not affect the flow of the main text.

There are different types of notes:

Study tips: Studying graph theory is not always easy, not because the material is so difficult, but because identifying the best approach to tackle a specific problem may not be obvious. I have compiled various tips based on experience in teaching (and once myself learning) graph theory. Students are strongly encouraged to read these tips and put them to their own advantage.

Mathematical language: For many people, mathematics is and remains a barrier to accessing otherwise interesting material. The language of mathematicians as well as the commonly used tools and techniques are sometimes even intimidating. However, there are so many cases in which the barrier is only virtual. The only thing that is needed is getting acquainted with some basics and learning how to apply them. In notes focusing on mathematical language, I generally take a step back on previously presented material and translate the math into plain English, explain mathematical notations, and so forth. These notes are meant to help understand the math, but do not serve as a replacement. Mathematics simply offers a level of precision that is difficult to match
with (informal) English, yet the notations should not be something to keep anyone away from reaching a deeper understanding.

Proof techniques: Notably in Chapters 2 and 3 some time is taken to explain a bit more about how to prove theorems. One of the main difficulties that I experienced when first studying graph theory and, more generally, combinatorics, was finding structure in proofs. As in virtually any other field of mathematics, graph theory uses a whole array of proof techniques. In these notes, the most commonly used ones are made explicit, aiming at creating a better awareness of available techniques so that students may have less of a feeling of walking in the dark when it comes to solving mathematical problems.

Algorithmics: Graph theory involves many algorithms, such as, for example, finding shortest paths, identifying reachable vertices, determining similarity, and so on. Traditionally, algorithms have always been described using math, but that language is not particularly well-equipped for expressing the flow of control inherent to most algorithms. In algorithmic notes some graph algorithms are expressed in pseudocode, roughly following a traditional programming language. In virtually all cases, this description leads to a better separation of the actual math and the steps comprising an algorithm.

More information: This type of note contains a wide variety of information, ranging from additional background material to more difficult mathematical material such as proofs. In all cases, these notes do not interfere with the main text and may be skipped on first reading.

Proofs that have been marked “(*)” may be skipped at first reading: they are to be considered the tougher parts of the material.

The book is roughly organized into two parts. The first part covers Chapters 2–6. These chapters roughly cover the same material that can usually be found in standard textbooks on graph theory. Except for Chapter 6, this material is to be considered essential for studying graph theory and should in any case be covered. Chapter 6 can be considered as a compilation of various metrics from different disciplines to characterize graphs, their structures, and the positions that different nodes have in networks.

The second part consists of Chapters 7–9 and discusses (graph models of) real-world networks. Notably Chapter 7 on random networks contains material that is often presented only in more advanced textbooks yet which I consider to be crucial for raising scientific interest in modern network science. Random networks are important from a conceptual modeling point of view, from an analysis point of view, and for explaining the emergent behavior we see in real-world systems. By keeping explanations as simple as possible and attempting to stick only to the core elements, this material should be relatively easy to access for anyone having essentially learned only high-school mathematics. The two succeeding chapters discuss theory and practice of real-world systems: computer networks and social networks, respectively.

CHAPTER 2
FOUNDATIONS

In the previous chapter we have informally introduced the notion of a network and have given several examples. In order to study networks, we need to use a terminology that allows us to be precise. For example, when we speak about the distance between two nodes in a network, what do we really mean? Likewise, is it possible to specify how well connected a network is? These and other statements can be formulated accurately by adopting terminology from graph theory. Graph theory is a field in mathematics that gained popularity in the 19th and 20th century, mainly because it made it possible to describe phenomena from very different fields: communication infrastructures, drawing and coloring maps, scheduling tasks, and social structures, just to name a few.

We will first concentrate only on the foundations of graph theory. To this end, we will use the language of mathematics, as it allows us to be precise and concise. However, to many this language with its many symbols and often peculiar notations can easily form an obstacle to grasping the essence of what it is being used for. For this reason, we will gently and gradually introduce notations while providing more verbose descriptions alongside the more formal definitions. You are encouraged to pay explicit attention to the formalities: in the end, they will prove to be much more convenient to use than verbose verbal descriptions. The latter often simply fail to be precise enough to completely understand what is going on. It is also not that difficult, as most notations come directly from set theory.

2.1 Formalities

Let us start with discussing what is actually meant by a network. To this end, we first concentrate on some basic formal concepts and notations from graph theory, together with a few fundamental properties that characterize networks. After having studied this section, you will have already learned a lot about the world of graphs and should also feel more comfortable with mathematical notations.

Graphs and vertex degrees

As said, the networks that have been introduced so far are mathematically known as graphs. In its simplest form, a graph is a collection of vertices that can be connected to each other by means of edges. In particular, each edge of a graph joins exactly two vertices. Using a formal notation, a graph is defined as follows.

Definition 2.1: A graph $G$ consists of a collection $V$ of vertices and a collection $E$ of edges, for which we write $G = (V, E)$. Each edge $e \in E$ is said to join two vertices, which are called its end points. If $e$ joins $u, v \in V$, we write $e = \langle u, v \rangle$. Vertices $u$ and $v$ in this case are said to be adjacent. Edge $e$ is said to be incident with vertices $u$ and $v$, respectively.

We will often write $V(G)$ and $E(G)$ to denote the set of vertices and edges associated with graph $G$, respectively. It is important to realize that an edge can actually be represented as an unordered tuple of two vertices, that is, its end points. For this reason, we make no distinction between $\langle v, u \rangle$ and $\langle u, v \rangle$: they both represent the fact that vertices $u$ and $v$ are adjacent.

This definition may already raise a few questions. First of all, is it possible that an edge joins a vertex to itself, that is, can an edge form a loop? There is nothing in the definition that prevents this, and indeed, such edges are allowed. Likewise, you may be wondering whether two vertices $u$ and $v$ may be joined by multiple edges, that is, a set of edges each having $u$ and $v$ as their end points. Indeed, this is also possible, and we shall be discussing a few examples shortly. A graph that does not have loops or multiple edges is called simple. Likewise, there is nothing that prohibits a graph from having no vertices at all. Of course, in that case there will also be no edges. Such a trivial graph is called empty. Another special case is formed by a simple graph having $n$ vertices, with each vertex being adjacent to every other vertex. This graph is also known as a complete graph. A complete graph with $n$ vertices is commonly denoted as $K_n$.

Finally, the complement of a graph $G$, denoted as $\overline{G}$,
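is the graph obtained from $G$ by removing all its edges and joining exactly those vertices that were not adjacent in $G$. It should be clear that if we take a graph $G$ and its complement $\overline{G}$ “together,” we obtain a complete graph. Taking two graphs “together” will be made more precise later in this chapter.

As an aside, notice that when we write $\langle u, v \rangle$, we can say only that $u$ and $v$ are adjacent, that is, that there is at least one edge that joins the two. Strictly speaking, it is not possible using this notation to distinguish different edges that all happen to join both $u$ and $v$. If we wanted to make that distinction, we would have to write something like $e_1 = \langle u, v \rangle$ and $e_2 = \langle u, v \rangle$. In other words, we would have to explicitly enumerate the edges that join $u$ and $v$. Of course, when dealing with simple graphs, there can be no mistake about which edge we are considering when we write $\langle u, v \rangle$. Here we see an example where mathematics allows us to be precise and unambiguous. We will encounter many more such examples.

As in so many practical situations, it is often convenient to talk about your neighbors. In graph-theoretical terms, the neighbors of a vertex $v$ are formed by the vertices that are adjacent to $v$, or, in other words, those vertices to which $v$ has been joined by means of an edge. We can formulate this precisely using formal mathematical notations as follows.

Definition 2.2: For any graph $G$ and vertex $v \in V(G)$, the neighbor set $N(v)$ of $v$ is the set of vertices (other than $v$) adjacent to $v$, that is
$$N(v) \stackrel{\text{def}}{=} \{ w \in V(G) \mid v \neq w, \exists e \in E(G) : e = \langle v, w \rangle \}$$

Note 2.1 (Mathematical language)
The formal notation in Definition 2.2 is very precise, yet can be somewhat intimidating. Let us decipher it a bit. First, we use the symbol $\stackrel{\text{def}}{=}$ to express that what is written on the left-hand side is defined by what is written on the right-hand side. In other words, $N(v) \stackrel{\text{def}}{=} \ldots$ is nothing but accurately stating that $N(v)$ is defined by what follows on the right-hand side of $\stackrel{\text{def}}{=}$. Recall that the symbol ‘$\exists$’ is the existential quantifier used in set theory to express statements like “there exists an ...” Keeping this in mind, you should now be able to see that the right-hand side translates into English to the following statement:
The set of vertices $w$ in $G$, with $w$ not equal to $v$, such that there exists an edge $e$ in $G$ that joins $v$ and $w$.
We will be encountering many more of these formal statements. If you have trouble correctly interpreting them, we encourage you to make translations like the previous one to actually practice reading mathematics. After a while, you will notice that these translations come naturally by themselves.

The word “graph” comes from the fact that it is often very convenient to use a graphical representation, as shown in Figure 2.1. In this example, we have a graph $G$ with eight vertices and a total of 18 edges. Each vertex is represented as a black dot whereas edges are drawn as lines. When drawing a graph, it is often convenient to add labels. Both vertices and edges can be labeled. We shall generally not use subscripts when labeling vertices and edges in our drawings of graphs. This means that a label such as e13 from Figure 2.1 is the same as $e_{13}$ in our text.

It should be clear that there may be many different ways to draw a graph. In the first place, there is no reason why we would stick to just dots and lines, although it is common practice to do so. Secondly, there are, in principle, no rules concerning where to position the drawn vertices, nor are there any rules stating that a line should be drawn in a straight fashion. However, the way that we draw graphs is often important when it comes to visualizing certain aspects. We return to this issue extensively in Section 2.4.

$V(G) = \{v_1, \ldots, v_8\}$
$E(G) = \{e_1, \ldots, e_{18}\}$
$e_1 = \langle v_1, v_2 \rangle$, $e_2 = \langle v_1, v_5 \rangle$, $e_3 = \langle v_2, v_8 \rangle$, $e_4 = \langle v_3, v_5 \rangle$, $e_5 = \langle v_3, v_4 \rangle$, $e_6 = \langle v_4, v_5 \rangle$,
$e_7 = \langle v_5, v_6 \rangle$, $e_8 = \langle v_2, v_5 \rangle$, $e_9 = \langle v_1, v_6 \rangle$, $e_{10} = \langle v_6, v_7 \rangle$, $e_{11} = \langle v_5, v_7 \rangle$, $e_{12} = \langle v_6, v_8 \rangle$,
$e_{13} = \langle v_4, v_7 \rangle$, $e_{14} = \langle v_7, v_8 \rangle$, $e_{15} = \langle v_4, v_8 \rangle$, $e_{16} = \langle v_2, v_3 \rangle$, $e_{17} = \langle v_1, v_7 \rangle$, $e_{18} = \langle v_5, v_8 \rangle$

Figure 2.1: An example of a graph with eight vertices and 18 edges.

An important property of a vertex is the number of edges that are incident with it. This number is called the degree of a vertex.

Definition 2.3: The number of edges incident with a vertex $v$ is called the degree of $v$, denoted as $d(v)$. Loops are counted twice.

Let us consider our example from Figure 2.1 again. In this case, because there are four edges incident with vertex $v_1$, we have that $d(v_1) = 4$. We can complete the picture by considering every vertex, which gives us:

| Vertex | Degree | Incident edges | Neighbors |
|---|---|---|---|
| $v_1$ | 4 | $\langle v_1,v_2 \rangle, \langle v_1,v_5 \rangle, \langle v_1,v_6 \rangle, \langle v_1,v_7 \rangle$ | $v_2, v_5, v_6, v_7$ |
| $v_2$ | 4 | $\langle v_1,v_2 \rangle, \langle v_2,v_3 \rangle, \langle v_2,v_5 \rangle, \langle v_2,v_8 \rangle$ | $v_1, v_3, v_5, v_8$ |
| $v_3$ | 3 | $\langle v_2,v_3 \rangle, \langle v_3,v_4 \rangle, \langle v_3,v_5 \rangle$ | $v_2, v_4, v_5$ |
| $v_4$ | 4 | $\langle v_3,v_4 \rangle, \langle v_4,v_5 \rangle, \langle v_4,v_7 \rangle, \langle v_4,v_8 \rangle$ | $v_3, v_5, v_7, v_8$ |
| $v_5$ | 7 | $\langle v_1,v_5 \rangle, \langle v_2,v_5 \rangle, \langle v_3,v_5 \rangle, \langle v_4,v_5 \rangle, \langle v_5,v_6 \rangle, \langle v_5,v_7 \rangle, \langle v_5,v_8 \rangle$ | $v_1, v_2, v_3, v_4, v_6, v_7, v_8$ |
| $v_6$ | 4 | $\langle v_1,v_6 \rangle, \langle v_5,v_6 \rangle, \langle v_6,v_7 \rangle, \langle v_6,v_8 \rangle$ | $v_1, v_5, v_7, v_8$ |
| $v_7$ | 5 | $\langle v_1,v_7 \rangle, \langle v_4,v_7 \rangle, \langle v_5,v_7 \rangle, \langle v_6,v_7 \rangle, \langle v_7,v_8 \rangle$ | $v_1, v_4, v_5, v_6, v_8$ |
| $v_8$ | 5 | $\langle v_2,v_8 \rangle, \langle v_4,v_8 \rangle, \langle v_5,v_8 \rangle, \langle v_6,v_8 \rangle, \langle v_7,v_8 \rangle$ | $v_2, v_4, v_5, v_6, v_7$ |

When adding the degrees of all vertices from $G$, we find that the total sum is 36, which is exactly twice the number of edges. This brings us to our first theorem:

Theorem 2.1: For all graphs $G$, the sum of the vertex degrees is twice the number of edges, that is,
$$\sum_{v \in V(G)} d(v) = 2 \cdot |E(G)|$$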
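The degree table and Theorem 2.1 can be checked mechanically. The following is a minimal sketch in Python, assuming the graph of Figure 2.1 is stored as a list of vertex pairs in which the integer k stands for vertex $v_k$; the names `EDGES`, `degrees`, and `neighbors` are illustrative choices and do not come from the text.

```python
from collections import Counter

# Edge list e1..e18 of the graph G from Figure 2.1; integer k stands for v_k.
EDGES = [
    (1, 2), (1, 5), (2, 8), (3, 5), (3, 4), (4, 5),
    (5, 6), (2, 5), (1, 6), (6, 7), (5, 7), (6, 8),
    (4, 7), (7, 8), (4, 8), (2, 3), (1, 7), (5, 8),
]


def degrees(edges):
    """Map each vertex that occurs in an edge to its degree.

    Every edge adds one to each of its end points, so a loop (u, u)
    is counted twice, in line with Definition 2.3.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def neighbors(edges, v):
    """Return the neighbor set N(v): all vertices other than v adjacent to v."""
    result = set()
    for a, b in edges:
        if a == v and b != v:
            result.add(b)
        elif b == v and a != v:
            result.add(a)
    return result


deg = degrees(EDGES)
print(sorted(deg.items()))
print(sum(deg.values()), 2 * len(EDGES))  # both sides of Theorem 2.1
print(sorted(neighbors(EDGES, 1)))
```

Note that a vertex without any incident edge never shows up in an edge list, so this sketch silently omits isolated vertices; a separate vertex list would be needed to report their degree 0.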
Proof. When we count the edges of a graph $G$ by enumerating for each vertex $v$ of $G$ the edges incident with that vertex $v$, we are counting each edge exactly twice. Hence, $\sum_{v \in V(G)} d(v) = 2 \cdot |E(G)|$.

Note 2.2 (Mathematical language)
Again, we encounter some formal mathematical notations. In this case, we use the standard symbol $\sum$ as an abbreviation for summation. Thus, $\sum_{i=1}^{n} x_i$ is the same as $x_1 + x_2 + x_3 + \cdots + x_n$. In many cases, the summation is simply over all elements in a specific set, such as in our example where we consider all the vertices in a graph. In that case, if we assume that $V(G)$ consists of the vertices $v_1, v_2, \ldots, v_n$, the notation $\sum_{v \in V(G)} d(v)$ is to be interpreted as:
$$\sum_{v \in V(G)} d(v) \stackrel{\text{def}}{=} d(v_1) + d(v_2) + \cdots + d(v_n)$$
Note, furthermore, that we use the notation $|S|$ to denote the size of a set $S$. In our example, $|E(G)|$ thus denotes the size of $E(G)$ or, in other words, the total number of edges in graph $G$.

There is also an interesting corollary that follows from this property, namely that the number of vertices with an odd degree must be even. This can be easily seen if we split the vertices $V$ of a graph into two groups: $V_{odd}$ containing all vertices with odd degree, and $V_{even}$ with all vertices having even degree. Clearly, if we take the sum of all the degrees from vertices in $V_{odd}$, and those from $V_{even}$, we will have summed up all vertex degrees, that is,
$$\sum_{v \in V_{odd}} d(v) + \sum_{v \in V_{even}} d(v) = \sum_{v \in V} d(v)$$
which is even. Because the sum of even vertex degrees is obviously even, we know that $\sum_{v \in V_{even}} d(v)$ is even. This can only mean that $\sum_{v \in V_{odd}} d(v)$ must also be even. Combining this with the fact that all vertex degrees in $V_{odd}$ are odd, we conclude that the number of vertices with odd degree must be even, that is, $|V_{odd}|$ is even. We have thus just proven:

Corollary 2.1: For any graph, the number of vertices with odd degree is even.

The vertex degree is a simple, yet powerful concept. As we shall see throughout this text, vertex degrees are used in many different ways. For example, when considering social networks, we can use vertex degrees to express the importance of a person within a social group. Also, when we discuss the structure of real-world communication networks such as the Internet, it will turn out that we can learn a lot by considering the distribution of vertex degrees. More specifically, by simply ordering vertices by their
vertex degree, we will be able to obtain insight into how such a network is actually organized.

Degree sequence

Listing the vertex degrees of a graph gives us a degree sequence. The vertex degrees are usually listed in descending order, in which case we refer to an ordered degree sequence. For example, if we consider the eight vertices of graph $G$ from Figure 2.1, we have the following vertex degrees

| vertex: | $v_1$ | $v_2$ | $v_3$ | $v_4$ | $v_5$ | $v_6$ | $v_7$ | $v_8$ |
|---|---|---|---|---|---|---|---|---|
| degree: | 4 | 4 | 3 | 4 | 7 | 4 | 5 | 5 |

which, when ordering these degrees in descending order, leads to the ordered degree sequence
$$[7, 5, 5, 4, 4, 4, 4, 3]$$

If every vertex has the same degree, the graph is called regular. In a $k$-regular graph each vertex has degree $k$. As a special case, 3-regular graphs are also called cubic graphs.

When considering degree sequences, it is common practice to focus only on simple graphs, that is, graphs without loops and multiple edges. An interesting question that comes to mind is: when we are given a list of numbers, is there also a simple graph whose degree sequence corresponds to that list? There are some obvious cases where we already know that a given list cannot correspond to a degree sequence. For example, we have just proven that the sum of vertex degrees is always even. Therefore, a minimal requirement is that the sum of the elements of that list should be even as well. Likewise, it is not difficult to see that, for example, the sequence $[4, 4, 3, 3]$ cannot correspond to a degree sequence. In this case, if this were a degree sequence, we would be dealing with a graph of four vertices. The first vertex is supposed to have four incident edges. In the case of simple graphs, each of these edges should be incident with a different vertex. However, there are only three vertices left to choose from, so $[4, 4, 3, 3]$ can never correspond to the degree sequence of a simple graph.

Of course, taking a trial-and-error approach to see whether a list corresponds to a degree sequence is not the way to go. Fortunately, there is a systematic way to see whether a given list of numbers corresponds to the degree sequence of a simple graph, in which case the sequence is said to be graphic.

Let’s return to our graph from Figure 2.1, but now assume that we are given only the list $[7, 5, 5, 4, 4, 4, 4, 3]$. We ask ourselves whether this list is graphic. If this is the case, we should be able to construct a graph that has this degree sequence. Note that this graph need not necessarily be the same as the one from Figure 2.1. This is how we can address this issue.

• Consider $[7, 5, 5, 4, 4, 4, 4, 3]$. If this sequence is graphic, corresponding to a graph, say $G_1$, then we should be able to construct $G_1$ from another graph $G_2$ by adding a vertex $v_1$ to $G_2$ and joining $v_1$ to seven other vertices from $G_2$. This would then explain that $G_1$ has a vertex with highest degree 7. Note that for this construction to work, it is necessary that we can construct $G_2$.
It should be clear that if we do not change the ordering of vertex degrees, the degree sequence of $G_2$ is equal to $[4, 4, 3, 3, 3, 3, 2]$. First, it contains one element less than the degree sequence of $G_1$. Second, the first element of the degree sequence of $G_2$ corresponds to the second element of $G_1$’s degree sequence: it’s the degree of the same vertex, yet for $G_2$ it should be one less than in $G_1$ because this vertex is not yet joined to the added vertex $v_1$. Likewise, the second element of $G_2$’s degree sequence corresponds to the third one in the degree sequence of $G_1$, and so on.

• If $[4, 4, 3, 3, 3, 3, 2]$ is graphic we can apply the same trick: $G_2$ should be constructable from a graph $G_3$ by adding a vertex $v_2$ and joining $v_2$ to four vertices from $G_3$. Following a completely analogous procedure as before, $v_2$ is joined to the vertices from $G_3$ such that these vertices will then have vertex degree 4, 3, 3, and 3, respectively. This can only mean that in $G_3$ they will have degree 3, 2, 2, and 2, respectively, leading to the following list: $[3, 2, 2, 2, 3, 2]$. Note that in this example, the fifth element is the same as the sixth element in the degree sequence of $G_2$. The first four elements represent vertices that will be joined to the new vertex $v_2$. The other elements represent vertices that remain untouched, and will thus have the same number of incident edges in $G_2$.

• Continuing this line of reasoning, if $[3, 3, 2, 2, 2, 2]$ is the (now ordered) degree sequence of $G_3$, then we should be able to construct $G_3$ from a graph $G_4$ to which we have added a vertex $v_3$. This vertex would be joined to the vertices having degree 2, 1, and 1 in $G_4$, respectively, yielding the list $[2, 1, 1, 2, 2]$. Again, note that this list contains one element less than the degree sequence of $G_3$, but that now its fourth and subsequent elements represent vertices that have the same vertex degree in $G_4$ and $G_3$.

• We now have that if the ordered list $[2, 2, 2, 1, 1]$ is graphic, then so should $[1, 1, 1, 1]$, corresponding to a graph $G_5$.

• Likewise, if $[1, 1, 1, 1]$ is graphic, then so should the list of vertex degrees $[0, 1, 1]$, corresponding to a graph $G_6$.

• Finally, if the ordered list $[1, 1, 0]$ is graphic, then so should $[0, 0]$, which is true: it is a graph $G_7$ with two vertices and no edges.

We can safely conclude that the sequence $[7, 5, 5, 4, 4, 4, 4, 3]$ indeed corresponds to a simple graph. The construction of the graph $G_1$ is illustrated in Figure 2.2, which shows how each graph $G_1, G_2, \ldots, G_6$ is constructed by adding a vertex to the previous one, starting from graph $G_7$. The answer to whether $G_1$ is the same as the graph from Figure 2.1 is a question we defer until later. In fact, it turns out to be a question that is generally not easy to resolve.

Figure 2.2 (panels $G_7$, $G_6$, $G_5$, $G_4$, $G_3$, $G_2$, $G_1$): The construction of graph $G_1$ from previous graphs based on degree sequences.

Intuitively, it should be clear that we have just introduced a systematic way of checking whether a given list of numbers corresponds to the degree sequence of a graph. It also forms the essence of the proof of the following theorem that tells us when a list of numbers is indeed graphic.

Theorem 2.2 (Havel-Hakimi): Consider a list $s = [d_1, d_2, \ldots, d_n]$ of $n$ numbers in descending order. This list is graphic if and only if $s^* = [d^*_1, d^*_2, \ldots, d^*_{n-1}]$ of $n - 1$ numbers is graphic as well, where
$$d^*_i = \begin{cases} d_{i+1} - 1 & \text{for } i = 1, 2, \ldots, d_1 \\ d_{i+1} & \text{otherwise} \end{cases}$$
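The repeated reduction walked through above, and captured by Theorem 2.2, translates almost directly into a short program. The sketch below is one possible way to write it in Python; the function name `havel_hakimi` and the decision to also return the list of intermediate ordered sequences are assumptions made for illustration, not something the text prescribes.

```python
def havel_hakimi(degrees):
    """Test whether a list of numbers is graphic.

    Returns a pair (graphic, steps), where steps holds the ordered
    sequences visited while repeatedly applying Theorem 2.2.
    """
    seq = sorted(degrees, reverse=True)
    steps = [list(seq)]
    if any(d < 0 for d in seq):
        return False, steps
    # Once the largest remaining entry is 0, all entries are 0:
    # that is the degree sequence of a graph without edges.
    while seq and seq[0] > 0:
        d = seq.pop(0)              # remove the highest degree d_1
        if d > len(seq):            # not enough other vertices to join
            return False, steps
        for i in range(d):          # the next d_1 entries each lose one
            seq[i] -= 1
            if seq[i] < 0:
                return False, steps
        seq.sort(reverse=True)      # restore descending order
        steps.append(list(seq))
    return True, steps


graphic, steps = havel_hakimi([7, 5, 5, 4, 4, 4, 4, 3])
print(graphic)
for s in steps:
    print(s)
print(havel_hakimi([4, 4, 3, 3])[0])
```

For the list $[7, 5, 5, 4, 4, 4, 4, 3]$ the recorded steps are the same ordered lists as in the bullet points above, ending in $[0, 0]$. The sketch only decides whether a list is graphic; it does not build the graphs $G_7, \ldots, G_1$ themselves.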
Note 2.3 (Mathematical language)
Note that this theorem consists of two statements:
1. if $s$ is graphic then so is $s^*$
2. if $s^*$ is graphic then so is $s$
This is the meaning of “if and only if,” which is often abbreviated to iff. We will encounter more such theorems, and in order to prove them correct, proofs in these cases will always consist of two parts.

Proof of Theorem 2.2. To prove this theorem, let us first assume that $s^*$ is graphic. We then need to show that $s$ is also graphic. Let $G^*$ be a simple graph with degree sequence $s^*$. We now construct a simple graph $G$ from $G^*$ with degree sequence $s$ as follows (and in doing so, we show that $s$ is graphic). Take $G^*$ and add a vertex $u$. For readability, let $k = d_1$ and consider the $k$ vertices $v_1, v_2, \ldots, v_k$ from $G^*$ having respectively degree $d^*_1, d^*_2, \ldots, d^*_k$. We then join these vertices to the newly added vertex $u$. Obviously, $u$ now has degree $k$, but also each vertex $v_i$ now has degree $d^*_i + 1$. Because all other vertices of $G^*$ are not joined with $u$, their vertex degree is left unaffected. As a consequence, the newly constructed graph $G$ has degree sequence $[k, d^*_1 + 1, d^*_2 + 1, \ldots, d^*_k + 1, d^*_{k+1}, \ldots, d^*_{n-1}]$, which is precisely $s$.

Let us now consider the opposite: if $s$ is graphic, we need to show that $s^*$ is so as well. In other words, we need to find a graph $G^*$ that has degree sequence $s^*$. To this end, we consider three different sets of vertices from $G$. Let $u$ be a vertex with degree $k = d_1$. Let $V^* = \{v_1, v_2, \ldots, v_k\}$ be the respective vertices with the $k$ next highest degrees $d_2, d_3, \ldots, d_{k+1}$. Finally, let $W = \{w_1, w_2, \ldots, w_{n-k-1}\}$ be the remaining $n - k - 1$ vertices with degree $d_{k+2}, d_{k+3}, \ldots, d_n$, respectively. Consider the graph $G^*$ obtained by removing $u$ from $G$, along with the $k$ edges incident with $u$. If each of these edges is incident with one of the vertices from $V^*$, then obviously $G^*$ is a graph with degree sequence $[d_2 - 1, d_3 - 1, \ldots, d_{k+1} - 1, d_{k+2}, \ldots, d_n]$, which is precisely $s^*$.

Now consider the situation that $u$ is adjacent to a vertex from $W$, say $w_i$. Because $u$ has exactly $k$ neighbors and $V^*$ contains $k$ vertices, there is then a vertex $v_j \in V^*$ that is not adjacent to $u$. If the degrees of $v_j$ and $w_i$ are the same, i.e., $d(w_i) = d(v_j)$, then we can simply swap $w_i$ and $v_j$ in the original construction of the sets $V^*$ and $W$, meaning that $\langle u, w_i \rangle$ is now an edge incident with a vertex from $V^*$ instead of $W$. However, if $d(w_i) < d(v_j)$, there is a vertex $x$ adjacent to $v_j$ but not adjacent to $w_i$ (note also that $x \neq u$), as shown in Figure 2.3(a). In constructing $G^*$ we now first remove edges $\langle u, w_i \rangle$ and $\langle v_j, x \rangle$, and then add edges $\langle x, w_i \rangle$ and $\langle u, v_j \rangle$, leading to the situation shown in Figure 2.3(b). The effect is that we now have a graph $G'$ in which $u$ is adjacent to $v_j$ instead of $w_i$, but without affecting the degree of $u$, $v_j$, $x$, or $w_i$. In other words, $G'$ has the degree sequence $s$. If $u$ is now adjacent to vertices only from $V^*$, we have already shown that $s^*$ is graphic. If $u$ is still adjacent to a vertex from $W$, we apply the same method to construct a graph $G''$ in which $u$ is adjacent to one more vertex from $V^*$. If necessary, we repeat this method until $u$ is adjacent only to vertices from $V^*$, at which point we know that $s^*$ is graphic.

Figure 2.3 (panels (a) and (b), showing the vertices $u$, $v_j$, $w_i$, and $x$): Changing a graph so that it meets the sets $V^*$ and $W$ of the Havel-Hakimi proof.

Note 2.4 (Proof techniques)
The proof of the Havel-Hakimi theorem illustrates a number of important issues in graph theory. In the first place, it is a proof by construction. In the case of the Havel-Hakimi theorem this means that we show that the theorem holds by actually constructing a graph from a given degree sequence. In general, proving properties by construction is very powerful: not only do we demonstrate the existence of a property, we also show how to get there. In contrast, with nonconstructive proofs we merely prove that some property must exist, often by first assuming that it does not exist and subsequently arriving at a contradiction. We will come across more of these proofs, but also ones in which we merely show that a property must exist, without giving a graph that has the specific property.

Another important issue in proving the Havel-Hakimi theorem is that we show the power of visualization. Visualizing situations, either explicitly on paper or otherwise merely in your mind, is particularly useful in the case of graphs, which should come as no surprise. When graphs are studied for the first time, it is tempting to draw complete examples, that is, graphs in which each
edge joins two vertices. However, as you become more experienced, it turns out that sketching graphs as is done in Figure 2.3 is actually more illustrative as these drawings reflect the essence of what you are trying to prove. Irrelevant details are thus avoided. You are encouraged to go for the sketches.

Note that two graphs with the same degree sequence need not be the same. In other words, when given a degree sequence, it may be possible to construct several, different, graphs that have that sequence, as is illustrated in Figure 2.4. The two graphs in Figure 2.4(a) have the same degree sequence, yet they are truly different. The same holds for the two graphs from Figure 2.4(b). We return to the notion of similarity of graphs in Section 2.2.

Figure 2.4: Different graphs with the same ordered degree sequence: (a) $[3, 3, 2, 2, 2]$, and (b) $[7, 5, 5, 4, 4, 4, 4, 3]$.

Subgraphs and line graphs

Another important concept of graphs is that of a subgraph. A graph $H$ is a subgraph of $G$ if $H$ consists of a subset of the edges and vertices of $G$, such that the end points of edges in $H$ are also contained in $H$. Strictly speaking, we have the following:

Definition 2.4: A graph $H$ is a subgraph of $G$ if $V(H) \subseteq V(G)$ and $E(H) \subseteq E(G)$ such that for all $e \in E(H)$ with $e = \langle u, v \rangle$, we have that $u, v \in V(H)$. When $H$ is a subgraph of $G$, we write $H \subseteq G$.

As an example, Figure 2.5 shows a so-called cubic graph (i.e., 3-regular graph) with 8 vertices and three of its subgraphs.

Figure 2.5: The cubic graph $Q$ with 8 vertices and three subgraphs $G_1$, $G_2$, and $G_3$.

When analyzing properties of graphs, it is often convenient to consider subgraphs formed by a specific subset of vertices. These are so-called induced subgraphs, which are constructed by taking a subset $V^*$ of vertices and adding each edge from the original graph that connects two vertices from $V^*$. Formally, we have:

Definition 2.5: Consider a graph $G$ and a subset $V^* \subseteq V(G)$. The subgraph induced by $V^*$ has vertex set $V^*$ and edge set $E^*$ defined by
$$E^* \stackrel{\text{def}}{=} \{ e \in E(G) \mid e = \langle u, v \rangle \text{ with } u, v \in V^* \}$$
Likewise, if $E^* \subseteq E(G)$, the subgraph induced by $E^*$ has edge set $E^*$ and a vertex set $V^*$ defined by
$$V^* \stackrel{\text{def}}{=} \{ u, v \in V(G) \mid \exists e \in E^* : e = \langle u, v \rangle \}$$
The subgraph induced by $V^*$ or $E^*$ is written as $G[V^*]$ or $G[E^*]$, respectively.

Clearly, every simple graph $G = (V, E)$ having $n$ vertices can be seen as a subgraph of the complete graph $K_n$. Moreover, if we consider its complement $\overline{G} = (V, \overline{E})$, then the union of $G$ and $\overline{G}$, that is, the graph with vertex set $V$ and edge set $E \cup \overline{E}$, corresponds to $K_n$. This is what we have previously coined taking two graphs “together.”

Somewhat related to the notion of an induced subgraph is that of a line graph.

Definition 2.6: Consider a simple graph $G = (V, E)$. The line graph of $G$, denoted as $L(G)$, is constructed from $G$ by representing each edge $e = \langle u, v \rangle$ from $E$ by a vertex $v_e$ in $L(G)$, and joining two vertices $v_e$ and $v_{e^*}$ if and only if edges $e$ and $e^*$ are incident with the same vertex in $G$.

To illustrate, consider the graph shown in Figure 2.6(a), containing four vertices and six edges. Its line graph, shown in Figure 2.6(b), consists of six vertices.

Figure 2.6: (a) A graph $G$ and (b) its line graph $L(G)$.

Note 2.5 (Mathematical language)
Note that we used one of those awkward, yet precise mathematical statements when defining a subgraph induced by a set of edges. In this case, the mathematical statement
$$V^* \stackrel{\text{def}}{=} \{ u, v \in V(G) \mid \exists e \in E^* : e = \langle u, v \rangle \}$$
should be translated into plain English as follows:
$V^*$ is the set of vertices from $V(G)$ formed by the end points of edges in $E^*$.
If we would literally translate from math, we would have
$V^*$ is defined by all vertices $u$ and $v$ from $V(G)$ for which there exists an edge in $E^*$ that joins $u$ and $v$.
When reading this second version, it is important to try to move away from all the math and come up with something like the first one, which is more intuitive and actually simpler.

A special induced subgraph is the one by which we simply remove a specific vertex, say $v$: $G[V(G) \setminus \{v\}]$. We came across this type of graph in our proof
of Theorem 2.2. Instead of using the notation $G[V(G) \setminus \{v\}]$ we will often simply write $G - v$. Likewise, if $e$ is an edge, we will often write $G - e$ instead of $G[E(G) \setminus \{e\}]$. Similar simplified notations will be used when dealing with subsets of vertices or edges, respectively.

2.2 Graph representations

It should be clear from the presentation so far that graphs can be drawn in different ways, but also that when considering their formal definition, they are merely described in terms of vertices and edges. Let us now pay attention to how we can conveniently represent graphs. This issue is particularly important when we need to represent very large graphs for automated processing by computers.

Data structures

There are different ways to represent graphs. Perhaps the most appealing one is to use an adjacency matrix. Consider a graph $G$ with $n$ vertices and $m$ edges. Its adjacency matrix is nothing else but a table $A$ with $n$ rows and $n$ columns with entry $A[i, j]$ denoting the number of edges joining vertex $v_i$ and $v_j$. To illustrate, Figure 2.7 shows a small graph (one that has loops and multiple edges) with its accompanying adjacency matrix. It is not difficult to see that the following properties hold:

• An adjacency matrix is symmetric, that is, for all $i, j$, $A[i, j] = A[j, i]$. This property reflects the fact that an edge is represented as an unordered pair of vertices $e = \langle v_i, v_j \rangle = \langle v_j, v_i \rangle$.

• A graph $G$ is simple if and only if for all $i, j$, $A[i, j] \leq 1$ and $A[i, i] = 0$. In other words, there can be at most one edge joining vertices $v_i$ and $v_j$ and, in particular, no edge joining a vertex to itself.

• The sum of values in row $i$ is equal to the degree of vertex $v_i$, that is, $d(v_i) = \sum_{j=1}^{n} A[i, j]$.

As an alternative, we can also use an incidence matrix of a graph as its representation. An incidence matrix $M$ of graph $G$ consists of $n$ rows and $m$ columns such that $M[i, j]$ counts the number of times that edge $e_j$ is incident with vertex $v_i$. Note that $M[i, j]$ is either 0, 1, or 2: an edge is either not incident with vertex $v_i$, has vertex $v_i$ as exactly one of its end points, or is a loop joining vertex $v_i$ with itself. Figure 2.8 shows the incidence matrix for the graph from Figure 2.7. Again, the following properties are easy to verify:
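|       | $v_1$ | $v_2$ | $v_3$ | $v_4$ |
|---|---|---|---|---|
| $v_1$ | 2 | 1 | 1 | 0 |
| $v_2$ | 1 | 0 | 2 | 0 |
| $v_3$ | 1 | 2 | 0 | 1 |
| $v_4$ | 0 | 0 | 1 | 2 |

Figure 2.7: A graph with its associated adjacency matrix.

• A graph $G$ has no loops if and only if for all $i, j$, $M[i, j] \leq 1$.

• The sum of all values in row $i$ is equal to the degree of vertex $v_i$. In mathematical terms, this is expressed as $\forall i : d(v_i) = \sum_{j=1}^{m} M[i, j]$.

• Because each edge has exactly two, not necessarily distinct end points, we know that for all $j$, $\sum_{i=1}^{n} M[i, j] = 2$.

|       | $e_1$ | $e_2$ | $e_3$ | $e_4$ | $e_5$ | $e_6$ | $e_7$ |
|---|---|---|---|---|---|---|---|
| $v_1$ | 2 | 1 | 1 | 0 | 0 | 0 | 0 |
| $v_2$ | 0 | 1 | 0 | 0 | 1 | 1 | 0 |
| $v_3$ | 0 | 0 | 1 | 1 | 1 | 1 | 0 |
| $v_4$ | 0 | 0 | 0 | 1 | 0 | 0 | 2 |

Figure 2.8: A graph with its associated incidence matrix.

One of the problems with using either an adjacency matrix or an incidence matrix is that without further optimizations, the total number of elements for representing a graph is $n \times n$ or $n \times m$, respectively. This is not very efficient when having to deal with very large graphs, especially when the number of edges is relatively small. To see why this is true, consider the representation of an adjacency matrix in a computer. Assume that we use only a single byte to count the number of edges joining a pair of vertices. Without any further optimizations, a graph with 100,000 vertices would require a total of $100{,}000 \times 100{,}000$ bytes of storage, that is, close to 10 Gbyte. Using an incidence matrix and assuming a total of 250,000 edges, a straightforward, nonoptimized representation would require close to 25 Gbytes of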
storage. Both representations, even when applying all kinds of storage optimizations, generally tend to be rather inefficient.

An often more efficient representation, and one used in practice, is that of an edge list. In this case, we merely list the edges of a graph $G$ by specifying for each edge which vertices it is incident with. Note that this representation grows linearly with the number of edges. For example, the edge-list representation of the graph from Figure 2.8 is:
$$(\langle v_1, v_1 \rangle, \langle v_1, v_2 \rangle, \langle v_1, v_3 \rangle, \langle v_2, v_3 \rangle, \langle v_2, v_3 \rangle, \langle v_3, v_4 \rangle, \langle v_4, v_4 \rangle)$$
In particular, with $m$ edges, we would need to store only $2m$ data items. Assuming that a vertex can be represented by four bytes, this means that for our example graph with 100,000 vertices and 250,000 edges, we would need only close to 2 Mbytes of storage. In practice, this number will be larger because we need additional data structures to easily navigate through the edge list. Nevertheless, the total amount of required storage will generally stay significantly less than what is needed for an adjacency or incidence matrix.

It should be clear that by simply going through this list, we also find the vertices of the associated graph, provided that each vertex is incident with at least one edge. In practice, an edge list is often accompanied by a list of vertices, for example, to describe attached labels (such as “$v_1$”).

Graph isomorphism

An important observation is that all these representations are independent of the way that we draw a graph. Consider the graphs shown in Figure 2.9. No matter whether we represent each graph by its adjacency matrix, incidence matrix, or edge list, if we properly attach labels to vertices and edges, we will find that their respective representations are exactly the same. As a consequence, they should also be considered to be the same. This notion of similarity is formalized through what is known as graph isomorphism.

Definition 2.7: Consider two graphs $G = (V, E)$ and $G^* = (V^*, E^*)$. $G$ and $G^*$ are isomorphic if there exists a one-to-one mapping $f : V \to V^*$ such that for every edge $e \in E$ with $e = \langle u, v \rangle$, there is a unique edge $e^* \in E^*$ with $e^* = \langle f(u), f(v) \rangle$.

Stated differently, two graphs $G$ and $G^*$ are isomorphic if we can uniquely map the vertices and edges of $G$ to those of $G^*$ such that if two vertices were joined in $G$ by a number of edges, their counterparts in $G^*$ will be joined by the same number of edges.
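Returning briefly to the data structures discussed earlier in this section, the relation between the three representations can be made concrete with a few lines of code. The sketch below, in Python, starts from the edge list of the graph of Figures 2.7 and 2.8 and derives both matrices from it. The edge order $e_1, \ldots, e_7$ is read off the incidence matrix, and the helper names `adjacency_matrix` and `incidence_matrix` are illustrative assumptions. Following the matrix of Figure 2.7, a loop contributes 2 to its diagonal entry, so that row sums equal vertex degrees.

```python
# Edges e1..e7 of the graph from Figures 2.7 and 2.8; integer k stands for v_k.
EDGES = [(1, 1), (1, 2), (1, 3), (3, 4), (2, 3), (2, 3), (4, 4)]
N = 4  # number of vertices


def adjacency_matrix(edges, n):
    """Build the n x n adjacency matrix from an edge list."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] += 1
        a[v - 1][u - 1] += 1  # for a loop this hits the diagonal a second time
    return a


def incidence_matrix(edges, n):
    """Build the n x m incidence matrix from an edge list."""
    m = [[0] * len(edges) for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        m[u - 1][j] += 1
        m[v - 1][j] += 1      # a loop yields a single entry equal to 2
    return m


A = adjacency_matrix(EDGES, N)
M = incidence_matrix(EDGES, N)

for row in A:
    print(row)
for row in M:
    print(row)

# Row sums of either matrix give the vertex degrees.
print([sum(row) for row in A], [sum(row) for row in M])
```

The edge list stores $2m = 14$ numbers here, against $n \times n = 16$ and $n \times m = 28$ matrix entries; for a graph this small the difference is negligible, but it grows quickly for large graphs with relatively few edges.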
Figure 2.9: Six different drawings of graphs with the same representation, that is, isomorphic graphs.

Note 2.6 (Mathematical language)
Couldn’t we just talk about the same graphs, you might wonder, instead of using a term like isomorphism? However, “isomorphism” is a well-defined mathematical concept that is used for more than just graphs. In essence, it is used in those situations where we are dealing with sets (like vertices) whose elements are somehow organized in a specific way. Isomorphism is then used to express that two sets have essentially the same elements when you ignore labeling issues, but also that their organization is the same. An isomorphism is then a structure-preserving mapping between two sets.

In many cases, checking whether two graphs are isomorphic is relatively simple as there are a number of important necessary requirements that need to be fulfilled. For example, it should be obvious that the two graphs need to have the same number of vertices and edges in order to be isomorphic. A stronger requirement is that they have the same ordered degree sequence. This may seem obvious, but if we want to be precise, showing the obvious may turn out to be more cumbersome than expected. Let’s consider the following formal formulation.
Theorem 2.3: If two graphs $G$ and $G^*$ are isomorphic, then their respective ordered degree sequences should be the same.

Proof. Let $f$ be the one-to-one mapping by which $G$ and $G^*$ are known to be isomorphic. Consider vertex $u$ from $G$ and its adjacent vertices $v_1, \ldots, v_k$. By definition, each edge $e_i = \langle u, v_i \rangle$ incident with $u$ in $G$ is mapped to a unique edge $e^*_i = \langle f(u), f(v_i) \rangle$ in $G^*$. Because each edge $e^*_i$ is incident with $f(u)$, we must have that $d(u) \leq d(f(u))$.

Now consider a vertex $v^* \in V(G^*)$ that is adjacent to $f(u)$. By definition of isomorphism, we know that the edge $e^* = \langle f(u), v^* \rangle$ must uniquely map to an edge $e = \langle f^{-1}(f(u)), f^{-1}(v^*) \rangle$ in $G$, where $f^{-1}$ denotes the inverse mapping of $f$. Because $f$ is a one-to-one mapping, we also know that $f^{-1}(f(u)) = u$, and thus that $e = \langle u, f^{-1}(v^*) \rangle$. In other words, every edge incident with $f(u)$ in $G^*$ will be incident with $u$ in $G$. This means that $d(f(u)) \leq d(u)$. We conclude that $d(u) = d(f(u))$ for all vertices of $G$, implying that the ordered degree sequences of $G$ and $G^*$ should be the same.

Unfortunately, this theorem gives us only a necessary condition for two graphs to be isomorphic, yet it is not a sufficient condition. In other words, if two graphs have the same ordered degree sequence, then that fact alone is not sufficient to conclude that they are also isomorphic. Yet to be isomorphic, it is necessary for their respective ordered degree sequences to be the same.

Note 2.7 (Mathematical language)
The difference between necessary and sufficient conditions seems an obvious one, yet they are surprisingly often confused in mathematical proofs. Formally, in graph theory, conditions are used to prove properties of graphs. When a condition $C$ is said to be necessary, this means that a property $P$ can hold only if $C$ is met. When a condition $C$ is said to be sufficient, this means that if $C$ is met, then property $P$ will hold true. And indeed, when property $P$ is true if and only if condition $C$ is met, this indicates that $C$ is a necessary and sufficient condition for property $P$ to be valid.

To illustrate, consider the graphs from Figure 2.4(a), which are shown again in Figure 2.10. Although they have the same ordered degree sequence, they are not isomorphic. One way of seeing this is that the two vertices with degree 3 are adjacent to one another in one of the two graphs, but not in the other. (There are other structural differences, yet explaining these requires the introduction of more graph concepts, which we defer until later.)
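The gap between a necessary and a sufficient condition can be illustrated with a small experiment. The following Python sketch uses two hypothetical five-vertex graphs, `GRAPH_A` and `GRAPH_B`, that were constructed here to have the ordered degree sequence $[3, 3, 2, 2, 2]$; they are assumptions chosen for illustration and are not claimed to be the exact graphs drawn in Figure 2.10. The sketch compares their degree sequences and then applies the structural test mentioned above, namely whether the two degree-3 vertices are adjacent.

```python
VERTICES = [1, 2, 3, 4, 5]

# Hypothetical example graphs (not taken from the figures in the text).
GRAPH_A = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (4, 5)]
GRAPH_B = [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]


def degree_map(vertices, edges):
    """Return a dict mapping every vertex to its degree."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def ordered_degree_sequence(vertices, edges):
    """Return the vertex degrees in descending order."""
    return sorted(degree_map(vertices, edges).values(), reverse=True)


def top_degree_vertices_adjacent(vertices, edges):
    """Check whether two distinct highest-degree vertices are joined."""
    deg = degree_map(vertices, edges)
    highest = max(deg.values())
    top = {v for v, d in deg.items() if d == highest}
    return any(u in top and v in top and u != v for u, v in edges)


print(ordered_degree_sequence(VERTICES, GRAPH_A))
print(ordered_degree_sequence(VERTICES, GRAPH_B))
print(top_degree_vertices_adjacent(VERTICES, GRAPH_A))
print(top_degree_vertices_adjacent(VERTICES, GRAPH_B))
```

Both graphs pass the necessary condition of Theorem 2.3, yet the adjacency test separates them: since an isomorphism preserves degrees and adjacency, the two graphs cannot be isomorphic.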
Figure 2.10: Two nonisomorphic graphs $G$ and $G^*$ with the same ordered degree sequence.

The bad news is that there are no known easy sufficient conditions that will tell us in general whether two graphs are isomorphic or not. Essentially, this means that once we have found that all necessary conditions have been fulfilled, we will have to resort to a trial-and-error method. For example, with the graphs from Figure 2.10, we were able to successfully consider whether the highest-degree vertices were adjacent in both graphs. In other cases, however, we may have to look at other properties.

Note 2.8 (More information)
In the worst case, we may have to resort to an exhaustive method. Consider a graph $G$ with $n$ vertices $\{v_1, v_2, \ldots, v_n\}$, and a graph $G^*$ also with $n$ vertices. To check for isomorphism, we need to find a one-to-one mapping between these two vertex sets. With an exhaustive approach, we simply go through all possible mappings to see if there is one that establishes isomorphism. Unfortunately, there may be quite a few mappings that we need to check. To be precise, there are potentially $n!$ mappings to consider, where

$n! \stackrel{\text{def}}{=} n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1$

(to be pronounced as $n$ factorial). This is relatively easy to see as follows. For any mapping, we have $n$ choices for mapping $v_1$ to one of the vertices from $G^*$. After that, there are $n-1$ possibilities left for mapping $v_2$ to a vertex from $G^*$, and then another $n-2$ for mapping $v_3$, and so on. Finally, after having made a choice for each vertex $v_1, v_2, \ldots, v_{n-1}$, we have only one more option left for $v_n$. Checking $n!$ mappings is no pleasure game; consider the following table:

 n   n!          n    n!            n    n!
 1   1           6    720           11   39,916,800
 2   2           7    5040          12   479,001,600
 3   6           8    40,320        13   6,227,020,800
 4   24          9    362,880       14   87,178,291,200
 5   120         10   3,628,800     15   1,307,674,368,000
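The exhaustive method described in this note can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an efficient algorithm: the function name `find_isomorphism` is hypothetical, both graphs are assumed to be simple (no loops, no multiple edges), and edges are given as pairs of vertex labels. The sketch simply walks through all $n!$ one-to-one mappings.

```python
from itertools import permutations
from math import factorial


def find_isomorphism(vertices_g, edges_g, vertices_h, edges_h):
    """Try every one-to-one mapping; return one that works, or None.

    Assumes both graphs are simple, so an edge is fully described by
    the unordered pair of its end points.
    """
    vertices_g = list(vertices_g)
    vertices_h = list(vertices_h)
    if len(vertices_g) != len(vertices_h) or len(edges_g) != len(edges_h):
        return None  # cheap necessary conditions fail
    target = {frozenset(edge) for edge in edges_h}
    for image in permutations(vertices_h):  # n! candidate mappings
        phi = dict(zip(vertices_g, image))
        if all(frozenset((phi[u], phi[v])) in target for u, v in edges_g):
            return phi
    return None


# Hypothetical examples: a path with relabeled vertices, and the
# 6-cycle versus two disjoint triangles (same degree sequence).
path_a = [(0, 1), (1, 2), (2, 3)]
path_b = [("c", "a"), ("a", "d"), ("d", "b")]
hexagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

mapping = find_isomorphism(range(4), path_a, "abcd", path_b)
no_mapping = find_isomorphism(range(6), hexagon, range(6), two_triangles)
print(mapping is not None, no_mapping is None)
print(factorial(15))  # number of mappings for two 15-vertex graphs
```

Because the mapping is one-to-one and both edge sets have the same size, checking that every edge of the first graph lands on an edge of the second is enough for simple graphs.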
In fact, for large $n$, its factorial can be approximated by

$n! \approx \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^n$

which reaches amazingly high numbers even for relatively small values of $n$. There is also no chance that brute-force computations with a computer are going to bring any serious help here. For example, if a computer were able to check whether one specific mapping could establish isomorphism between two graphs in only 1 nanosecond (which is $10^{-9}$ seconds), it would still take about 500 million years to go through all possible mappings for two 25-vertex graphs ($25! \approx 1.55 \times 10^{25}$ mappings, or roughly $1.55 \times 10^{16}$ seconds). More cleverness is needed. We note that algorithms do exist that can efficiently test isomorphism for many graphs up to approximately 100 vertices, with perhaps the fastest one being nauty, devised by McKay [1980]. Also, efficient algorithms exist for graphs for which the maximal vertex degree is known to be bounded by a constant [Luks, 1982].

2.3 Connectivity

In all the graphs we have considered so far, each vertex $v$ could be reached from any other vertex $w$ in the sense that we could indicate a chain of adjacent vertices from $v$ to $w$. In this section, we will take a closer look at this important concept of connectivity. We start with some basic terminology:

Definition 2.8: Consider a graph $G$. A $(v_0, v_k)$-walk in $G$ is an alternating sequence $[v_0, e_1, v_1, e_2, \ldots, v_{k-1}, e_k, v_k]$ of vertices and edges from $G$ with $e_i = \langle v_{i-1}, v_i \rangle$. In a closed walk, $v_0 = v_k$. A trail is a walk in which all edges are distinct; a path is a trail in which also all vertices are distinct. A cycle is a closed trail in which all vertices except $v_0$ and $v_k$ are distinct.

Using the notion of a path, we define a graph to be connected when there is a path between each pair of distinct vertices. Formally, we have:

Definition 2.9: Two distinct vertices $u$ and $v$ in graph $G$ are connected if there exists a $(u, v)$-path in $G$. $G$ is connected if all pairs of distinct vertices are connected.

Clearly, all the graphs we have considered so far are indeed connected. However, there is no reason to assume that a graph is always connected. If we take a look at the definition of a graph, there is nothing there that states that all vertices should be connected. Intuitively, this means that a graph could also consist of a collection of components, where each component is a connected subgraph. This can be made precise as follows:
Definition 2.10: A subgraph $H$ of $G$ is called a component of $G$ if $H$ is connected and not contained in a connected subgraph of $G$ with more vertices or edges. The number of components of $G$ is denoted as $\omega(G)$.

Note that a component is not just a subgraph: it is a maximal, connected subgraph. In other words, if we would consider a subgraph $H$ of a graph $G$ and would find that there is a vertex not in $H$ that is connected to a vertex in $H$, then $H$ is, by definition, not a component. Maximality also incorporates edges, meaning that if an edge $e$ of $G$ joins two vertices of $H$, then $e$ should be contained in $H$.

The notion of connectivity is important, notably when considering the robustness of networks. Robustness in this context means how well the network stays connected when we remove vertices or edges. For example, as we mentioned in Chapter 1, the Internet can be viewed as a (huge) graph in which routers form the vertices and communication links between routers the edges. In a formal sense, the Internet is connected. However, if it were possible to partition the network into multiple components by removing only a single vertex (i.e., router) or edge (i.e., communication link), we could hardly claim the Internet to be robust. In fact, it is extremely important for networks such as the Internet to be able to sustain serious attacks and failures by which routers and links are brought down, such that connectivity is still guaranteed.

There are many networks for which robustness in one way or another plays an important role. Let us now formalize this notion by considering what are known as vertex and edge cuts.

Definition 2.11: For a graph $G$ let $V^* \subset V(G)$ and $E^* \subset E(G)$. $V^*$ is called a vertex cut if $\omega(G - V^*) > \omega(G)$. If $V^*$ consists of a single vertex $v$, then $v$ is called a cut vertex. Likewise, if $\omega(G - E^*) > \omega(G)$ then $E^*$ is called an edge cut. If $E^*$ consists of only a single edge $e$, then $e$ is known as a cut edge.

Note that we have used the notation $G - V^*$ to indicate the induced subgraph $G[V(G) \setminus V^*]$. What the definition states is that $V^*$ is a vertex cut of a connected graph if the removal of the vertices in $V^*$ from $G$ will make $G$ disintegrate into several components. In other words, $G$ will become disconnected. Analogously, an edge cut of $G$ is a collection of edges that will make $G$ fall apart into multiple components when those edges are removed. In the definition given above, we have used the simpler notation $G - E^*$ to indicate the induced subgraph $G[E(G) \setminus E^*]$.
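Definition 2.11 translates almost literally into code: count the components, remove a vertex or an edge, and count again. The following Python sketch does exactly that. The function names and the example graph (two triangles joined by a single edge) are assumptions made for illustration, and the approach is deliberately naive rather than efficient.

```python
def count_components(vertices, edges):
    """Return the number of components of an undirected graph."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1          # found a vertex in a new component
        seen.add(start)
        stack = [start]
        while stack:             # visit everything connected to start
            x = stack.pop()
            for y in adjacency[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return components


def cut_vertices(vertices, edges):
    """Vertices v for which G - v has more components than G."""
    base = count_components(vertices, edges)
    result = []
    for v in vertices:
        rest = [x for x in vertices if x != v]
        kept = [(a, b) for a, b in edges if v not in (a, b)]
        if count_components(rest, kept) > base:
            result.append(v)
    return result


def cut_edges(vertices, edges):
    """Edges e for which G - e has more components than G."""
    base = count_components(vertices, edges)
    return [edges[i] for i in range(len(edges))
            if count_components(vertices, edges[:i] + edges[i + 1:]) > base]


# Hypothetical example: two triangles joined by the edge <2, 3>.
vertices = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
print(count_components(vertices, edges))
print(cut_vertices(vertices, edges))
print(cut_edges(vertices, edges))
```

In network terms, the two end points of the joining edge play the role of routers whose failure partitions the network, and the joining edge itself is the single link whose failure does the same.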
Of particular interest is the minimal vertex cut for a connected graph. In other words, how many vertices do we need to remove from a connected graph before it becomes disconnected? An important observation is the following. Let $\kappa(G)$ denote the size of a minimal vertex cut for graph $G$, and likewise, $\lambda(G)$ the size of a minimal edge cut. As it turns out, $\kappa(G) \le \lambda(G)$, but also $\lambda(G)$ is less than or equal to the minimal vertex degree. Using the notation $\min S$ to denote the smallest value found among the elements in set $S$, these properties are formulated in the following important theorem.

Theorem 2.4: $\kappa(G) \le \lambda(G) \le \min\{\delta(v) \mid v \in V(G)\}$

Proof. That $\lambda(G) \le \min\{\delta(v) \mid v \in V(G)\}$ is easy to see. Consider a vertex $u$ with minimal degree, that is, $\delta(u) = \min\{\delta(v) \mid v \in V(G)\}$. If we simply remove the $\delta(u)$ edges incident with $u$, then $u$ will become isolated, and certainly the resulting graph will have at least one more component than it had before (namely the one consisting only of $u$).

To prove that $\kappa(G) \le \lambda(G)$, consider a graph $G$ with $\lambda(G) = k$ and let $E^* = \{e_1, e_2, \ldots, e_k\}$ be a minimal edge cut of $G$, with $e_i = \langle u_i, v_i \rangle$. Let $U$ denote the set of vertices $\{u_1, \ldots, u_k\}$ and $V$ the set $\{v_1, \ldots, v_k\}$. Note that in this case, the vertices in either set need not be distinct. The graph $G - E^*$ will fall apart into exactly two components, say $G_1$ and $G_2$ (we leave it to you to show that this is indeed true). If $G_1$ contains a vertex $u$ distinct from any $u_i$, as shown in Figure 2.11(a), then clearly removing all vertices in $U$ will disconnect $u$ from any vertex in $G_2$, so that $\kappa(G) \le k$.

If there is no such vertex $u$, then assume that $V(G_1) = U$. Consider vertex $u_1$. We know that $u_1$ is adjacent to $d_1$ vertices from $G_1$, and each of these neighbors in $G_1$ is adjacent to a vertex from $V$. Let $E_1^*$ be a set of edges from $E^*$ joining vertices from the $d_1$ neighbors of $u_1$ and exactly one vertex from $V$. Likewise, let $E_2^*$ be the $d_2$ edges from $E^*$ incident with $u_1$. This situation is shown in Figure 2.11(b). Obviously, $d_1 + d_2 = |E_1^* \cup E_2^*| \le |E^*|$. Also, the $d_1 + d_2$ neighboring vertices of $u_1$ form a vertex cut, shown as open circles in Figure 2.11(b). This also means that $\kappa(G) \le d_1 + d_2 \le |E^*| = \lambda(G)$, completing the proof.

A graph $G$ for which $\kappa(G) \ge k$ for some $k$ is said to be $k$-connected. Likewise, graph $G$ is $k$-edge-connected if $\lambda(G) \ge k$. Finally, a graph for which $\kappa(G) = \lambda(G) = \min\{\delta(v) \mid v \in V(G)\}$ is said to be optimally connected.

Note 2.9 (Study tip)
The previous proof, and notably proving that $\kappa(G) \le \lambda(G)$, is a typical example
where graph theory requires insight. The proof is not obvious, and it can certainly not be expected that an undergraduate student would be able to devise it from scratch. What is important, however, is that the proof itself is understood well. To this end, you are encouraged to start with reproducing proofs, as this will force you to carefully think about every step that is taken. Simply being able to reproduce proofs is a well-known technique to successfully study graph theory.

Figure 2.11: The two scenarios, (a) and (b), for the proof of Theorem 2.4.

What Theorem 2.4 tells us is that every graph is at most $\delta_{\min}$-edge-connected, and at most $\delta_{\min}$-connected, where $\delta_{\min} = \min\{\delta(v) \mid v \in V(G)\}$. We showed this for edge connectivity. Vertex connectivity is also easy to see: simply remove the $\delta_{\min}$ vertices adjacent to a vertex of degree $\delta_{\min}$ and the latter becomes disconnected. Of course, finding a lower bound for $\kappa$ is more interesting, but this turns out to be a relatively difficult problem to solve. Without going into the rather intricate details, we can say something about a lower bound for $\kappa$ by considering the notion of path independence.

Definition 2.12: Consider a graph $G$ and a collection $\mathcal{P}$ of $(u, v)$-paths in $G$, with $u, v \in V(G)$. $\mathcal{P}$ is vertex independent if for all distinct $(u, v)$-paths $P_1, P_2 \in \mathcal{P}$ we have that $V(P_1) \cap V(P_2) = \{u, v\}$. The collection is edge independent if for all its distinct $(u, v)$-paths $P_1$ and $P_2$, we have that $E(P_1) \cap E(P_2) = \emptyset$.

In other words, two $(u, v)$-paths $P_1$ and $P_2$ are vertex independent if they share only the vertices $u$ and $v$, and are edge independent if they have no edge in common. Using path independence, we now come to one of the more fundamental theorems in graph theory, formulated by the Austrian mathematician Karl Menger.

Theorem 2.5 (Menger): Let $G$ be a connected graph and $u$ and $v$ two nonadjacent vertices in $G$. The minimum number of vertices in a vertex cut that disconnects $u$ and $v$ is equal to the maximum number of pairwise vertex-independent paths between $u$ and $v$. Analogously, the minimum number of edges in an edge cut that disconnects $u$ and $v$ is equal to the maximum number of pairwise edge-independent paths between $u$ and $v$.
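The edge version of Menger's theorem can be checked numerically on small graphs. The Python sketch below is an illustration only and uses a technique that is not part of this text: it counts edge-independent paths by repeatedly searching for augmenting paths in a unit-capacity flow network, and compares the result with a brute-force search for the smallest separating edge set. All function names and example graphs are assumptions; the vertex version of the theorem is not covered.

```python
from collections import deque
from itertools import combinations


def max_edge_independent_paths(vertices, edges, s, t):
    """Maximum number of pairwise edge-independent (s, t)-paths."""
    capacity = {v: {} for v in vertices}
    for a, b in edges:  # an undirected edge can be used in either direction
        capacity[a][b] = capacity[a].get(b, 0) + 1
        capacity[b][a] = capacity[b].get(a, 0) + 1
    paths = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # breadth-first search for t
            x = queue.popleft()
            for y, c in capacity[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if t not in parent:
            return paths
        y = t
        while parent[y] is not None:  # push one unit along the path found
            x = parent[y]
            capacity[x][y] -= 1
            capacity[y][x] += 1
            y = x
        paths += 1


def min_separating_edge_cut(vertices, edges, s, t):
    """Size of a smallest edge set whose removal disconnects s from t."""
    for size in range(len(edges) + 1):
        for removed in combinations(range(len(edges)), size):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if max_edge_independent_paths(vertices, kept, s, t) == 0:
                return size


# Hypothetical examples with nonadjacent s and t.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
barbell = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
print(max_edge_independent_paths(range(4), square, 0, 2),
      min_separating_edge_cut(range(4), square, 0, 2))
print(max_edge_independent_paths(range(6), barbell, 0, 5),
      min_separating_edge_cut(range(6), barbell, 0, 5))
```

On both examples the two numbers coincide, as the theorem predicts; the brute-force cut search is exponential in the number of edges and only meant for toy inputs.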
We omit the proof, and instead refer the interested reader to Bondy and Murty [2008], Diestel [2005], or West [2001].

Note 2.10 (Mathematical language)
Menger's theorem should be read carefully: it mentions pairwise independent paths. In this case, the adjective pairwise is used to make clear that we should always consider pairs of paths when considering independence. And indeed, this makes sense when you consider trying to count the number of independent paths: being an independent path can only be relative to another path. To complete the story, also note that the theorem is all about counting the number of $(u, v)$-paths, and not the number of pairs of such paths. In other words, pairwise qualifies independent, and not paths.

It is not difficult to see that Menger's theorem leads to the following corollary:

Corollary 2.2: A graph $G$ is $k$-connected if and only if any two distinct vertices are connected by at least $k$ pairwise vertex-independent paths. $G$ is $k$-edge-connected if and only if any two distinct vertices are connected by at least $k$ pairwise edge-independent paths.

Of particular interest is the following one:

Corollary 2.3: Each edge of a 2-edge-connected graph lies on a cycle.

This corollary actually follows from the previous one, which states (for the special case $k = 2$) that a graph is 2-edge-connected if and only if any two distinct vertices are connected by at least 2 pairwise edge-independent paths. The latter, of course, together form a cycle. We will use this corollary in the next chapter when discussing so-called directed graphs.

Intuitively, it should be clear that for any simple graph $G$ a higher value of $\kappa(G)$, i.e., the size of a minimal vertex cut, implies that more edges are needed. We have just seen that in every $k$-connected graph each vertex will have at least $k$ incident edges. Knowing that $\sum \delta(v) = 2m$, this means that for a graph with $n$ vertices, we would need $\frac{1}{2}\sum \delta(v)$ and thus at least $\frac{1}{2}\sum k = \frac{1}{2} n k$ edges. But what is the minimal number of edges for a graph to be $k$-connected? This question brings us to a so-called Harary graph:

Definition 2.13: A Harary graph $H_{k,n}$ is a $k$-connected simple graph with $n$ vertices and with a minimal number of edges.

What we now need to figure out is actually how many edges a Harary graph has. We will show that $H_{k,n}$ has exactly $\lceil kn/2 \rceil$ edges, that is, the smallest natural number of edges greater than or equal to $kn/2$. To this end, we label the vertices in $H_{k,n}$ as $\{0, 1, \ldots, n-1\}$ and organize them graphically as a circle. Following Bondy and Murty [1976], we consider the following three cases for combinations of $k$ and $n$.

$k$ is even: We construct $H_{k,n}$ by joining each vertex $i$ to its $k/2$ closest left-hand (i.e., clockwise) neighbors and its $k/2$ closest right-hand (i.e., counterclockwise) neighbors.

$k$ is odd, $n$ is even: In this case, we construct $H_{k-1,n}$ and add $n/2$ edges by joining vertex $i$ to its left-hand neighbor at distance $n/2$ (with $0 \le i$ [...]

[...] 1, we prove that it also holds for $k + 1$. In doing so, we have then completed the proof.

Proof by induction is extremely important and you should make sure that you not only understand it well, but also that you are proficient in applying it. d'Angelo and West [2000] devote a complete chapter to the principle of induction and provide many examples of its use.

Formally, induction is defined by considering the natural numbers, that is, $\mathbb{N} \stackrel{\text{def}}{=} \{1, 2, \ldots\}$. We then have the following important theorem.

Theorem 2.8 (Principle of induction): Let $S(n)$ be a mathematical statement formulated in terms of a natural number $n$. $S(n)$ is true for all $n \in \mathbb{N}$ if the following two statements are true:

1. $S(1)$ is true
2. for any $k \in \mathbb{N}$, if $S(k)$ is true, then $S(k+1)$ is true

What this theorem tells us is that to conduct a proof by induction, we need to first show that $S(1)$ holds. Secondly, we need to show that if $S(k)$ is true, then $S(k+1)$ is also true. We can then conclude that $S(n)$ is true for any $n \in \mathbb{N}$. In practice this means that we show $S(1)$ to be true, then assume that $S(k)$ is true for some $k \ge 1$, after which we need to show that $S(k+1)$ is true based on that assumption.

Of course, showing that $S(k+1)$ is true is often the nasty part. A common approach is to try to reduce the situation for $k+1$ to $S(k)$. This is exactly what happened in the case of our lemma: we simply removed an edge, which led to subgraphs of smaller size for which we knew that our statement $n = m + 1$ was true. From there on, we could subsequently count the number of vertices and edges in the original graph.

Using this lemma, we can now complete our proof of Euler's formula, again by means of induction:

Proof of Theorem 2.7. The proof is by induction on $r$, the number of regions. If $r = 1$, then there is only a single region, which means there cannot be a region enclosed by edges of $G$. In other words, $G$ must be acyclic, in which case $m = n - 1$ and thus $n - m + r = n - (n - 1) + 1 = 2$. For $r = 1$
the formula is therefore seen to be true. Now assume the formula is true for all plane graphs with fewer than $r$ regions, and let $G$ be a plane graph with $r > 1$ regions. Choose an edge $e$ (which is not a cut edge) and consider the subgraph $G^* = G - e$. As $e$ was part of a cycle, we will have merged two regions, reducing the total number of regions by 1. In that case, we know that Euler's formula is true, and as a consequence, $|V(G^*)| - |E(G^*)| + (r - 1) = 2$. Considering that $|V(G^*)| = |V(G)|$ and $|E(G^*)| = |E(G)| - 1$, we now obtain $|V(G)| - (|E(G)| - 1) + r - 1 = |V(G)| - |E(G)| + r = 2$, completing our proof.

Euler's formula is important as it allows us to derive a number of properties by which we can more easily determine whether a given graph is planar or not. To this end, we first prove the following:

Theorem 2.9: For any connected simple planar graph $G$ with $n \ge 3$ vertices and $m$ edges, we have that $m \le 3n - 6$.

Proof. Consider a region $f$ in any plane graph of $G$. For any interior region, let $B(f)$ denote the number of edges by which $f$ is enclosed, i.e., the length of its "border." Obviously, $B(f) \ge 3$ for any interior region. However, with $n \ge 3$ we also have that the exterior region is "bounded" by at least 3 edges. Therefore, if there are a total of $r$ regions, then clearly $\sum B(f) \ge 3r$. On the other hand, it is not difficult to see that $\sum B(f)$ counts every edge in $G$ once or twice, and hence $\sum B(f) \le 2m$, so that we obtain $3r \le \sum B(f) \le 2m$, and thus $r \le \frac{2}{3} m$. From Theorem 2.7 we then derive that $m = n + r - 2 \le n + \frac{2}{3} m - 2$, so that $m \le 3n - 6$.

Note that this theorem gives us a necessary condition for a simple graph to be planar. In other words, if we have a simple graph $G$ for which $m > 3n - 6$, then $G$ cannot be planar. It is not a sufficient condition, as we will show shortly. Furthermore, what we learn from this theorem is that a planar graph will have relatively few edges, which is intuitively clear. We can use it to prove that the complete graph on 5 vertices, that is, $K_5$, cannot be planar.

Corollary 2.4: The complete graph on 5 vertices, $K_5$, is nonplanar.

Proof. With $n = |V(K_5)| = 5$ and $m = |E(K_5)| = \binom{5}{2} = 10$, we have that $m \not\le 3n - 6$, so that $K_5$ cannot be planar.

Note 2.15 (More information)
There are two novelties in this proof. First, we introduced the notation $\binom{n}{k}$, which is pronounced as "n choose k," and is defined as

$\binom{n}{k} \stackrel{\text{def}}{=} \frac{n!}{(n-k)!\,k!}$

Second, we are stating that the number of edges in $K_n$ is equal to $\binom{n}{2}$. Considering that we have $n$ vertices in $K_n$, it should be clear that to construct $K_n$, we need to consider exactly all pairs of vertices. Obviously, there are exactly $\binom{n}{2}$ such pairs.

Another way of counting the number of edges in $K_n$ is as follows. Assume that the vertices are labeled $\{1, 2, \ldots, n\}$. For vertex 1, we can choose from $n - 1$ vertices to join it to. After that, there are only $n - 2$ vertices to join to vertex 2 (because vertex 1 is already joined with vertex 2). For vertex 3, we can choose from $n - 3$ vertices, and so on. In other words, the total number of edges in $K_n$ is equal to:

$|E(K_n)| = (n-1) + (n-2) + (n-3) + \cdots + 2 + 1 = \frac{1}{2} n(n-1)$

To show that $\sum_{i=1}^{n-1} i = \frac{1}{2} n(n-1)$ is left as an exercise.

Analogous to a complete graph, we also have complete bipartite graphs $K_{p,q}$: such a graph is a simple graph consisting of two disjoint sets of vertices $V_1$ and $V_2$ as in Definition 2.14, with $p = |V_1|$ and $q = |V_2|$, and a total of $p \cdot q$ edges. An observation is now the following:

Theorem 2.10: The complete bipartite graph $K_{3,3}$ is nonplanar.

Proof. Because $n = |V(K_{3,3})| = 6$ and $m = |E(K_{3,3})| = 9$, we find that $m \le 3n - 6$, so that this will not give us evidence that $K_{3,3}$ is indeed nonplanar. Instead, we need to follow a similar reasoning as for the proof of Theorem 2.9. First, note that each interior region $f$ in any $K_{p,q}$ will necessarily be enclosed by an even number of edges, and hence by at least four. Again, if $B(f)$ denotes the number of edges enclosing interior region $f$, and realizing that also the exterior region will be "bounded" by at least four edges, we find that $\sum B(f) \ge 4r$, where $r$ is the total number of regions. Because edges are counted at most twice, we should have that $4r \le 2m = 18$. However, Euler's formula tells us that $r = 2 - n + m = 2 - 6 + 9 = 5$, so that $4r \not\le 18$. Therefore, $K_{3,3}$ cannot be planar.

Note 2.16 (Mathematical language)
Indeed, as we stated above, the mere fact that $m \le 3n - 6$ is not enough to conclude that a graph is planar. In other words, it is not a sufficient condition.

With these two results, we can now conclude that:

Corollary 2.5: Any connected, simple graph having a subgraph isomorphic to either $K_5$ or $K_{3,3}$ cannot be planar.

CHAPTER 3 EXTENSIONS
In the previous chapter we have looked only at the very basics of graphs, although it should be clear that those foundations already provide a powerful tool for modeling and analyzing real-world networks. In this chapter we consider a number of important extensions. We start with introducing graphs in which the edges are directed, that is, pointing from one vertex to another. Besides adding a direction to an edge, we can also associate a weight with an edge, which often represents some kind of cost or distance. Finally, we take a look at a specific application of graphs by which the vertices or edges are colored. As we shall see, colorings allow us to capture real-world situations.

3.1 Directed graphs

In the graphs we have considered so far, two vertices could be connected by one or more edges. An edge was represented by an unordered pair of vertices, such as $\langle u, v \rangle$ in the case of simple graphs. However, having no ordering is not always convenient. Consider the following examples:

• Suppose we want to model a street plan as a network. This is naturally done by representing a junction as a vertex and a street as an edge connecting two junctions. However, we need a notion of edge direction if we want to represent one-way streets.

• In social relations it is often convenient to represent the fact that Alice knows Bob, but that the opposite is not the case. In a social network this is done by representing people by vertices, and the "who knows whom" relation by means of directed edges.

• In computer networks, and notably wireless networks, links between two different nodes are often not symmetric in the sense that messages can generally be successfully sent from station A to B, but not the other way around. Modeling such a computer network is more conveniently done using directed edges.

What we are thus seeking is a way to extend graphs such that we will be able to model these and similar situations.

Basics of directed graphs

The need for associating a direction with the edges of a graph leads to the notion of a directed graph, or simply digraph:

Definition 3.1: A directed graph or digraph $D$ consists of a collection of vertices $V$, and a collection of arcs $A$, for which we write $D = (V, A)$. Each arc $a = \overrightarrow{\langle u, v \rangle}$ is
said to join vertex $u \in V$ to another (not necessarily distinct) vertex $v$. Vertex $u$ is called the tail of $a$, whereas $v$ is its head.

The underlying graph $G(D)$ of a digraph $D$ is obtained by replacing each arc $a = \overrightarrow{\langle u, v \rangle}$ with its undirected counterpart. As we shall see in later chapters, analyzing the underlying graph is often more convenient than directly considering the original digraph. Conversely, we can transform any undirected graph $G$ into a directed one, $D(G)$, by associating a direction with each edge. Such a digraph is also known as an orientation. We leave it as an exercise to prove that for a simple graph $G$ with $m$ edges there are $2^m$ different orientations possible.

As with undirected graphs, neighbor sets play an important role in directed graphs. We make a distinction between two types of neighbors:

Definition 3.2: Consider a directed graph $D$ and vertex $v \in V(D)$. The in-neighbor set $N_{in}(v)$ of $v$ consists of the adjacent vertices having an arc with $v$ as its head. Likewise, the out-neighbor set $N_{out}(v)$ consists of the adjacent vertices having an arc with $v$ as its tail. Formally we have:

$N_{in}(v) \stackrel{\text{def}}{=} \{w \in V(D) \mid v \ne w, \exists a = \overrightarrow{\langle w, v \rangle} : a \in A(D)\}$
$N_{out}(v) \stackrel{\text{def}}{=} \{w \in V(D) \mid v \ne w, \exists a = \overrightarrow{\langle v, w \rangle} : a \in A(D)\}$

The set of neighbors $N(v)$ of vertex $v$ is simply the union of its in-neighbors and out-neighbors, i.e., $N(v) \stackrel{\text{def}}{=} N_{in}(v) \cup N_{out}(v)$.

Note 3.1 (Mathematical language)
Notice that the formal part of this definition is almost identical to that of the neighbor set in the case of undirected graphs. And again, it is precise, yet can seem somewhat intimidating at first sight. Informally, the in-neighbor set consists of adjacent vertices from which $v$ can be directly reached: they are neighbors "pointing" to $v$. The out-neighbor set consists of vertices to which $v$ is "pointing." These types of informal translations of mathematical definitions are important to make, and as before, you are encouraged to practice formulating them.

A digraph is said to be strict if it has no loops and no two arcs with the same end points have the same orientation. Note that the notion of a strict digraph is analogous to that of a simple undirected graph. Many concepts that we defined for undirected graphs have their counterparts in digraphs. Let us start with that of vertex degree.

Definition 3.3: For a vertex $v$ of digraph $D$, the number of arcs with head $v$ is called the indegree $\delta_{in}(v)$ of $v$. Likewise, the outdegree $\delta_{out}(v)$ is the number of arcs having $v$ as their tail.

The concept of indegree and outdegree can sometimes play a surprisingly important role when devising or analyzing real-world networks. To give an example, suppose we are devising a communication network in which we model the case that node $u$ can send a message directly to node $v$ by means of an arc $a = \overrightarrow{\langle u, v \rangle}$. The indegree of node $v$ may then indicate how many messages $v$ can expect per time unit, also known as the rate of incoming messages. In many cases, it is desirable that this rate is limited in order to ensure that nodes are not overloaded.

In general, considering vertex degree distributions is an important technique for analyzing networks. A degree distribution shows how many vertices have degree 0, 1, 2, and so on. In many practical cases, we are more interested in finding the distribution of the indegrees. For example, in the case of social networks, nodes with a high indegree are often considered to be important. By computing the ratio of indegrees between different nodes, we can get an impression of exactly how much more important certain nodes are. We will return to vertex degree distributions extensively in later chapters.

Returning to graph-theoretical issues, it is not difficult to see that the following analogy to undirected graphs holds:

Theorem 3.1: For any directed graph $D$ the sum of indegrees as well as the sum of outdegrees is equal to the total number of arcs:

$\sum_{v \in V(D)} \delta_{in}(v) = \sum_{v \in V(D)} \delta_{out}(v) = |A(D)|.$

Proof. Clearly, every arc in $D$ has exactly one head and one tail. The sum of the indegrees is the same as counting all arc heads, and likewise, the sum of all outdegrees is the same as counting all tails, both being equal to the total number of arcs.

A natural representation of directed graphs is by means of an adjacency matrix $A$ in which $A[i, j]$ is equal to the number of arcs joining vertex $v_i$ to $v_j$. In contrast to an adjacency matrix for an undirected graph, we have the following properties in case of a directed graph:

• A digraph $D$ is strict if and only if for all $i$ and $j$, $A[i, j] \le 1$ and $A[i, i] = 0$. In other words, there can be at most one arc joining any vertex $v_i$ to another vertex $v_j$, and no arcs joining a vertex to itself.

• For each vertex $v_i$, $\sum_j A[i, j] = \delta_{out}(v_i)$ and $\sum_j A[j, i] = \delta_{in}(v_i)$. In other words, the sum of the entries in row $i$ corresponds to the outdegree of vertex $v_i$, whereas the sum of the entries in column $i$ equals the indegree of $v_i$.

Note that in contrast to undirected graphs, the adjacency matrix for a directed graph is not necessarily symmetric, that is, in general, $A[i, j] \ne A[j, i]$. Rephrasing this in natural language means that when there is an arc joining vertex $v_i$ to $v_j$, then there need not necessarily also be an arc joining $v_j$ to $v_i$. Taking the same graph from Figure 2.7 but now with a specific orientation, Figure 3.1 shows an example of a digraph and its adjacency matrix.

       v1  v2  v3  v4 |  Σ
  v1    1   1   0   0 |  2
  v2    0   0   1   0 |  1
  v3    1   1   0   0 |  2
  v4    0   0   1   1 |  2
  Σ     2   2   2   1 |  7

Figure 3.1: A digraph (vertices $v_1, \ldots, v_4$, arcs $a_1, \ldots, a_7$) with its associated adjacency matrix.

Similarly, we can represent a digraph by means of an incidence matrix $M$. In this case, $M[i, j]$ represents whether or not vertex $v_i$ is incident to arc $a_j$. In particular:

$M[i, j] = \begin{cases} 1 & \text{if vertex } v_i \text{ is the tail of arc } a_j \\ -1 & \text{if vertex } v_i \text{ is the head of arc } a_j \\ 0 & \text{otherwise} \end{cases}$

Unfortunately, if a digraph has loops (i.e., arcs of the form $\overrightarrow{\langle u, u \rangle}$ that join a vertex to itself), this representation will not work, as is also illustrated in Figure 3.2. Partly also for this reason, it is more common to use adjacency matrices or simply to list the arcs, analogous to edge-list representations in the case of undirected graphs.

       a1  a2  a3  a4  a5  a6  a7
  v1    0   1  -1   0   0   0   0
  v2    0  -1   0  -1   1   0   0
  v3    0   0   1   1  -1  -1   0
  v4    0   0   0   0   0   1   0

Figure 3.2: A digraph (vertices $v_1, \ldots, v_4$, arcs $a_1, \ldots, a_7$) with its associated incidence matrix.
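The two matrix representations can be reproduced with a short Python sketch. The arc list below is an assumption: it is the list of tail/head pairs as read off from the matrices in Figures 3.1 and 3.2 (vertices $v_1, \ldots, v_4$ are indexed 0 to 3, arcs $a_1, \ldots, a_7$ are listed in order), and the function names are chosen for illustration only.

```python
# Arcs a1..a7 as (tail, head) pairs, with v1..v4 mapped to indices 0..3.
# Assumed reading of Figures 3.1 and 3.2; a1 and a7 are loops.
arcs = [(0, 0), (0, 1), (2, 0), (2, 1), (1, 2), (3, 2), (3, 3)]
n = 4


def adjacency_matrix(n, arcs):
    """A[i][j] is the number of arcs joining vertex i to vertex j."""
    A = [[0] * n for _ in range(n)]
    for tail, head in arcs:
        A[tail][head] += 1
    return A


def incidence_matrix(n, arcs):
    """M[i][j] is 1 if vertex i is the tail of arc j, -1 if it is the head."""
    M = [[0] * len(arcs) for _ in range(n)]
    for j, (tail, head) in enumerate(arcs):
        M[tail][j] += 1
        M[head][j] -= 1   # for a loop, +1 and -1 cancel: the column is all 0
    return M


def is_strict(A):
    """No loops and at most one arc from any vertex to any other."""
    size = len(A)
    return all(A[i][i] == 0 for i in range(size)) and \
        all(A[i][j] <= 1 for i in range(size) for j in range(size))


A = adjacency_matrix(n, arcs)
M = incidence_matrix(n, arcs)
outdegrees = [sum(row) for row in A]                             # row sums
indegrees = [sum(A[i][j] for i in range(n)) for j in range(n)]   # column sums

print(outdegrees, indegrees)
print(sum(outdegrees) == sum(indegrees) == len(arcs))   # Theorem 3.1
print(is_strict(A))
```

The all-zero columns of `M` for the two loops show why the incidence matrix cannot represent loops: such a column is indistinguishable from one for an arc that touches no vertex at all.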
Connectivity for directed graphs

Connectivity is also an important concept for directed graphs. To define connectivity for digraphs, we need the equivalent notions of paths.

Definition 3.4: Consider a digraph $D$. A directed $(v_0, v_k)$-walk in $D$ is an alternating sequence $[v_0, a_0, v_1, a_1, \ldots, v_{k-1}, a_{k-1}, v_k]$ of vertices and arcs from $D$ with $a_i = \overrightarrow{\langle v_i, v_{i+1} \rangle}$. A directed trail is a directed walk in which all arcs are distinct; a directed path is a directed trail in which all vertices are also distinct. A directed cycle is a directed trail in which all vertices are distinct except for $v_0$ and $v_k$.

Note that the definitions of walk, trail, path, and cycle are indeed completely analogous to those for undirected graphs. The concepts of a path and a cycle are, practically speaking, the most important ones. We can now define the connectivity of a digraph as follows:

Definition 3.5: A digraph $D$ is strongly connected if there exists a directed path from each vertex of $D$ to every other vertex of $D$. A digraph is weakly connected if its underlying graph is connected.

It is not difficult to imagine that the concept of connectivity indeed plays an important role in directed graphs. As we explained, communication networks are conveniently modeled as directed graphs. In these networks, it is important that a message from an arbitrarily chosen node $u$ can be routed through the network to any other node. This requirement is equivalent to stating that the associated directed graph is strongly connected. Likewise, in transportation networks it is important that for an arbitrarily chosen node we can find a route to any other node. Again, this is the same as stating that we want the associated directed graph to be strongly connected.

Note 3.2 (More information)
If being strongly connected is important you may conclude that weakly connected digraphs are not that interesting. There is one important type of weakly connected digraph: a so-called directed acyclic graph, or simply DAG. A DAG is a directed graph without any directed cycle. In practice, DAGs are also assumed to be weakly connected.

Directed acyclic graphs have many applications, of which a large number deal with expressing dependencies. For example, work plans are generally broken down into smaller units such as activities. To execute a work plan, there will be many activities that can start only after the completion of other activities. These plans are conveniently modeled as directed graphs, in which a vertex represents an activity and an arc from vertex $u$ to $v$ the fact that activity $v$ can start only after $u$ has completed. For such plans, we demand that the graph is indeed acyclic.

To test for connectivity in directed graphs, we can perform a simple reachability analysis. A vertex $v$ in a digraph $D$ is said to be reachable from vertex $u$ if there exists a directed $(u, v)$-path in $D$. To compute the vertices that can be reached from a given vertex $u$, we can proceed as follows:

Algorithm 3.1 (Reachable vertices): Let $R_t(u)$ denote the set of reachable vertices from $u$ found after $t$ steps.

1. Set $t \leftarrow 0$ and $R_0(u) \leftarrow \{u\}$.
2. Construct the set $R_{t+1}(u) \leftarrow R_t(u) \cup \bigcup_{v \in R_t(u)} N_{out}(v)$.
3. If $R_{t+1}(u) = R_t(u)$, stop: $R(u) \leftarrow R_t(u)$. Otherwise, increment $t$ and repeat the previous step.

This is an example of a breadth-first algorithm, so called because at each step each newly added vertex is examined. We shall discuss more of such algorithms in Chapter 4. The essence of the algorithm is simple: we systematically expand the set $R(u)$ of vertices reachable from $u$ with any new out-neighbors that can be reached once a vertex has been added to $R(u)$. Clearly, if no new neighbors are discovered [which is when $R_{t+1}(u)$ is equal to $R_t(u)$], we will have identified all reachable vertices. Then, the digraph $D$ will be strongly connected if and only if:

$\forall u \in V(D) : R(u) = V(D)$

Note that we can also apply the same method for checking the connectivity of an undirected graph. We leave the description of that algorithm as an exercise.

Note 3.3 (Algorithmics)
This algorithm is expressed rather rigorously. As before, we use the notation $x \leftarrow S$ to express that the variable $x$ takes the value resulting from evaluating the expression $S$. If we were to translate this algorithm into English, we would have something like:

1. Set $t$ to 0, and let $R_0(u)$ initially contain only $u$.
2. Add to $R_t(u)$ all the vertices $w$ that can be reached by an arc from $v$ to $w$, where $v$ is already contained in $R_t(u)$. Name this new set $R_{t+1}(u)$.
3. If there are no vertices that can be added to $R_{t+1}(u)$ we're done.

Making such informal translations can considerably help in understanding an algorithm. However, it should also be clear that we need the precision of the formal notation if we are to construct a program that does the job. In fact, from
the formal notation we can readily derive the following fragment of pseudo-code. (We use N_all to store all out-neighbors found so far, and R_now for the vertices that still need to be checked.)

    t ← 0; R_0(u) ← {u};
    repeat
        N_all ← ∅; R_now ← R_t(u);
        while R_now ≠ ∅ do
            select any v ∈ R_now;
            R_now ← R_now \ {v};
            N_all ← N_all ∪ N_out(v);
        endwhile;
        R_{t+1}(u) ← R_t(u) ∪ N_all;
        t ← t + 1;
    until R_t(u) = R_{t-1}(u);

Pseudo-code combines concepts from programming languages with mathematical and natural-language statements. The programming-language concepts are generally used for expressing the flow of control in an algorithm, that is, the order in which statements need to be executed. The statements themselves are written in some convenient notation. As can be seen from this example, the next step toward an actual implementation would mostly involve programming constructs for declaring and handling sets, but is otherwise independent of the algorithm.

Instead of testing for strong connectivity, we can also ask ourselves if and how we can provide an orientation for a given (connected) undirected graph such that the resulting directed graph is strongly connected. This question is relevant, for example, when designing a traffic circulation plan in which most streets should be one-way. The following theorem gives a necessary and sufficient condition for providing such an orientation.

Theorem 3.2: There exists an orientation $D(G)$ for a connected undirected graph $G$ that is strongly connected if and only if $\lambda(G) \ge 2$. In other words, $G$ must be 2-edge-connected, that is, it has no cut edge.

Proof. Let us first consider a strongly connected orientation $D$ of $G$. We prove, by contradiction, that $G$ is 2-edge-connected. To that end, assume that $G$ is not 2-edge-connected and that the removal of $e = \langle u, v \rangle$ disconnects $G$, that is, $G - e$ falls into two components $G_1$ and $G_2$. Clearly, we can assign only one orientation to $e$, that is, $D(G)$ will either contain the arc $a = \overrightarrow{\langle u, v \rangle}$ or the arc $a' = \overrightarrow{\langle v, u \rangle}$. Because all paths in $G$ from a vertex $x \in V(G_1)$ to a vertex $y \in V(G_2)$ will contain $e$, it is also clear that with either orientation of $e$, $D(G)$ cannot be strongly connected, which violates our initial assumption. Hence, $\lambda(G)$ cannot be 1 and $G$ is therefore (at least) 2-edge-connected.
We now need to prove the converse, that is, if $\lambda(G) \ge 2$, then there exists an orientation $D$ of $G$ that is strongly connected. Consider a 2-edge-connected undirected graph $G$. From Corollary 2.3 we know that every edge of $G$ lies on a cycle. Consider a cycle $C = [v_1, v_2, \ldots, v_n, v_1]$. We replace each edge $\langle v_i, v_{i+1} \rangle$ with an arc $\overrightarrow{\langle v_i, v_{i+1} \rangle}$ and edge $\langle v_n, v_1 \rangle$ with arc $\overrightarrow{\langle v_n, v_1 \rangle}$. Any edge $\langle v_i, v_j \rangle$ between nonadjacent vertices on $C$ can be oriented arbitrarily. This situation is shown in Figure 3.3(a). Clearly, if $V(C) = V(G)$ we will have constructed a strongly connected orientation of $G$.

Figure 3.3: The construction of a strongly connected orientation. In (a) we have found part of the orientation by considering a cycle $C$. In (b), the existing orientation is extended for vertices not lying on $C$.

Assume $V(C) \ne V(G)$ so that we have not yet covered all vertices of $G$. Let $w$ be such a vertex, i.e., $w \notin V(C)$. Because $G$ is 2-edge-connected, we know from Corollary 2.2 that there are two edge-independent paths connecting $w$ to $v_1$, as shown in Figure 3.3(b). Without loss of generality, we may assume that these two paths partly overlap with $C$. One path, $P_1$, will have the form $[w = w_1, w_2, \ldots, w_k, v_j, v_{j+1}, \ldots, v_1]$. The other, $P_2$, will necessarily have the form $[w = w_1^*, w_2^*, \ldots, w_l^*, v_i, v_{i-1}, \ldots, v_1]$, where $1 \le i \le j \le n$. We then transform each edge $\langle w_x, w_{x+1} \rangle$ to the arc $\overrightarrow{\langle w_x, w_{x+1} \rangle}$, and each edge $\langle w_y^*, w_{y+1}^* \rangle$ to $\overrightarrow{\langle w_{y+1}^*, w_y^* \rangle}$. Again, edges between nonadjacent vertices on $P_1$ and $P_2$ may be oriented arbitrarily. It should be clear that all vertices in $W = V(C) \cup V(P_1) \cup V(P_2)$ are connected through two edge-disjoint paths in $D$.

If there is still a vertex in $V(G) \setminus W$, we simply repeat the procedure until all edges have been provided with an orientation. The result will be a strongly connected orientation of $G$.

Again, notice how our proof consists of a part proving sufficiency, and a part proving necessity.
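Both Algorithm 3.1 and the statement of Theorem 3.2 can be tried out on small examples. The Python sketch below is an illustration under assumptions: the function names are hypothetical, a digraph is given as a list of (tail, head) pairs, and the search for a strongly connected orientation is a brute-force walk through all $2^m$ orientations rather than the constructive procedure used in the proof.

```python
from itertools import product


def reachable_from(out_neighbors, u):
    """Algorithm 3.1: expand R(u) until no new out-neighbors appear."""
    reached = {u}
    while True:
        expanded = reached.union(*(out_neighbors[v] for v in reached))
        if expanded == reached:      # R_{t+1}(u) = R_t(u): we are done
            return reached
        reached = expanded


def is_strongly_connected(vertices, arcs):
    """True if every vertex can reach all vertices of the digraph."""
    out_neighbors = {v: set() for v in vertices}
    for tail, head in arcs:
        out_neighbors[tail].add(head)
    everything = set(vertices)
    return all(reachable_from(out_neighbors, u) == everything
               for u in vertices)


def has_strong_orientation(vertices, edges):
    """Try all 2^m orientations of an undirected graph (small m only)."""
    for flips in product((False, True), repeat=len(edges)):
        arcs = [(v, u) if flip else (u, v)
                for (u, v), flip in zip(edges, flips)]
        if is_strongly_connected(vertices, arcs):
            return True
    return False


# Hypothetical examples: a 4-cycle has no cut edge, whereas two
# triangles joined by the single edge <2, 3> do have one.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
barbell = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]

print(is_strongly_connected(range(4), square))        # arcs follow the cycle
print(has_strong_orientation(range(4), square))
print(has_strong_orientation(range(6), barbell))
```

The outcome for the second graph matches the first half of the proof: whichever way the joining edge is oriented, one triangle can no longer reach the other.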
Note 3.4 (Proof techniques)
This is typically one of those proofs where visualization is almost a necessity. In fact, the proof by itself is not even that difficult to produce once you have a fairly clear picture of what is going on. In this case, the more difficult part is providing the correct mathematical notations and statements. As we have argued before, in cases such as these it makes sense to practice reproducing the proof so that you force yourself to be precise and to get further acquainted with the language of mathematics.

Another issue worth noting about this proof is that we stated that, without loss of generality, we could assume that both $P_1$ and $P_2$ overlap with $C$. This is an important assumption: a special case would be when there would be no overlap. However, note that our proof also covers the cases when either one or both paths would be edge independent from $C$. In that case, the proposed orientation would still ensure that there is a directed path from $w$ to $v_1$ and one from $v_1$ to $w$, which is exactly what we required for being strongly connected.

Finally, note that we have made use of two proof techniques. To prove that $G$ is 2-edge-connected when there is a strongly connected orientation, we applied a proof by contradiction. Proving that there is a strongly connected orientation when $G$ is 2-edge-connected was accomplished by a proof by construction. As mentioned before, the latter has the strong advantage that we actually show how to obtain such an orientation.

As mentioned, digraphs play an important role when modeling real-world networks. We will come across various applications in later chapters, but notably when considering the Web in Chapter 8, it will become clear that the concepts of connectivity and (in)degree distribution play a crucial role in obtaining a deeper insight into the organization of the world's largest information system.

3.2 Weighted graphs

Let us now direct our attention to another important extension of the foundations discussed in Chapter 2, namely assigning weights to edges (or arcs). A weight is a real-valued number associated with an edge. This extension is a natural one when modeling real-world networks as graphs. For example, when modeling a railway network as a graph, railway stations are naturally represented by vertices, whereas two adjacent stations are connected by means of an edge. We then assign a
weight to an edge representing the distance between those two stations.

Definition 3.6: A weighted graph $G$ is a graph for which each edge $e$ has an associated real-valued number $w(e)$ called its weight. For any subgraph $H \subseteq G$, the weight of $H$ is simply the sum of the weights of its edges: $w(H) = \sum_{e \in E(H)} w(e)$.

A commonly adopted convention for weighted graphs is to write $w(\langle u, v \rangle) = \infty$ when vertices $u$ and $v$ are not adjacent. This also means that for each edge $e \in E(G)$ we demand that $w(e)$ is finite.

We often use weighted graphs to find subgraphs with a maximal (or minimal) weight. In particular, we can use them to determine the distance between two vertices, which is formally defined as follows.

Definition 3.7: Consider an undirected graph $G$ and two vertices $u, v \in V(G)$. Let $P$ be a $(u,v)$-path having minimal weight among all $(u,v)$-paths in $G$. The weight of $P$ is known as the (geodesic) distance $d(u,v)$ between $u$ and $v$. Path $P$ is called a shortest $(u,v)$-path, or a geodesic between $u$ and $v$.

Finding shortest paths is a central problem in virtually all communication networks. Fortunately, there exists an efficient algorithm for computing the shortest paths from a given vertex $u$ to all other vertices in a given undirected graph. Again, this is an example of a breadth-first algorithm. The algorithm, due to the Dutch mathematician Edsger Dijkstra, was published in 1959 and forms the core of many so-called routing algorithms that are used in the Internet. It is beyond doubt one of the most important algorithms in modern communication networks.

The principle is as follows. Consider an undirected graph $G$, a vertex $u \in V(G)$, and the set $S(u)$ of vertices whose shortest path from $u$ has already been found. In each step, we consider the set of vertices that are adjacent to some vertex in $S(u)$ but do not belong to $S(u)$ yet. We pick the one among these vertices that is closest to $u$ and then add it to $S(u)$.

Before we formally describe the algorithm, let us consider an example. In Figure 3.4 we see a simple graph for which we want to find the shortest paths originating from vertex $v_0$. We start with initializing $S(v_0)$ to $\{v_0\}$ and consider the vertex that is closest to $v_0$. In our example, this vertex is $v_3$, which is subsequently added to the set $S(v_0)$. In addition, we label $v_3$ with $(k, d)$, where $k$ is the index of the vertex through which $v_0$ can reach $v_3$ (which, in this case, is $v_0$, i.e., $k = 0$), and $d$ is the length of the shortest path to $v_3$ (with $d = 1$ in this example).

The procedure continues with identifying the vertex closest to $v_0$ that can be reached from any vertex in $S(v_0)$, which is now equal to $\{v_0, v_3\}$. Clearly, this is vertex $v_2$, which is then added to $S(v_0)$ and receives label $(0, 3)$. The next vertex to add is $v_5$: with $S(v_0)$ now being equal to $\{v_0, v_2, v_3\}$, the vertices reachable from $S(v_0)$ are $v_1$, $v_4$, $v_5$, and $v_6$, at distances 5 (via $v_2$), 6 (via $v_0$), 4 (via $v_2$), and 5 (via $v_3$), respectively. After adding $v_5$ to $S(v_0)$ and giving it label $(2, 4)$, we can choose either $v_1$ or $v_6$, which are both at distance 5 from $v_0$. This procedure continues until all vertices from $G$ have been added to $S(v_0)$. Let us now formally describe Dijkstra's algorithm.

[Figure 3.4: Computing the shortest paths from $v_0$. Labels reached in the figure: $v_3$ $(0,1)$, $v_2$ $(0,3)$, $v_5$ $(2,4)$, $v_1$ $(2,5)$, $v_6$ $(3,5)$, $v_4$ $(0,6)$, $v_7$ $(6,7)$.]

Algorithm 3.2 (Dijkstra): Consider an undirected, simple weighted graph $G$. Edge weights are required to be nonnegative. Consider a vertex $u$. We introduce the following sets and labels:

• Let $S_t(u)$ be the set of vertices to which a shortest path from vertex $u$ has been found after step $t$.

• Each vertex $v$ is assigned a label $L(v) \stackrel{\text{def}}{=} (L_1(v), L_2(v))$, in which $L_1(v)$ is the vertex preceding $v$ in the shortest $(u,v)$-path found so far, and $L_2(v)$ the total weight of that path.

• Let $R_t(u) \stackrel{\text{def}}{=} S_t(u) \cup \bigcup_{v \in S_t(u)} N(v)$, with $N(v)$ denoting the neighbor set of $v$. In other words, $R_t(u)$ consists of all vertices in $S_t(u)$ and their neighbors.
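1. Initialize $t \leftarrow 0$ and $S_0(u) \leftarrow \{u\}$. Furthermore, for all $v \in V(G)$:
$L(v) \leftarrow \begin{cases} (u, 0) & \text{if } v = u \\ (-, \infty) & \text{otherwise} \end{cases}$

2. For each vertex $y \in R_t(u) \setminus S_t(u)$, consider the vertices $N'(y)$ that are neighbors of $y$ and lie in $S_t(u)$, i.e., $N'(y) \stackrel{\text{def}}{=} N(y) \cap S_t(u)$. Select $x \in N'(y)$ for which $L_2(x) + w(\langle x, y \rangle)$ is minimal. Set $L(y) \leftarrow (x, L_2(x) + w(\langle x, y \rangle))$.

3. Let $z \in R_t(u) \setminus S_t(u)$ for which $L_2(z)$ is minimal. Set $S_{t+1}(u) \leftarrow S_t(u) \cup \{z\}$. If $S_{t+1}(u) = V(G)$, stop. Otherwise, $t \leftarrow t + 1$, compute $R_t(u)$ again and repeat the previous step.

Note 3.5 (Algorithmics)
Admittedly, the formal description of Dijkstra's algorithm is not an easy read. This is partly caused by the fact that we need to express the flow of control, which is rather awkward. Using pseudo-code, things become much easier to read. Strictly following our previous notations, yet omitting the step counter $t$, we obtain the following code fragment:

    $S(u) \leftarrow \{u\}$
    $L(u) \leftarrow (u, 0)$;
    for each $v \in V(G)$, $u \neq v$: $L(v) \leftarrow (-, \infty)$;
    while $S(u) \neq V$ do
        $R(u) \leftarrow S(u) \cup \bigcup_{v \in S(u)} N(v)$;
        for all $y \in R(u) \setminus S(u)$ do
            for all $x \in N(y) \cap S(u)$ do
                if $L_2(x) + w(\langle x, y \rangle)$ [...]

[...] $k > 1$ vertices, and consider a graph $G$ with $k + 1$ vertices. Let vertex $v \in V(G)$ with $d(v) = \Delta(G)$. The graph $G^* = G - v$ has $k$ vertices, so there exists a vertex coloring $C^*$ of $G^*$ with $\chi(G^*) \leq \Delta(G^*) + 1$ different colors. If $\Delta(G^*) = \Delta(G)$, then in the worst case, the number of colors used in $G^*$ is $\chi(G^*) = \Delta(G^*) + 1 = \Delta(G) + 1$. Considering that $v$ has $\Delta(G)$ neighbors, this means that there is a color available from the ones used in $G^*$ that we can use for $v$ and which has not been used for any of $v$'s neighbors.

On the other hand, if $\Delta(G) > \Delta(G^*)$, then we can simply permit ourselves to introduce a new color for $v$ and use the ones from an optimal coloring of $G^*$ for all other vertices. At worst, we will then have that $\chi(G) = \chi(G^*) + 1 \leq \Delta(G^*) + 2$. If $\Delta(G^*)$ [...]

[...] $n > 6$, we prove the theorem by contradiction. To this end, consider a planar graph $G$ for which $n > 6$. Let $m$ be the number of edges of $G$. We know that $\sum_{v \in V(G)} d(v) = 2m$. Therefore, if there is no vertex with degree 5 or less, then $6n \leq 2m$. In addition, from Theorem 2.9 we know that $m \leq 3n - 6$, and thus that $6n \leq 6n - 12$. Obviously, this is false, meaning that our assumption that there is no vertex with degree 5 or less must be false as well.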
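The counting step in the proof above can also be read as a bound on the average degree, which shows where the number 5 comes from. The derivation below is a sketch under the same assumptions as the proof: a planar graph with $n$ vertices and $m$ edges for which the bound $m \leq 3n - 6$ cited from Theorem 2.9 applies. The symbol $\delta(G)$ for the minimum vertex degree is introduced here for illustration only.

```latex
\begin{align*}
  n \cdot \delta(G) &\leq \sum_{v \in V(G)} d(v) = 2m
      && \text{(every degree is at least } \delta(G)\text{)} \\
  2m &\leq 2(3n - 6) = 6n - 12
      && \text{(planarity bound on } m\text{)} \\
  \delta(G) &\leq \frac{2m}{n} \leq 6 - \frac{12}{n}
      && \text{(divide by } n\text{)}
\end{align*}
```

Because $12/n$ is strictly positive and $\delta(G)$ is an integer, it follows that $\delta(G) \leq 5$: some vertex has degree at most five.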
Note 3.9 (Proof techniques)
Note that this proof by contradiction tells us that there must be a vertex with degree less than or equal to five, but it gives us no further hints on how to find such a vertex. This is typical for existential proofs, in contrast to proofs by construction.

Following Chartrand [1977], we now prove the following theorem by induction on the number of vertices:

Theorem 3.8: For any planar graph $G$, $\chi(G) \leq 5$.

Proof. Let $n = |V(G)|$. For $n = 1$, the theorem is obviously true. Assume the theorem holds for all planar graphs with $k > 1$ vertices and consider a graph $G$ with $k + 1$ vertices. Let $v$ be a vertex with $d(v) \leq 5$ (we just proved that such a vertex exists), and consider the graph $G^* = G - v$. Because $|V(G^*)| = k$, we know there exists a 5-vertex coloring of $G^*$, with, say, colors $c_1, \ldots, c_5$. If not all of these colors are used by the vertices in the neighbor set $N(v)$ of $v$, we can assign the unused color to $v$ and will thus have constructed a 5-vertex coloring of $G$.

Consider the situation that all five colors have been used for coloring the vertices of $N(v)$. Note that $d(v) = 5$ so that we may assume that $N(v) = \{v_1, \ldots, v_5\}$ and that vertex $v_i$ has color $c_i$ according to a clockwise ordering of these vertices around $v$, as shown in Figure 3.7. We will rearrange the colors of $G^*$ such that we can assign one of the colors $c_i$ to $v$.

[Figure 3.7: The ordering of vertices adjacent to $v$. Vertex $v_i$ has color $c_i$.]

Let us first assume that there is no $(v_1, v_3)$-path in $G^*$ for which all vertices have been colored either $c_1$ or $c_3$. Now consider all paths in $G^*$ that originate in $v_1$ and for which the vertices are colored either $c_1$ or $c_3$. These paths induce a subgraph $H$ of $G^*$. Note that $v_3 \notin V(H)$, as this would mean that there is a $(v_1, v_3)$-path. For the same reason, none of $v_3$'s neighbors can be in $H$, i.e., $N(v_3) \cap V(H) = \emptyset$. What we can then do is interchange the
colors $c_1$ and $c_3$ in $H$, which leads to another 5-vertex coloring of $G^*$. However, in this case, vertex $v_1$ will be colored $c_3$, and none of the vertices in $N(v)$ will be colored $c_1$. Therefore, we can use $c_1$ for $v$.

Let us now assume that there is a $(v_1, v_3)$-path $P$ in $G^*$ for which all vertices have been colored either $c_1$ or $c_3$. Consider the cycle $[v_3, v, v_1, P]$. This cycle either encloses $v_2$ (as shown in Figure 3.7), or it encloses $v_4$ and $v_5$. Hence, because $G$ is planar, there can be no $(v_2, v_4)$-path in $G^*$ whose vertices are colored using only $c_2$ and $c_4$. Again, consider all paths originating in $v_2$ whose vertices have either color $c_2$ or $c_4$. As before, these paths induce a subgraph $H'$ of $G^*$. We interchange the colors of the vertices in $H'$, allowing us to assign color $c_2$ to $v$, and thus leading to a 5-vertex coloring of $G$.

There are many other properties related to coloring vertices, but we shall not discuss these any further. By now, it should have become clear that vertex coloring poses a number of very difficult questions (such as efficiently finding the chromatic number of a graph), and that even under relatively favorable conditions such as planarity, taking a small step from one problem formulation ("$\chi \leq 5$") to another ("$\chi \leq 4$") can make a difference between simple and complicated solutions.

CHAPTER 4 NETWORK TRAVERSAL
With the material presented in the previous chapters we have enough tools in our hands to start studying problems related to the traversal of networks. Network traversal problems focus on optimizing a walk that contains all vertices of a graph, also referred to as a spanning walk. Recall that a $(v_0, v_k)$-walk was defined as an alternating sequence $W = [v_0, e_1, v_1, e_2, \ldots, v_{k-1}, e_k, v_k]$ of vertices and edges, where edge $e_i = \langle v_{i-1}, v_i \rangle$.

One category of spanning walks that we'll consider is the one containing closed walks that also traverse each edge in a graph. These walks are also known as tours. An important question is to find tours in which edges are additionally crossed as few times as possible. Another important category is formed by spanning cycles, in other words, closed spanning walks in which all vertices are distinct. These so-called Hamilton cycles play a crucial role when we also try to minimize the total distance covered, which occurs when considering weighted graphs. Let us take a closer look at these two types of spanning walks.

4.1 Euler tours

We start our discussion with probably one of the oldest graph-theoretical problems: is it possible to traverse a graph such that all the edges are crossed exactly once? Of course, this was not how the original problem was formulated. The problem originated in the city of Königsberg (now called Kaliningrad) that was divided by the river Pregel. The several parts of the city were connected by means of seven bridges, as shown in Figure 4.1. The population of Königsberg had been amusing themselves for some time with a simple question: is it possible to walk through the city and cross each of the bridges exactly once? The answer is simply "no," but in order to understand why, we need graph theory.

Of course, if we were dealing with a puzzle applicable only to the old city of Königsberg, one could justifiably question whether it should deserve any serious attention at all. However, it turns out that the problem is easily generalized to other situations. An important one that we will discuss below is finding a spanning walk that covers every street of a city, but such that each street is preferably passed through at most once. This is the same as finding a tour with minimal total weight, where weight is defined by the length of a street. As said, we return to this important application below, after discussing some basic issues.

[Figure 4.1: The seven bridges crossing the river Pregel in Königsberg.]

[Figure 4.2: The bridges of Königsberg modeled as a graph.]

Constructing an Euler tour

Returning to the seven bridges of Königsberg, we can model the problem by representing each area separated by a bridge as a vertex, and each bridge by an edge connecting two separated areas, leading to the graph (with multiple edges) shown in Figure 4.2. The people of Königsberg were interested in finding a specific tour:

Definition 4.1: A tour of a graph $G$ is a $(u,v)$-walk in which $u = v$ (i.e., it is a closed walk) and that traverses each edge in $G$. An Euler tour is a tour in which all edges are traversed exactly once.

Euler tours were named after the Swiss mathematician Leonhard Euler who initially solved the problem of the Königsberg bridges. To this end, he proved the following theorem:

Theorem 4.1: A connected graph $G$ (with more than one vertex) has an Euler tour if and only if it has no vertices of odd degree.

Proof. First, assume that $P$ is an Euler tour of $G$, originating and ending in, say, vertex $v$. Consider a vertex $u$ different from $v$. Obviously, $u$ lies on $P$ and for each edge $\langle w_1, u \rangle \in E(P)$ that is used for "entering" $u$, there is a unique other edge $\langle u, w_2 \rangle$ traversed for "leaving" $u$. Moreover, because these edges are traversed exactly once, edges for entering $u$ are always uniquely paired with edges for leaving $u$. Hence, the degree of $u$ must be even. By a similar reasoning, the degree of $v$ must also be even. We conclude that all vertices of $G$ have even degree.

Conversely, assume that all vertices of $G$ are of even degree. We now need to prove that $G$ has an Euler tour. To this end, select an arbitrary vertex $v$ and construct a trail $P$ by subsequently traversing edges until it is no longer possible to traverse an edge not belonging to $P$. Let $w$ be the vertex where $P$ ends. If $w \neq v$, then clearly we have "entered" $w$ once more than we have "left" it, meaning that $d(w)$ is odd. This violates our assumption, hence $w = v$ and hence $P$ must be a closed trail. If $E(P) = E(G)$ we have just constructed an Euler tour and we're done.

Now assume $E(P) \neq E(G)$, that is, $E(P) \subset E(G)$. Because $G$ is connected, there is a vertex $u$ of $P$ incident with edges that are not part of $P$. Consider the induced subgraph constructed by simply removing all edges that are part of $P$: $H \stackrel{\text{def}}{=} G - E(P)$. Note that $H$ may be disconnected. Because every vertex in $G$ has even degree, but also every vertex in $P$, so will every vertex in $H$ have even degree. Let component $H'$ contain $u$. Again, construct a (closed) trail $P'$ in $H'$ originating in $u$ until no more edges can be added that are not yet contained in $P'$. Because $|E(P')| > 0$, merging $P$ and $P'$ will yield a larger trail in $G$. If this larger trail does not contain all edges of $G$, we repeat the procedure until we have constructed a closed trail containing all edges of $G$. This trail will form an Euler tour.

Note 4.1 (Proof techniques)
Our proof by construction uses an important proof technique, called extremality [West, 2001]. The essence of this technique is that we consider extreme cases, such as a path or trail of maximal length. Note that in our example, the mere fact that we construct $P$ such that it is indeed of maximal length leads us to conclude that it is a closed trail. There are many other situations in which exploring extremality is necessary to draw conclusions and we will encounter more examples throughout the text.

Defining an Euler trail as a $(u,v)$-trail, with $u \neq v$, of a connected graph $G$ that traverses all edges exactly once, it is not difficult to see that the following statement is true:

Theorem 4.2: A connected graph $G$ (with more than one vertex) has an Euler trail if and only if it has exactly two vertices of odd degree. Moreover, the trail originates and ends in the vertices of odd degree.

Proof. First, let $P$ be an Euler trail originating in $u$ and ending in $v$. By the same reasoning as in the previous proof, all vertices except $u$ and $v$ must be of even degree, whereas $u$ and $v$ each have odd degree. Conversely, assume $G$ has exactly two vertices $u$ and $v$ of odd degree. Consider the graph $G^*$ constructed from $G$ by adding an edge $e = \langle u, v \rangle$. All vertices in $G^*$ will now have even degree. Because $G^*$ is obviously also connected, we know that $G^*$ has an Euler tour $P^*$. Removing $e$ from $P^*$ yields an Euler trail for $G$.

So far, we have provided only some necessary and sufficient conditions for a graph to be Eulerian. What is missing, of course, is a procedure by which we can construct an Euler tour (if one exists). The most widely known algorithm that accomplishes such a tour is due to a French mathematician, Fleury.

Algorithm 4.1 (Fleury): Consider an Eulerian graph $G$.

1. Choose an arbitrary vertex $v_0 \in V(G)$ and set $W_0 = v_0$.

2. Assume that we have constructed a trail $W_k = [v_0, e_1, v_1, e_2, v_2, \ldots, v_{k-1}, e_k, v_k]$. Choose an edge incident to $v_k$, but which is not yet part of $W_k$, that is, $e_{k+1} = \langle v_k, v_{k+1} \rangle$ and $e_{k+1} \in E(G) \setminus E(W_k)$. In addition, make sure that $e_{k+1}$ is not a cut edge of the induced subgraph $G_k = G - E(W_k)$, unless there is no other option.

3. We now have a trail $W_{k+1}$. If there is no edge $e_{k+2} = \langle v_{k+1}, v_{k+2} \rangle$ to select from $E(G) \setminus E(W_{k+1})$, stop. Otherwise, repeat the previous step.

Obviously, Fleury's algorithm constructs a trail in $G$: at no point will an edge be selected that is already part of the walk $W_k$ constructed so far. Hence, $W_k$ must be a trail. That the algorithm actually constructs an Euler tour is formalized in the following theorem (see also Bondy and Murty [1976]).

Theorem 4.3: A trail constructed by Fleury's algorithm in an Eulerian graph $G$ is an Euler tour of $G$.

Note 4.2 (Algorithmics)
Before we delve into the details of this theorem, note that there is something special about it: it states that Fleury's algorithm is correct. As a consequence, if we prove this theorem, we will have shown that Fleury's algorithm indeed finds an Euler tour if one exists. Such theorem/proof combinations form a fundamental component of algorithm design in computer science. However, it is important to make a distinction between the correctness of an algorithm and the correctness of a program that implements that algorithm. In the latter case, we need to take into account the fact that a program is executed by a computer and that the statements we are using have a precise meaning, that is, have formal semantics.

Proof (*). Let's first consider a trail $W_n$ constructed by means of Fleury's algorithm that contains all edges of $G$. Assume that this trail starts in $v_0$ and ends in $v_n$. We need to show that $W_n$ is a closed trail, i.e., that $v_0 = v_n$. To this end, consider the induced subgraph $G_n = G - E(W_n)$. Because $W_n$ consists of all edges in $G$, each vertex in $G_n$ must have degree 0. In particular, this is true for vertices $v_0$ and $v_n$. If $v_0 \neq v_n$, then they can only have odd degrees in $G$, which is impossible, because we know that $G$ is Eulerian and thus that all vertices have even degree. Therefore, $W_n$ must be a closed trail and thus an Euler tour.

Now suppose that $W_n$ is not an Euler tour of $G$. Again, let $W_n$ be equal to the sequence $[v_0, e_1, v_1, \ldots, v_{n-1}, e_n, v_n]$. Not being an Euler tour means that we were no longer able to select any edges incident with $v_n$ that had not already been selected. A few observations are important.

• We necessarily have that $v_0 = v_n$, for if this were not the
case and there were no more edges incident with $v_n$ to select, then following the same reasoning as before, $d(v_n)$ would be odd, and thus $G$ would not be Eulerian.

• Let $E_n$ be the edges that are not part of $W_n$, i.e., $E_n \stackrel{\text{def}}{=} E(G) \setminus E(W_n)$. Because $W_n$ is assumed not to be an Euler tour, we must have that $E_n \neq \emptyset$. Let $S$ be the set of vertices incident with edges from $E_n$. Some of these vertices belong to $W_n$, and others do not. Note that $v_n \notin S$, for otherwise it would be incident to an edge that is not in $W_n$, meaning that $W_n$ could have been expanded.

• Let $\bar{S} = V(G) \setminus S$. Note that vertices in $\bar{S}$ are not incident with edges in $E_n$, and thus are incident only with edges from $W_n$. In particular, $v_n \in \bar{S}$.

• Because all vertices in $W_n$ have even degree, so will all the vertices in the induced graph $G_n \stackrel{\text{def}}{=} G[E_n]$.

• Consider a vertex $u$ from $G_n[S]$. By definition, $u$ is incident with an edge from $E_n$. Because $G$ is Eulerian, the degree $d_G(u)$ of $u$ in $G$ is even. Also, we just observed that $d_{G_n}(u)$ is even. This can only mean that the degree $d_{G_n[S]}(u)$ of $u$ in the induced subgraph $G_n[S]$ of $G_n$ is even as well.

Let $m$ be the largest index such that $v_m \in S$ and $v_{m+1} \in \bar{S}$. In other words, $v_m$ is the "last" vertex of $W_n$ that is still in $S$, and thus incident with an edge that is not part of $W_n$. All other vertices $v_{m+1}, \ldots, v_n$ are in $\bar{S}$ and thus incident only with edges of $W_n$. Now consider edge $e_{m+1} = \langle v_m, v_{m+1} \rangle$. This edge is the only edge in $G_m$ between vertices in $S$ and $\bar{S}$. To see this, assume there is another such edge $e'$ in $G_m$. Note that because $e'$ is incident with a vertex from $S$, $e' \notin E(W_n)$. On the other hand, if one of its endpoints belongs to $\bar{S}$, then $e'$ would necessarily belong to $E(W_n)$, which by construction is impossible. In other words, both the endpoints of $e'$ must belong to $S$, and hence, no $e'$ exists. This also means that $e_{m+1}$ is a cut edge of $G_m$.

Let $e$ be any other edge in $G_m$ incident with $v_m$. In Fleury's algorithm we prefer the selection of edges that are not cut edges. Because we selected $e_{m+1}$, which is a cut edge, $e$ must also be a cut edge of $G_m$. It is then surely also a cut edge of the induced subgraph $G_m[S]$. Because $G_n \subseteq G_m$, we also have that $G_m[S] = G_n[S]$. As noted, all vertices in $G_n[S]$ and thus also in $G_m[S]$ have even degree. However, in a graph with only even-degree vertices, there cannot be a cut edge (which we leave as an exercise to the reader). We have now established a contradiction based on the assumption that $W_n$ is not an Euler tour of $G$. In other words, our assumption can only be false, which completes the proof.

Note 4.3 (Study tip)
Obviously, this is not an easy proof. However, despite its complexity, it is important to understand and be able to reproduce it, for it will force you to consider every detail when making a next step. At the same time it is important to grasp the big picture, namely that the proof works toward a contradiction based on the fact that Fleury's algorithm prescribes that we should preferably not select a cut edge. By showing that there was no other choice (i.e., $e_{m+1}$ is necessarily a cut edge), yet at the same time there must have been an alternative edge that was not a cut edge, we arrive at a contradiction. This contradiction tells us that when executing Fleury's algorithm, we are constructing an Euler tour, if one exists.

To see how Fleury's algorithm works, consider the graph in Figure 4.3. At each step, the bold-faced edge is added to the trail $W_k$. When cut edges incident with $v_k$ appear in $G_k$, they are marked as a dashed line. These are the ones that we should prefer not to choose, but sometimes there is just no alternative. Although Fleury's algorithm is apparently elegant and simple, the difficulty in its practical execution is determining whether a selected next edge is a cut edge or not. It is for this reason that more efficient algorithms have been developed.

[Figure 4.3: An illustration of Fleury's algorithm.]

The Chinese postman problem

Let us now consider a practical application of Euler's research: the Chinese postman problem, so-called because it was first postulated by the Chinese mathematician Kuan [1962]. This problem is more general and also more complicated than that of finding an Euler tour. Consider a weighted graph $G$ in which each edge has a nonnegative weight. The problem is to find a closed walk $W = [v_0, e_1, v_1, \ldots, v_{n-1}, e_n, v_n]$ that covers all edges of $G$, but with minimal weight. In other words, $E(W) = E(G)$ and $\sum_{i=1}^{n} w(e_i)$ is minimal. Note that we do not demand that each edge is traversed exactly once, for in that case we would have an Euler tour, and obviously, such a walk would automatically have minimal weight. Instead, we are aiming for a closed walk such that, if it is necessary to cross an edge more than once, the total weight is kept as low as possible.

The Chinese postman problem is a generalization of many traversal problems. Consider the following examples.

Routing garbage trucks: In order to collect the garbage in a specific neighborhood, garbage cans are placed on the curb once a week to be emptied by trucks. An optimal route for a truck consists of passing through each street at least once, and possibly more, but in such a way that the total elapsed distance is minimal. In this example, we model the neighborhood as an undirected graph in which each junction is represented by a vertex and a street as an edge with its weight corresponding to the length of the street. A variation of the problem is to allow a truck to start and end at a different location. In that case, the walk need not be closed, yet we still need to make sure that every edge is crossed at least once.

Routing a postman: Somewhat similar is determining an optimal route for a postman. However, in this case we need to take into account that streets normally have houses on both sides of a road. Rather than letting a postman cross the street from one side to the other all the time, we assume that he first delivers the mail to one side, and later to the other. In this case, a junction is again represented by a vertex, yet a street with houses on both sides is represented by two edges, each edge effectively representing one row of houses.

Checking a Web site: Typically, a Web site consists of numerous pages, in turn containing links to each other. As is so often the case, most Web sites are notoriously poor at having their links maintained to the correct pages. This is often due to the simple reason that so many people are responsible for maintaining their part of a site. Apart from links that are broken (i.e., refer to nonexisting pages), it is often necessary to manually check how pages are linked to each other. Graph theory can help by modeling a Web site as an undirected graph where a page is represented by a vertex and a link by an edge having weight 1. Note that we are not using a directed graph, as we may need to cross a link in reverse order, for example, when going back to the original page. If a site is to be manually inspected, then we are seeking a solution to navigate through a site, but with preferably crossing a link at most once. This is now the same as finding a walk of minimal length containing all edges.

Other examples easily come to mind, and some less obvious ones are described by Thimbleby [2003] (which includes the case of navigating through a Web site). These examples should make clear that we may sometimes need to traverse an edge twice. Formally, this means that for a closed walk $W = [v_0, e_1, v_1, \ldots, v_{n-1}, e_n, v_n]$ to be minimal, it may occur that for some $i \neq j$, $e_i = e_j$.

In order to solve the Chinese postman problem, we proceed by transforming a non-Eulerian graph into an Eulerian one by simply duplicating edges. Duplicating an edge $e = \langle u, v \rangle$ means that we simply add an extra edge $e^* = \langle u, v \rangle$ with the same weight as $e$. The trick, of course, is to duplicate as few edges as possible and such that the added total weight of the resulting graph is minimal. Once we have transformed the original graph into an Eulerian one, we can apply Fleury's algorithm to find an Euler tour. Note that by ensuring that the total weight of the transformed graph is minimal, we also ensure that our Euler tour in the transformed graph is minimal.

Unfortunately, transforming a graph to an Eulerian one that has as little weight as possible is not trivial. For example, suppose that edge $e = \langle v, w \rangle$ is incident with a vertex $v$ with odd degree and a vertex $w$ with even degree. Duplicating $e$ will force us to subsequently reconsider vertex $w$, which in the new situation will then have odd degree. A general solution, but which is too complicated for our purposes to describe here, is given by Edmonds and Johnson [1973].

A special case that is easy to solve is when there are only two vertices having odd degree, say $u$ and $v$. We can then use Dijkstra's algorithm to find a $(u,v)$-path having minimal weight, and subsequently duplicate each edge on that path. We leave it as an exercise to show that the result is indeed a minimum-weight Eulerian graph.

This approach can be easily generalized. Recall from Chapter 2 that every graph has an even number of vertices with odd degree, say $2k$. What we are seeking are $k$ paths each connecting two odd-degree vertices such that no two paths have a source and destination vertex in common, and such that the sum of their respective weights is minimal. Following Gibbons [1985], we tackle this problem as follows.

Algorithm 4.2 (Chinese postman): Consider a weighted, connected graph $G$ with odd-degree vertices $V_{odd} = \{v_1, \ldots, v_{2k}\}$ where $k \geq 1$.

1. For each pair of distinct odd-degree vertices $v_i$ and $v_j$, find a minimum-weight $(v_i, v_j)$-path $P_{i,j}$.

2. Construct a weighted complete graph on $2k$ vertices in which vertices $v_i$ and $v_j$ are joined by an edge having weight $w(P_{i,j})$.

3. Find the set $E$ of $k$ edges $e_1, \ldots, e_k$ such that $\sum w(e_i)$ is minimal and no two edges are incident with the same vertex.

4. For each edge $e \in E$, with $e = \langle v_i, v_j \rangle$, duplicate the edges of $P_{i,j}$ in graph $G$.

The resulting graph $G^*$ is Eulerian with minimal weight, for which we then apply Fleury's algorithm to find a minimum-weight Euler tour.

Let's consider a simple example from Gibbons [1985] to demonstrate this algorithm. Figure 4.4(a) shows our initial graph with odd-degree vertices $v_1$, $v_2$, $v_3$, and $v_4$. We first find minimum-weight paths between all these vertices. It is not difficult to verify that the following paths indeed have minimal weight:

$P_{1,2} = [v_1, v_2]$ (weight: 3)
$P_{1,3} = [v_1, u_2, v_3]$ (weight: 3)
$P_{1,4} = [v_1, u_1, u_5, v_4]$ (weight: 5)
$P_{2,3} = [v_2, u_3, u_5, u_4, v_3]$ (weight: 5)
$P_{2,4} = [v_2, u_6, v_4]$ (weight: 2)
$P_{3,4} = [v_3, u_4, u_5, v_4]$ (weight: 4)

Our next step is to consider the weighted complete graph on the four vertices $v_1$, $v_2$, $v_3$, and $v_4$ as shown in Figure 4.4(b). We are seeking a set of two edges such that their total weight is minimal, and such that they do not have any endpoints in common. This is achieved by the set $\{\langle v_1, v_3 \rangle, \langle v_2, v_4 \rangle\}$, with total weight $3 + 2 = 5$, corresponding to the two paths $P_{1,3}$ and $P_{2,4}$. The edges of these two paths are then duplicated, leading to the Euler graph with minimal weight as shown in Figure 4.4(c).

[Figure 4.4: An example of solving the Chinese postman problem. (a) The initial graph, (b) finding the optimal paths, (c) the expanded graph.]

Note 4.4 (More information)
The solution to the Chinese postman problem builds on an important topic in general graph theory, namely that of matchings. A matching $M$ in a graph $G$ is a subset of the edges of $G$ such that no two edges from $M$ are incident with the same vertex. Matchings are typically applied to situations in which we need to team up pairs of some sort, and where each pair is subject to a constraint. Consider, for example, a group of $n$ people $p_1, \ldots, p_n$ and $m$ tasks $t_1, \ldots, t_m$, with $n \geq m$. A person $p_i$ can fulfill task $t_j$ with a certain expertise $e_{i,j} \in [0, 1]$, where the value 0 represents the case that $p_i$ cannot fulfill $t_j$. Assume that for each task there is
at least one person who can fulfill that task. We ask ourselves what the best assignment of people to tasks is. This situation can be modeled by means of a weighted bipartite graph, for which we are then seeking a maximum-weight matching. In the case of the Chinese postman problem, we are actually looking for a perfect matching: a matching $M$ such that every vertex in $G$ is incident with an edge from $M$. There are various solutions to finding optimal (weighted) matchings, but we will not go into further details here. The interested reader is referred to Gibbons [1985].

4.2 Hamilton cycles

Where Euler tours focus on traversing every edge in a graph, Hamilton walks deal with traversing every vertex in a graph. In this section we concentrate on the problem of trying to construct a (closed) walk such that every vertex is visited exactly once. As we shall see, not only is this an important problem, it also turns out to be notoriously difficult if we want to optimize on the distance traveled.

Properties of Hamiltonian graphs

We start with precisely defining what a Hamiltonian graph is, along with a number of example applications.

Definition 4.2: Consider a connected graph $G$. A Hamilton path of $G$ is a path that contains every vertex of $G$. Likewise, a Hamilton cycle is a cycle containing every vertex of $G$. $G$ is called Hamiltonian if it has a Hamilton cycle.

What makes the issue of (non)Hamiltonian graphs so difficult is that, in contrast to Euler tours, there is no known efficient procedure by which one can in general determine whether a graph is Hamiltonian or not. On the other hand, finding Hamilton cycles, or closed trails that minimize the number of duplicate visits to a vertex, is important. To illustrate, consider the following two problems, which are representative of a wide range of applications.

Transportation problems: Consider scheduling a service that needs to pick up people at $n$ different locations. The problem is to find the most efficient route (e.g., expressed in the smallest traveling distance) such that all $n$ locations are visited. This problem can be formulated in terms of a road map with locations represented as vertices and roads between pairs of locations as weighted edges. We are interested in finding a minimal weighted Hamiltonian subgraph containing all vertices, possibly after expanding the graph to account for traversing an edge more
than once. There are many variations on such transportation problems, of which a nice overview is provided by Applegate et al. [2007]. We return to this problem later in this chapter.

Drilling holes: There are many cases in which we need to drill holes in a board, such as for electrical circuits. This requires the scheduling of a drilling machine by which holes are drilled one by one. To minimize the time it takes to drill all holes, we should minimize the distance that the machine (or equivalently, the board) needs to travel when moving from hole to hole. We can model this problem as a complete graph with the vertices forming the holes to be drilled and the weight on each edge representing the geometric distance between the edge's two ends on the board. An optimal schedule corresponds to a minimal weighted Hamilton cycle. To illustrate, Figure 4.5(a) shows an example in which some 2400 points need to be drilled into a board. Figure 4.5(b) shows one possible schedule, whereas Figure 4.5(c) shows an optimal solution in which the machine needs to "travel" half the distance of the previous schedule. The example is discussed in more detail by Grötschel and Padberg [1993].

These two examples are instances of what is known as the traveling salesman problem. As mentioned, a serious issue is that there are no known efficient solutions for determining whether a graph is Hamiltonian or not. Worse, if we are interested in finding a minimal-weighted Hamilton cycle, we will have a tough problem to solve as it will most likely require a lot of computational resources. Considering the many applications related to traveling salesman problems, it should come as no surprise that researchers and practitioners have devoted considerable effort to finding efficient methods for (near-)optimal solutions.

Fortunately, there are some reasonable starting points to finding good solutions. For one, we have the following necessary condition for a graph to be Hamiltonian: if we consider a subset $S$ of the vertices of a graph and subsequently remove those vertices, the graph should fall apart into at most $|S|$ components. More formally:

Theorem 4.4: If graph $G$ is Hamiltonian, then for every proper nonempty subset $S \subset V(G)$, we have that $\omega(G - S) \leq |S|$.

Proof. Let $C$ be a Hamilton cycle of $G$. If we consider any proper nonempty subset $S \subset V(G)$, then obviously, because every vertex is visited exactly once, the number of components in $C - S$ will be less than or equal to $|S|$. However, because $C$ contains all vertices of $G$, we also have that $\omega(G - S) \leq \omega(C - S)$, which completes the proof.

[Figure 4.5, panels (a) and (b): plots of the drilling points; graphics omitted.]
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................(c)Figure4.5:Anexampleofschedulingadrillingmachinewith(a)theholesthatneedtobedrilled,(b)aschedule,and(c)anoptimalschedule.Takenwithpermissionfrom[GrotschelandPadberg,1993].¨94 
Note 4.5 (More information)
This is one of those examples where a simple diagram helps to understand what is going on. Figure 4.6 shows a graph $G$ and an arbitrary set $S$ of vertices from $G$. We have also sketched a Hamilton cycle $C$, which runs through every vertex in $S$. Effectively, we split the cycle $C$ into alternating segments $S_1, \bar{S}_1, S_2, \bar{S}_2, \ldots, S_n, \bar{S}_n$, each segment $S_i$ consisting of a number of consecutive vertices from $S$, and each segment $\bar{S}_i$ consisting of consecutive vertices not in $S$.

In the "worst" situation, each subgraph induced by a segment $\bar{S}_i$ is a component of the graph $G - S$, i.e., $G[\bar{S}_i]$ is disconnected from the other parts of $G - S$. The maximal number of segments consisting of vertices outside $S$ is obtained when each segment $S_i$ consists of exactly one vertex. Hence, this maximum is equal to $|S|$.

Figure 4.6: Segmentation of a Hamilton cycle for an arbitrary set $S$ of vertices.

The previous theorem provides us with a necessary condition for a graph to be Hamiltonian. In 1952, the mathematician Gabriel Dirac proved the following sufficient condition, which essentially states that a graph is Hamiltonian if each vertex is connected to at least half of the other vertices.

Theorem 4.5 (Dirac): If $G$ is a simple graph with $n = |V(G)|$ vertices, $n \geq 3$, and each vertex $v$ has degree $\delta(v) \geq n/2$, then $G$ is Hamiltonian.

Proof. A relatively simple proof is by contradiction: assume the theorem is false. Let $G$ be a non-Hamiltonian graph with $n \geq 3$ vertices and for which $\delta(v) \geq n/2$ for each of its vertices. Moreover, assume that $G$ has a maximal number of edges, i.e., adding a single edge (while keeping $G$ simple) would make it Hamiltonian. Let $u$ and $w$ be two nonadjacent vertices. By construction of $G$ we know that if we add an edge $e = \langle u, w \rangle$, the resulting graph would be Hamiltonian, and thus there exists a Hamilton $(u,w)$-path $P$ in $G$ with $P = [u = v_1, v_2, \ldots, v_n = w]$, as shown in Figure 4.7(a). Now consider the following two sets of vertices:

\[ S = N(u) = \{ v_i \mid \langle u, v_i \rangle \in E(G) \} \quad \text{and} \quad T = \{ v_i \mid \langle v_{i-1}, w \rangle \in E(G) \} \]

$S$ consists of the neighbors of $u$, whereas $T$ consists of the successors on $P$ of neighbors of $w$. Note that $|S| \geq n/2$. Likewise, because $P$ contains
all vertices in $G$, $T$ contains as many elements as there are edges $\langle v_{i-1}, w \rangle$, which corresponds to $\delta(w)$. This means that $|T| \geq n/2$. Furthermore, vertex $u$ is not contained in $S$ (because it cannot be a neighbor of itself), nor is it contained in $T$ (which contains only successors of other vertices on $P$). In other words, $S, T \subseteq \{v_2, \ldots, v_n\}$, which, together with the fact that $|S| + |T| \geq n$, means that the two sets have at least one vertex in common. Let this be vertex $v_j$.

Figure 4.7: (a) A Hamilton path in $G$, and (b) the constructed Hamilton cycle in $G$.

We now have the situation that $v_j$ is a neighbor of $u$, and that $v_j$'s predecessor $v_{j-1}$ is a neighbor of $w$. But in that case, we can construct the Hamilton cycle

\[ [u = v_1, v_j, v_{j+1}, \ldots, v_n = w, v_{j-1}, v_{j-2}, \ldots, v_1 = u], \]

shown in Figure 4.7(b). Note that this cycle does not contain edge $\langle u, w \rangle$. In other words, we have just shown that $G$ is Hamiltonian, which contradicts our initial assumption. This means that there is no vertex $v_j \in S \cap T$ and thus $|S \cap T| = 0$. On the other hand, we know that $u \notin S \cup T$, so that $|S \cup T|$ [...]

[...] $i(T)$, thus contradicting our choice of $T$ and our assumption that $T_{Kruskal}$ is not optimal.

Suppose that $i(T) = k$, meaning that all edges $e_1, e_2, \ldots, e_{k-1}$ are edges in $T$ as well as in $T_{Kruskal}$. It can be easily seen that the graph $T + e_k$ contains a unique cycle $C$. Let $\hat{e}$ be an edge of $C$ such that $\hat{e} \notin E(T_{Kruskal})$, but $\hat{e} \in E(T)$. Because $\hat{e}$ lies on $C$, it cannot be a cut edge of $T + e_k$. This also means that $\hat{T} = (T + e_k) - \hat{e}$ is a connected subgraph of $G$, and thus also a spanning tree. Note that the total weight $w(\hat{T})$ of $\hat{T}$ is equal to

\[ w(\hat{T}) = w(T) + w(e_k) - w(\hat{e}) \]

An important observation is that edge $e_k$ was chosen to be one with minimal weight that kept the constructed subgraph up to that point acyclic. Clearly, the graph induced by edges $e_1, e_2, \ldots, e_{k-1}, \hat{e}$ is also acyclic, so that we must conclude that $w(\hat{e}) \geq w(e_k)$, and hence, $w(\hat{T}) \leq w(T)$. This can only mean that $\hat{T}$ is also optimal. However, because $e_k \in E(\hat{T})$, we know that $i(\hat{T}) > i(T)$, which contradicts our choice of $T$, namely as the tree with the largest value for $i$.

5.4 Routing in communication networks

Trees play a prominent role in communication networks, whose main job is ensuring that messages are sent from their source to their intended destination(s), also referred to as message routing. How message routing is accomplished is laid down in a routing protocol: a collection of specifications describing exactly what to do when a node in a network receives a message from source $A$ that is destined for node $B$. In general, a node in a communication network can be viewed as consisting of several interfaces, where each interface connects that node to exactly one other node in the network. In this way, we can represent a communication network as a graph with nodes as vertices and links between two nodes as edges. An interface is actually the end point of a link, and its representation coincides with the vertex representing the node to which that link is attached.

A node usually maintains a routing table. Each row in this table specifies to which interface a message should be forwarded, given its source and destination, and optionally also the interface through which it arrived. An important function of a routing protocol is constructing these tables. This is exactly what we established when discussing Dijkstra's shortest path algorithm in Section 3.2: each node maintained exact information on the next closest node to which a message should be routed, including how far a message would still need to travel.

Crucial for routing is that messages are not endlessly forwarded. Technically, this means that for every destination $u$ messages should follow paths in a spanning tree that is said to be rooted at $u$, hence called a rooted tree. In particular, and analogous to what we also mentioned in Section 3.2, rooted in this case means that we are interested only in $(v,u)$-paths, where $v$ indicates the source node. With $u$ being the destination node, such a rooted tree is also called a sink tree for $u$.

Dijkstra's algorithm

The issue for routing protocols is to construct these sink trees, one for every node in the network. A famous solution is Dijkstra's algorithm, which we already discussed in Chapter 3. There, we illustrated how the algorithm works for undirected graphs. It is not difficult to see that the algorithm also works for directed graphs, and, in fact, that it can be easily formulated to construct sink trees. The only restriction we demand is that the weight associated with an arc is nonnegative. Dijkstra's solution for constructing optimal routes is so important that it is worthwhile also considering its counterpart for directed graphs. For example, it is widely deployed in communication networks where it is known as a link-state routing protocol (see, for example, Moy [1995]). The following description of the algorithm is nearly identical to the one given in Chapter 3, except that we now construct paths to the root vertex $u$.

Algorithm 5.2 (Dijkstra, sink tree construction): Consider a directed, weighted graph $D$ where weights are nonnegative, and a vertex $u \in V(D)$. We introduce the following sets and labels:

• Let $S_t(u)$ be the set of vertices from which a shortest path to vertex $u$ has been found after step $t$.
• Each vertex $v$ is assigned a label $L(v) \stackrel{\text{def}}{=} (L_1(v), L_2(v))$, in which $L_1(v)$ is the vertex succeeding $v$ in the shortest $(v,u)$-path found so far, and $L_2(v)$ the total weight of that path.
• Let $R_t(u) \stackrel{\text{def}}{=} S_t(u) \cup \bigcup_{v \in S_t(u)} N_{in}(v)$, with $N_{in}(v)$ denoting the set of in-neighbors of $v$. In other words, $R_t(u)$ consists of all vertices in $S_t(u)$ and the vertices from where $S_t(u)$ can be reached through an arc.

1. Initialize $t \leftarrow 0$ and $S_0(u) \leftarrow \{u\}$. Furthermore, for all $v \in V(D)$:

\[ L(v) \leftarrow \begin{cases} (u, 0) & \text{if } v = u \\ (-, \infty) & \text{otherwise} \end{cases} \]

2. For each vertex $y \in R_t(u) \setminus S_t(u)$, consider the vertices $N'_{out}(y)$ that are out-neighbors of $y$ and lie in $S_t(u)$, i.e., $N'_{out}(y) \stackrel{\text{def}}{=} N_{out}(y) \cap S_t(u)$. Select $x \in N'_{out}(y)$ for which $L_2(x) + w(\langle\overrightarrow{y,x}\rangle)$ is minimal. Set $L(y) \leftarrow (x, L_2(x) + w(\langle\overrightarrow{y,x}\rangle))$.

3. Let $z \in R_t(u) \setminus S_t(u)$ be a vertex for which $L_2(z)$ is minimal. Set $S_{t+1}(u) \leftarrow S_t(u) \cup \{z\}$. If $S_{t+1}(u) = V(D)$, stop. Otherwise, $t \leftarrow t + 1$, compute $R_t(u)$ again and repeat the previous step.

To illustrate this algorithm, let us reconsider the graph from Figure 3.4, but now with its edges being directed, as shown in Figure 5.7. What we see is that we can apply the same steps, but, of course, because the graph is now directed, we obtain a different (directed) tree rooted at vertex $v_0$. Again, we can formulate this version of Dijkstra's algorithm in pseudo-code, which is left to the reader.

Note 5.5 (Mathematical language)
Despite our deliberate use of formal notations, by now it should be clear from the mathematical description what the principle behind Dijkstra's algorithm is. Every time we have completed the set $S(u)$, we attempt to expand it by considering the next ring of vertices from where $S(u)$ can be reached, and subsequently adding the vertex from that ring that is closest to $u$, as shown in Figure 5.8.

Figure 5.8: An illustration of the relation between $S$ and $R$.

To properly understand algorithms such as the one from Dijkstra, it is important to develop this type of high-level insight. Drawings generally help a lot and force you to translate the mathematical concepts into simpler principles, in turn assisting in understanding those concepts.

Although Dijkstra's algorithm is relatively simple, it is not obvious that it is also correct. We follow Goodrich and Tamassia [2002] in proving its correctness.

Figure 5.7: Applying Dijkstra's algorithm to construct a sink tree in a weighted directed graph.

Theorem 5.8: Consider a weighted directed graph $D$. When applying Dijkstra's algorithm to a vertex $u$, each time a vertex $z$ is added to the set $S_t(u)$, $L_2(z)$ corresponds to the length of a shortest $(z,u)$-path.

Proof. By contradiction. Let $d(w,u)$ denote the total weight of an optimal $(w,u)$-path. Let $z$ be the first vertex that was added to an $S_t(u)$ for which $L_2(z) > d(z,u)$. In other words, up until and including step $t$ we have that for all vertices $v \in S_t(u)$, $L_2(v) = d(v,u)$, but $S_{t+1}(u)$ contains, for the first time, a vertex $z$ for which $L_2(z) > d(z,u)$. Because $z$ was selected (after $t$ steps), we know that $L_2(z) < \infty$ and thus that there is a $(z,u)$-path. In
particular, there must be a shortest $(z,u)$-path, say $P$. Let $y$ be the last vertex on that path (from $z$ to $u$) that is not in $S_t(u)$, and $x$ its successor (and thus in $S_t(u)$). By choice of $z$, we know that $L_2(x) = d(x,u)$, i.e., $L_2(x)$ is equal to the total weight of an optimal $(x,u)$-path.

When $x$ was selected (say, at step $t'$), we also evaluated $y$ and possibly adjusted $L_2(y)$ so that the value of $L_2(y)$ is in any case at most $L_2(x) + w(\langle\overrightarrow{y,x}\rangle)$, i.e., $L_2(y) \leq L_2(x) + w(\langle\overrightarrow{y,x}\rangle)$. On the other hand, because $y$ is on the shortest $(z,u)$-path $P$, $x$ is the successor of $y$ on $P$, and $L_2(x) = d(x,u)$, we necessarily have that $L_2(x) + w(\langle\overrightarrow{y,x}\rangle) = d(x,u) + w(\langle\overrightarrow{y,x}\rangle)$ must correspond to the length of a shortest $(y,u)$-path, i.e., $L_2(x) + w(\langle\overrightarrow{y,x}\rangle) = d(y,u)$.

However, we have to realize that $y$ was not selected to be included in an $S_t(u)$, which can only mean that $L_2(z) \leq L_2(y)$. Because $y$ is on a shortest $(z,u)$-path, we also have $d(z,y) + d(y,u) = d(z,u)$, and because $d(z,y) \geq 0$, we now have that:

1. $L_2(z) \leq L_2(y)$
2. $L_2(y) = d(y,u)$
3. $d(y,u) \leq d(y,u) + d(z,y)$
4. $d(y,u) + d(z,y) = d(z,u)$

and thus that $L_2(z) \leq d(z,u)$, contradicting our choice of $z$. Hence, the assumption that there exists a $z$ that was added to $S_t(u)$ with $L_2(z) > d(z,u)$ is false, completing our proof.

The Bellman-Ford algorithm

An important observation is that in order to execute Dijkstra's algorithm, we need to know exactly what the graph looks like. In other words, we need to know which vertices are adjacent to each other and what the weights of their respective connecting edges are. We say that we need to know the topology of the graph. In practice, when a node $u$ in a communication network receives a message intended for node $v$, it needs to forward that message along the optimal sink tree for $v$. The same holds for any other incoming message, regardless of its destination. As a consequence, node $u$ will have to precompute the optimal sink tree for each node in the network. In real networks, we therefore see that the topology of a network is first spread to all nodes in that network (and, of course, on a regular basis because networks change).

Given this situation, one can ask whether it is possible to compute optimal sink trees without having to know the topology in advance. In fact, it
is actually not necessary for a node to know a complete sink tree, as long as it knows to which next node it should forward an incoming message and that this forwarding is done along an optimal sink tree. A solution to this problem was provided by several people, but is generally known as the Bellman-Ford algorithm. It was the basis for one of the first widely applied routing protocols in the Internet, but for reasons we briefly discuss below, it has been largely replaced by protocols based on Dijkstra's algorithm.

The protocol can be completely described from the perspective of a node. To this end, we proceed in rounds by letting each node $v_i$ compute the optimal path to other nodes based on the information that is available to $v_i$ in that round. Let $d^t(i,j)$ denote the total weight of the optimal $(v_i, v_j)$-path that vertex $v_i$ has found after round $t$. We refer to this total weight as the routing cost of getting a message from $v_i$ to $v_j$. Initially, we have

\[ d^0(i,j) \leftarrow \begin{cases} 0 & \text{if } i = j \\ \infty & \text{otherwise} \end{cases} \]

In other words, we let each node initially set the cost to itself to be zero, and the cost to any other node as infinite. We now let $v_i$ adjust its value of $d^t(i,j)$ as follows:

\[ d^{t+1}(i,j) \leftarrow \min_{k \in N(v_i)} \bigl[ w(v_i, v_k) + d^t(k,j) \bigr] \]

in which $N(v_i)$ is the collection of neighboring nodes of $v_i$ and $w(v_i, v_k)$ the weight of edge $\langle v_i, v_k \rangle$. Note that as soon as $d^t(i,j)$ becomes anything other than infinite, $v_i$ will have discovered a path to $v_j$. In particular, after the first round, $v_i$ will discover a path to each of its respective neighbors, namely the path consisting of the edge connecting $v_i$ to that neighbor. After two rounds, optimal paths of length 2 will have been discovered, and so on.

In practice, the algorithm is implemented by letting nodes exchange information found in their respective routing tables. Consider the undirected version of Figure 5.7. Each node (which is represented by a vertex of the graph shown in Figure 5.7) will initially know only about itself and no other node. After one round, the routing tables for each node will be as shown in Figure 5.9. We use the notation $(d, v)$ to indicate that a path of cost $d$ has been found, for which messages are to be forwarded to adjacent node $v$.

Now consider node $v_1$, which, after one round, has discovered paths to $v_2$ and $v_3$. At a certain moment, nodes $v_2$ and $v_3$ will each pass their routing table to $v_1$. Assume that $v_3$ was first. In that case, $v_1$ learns that $v_3$ has discovered a path to node $v_0$ at cost 1. Because $v_1$ has a path to $v_3$, it has now discovered a path to $v_0$ at cost 8, for which it need only forward messages to its neighbor $v_3$. However, as soon as $v_2$ has passed its routing table to $v_1$, the latter will discover a better path to $v_0$, namely one via $v_2$ and at total cost $w(v_1, v_2) + d(v_2, v_0) = 2 + 3 = 5$. Completely analogously, $v_0$ will eventually pass its routing table to $v_2$, in which case $v_2$ will discover a path to $v_3$ at cost $w(v_2, v_0) + d(v_0, v_3) = 3 + 1 = 4$. It can be readily verified that after the second round, the routing tables will be as shown in Figure 5.10. Note that there are two different paths of equal cost between nodes $v_3$ and $v_4$.

        Destination
        v0       v1       v2       v3       v4       v5       v6       v7
v0:   (0,v0)     -      (3,v2)   (1,v3)   (6,v4)     -        -        -
v1:     -      (0,v1)   (2,v2)   (7,v3)     -        -        -        -
v2:   (3,v0)   (2,v1)   (0,v2)     -        -      (1,v5)     -        -
v3:   (1,v0)   (7,v1)     -      (0,v3)     -        -      (4,v6)     -
v4:   (6,v0)     -        -        -      (0,v4)   (5,v5)   (3,v6)     -
v5:     -        -      (1,v2)     -      (5,v4)   (0,v5)     -      (4,v7)
v6:     -        -        -      (4,v3)   (3,v4)     -      (0,v6)   (2,v7)
v7:     -        -        -        -        -      (4,v5)   (2,v6)   (0,v7)

Figure 5.9: The routing tables for the nodes in the undirected version of Figure 5.7, after one round of the Bellman-Ford algorithm.

        Destination
        v0       v1       v2       v3       v4       v5       v6       v7
v0:   (0,v0)   (5,v2)   (3,v2)   (1,v3)   (6,v4)   (4,v2)   (5,v3)     -
v1:   (5,v2)   (0,v1)   (2,v2)   (7,v3)     -      (3,v2)   (11,v3)    -
v2:   (3,v0)   (2,v1)   (0,v2)   (4,v0)   (6,v5)   (1,v5)     -      (5,v5)
v3:   (1,v0)   (7,v1)   (4,v0)   (0,v3)   (7,v0)     -      (4,v6)   (6,v6)
v4:   (6,v0)     -      (6,v5)   (7,v6)   (0,v4)   (5,v5)   (3,v6)   (5,v6)
v5:   (4,v2)   (3,v2)   (1,v2)     -      (5,v4)   (0,v5)   (6,v7)   (4,v7)
v6:   (5,v3)   (11,v3)    -      (4,v3)   (3,v4)   (6,v7)   (0,v6)   (2,v7)
v7:     -        -      (5,v5)   (6,v6)   (5,v6)   (4,v5)   (2,v6)   (0,v7)

Figure 5.10: The routing tables for the nodes in the undirected version of Figure 5.7, after two rounds of the Bellman-Ford algorithm.
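The round-based update can be tried out in a few lines of Python. The following is only a minimal sketch under stated assumptions: the edge weights are read off the one-round tables in Figure 5.9, all nodes update synchronously from the tables of the previous round, and the names `EDGES`, `build_adjacency`, and `bellman_ford_rounds` are hypothetical helpers introduced for illustration. A real protocol exchanges tables asynchronously, so intermediate states may differ from the ones produced here.

```python
import math

# Edge weights of the undirected graph, read off the routing tables in Figure 5.9.
EDGES = {
    ("v0", "v2"): 3, ("v0", "v3"): 1, ("v0", "v4"): 6,
    ("v1", "v2"): 2, ("v1", "v3"): 7, ("v2", "v5"): 1,
    ("v3", "v6"): 4, ("v4", "v5"): 5, ("v4", "v6"): 3,
    ("v5", "v7"): 4, ("v6", "v7"): 2,
}


def build_adjacency(edges):
    """Return {node: {neighbor: weight}} for an undirected weighted graph."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def bellman_ford_rounds(edges, rounds):
    """Run `rounds` synchronous rounds; table[i][j] is a pair (cost, next hop)."""
    adj = build_adjacency(edges)
    nodes = sorted(adj)
    # Round 0: every node knows only itself.
    table = {i: {j: ((0, i) if i == j else (math.inf, None)) for j in nodes}
             for i in nodes}
    for _ in range(rounds):
        # Each node starts from its own previous entries ...
        new = {i: dict(table[i]) for i in nodes}
        for i in nodes:
            for j in nodes:
                if i == j:
                    continue  # the cost to oneself stays 0
                # ... and inspects the previous-round table of each neighbor k.
                for k, w in adj[i].items():
                    cost = w + table[k][j][0]
                    if cost < new[i][j][0]:
                        new[i][j] = (cost, k)
        table = new
    return table


if __name__ == "__main__":
    after_two = bellman_ford_rounds(EDGES, 2)
    print(after_two["v1"]["v0"])  # entry of v1 for destination v0
```

Running more rounds than the number of vertices leaves the tables unchanged, because by then every shortest path has been discovered.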
Reconsider the routing table for node $v_1$. Again, $v_2$ will eventually pass its now updated table to $v_1$, reporting a cost of $d(v_2, v_3) = 4$ for a path it discovered to $v_3$. As soon as $v_1$ obtains this information, it will have found a better path to $v_3$ than the direct connection through edge $\langle v_1, v_3 \rangle$, namely via $v_2$. The cost for this path is $w(v_1, v_2) + d(v_2, v_3) = 2 + 4 = 6$. Hence, $v_1$ will adjust its routing table accordingly. Note that the only thing $v_1$ knows is that messages for destination $v_3$ should be routed via $v_2$. In particular, $v_1$ is unaware of the length of its newly discovered path to $v_3$, i.e., the number of edges of that path.

The Bellman-Ford algorithm is particularly attractive because it allows each node to gradually discover optimal paths to the currently reachable nodes in the network. It is important to realize that the algorithm is completely decentralized: all decisions that a node takes concerning optimal routes are based entirely on local information, which need not be complete. In contrast, Dijkstra's algorithm requires that the complete topology of the network is first disseminated to each node before each can start computing optimal routes (i.e., sink trees). Nevertheless, the Bellman-Ford algorithm had some serious drawbacks in practice, eventually making it less popular. Further information on applying the protocol in practice (where it is generally referred to as a distance-vector routing protocol) can be found in [Malkin and Steenstrup, 1995].

Note 5.6 (More information)
There is one particularly nasty problem inherent to the Bellman-Ford protocol. Consider a network in which the nodes are organized as a straight line:

v1 - v2 - v3 - v4 - v5 - v6

Assume that the distance between two adjacent nodes is always 1 (i.e., $d(v_i, v_{i+1}) = 1$). Eventually, the nodes will build the following routing tables:

        Destination
        v1       v2       v3       v4       v5       v6
v1:   (0,v1)   (1,v2)   (2,v2)   (3,v2)   (4,v2)   (5,v2)
v2:   (1,v1)   (0,v2)   (1,v3)   (2,v3)   (3,v3)   (4,v3)
v3:   (2,v2)   (1,v2)   (0,v3)   (1,v4)   (2,v4)   (3,v4)
v4:   (3,v3)   (2,v3)   (1,v3)   (0,v4)   (1,v5)   (2,v5)
v5:   (4,v4)   (3,v4)   (2,v4)   (1,v4)   (0,v5)   (1,v6)
v6:   (5,v5)   (4,v5)   (3,v5)   (2,v5)   (1,v5)   (0,v6)
Now suppose that the link between node $v_1$ and $v_2$ breaks. In other words, $v_2$ can no longer directly reach $v_1$. As a consequence, node $v_2$ will have to discover an alternative route to $v_1$, and "fortunately" notices that $v_3$ is advertising that it has a path to $v_1$ of routing cost 2. Of course, this advertised path is $[v_3, v_2, v_1]$, but this information is withheld from $v_2$. The only thing that $v_2$ gets to know from $v_3$ is that the latter has discovered a path to $v_1$ of cost 2. Node $v_2$ will then update its routing-table entry for getting to node $v_1$ to $(3, v_3)$ and advertise that it has discovered a path to $v_1$ of cost 3.

The problem will now become clear: $v_3$ had registered the entry $(2, v_2)$ based on the routing cost initially advertised by $v_2$ (which was 1), and its own routing cost of getting to $v_2$ (also 1). Now that $v_2$ is advertising a routing cost of 3, $v_3$ will adjust its entry to $(4, v_2)$, and subsequently advertise a routing cost of 4 to get to $v_1$. As soon as this new routing-cost information reaches $v_2$, it will adjust its advertised cost from 3 to 5. This process will not stop as long as the link between $v_1$ and $v_2$ remains broken.

The result is known as the count-to-infinity problem, which turned out to have no easy fix. In practical settings, the Bellman-Ford algorithm is used with a full advertisement of the path, allowing a node to discover whether it is part of that path, and thus avoiding the mistake of choosing a path with a known broken link.

A note on algorithmic performance

Realizing that Dijkstra's algorithm as well as the Bellman-Ford algorithm lie at the heart of some of the most important routing algorithms in the Internet, it is worthwhile seeing how efficient these algorithms actually are. In particular, we can ask ourselves how long it will take to find a sink tree as a function of the number of vertices. As it turns out, in most cases Dijkstra's algorithm will outperform the Bellman-Ford solution. In particular, when graphs are large and have many edges, Dijkstra will generally be more efficient. To illustrate, Figure 5.11 shows the time to compute a sink tree as a function of the size of the graph (expressed in the number of vertices). Figure 5.11(a) shows the results for a so-called grid graph: a graph in which the vertices and edges are organized as in a two-dimensional grid. Figure 5.11(b) shows the time needed to compute a sink tree in a complete graph. Indeed, we can see that Bellman-Ford outperforms Dijkstra's algorithm for grid graphs, but not for complete graphs.

These results are not so surprising when taking a closer look at the number of algorithmic steps that we need to take for each algorithm. Let us first consider Dijkstra's algorithm. At each step $t$, we need to inspect all vertices in $R_t(u) \setminus S_t(u)$, after which we expand $S_t(u)$ with one vertex. If $n$ denotes the total number of vertices, then each step thus requires considering in the order of $n$ vertices, which we repeat $n$ times. In other words, we can expect that the computational time of Dijkstra's algorithm is roughly proportional to $n^2$.

Figure 5.11: The time needed to compute a sink tree in (a) a grid graph and (b) a complete graph. Both panels plot computational time against the number of vertices for Dijkstra and Bellman-Ford.

For the Bellman-Ford algorithm, we observe something different. At each step, each vertex needs to inspect the information collected at each of its neighbors. In total, the vertices need to inspect roughly $m$ other vertices, where $m$ is the total number of edges. The total number of steps we need to perform is equal to the length of the longest shortest path and can be shown to increase proportionally to the number of vertices. Hence, the computational time of the Bellman-Ford algorithm is approximately proportional to $n \cdot m$.

We stress that these are merely back-of-the-envelope calculations. Indeed, when considering that the minimal number of edges that we need for a graph of $n$ vertices to be connected is equal to $n - 1$ (as we showed in Theorem 5.1), we could equally argue that the Bellman-Ford algorithm will also take at least in the order of $n \cdot (n-1) \approx n^2$ time units to complete. More details need to be considered to arrive at more accurate calculations, but this goes beyond the scope of this text. What our calculations do show is that the more edges a graph has, the more we may expect the Bellman-Ford algorithm to perform worse than Dijkstra's algorithm.

Note 5.7 (Mathematical language)
Above, we stated that we needed to consider in the order of $n$ vertices. This can be made mathematically precise using what is known as the big O notation, which allows us to describe the behavior of a function $f(x)$ for large values of $x$. The basic idea is that we want to capture what can be called the dominating components of a function. For example, the function $f(x) = ax^2 + bx + c$ is completely dominated by the term $ax^2$ when $x$ becomes very large. The other terms eventually hardly play a role anymore, regardless of how big the constants $b$ and $c$ are. In fact, the form of $f(x)$ is completely determined by the term $x^2$.

Formally, we write $f(x) \in O(g(x))$ when there exist a constant $M > 0$ and a value $x_0$ such that for all $x > x_0$ we have that $|f(x)| \leq M \cdot |g(x)|$. Note that $f(x) \in \Omega(g(x))$ if and only if $g(x) \in O(f(x))$. Finally, a function $f(x)$ can eventually have exactly the same form as another function $g(x)$, or more precisely, there exist constants $M$ and $M'$ such that for all $x > x_0$ we have that $M' \cdot |g(x)|$ [...]

[...] $D$, we simply discard those $h(d)$. Obviously, $\sum_{d=0}^{n-1} h(d) = n$. To illustrate, consider the graphs in Figure 6.1, which we will denote as $G^A_{complex}$ and $G^B_{complex}$, respectively. From the figure, we may suspect that they are different (and, in fact, if we consider other embeddings this difference will be more evident), but expressing this difference may be somewhat difficult. However, when considering their respective degree distributions, we see that we are indeed dealing with two very different graphs. To complete this simple analysis, we note that both graphs have 100 vertices, with the graph
from Figure 6.1(a) having 300 edges, and the one from (b) having 317 edges.

There are different ways to visualize degree distributions. Above we used histograms. We can also consider the fraction of vertices that have a certain degree, i.e., draw $h(d)/n$. This technique is actually used to approximate the probability $P[\delta(u) = d]$ that a vertex $u$ has degree $d$. Another technique is to first order the vertices according to their degree, and then plot the degree of each vertex. Effectively, we consider the degree sequence $[d_1, d_2, \ldots, d_n]$ of a graph and subsequently plot $d_k$ for each $k$. To illustrate, Figure 6.2 shows this alternative way of displaying vertex degrees for our two example graphs from Figure 6.1.

Figure 6.2: Visualizing the vertex degrees of (a) $G^A_{complex}$ and (b) $G^B_{complex}$ after ranking the vertices according to their degree. The y-axis shows the vertex degree, the x-axis the respective vertex rank.

When displaying vertex degrees, we sometimes also need to consider the scaling of the axes. Consider the following example of a 10,000-node graph, as shown in Figure 6.3 (which we discuss in more detail in Chapter 7). As in our previous example, we rank the vertices according to their degree and subsequently plot the vertex degree of each $k$th vertex. In Figure 6.3(a) we have used a linear scale for both axes. Unfortunately, we see that most vertices have the same, low degree, implying that it is difficult to see what is going on.

In Figure 6.3(b) we have used logarithmic scales for both axes. In other words, the displayed distance between two points on an axis is proportional to the difference between the logarithms of their values. To illustrate, the displayed distance between $x = 10$ and $x = 100$ is the same as the one between $x = 100$ and $x = 1000$. The result is dramatic: we can now easily imagine that a straight line through all the data points can be drawn, implying that the vertex degree distribution follows some kind of power-law function. We will return to these issues in Chapter 7. In general, displaying the distribution of vertex degrees in many cases provides a
lot of information, and we shall make use of this technique quite often.

Figure 6.3: Different representations of visualizing vertex degrees: (a) using linear scales for the axes, and (b) using logarithmic scales.

Note 6.1 (More information)
In many cases, being able to display a vertex degree distribution allows us to more adequately apply a technique known as curve fitting. This is a well-known statistical technique by which we try to find a (continuous) function $f(x)$ through a set of data points, such that the total error we make is minimal. To explain, consider the degree sequence $[d_1, d_2, \ldots, d_n]$. In this case, we have $n$ data points $(k, d_k)$. When finding a suitable curve through these data points, we will generally be looking for a relatively simple function $f(x)$, in turn implying that we will not always have an exact fit for every data point. In other words, for every value of $k$ there will be a difference between $f(k)$ and $d_k$. In practice, we then try to find a function that minimizes the so-called least square error $e$:

\[ e = \sum_{k=1}^{n} \bigl( d_k - f(k) \bigr)^2 \]

Other error metrics are also possible. Most packages for data analysis or data plotting have facilities on board for simple and often also advanced curve fitting. We will not delve into any further details. More information on the technicalities can be found in [Judd et al., 2009].

Degree correlations

Besides just displaying vertex degrees, we are often interested in the extent to which vertices of the same or different degrees are also joined. For example, in social networks high-degree vertices seem to generally be joined to each other, whereas in many technological networks, high-degree vertices are joined with low-degree ones [Newman, 2002]. The underlying phenomenon
that we are dealing with is that in real-world networks we often see that similar nodes tend to link to each other, or, in contrast, that there is a tendency for dissimilar nodes to have links. The extent to which this phenomenon occurs is known as assortative mixing. Similarity is defined by all kinds of network-specific properties: the subject of Web pages, the preferences or taste of people, the number of shared files in peer-to-peer computer networks, etc. These properties are normally not captured when modeling real-world networks. At best, we can assign a type to a vertex and then ask ourselves to what extent vertices of the same or different type are joined (as is discussed by Newman [2003b]).

A much simpler approach is to consider only the vertex degree and to measure the degree correlation between the respective degrees of two adjacent vertices. Informally, the correlation between two variables $x$ and $y$ tells us to what extent we can expect that if we see a change in $x$, we will also see a change in $y$. If the correlation is positive, then an increase in $x$ should show us also an increase in $y$. In the case of a negative correlation, an increase in $x$ will show a decrease in the value of $y$. It is important to realize that we are dealing with observed changes. In other words, $x$ and $y$ are two observable variables such as humidity and the growth of a plant in the case of a biological system. Formally, correlation is defined through what is known as a correlation coefficient:

Definition 6.1: Let $x$ and $y$ be two stochastic variables, for which we have a series of observation pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. The correlation coefficient $r(x,y)$ between $x$ and $y$ is defined as:

\[ r(x,y) \stackrel{\text{def}}{=} \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2}} \]

where $\bar{x}$ is the average over the $x$'s: $\bar{x} \stackrel{\text{def}}{=} \frac{1}{n} \sum_{i=1}^{n} x_i$, and likewise $\bar{y} \stackrel{\text{def}}{=} \frac{1}{n} \sum_{i=1}^{n} y_i$.

Note that the expression for $r(x,y)$ can be slightly simplified to

\[ r(x,y) \stackrel{\text{def}}{=} \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]

Note 6.2 (Mathematical language)
If you have never seen formal definitions of correlations before, they can be quite intimidating. For our purposes, it is merely important that you have some intuition of where they come from. First, consider the expression
$\sum (x_i - \bar{x})(y_i - \bar{y})$. Each term $(x_i - \bar{x})$ measures to what extent the observed value $x_i$ deviates from the average observed value of $x$. If $x$ and $y$ are positively correlated, we would expect to see that each product $(x_i - \bar{x})(y_i - \bar{y})$ would also be positive (and certainly nonzero). In essence, the only thing that we are doing is simply computing the average over all these products, for which reason we divide the sum by the total number of observations, $n$.

So what are these terms in the denominator? As we just mentioned, $(x_i - \bar{x})$ measures the deviation of $x_i$ from the average over all observations. In order to truly compare such deviations, we need to normalize our measurements. In other words, we need to make sure that the ranges of values that we are comparing are more or less the same, otherwise we will be biasing our measurements towards the variable with the largest ranges. One approach is to simply divide our observations by the average deviation, that is, $\frac{1}{n} \sum |x_i - \bar{x}|$. However, for reasons that are beyond the scope of this text, it is common practice to use a different "average," namely $\sqrt{\frac{1}{n} \sum (x_i - \bar{x})^2}$, which is known as the standard deviation.

It should be noted that this explanation does not do justice to the mathematical statistics underlying the definition of the correlation coefficient. In fact, the definition should actually be fine-tuned. More information can be found in Mandel [1984] or Judd et al. [2009].

Taking this formal definition of correlation as our basis, we can now define the correlation between vertex degrees. To this end, we make use of a graph's adjacency matrix $A$. Recall that for a simple graph $G$ with vertex set $V(G) = \{v_1, v_2, \ldots, v_n\}$, $A[i,j] = 1$ if there is an edge joining vertex $v_i$ and $v_j$, and otherwise $A[i,j] = 0$.

Definition 6.2: Let $G$ be a simple graph with degree sequence $d = [d_1, d_2, \ldots, d_n]$ and adjacency matrix $A$. Let $V(G) = \{v_1, v_2, \ldots, v_n\}$ be such that $\delta(v_i) = d_i$. The degree correlation of $G$ is defined as:

\[ r_{deg}(G) \stackrel{\text{def}}{=} \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} (d_i - \bar{d})(d_j - \bar{d}) \, A[i,j]}{\sum_{i=1}^{n} (d_i - \bar{d})^2} \]

where $\bar{d}$ denotes the average vertex degree, i.e., $\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$.

The similarity between $r(x,y)$ and $r_{deg}(G)$ should be obvious. Except for the use of the adjacency matrix, it is seen that the form of the respective numerators is virtually the same, with $d$ essentially replacing both $x$ and $y$ in $r(x,y)$. That the same holds for the denominator can be seen when considering that

\[ \sqrt{\frac{1}{n} \sum (d_i - \bar{d})^2} \cdot \sqrt{\frac{1}{n} \sum (d_i - \bar{d})^2} = \frac{1}{n} \sum (d_i - \bar{d})^2 \]
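Definition 6.2 translates almost literally into code. The sketch below is an illustration only: it assumes the graph is given as a symmetric 0/1 adjacency matrix stored as a list of lists, and the function name `degree_correlation` as well as the choice to raise an error for regular graphs (where the denominator is zero) are assumptions made here, not part of the definition.

```python
def degree_correlation(adj):
    """Degree correlation r_deg(G) of a simple graph, following Definition 6.2.

    adj is a symmetric 0/1 adjacency matrix given as a list of lists.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]      # d_i = number of neighbors of v_i
    mean = sum(degrees) / n                  # average vertex degree
    # Numerator: sum over all pairs i < j, kept only when v_i and v_j are joined.
    numerator = sum(
        (degrees[i] - mean) * (degrees[j] - mean) * adj[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    denominator = sum((d - mean) ** 2 for d in degrees)
    if denominator == 0:
        # Assumption: treat regular graphs (all degrees equal) as undefined.
        raise ValueError("degree correlation is undefined when all degrees are equal")
    return numerator / denominator


if __name__ == "__main__":
    # A star with center v_1 and three leaves: the hub is joined only to low-degree vertices.
    star = [
        [0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
    ]
    print(degree_correlation(star))  # negative: high degree joined to low degree
```

For the star, every edge joins the single high-degree vertex to a low-degree one, so each product in the numerator is negative and the resulting value is negative as well.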
Note 6.3 (Mathematical language)
Note how we used the adjacency matrix $A$ to elegantly sum over all possible edges between two vertices, while discarding those that are not part of $G$. An equivalent, yet more concise notation is the following:

\[ r_{deg}(G) \stackrel{\text{def}}{=} \frac{\sum_{j > i} (d_i - \bar{d})(d_j - \bar{d}) \, A[i,j]}{\sum_{i=1}^{n} (d_i - \bar{d})^2} \]

in which case we assume that the exact values of $i$ and $j$ are clear from the context in which the summation is used. For an alternative notation in which the adjacency matrix is not used at all, we assume that the edges in $G$ are indexed such that $e_{i,j} \in E(G)$ if and only if (1) there is an edge joining vertex $v_i$ and $v_j$, and (2) $i > j$. This brings us to:

\[ r_{deg}(G) \stackrel{\text{def}}{=} \frac{\sum_{e_{i,j}} (d_i - \bar{d})(d_j - \bar{d})}{\sum_{i=1}^{n} (d_i - \bar{d})^2} \]

The drawback of this notation is that it is less explicit in exactly which vertex degrees we should take into account. On the other hand, you could argue that it expresses more concisely what degree correlation is.

An even simpler metric for capturing vertex correlations is proposed by Li et al. [2005], who define the scale-freeness of a graph:

Definition 6.3: Let $G$ be a simple graph with degree sequence $[d_1, d_2, \ldots, d_n]$ and adjacency matrix $A$. Let $V(G) = \{v_1, v_2, \ldots, v_n\}$ be such that $\delta(v_i) = d_i$. The scale-freeness $s(G)$ of $G$ is defined as

\[ s(G) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} d_i \, d_j \, A[i,j] \]

An important observation is that $s(G)$ is maximal when high-degree vertices are connected to each other. In other words, the scale-freeness is larger when hubs are attached to other hubs, forming a kind of cluster. However, the drawback of the form just given is that it makes it difficult to compare graphs with each other. Therefore, we again need some kind of normalization. This can be achieved by considering what the maximal attainable scale-freeness is for all graphs with the same degree sequence:

Definition 6.4: Let $G$ be a simple graph with degree sequence $d = [d_1, d_2, \ldots, d_n]$ and adjacency matrix $A$. Let $V(G) = \{v_1, v_2, \ldots, v_n\}$ be such that $\delta(v_i) = d_i$. Let $\mathcal{G}(d)$ be the collection of graphs with degree sequence $d$. The normalized
scale-freeness $S(G)$ of $G$ is defined as

$$S(G) = \frac{\sum_{i=1}^{n}\sum_{j=i+1}^{n} d_i\, d_j\, A[i,j]}{\max\{s(H) \mid H \in \mathcal{G}(d)\}}$$

Of course, the problem in this case is to find the maximal scale-freeness, which boils down to finding a graph $H$ having degree sequence $d$ and a maximal value $s(H)$. The procedure is too involved for our purposes, and the interested reader is referred to Li et al. [2005] for further information.

6.2 Distance statistics

Besides vertex-degree distributions, various distance statistics form an important class of metrics for network analysis. The distance between two vertices $v$ and $w$ in a graph is expressed in terms of the length of the shortest path between $v$ and $w$.

Definition 6.5: Let $G$ be a directed or undirected graph and $u, v \in V(G)$. The (geodesic) distance between $u$ and $v$, denoted as $d(u,v)$, is the length of a shortest $(u,v)$-path.

Note that we have given an alternative definition for distance: in the case of weighted graphs, the distance between two vertices $u$ and $v$ is generally defined in terms of a $(u,v)$-path having minimal weight. The length of such a path, however, need not be minimal. In practice, which type of distance is meant is generally easy to understand from the context in which it is used. Furthermore, we discussed in Chapter 5 how to compute shortest paths, and demonstrated that there are efficient ways to find those paths. Note that in an undirected graph, $d(u,v) = d(v,u)$, but that this need not be the case for a directed graph.

What can we learn from distance statistics? Again, they can be used to see to what extent two networks are different or not, but also to give an indication of the relative importance of each of the nodes in a network. Let us first consider a few simple metrics (see also [Brinkmeier and Shank, 2005]). The eccentricity of a vertex $u$ tells us how far the farthest vertex from $u$ is positioned in the network. The radius of a network, defined as the minimum over all eccentricity values, is an indication of how far apart the vertices in a network actually are. Finally, the diameter simply tells us what the maximal distance in a network is. Formally, we have:

Definition 6.6: Consider a connected graph $G$ and let $d(u,v)$ denote the distance between vertices $u$ and $v$. The eccentricity $\epsilon(u)$ of a vertex $u$ in $G$ is defined as $\max\{d(u,v) \mid v \in V(G)\}$. The radius $rad(G)$ is equal to $\min\{\epsilon(u) \mid u \in V(G)\}$.
Finally, the diameter of $G$ is the length of the longest shortest path between any two vertices: $diam(G) = \max\{d(u,v) \mid u, v \in V(G)\}$.

Note that these definitions apply to directed as well as undirected graphs.

Although the diameter gives us useful information, it may not be powerful enough to discriminate among graphs. An equally important and related metric for network analysis is the distribution of path lengths. In particular, the average distance between vertices can provide useful information.

Definition 6.7: Let $G$ be a connected graph with vertex set $V$, and let $\bar{d}(u)$ denote the average length of the shortest paths from vertex $u$ to any other vertex $v$ in $G$:

$$\bar{d}(u) \stackrel{\text{def}}{=} \frac{1}{|V| - 1} \sum_{v \in V,\, v \neq u} d(u,v)$$

The average path length $\bar{d}(G)$ is defined as

$$\bar{d}(G) \stackrel{\text{def}}{=} \frac{1}{|V|} \sum_{u \in V} \bar{d}(u) = \frac{1}{|V|^2 - |V|} \sum_{u,v \in V,\, u \neq v} d(u,v)$$

The characteristic path length of $G$ is defined as the median over all $\bar{d}(u)$.

Note 6.4 (Mathematical language)
Recall that the median over a set of $n$ nondecreasing values $x_1, x_2, \ldots, x_n$ is equal to $x_{(n+1)/2}$ in case $n$ is odd. If $n$ is even, the median is often taken equal to $(x_{n/2} + x_{n/2+1})/2$. In other words, the median separates the higher values from the lower values into two equally-sized subsets.

As we shall see later, the characteristic path length is particularly important when dealing with networks with only a few high-degree vertices and many low-degree vertices.

Note 6.5 (More information)
Why even bother about the characteristic path length? The problem with the average path length is that its computation becomes quite cumbersome for very large graphs. As we explained in Chapter 5, the time to compute all shortest paths to a given vertex following Dijkstra's algorithm is roughly proportional to $n^2$, with $n$ being the number of vertices. In order to compute the average path length, we need to compute the shortest paths between all pairs of vertices, for which the computational effort is roughly proportional to $n^3$. It is not difficult to imagine that for large graphs with, say, more than a few thousand vertices, this can indeed be rather time-consuming. To illustrate, such a computation for a 10,000-node network can easily take tens of hours on a modern desktop computer.
As an alternative, we can also try to estimate the average path length. As it turns out, there are extremely efficient techniques to do this for the characteristic path length, but not for the average path length. Considering that in many cases the two metrics return approximately the same value, the characteristic path length is often preferred.

Let us consider these metrics for the graph $G_{simple}$ shown in Figure 6.4. The eccentricity of each vertex and the average distances between vertices can be easily derived by considering the length of the shortest paths between pairs of vertices, as shown in Figure 6.4. As a consequence, the radius of the graph is equal to 5, whereas the diameter is equal to 9. Likewise, we can compute the average path length of the graph to be 4.29. By ordering the average path lengths of the vertices, we obtain the sequence [3.17, 3.50, 3.67, 4.00, 4.33, 5.33, 6.00], from which we compute the characteristic path length to be 4.

Vertex    1   2   3   4   5   6   7    $\epsilon(u)$   $\bar{d}(u)$
1         0   1   5   3   3   7   2    7               3.50
2         1   0   5   2   4   7   3    7               3.67
3         5   5   0   7   4   2   3    7               4.33
4         3   2   7   0   6   9   5    9               5.33
5         3   4   4   6   0   6   1    6               4.00
6         7   7   2   9   6   0   5    9               6.00
7         2   3   3   5   1   5   0    5               3.17

Figure 6.4: The distance between vertices of the graph $G_{simple}$ (left) and the resulting eccentricity and average path lengths.

To complete this section, for our graphs from Figure 6.1 we find the following values for these distance metrics, again illustrating that we are indeed dealing with two very different graphs:

Metric                        $G^A_{complex}$   $G^B_{complex}$
Average eccentricity          4.59              4.09
Radius                        4                 3
Diameter                      6                 5
Average path length           2.96              2.67
Characteristic path length    2.95              2.63
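The distance metrics of this section can be sketched in a few lines of Python. The sketch below assumes an unweighted, connected, undirected graph stored as a dictionary that maps each vertex to the set of its neighbors, so that breadth-first search yields the distances; for weighted graphs the search would have to be replaced by a shortest-path algorithm such as Dijkstra's. The function names are chosen for this example only.

```python
from collections import deque
from statistics import median


def bfs_distances(adj, source):
    """Return d(source, v) for every vertex v reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_statistics(adj):
    """Eccentricities, radius, diameter, average and characteristic path length."""
    n = len(adj)
    ecc, avg = {}, {}
    for u in adj:
        dist = bfs_distances(adj, u)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        ecc[u] = max(dist.values())                # eccentricity of u
        avg[u] = sum(dist.values()) / (n - 1)      # average distance from u
    return {
        "eccentricity": ecc,
        "radius": min(ecc.values()),
        "diameter": max(ecc.values()),
        "average_path_length": sum(avg.values()) / n,
        "characteristic_path_length": median(avg.values()),
    }


# A path on four vertices: 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
stats = distance_statistics(path)
```

Running one search per vertex is the all-pairs computation whose cost is discussed in Note 6.5, so this approach is only practical for modest graph sizes.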
6.3 Clustering coefficient

Another often used metric is what is known as the clustering coefficient. The idea behind this coefficient is rather simple: we want to see, for a given vertex $v$, to what extent the neighbors of $v$ are also neighbors of each other. In other words, to what extent are vertices adjacent to $v$ also adjacent to each other? Before we delve into all kinds of formalities, let us briefly consider why measuring clustering is important.

Some effects of clustering

A common way of spreading information is simply having a node update its neighbors. In turn, neighbors can inform their neighbors, and so on. There are many variations on this model, such as having a node select only one or a few of its neighbors, or deciding to stop spreading updates when it notices that a selected neighbor already has the information. Informally, this type of dissemination is often described in the form of gossiping models, also known as epidemic dissemination [Eugster et al., 2004]. The model is very general: instead of information we can also consider the spreading of diseases, but also of viruses over the Internet. Another example is that of the forming of opinions, which often depends on what the majority of your community thinks. We shall return to these issues in more detail when discussing peer-to-peer networks in Chapter 8.

When considering real-world networks, we often see that they are organized as a collection of interconnected groups. In terms of social networks, this means that we can often clearly distinguish communities of nodes with many links between their members, yet relatively few links between nodes that belong to different communities. Actually indicating which nodes belong to which communities may not be easy at all. Also, nodes generally belong to more than one community. However, we can express the existence of communities by means of a clustering coefficient. As shown by Xu and Liu [2008], it turns out that there is a clear relationship between the speed by which information is disseminated in social networks and the clustering coefficient: the higher the degree of clustering, the slower the dissemination. To a certain extent, this result may seem quite obvious, but from a formal (i.e., mathematical) point of view, it turns out to be not so trivial.

What this means is that if we want to design a dissemination protocol, we may need to take special measures in highly clustered networks in order to guarantee a certain performance regarding the dissemination speed. This alone has been enough reason for researchers to define and measure the clustering coefficient of a network.

Besides this reason, measuring the clustering coefficient obviously allows us to simply compare different networks, without necessarily wanting to make use of the actual values of the respective coefficients. In this sense, clustering coefficients can help in classifying networks.

Local view

We first consider clustering from the perspective of vertices, as originally introduced by Watts and Strogatz [1998]. From this so-called local view, the best clustering that we can achieve is that all neighbors are adjacent to each other. In other words, the neighbor set $N(v)$ of $v$ forms a complete graph. Letting $n_v = |N(v)|$, we know that $N(v)$ will have a maximum of $\binom{n_v}{2} = \frac{1}{2} n_v (n_v - 1)$ edges. For the clustering coefficient, we then simply take a look at the ratio between the actual number of edges and the attainable maximum.

Definition 6.8: Consider a simple connected, undirected graph $G$ and vertex $v \in V(G)$ with neighbor set $N(v)$. Let $n_v = |N(v)|$ and $m_v$ be the number of edges in the subgraph induced by $N(v)$, i.e., $m_v = |E(G[N(v)])|$. The clustering coefficient $cc(v)$ for vertex $v$ with degree $\delta(v)$ is defined as

$$cc(v) \stackrel{\text{def}}{=} \begin{cases} m_v / \binom{n_v}{2} = \dfrac{2 m_v}{n_v (n_v - 1)} & \text{if } \delta(v) > 1 \\ \text{undefined} & \text{otherwise} \end{cases}$$

Note that we require that a vertex is adjacent to at least two other distinct vertices. Taking this into account, the clustering coefficient $CC(G)$ for the entire graph is defined as the average over all (well defined) clustering coefficients of its vertices:

Definition 6.9: Consider a simple connected graph $G$. Let $V^{*}$ denote the set of vertices $\{v \in V(G) \mid \delta(v) > 1\}$. The clustering coefficient $CC(G)$ for $G$ is defined as

$$CC(G) \stackrel{\text{def}}{=} \frac{1}{|V^{*}|} \sum_{v \in V^{*}} cc(v)$$

This notion of clustering can easily be extended to directed graphs, in which case we merely need to distinguish an arc $\langle \overrightarrow{v,w} \rangle$ from $v$ to $w$ from an arc $\langle \overrightarrow{w,v} \rangle$. The neighbor set $N(v)$ of a vertex $v$ will have a maximum number of $2\binom{n_v}{2} = n_v (n_v - 1)$ arcs, i.e., twice as many arcs in comparison to the number of edges in the undirected case. This brings us to:

Definition 6.10: Let $D$ be a simple connected, directed graph. Consider vertex $v \in V(D)$ with neighbor set $N(v)$. Let $n_v = |N(v)|$ and $m_v$ be the number of arcs in the subgraph induced by $N(v)$, i.e., $m_v = |A(D[N(v)])|$. The clustering coefficient $cc(v)$ for vertex $v$ with degree $\delta(v) = \delta_{in}(v) + \delta_{out}(v)$ is defined as

$$cc(v) \stackrel{\text{def}}{=} \begin{cases} m_v / 2\binom{n_v}{2} = \dfrac{m_v}{n_v (n_v - 1)} & \text{if } \delta(v) > 1 \\ \text{undefined} & \text{otherwise} \end{cases}$$

In our definition for the clustering coefficient of a graph we did not make a distinction between directed and undirected graphs. Indeed, the definition stays the same.

Now consider the case of a weighted, undirected graph. As we mentioned, the clustering coefficient indicates the extent to which nodes in a network form (more or less) closed groups. If weights represent the intensity by which, for example, interactions take place, then weights are also indicative of the strength, or closedness, of a group. This reasoning motivated Barrat et al. [2004] to introduce a weighted clustering coefficient. To this end, rather than merely considering the degree of a vertex $v$, they first take into account a weighted form of the vertex degree, called the vertex strength:

Definition 6.11: Consider a simple weighted undirected graph $G$ with vertex set $V(G) = \{v_1, v_2, \ldots, v_n\}$ and adjacency matrix $A$. The vertex strength $s(v_i)$ of vertex $v_i$ is defined as the total sum of the weights of edges incident with $v_i$:

$$s(v_i) \stackrel{\text{def}}{=} \sum_{j=1}^{n} w(\langle v_i, v_j \rangle)\, A[i,j]$$

We can now define the weighted clustering coefficient as follows.

Definition 6.12: Consider a simple weighted undirected graph $G$ with vertex set $V(G) = \{v_1, v_2, \ldots, v_n\}$ and adjacency matrix $A$. The weighted clustering coefficient $cc(v_i)$ of vertex $v_i$ is defined as:

$$cc(v_i) \stackrel{\text{def}}{=} \begin{cases} \dfrac{\sum_{j,k} \big[ w(e_{i,j}) + w(e_{i,k}) \big]\, A[i,j]\, A[i,k]\, A[j,k]}{2\, s(v_i)\, (\delta(v_i) - 1)} & \text{if } \delta(v_i) > 1 \\ \text{undefined} & \text{otherwise} \end{cases}$$

where $e_{i,j}$ is the edge joining $v_i$ and $v_j$.

In other words, we consider only those edges $\langle u, v \rangle$, $\langle u, w \rangle$ incident with $u$ whose other endpoints, $v$ and $w$, respectively, are joined as well. We leave it as an exercise to the reader to show that in the special case that all weights are equal to 1, the weighted clustering coefficient is equal to the clustering coefficient for an unweighted graph.
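The local view can be made concrete with a small Python sketch of Definitions 6.8, 6.9, and 6.12. It assumes an undirected graph stored as a dictionary mapping each vertex to the set of its neighbors, and, for the weighted case, a symmetric dictionary of dictionaries holding edge weights. These representations and the function names are assumptions made for illustration. Vertices with fewer than two neighbors return `None` to mirror "undefined".

```python
from itertools import combinations


def clustering(adj, v):
    """cc(v): fraction of pairs of neighbors of v that are adjacent (Definition 6.8)."""
    n_v = len(adj[v])
    if n_v < 2:
        return None                                # undefined for degree 0 or 1
    m_v = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return 2 * m_v / (n_v * (n_v - 1))


def graph_clustering(adj):
    """CC(G): average of cc(v) over all vertices where it is defined (Definition 6.9)."""
    values = [c for c in (clustering(adj, v) for v in adj) if c is not None]
    return sum(values) / len(values)


def weighted_clustering(w, i):
    """Weighted cc(v_i) following Definition 6.12; w[i][j] is the weight of edge (i, j)."""
    neighbors = list(w[i])
    degree = len(neighbors)
    if degree < 2:
        return None
    strength = sum(w[i].values())                  # vertex strength s(v_i)
    # Ordered pairs (j, k) of neighbors that are themselves joined: each triangle
    # at v_i is visited twice, hence the factor 2 in the denominator.
    total = sum(
        w[i][j] + w[i][k]
        for j in neighbors
        for k in neighbors
        if j != k and k in w[j]
    )
    return total / (2 * strength * (degree - 1))


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
unit = {u: {v: 1 for v in nbrs} for u, nbrs in adj.items()}
heavy = {0: {1: 2, 2: 2, 3: 1}, 1: {0: 2, 2: 1}, 2: {0: 2, 1: 1}, 3: {0: 1}}
```

With unit weights, `weighted_clustering(unit, 0)` coincides with `clustering(adj, 0)`, in line with the exercise stated above. In `heavy`, the two edges of vertex 0 that take part in the triangle carry most of its strength, which raises the weighted coefficient above the unweighted one.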
Note 6.6 (Mathematical language)
The notations used for the last clustering coefficient may appear somewhat intricate. Let's inspect them a bit further. First note how we have again conveniently made use of the adjacency matrix to simplify our notation. In the expression

$$s(v_i) \stackrel{\text{def}}{=} \sum_{j=1}^{n} w(\langle v_i, v_j \rangle)\, A[i,j]$$

$A[i,j]$ will be equal to 0 when there is no edge joining vertex $v_i$ and $v_j$, effectively meaning that we will be ignoring the term $w(\langle v_i, v_j \rangle)$ (recall that for a nonexistent edge $e$, we let $w(e) = \infty$). An equivalent definition could have been formulated using the neighbor set $N(v)$ of vertex $v$, leading to:

$$s(v_i) \stackrel{\text{def}}{=} \sum_{v_j \in N(v_i)} w(\langle v_i, v_j \rangle)$$

Somewhat more complicated is the actual expression for the clustering coefficient in a weighted undirected graph. In this case, the product $A[i,j]\,A[i,k]\,A[j,k]$ in the numerator effectively allows us to consider only those cases in which vertices $v_i$, $v_j$, and $v_k$ are all pairwise joined, i.e., forming a complete subgraph. For this triangle, we are actually interested in the edge joining $v_i$'s neighbors $v_j$ and $v_k$. The weight that we assign to the fact that these two neighbors are joined is determined entirely by how important $v_j$ and $v_k$ are to $v_i$, which is expressed by the respective weights of the edges $e_{i,j} = \langle v_i, v_j \rangle$ and $e_{i,k} = \langle v_i, v_k \rangle$. In the end, the importance of the adjacency of $v_j$ and $v_k$ for $v_i$ is simply expressed as the weight $w(e_{i,j}) + w(e_{i,k})$.

A few other observations may further help in understanding the definition of the clustering coefficient for weighted graphs. Note that because of the unordered way we are summing over edges, we will actually be considering all pairs of edges incident with $v_i$ twice, and thus also the triangles at $v_i$. This explains the factor 2 in the denominator. Finally, the division by the strength of $v_i$ puts a relative weight on the importance of two of $v_i$'s neighbors being adjacent, allowing the clustering around different vertices to be compared to each other.

Global view

As explained by Newman [2003a], there is a reasonable alternative definition for the clustering coefficient based on the number of triples and triangles in a graph $G$, which are defined as follows:

Definition 6.13: Consider a simple, undirected graph $G$ and a vertex $v \in V(G)$. A triangle at $v$ is a complete subgraph of $G$ with exactly three vertices, including $v$. A triple at $v$ is a subgraph of exactly three vertices and two edges, where $v$ is incident with the two edges.

We will use the notation $n_\Lambda(v)$ to denote the number of triples at $v$, and $n_\Delta(v)$ the number of triangles at $v$. Likewise, we can consider the total number $n_\Delta(G)$ of distinct triangles of a graph $G$ and its number $n_\Lambda(G)$ of distinct triples. We define the transitivity of a graph as follows:

Definition 6.14: Let $G$ be a simple, connected graph with $n_\Delta(G)$ distinct triangles and $n_\Lambda(G)$ distinct triples. The network transitivity $\tau(G)$ is defined as the ratio $n_\Delta(G) / n_\Lambda(G)$.

Network transitivity is considered a global view on clustering, as it considers the network as a whole instead of the situation local to vertices. To illustrate these two approaches, let us return to graph $G_{simple}$ from Figure 6.4. It is not difficult to see that for each vertex we have the following:

Vertex:          1     2     3     4           5     6     7
$cc$:            1/3   0     1/3   undefined   1     1     1/3
$n_\Lambda$:     3     3     3     0           1     1     6

This leads to a clustering coefficient of $CC(G_{simple}) = 3/6$ for the graph itself. Regarding the transitivity, we need to first count the number of triangles, of which there are only two. The total number of distinct triples is 17 (by simply summing up $n_\Lambda(v)$), which means that $\tau(G_{simple}) = 2/17$. This method can, of course, also be applied to our larger examples from Figure 6.1, for which we find:

Metric                    $G^A_{complex}$   $G^B_{complex}$
Clustering coefficient    0.209             0.049
Transitivity              0.064             0.019

The difference between clustering coefficient and network transitivity is subtle, yet important to make, if only for the reason that different communities often loosely speak about the clustering coefficient of a graph $G$ without making clear whether they mean $CC(G)$ or $\tau(G)$.

In the case of social networks, the clustering coefficient of a graph is also known as the network density, which is formally defined as follows [Hage and Harary, 1983; Wasserman and Faust, 1994]:

Definition 6.15: Consider a simple, undirected graph $G$ with $n$ vertices and $m$ edges. The network density $\rho(G)$ of $G$ is defined as $m / \binom{n}{2}$.

In other words, the network density tells us to what extent a graph is complete or not, which is intuitively what we also used for defining the clustering coefficient. However, it is fairly easy to see that the network density and clustering coefficient are not the same, which we leave as an exercise to the reader.
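The global view lends itself to a short sketch as well. The code below counts distinct triangles and triples and then forms the transitivity of Definition 6.14 and the network density of Definition 6.15. It assumes a simple undirected graph given as a dictionary from vertices to neighbor sets; the names are illustrative. Note that the ratio is taken exactly as in Definition 6.14; some other texts multiply the triangle count by three, so values are not directly comparable across sources.

```python
from itertools import combinations


def triangles_and_triples(adj):
    """Return (number of distinct triangles, number of distinct triples)."""
    # A vertex of degree k is the middle of k-choose-2 triples.
    triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    # Count pairs of adjacent neighbors at every vertex; each triangle is seen
    # once at each of its three vertices.
    corners = sum(
        1
        for nbrs in adj.values()
        for a, b in combinations(nbrs, 2)
        if b in adj[a]
    )
    return corners // 3, triples


def transitivity(adj):
    """Network transitivity: distinct triangles divided by distinct triples."""
    triangles, triples = triangles_and_triples(adj)
    if triples == 0:
        raise ValueError("graph has no triples: transitivity is undefined")
    return triangles / triples


def density(adj):
    """Network density: number of edges divided by n-choose-2."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return m / (n * (n - 1) / 2)


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 0.
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
counts = triangles_and_triples(example)
```

For this small graph the transitivity (1/5) and the density (4/6) already differ, which gives one concrete instance of the exercise mentioned above.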
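Note 6.7 (More information)
The two notions of clustering are clearly related, especially when considering that we can also define the clustering coefficient of a vertex in terms of triangles and triples. Clearly, we have

$$cc(v) = \frac{n_\Delta(v)}{n_\Lambda(v)} \quad \text{and also} \quad n_\Lambda(v) = \binom{\delta(v)}{2}$$

Furthermore, it should also be clear that

$$n_\Delta(G) = \frac{1}{3} \sum_{v \in V} n_\Delta(v)$$

to account for the fact that each triangle is counted three times if we consider each vertex of the graph. However, only in special cases will we see that

$$\tau(G) = \frac{n_\Delta(G)}{n_\Lambda(G)} = \frac{\sum n_\Delta(v)}{3 \sum n_\Lambda(v)} \quad \text{and} \quad CC(G) = \frac{1}{|V^{*}|} \sum \frac{n_\Delta(v)}{n_\Lambda(v)}$$

are equal.

Figure 6.5: A graph with different clustering coefficient and transitivity.

The difference between the two metrics is also illustrated in Figure 6.5. Let $G_k$ be the subgraph induced by vertices $\{x, y, v_1, v_2, \ldots, v_k\}$. It is not difficult to see that for every subgraph $G_k$ we have

$$cc(u) = \begin{cases} 1 & \text{if } u = v_i \text{ for } 1 \le i \le k \\ \dfrac{k}{0.5\,k(k+1)} = \dfrac{2}{k+1} & \text{if } u = x \text{ or } u = y \end{cases}$$

As a consequence, we see that

$$CC(G_k) = \frac{1}{k+2} \Big( 2 \cdot \frac{2}{k+1} + k \cdot 1 \Big) = \frac{k^2 + k + 4}{k^2 + 3k + 2}$$

and thus

$$\lim_{k \to \infty} CC(G_k) = 1$$

To compute network transitivity, we need to count the number of triangles, which is equal to $k$. With $n_\Lambda(v_i) = 1$ and $n_\Lambda(x) = n_\Lambda(y) = \binom{k+1}{2} = \frac{1}{2} k(k+1)$, we find that

$$\tau(G_k) = \frac{k}{2 \cdot 0.5\,k(k+1) + k} = \frac{1}{k+2}$$

and thus

$$\lim_{k \to \infty} \tau(G_k) = 0$$

We can extend the notion of transitivity to weighted graphs following an approach suggested by Opsahl and Panzarasa [2009]. In this case, we need to assign a weight to triples and triangles, after which we compute the transitivity of a graph by considering the ratio of the cumulative weights of the triangles and that of the triples. Let us start with defining precisely what the weight of a triple or triangle is.

Definition 6.16: Let $G$ be a simple, undirected weighted graph and consider vertex $v \in V(G)$. If $H$ is a triple or a triangle at $v$ where edges $e_1$ and $e_2$ are incident with $v$, then the triple weight $w_\Lambda(H)$ and triangle weight $w_\Delta(H)$, respectively, is equal to the average of the weights of $e_1$ and $e_2$, i.e.,

$$w_\Lambda(H) \stackrel{\text{def}}{=} \tfrac{1}{2}\big( w(e_1) + w(e_2) \big) \quad \text{and} \quad w_\Delta(H) \stackrel{\text{def}}{=} \tfrac{1}{2}\big( w(e_1) + w(e_2) \big)$$

In principle, the triple or triangle weight can also be defined as, for example, $\max\{w(e_1), w(e_2)\}$, but we shall not consider such details here. Using these definitions, we can then define the transitivity of a weighted graph as follows.

Definition 6.17: Let $G$ be a simple, undirected weighted graph with $\mathcal{H}_\Delta$ its set of triangles, and $\mathcal{H}_\Lambda$ its set of triples. The network transitivity $\tau(G)$ is defined as

$$\tau(G) \stackrel{\text{def}}{=} \frac{\sum_{H \in \mathcal{H}_\Delta} w_\Delta(H)}{\sum_{H \in \mathcal{H}_\Lambda} w_\Lambda(H)}$$

Note that this definition is identical to that of transitivity in an unweighted graph when setting weights equal to 1.

Finally, Opsahl and Panzarasa [2009] extend their definition of transitivity to directed graphs, be they weighted or not. In this case, they simply use the same definition of weights for triples and triangles, respectively, but restrict the enumeration of these subgraphs to so-called nonvacuous triples and transitive triangles:

Definition 6.18: Consider a (strict) directed graph $D$. Let $H$ be a triple at $v$, with its neighbors $u$ and $w$ in $H$. $H$ is a nonvacuous triple if either $\langle \overrightarrow{u,v} \rangle, \langle \overrightarrow{v,w} \rangle \in A(H)$ or $\langle \overrightarrow{w,v} \rangle, \langle \overrightarrow{v,u} \rangle \in A(H)$. If $H$ is a triangle at $v$, then $H$ is transitive if $A(H) = \{\langle \overrightarrow{u,v} \rangle, \langle \overrightarrow{v,w} \rangle, \langle \overrightarrow{u,w} \rangle\}$ or $A(H) = \{\langle \overrightarrow{w,v} \rangle, \langle \overrightarrow{v,u} \rangle, \langle \overrightarrow{w,u} \rangle\}$.

In other words, $H$ as a triple is nonvacuous if there exists either a $(u,w)$-path via $v$ or a $(w,u)$-path via $v$, and $H$ as a triangle is transitive if $w$ can be reached from $u$ both through an arc $\langle \overrightarrow{u,w} \rangle$ and a path in $H$ via $v$, or $u$ can be reached from $w$ through an arc $\langle \overrightarrow{w,u} \rangle$ and a directed path through $v$. Figure 6.6 shows all possible (non)vacuous triples and (non)transitive triangles.

Figure 6.6: (Non)vacuous triples and (non)transitive triangles at (the marked) vertex $v$. [Panels labeled vacuous / non-vacuous for triples and non-transitive / transitive for triangles.]

We will often use the clustering coefficient or network transitivity to compare different random graphs. Both metrics are used in practice, yet computing network transitivity for large graphs can be somewhat inefficient unless special measures are taken. We will not go into details here, but will return to various examples when discussing concrete examples of random graphs throughout the remaining chapters.

6.4 Centrality

Another important question in network analysis is whether there are any vertices that are "more important" than others. The importance of a vertex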
is, of course, dependent on what a graph is actually modeling. For example, when dealing with networks representing relationships between people, a vertex with a high degree may characterize an influential person. In a communication network, however, the importance of a vertex may be determined by the number of shortest paths of which it is a member, for in that case it may be an indication of its workload regarding processing and forwarding messages.

In network analysis, this concept of importance is referred to as centrality [Kotschutzki et al., 2005]. Perhaps one of the simplest notions of centrality is identifying the center of a graph. It is formed by those vertices whose eccentricity is equal to the radius of a graph:

Definition 6.19: Consider a (strongly) connected graph $G$. The center $C(G)$ of a graph $G$ is the set of vertices with minimal eccentricity, i.e., $C(G) \stackrel{\text{def}}{=} \{v \in V(G) \mid \epsilon(v) = rad(G)\}$.

Intuitively, a vertex is at the center of a graph when its distance to the farthest other vertex is minimal. Using the eccentricity of a vertex $u$, we can then define its centrality as:

Definition 6.20: Let $G$ be a (strongly) connected graph. The (eccentricity based) vertex centrality $c_E(u)$ of a vertex $u \in V(G)$ is defined as $1/\epsilon(u)$.

All vertices in the center of a graph have maximal centrality, whereas all vertices at the "edges" of a graph have very low centrality. Returning to graph $G_{simple}$ from Figure 6.4, we find that the center consists only of vertex 7. With some computational effort, it can be shown that graph $G^A_{complex}$ from Figure 6.1 has no less than 43 vertices in its center, whereas $G^B_{complex}$ has only two vertices in the center.

Eccentricity can be used for determining whether certain functions in a network have been optimally placed. For example, when deciding on placing certain buildings in a city, such as fire stations, we may want to take into account that those buildings should be conveniently reached. In effect, the decision is to place certain functionality within a specific range of all nodes.

Eccentricity measures the maximum distance from one node to any other node in a network. In some cases, it is more important to know how close a node is to all other nodes. This means that we need to take into account all the distances from one node to the others. In that case, we simply take the total distance of that node to every node into account, as follows:
Definition 6.21: Consider a (strongly) connected graph $G$. The closeness $c_C(u)$ of a vertex $u \in V(G)$ is defined as $c_C(u) \stackrel{\text{def}}{=} 1 / \sum_{v \in V(G)} d(u,v)$.

Returning to our example, it is clear that a fire station should be close to any arbitrarily chosen node. In that case, we want to optimize on the traveling distance when a fire breaks out. However, matters become different in the case of services that need to be accessed simultaneously from different nodes, such as with hospitals, a town hall, shopping centers, and so forth. This is where closeness comes into play. In those cases, we want to place a service conveniently close to as many nodes as possible, which is clearly a different criterion than minimizing the maximum distance that needs to be traveled.

For $G_{simple}$ we find the following values for the closeness of its vertices. Although vertex 7 forms the center of $G_{simple}$, it is not the vertex closest to all others, which is vertex 1.

Vertex:                  1       2       3       4       5       6       7
$\sum d(u, \cdot)$:      21      22      27      32      24      37      29
$c_C(u)$:                0.048   0.045   0.037   0.031   0.042   0.027   0.034

Note that comparing closeness between vertices of different graphs may not be very useful. For example, when considering unweighted graphs, we see that the closeness of a vertex decreases as the graph consists of more vertices. For this reason, comparing the closeness of vertices is useful only relative to a given graph.

Vertex centrality and closeness are both related to the reachability of a vertex, and as such may indeed indicate the importance of a vertex. However, we have also seen another type of important vertices, namely cut vertices, whose removal actually partitions a graph. One can argue that such vertices form the center of a graph. Based on this observation, notably researchers in the social sciences have introduced what is referred to as betweenness. The basic idea is simple: if a vertex lies on many shortest paths connecting two other vertices, it is an important vertex. The reasoning is that the removal of such a vertex will directly influence the cost of the connectivity between other vertices, as other (i.e., longer) shortest paths will have to be followed. Formally, we have:

Definition 6.22: Let $G$ be a simple, (strongly) connected graph. Let $S(x,y)$ be the set of shortest paths between two vertices $x, y \in V(G)$, and $S(x,u,y) \subseteq S(x,y)$ the ones that pass through vertex $u \in V(G)$. The betweenness centrality $c_B(u)$ of vertex $u$ is defined as

$$c_B(u) = \sum_{x \neq y} \frac{|S(x,u,y)|}{|S(x,y)|}$$

Note that because $G$ is (strongly) connected, $|S(x,y)| > 0$ for all pairs of distinct vertices $x$ and $y$.

In the following chapters we will apply these and other metrics to specific types of graphs. As we'll see, more metrics can be defined to differentiate and characterize graphs, but many of these metrics are more easily explained and motivated given the specific context in which graphs and networks are used to model real-world situations.

CHAPTER 7
RANDOM NETWORKS

Up to this point we have largely covered the core of traditional graph theory. This core contains material that is mainly related to well-structured graphs, often of limited size, which is used for the type of applications we have discussed in the previous chapters. We now turn our attention to another type of graph for which the theoretical foundations were laid down in the late 1950s by Paul Erdős and Alfréd Rényi, namely graphs that were constructed by randomly adding edges. The field remained somewhat esoteric until the turn of the century, when scientists began to discover that many natural phenomena could be described in terms of random graphs. This eventually led to a boost in research on what have been coined complex networks, research that is found in a myriad of fields, ranging from neurology to traffic management to communication networks. Not without reason, this research is often referred to as the new science of networks [Barabási, 2002; Buchanan, 2002; Watts, 2004; Lewis, 2009].

In this chapter, we will take a first look at these random graphs (or random networks, as they are more often called). It is also here that this book starts deviating from more traditional texts on graph theory.

7.1 Introduction

Intuitively, a random network is a (simple, connected) graph $G$ in which pairs of vertices are connected with some probability. In general, this means that we start with a collection of $n$ vertices and for each of the $\binom{n}{2}$ possible edges, we add edge $\langle u, v \rangle$ with some probability $p_{uv}$. In the simplest case, $p_{uv}$ is the same for every pair of distinct vertices $u$ and $v$.

Initially driven by curiosity only, random networks are now considered to be important for the simple reason that they allow us to model many real-world phenomena:

Spatial systems: In many cases, real-world networks have a spatial dimension in the sense that there is some notion of distance between nodes. Examples include railway networks, airline networks, computer networks, electricity networks, and neural networks. Modeling such networks as graphs implies that we need to let the probability of adding an edge be dependent on the distance between nodes in the real network: the larger the distance between two nodes, the smaller the probability of attaching them in the corresponding random graph. As it turns out, if we take this spatial dimension into account, along with some other properties that we discuss later on, random graphs can be used to accurately model real-world spatial networks.

Food webs: A food web (also called a food chain) describes the feeding relationships between organisms, that is, who eats whom. Obviously, we can model food webs as directed graphs. In particular, it turns out that in order to get insight into the resilience of ecosystems in terms of the extinction of species, modeling food webs as random graphs is an appropriate technique. Unlike many other real-world networks, food webs are generally relatively small (in the order of tens to a few hundreds of nodes). Also, there is controversy regarding their structure, which deviates from some of the more well-known random graphs (see, e.g., Dunne et al. [2002]). In this case, using techniques from network analysis as introduced in Chapter 6 and the theory of random graphs allows us to better understand the nature of food webs.

Collaboration networks: An important class of networks is formed by various collaborations between human beings. Famous is the analysis of networks of movie actors, formed by creating a graph of actors, linking any two who have ever played in the same movie (see also [Watts, 1999]). Likewise, it turned out that modeling collaborations between scientists using network analysis techniques and random-graph theory has provided insight into how science is formed. In particular, there is a body of work on citation networks, reflecting which scientific articles are cited in other articles. Such networks provide insight into the influence of published work.

In order to study real-world networks, it is necessary that we delve into the properties of random networks. These properties can be explained using the terminology that has been introduced so far, and form the basis for proper network analysis.
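The construction described at the start of this section, in which each of the $\binom{n}{2}$ possible edges is added independently with the same probability $p$, can be sketched as follows. The function name `random_graph` and the representation of the result as a set of vertex pairs are assumptions made for this illustration. The sketch makes no attempt to guarantee that the resulting graph is connected.

```python
import random
from itertools import combinations


def random_graph(n, p, seed=None):
    """Return the edge set of a random simple graph on vertices 0..n-1.

    Every unordered pair {u, v} becomes an edge independently with probability p.
    The optional seed only makes a run reproducible.
    """
    rng = random.Random(seed)
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}


def degrees(n, edges):
    """Vertex degrees of the graph with vertices 0..n-1 and the given edge set."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = random_graph(100, 0.1, seed=42)
deg = degrees(100, edges)
```

Because the edges are drawn independently, two runs with different seeds will in general produce different graphs on the same vertex set, even though $n$ and $p$ are identical.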
7.2 Classical random networks

As mentioned, random networks have been studied for several decades. Paul Erdős and Alfréd Rényi introduced what are now known as "classical" random networks, or Erdős-Rényi networks [Erdős and Rényi, 1959]. The basic idea is that we consider a simple, connected graph on $n$ vertices, and that every two vertices are adjacent with some probability $p$. Erdős and Rényi introduced two different types of random graphs.

Definition 7.1: An Erdős-Rényi model of a random network on $n$ vertices, also referred to as an ER random graph, is an undirected graph $G_{n,p}$ in which each two (distinct) vertices are connected by an edge with probability $p$. For a given number $M$ of edges, the ER random graph $G_{n,M}$ is an undirected graph in which each of the $M$ edges is incident to a randomly chosen pair of vertices.

It is important to note that two graphs $G^1_{n,p}$ and $G^2_{n,p}$ may be very different. Although they will both have $n$ vertices, because an edge $e = \langle u, v \rangle$ between two vertices $u$ and $v$ exists only with a probability $p$, it may well be that $e \in E(G^1_{n,p})$, yet that $e \notin E(G^2_{n,p})$. In this light, we use the notation $ER(n,p)$ to denote the set of all ER random graphs with $n$ vertices and probability $p$ that two distinct vertices are joined.

Note that an $ER(n,p)$ graph is simple: there are no loops and there is at most one edge between two distinct vertices. In contrast, the formal definition of a $G_{n,M}$ random graph allows loops and multiple edges, but in practice we often see that they are restricted to their simple counterparts. In this book, we concentrate exclusively on $ER(n,p)$ graphs.

Degree distribution

Let us first see what we can expect when considering vertex degrees. For each vertex $u$ of an $ER(n,p)$ graph, we know that there are at most $n-1$ other vertices to which it can be connected. Let $P[\delta(u) = k]$ denote the probability that the degree of vertex $u$ is $k$. Because there are a maximum of $n-1$ other vertices that can be a neighbor of $u$, it should be clear that there are $\binom{n-1}{k}$ possibilities for choosing $k$ different vertices to be adjacent to $u$. The probability of having $u$ joined with exactly $k$ given other vertices (and thus not with the remaining $n-1-k$ vertices) is equal to $p^k (1-p)^{n-1-k}$, so that

$$P[\delta(u) = k] = \binom{n-1}{k} p^k (1-p)^{n-1-k}$$

Note that our reasoning for the degree distribution of $u$ applies to all vertices of an $ER(n,p)$ graph. Formally, this means that we can treat the vertex degree as a random variable $\delta$, for which we have just shown that it follows what is known as a binomial distribution. In line with this observation, we can speak of the probability that a vertex degree has value $k$, and write $P[\delta = k]$.

Note 7.1 (Mathematical language)
Probability and stochastics play an important role in random-graph theory, although we shall consider only a few concepts. The notion of a random variable is important. Intuitively, it is a variable whose values can each occur with a certain probability. In the case of discrete random variables, there are only a finite number of possible values. This is the case, for example, when considering the possible vertex degrees in an $ER(n,p)$ graph. Throughout this book, we consider only discrete random variables.

To characterize a (discrete) random variable $X$, we need to consider all its possible values. A simple example is where $X$ denotes the possible outcomes of flipping a coin, for which there are only two possible outcomes: head or tail. Normally, each of these values can occur with equal probability, which is expressed as:

$$P[X = \text{head}] = P[X = \text{tail}] = \frac{1}{2}$$

Likewise, we can treat the vertex degree of an $ER(n,p)$ graph as a random variable $\delta$ with possible outcomes any value from the set $\{0, 1, \ldots, n-1\}$, and subsequently compute the probabilities $P[\delta = k]$ for $0 \le k$ [...] 1 called the scaling exponent. In mathematical terms, $P[k] \propto k^{-\alpha}$.
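For most real-world scale-free networks, it turns out that 2 [...] 0:

1. Add a new vertex $v_s$ to $V_{s-1}$ (i.e., $V_s \leftarrow V_{s-1} \cup \{v_s\}$).

2. Add $m \le n_0$ edges to the graph, each edge being incident with $v_s$ and a vertex $u$ from $V_{s-1}$ chosen with probability

$$P[\text{select } u] = \frac{\delta(u)}{\sum_{w \in V_{s-1}} \delta(w)}$$

that is, the probability of choosing a vertex $u$ is proportional to the current vertex degree of $u$. Vertex $u$ must not have been previously chosen during this step.

3. Stop when $n$ vertices have been added, otherwise repeat the previous two steps.

The resulting graph is called a Barabási-Albert random graph, or simply a BA graph. We also refer to a $BA(n, n_0, m)$ graph.

Obviously, after $t$ steps we will have a graph with $t + n_0$ vertices and $tm + |E(G_0)|$ edges. (Note that $G_0$ may have no edges to begin with.) Barabási and Albert [1999] show that for this model, the probability $P[k]$ that an arbitrary vertex $v$ has degree $k$ is proportional to $k^{-3}$.

Note 7.4 (More information)
To get a better grasp on why the degree distribution of a BA graph is proportional to $k^{-3}$, we adopt the notations and approach as found in [Vega-Redondo, 2007] (and which were originally introduced by Dorogovtsev et al. [2000]). Formally, we have the following:

Theorem 7.5: For any $BA(n, n_0, m)$ graph $G$, the probability that vertex $v \in V(G)$ has degree $k \ge m$ is given by:

$$P[k] = \frac{2m(m+1)}{k(k+1)(k+2)} \propto \frac{1}{k^3}$$

Proof (*). Let $q_t(s,k)$ denote the probability that at step $t$ vertex $v_s$ has degree $k$ (with $s$ [...] $s$, certainly when $s$ is large. This means that we can simply compute $P_t[k]$ as

$$P_t[k] = q_t(\cdot, k) = \frac{1}{t} \sum_{s=1}^{t} q_t(s,k)$$

Using expression (7.1) we need to distinguish two cases. First, if $k > m$ we know that the added vertex $v_t$ does not belong to the set of vertices with degree $k$, so that we have

$$\sum_{s=1}^{t} q_t(s,k) = \frac{k-1}{2(t-1)} \sum_{s=1}^{t-1} q_{t-1}(s,k-1) + \Big(1 - \frac{k}{2(t-1)}\Big) \sum_{s=1}^{t-1} q_{t-1}(s,k)$$

However, if $k = m$, it must be the case that $v_t$ is in this set as well. In other words,

$$\sum_{s=1}^{t} q_t(s,m) = \frac{m-1}{2(t-1)} \sum_{s=1}^{t-1} q_{t-1}(s,m-1) + \Big(1 - \frac{m}{2(t-1)}\Big) \sum_{s=1}^{t-1} q_{t-1}(s,m) + 1$$

To keep matters simple, we will first concentrate on the situation that $k > m$. We are seeking to express $q_t(s,k)$ in terms of the probability $P_t[k]$. By straightforward algebraic manipulation, we obtain the following:

$$\begin{aligned} \sum_{s=1}^{t} q_t(s,k) &= \frac{1}{2}(k-1) \frac{1}{t-1} \sum_{s=1}^{t-1} q_{t-1}(s,k-1) - \frac{1}{2} k \frac{1}{t-1} \sum_{s=1}^{t-1} q_{t-1}(s,k) + (t-1) \frac{1}{t-1} \sum_{s=1}^{t-1} q_{t-1}(s,k) \\ &= \frac{1}{2}(k-1) P_{t-1}[k-1] - \frac{1}{2} k P_{t-1}[k] + (t-1) P_{t-1}[k] \end{aligned}$$

Knowing that

$$\sum_{s=1}^{t} q_t(s,k) = t \cdot \frac{1}{t} \sum_{s=1}^{t} q_t(s,k) = t\, P_t[k]$$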
and $\lim_{t \to \infty} P_t[k] = P[k]$, we find

$$(k+2) P[k] - (k-1) P[k-1] = 0$$

When we consider the special case $k = m$, we apply exactly the same algebraic manipulations to find that

$$(m+2) P[m] - (m-1) P[m-1] = 2$$

Of course, $P[m-1] = 0$, as there can be no vertex with a degree lower than $m$, which means that $P[m] = 2/(m+2)$. We now have:

$$P[k] = \frac{k-1}{k+2} P[k-1] = \frac{k-1}{k+2} \cdot \frac{k-2}{k+1} P[k-2] = \cdots = \frac{m(m+1)(m+2)}{k(k+1)(k+2)} P[m]$$

Substituting $P[m]$ in this equation gives us

$$P[k] = \frac{2m(m+1)}{k(k+1)(k+2)}$$

which completes the proof.

It should be mentioned that the BA model is not the only way to construct scale-free networks. One particularly interesting extension to the Barabási-Albert model is the following:

Algorithm 7.3 (Generalized Barabási-Albert): Consider a small graph $G_0$ with $n_0$ vertices $V_0$ and no edges. At each step $s > 0$:

1. Add a new vertex $v_s$ to $V_{s-1}$.

2. Add $m \le n_0$ edges, each edge being incident to $v_s$ and a vertex $u$ from $V_{s-1}$ chosen with probability proportional to its current degree $\delta(u)$ (and not previously chosen in this step).

3. For some constant $c \ge 0$ add another $c \cdot m$ edges between vertices from $V_{s-1}$, where the probability of adding an edge between vertices $u$ and $w$ is proportional to the product $\delta(u) \cdot \delta(w)$, and under the condition that $\langle u, w \rangle$ does not yet exist.

4. Stop when $n$ vertices have been added.

As shown by Dorogovtsev et al. [2003], the resulting graph corresponds to a scale-free network for which the vertex-degree distribution satisfies

$$P[k] \propto k^{-\left(2 + \frac{1}{1+2c}\right)}$$
In other words, for $c = 0$ we have a BA graph, but for increasing values of $c$, the exponent converges to 2.

Properties of scale-free networks

As may have become clear by now, formal analysis of random networks is generally far from trivial. This is certainly also true for the scale-free networks discussed previously. The consequence of this observation is that in order to gain insight into the properties of scale-free networks, we simply need to apply the network analysis tools from Chapter 6 and see what we can learn from experiments.

Let us first consider the clustering coefficient. As we have seen for ER random graphs, the clustering coefficient can be expressed independently of the size of the graph. For Watts-Strogatz graphs, we have shown that the clustering coefficient is large and stays almost the same even for relatively large rewiring probabilities. More important is that for Watts-Strogatz graphs the clustering coefficient is independent of the number of vertices.

The situation for scale-free networks is more complicated. In fact, an analytical expression that estimates the clustering coefficient for general scale-free networks has not yet been found. Fronczak et al. [2003] considered the situation for BA random graphs and, in particular, looked at the clustering coefficient $cc(v_s)$ of vertex $v_s$ after $t$ steps had taken place in the construction of a BA($t$, $n_0$, $m$) graph (of course, $s \le t$). They find:

$$cc(v_s) = \frac{m-1}{8\left(\sqrt{t} + \sqrt{s}/m\right)^2}\left(\ln^2(t) + \frac{4m}{(m-1)^2}\ln^2(s)\right)$$

When evaluating this somewhat ghastly expression for fixed values of $m$ and $t$, yet varying $s$, we obtain low clustering coefficients, as shown in Figure 7.10. To see how these values compare to those of an ER random graph, we consider an ER random graph with the same number of vertices and the same average vertex degree. We first compute the average vertex degree of a BA($n$, $n_0$, $m$) graph for very large $n$. As proven in Note 7.4, the degree distribution for a BA graph is given by

$$P[k] = \frac{2m(m+1)}{k(k+1)(k+2)}$$

Taking exactly the same approach as for computing the average vertex degree for an ER random graph, the average vertex degree for a BA random graph can be computed as:

$$\overline{\delta}(G) = E[k] = \sum_{k=m}^{\infty} k\,P[k] = 2m(m+1)\sum_{k=m}^{\infty}\frac{k}{k(k+1)(k+2)} = 2m$$
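The growth process of a BA graph and the closed-form degree distribution can also be explored numerically. The following Python sketch is illustrative only: the function names `ba_degrees` and `ba_degree_pmf` are made up for this example, and it assumes that $G_0$ is a complete graph on $n_0$ vertices (so that every initial degree is positive), which is one possible choice and not necessarily the one intended in the text. It grows a graph by degree-proportional attachment and compares the observed degree fractions with $P[k]$.

```python
import random


def ba_degrees(n, n0, m, seed=None):
    """Grow a BA-style graph and return the list of vertex degrees.

    Assumption (not from the text): G0 is a complete graph on n0 vertices.
    n is the number of vertices added on top of G0.
    """
    if n0 < 2 or not 1 <= m <= n0:
        raise ValueError("need n0 >= 2 and 1 <= m <= n0")
    rng = random.Random(seed)
    degree = [n0 - 1] * n0
    # Every vertex occurs in `endpoints` once per incident edge, so a uniform
    # draw from this list selects u with probability degree(u) / sum of degrees.
    endpoints = [v for v in range(n0) for _ in range(n0 - 1)]
    for _ in range(n):
        new = len(degree)
        targets = set()
        while len(targets) < m:  # redraw until m distinct vertices are chosen
            targets.add(rng.choice(endpoints))
        degree.append(m)
        for u in sorted(targets):
            degree[u] += 1
            endpoints.extend((u, new))
    return degree


def ba_degree_pmf(k, m):
    """Limiting distribution P[k] = 2m(m+1) / (k(k+1)(k+2)) for k >= m."""
    return 2 * m * (m + 1) / (k * (k + 1) * (k + 2)) if k >= m else 0.0


if __name__ == "__main__":
    m = 3
    degree = ba_degrees(n=20000, n0=5, m=m, seed=42)
    print("average degree:", sum(degree) / len(degree))
    for k in range(m, m + 5):
        observed = degree.count(k) / len(degree)
        print(k, round(observed, 4), round(ba_degree_pmf(k, m), 4))
```

Keeping one list entry per edge endpoint is a common way to draw vertices proportionally to their degree without recomputing the normalizing sum at every step. Because the added vertices contribute exactly $2m$ to the degree sum per step, the printed average degree approaches $2m$ as the graph grows.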
Figure 7.10: The clustering coefficient for vertices $v_s$ in a BA(100000, $n_0$, 8) graph. (The plot shows the clustering coefficient against vertex $v_s$.)

(We leave it as an exercise to the reader to show that this computation of $E[k]$ is indeed correct.) For a vertex of an ER random graph, we know that $cc(v) = p$ and that $\delta(v) = p(n-1)$. In other words, to get the same average vertex degree for an ER random graph as that of a BA graph, we need to take $p$ equal to $2m/(n-1)$. For our example from Figure 7.10, we then find that $cc(v) = 16/99999 \approx 0.00016$. This means that, roughly speaking, the clustering coefficient in BA graphs is an order of magnitude higher than that of ER graphs, yet it remains relatively small. Considering that many real-world networks combine scale-freeness and high clustering, it is clear that BA graphs do not form an adequate model of real life. We return to this issue shortly.

What about average path lengths? Fronczak et al. [2004] derive the following estimation of the average path length for a BA($n$, $n_0$, $m$) random graph:

$$\overline{d}(BA) = \frac{\ln(n) - \ln(m/2) - 1 - \gamma}{\ln(\ln(n)) + \ln(m/2)} + 1.5$$

where $\gamma$ is the Euler constant, which we also came across when estimating the average path length for ER random graphs. To get a better idea of what this estimation means, we can make a comparison with ER random graphs. To this end, consider ER and BA random graphs having the same average vertex degree, and compare their respective average path lengths. The result for $\overline{\delta} = 10$ is shown in Figure 7.11 (again using a linear and a logarithmic scale for the x-axis).

What is illustrated in this figure is that BA graphs systematically tend to have a much lower average path length than ER random graphs. Considering that the average path length for random graphs is already very low, this is a somewhat remarkable result. On the other hand, unlike ER random graphs, we are now dealing with graphs containing hubs: vertices
with high degrees, essentially acting as intermediates between other, less well-connected vertices. For example, one may expect that the eccentricity of a hub is relatively low: a hub is simply close to every vertex. But this also means that most vertices can easily reach each other by means of a path containing a hub. We are thus dealing with what are also called super small worlds.

Figure 7.11: Comparing the average path length of ER and BA random graphs with the same average vertex degree on (a) a linear plot and (b) a log-linear plot. (Both panels plot the average path length against the number of vertices, with one curve for the ER graph and one for the BA graph.)

And although being able to reach another vertex in only a few steps is a nice property of a large graph, the hubs do form a potential bottleneck. In communication networks, they would generally need to process a lot of transient traffic. Worse is that they may also be vulnerable to attacks. Intuitively, systematically disabling hubs should quickly partition a network into several disjoint components, a highly undesirable situation.

To illustrate these matters, Figure 7.12 shows what happens when we systematically remove hubs from a scale-free graph in comparison to removing the best-connected vertices from an ER random graph. We also show the effect of removing randomly selected vertices from a scale-free graph (which is very similar to randomly removing vertices from an ER graph). A scale-free network is thus seen to be sensitive to a targeted attack, but just as robust as an ER random graph in the case of a random attack.

Figure 7.12: The fraction of vertices outside the giant component when removing hubs from a scale-free graph, and those from an ER random graph. (The plot shows the fraction outside the giant cluster against the fraction of removed vertices, with curves labeled "Scale-free network", "Random network", and "Scale-free network, random removal".)

Related networks

As we mentioned, the Barabási-Albert approach for constructing a scale-free graph has one important shortcoming when comparing it to real-world networks: its relatively low clustering coefficient. A better understanding of real-world phenomena should normally be reflected by better models, and in this sense a BA random graph is difficult to validate against much real-world data. Therefore, researchers have been seeking solutions for constructing scale-free graphs that have a high clustering coefficient.

As argued by Dorogovtsev et al. [2003], constructing such graphs is actually quite simple. The trick is to make sure that there are many triangles. This can be achieved, for example, by adding an edge to a triple at each step of the growing process. (Recall that a triple is a subgraph with 3 vertices and 2 edges.)

Holme and Kim [2002] provide a scheme that combines scale-freeness with the ability to tune the extent to which clustering is provided. Their algorithm proceeds as follows:

Algorithm 7.4 (Barabási-Albert with tunable clustering): Consider a small graph $G_0$ with $n_0$ vertices $V_0$ and no edges. At each step $s > 0$:

1. Add a new vertex $v_s$ to $V_{s-1}$.
2. Select a vertex $u$ from $V_{s-1}$ that is not adjacent to $v_s$ and with a probability proportional to its degree $\delta(u)$. Add edge $\langle v_s, u \rangle$. Add the remaining $m - 1$ edges as follows:
   a) If $m - 1$ edges have been added, continue with Step 3. Otherwise, proceed with the next step.
   b) With probability $q$: select a vertex $w$ that is adjacent to $u$, but not to $v_s$. If no such vertex exists, continue with Step 2c. Otherwise, add edge $\langle v_s, w \rangle$ and continue with Step 2a.
   c) Select a vertex $u'$ from $V_{s-1}$ that is not adjacent to $v_s$ and with a probability proportional to its degree $\delta(u')$. Add edge $\langle v_s, u' \rangle$ and set $u \leftarrow u'$. Continue with Step 2a.
3. Stop when $n$ vertices have been added, otherwise repeat from Step 1.

Figure 7.13: The subgraph in which a newly added vertex is contained when attaching to vertex $u$. (The diagram shows $v_s$, $u$, and the neighbors $w_1, w_2, w_3, \ldots, w_k$ of $u$.)

What happens in this approach is that with probability $q$ we explicitly construct a triangle between the newly added vertex $v_s$, the vertex $u$ to which it attaches, and one of $u$'s neighbors $w$. Intuitively, it should be clear that we are more or less controlling the clustering coefficient of vertex $v_s$. For example, if we choose $q = 1$, and under the assumption that $u$ has $k \ge m$ neighbors $w_1, w_2, \ldots, w_k$, $v_s$ will connect to $u$ as shown in Figure 7.13. From Chapter 6, where we examined the situation that none of the vertices $w_i$ were adjacent to each other, we know that the clustering coefficient for $u$ and $v_s$ is high (and it will grow if edges $\langle w_i, w_j \rangle$ exist).

Holme and Kim [2002] show that their approach yields graphs in which the distribution of the vertex degree follows a power law with scaling exponent $\alpha = 3$. Although they do not derive an analytical expression for the clustering coefficient, experiments show that by varying $q$, clustering can easily be varied between the value observed for pure BA random graphs and high values such as 0.5.

CHAPTER 8
MODERN COMPUTER NETWORKS

Modern life is difficult to imagine without the Internet. What started in the late 1960s as a simple network of a handful of computers has now grown into an immensely complex communication infrastructure with hundreds of millions of computers, and it continues to grow. The Internet as a computer network is often taken to be the same as the World Wide Web (or simply the Web), yet they are fundamentally different.

In this chapter we will start by taking a look at computer networks, in particular the Internet. Second, we'll dive a bit into what are known as overlay networks. These networks are characterized by the fact that an (often very large) group of computers maintain their own communication network and as such form a special type of subnetwork using the Internet as their foundation. Third, we'll pay attention to the World Wide Web and explain where and how it differs from the Internet.

8.1 The Internet

The Internet as a communication network consists of a huge collection of computers connected to each other. The organization of the Internet essentially follows a hierarchical structure consisting of home networks, computer networks in organizations, networks that are owned by Internet Service Providers, and backbone networks, among other types of computer networks. They are all connected together, often using the same infrastructure as used for telephony. Connections may occur through guided media (i.e., wires), but we are increasingly seeing wireless connections for communication as well. In addition, the communication devices vary tremendously: ultra-small networked sensors, smart phones, laptop computers and workstations, servers, routers, and supercomputers.

One may wonder how it is even possible to say anything sensible about the structure of the Internet. To answer this question, let's first consider some of the basics and then move on to the phenomenon of interconnected networks.

Computer networks

Small-area networks

There are different ways of characterizing networks, but one that is convenient for our discussion here is simply looking at the physical diameter of a computer network. Typically, networks that span areas up to at most, say, a few hundred meters are characterized by a relatively high density of networked computers, also referred to as hosts. Hosts send packets to each other through the network that connects them. These networks differ from ones that span large areas, in the sense that routing plays a less prominent role. Routing a packet from a source host A to its destination host B means that the packet is required to follow a communication path from A to B. Typically, such paths are set up using one of the shortest path algorithms we discussed in Chapter 5. Without going into further details, setting up or finding a route in a small-area network is relatively easy. Moreover, these small-area networks are generally owned and managed by a single administrative organization.

To get an impression of what we're dealing with, Figure 8.1 shows the typical organization of a small-area network. Such a network consists of several local-area networks, or LANs, each typically being a collection of 10-100 computers connected by means of what is known as a switch. The switch ensures that a packet addressed to one of its connected computers is forwarded to that computer.

Figure 8.1: A typical example of a small-area network, consisting of a collection of connected local-area networks. (Diagram elements: LAN 1, LAN 2, and LAN 3, switches, routers including router R1, a server group, a security gateway, a firewall, and the Internet.)

Addresses

LANs can be connected to each other by directly connecting their respective switches, effectively leading to a larger LAN. In addition, it is common practice to connect LANs through internal routers, which we will explain shortly. What is important for our discussion is that each networked host has an address. Having an address allows us to send data packets from one
host to another. If we concentrate on the most common case for modern networks, there are two types of addresses we need to distinguish. First, each host has a world-wide unique identifier in the form of a 48-bit number. This so-called MAC address comes with the host when it is manufactured (or, more precisely, is associated with a host's network hardware). When a host is connected to a port of a switch (see Figure 8.2), the switch can automatically discover the host's MAC address to subsequently uniquely associate the specific port with that address. As a consequence, when a host with MAC address MA1 (connected to port P1) requests a packet to be forwarded to host MA2 (connected to port P2), the switch uses the port identifiers to forward the packet from port P1 to P2, and thus implicitly from address MA1 to address MA2.

Figure 8.2: A 16-port switch as used in local-area networks. (Each port leads to or from a host.)

More important, however, is the fact that a host can be assigned an IP address, where IP stands for Internet Protocol. Unlike a MAC address, which is persistent, meaning that it cannot be changed, an IP address needs to be explicitly assigned when a host is connected to a network. Address assignment can be done manually or automatically, and can be done statically or dynamically. For example, in some cases a separate address assignment service is used to hand out IP addresses with an associated lease time. When a lease expires, the host will need to get a new IP address.[1]

A host with IP address IA1 normally uses that address to send a packet to a destination, say a host with IP address IA2. In contrast to MAC addresses, an IP address can be used to truly route packets through a communication network. In this case, routers are represented as the nodes of such a network, and physical links between routers as its edges. In essence, whenever a host wants to send a packet, it needs to make sure that the packet gets to a router, which will then take care of the rest. To this end, it simply sends the packet using the MAC address of a locally accessible router as its destination. From there on, it's the router's job to forward the packet toward its destination.

[1] The mechanism just described is generally implemented by means of a so-called DHCP server, where DHCP stands for Dynamic Host Configuration Protocol.
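The association between MAC addresses and switch ports described above can be pictured as a simple lookup table. The following Python sketch is a toy model under stated assumptions: the class `Switch` and its methods `connect` and `forward` are hypothetical names introduced only for illustration, and the behavior for unknown addresses is deliberately left out because the text does not describe it.

```python
class Switch:
    """Toy model of a LAN switch that associates MAC addresses with ports."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.port_of = {}  # MAC address -> port number

    def connect(self, port, mac):
        """Record that the host with address `mac` is attached to `port`."""
        if not 1 <= port <= self.num_ports:
            raise ValueError("no such port")
        self.port_of[mac] = port

    def forward(self, src_mac, dst_mac):
        """Return (in_port, out_port) for a packet from src_mac to dst_mac.

        Returns None if either address is unknown to this switch; how a real
        switch handles that case is not modelled here.
        """
        if src_mac not in self.port_of or dst_mac not in self.port_of:
            return None
        return self.port_of[src_mac], self.port_of[dst_mac]


if __name__ == "__main__":
    switch = Switch(num_ports=16)
    switch.connect(1, "MA1")
    switch.connect(2, "MA2")
    print(switch.forward("MA1", "MA2"))
```

With hosts MA1 and MA2 attached to ports 1 and 2, forwarding from MA1 to MA2 maps to the port pair (1, 2), which mirrors the P1-to-P2 forwarding in the example above.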
Figure 8.3: The structure of an IP address, consisting of a network identifier and a host identifier (32 bits in total).

To avoid routers having to discover routes to every individual host, a simple aggregation takes place by splitting an IP address into two parts: a network identifier and a host identifier, as shown in Figure 8.3. In the following we will not distinguish among the different types of IP addresses and consider only the ones that are made up of 32-bit numbers. We assume that 16 bits have been reserved for the network identifier and 16 for the host identifier. This means that there can be at most $2^{16} = 65{,}536$ different networks, each having at most $2^{16}$ hosts. Whenever a company wants to create a network, it needs to be assigned one or several network identifiers. These identifiers are assigned by a global organization, and will therefore need to be requested. Stepping over many practical matters, in our example network from Figure 8.1, we would need at least three network identifiers: one for the server group, one for LAN #1, and one for the connected LANs #2 and #3.

When taking routing decisions, a router considers only the network identifier and completely ignores the host identifier. So, for example, when router R1 from Figure 8.1 receives a packet addressed to a host on LAN #2, it only takes a look at the network identifier in that address and subsequently forwards the packet to the switch of LAN #3, which will then take over the responsibility of getting that packet to its destination.

It turned out that the total number of available network identifiers in the Internet was not enough to support its growth. Therefore, alternative schemes and technical solutions are being used to ensure that each host can be assigned an IP address. Nevertheless, the basic approach just described, namely that each host is addressed by means of a pair of identifiers, has been left unaltered. This observation is important as routers take decisions on where to forward packets using only network identifiers.

Other small-area networks

Besides these small-area networks, there are two other types of networks worth mentioning. The first one is formed by home networks, which typically consist of one to several end-user computers, along with networked devices such as set-top boxes for digital TV, Internet-enabled telephones,
and multimedia centers. These types of networks are growing fast in terms of what they offer to end users. Typically, we are seeing that many domestic appliances are becoming network aware, if only to smoothly regulate energy consumption. In addition, many home networks facilitate installation of sensors for monitoring purposes (think of burglar alarm systems, networked smoke and fire detectors, surveillance cameras, and so forth). A home network generally has only a single IP address associated with it, which is subsequently shared between all the devices. It is beyond the scope of this text to explain how this sharing is realized. What is important is that a home network from the outside is often indistinguishable from a single networked computer: both have a globally unique IP address.

Secondly, there are also (wireless) access networks, whose sole purpose is to allow devices to connect to the Internet. Typically, access networks support wireless connection setups to mobile devices. When making use of such a network, a device is usually provided with a dynamically assigned IP address whose network identifier is inherited from the access network. By keeping track of which device was assigned which IP address, packets are routed to the access network, from where a router or switch can forward the packet to its destination.

Large-area networks

Small-area networks form what is known as the edge of the Internet: networks beyond which packets are no longer forwarded. In practice, we see these small-area networks being connected to larger networks owned by organizations who make it their business to provide many end users and organizations access to the Internet, or which offer the services to transmit packets across the Internet. These Internet Service Providers, or simply ISPs, generally span much larger geographical areas than small-area networks. In contrast to the small-area networks discussed previously, routing plays an important role.

The smallest large-area networks consist of the access networks we just discussed (and in this sense, there is usually not a clear-cut distinction between small and large-area networks). Examples include modern wireless access networks that span a whole neighborhood or even a city. In addition, there are many local ISPs that not only provide Internet access, but also basic services such as e-mail.

These so-called tier-3 networks have what is known as a peering relationship with tier-2 networks. A peering relationship between networks N1 and N2 may occur when N1 has a router that is connected through a direct link with a router of N2. Such routers are also known as border gateways, as they allow for traffic to flow into and from the network, that is, they operate at the border of a network. Tier-2 networks are often connected to other tier-2 networks, allowing packets to cross larger areas. As said, routing plays a prominent role in these cases. Regional ISPs, such as those covering a (small) country, are typical examples of tier-2 networks.

Finally, we distinguish tier-1 networks, which provide the backbone of the Internet. End users usually never connect directly to tier-1 networks. Instead, these backbones provide services and routing capabilities only to tier-2 networks. Note that there may be several tier-1 networks operating in the same area. This allows regional ISPs to choose which network they will make use of. In fact, ISPs may change their peering relationships without end users even noticing.

Measuring the topology of the Internet

All of the networks we discussed so far are usually each managed by a separate administrative unit. This is certainly the case for large-area networks. For small-area networks, we often see that the networks are still managed separately (as is typically the case for corporate local-area networks), or management is partly delegated to end users (as with home networks). Roughly speaking, a collection of networks that fall under the regime of the same administration and that follow the same policy regarding how to route packets is known as an autonomous system, or simply AS. By connecting autonomous systems, we essentially obtain the structure of the Internet. In other words, the Internet can be represented as a graph where a vertex represents an autonomous system, and an edge the fact that two autonomous systems have a peering relationship. As of this writing, there are more than 25,000 autonomous systems.

The AS topology

Discovering what is known as the AS topology of the Internet is on the surface relatively easy, provided certain details are not taken into account (and which we will indeed skip for now). Each autonomous system is assigned a unique number called its AS number. Note that this assignment is done through a central authority, as is the case for assigning network addresses.

Each AS announces which networks fall under its regime by essentially advertising $\langle$AS number, network identifier$\rangle$ pairs. Such announcements are made by the AS's border gateways discussed previously, and are picked up by the respective neighboring border gateway of an adjacent AS. As an example, assume that AS1 manages a network with identifier nid. A border gateway connecting AS1 to AS2 may send the pair $\langle$AS1, nid$\rangle$ to AS2. At that point, AS2 will have discovered a route to network nid. AS2, in turn, may advertise this information to its own neighbors, in which case it would send the tuple $\langle$AS2, AS1, nid$\rangle$ to its neighbors.

You may have noticed that this approach toward discovering routes is essentially the same as the one applied in the Bellman-Ford algorithm we discussed in Chapter 5. And indeed, the core of the so-called Border Gateway Protocol (BGP), which is deployed for discovering routes between autonomous systems, is exactly this routing algorithm. However, instead of only reporting distances, BGP requires that an AS advertises the complete path it found to a destination. This information will allow a recipient to decide whether it will actually use that path for routing packets. Generally, a gateway will keep only information on its discovered shortest path to a network. Information on paths that are longer is simply discarded. What we are thus seeing is that (1) border gateways learn about shortest paths to networks in other autonomous systems, and (2) advertise this information to their neighbors, allowing each, in turn, to discover paths to those networks as well.

With over 25,000 autonomous systems, each having many networks to which packets must be routed, it is clear that the information that must be stored at a border gateway can be huge. In principle, each gateway is required to have an entry for every discovered network. Even when using many sophisticated techniques to combine routing information, a border gateway is currently required to store close to 300,000 entries. These entries are exclusively used to decide to which next AS an incoming packet is to be routed. In addition, every gateway stores information on well over 800,000 routes. In principle, those routes cover all paths between networks in the Internet.

With this in mind, it may now be clear how we can discover the AS topology of the Internet: we simply retrieve the routing tables from border gateways in order to collect as many routes as possible. Of course, this is much easier said than done. As explained by Huston [2006], many ASes use multiple AS numbers, resulting in approximately twice as many observed ASes as there are in reality. In addition, an AS may decide not to advertise a link to one of its neighbors because it simply doesn't want to support traffic of other ASes over that link. In other words, there may be a connection between two border gateways from different ASes, but this is not reflected in BGP routing tables. Another source of errors is the dynamicity of the tables: when a link is temporarily out of order, it may not show up in routing tables.

As an aside, the discovery of the AS topology brings up an important scientific question, namely to what extent our input data accurately represents what we are trying to model. We will return to this issue when we discuss how to construct a graph model of the Web.

A snapshot of the AS topology

Essentially using the method just described, Chi et al. [2008] have collected data on how autonomous systems link to each other. Taking a single snapshot from October 2008, we obtain a network consisting of over 30,000 vertices and more than 100,000 edges. Figure 8.4 illustrates that we are apparently dealing with a scale-free network, although the data points do not quite fit a straight line.

Figure 8.4: The degree distribution of the AS topology using BGP router data. Both the x-axis (node ID, ranked according to degree) and the y-axis (vertex degree) are scaled logarithmically.

There are a number of interesting points to observe about this topology. First, it may be somewhat surprising to see how well connected some of the autonomous systems are. If we consider the degrees of the top-10 ASes, we find the following:

Rank:    1     2     3     4     5     6     7     8     9     10
Degree:  3309  2371  2232  2162  1816  1512  1273  1180  1029  1012

Not only do we see that the top AS is connected to more than 10% of all other ASes, we can also observe that this type of connectedness drops rapidly, as one would expect from a scale-free network. As we discussed before, such a degree distribution may have a serious adverse effect on the robustness of the network, in the sense that a targeted attack by which we remove well-connected nodes may easily lead to partitioning the network.

Haddadi et al. [2008] have analyzed other properties of the AS topology found from BGP routers. Not only did they find high clustering coefficients for the top 1000 nodes, these nodes are also connected to each other, forming an almost complete graph. In line with these observations is the distribution of shortest paths: most paths are no longer than three or four hops, and virtually all ASes are separated by a shortest path of maximum length six. Again, we see the small-world phenomenon occur in the network of autonomous systems.

Note 8.1 (More information)
Unfortunately, just taking a snapshot of the AS topology may not provide enough information on what is going on. There are two problems that need to be addressed. The first one is caused by the fact that even with data from a large number of BGP routers, one can never be sure to have captured all existing peering relationships between autonomous systems. In fact, it turns out that finding the actual links at a given time may indeed be very difficult.

The second problem has to do with the fact that large real-world networks are in continuous flux: links and nodes may appear to come and go all the time due to intermittent failures, making it more difficult to identify truly new peering relationships or those that have been discontinued.

Consequently, when we're interested in identifying the real topology of the AS network, we need to do a bit more than just analyze a few snapshots. We will not go into further details here, but refer the interested reader to Chi et al. [2008] and Raz and Cohen [2006]. The latter provide evidence that more than 30% of the existing links are missing from the AS topologies derived from BGP routers. In fact, Oliveira et al. [2008] argue that only the observed links between the autonomous systems for tier-1 networks are reasonably accurate. For tier-3 networks, the topology derived from BGP routing information is argued to be highly incomplete.

8.2 Peer-to-peer overlay networks

As will have become clear by now, the Internet is simply huge. In practice, we see that the Internet is used as a universal platform for a wide variety of applications. Perhaps the best-known application is the Web, which we will discuss in Section 8.3. In many cases, Internet applications are organized according to what is known as a client-server architecture. In this case, the core of an application is hosted by a special computer, known as a server. The rest of the application consists of a program hosted on a so-called client computer. This client program can send a request to the server, where it is processed, after which the server sends a reply back to the client. A well-known example of this client-server architecture is actually the Web: the client program is formed by a Web browser; the server is the computer maintaining a specific Web site.

A client-server architecture can be represented by a simple graph in which the clients and the server are represented by vertices, and where each client vertex is joined with the vertex representing the server, as shown in Figure 8.5.

Figure 8.5: Representing a client-server architecture as a graph. In this example, there are four clients.

Although it would seem that the server in a client-server architecture may easily become a performance bottleneck, you need to realize that clients come and go quickly. In most cases, a client merely sends a request to the server, the server processes that request, and subsequently sends an answer back. After that, the client and server each go their own way. In graph-theoretical terms, the edge between a client and server will eventually be broken again.

Nevertheless, in case we are dealing with requests that require substantial server processing time, or when responses require returning huge amounts of data, servers can indeed become a bottleneck because they can only process a limited number of requests per time unit. It is beyond the scope of this text to go into these matters in more detail. See Tanenbaum and van Steen [2007] for more information.

Since the late 1990s, researchers have been exploring alternative architectures to address scalability problems for large, distributed applications whose constituents are spread across the Internet. In principle, each constituent, called a peer, consists of a program that is being executed on a single computer. Each peer maintains a list, called a partial view, of other peers that form part of the distributed application. This partial view has the sole purpose of allowing for the exchange of application-specific data between two peers. If we were to represent such a distributed application as a graph, each peer would be represented
by a vertex and an edge would represent the fact that two peers have each other in their respective partial views.

Taking all these peers and their respective partial views into account leads to what is known as a (peer-to-peer) overlay network: a communication network between the constituents comprising a distributed application.

Structured overlay networks

One important type of overlay network is formed by networks that are organized in a structured fashion. In particular, the partial view of each peer is filled with references to very specific peers as opposed to having a partial view with references to randomly chosen peers. We will discuss the latter type in the following section.

The Chord peer-to-peer network

To make these matters concrete, let's consider the Chord peer-to-peer network [Stoica et al., 2003]. The principle behind Chord is relatively simple, which is also the reason why we'll use it to explain structured peer-to-peer networks. A survey of other, similar systems is provided by Lua et al. [2005].

Chord is a distributed application that can be used to efficiently store and locate data across a huge collection of hosts. Each host is required to have a unique identifier, represented by an $m$-bit number. Typically, $m = 128$, meaning that there can be as many as $2^{128} \approx 3.4 \cdot 10^{38}$ identifiers. That's enough to fill every square millimeter of land on Earth with more than $2 \cdot 10^{18}$ hosts. It should suffice for a while. In practice, this means that when a host needs to join a Chord network, it can simply generate its own random identifier without running any serious risk that some other host has generated the same identifier.

A host in a Chord network is assumed to store data. To keep matters simple, we assume that data is stored in a file, with each file having a unique key. Like host identifiers, each key is an $m$-bit number. The fundamental principle in Chord is that the file with key $k$ is stored on the host with the smallest identifier $id$ greater than or equal to $k$. Computing whether $id \ge k$ is done in modulo $M$ arithmetic, where $M = 2^m$.

Note 8.2 (Mathematical language)
Recall that modulo $M$ arithmetic is applied to integer numbers, mapping all numbers to values between 0 and $M - 1$. A common notation is $k \bmod M$. So, with $M = 32$, we would have:

k      k mod 32
4      4
31     31
32     0
-5     27
-31    1

To illustrate, consider a Chord network with $m = 5$, meaning that $M = 2^5 = 32$. Suppose we have peers (i.e., hosts) with identifiers 1, 4, 9, 11, 14, 18, 20, 21, and 28. It is convenient to represent this system as a ring, as shown in Figure 8.6. We simply denote the peer with identifier $p$ as peer $p$. The actual peers are shown as gray-colored circles; the rest of the unused identifiers are represented by dashed circles. As shown, the peer with $id = 1$ will be responsible for storing files with key 29, 30, 31, 0, and 1, respectively. Indeed, in modulo $M$ arithmetic we may have that $1 \ge 31$.

Figure 8.6: The representation of a Chord network as a ring. (Annotations in the figure: peer 1 stores files with keys 29, 30, 31, 0, 1; peer 9 stores files with keys 5, 6, 7, 8, 9; peer 21 stores the file with key 21.)

The peer responsible for storing a file with key $k$ is called the successor of $k$:

Definition 8.1: Consider a file with key $k$. In a Chord peer-to-peer network, the peer with the smallest identifier $p \ge k$ is called the successor of $k$, denoted as succ($k$).

Perhaps a bit confusing, but it is important to note that if $p = k$, succ($k$) $= p$.

Central to the design of Chord is efficiently looking up data by means of keys. A naive way of doing a lookup is as follows. Assume that the peer with identifier $p$ (i.e., peer $p$) is requested to look up a file with key $k$. If $p <$ [...] $> k$, peer $p$ can still simply forward the request to its left-hand neighbor, until a peer $q$ is found with the smallest identifier $q \ge k$. It is not difficult to see that this search strategy would, on average, require that a request is forwarded $\frac{1}{2}n$ times, where $n$ is the total number of peers. If $n = 10{,}000$, a lookup would need about 5,000 forwarding steps on average, which is far too slow to be practical.

A much more efficient approach is to let every peer store "shortcuts" to other peers at increasingly longer distances. These shortcuts are stored in
a peer's partial view, which is called a finger table in Chord. Each finger table $FT_p$ of peer $p$ consists of $m$ entries, numbered $1, 2, \ldots, m$, and denoted as $FT_p[1], \ldots, FT_p[m]$. Entry $i$ contains the successor of key $p + 2^{i-1}$:

$$FT_p[i] = \text{succ}(p + 2^{i-1})$$

In other words, entry $i$ contains a shortcut to the peer responsible for key $p + 2^{i-1}$. The finger tables for our example Chord network from Figure 8.6 are shown in Figure 8.7.

Figure 8.7: Finger tables for the peers from Figure 8.6. (Each table lists, for entry $i$, the peer succ($p + 2^{i-1}$).)

Let's check a few of these finger tables:

• Consider $FT_4 = [9, 9, 9, 14, 20]$. $FT_4[1]$ should contain succ($4 + 2^{1-1}$) = succ(5). The peer responsible for key 5 is indeed 9. The same holds for $FT_4[2]$ = succ($4 + 2^{2-1}$) = succ(6) and $FT_4[3]$ = succ($4 + 2^{3-1}$) = succ(8). Likewise, with $FT_4[4]$ = succ($4 + 2^{4-1}$) = succ(12), the responsible peer for key 12 is indeed peer 14. Finally, $FT_4[5]$ = succ($4 + 2^{5-1}$) = succ(20), which brings us to peer 20.

• For peer 21, we have $FT_{21} = [28, 28, 28, 1, 9]$. For the first three entries, we are seeking the successor peers for 21 + 1, 21 + 2, and 21 + 4, respectively, which is indeed peer 28. $FT_{21}[4]$ = succ(21 + 8) = succ(29), for which peer 1 is responsible. Finally, $FT_{21}[5]$ = succ(21 + 16) = succ(37). Because we need to apply modulo 32 arithmetic, we find that $FT_{21}[5]$ = succ(37 mod 32) = succ(5), which leads us to peer 9.

It is now not hard to imagine how an arbitrary peer $p$ receiving a request to look up key $k$ proceeds: it looks in its finger table to identify peer $q$ satisfying $q = FT_p[i] \le k$ [...] $2^{i-1}$, but at the same time $|q - z| \le 2^i - 2^{i-1} = 2^{i-1}$. In other words, $q$ lies numerically closer to $z$ than to $p$, i.e., $d_M(p,q) > d_M(q,z)$ where $M = 2^m$. This also means that $d_M(q,z) < \frac{1}{2} d_M(p,z)$. The latter observation is important,
7.57.06.56.0Averagepathlength5.52468101214161820Networksize(x1000)Figure8.11:TheaveragepathlengthforaseriesofChordnetworkswithm=28andincreasingnumberpeers.foritmeansthateachtimearequestisforwarded,thedistancetozmeasuredaccordingtometricdMisatleasthalved.2Whatdoesthismeanafterhavingforwardedtherequest2log(n)timesand2reaching,say,peerr?Consideringthatwehalfthedistancetozineverystep,12log(n)2log(n)log(n2)2wewillhaveatotalreductionof(2)2=22=22=1/n.Thedistancebetweenpandkwillbeatmost2m,meaningthatin2log(n)steps,2thedistancebetweenrandzwillbeatmost2m/n2.Becauseweareassumingthatpeeridentifiersandkeysaredrawnuniformlyatrandom,theprobabilitythatwehavechosenapeeridentifierfromanintervaloflengthLforann-peerChordnetwork,isequaltonL/2m.Inotherwords,theprobabilitythatthereispeerwithanidentifierbetweenkandr,isequalton(2m/n2)/2m=1/n,whichisnegligibleforlargen.Weconcludethatthenumberofpeersthatneedtobecontactedbeforeresolvingalookuprequestisproportionaltolog(n).2LetustakealookatsomeotherpropertiesofChordnetworks,startingwiththedegreedistribution.Becausewearedealingwithadirectedgraph,weshouldmakeadistinctionbetweenthedistributionindegreesandout-degrees.ConsideraChordnetworkwithn=10000peersandusingm-bitidentifiers.Figure8.12showsthehistogramsfortheindegreesaswellastheoutdegrees.Whenitcomestotheindegrees,thedistributionseemstofollowanexponentialcurve(note,however,thatwearenotdealingwithapower-lawdistribution).Thisalsomeansthatthereareafewpeerswithmanyincomingarcs,inturn,meaningthattheymayneedtoprocessmanylookuprequests.Theoutdegreesaremoreorlesssymmetricallycentered203 
around 13.5. As argued by Stoica et al. [2003], each finger table will have only approximately $\log_2(n)$ unique entries, which in our example comes down to $\log_2(10{,}000) = 13.3$.

Figure 8.12: The distributions of indegrees and outdegrees for a Chord network with n = 10000 peers using 28-bit identifiers. (Two histograms of occurrences, one against indegree and one against outdegree.)

What about the clustering coefficient? To keep matters simple, we drop the orientation of a Chord network and compute the clustering coefficient of the corresponding undirected graph for various network sizes. The result is shown in Figure 8.13. First, compared to an Erdős-Rényi random graph, we see that the clustering coefficient is very high. Moreover, the clustering coefficient decreases only slowly when the network grows. Combined with the fact that the average path length is low, we may indeed conclude that Chord networks constitute small-world networks.

Random overlay networks

Processes in distributed applications such as Chord apply strict rules for maintaining partial views, effectively leading to a well-structured overlay. In contrast, in the case of random overlay networks, also referred to as unstructured peer-to-peer networks, the goal is to keep a high degree of randomness in the partial view. In other words, the goal is to let entries refer to seemingly randomly chosen peers. In this section, we will take a closer look at a class of random overlay networks that are constructed through what is known as gossiping.
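As a brief aside before turning to gossiping, the finger tables checked earlier for the Chord ring with peers 1, 4, 9, 11, 14, 18, 20, 21, and 28 can be recomputed with a few lines of Python. This is only a sketch under stated assumptions: the function names `succ` and `finger_table` are chosen here for illustration, and m = 5 is inferred from the modulo 32 arithmetic used in the example.

```python
import math
from bisect import bisect_left

M_BITS = 5                                   # assumption: identifiers live in [0, 2**5) = [0, 32)
PEERS = [1, 4, 9, 11, 14, 18, 20, 21, 28]    # peer identifiers of the example ring, sorted

def succ(k, peers=PEERS, m=M_BITS):
    """Peer with the smallest identifier p >= k, wrapping around the ring."""
    k %= 2 ** m                              # keys are taken modulo 2^m
    idx = bisect_left(peers, k)              # index of the first peer with identifier >= k
    return peers[idx % len(peers)]           # past the last peer: wrap to the first one

def finger_table(p, peers=PEERS, m=M_BITS):
    """FT_p[i] = succ(p + 2^(i-1)) for i = 1..m, returned as a 0-indexed list."""
    return [succ(p + 2 ** (i - 1), peers, m) for i in range(1, m + 1)]

print(finger_table(4))                           # [9, 9, 9, 14, 20]
print(finger_table(21))                          # [28, 28, 28, 1, 9]
print([k for k in range(32) if succ(k) == 1])    # keys stored at peer 1: [0, 1, 29, 30, 31]
print(round(math.log2(10_000), 1))               # 13.3, the estimate of unique finger-table entries
```

The wrap-around in `succ` is what makes peer 1 responsible for keys 29, 30, and 31; a binary search over the sorted peer list is merely a convenient choice for this sketch and says nothing about how a real Chord node locates successors.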
Figure 8.13: The clustering coefficient of Chord networks of various sizes using 28-bit identifiers.

A framework for epidemic-based networks

As said, many unstructured peer-to-peer networks maintain an overlay that resembles a random graph. There are numerous ways to do this, and in many cases we see that this maintenance is done using centralized components. In other words, special central servers are used to assist in maintaining some form of randomness in the overlay network. A fully decentralized approach can be achieved by making use of what are known as epidemic protocols. In an epidemic protocol, a peer (again, meaning a host) chooses another peer uniformly at random to exchange data with. It’s as simple as that.

More formally, we have the following. Consider a collection of peers $P = \{p_1, p_2, \ldots, p_n\}$, each capable of storing a potentially very large collection of files. Each file f has a version number v(f) telling how often the file has changed. To keep matters simple, we assume that each file has exactly one associated peer own(f) that is allowed to change that file. Let v(f, p) denote the version of file f currently stored at peer p, and FS(p) the set of files stored at p. If f is not stored at peer p, then v(f, p) = 0. It should be obvious that

$$\forall f, p : v(f, \mathrm{own}(f)) \ge v(f, p)$$

The principal goal of an epidemic protocol is to make sure that every update to a file is disseminated to all peers. To this end, each peer $p \in P$ periodically chooses uniformly at random another peer $q \in P$, and proceeds as follows (we use the notation “f@p” to denote the file f as stored at peer p):

1. for all $f \in FS(p)$: if v(f, p) > v(f, q), then $FS(q) \leftarrow FS(q) \cup \{f@p\}$, possibly replacing an older version of f that was stored at q.

2. for all $f \in FS(q)$: if v(f, p) < [...]

[...] can be used to emphasize a piece of text on a display. Most important for our purposes is that a Web page can contain a reference to another page, such as:

[...] main page [...]

which tells a browser that if that reference is activated (e.g., by clicking with a mouse pointer on the text “main page” shown on the display), it should fetch the page named www.distributed-systems.net/main.html. Life would be so much simpler if all references were as explicit as in this example. Unfortunately, discovering how Web pages are linked to each other turns out to be a bit more complicated. To understand why this is the case, we need to delve into how the Web structure is actually measured.

A crucial tool for discovering Web structure is a so-called crawler: a program that automatically fetches pages that are referenced from a given page. The basic principle of a crawler is shown in Figure 8.21. Starting from a set of seed pages, it processes a page by extracting the references to other pages. Each of these references is appended to a list, called the frontier, reflecting the pages that have been found but not yet inspected. When a page has been processed, it is stored in a local repository.
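The crawler principle just described can be sketched on an in-memory toy Web, so that no network access is needed. Everything named here is hypothetical: `TOY_WEB`, `crawl`, and the `fetch_links` callback stand in for real page fetching and reference extraction. The `seen` set, which keeps a reference from being appended to the frontier twice, is an addition of this sketch and is not part of the description above.

```python
from collections import deque

# Hypothetical toy Web: page name -> pages it references.
TOY_WEB = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["A"],
    "D": [],
    "E": ["A"],          # E refers to A, but nothing refers to E
}

def crawl(seeds, fetch_links):
    """Breadth-first crawl; returns the fetch order and the arcs that were found."""
    frontier = deque(dict.fromkeys(seeds))   # found but not yet inspected, without duplicates
    seen = set(frontier)                     # assumption of this sketch: queue each page once
    order, arcs = [], []
    while frontier:
        page = frontier.popleft()            # remove the reference at the head of the frontier
        order.append(page)                   # "fetch" the page
        for ref in fetch_links(page):        # extract its references
            arcs.append((page, ref))         # one arc per reference in the Web graph
            if ref not in seen:
                seen.add(ref)
                frontier.append(ref)         # append to the tail of the frontier
    return order, arcs

order, arcs = crawl(["A"], lambda page: TOY_WEB.get(page, []))
print(order)   # ['A', 'B', 'C', 'D']
print(arcs)
```

Page E is never fetched, because no page reachable from the seed refers to it: only pages reachable from the seeds can be discovered this way.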
Figure 8.21: The principal operation of a Web crawler. (Flowchart: seed document(s) feed the frontier; remove reference from head of list; fetch page; extract references and append to frontier; store page.)

After having processed the seed pages, the crawler removes the reference that is at the head of the frontier and fetches the referenced page. It then simply extracts the references again, appending each of them to the frontier, after which the page is stored locally. It should be clear that in this way, one should indeed be able to fetch and store all pages that are reachable from the seed pages. That the repository for crawling and searching needs to be huge is exemplified by Google’s approach. It has been estimated that by 2006, Google used approximately 500,000 servers, spread across the Internet (see also Barroso et al. [2003]). However, if we are interested only in discovering the topology of the Web, pages obviously need not be stored. In that case, we need “merely” build up a directed graph in which each vertex represents a fetched page, and every reference is represented by an arc.

As explained by Thelwall [2004] and Liu [2007], there are several difficulties that need to be dealt with. First, modern Web pages are no longer simple documents formatted in HTML. Instead, they may consist of different parts, some of which are complete programs (written in, for example, JavaScript). Finding references in such documents can be close to impossible, certainly if their creators have deliberately applied techniques to obfuscate references. Obscuring references is sometimes done on purpose to prevent Web pages from being indexed.

Second, many Web pages nowadays are not stored statically in file systems at a server’s site, but are instead constructed and composed dynamically from a database query that is effectively part of the HTTP request. The problem is aggravated when the server is using programs to completely generate pages to be returned to the requesting client. As a consequence,
we see that many references in the returned page are often personalized (i.e., based on specific information associated with the client), but also that the same request may return different pages (i.e., pages are also dependent on when they were requested). Conceptually, this means that the graph that represents the Web of pages that refer to each other changes not only because edges are different all the time, but also because vertices effectively often exist only once and then disappear again for good.

Third, and related to dynamic Web pages, crawlers need to be aware of spider traps. In this case, the references returned to a crawler depend on the order in which the crawler has visited pages from a given site. It may thus happen that when a crawler has fetched page A and discovered a reference to page B, the server hosting B generates a reference $r_A$ to page A again that is contained in B, but that is interpreted by the crawler as a new reference (i.e., it fails to recognize that $r_A$ refers to A, which it had already analyzed).

Finally, Web sites may simply install special files that are required to be read by all crawlers and which specify exactly which parts of the Web site are not to be inspected by crawlers. Although there is nothing that prevents a crawler from still inspecting those parts, when such behavior is discovered, an administrator will most likely block any traffic from the site from which the crawler is operating.

Sampling the Web topology

There are other issues that make Web page discovery difficult, but one in particular is important when focusing on discovering topologies. It will come as no surprise that being able to fetch all Web pages, and thus building an accurate Web graph, is practically impossible. By the end of 2008, the number of Web pages that had been discovered and indexed by search engines (also referred to as the surface Web) was estimated to be approximately 25 billion (i.e., $25 \times 10^9$). The actual size of the Web is likely an order of magnitude larger. Therefore, to get an impression of any network statistics regarding the Web graph, we are forced to consider only a sample. In other words, to discover certain properties of the Web graph we necessarily need to resort to collecting a subgraph. The question is how to make sure that such a subgraph is representative for the structure of the entire Web graph.

To this end, Becchetti et al. [2006] made a comparison between several crawling strategies. Note that when a crawler collects pages, it appends the references it finds to the frontier. This opens up several alternatives for inspecting next pages. In Figure 8.21 we suggested that pages are fetched from the head of the frontier. This is one common strategy, which leads to what is known as a breadth-first inspection. What happens is that first all seed pages are inspected. When this is completed, the crawler inspects the pages that are directly linked from the seed pages, that is, at distance 1. Subsequently, the pages at distance 2 from the seed pages are inspected, and so on.

An alternative approach is not to select the head of the frontier, but to randomly select a reference from the frontier each time a new page is to be inspected. Also, one can take the popularity of a page into account, for example by considering the number of pages that are known to point to it (i.e., the indegree of a page). This latter strategy is closely related to the strategy followed by Google to determine the importance of a Web page, known as PageRank [Brin and Page, 1998].

An important conclusion from the study by Becchetti et al. is that breadth-first inspection of pages leads to reasonable subgraphs, provided that these graphs by themselves are relatively large. For many of their network statistics, it turned out that a subgraph had to contain approximately 50% of the original set of vertices in order to produce representative results. This is actually quite a dramatic result, as it seems to imply that obtaining a representative sample of the Web may turn out to be extremely difficult.

And indeed, a recent study by Serrano et al. [2007] shows that there may be significant differences between various samples. Before we go into details, let us first consider some important structural properties of a Web subgraph. By the latter, we mean a graph that has been obtained by crawling a substantial number of Web pages and subsequently representing the pages and links between them as a directed graph.

In their famous study of two crawls of the AltaVista search engine comprising a set of over 200 million pages and 1.5 billion links, Broder et al. [2000] suggested representing the Web as the bowtie shown in Figure 8.22. An interesting aspect of their study was that their sample most likely covered close to 16% of the surface Web at that time, which may be argued to be large enough to be considered representative. Broder et al. made a distinction between the following groups of Web pages:

SCC: The Strongly Connected Component (SCC) consists of a group of Web pages of which the corresponding directed graph is strongly connected. In other words, between any pair of vertices there exists a directed path from one vertex to the other.

IN: This group of pages cannot be reached from any page in the SCC, but the SCC can be reached from pages in IN. More formally, for every vertex $v \in$ IN and $w \in$ SCC, there exists a directed (v, w)-path but no directed (w, v)-path.

OUT: Pages in OUT can be reached from the SCC, but are not part of the SCC. In particular, this means that for any vertex $v \in$ OUT and $w \in$ SCC, there exists a directed (w, v)-path, but no (v, w)-path.

TENDRILS: A tendril is a collection of pages connected to either IN or OUT, but whose pages do not belong to either IN, OUT, or SCC. For example, a tendril TEN connected to IN consists of pages that can be reached from one or more pages in IN, but any path from a page $v \in$ IN to a page in TEN will never lead to a page in SCC. Note that a tendril itself may form a strongly connected component. Furthermore, it may very well be the case that certain tendrils can be reached from a page in IN, but also offer a path to a page in OUT, while none of the pages in that tendril belong to SCC. In this case, the tendril is called a tube.

DISCONNECTED: This group consists of pages that cannot be reached from any of the other four groups. Typically, these pages are never found when crawling the Web. Alternatively, if a crawler starts from a disconnected page, it will never reach any page in IN, SCC, OUT, or a tendril.

Figure 8.22: The macroscopic structure of the Web [Broder et al., 2000]. (Labels in the figure: IN, 44 million nodes; SCC, 56 million nodes; OUT, 44 million nodes; tendrils; tube; disconnected components.)

Broder et al. found that there were approximately 44 million pages in each of IN, OUT, and the collection of all tendrils. The SCC consisted of roughly 56 million pages, and a total of close to 17 million pages were disconnected. If we were to consider this sample representative for the entire Web, it should be clear that any crawler can easily miss a substantial part of all available Web pages.
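To make the grouping into SCC, IN, and OUT concrete, the following sketch classifies the vertices of a small directed graph by forward and backward reachability from one vertex that is assumed to lie in the SCC. The function names, the toy arcs, and the catch-all group "OTHER" (tendrils, tubes, and disconnected pages together) are choices made for this illustration only; this is not the procedure used by Broder et al.

```python
def reachable(start, adj):
    """All vertices reachable from `start` by following the arcs in `adj`."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def bow_tie(arcs, core_vertex):
    """Split vertices into SCC, IN, OUT and OTHER, assuming core_vertex is in the SCC."""
    fwd, bwd, vertices = {}, {}, set()
    for u, v in arcs:
        fwd.setdefault(u, []).append(v)      # arc u -> v
        bwd.setdefault(v, []).append(u)      # the same arc, reversed
        vertices.update((u, v))
    down = reachable(core_vertex, fwd)       # vertices the core can reach
    up = reachable(core_vertex, bwd)         # vertices that can reach the core
    scc = down & up
    return {
        "SCC": scc,
        "IN": up - scc,
        "OUT": down - scc,
        "OTHER": vertices - up - down,       # tendrils, tubes, disconnected pages
    }

# Hypothetical toy graph: a-b-c form a cycle, i points into it, o hangs off it,
# t is a tendril attached to i, and x -> y is disconnected from the rest.
ARCS = [("a", "b"), ("b", "c"), ("c", "a"), ("i", "a"), ("c", "o"), ("i", "t"), ("x", "y")]
groups = bow_tie(ARCS, "a")
print(groups)
```

A crawl seeded at "o" or "t" in this toy graph would never discover the cycle a-b-c.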
For example, when the collection of seeds is drawn from OUT, or any of the tendrils, it will be impossible to reach SCC.

Returning to Serrano et al. [2007], these authors have shown that the selection of seed pages is important when it comes to finding the pages that matter. In fact, it turns out that even when considering very large samples, the ratio of pages in IN, OUT and SCC may vary widely. To give an idea of what we’re dealing with, Serrano et al. considered four different large samples, of which the characteristic properties are shown in Figure 8.23. In Figure 8.23(b) we visualize the relative differences between IN, SCC, and OUT, and compare them to the structure found earlier by Broder et al. The conclusion is clear: despite the fact that we may be sampling a very large part of the Web, it is difficult to conclude that the sample may be representative for the entire Web graph. Apparently, we have not yet found a valid technique for representative sampling (see also Cothey [2004]).

Component    Sample 1   Sample 2   Sample 3   Sample 4
SCC          56.46%     65.28%     85.87%     72.30%
IN           17.24%      1.69%      2.28%      0.03%
OUT          17.94%     31.88%     11.26%     27.64%
Other         8.36%      1.15%      0.59%      0.02%
Total size   80.57M     18.52M     49.30M     41.29M

(a)

(b) (Bar chart comparing AltaVista with samples 1, 2, 3, and 4.)

Figure 8.23: Comparing the relative sizes of IN, OUT, and SCC for different Web subgraphs. (a) The actual figures; (b) Relative comparison. From Serrano et al. [2007].

Characteristics of Web graphs

Let us now take a look at some of the properties of Web graphs. Various studies are based on the Stanford WebBase project [Cho et al., 2006], in which various crawls are being conducted and made available to the public.
Based on one such crawl, comprising more than 200 million pages, Donato et al. [2007] analyzed some of the characteristics of Web graphs.

As mentioned, Web graphs are directed: a hyperlink contained in page A referring to page B is naturally represented by an arc from vertex A to B. In the case of vertex degree distributions, it is important to make a distinction between indegrees and outdegrees. Figure 8.24 shows the indegree distribution of the Donato et al. WebBase crawl after removing the nodes with very low indegree. In this case, as we have done before, the nodes have been ranked in descending order according to their indegree. The y-axis shows the relative indegree, with the highest indegree labeled as “1.” We see that this curve again fits a power-law distribution quite reasonably, which has indeed been confirmed by Donato et al.

Figure 8.24: The distribution of indegrees of a WebBase crawl, as derived from [Donato et al., 2007]. (Axes: node ID, ranked according to degree, against relative indegree.)

It is interesting at this point to compare the actual indegree distribution with the PageRank algorithm that is used to distinguish important pages, i.e., pages that apparently contain much-wanted information. PageRank is used in Google and is based on indegrees. In particular, the rank of a page i is recursively defined as:

$$\mathrm{rank}(i) = (1 - d) + d \sum_{\langle j, i \rangle \in E} \frac{\mathrm{rank}(j)}{d_{out}(j)}$$

where $d \in [0, 1)$ is known as a damping factor. What we see is that the rank of page i is determined by the page rank of the pages referring to i. Intuitively, this means that a page is considered important not only if many other pages are referring to it, but notably when it is referred to by many other important pages. It is believed that for PageRank as used in Google, d = 0.85.
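The recursive definition of rank(i) can be evaluated by repeated substitution. The sketch below is one possible way to do so and makes no claim about how Google computes PageRank: the function name, the fixed number of iterations, the starting value 1.0, and the toy arcs are all assumptions made for illustration.

```python
def pagerank(arcs, d=0.85, iterations=200):
    """Iterate rank(i) = (1 - d) + d * sum over arcs (j, i) of rank(j) / d_out(j)."""
    vertices = sorted({v for arc in arcs for v in arc})
    out_deg = {v: 0 for v in vertices}
    incoming = {v: [] for v in vertices}
    for j, i in arcs:                        # arc <j, i>: page j refers to page i
        out_deg[j] += 1
        incoming[i].append(j)
    rank = {v: 1.0 for v in vertices}        # assumption: start every page at rank 1
    for _ in range(iterations):
        rank = {
            i: (1 - d) + d * sum(rank[j] / out_deg[j] for j in incoming[i])
            for i in vertices
        }
    return rank

# Hypothetical toy Web graph: A, B and C refer to each other in a cycle; D refers to A.
ARCS = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
ranks = pagerank(ARCS)
for page, value in ranks.items():
    print(page, round(value, 4))
```

Page D has no incoming arcs, so its rank settles at 1 - d = 0.15, while A ends up highest because it is referred to by both C and D.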
What the optimal value for d should be is unclear, but neither d = 0 nor d close to 1 produces good ranks [Boldi et al., 2005]. As it turns out, there is only a weak correlation between the rank of a page and its indegree [Pandurangan et al., 2006]. In other words, it is not necessarily the case that a page with a high rank also has a high indegree, and vice versa. On the other hand, several studies show that if we compute the distribution of PageRank values, we again find a power-law distribution with scaling exponent 2.1. Again, we are confronted with the difficulty of drawing strong conclusions on the structure of the Web graph, even when using apparently reasonable metrics and sampling techniques.

For the outdegree distribution we observe a very different behavior, as shown in Figure 8.25. There is no clear explanation why the outdegree does not fit a power-law distribution, but one possibility is that links to other pages need to be provided by the maintainers of Web pages. These maintainers may simply not have the patience (or the need) to include many hyperlinks in their pages.

Figure 8.25: The distribution of outdegrees of a WebBase crawl, as derived from [Donato et al., 2007]. (Axes: node ID, ranked according to degree, against relative outdegree.)

Let us now consider some other characteristics of Web graphs. In a study based on a simple Web crawl from 1998, Adamic [1999] constructed a graph by considering Web sites instead of pages. In particular, a graph was constructed in which vertex A has an arc to vertex B if there was a Web page hosted by site A that referred to a page hosted by B. In this way, a graph was constructed comprising roughly 150,000 vertices (after discarding leaf vertices, i.e., those having degree 1). For the underlying undirected graph, the average path length was estimated to be 3.1, while the clustering coefficient was found to be 0.1078. Clearly, we are dealing with a small-world network.

When considering the directed graph, the largest strongly connected
component (SCC) consisted of approximately 65,000 sites, which is of the same order as the Web graph examined by Broder et al. However, Adamic found an average shortest directed path length of 4.2, whereas Broder et al. found this to be equal to approximately 16. For the SCC of the latter, the average shortest path length in the underlying undirected graph was estimated to be 6.83. The difference between these observations may be caused by considering sites versus pages.

CHAPTER 9
SOCIAL NETWORKS

So far, our applications of graph theory have been taken from fairly technical communication networks. In these networks, the nodes are generally formed by computers or other devices. However, graph theory has also been extensively used to analyze social structures, also known as social networks. In a social network, a node represents a social entity, typically a person, an organization, and so on. An edge stands for a specific relationship between its incident nodes. In contrast to other areas in social sciences in which it is important to understand what characterizes social entities (e.g., by considering their attributes), social network analysis concentrates on the structure of relationships and tries to explain social phenomena from those structures. It should come as no surprise that graph theory plays a key role in social network analysis.

9.1 Social network analysis: introduction

Let us start our discussion with a motivating example to illustrate the applicability of social network analysis. We also briefly consider some historical background before delving into the specific metrics that are used to analyze social networks.

Examples

An illustrative example of how social network analysis can be effectively used is described in [Michael, 1997]. The example has also been used as a case study in de Nooy et al. [2005], from which we take the results of the analysis. The case is about a small wood-processing firm in which management proposed a new compensation package. This led to a strike, leading management to believe that the communication to the workers had been far from optimal. They decided to have the social network analyzed. To this end, the workers were asked to indicate how often and with whom they discussed the strike. Frequency was measured on a 5-point scale, leading to a graph in which two people were linked if they frequently talked to each other. This graph is shown in Figure 9.1.

There are a number of properties that can be derived from this graph and which can be explained when we take a closer look at the individual members. First, there are apparently three clusters. The smallest one is formed by four workers, namely Eduardo, Domingo, Carlos, and Alejandro. These workers all used Spanish as their first language. Of these, Alejandro was most proficient in English. In addition, Bob spoke some Spanish, which most likely contributes to the link with Alejandro. Another cluster is formed by Frank, Gill, Ike, Mike, Bob, Hal, John, Lanny, and Karl (all represented as a gray-colored vertex). It turned out that these workers formed a group of younger people, who did not speak that often with the older co-workers. The latter formed the third cluster, consisting of Norm, Ozzie, Paul, Sam, Wendle, Xavier, Vern, Ted, Utrecht, Russ, and Quint. This clustering reflects what is known in sociology as homophily: the tendency of people to maintain stronger relationships with those who are similar to themselves.

Figure 9.1: The relationship between workers on strike in a wood-processing firm. (Vertices are labeled with the workers’ names; Sam and Wendle are marked as union negotiators.)

The two union negotiators, Sam and Wendle, were initially responsible for proposing and opening the discussion on the new package. However, by taking a look at the network, it is not difficult to see that neither of them actually forms an ideal source for initiating communication. Intuitively, Bob and Norm, and to a certain extent also Alejandro, form the most important people in this network. And indeed, when management approached Bob and Norm directly to explain what the new package was all about, within only a short time all workers understood the deal and were willing to negotiate. The strike ended.

Let us consider another example, this time concentrating on the Medici family. This highly influential and powerful family originated from Florence, where Giovanni di Bicci created the Medici Bank, making him one of the wealthiest men of Florence. His son, Cosimo de’ Medici, continued along the same path as his father and is considered the founder of the Medici dynasty, a dynasty which lasted for approximately 200 years. Cosimo de’ Medici understood what it takes to get power and stay in power: make sure that the right people get married to each other. Padgett and Ansell [1993] analyzed the Medici dynasty during the first half of the 1400s, including an overview of marriages between the Medici and other families, leading to the social network as shown in Figure 9.2.

Figure 9.2: The relation between influential Florentine families in the beginning of the 15th century. (Vertices are labeled with family names, including Peruzzi, Bischeri, Guadagni, Pucci, Strozzi, Albizzi, Ridolfi, Barbadori, Medici, Ginori, Salviati, and Pazzi.)

Following Jackson [2008] we provide a simple analysis of this network. A serious and in-depth analysis of the actual social relationships is given by Padgett and Ansell [1993]. For our analysis it is interesting to note that the Strozzi family not only had more money, but were also better represented in the local legislature. Nevertheless, the Medici eventually became more powerful. Let’s see what a possible reason could be, by looking at the betweenness centrality. Recall that the betweenness centrality $c_B(u)$ of a vertex u is defined as

$$c_B(u) = \sum_{x \ne y} \frac{|S(x, u, y)|}{|S(x, y)|}$$

where S(x, u, y) is the collection of shortest (x, y) paths containing u, and S(x, y) is the set of shortest paths between vertices x and y. If we normalize $c_B(u)$ by the possible pairs of families that u can connect, i.e., by $(n - 1)(n - 2)/2$, one can compute that the betweenness centrality for the Medici is equal to 0.522, whereas this value is only 0.103 for the Strozzi. Phrasing this differently, the Medici were on more than 50% of all shortest paths in the network, whereas the Strozzi covered only 10%. Indeed, when it comes to exerting power, the Medici were seemingly in a much better position.

Historical background

Although social network analysis sometimes appears to be a novel discipline that recently emerged as another part of the science of networks, it has, in fact, long been a well-established area of research. Already in the beginning of the previous century, psychologists were using diagrams to represent relationships between social entities. An important contribution was made by Jacob Moreno, who introduced the sociogram in the 1930s. In
a sociogram, an individual is represented by a point, and relationships between individuals by lines: indeed, a graph. The importance of Moreno’s sociograms lies in the fact that he suggested that one could derive specific characteristics from sociograms, like identifying influential people, identifying flows of information, and so on. And indeed, they have proven to be a powerful tool for discovering structure in social groups. We will return to one specific use below.

With Moreno’s sociograms, the scene was set for further work in what is known as sociometry, which is all about quantitatively measuring social relationships. An important concept that arose was that of a triad. A triad is a subgraph of a sociogram consisting of three points that could be connected to each other. Obviously, triads are related to triangles, which we discussed in Chapter 6. Formally, the distinction between a triad and a triangle is that in the latter the three vertices are joined with each other. For a triad, this need not be the case. Triads became important for studying the presence and evolution of social subgroups. For example, Cartwright and Harary [1956] developed a theory on social balance in which they considered subgroups of at least three individuals, as shown in Figure 9.3.

Figure 9.3: A triad to be analyzed for social balance. (Three vertices A, B, and C, with each pair joined by an edge labeled +/−.)

In this particular case, the relationships between individuals were assumed to be symmetric: if Alice liked Bob, then Bob would also like Alice. If we represent “like each other” with a “+” and “dislike each other” with a “−,” we can speak of balanced and imbalanced triads as reflected in Figure 9.4. The important observation here is that a sociogram is used to analyze a social group as a whole by considering all its members’ perspectives on their relationships simultaneously. In other words, the focus is on discovering structures within the social group. In this way, one would be able to make statements about, for example, the stability or balance of an entire group, and to what extent one could expect that relationships would change (under the assumption that groups aim for balance). We will return to this phenomenon later in this chapter.

A-B  B-C  A-C  B/I  Description
 +    +    +    B   Everyone likes each other
 +    +    −    I   The dislike between A and C stresses the relation B has with either of them
 +    −    +    I   The dislike between B and C stresses the relation A has with either of them
 +    −    −    B   A and B like each other, and both dislike C
 −    +    +    I   The dislike between A and B stresses the relation C has with either of them
 −    +    −    B   B and C like each other, and both dislike A
 −    −    +    B   A and C like each other, and both dislike B
 −    −    −    I   Nobody likes each other

Figure 9.4: The possible balanced (B) or imbalanced (I) relations in a triad based on liking or disliking each other.

The idea of focusing on the discovery of global structures through the analysis of small-scale interactions, such as those occurring in triads, led to new analysis techniques. In particular, researchers became interested in being able to identify different subgroups. In terms of graphs, this meant that techniques needed to be developed that would allow the identification of components, yet allowing components to sometimes still be connected to each other. To illustrate, consider our example of the workers at the wood-processing firm again. Sociologists were interested in seeing which people actually formed groups within that community and were able to identify three of them, as mentioned before. These groups can be more easily visualized when considering the adjacency matrix of the associated network, as shown in Figure 9.5(a). For clarity, we omit the names of the workers. A cell (i, j) is colored black if workers i and j are linked to each other. By simply reordering the rows and columns, we obtain an equivalent matrix, shown in Figure 9.5(b). This last matrix reveals more strongly than the first one that there are indeed subgroups among the workers. Although we have only visualized group boundaries, formal methods will indeed reveal that such groups can be identified. What we have shown in Figure 9.5 is known as block modeling, which was one of the earlier techniques for identifying subgroups. More techniques were eventually developed to allow for sometimes sophisticated clustering of nodes (see also Porter et al. [2009]).

Figure 9.5: (a) The adjacency matrix of the network from Figure 9.1, and (b) the same matrix after reordering rows and columns. From [de Nooy et al., 2005].

It was not until the 1950s that researchers started talking more systematically about networks and started using graph-theoretical concepts to express structural aspects of networks. The relationship between sociograms and the more rigorous approach implied by the use of mathematics was thus gradually introduced. However, it would take at least another decade until the ties between social networks and mathematics had come to substantial strength. Of particular influence was the work by Mark Granovetter on what he called weak ties: links between different social clusters that proved to be essential for information dissemination, and thus for reaching out to groups other than one’s own [Granovetter, 1973]. Understanding Granovetter’s work required a mathematical approach to social networks.

Social network analysis has evolved steadily ever since, and many rigorous techniques have been developed. We have now reached a new point. As mentioned, sociologists developed various models on how groups of people organize themselves. One particularly famous one is the small-world organization, which we discussed in Chapter 7. The problem that researchers faced was how to validate those models: setting up sociological experiments with many participants is far from trivial, as Milgram experienced in the late 1960s (recall that we discussed Milgram’s experiments in Chapter 7). With online communities, researchers suddenly have tremendous sociological data sets in their hands. As we will also discuss in this chapter, we can apply similar analyses to these sets not only to validate models of how social networks evolve or how they are structured, but also to discover new properties that are inherently tied to the size of a network.

As argued by Kleinberg [2008], it is equally important that the analysis of these online social communities will perhaps put us in a much better position to devise large-scale distributed computer systems such as the fully decentralized peer-to-peer systems discussed in Chapter 8. We are already seeing better search strategies that are based on grouping peers by a notion of similarity, and many other phenomena related to social networking.
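The classification of triads in Figure 9.4 can be restated compactly: a triad is balanced exactly when the number of “−” relations is even, that is, when the product of the three signs is positive. The sketch below enumerates all eight sign combinations to illustrate this restatement; encoding “+” as +1 and “−” as -1, and the names `is_balanced` and `rows`, are choices made here for illustration.

```python
from itertools import product

def is_balanced(ab, bc, ac):
    """Signs are +1 (like each other) or -1 (dislike each other)."""
    return ab * bc * ac > 0          # balanced iff an even number of dislikes

# Enumerate the eight triads in the same order as the rows of Figure 9.4.
rows = [
    (ab, bc, ac, "B" if is_balanced(ab, bc, ac) else "I")
    for ab, bc, ac in product((+1, -1), repeat=3)
]
for ab, bc, ac, label in rows:
    signs = "  ".join("+" if s > 0 else "-" for s in (ab, bc, ac))
    print(signs, label)
```

Reading the last column from top to bottom gives B, I, I, B, I, B, B, I, which agrees with the B/I column of the figure.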
Sociograms in practice: a teacher’s aid

Let us consider an example of a sociogram. One particular use of sociograms is in classrooms, allowing a teacher to obtain better insight into the social structure of the class. In such cases, each child may be asked to list the three persons he or she likes the most (known as a positive nomination) or the least (i.e., negative nominations). An example is shown in Figure 9.6, which is based on material from Sherman [2000]. An entry (i, j) marked “+” indicates that child i liked child j, whereas a “−” indicates that i disliked j.

Figure 9.6: Data on the three most liked or disliked classmates. (Rows: the 24 children, identified by sex and ID; columns: the nominated child. Girls are #1, #3, #4, #5, #6, #8, #12, #13, #14, #16, #20, and #21; boys are #2, #7, #9, #10, #11, #15, #17, #18, #19, #22, #23, and #24.)

When considering only the positive nominations, we obtain the social network shown in Figure 9.7(a). In this case, boys are represented by black-colored vertices whereas girls are shown as white-colored vertices. We instantly see that the two groups are more or less separated: boys and girls each tend to form their own subgroup, as is further illustrated after reordering the adjacency matrix, shown in Figure 9.7(b).

Figure 9.7: (a) The sociogram for positive nominations represented as a directed graph. Boys are represented by black-colored vertices; girls by white-colored vertices. (b) After reordering the adjacency matrix, the two subgroups become more apparent.

There are other issues that make this an interesting case. For example, by simply considering the distribution of indegrees, one can get an impression of the position of certain children. In this case, we should also consider the negative nominations as given by Figure 9.6. We see that children #11 and #12 are very popular (having very high indegrees for the positive nominations), whereas #10 and #21 are very unpopular. There is much controversy regarding child #19 (and to a lesser extent #20), who received relatively many positive and negative nominations. There are also neglected children, namely those who are not mentioned at all (children #8 and #18).

Let us concentrate somewhat more on who is important and who is not by considering the largest strongly connected component of our classroom graph. This component consists of all children except #3, #6, #8, #10, #18, #21, #22, and #24. The eccentricity of a member was defined in Chapter 6 as the maximum distance of that member to any other member. For our subgroup, we obtain:

Child:          1    2    4    5    7    9   11   12
Eccentricity:   5    6    6    4    7    7    7    5

Child:         13   14   15   16   17   19   20   23
Eccentricity:   6    3    6    5    6    5    4    6

Interestingly, child #14 is closest to any other child, whereas the popular ones do not really stand out from the others. When reconsidering Figure 9.7(a), we can see that child #14 is one of the few children who nominated both a boy (#7) and a girl (#20). To see to what extent a child is close to every other member of the group, we compute the closeness values:

Child:     1      2      4      5      7      9     11     12
Close:   0.023  0.021  0.018  0.025  0.018  0.018  0.018  0.022

Child:    13     14     15     16     17     19     20     23
Close:   0.018  0.030  0.021  0.021  0.021  0.025  0.025  0.021

However, as we have argued before, closeness may not always be a good indicator of importance. For example, if child #14 was removed from the class, how harmful would that be for passing on information? In fact, it turns out that because #14 is really not that well connected, she also does not play a crucial role in these matters. Sociologists have introduced betweenness centrality as an indicator for importance. As explained before and in Chapter 6, this metric takes into account whether or not a vertex is lying on the shortest path between two other vertices. If we compute the betweenness centrality for each of our group members, we get the following values:

Child:           1      2      4      5      7      9     11     12
Betweenness:   0.140  0.153  0.050  0.105  0.083  0.007  0.155  0.220

Child:          13     14     15     16     17     19     20     23
Betweenness:   0.016  0.054  0.083  0.140  0.017  0.466  0.469  0.029
The results are interesting: without doubt children #19 and #20 play crucial roles when it comes to connecting the two groups of boys and girls, and thus in passing information between the two subgroups. Indeed, if we were to remove either one from the subgroup, it would fall apart in the sense that we would no longer have a strongly connected component.

9.2 Some basic concepts

Now that we have given an overview of social networks and a typical example of how they can be applied, let’s take a step further and consider a few of the more important concepts in social network analysis and how these concepts relate to the theoretical framework offered by graphs. In our discussion, we largely follow the structure as presented by Wasserman and Faust [1994].

Centrality and prestige

As we have mentioned, identifying important social entities forms a recurring topic in social network analysis. Up to this point we have introduced the following metrics to assist in finding those entities:

Vertex centrality: A metric that tells us to what extent a vertex is at the center of a graph, by considering its maximum distance to all other vertices. Vertices “at the edge” of the network are generally considered less influential than those at its center.

Closeness: This metric considers the centrality as measured by the distance to each other vertex in the graph. The higher the value, the closer a vertex is to every other vertex.

Betweenness centrality: This important metric defines centrality of a vertex u by considering the fraction of shortest paths that cross u. The more such paths, the more important u is to be considered.

All of these metrics should be considered with care, as we illustrated in the previous section with our classroom example. For instance, we saw that a popular person may not be the one that is most efficient for spreading information. Note further that these metrics can be defined for directed as well as undirected graphs, as they are all based on a notion of distance between vertices.

However, when considering directed graphs, it is useful to make a distinction between the distance to other nodes (as one would use for measuring centrality), and the distance from other nodes. In particular, if we want to indicate the prestige of a vertex u, counting how many other vertices
refer to u as a metric for prestige seems to make sense. In particular, we have:

Definition 9.1: Let D be a directed graph. The degree prestige $p_{deg}(v)$ of a vertex $v \in V(D)$ is defined as its indegree $\delta_{in}(v)$.

One can argue that degree prestige is a rather crude metric as it considers only direct relationships, namely the vertices that are adjacent to v. A more subtle way of measuring prestige is to also consider the vertices that can reach v through a directed path. In sociological terms, these vertices are called v’s influence domain. In that case, we can compute the average distance to vertex v of the vertices in its influence domain, leading to the following definition.

Definition 9.2: Let D be a directed graph with n vertices. The influence domain $R(v)$ is the set of vertices from where v can be reached through a directed path, that is,

$$R(v) \stackrel{\text{def}}{=} \{u \in V(D) \mid \text{there exists a } (u,v)\text{-path}\}.$$

The proximity prestige $p_{prox}(v)$ of a vertex v is defined as

$$p_{prox}(v) \stackrel{\text{def}}{=} \frac{|R(v)|/(n-1)}{\sum_{u \in R(v)} d(u,v)/|R(v)|}$$

where $d(u,v)$ denotes the length of the shortest (u,v)-path in D.

Note that for proximity prestige we consider (1) the fraction of all vertices that can influence v (and exclude v), i.e., $|R(v)|/(n-1)$, and (2) the average distance of those vertices to v.

Note 9.1 (Mathematical language)
The definition of proximity prestige may not be instantly obvious, for which reason it is important to make sure that you understand what it means. The definition is also a good example to illustrate the precision of mathematics over a more verbal explanation.

First, it is important to realize why we are considering the fraction of influential vertices, i.e., $|R(v)|/(n-1)$. In doing so, proximity prestige can be expressed independent of the size of a graph, which is obviously an advantage as it allows us to more easily compare different networks. It should also be clear why we divide $|R(v)|$ by $n-1$ and not $n$: because we do not consider a vertex to be in its own influential domain, there are at most $n-1$ vertices who can.

Second, if we are going to consider the fraction of influential vertices, we should also consider the average distance of those vertices to v and not just merely the total distance. Again, this method of measurement allows us to better compare graphs.
Finally, note that proximity prestige is always a value between 0 and 1. To see this, we first rewrite its definition to:

$$p_{prox}(v) \stackrel{\text{def}}{=} \frac{|R(v)|^2/(n-1)}{\sum_{u \in R(v)} d(u,v)}$$

so that we can more easily consider the case where there are no vertices in v’s influential domain. In that case, $|R(v)| = 0$, and so is $p_{prox}(v)$. At the other end of the spectrum is the situation that we can reach v from every vertex, but moreover, each one is an in-neighbor of v. We then have that $|R(v)| = n-1$ and $\sum_{u \in R(v)} d(u,v) = n-1$. As a consequence, we see that $p_{prox}(v) = 1$.

Let’s reconsider our classroom example and take a look at proximity prestige within the largest strongly connected component. We make the following assumption: if child i has positively nominated child j, then the behavior of child j will affect child i. In other words, the directed graph of positive nominations can be seen as a directed graph of who influences whom by simply reversing the orientation of each arc. Using this reversed orientation, Figure 9.8 shows the distance between pairs of vertices, i.e., a cell (i, j) gives the shortest distance from vertex j to vertex i. These distances have been computed using the directed graph obtained by reversing the orientation of the graph from Figure 9.7.

The various values for proximity prestige lie quite close to each other, but again we see that children #19 and #20 have the highest score. Considering that these two also had the highest betweenness centrality, the social picture is becoming consistently clear.

One of the problems that social scientists have been struggling with is that the metrics we have been discussing so far consider importance without taking into account the importance of the nominating vertex. In particular, it seems reasonable to rank a person higher when that person has been nominated by another highly ranked person. Note that this is analogous to the PageRank metric discussed in Chapter 8. The idea as used in social networks is quite simple and brings us to the following definition of ranked prestige:

Definition 9.3: Consider a simple directed graph D with vertex set $\{1, 2, \ldots, n\}$ and adjacency matrix A (i.e., $A[i,j] = 1$ if and only if there is an arc $\langle\overrightarrow{i,j}\rangle$). The ranked prestige of a vertex k is defined as:

$$p_{rank}(k) \stackrel{\text{def}}{=} \sum_{i=1,\, i \neq k}^{n} A[i,k] \cdot p_{rank}(i)$$
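Definitions 9.2 and 9.3 can be checked numerically. The sketch below is only an illustration under stated assumptions: `proximity_prestige` takes the distances $d(u,v)$ for all u in the influence domain as a plain list, and `ranked_prestige` solves the weighted variant $W p = p$ by repeated multiplication (power iteration), which is a technique chosen here for brevity and not one prescribed by the text. The input data are the row for child #1 in Figure 9.8 and the weights of the three-person example (A, B, C) used in this section; both function names are hypothetical.

```python
import math

def proximity_prestige(distances_to_v, n):
    """Definition 9.2, given d(u, v) for every u in R(v); n = number of vertices."""
    size = len(distances_to_v)
    if size == 0:
        return 0.0
    fraction = size / (n - 1)                 # |R(v)| / (n - 1)
    average = sum(distances_to_v) / size      # mean distance to v
    return fraction / average

def ranked_prestige(W, iterations=200):
    """Repeatedly apply p <- W p, rescaling to unit Euclidean length each round."""
    n = len(W)
    p = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        q = [sum(W[i][j] * p[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in q))
        p = [x / norm for x in q]
    return p

# Row for child #1 in Figure 9.8, without the zero on the diagonal.
child1 = [4, 3, 1, 4, 5, 3, 1, 2, 2, 3, 2, 4, 2, 1, 4]
p1 = proximity_prestige(child1, 16)

# W[i][j]: weight that person j assigns to person i (order A, B, C).
W = [[0.0, 0.5, 0.4],
     [0.1, 0.0, 0.6],
     [0.9, 0.5, 0.0]]
x, y, z = ranked_prestige(W)
```

For child #1 the result is 15/41, which rounds to the 0.366 listed in Figure 9.8; for the three-person example the vector rounds to (0.52, 0.48, 0.71).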
Distance from j to i

ID    1   2   4   5   7   9  11  12  13  14  15  16  17  19  20  23   p_prox(v)
 1    0   4   3   1   4   5   3   1   2   2   3   2   4   2   1   4   0.366
 2    4   0   4   5   2   1   2   3   5   6   1   3   3   1   2   2   0.341
 4    2   5   0   3   5   6   4   1   1   4   4   1   5   3   2   5   0.294
 5    1   5   2   0   5   6   4   2   1   1   4   2   5   3   2   5   0.313
 7    5   1   5   6   0   2   1   4   6   7   2   4   2   2   3   1   0.294
 9    5   1   5   6   1   0   1   4   6   7   2   4   2   2   3   2   0.294
11    5   1   5   6   2   2   0   4   6   7   1   4   1   2   3   2   0.294
12    1   4   2   2   4   5   3   0   3   3   3   1   4   2   1   4   0.357
13    2   5   1   3   5   6   4   1   0   4   4   1   5   3   2   5   0.294
14    1   4   3   1   4   5   3   2   2   0   3   2   4   2   1   4   0.366
15    4   2   4   5   1   3   2   3   5   6   0   3   2   1   2   1   0.341
16    2   4   1   3   4   5   3   1   2   4   3   0   4   2   1   4   0.349
17    4   2   4   5   3   3   1   3   5   6   2   3   0   1   2   1   0.333
19    3   2   3   4   2   3   1   2   4   5   1   2   2   0   1   2   0.405
20    2   3   2   3   3   4   2   1   3   4   2   1   3   1   0   3   0.405
23    4   2   4   5   3   3   1   3   5   6   2   3   1   1   2   0   0.333

Figure 9.8: Computing the proximity prestige for the classroom example. Each cell (row, column) denotes the distance from column to row.

Note that in order to compute $p_{rank}(k)$, we need to compute the ranked prestige of every vertex. Fortunately, the above equation is one of a total of n (one for each vertex), giving rise to a set of n equations in n unknowns. Standard mathematical techniques can be applied to solve these equations, although for even relatively small values of n, using software packages comes in handy.

To illustrate the principle, let us consider a small social network with only three people A, B, and C. Each person is asked to give a weight $0 \le w \le 1$ to the other two, expressing the relative preference of one person over the other. So, for example, if A prefers B over C, she may express this by assigning a weight of 0.7 to B and 0.3 to C. Likewise, if B has no preference for either A or C, he should assign a weight of 0.5 to both of them. Note that the total weight that a person can assign to the others is always equal to 1. Let’s assume that the weights have been assigned as follows:

ID     A     B     C
A      -    0.5   0.4
B     0.1    -    0.6
C     0.9   0.5    -
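where we use the same notation as in Figure 9.8: cell (i, j) denotes the weight assigned by person j to person i. We now need to solve the following equations:

$$\begin{aligned}
p_{rank}(A) &= 0.5\,p_{rank}(B) + 0.4\,p_{rank}(C) \\
p_{rank}(B) &= 0.1\,p_{rank}(A) + 0.6\,p_{rank}(C) \\
p_{rank}(C) &= 0.9\,p_{rank}(A) + 0.5\,p_{rank}(B)
\end{aligned}$$

To simplify our notation a bit, we use the variables x, y, and z in place of $p_{rank}(A)$, $p_{rank}(B)$, and $p_{rank}(C)$, respectively. This then leads to:

$$\begin{aligned}
x &= 0.5y + 0.4z \qquad (1) \\
y &= 0.1x + 0.6z \qquad (2) \\
z &= 0.9x + 0.5y \qquad (3)
\end{aligned}$$

If we try to solve this set of equations, we find only dependencies between x, y, and z. This is caused by the fact that we require that the sum of the values per column is always 1. In particular, by substituting (2) into (3), we find that $z = \frac{19}{14}x$. Likewise, by substituting (3) into (2), we find that $y = \frac{32}{35}x$. It is common practice to ensure that

$$\sqrt{\sum_i p_{rank}(i)^2} = 1$$

which in our example would mean that

$$x^2 + \left(\tfrac{19}{14}x\right)^2 + \left(\tfrac{32}{35}x\right)^2 = 1$$

which, in turn, leads to:

$$x = 0.52 \qquad y = 0.48 \qquad z = 0.71$$

These values now express the ranked prestige of A, B, and C, respectively.

Note 9.2 (More information)
What we have actually been doing is computing what is known as an eigenvector. To explain, let W denote the matrix of nonnegative weights assigned between $n > 1$ people, such that $W[i,j]$ is the weight assigned by person j to i. As in our example, we require that for each person j, $\sum_{i=1}^{n} W[i,j] = 1$ and that $W[j,j] = 0$. Let p be the vector of ranked prestiges:

$$\mathbf{p} = (p_1, p_2, \ldots, p_n) \stackrel{\text{def}}{=} (p_{rank}(1), p_{rank}(2), \ldots, p_{rank}(n))$$

Using the abbreviation $w_{ij} = W[i,j]$, we need to solve the set of equations

$$\begin{aligned}
w_{11}p_1 + w_{12}p_2 + \cdots + w_{1n}p_n &= p_1 \\
w_{21}p_1 + w_{22}p_2 + \cdots + w_{2n}p_n &= p_2 \\
&\;\;\vdots \\
w_{n1}p_1 + w_{n2}p_2 + \cdots + w_{nn}p_n &= p_n
\end{aligned}$$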
which can be more concisely written in matrix form as

$$\begin{pmatrix}
w_{11} & w_{12} & \cdots & w_{1n} \\
w_{21} & w_{22} & \cdots & w_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{n1} & w_{n2} & \cdots & w_{nn}
\end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix}
=
\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix}$$

or, equivalently

$$W\mathbf{p} = \mathbf{p}$$

In mathematical terms, p is the eigenvector that corresponds with the eigenvalue 1. As mentioned above, we generally require that $\sqrt{\sum_i (p_i)^2} = 1$, so that we can often find a unique solution for an eigenvector. For social network analysis, this eigenvector corresponds to the ranked prestiges.

In general, eigenvectors are computed by first finding solutions to the more general equation $W\mathbf{p} = \lambda\mathbf{p}$ with $\lambda$ being a scalar. Several solutions for $\lambda$ may exist, each known as an eigenvalue. In our case, because we demand that $\sum_i w_{ij} = 1$, one can show that the largest eigenvalue is $\lambda = 1$. We will not go into this material any further. A good introduction can be found in [Williams, 2001].

Let us finally see how we can compute the ranked prestige for each of the children in our classroom example. Again, we concentrate on the strongly connected component, consisting of 16 children. We need to construct a matrix that reflects the weight that child j assigns to child i. We follow two approaches.

First, we consider the positive nominations and assign an equal weight to each nomination given by the same child. In other words, if A has nominated three other children, we assume that each of these three has the same influence on A. From Figure 9.8, we can see that each child within the strongly connected component nominates exactly three other children in the same component, so that every weight is equal to $\frac{1}{3}$. In that case, the ranked prestige turns out to be as follows:

Child:             1      2      4      5      7      9     11     12
Ranked pres.:  0.148  0.171  0.132  0.056  0.123  0.057  0.332  0.369

Child:            13     14     15     16     17     19     20     23
Ranked pres.:  0.062  0.018  0.313  0.332  0.179  0.433  0.434  0.205

Our second approach entails the distance between children. In particular, reconsider the graph representing the positive nominations shown in Figure 9.7. We now take the distance from child i to child j (in this graph) as an indication of how highly i ranks j. In particular, the larger the
distance, the lower the ranking. Let M be the maximum eccentricity among the children in the largest strongly connected component. From our previous observations, we know that M = 7. If $d(i,j)$ denotes the shortest distance from child i to j, we define the weight $w_{ij}$ that i assigns to j as:

$$w_{ij} \stackrel{\text{def}}{=} \frac{M - d(i,j)}{\sum_{j \in R(i)} \bigl(M - d(i,j)\bigr)}$$

Using these weights, we can then compute the ranked prestiges as:

Child:             1      2      4      5      7      9     11     12
Ranked pres.:  0.240  0.253  0.230  0.187  0.238  0.198  0.286  0.282

Child:            13     14     15     16     17     19     20     23
Ranked pres.:  0.195  0.134  0.282  0.279  0.245  0.315  0.311  0.252

Before we come to conclusions, we summarize our findings for the classroom in Figure 9.9. We also show the normalized values, obtained by dividing the measured importance by the maximum importance found for a specific metric. What we see is that different metrics sometimes lead to very different results. For example, the relative importance of children #4 and #5 depends on which metric we use: in the case of betweenness #5 is more important than #4, but this changes when ranked prestige is used as the metric. Furthermore, it appears that ranked prestige generally leads to a greater variation (which is good). All metrics show the importance of children #19 and #20.

Structural balance

As stated by Wasserman and Faust [1994], a first important result from social network analysis was the theory of structural balance. The theory considers the sentiment relationships between people within a group, which are commonly modeled as positive or negative. In particular, the theory is concerned with examining whether the relationships between people are such that the group as a whole can be considered stable, or in balance. In its simplest form, the theory considers triads, that is, groups of three people. We briefly discussed triads and balance in Section 9.1 and will consider it in more detail here.

Let us first start with precisely defining balance. To this end, we need the definition of a signed graph:

Definition 9.4: A signed graph is a simple graph G in which each edge is labeled with either a positive (“+”) or negative (“−”) sign. We denote the sign of an edge e as sign(e).
Child  Eccentricity  Closeness      Betweenness    Proximity prestige  Ranked prestige 1  Ranked prestige 2
 1     5 (0.714)     0.023 (0.767)  0.140 (0.299)  0.366 (0.904)       0.148 (0.341)      0.240 (0.762)
 2     6 (0.857)     0.021 (0.700)  0.153 (0.326)  0.341 (0.842)       0.171 (0.394)      0.253 (0.803)
 4     6 (0.857)     0.018 (0.600)  0.050 (0.107)  0.294 (0.726)       0.132 (0.304)      0.230 (0.730)
 5     4 (0.571)     0.025 (0.833)  0.105 (0.224)  0.313 (0.773)       0.056 (0.129)      0.187 (0.594)
 7     7 (1.000)     0.018 (0.600)  0.083 (0.177)  0.294 (0.726)       0.123 (0.283)      0.238 (0.756)
 9     7 (1.000)     0.018 (0.600)  0.007 (0.015)  0.294 (0.726)       0.057 (0.131)      0.198 (0.629)
11     7 (1.000)     0.018 (0.600)  0.155 (0.330)  0.294 (0.726)       0.332 (0.765)      0.286 (0.908)
12     5 (0.714)     0.022 (0.733)  0.220 (0.469)  0.357 (0.881)       0.369 (0.850)      0.282 (0.895)
13     6 (0.857)     0.018 (0.600)  0.016 (0.034)  0.294 (0.726)       0.062 (0.143)      0.195 (0.619)
14     3 (0.429)     0.030 (1.000)  0.054 (0.115)  0.366 (0.904)       0.018 (0.041)      0.134 (0.425)
15     6 (0.857)     0.021 (0.700)  0.083 (0.177)  0.341 (0.842)       0.313 (0.721)      0.282 (0.895)
16     5 (0.714)     0.021 (0.700)  0.140 (0.299)  0.349 (0.862)       0.332 (0.765)      0.279 (0.886)
17     6 (0.857)     0.021 (0.700)  0.017 (0.036)  0.333 (0.822)       0.179 (0.412)      0.245 (0.778)
19     5 (0.714)     0.025 (0.833)  0.466 (0.994)  0.405 (1.000)       0.433 (0.998)      0.315 (1.000)
20     4 (0.571)     0.025 (0.833)  0.469 (1.000)  0.405 (1.000)       0.434 (1.000)      0.311 (0.987)
23     6 (0.857)     0.021 (0.700)  0.029 (0.062)  0.333 (0.822)       0.205 (0.472)      0.252 (0.800)

Figure 9.9: Summary of the importance measures for the classroom example, with the normalized values shown between brackets.

A signed graph can be undirected or directed. For a signed graph G, we will use the notation $E^+(G)$ to denote the positive-signed edges and $E^-(G)$ for negative-signed edges. The common interpretation of a positively signed edge between vertices A and B is that the two people represented by the vertices like each other. Analogously, a negative sign is to be interpreted as that they dislike each other.

In the case of a signed directed graph, the liking need not be symmetric. If A likes B, then this is represented by a positively signed arc $\langle\overrightarrow{A,B}\rangle$. A negatively signed arc from A to B means that A dislikes B. The absence of an arc (or edge in the case of an undirected graph) implies that two people neither like nor dislike each other.

In the following, we will concentrate only on undirected signed graphs. In Figure 9.4 we discussed how
the various combinations of liking and disliking between people in a triad would lead to an (im)balanced situation. It can be readily seen that the balanced situation of a triad occurs if and only if there are zero or an even number of negative-signed edges. This observation is generalized as follows:

Definition 9.5: Consider an undirected signed graph G. The product of two signs $s_1$ and $s_2$ is again a sign, denoted as $s_1 \cdot s_2$. It is negative if and only if exactly one of $s_1$ and $s_2$ is negative. The sign of a trail T is the product of the signs of its edges: $sign(T) = \prod_{e \in E(T)} sign(e)$.

Note that the effect of multiplying signs can be easily understood if we substitute +1 for “+” and −1 for “−.”

Note 9.3 (Mathematical language)
By now, you should be used to the fact that from time to time new mathematical symbols find their way into the text. In the previous definition, we have used the symbol “$\prod$” as an abbreviation for multiplication, analogously to using the summation sign “$\sum$.” In particular, we have

$$\prod_{i=1}^{n} x_i \stackrel{\text{def}}{=} x_1 \cdot x_2 \cdots x_n$$

Note 9.4 (Mathematical language)
The definition of the product of signs is a crude example of how mathematicians define what are known as (abstract) algebras. Algebras tell us how we can manipulate concepts such as signs, by providing basic rules concerning, for example, addition or multiplication. In the case of signs, we are interested only in multiplications. Adding more precision, we could have also included the following rules:

Commutative: $s_1 \cdot s_2 = s_2 \cdot s_1$
Associative: $(s_1 \cdot s_2) \cdot s_3 = s_1 \cdot (s_2 \cdot s_3)$

Note furthermore that the sign I = “+” acts as an identity, i.e., for all signs s, we have that $I \cdot s = s \cdot I = s$. This same role of identity is played by the number “1” in our usual numbering systems.

A path (or cycle) is positive if it has zero or an even number of negative-signed edges. A negative-signed path (or cycle) is one that is not positive. We leave it as an exercise to prove the following theorem:

Theorem 9.1: Consider an undirected signed graph G. For any trail T of G and $e \in E(T)$, $sign(T) = sign(e) \cdot sign(T - e)$.

With these definitions at hand, we can now consider when sociograms that are represented as signed graphs are balanced:
Definition 9.6: An undirected signed graph is balanced when all its cycles are positive.

An important characterization of a balanced graph is that its vertex set can be partitioned into two subsets such that all edges between the two subsets have negative sign, and no other edges do. In other words, a group of people is balanced if it can be split into two subgroups such that members of the same subgroup like each other, yet members of different groups dislike each other (or don’t care). This characterization was formally proven by Harary [1953], and is formalized by the following theorems.

Theorem 9.2: An undirected signed complete graph G is balanced if and only if V(G) can be partitioned into two disjoint subsets $V_0$ and $V_1$ such that each negative-signed edge is incident with a vertex from $V_0$ and one from $V_1$, and each positive-signed edge is incident with vertices from the same set. In other words:

$$E^-(G) = \{\langle x,y\rangle \mid x \in V_0, y \in V_1\}$$
$$E^+(G) = \{\langle x,y\rangle \mid x,y \in V_0 \text{ or } x,y \in V_1\}$$

Proof. Assume that G is balanced. Let $u \in V(G)$ and let $N^+(u)$ consist of all vertices adjacent to u through a positive-signed edge. Set $V_0 \leftarrow \{u\} \cup N^+(u)$ and $V_1 \leftarrow V(G) \setminus V_0$.

Consider two vertices $v_0, w_0 \in V_0$, other than u. Because the edges $\langle u,v_0\rangle$ and $\langle u,w_0\rangle$ have positive signs, and because G is balanced, we must also have that $\langle v_0,w_0\rangle$ has a positive sign (note that edge $\langle v_0,w_0\rangle$ exists because G is a complete graph). Likewise, consider any two vertices $v_1, w_1 \in V_1$. Again, because G is balanced, we know that the triad with vertices $u, v_1, w_1$ must be positive, and because edges $\langle u,v_1\rangle$ and $\langle u,w_1\rangle$ have negative signs, edge $\langle v_1,w_1\rangle$ must have a positive sign. Finally, consider the edge $\langle v_0,v_1\rangle$, which is part of the triad with vertices $u$, $v_0$ and $v_1$. With the sign of $\langle u,v_0\rangle$ being positive and that of $\langle u,v_1\rangle$ negative, and G being balanced, edge $\langle v_0,v_1\rangle$ must have a negative sign. We conclude that $V_0$ and $V_1$ partition V(G) as required.

Conversely, assume that $E^-(G)$ and $E^+(G)$ satisfy the stated conditions. Every cycle in G contains an even number of edges from $E^-(G)$, implying that the sign of every cycle is positive. By definition, G is balanced.

Note 9.5 (Study tip)
The proof of Theorem 9.2 is much easier to understand when using a drawing. As mentioned before, studying graph theory generally requires you to visualize situations by sketching graphs. Do the same for this proof.
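The construction used in the proof of Theorem 9.2 translates directly into a test for complete signed graphs. The following is a minimal sketch, assuming signs are encoded as +1 and −1 and stored per unordered pair of vertices; the function name `balanced_partition_complete` and the four-person example are hypothetical.

```python
from itertools import combinations

def balanced_partition_complete(vertices, sign):
    """Return (V0, V1) if the complete signed graph is balanced, else None.

    sign maps frozenset({x, y}) to +1 or -1 for every pair of vertices.
    """
    vertices = list(vertices)
    u = vertices[0]
    # V0 = {u} together with all positive neighbours of u; V1 = the rest.
    v0 = {u} | {w for w in vertices[1:] if sign[frozenset((u, w))] == +1}
    v1 = set(vertices) - v0
    # Every edge inside a subset must be positive, every edge across negative.
    for x, y in combinations(vertices, 2):
        same_side = (x in v0) == (y in v0)
        expected = +1 if same_side else -1
        if sign[frozenset((x, y))] != expected:
            return None
    return v0, v1

people = ["a", "b", "c", "d"]
friends = {frozenset(("a", "b")), frozenset(("c", "d"))}
signs = {frozenset(pair): (+1 if frozenset(pair) in friends else -1)
         for pair in combinations(people, 2)}
split = balanced_partition_complete(people, signs)

# Turning the c-d friendship into dislike creates an all-negative triad a, c, d.
signs_bad = dict(signs)
signs_bad[frozenset(("c", "d"))] = -1
split_bad = balanced_partition_complete(people, signs_bad)
```

Because the proof shows that a balanced complete graph must be split exactly by $\{u\} \cup N^+(u)$, a single candidate partition suffices; no search over partitions is needed.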
We leave it as an exercise to show that every subgraph of a balanced signed graph is again balanced. We will need this property for the following theorem:

Theorem 9.3: Consider an undirected signed graph G and two distinct vertices $u, v \in V(G)$. G is balanced if and only if all (u,v)-paths have the same sign.

Proof. First assume that G is balanced. Let P and Q be two distinct (u,v)-paths. Consider the set of edges $E'$ obtained from P and Q after removing the ones they have in common, that is $E' = \bigl(E(P) \cup E(Q)\bigr) \setminus \bigl(E(P) \cap E(Q)\bigr)$. What can we say about the subgraph H induced by $E'$? First note that there can be no cycles having edges in common. If that were the case, those common edges would have been part of both P and Q, which by construction cannot happen. In other words, any two cycles in H have no edges in common.

Because H is a subgraph of G, it must also be balanced. As a consequence, all cycles in H are positive. Furthermore, each cycle C in H consists of exactly two subpaths $\hat{P}$ from P and $\hat{Q}$ from Q. That is, $E(C) = E(\hat{P}) \cup E(\hat{Q})$. Because $\hat{P}$ and $\hat{Q}$ have no edges in common, and because $sign(C) = sign(\hat{P}) \cdot sign(\hat{Q})$ is positive, we conclude that the signs of $\hat{P}$ and $\hat{Q}$ must be the same. Taking all cycles of H into account, along with the edges common to both P and Q, we conclude that P and Q must have the same sign.

Conversely, assume that (u,v)-paths have the same sign. Because u and v have been chosen arbitrarily, and because every cycle C can be constructed as the union of two edge-disjoint paths P and Q, we necessarily have that $sign(C) = sign(P) \cdot sign(Q)$ must be positive. Hence, G is balanced.

Combining theorems now allows us to prove the following general characterization of balanced signed graphs, again due to Harary [1953].

Theorem 9.4: An undirected signed graph G is balanced if and only if V(G) can be partitioned into two disjoint subsets $V_0$ and $V_1$ such that the following two conditions hold:
(1) $E^-(G) = \{\langle x,y\rangle \mid x \in V_0, y \in V_1\}$
(2) $E^+(G) = \{\langle x,y\rangle \mid x,y \in V_0 \text{ or } x,y \in V_1\}$.

Proof. First, let us assume that G is balanced. Without loss of generality, we also assume that G is connected. The theorem is proven to hold by induction on the number m of edges of G. Clearly, the theorem is seen to hold for the case that m = 1, so assume it holds for m > 1. Consider any two
nonadjacent vertices u and v of G. From the previous theorem, we know that all (u,v)-paths have the same sign. Therefore, extend G by adding the edge $e = \langle u,v\rangle$ with the same sign as any (u,v)-path in G, leading to the new graph $G^* = G + e$. Any newly introduced cycle C in $G^*$ will consist of e and a (u,v)-path P from G. Because $sign(C) = sign(e) \cdot sign(P)$, and $sign(e) = sign(P)$, C must be positive, and thus the extended graph is also balanced. Continue in this way with adding edges between nonadjacent vertices until we have a signed complete graph $G^*$, which we know is balanced. From Theorem 9.2, it follows that we can partition the vertex set of $G^*$, and thus also G, into the two required subsets.

Conversely, assume we can partition G into two subsets $V_0$ and $V_1$ as described. Extend G by adding an edge $e = \langle u,v\rangle$ between two nonadjacent vertices, leading to $G^* = G + e$. If u and v lie in the same subset, sign(e) becomes positive, otherwise negative. Continue in this way adding edges until we have a signed complete graph $G^*$. Again, from Theorem 9.2 we know that this graph is balanced, and because G is a subgraph of $G^*$, we know G is also balanced.

With this characterization, it is now relatively easy to check whether a signed graph is balanced. The following algorithm will do the trick.

Algorithm 9.1 (Balanced graphs): Consider an undirected, connected signed graph G. For any vertex $v \in V(G)$, denote by $N^+(v)$ the set of vertices adjacent to v through a positive-signed edge, and by $N^-(v)$ the set of vertices adjacent through a negative-signed edge. Let I be the set of inspected vertices so far.

1. Select an arbitrary vertex $u \in V(G)$ and set $V_0 \leftarrow \{u\}$ and $V_1 \leftarrow \emptyset$. Set $I \leftarrow \emptyset$.
2. Select an arbitrary vertex $v \in (V_0 \cup V_1) \setminus I$. Assume $v \in V_i$.
   • For all $w \in N^+(v)$: $V_i \leftarrow V_i \cup \{w\}$.
   • For all $w \in N^-(v)$: $V_{(i+1) \bmod 2} \leftarrow V_{(i+1) \bmod 2} \cup \{w\}$.
   • Also, $I \leftarrow I \cup \{v\}$.
3. If $V_0 \cap V_1 \neq \emptyset$ stop: G is not balanced. Otherwise, if $I = V(G)$ stop: G is balanced. Otherwise, repeat the previous step.

Note 9.6 (Mathematical language)
For the previous algorithm we have used a concise notation that may require some effort to understand:

$$V_{(i+1) \bmod 2} \leftarrow V_{(i+1) \bmod 2} \cup \{w\}.$$
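Algorithm 9.1 can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not a reference one: the graph is given as two dictionaries `pos` and `neg` mapping each vertex to its positive and negative neighbours (a representation chosen here), the graph is assumed connected as the algorithm requires, and the name `is_balanced` is hypothetical. The index expression `(i + 1) % 2` is the code counterpart of $V_{(i+1) \bmod 2}$.

```python
def is_balanced(pos, neg, start):
    """Algorithm 9.1 for a connected undirected signed graph.

    pos[v] / neg[v] list the neighbours of v through positive / negative edges.
    """
    groups = [{start}, set()]                  # V0 and V1
    inspected = set()                          # I
    all_vertices = set(pos) | set(neg) | {start}
    while inspected != all_vertices:
        frontier = (groups[0] | groups[1]) - inspected
        if not frontier:
            raise ValueError("graph is not connected")
        v = frontier.pop()                     # arbitrary uninspected vertex
        i = 0 if v in groups[0] else 1
        groups[i] |= set(pos.get(v, ()))              # positive neighbours join v's set
        groups[(i + 1) % 2] |= set(neg.get(v, ()))    # negative neighbours join the other
        inspected.add(v)
        if groups[0] & groups[1]:
            return False                       # some vertex landed in both sets
    return True

# Triad with one friendship (a-b) and two dislikes: balanced.
pos_ok = {"a": ["b"], "b": ["a"], "c": []}
neg_ok = {"a": ["c"], "b": ["c"], "c": ["a", "b"]}
ok = is_balanced(pos_ok, neg_ok, "a")

# Triad in which everybody dislikes everybody: not balanced.
pos_bad = {"a": [], "b": [], "c": []}
neg_bad = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
bad = is_balanced(pos_bad, neg_bad, "a")
```

The disjointness test is done right after each inspection, so at the moment a vertex is selected it belongs to exactly one of the two sets.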
Note that we have assumed that the arbitrarily selected vertex v is in set $V_i$. As a consequence, when $v \in V_0$, $V_{(i+1) \bmod 2}$ is equal to $V_1$, whereas for $v \in V_1$, we see that $V_{(i+1) \bmod 2}$ is equal to $V_{2 \bmod 2} = V_0$. In other words, $V_{(i+1) \bmod 2}$ refers to the other set than the one containing v.

To see why this algorithm is correct, first note that it may be possible for a vertex to be added to $V_0$ and later also to $V_1$ (or vice versa). Whenever this happens, we will not be able to partition the vertex set anymore as is required for a signed graph to be balanced. In step 3, we stop inspecting (uninspected) vertices from $V_0 \cup V_1$ as soon as the two sets are no longer disjoint, or once each vertex has been inspected and placed in either $V_0$ or $V_1$, at which point it must be the case that $V_0 \cap V_1 = \emptyset$, so that G is indeed balanced.

Cohesive subgroups

Given a social network, researchers have always been keen on identifying groups of closely bound people, better known as cohesive subgroups. Typical examples of such groups in practice are formed by families and friends. More recently, interest has grown in identifying groups of, for example, terrorists. And although it seems naturally evident what a cohesive subgroup actually entails, formalizing the concept in graph theory such that it matches what one expects in real life is less obvious. Let us take a look at a few proposals (see also [Mokken, 1979]).

One of the earliest proposals for modeling cohesive subgroups was to consider (maximal) cliques:

Definition 9.7: Consider an undirected simple graph G. A (maximal) clique of G is a complete subgraph H of at least three vertices such that H is not contained in a larger complete subgraph of G. A clique with k vertices is called a k-clique.

Note that a graph can have several cliques. Consider, for example, the graph in Figure 9.10. In this case, we see that there are two cliques: the 3-clique induced by the set of vertices {2, 4, 5} and the 4-clique induced by {1, 2, 3, 5}. This example also shows that a vertex may be contained in two different cliques.

Figure 9.10: A graph with two maximal cliques.

The problem with using cliques as a means for modeling cohesive subgroups is that they are generally too restrictive. In the first place, many subgroups exist in reality in which not all members relate to each other. In terms of graphs, this means that a subgroup cannot always be adequately represented by a complete subgraph. Related to this strictness is that by considering only cliques, it turns out that only small subgroups can be identified. Considering that in many cases sociograms are based on questionnaires in which people are asked to identify their k best relations, we also see that the degree of a vertex can never be more than k, and thus that a maximal clique can have only k + 1 members. With such restrictions, it may even be impossible to identify any clique.

For these reasons, researchers have been looking for other metrics for defining subgroups. One approach is to relax how strong the bonds between members of a subgroup should be. In particular, one can also define a subgroup as the maximal subgraph in which the distance between its members is less than or equal to a constant k. This leads to what are known as k-distance-cliques:

Definition 9.8: Let G be an undirected simple graph. A k-distance-clique of G is a maximal subgraph H of G such that for all vertices $u, v \in V(H)$, the distance $d_G(u,v) \le k$.

(We have introduced this term to avoid confusion with k-cliques. Note, however, that k-distance-cliques are often also referred to as k-cliques [Scott, 2000; Wasserman and Faust, 1994].) It is important to note that the distance between two vertices in a k-distance-clique is measured relative to the original graph G, as is indicated by the notation $d_G(u,v)$. This means that two vertices u and v in a k-distance-clique H may be connected through a shortest path in H that is longer than a shortest (u,v)-path in G. This implies that the diameter of a k-distance-clique may be larger than k, which is somewhat counter-intuitive. Another problem with k-distance-cliques is also caused by the fact that distance is measured with respect to the original graph: it is possible to construct a graph in which a k-distance-clique may be disconnected (see exercises).

To ensure that the diameter of a subgraph matches one’s intuition, Mokken [1979] proposed k-clans:

Definition 9.9: Let G be an undirected simple graph. A k-clan of G is a k-distance-clique H of G such that for all vertices $u, v \in V(H)$, the distance $d_H(u,v) \le k$.

The only, yet important, difference with k-distance-cliques is that distance
is measured relative to H instead of G. By definition, every k-clan is also a k-distance-clique. If we take the diameter as the sole criterion, we obtain what are known as k-clubs:

Definition 9.10: Let G be an undirected simple graph. A k-club of G is a maximal subgraph H of G such that $diam(H) \le k$. In other words, $\max\{d_H(u,v) \mid u, v \in V(H)\} \le k$.

We will show that every k-clan of a graph G is also a k-club of G. However, not every k-club is also a k-clan, as can be seen from Figure 9.11. In this example, we have two 2-distance-cliques: $H_1 = G[\{1,2,3,5,6\}]$ and $H_2 = G[\{2,3,4,5,6\}]$. $H_2$ is also a 2-club, as well as a 2-clan. In addition, both $H_3 = G[\{1,2,5,6\}]$ and $H_4 = G[\{1,2,3,6\}]$ are 2-clubs, but neither are 2-distance-cliques, and thus are not 2-clans.

Figure 9.11: Graph illustrating cliques, clans, and clubs.

Now consider a k-club H of a graph G. Because for all vertices $u, v \in V(H)$, we know that $d_G(u,v) \le d_H(u,v) \le k$, H must be contained in a k-distance-clique of G. We use this property to prove the following:

Theorem 9.5: Every k-clan of a graph G is also a k-club.

Proof. From the definitions of k-clan and k-club, one can easily see that for a k-clan H we certainly have that for all vertices $u, v \in V(H)$, $d_H(u,v) \le k$. Therefore, we merely need to show that H is also maximal with respect to the definition of a k-club. To this end, assume that H is not maximal. This means that there is a set of vertices $S \subseteq V(G) \setminus V(H)$ such that for all $u \in V(H)$ and $s, t \in S$, we have:

$$d_G(u,t) \le d_{H^*}(u,t) \le k \quad\text{and}\quad d_G(s,t) \le d_{H^*}(s,t) \le k$$

where $H^* = G[V(H) \cup S]$. However, because H is also a k-distance-clique, this would violate the maximality of H as a k-distance-clique, contradicting our assumption of the existence of S. Hence, H is also maximal as a k-club, completing the proof.

The real problem with these definitions is that all of them are still very strict when it comes to selecting whether a vertex belongs to a group or not.
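The difference between measuring distance in G (k-distance-cliques) and in the induced subgraph H (k-clans and k-clubs) can be made concrete with a small computation. The sketch below is an illustration only: it uses a hypothetical 5-cycle rather than the graph of Figure 9.11, whose edges are not listed in the text, and the chosen vertex set S is not claimed to be maximal. The helper names `distances` and `max_pairwise_distance` are assumptions.

```python
from collections import deque
from itertools import combinations

def distances(adj, source, allowed=None):
    """BFS distances from source; if allowed is given, stay inside that vertex set."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if (allowed is None or w in allowed) and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def max_pairwise_distance(adj, members, inside):
    """Largest distance between members, measured in G[members] if inside, else in G."""
    allowed = set(members) if inside else None
    worst = 0
    for u, v in combinations(members, 2):
        d = distances(adj, u, allowed).get(v, float("inf"))
        worst = max(worst, d)
    return worst

# Hypothetical 5-cycle 1-2-3-4-5-1.
cycle = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}
S = [1, 2, 3, 4]
in_G = max_pairwise_distance(cycle, S, inside=False)   # uses d_G
in_H = max_pairwise_distance(cycle, S, inside=True)    # uses d_H with H = G[S]
```

Here vertices 1 and 4 are at distance 2 in G (through vertex 5), but at distance 3 in G[S], which is why a vertex set that satisfies the distance bound in G can still induce a subgraph whose diameter exceeds k.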
In reality, cohesiveness of a group is much more fuzzy: if Alice considers Bob to be her best friend, it may very well be the case that Bob’s best friend Chuck is considered by Alice to be just an acquaintance of hers. In other words, we would normally present a link between Alice and Chuck, but the meaning is different than the one between Alice and Bob. Such relationships can be captured through weighted graphs, but the definitions of cohesive groups do not cater for such situations.

In the same light, we could consider an alternative formulation of k-cliques by defining a group based on the minimal degree of each vertex:

Definition 9.11: Let G be an undirected simple graph. A k-core of G is a maximal subgraph H of G such that for all vertices $u \in V(H)$, the degree $\delta(u) \ge k$.

In other words, each vertex in a k-core is joined with at least k other members of that group. Again, it turns out that such a definition is often just too strict: it draws boundaries around groups that cannot account for the natural “exceptions to the rule.”

A much better approach is to follow data-clustering techniques for identifying communities. As reported by Porter et al. [2009], a large variety of older and newer techniques have been proposed leading to much better results. Let us discuss one such method, known as clique percolation [Palla et al., 2005].

Clique percolation is based on identifying groups based on maximal cliques, yet with the important difference that groups may overlap. In other words, vertices may belong to different cliques without the necessity of having a maximal degree (as defined by the size of the clique it is member of). We can then define a k-clique community:

Definition 9.12: Let G be an undirected simple graph. Two k-cliques $C_1$ and $C_2$ are said to be adjacent if they have at least k − 1 vertices in common: $|V(C_1) \cap V(C_2)| = k - 1$. A k-clique community of G is a union of k-cliques $C = \{C_1, \ldots, C_n\}$ such that for every two k-cliques $C_u, C_v \in C$, there is a series $[C_u = C_{u_0}, C_{u_1}, \ldots, C_{u_m} = C_v]$ in which $C_{u_i}$ and $C_{u_{i+1}}$ are adjacent k-cliques of C.

This definition is best understood by taking a look at an example. Let’s consider our social network of Figure 9.1, which we show again in Figure 9.12 along with the various 3-cliques and single 4-clique. Note that in this example there are no k-cliques for $k \ge 5$. What can we say about the adjacency of cliques?

First, it is not difficult to see that our single 4-clique, denoted $C_1$, is not adjacent to any other clique for the simple reason that it does not have a single vertex in common with any one of them. Likewise, if we consider 3-cliques $C_7$ and $C_8$, we see that Sam is a member of both of them. However, because Sam is the only member that is shared between the two cliques, they are not considered to be adjacent: two 3-cliques are adjacent only if they share two vertices. For the same reason, we see that 3-cliques $C_8$, $C_9$, and $C_2$ are not adjacent to any other clique.

Figure 9.12: The social network from Figure 9.1, showing the various k-cliques ($C_1$ to $C_9$).

The story is different for cliques $C_3$ and $C_4$: because $V(C_3) \cap V(C_4) = \{\text{Hal}, \text{John}\}$, the two are adjacent. In fact, $C_3$, $C_4$, $C_5$, and $C_6$ form a 3-clique community, as shown in Figure 9.13. We see that besides $C_3$ and $C_4$, also $C_4$ and $C_5$, as well as $C_5$ and $C_6$, are pairs of adjacent 3-cliques. The result is that using this method of identifying cohesive groups, we find ourselves dealing with six communities: $\{C_1\}$, $\{C_2\}$, $\{C_3, C_4, C_5, C_6\}$, $\{C_7\}$, $\{C_8\}$, and $\{C_9\}$.

      C3               C4               C5               C6
C3    -                {Hal, John}      C3, C4, C5       C3, C4, C5, C6
C4    {Hal, John}      -                {Bob, John}      C4, C5, C6
C5    C3, C4, C5       {Bob, John}      -                {Lanny, John}
C6    C3, C4, C5, C6   C4, C5, C6       {Lanny, John}    -

Figure 9.13: A 3-clique community. Every entry shows either the intersection between two adjacent 3-cliques, or the path of 3-cliques between two nonadjacent cliques.

Note 9.7 (More information)
Palla et al. [2007] have extended clique percolation to directed graphs. In the undirected case, a clique represents a maximal group in which all vertices are
considered equally important. In a directed graph, we need to account for the fact that relations are no longer symmetric, but that they reflect some ordering between vertices. For this reason Palla et al. have been looking for an ordering of the vertices in their definition of a directed k-clique. In the following, we use the notation $u \prec v$ to indicate that vertex u precedes vertex v in an ordering of vertices.

Definition 9.13: Consider a directed graph D. A directed k-clique is a directed subgraph H with k vertices such that (1) the underlying graph of H is complete, and (2) there is an ordering of the vertices of H, such that if $u \prec v$ then $\langle\overrightarrow{u,v}\rangle \in A(H)$.

To illustrate, consider directed acyclic graphs, which we encountered in Chapter 3. In this case, for a directed clique H, a natural ordering of vertices can be found by considering the outdegree of each vertex. In particular, $u \prec v$ if u’s outdegree (in H) is larger than v’s. It can be shown that in this case such an ordering always exists. To illustrate, Figure 9.14 shows how we can come to such an ordering for a directed acyclic graph.

Position   Vertex   $\langle\overrightarrow{u,v}\rangle \in A$?
1          1        $\langle\overrightarrow{1,5}\rangle \in A$
2          5        $\langle\overrightarrow{5,2}\rangle \in A$
3          2        $\langle\overrightarrow{2,4}\rangle \in A$
4          4        $\langle\overrightarrow{4,3}\rangle \in A$
5          3        irrelevant

Figure 9.14: (a) A (complete) directed acyclic graph. The outdegree of each vertex is shown as well: vertex 1 (4), vertex 5 (3), vertex 2 (2), vertex 4 (1), vertex 3 (0). An ordering of the vertices is shown in (b).

To examine directed subgraphs in which two vertices u and v are mutually joined (i.e., both $\langle\overrightarrow{u,v}\rangle, \langle\overrightarrow{v,u}\rangle \in A(H)$), we merely need to remove one of the arcs, either from u to v or from v to u. In many cases, the remaining subgraph will be acyclic, in which case we can use the ordering based on a vertex’s outdegree. There may also be cases in which an ordering cannot be found, meaning that we are not dealing with a directed k-clique.

Again, two directed k-cliques are considered adjacent if they share k − 1 vertices. Then, using these definitions, it turns out that for our classroom example shown in Figure 9.7(a), the directed clique percolation method will find exactly two directed 3-communities: one consisting of all the girls, and one consisting of all the boys. None of the methods we have discussed so far would have been capable of coming to such an identification of subgroups. Further information on clique percolation for directed graphs can be found in [Palla et al., 2007].
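The outdegree-based ordering of Note 9.7 can be sketched as follows. The code assumes a subgraph without mutually joined vertices (one arc of each such pair is taken to have been removed beforehand, as described above), so a `None` result is only conclusive under that assumption. The arc list is a reconstruction: Figure 9.14 shows a complete directed acyclic graph with outdegrees 4, 3, 2, 1, 0 for vertices 1, 5, 2, 4, 3, and the arcs below are the ones such a graph must have. The function name `outdegree_ordering` is hypothetical.

```python
def outdegree_ordering(vertices, arcs):
    """Order vertices by decreasing outdegree and verify condition (2) of Definition 9.13.

    Returns the ordering as a list, or None if some required arc (u, v) with
    u before v is missing.
    """
    vertices = set(vertices)
    inner = {(a, b) for (a, b) in arcs if a in vertices and b in vertices}
    outdeg = {v: 0 for v in vertices}
    for a, _ in inner:
        outdeg[a] += 1
    order = sorted(vertices, key=lambda v: -outdeg[v])
    for position, u in enumerate(order):
        for v in order[position + 1:]:
            if (u, v) not in inner:
                return None
    return order

# Arcs reconstructed from the outdegrees in Figure 9.14 (assumption, see above).
fig_arcs = [(1, 5), (1, 2), (1, 4), (1, 3),
            (5, 2), (5, 4), (5, 3),
            (2, 4), (2, 3),
            (4, 3)]
ordering = outdegree_ordering({1, 2, 3, 4, 5}, fig_arcs)

# A directed 3-cycle has a complete underlying graph but admits no such ordering.
cycle_ordering = outdegree_ordering({1, 2, 3}, [(1, 2), (2, 3), (3, 1)])
```

The ordering found for the reconstructed arcs matches the positions listed in Figure 9.14(b).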
Affiliation networks

As a last example of important concepts in social networks, we consider what are known as affiliation networks [Wasserman and Faust, 1994; Knoke and Yang, 2008]. In such a network, people are tied to each other through a membership relation. For example, Alice and Bob may be members of the same sports club, or both be members of the same management team. In general, affiliation networks are constructed from a set of actors and a set of social events, where each actor is said to participate in one or several events. An affiliation network can be naturally represented as a bipartite graph, with each vertex representing either an actor or an event. An edge represents the participation of an actor in a specific event.

Affiliation networks have been studied for a variety of reasons, but two are particularly important for our discussion [Wasserman and Faust, 1994]. First, it is argued that there is a lot of information to discover about the relations between individuals by considering the events that they share, and likewise, correlation between events can be discovered by considering the shared participation by actors. In other words, the indirect relationship between individuals that is caused by the events they share is an important object of study, and the same holds for the indirect relationship between two events caused by individuals participating in both events.

The second reason is that sociologists believe that participation in common events helps to explain the existence of ties between two individuals. For example, it is believed that influence patterns are established by the fact that people participate in shared events. As a consequence, understanding how information is diffused, or how innovations are adopted, may require an understanding of shared events between people.

Because affiliation networks consist of two different sets, they are also referred to as two-mode networks. However, when considering the two main reasons for studying them, we see that they are effectively used to study the (indirect) relationships between individuals or events. This brings us back to our original conception of social networks, now referred to as one-mode networks.

Let us first consider the adjacency matrix representing an affiliation network. Let $V_A$ denote the set of vertices representing the actors, and $V_E$ the set representing events. We consider only the (actor, event) submatrix $AE$ consisting of $n_A = |V_A|$ rows and $n_E = |V_E|$ columns. Clearly, we have that $AE[i,j] = 1$ if and only if actor $i$ participates in event $j$. Furthermore, $\sum_{i \in V_A} AE[i,j]$ tells us how many actors participate in event $j$, whereas $\sum_{j \in V_E} AE[i,j]$ tells us in how many events actor $i$ participates.

Let us consider the simple affiliation network shown in Figure 9.15, along with its adjacency submatrix.

      e1   e2   e3
a1    1    1    0
a2    1    0    0
a3    0    1    0
a4    0    1    1
a5    0    1    1

Figure 9.15: An example affiliation network with adjacency submatrix.

Now consider the following sum:

$$NE[i,j] = \sum_{k=1}^{n_E} AE[i,k] \cdot AE[j,k]$$

Note that $AE[i,k] \cdot AE[j,k] = 1$ if and only if both actors $i$ and $j$ participated in event $k$. In other words, $NE[i,j]$ counts the number of events in which both actor $i$ and actor $j$ participated. Likewise, we can compute:

$$NA[i,j] = \sum_{k=1}^{n_A} AE[k,i] \cdot AE[k,j]$$

in which case we are counting the number of actors participating in both event $i$ and event $j$. Note that $AE[k,i] \cdot AE[k,j] = 1$ if and only if actor $k$ participated in both events $i$ and $j$. The values for these two tables are shown in Figure 9.16. Of course, for both tables we have:

$$NE[i,j] = NE[j,i] \quad \text{and} \quad NA[i,j] = NA[j,i]$$

Furthermore, it is not difficult to see that $NE[i,i] = \delta(a_i)$ and $NA[i,i] = \delta(e_i)$.

NE    a1   a2   a3   a4   a5
a1    2    1    1    1    1
a2    1    1    0    0    0
a3    1    0    1    1    1
a4    1    0    1    2    2
a5    1    0    1    2    2

NA    e1   e2   e3
e1    2    1    0
e2    1    4    2
e3    0    2    2

Figure 9.16: The matrices NE and NA from Figure 9.15.

How does this work in practice? In 2006 a major Dutch newspaper conducted an investigation to identify the most influential people within the
Netherlands [Dekker and van Raaij, 2006]. The research was inspired by a statement in 1968 by Jan Mertens, at the time a union leader, that the Netherlands was effectively governed by approximately 200 people. Since 2006, identifying the top-200 most influential people has become a yearly returning event, with the perhaps not so surprising result that the top hasn't changed a lot. The core of the work is centered around a two-mode network, for which the technical setup and analysis is described in de Nooy [2006]. Actually determining which people are the most influential cannot be done by interpretation of raw network data. Instead, several metrics that have been described so far have been adjusted to more realistically reflect relationships. For example, rather than taking the distance as the length $d$ of a shortest path, it was taken proportional to $2^d$.

For our purposes, we take a simple approach and merely consider the largest connected component of the two-mode network of approximately 200 people. This leads to an affiliation network representing 197 actors and 391 events. An event is typically a board of directors, a supervisory board, etc. The graph is shown in Figure 9.17, where people are represented by boxes and events by circles.

Figure 9.17: The graph of 2006 top-200 most influential people in The Netherlands.

Of course, by merely looking at this graph it is already very difficult to draw any conclusions. However, when we consider the matrices $NE$ and
$NA$, we see that more than 1250 pairs of actors share at least one event that both participate in. In particular, there is not a single actor who does not participate in at least one event with another actor. In fact, there are a number of actors who participate in at least three of the same events. When we take a look at the matrix $NA$ we see that there is hardly any event for which its participants do not participate in another event. Apparently, it is common for the top to participate in at least two events. There is even a pair of events with as many as nine actors in common. One could argue that in such cases, participating in one event implicitly means that you'll be participating in the other as well.

9.3 Equivalence

So far, we have essentially been concentrating on identifying the properties of a specific person, or a group of persons, in a social network. An important, yet sometimes difficult question is identifying the position or role that someone has. For social networks, answering such a question is related to identifying similarity between (groups of) people based on the structure of the network or structure of subnetworks. In this section, we will take a closer look at three related concepts that have been used for this purpose.

Structural equivalence

Consider the situation that in a social network two people, or actors $A$ and $B$, have exactly the same relationships to the other actors in the network. In other words, if $A$ is linked to $C$, then so is $B$, and if there is no link between $A$ and $D$, then there is also no link between $B$ and $D$. From the perspective of the network, you can argue that $A$ and $B$ are essentially indistinguishable: they apparently play the same role. This notion of similarity is called structural equivalence, first formally defined by Lorrain and White [1971]:

Definition 9.14: Let $D$ be a directed graph. Two vertices $u$ and $v$ are structurally equivalent if their respective sets of in-neighbors and out-neighbors are the same: $N_{in}(u) = N_{in}(v)$ and $N_{out}(u) = N_{out}(v)$.

In other words, two vertices $u$ and $v$ are structurally equivalent if $u$ has arcs to exactly the same vertices as $v$, and also all vertices that are linked to $u$ are linked to $v$. Indeed, from the perspective of a network, vertices $u$ and $v$ are indistinguishable. Structural equivalence can easily be defined for undirected graphs as well, in which case we require that $N(u) = N(v)$. Figure 9.18 shows a simple social network with two structurally equivalent vertices $u_1$ and $u_2$.

Figure 9.18: A simple social network with structurally equivalent vertices $u_1$ and $u_2$.

The formal definition of structural equivalence is rather strict. For example, if $u$ and $v$ are each other's neighbor, then by definition they can never be structurally equivalent: $v$ is a neighbor of $u$, but not of itself. For this specific situation, equivalence between two vertices $u$ and $v$ may exclude these two vertices from the respective sets of neighbors. In that case, vertices $v_1$ and $v_2$ from Figure 9.18 would also be structurally equivalent. But even then it is highly unlikely in practical situations to see any two actors with exactly the same neighbors. For this reason it makes sense not to look for strict equivalence but to seek a weaker form in which two vertices are "almost" equivalent. To this end we can define the following distance metric to express the extent to which two vertices are the same.

Definition 9.15: Consider a (strict) directed graph $D$ with vertex set $V(D) = \{v_1, \ldots, v_n\}$ and adjacency matrix $A$. The Euclidean distance $d(v_i, v_j)$ between two vertices $v_i$ and $v_j$ is defined as:

$$d(v_i, v_j) \stackrel{\text{def}}{=} \sqrt{\sum_{k=1}^{n} \left[ (A[i,k] - A[j,k])^2 + (A[k,i] - A[k,j])^2 \right]}$$

Recall that for a strict directed graph, $A[i,j] = 1$ if and only if there is an arc from $v_i$ to $v_j$. As a consequence, $d(v_i, v_j) = 0$ if and only if vertices $v_i$ and $v_j$ are structurally equivalent: for each $k$, $A[i,k] = A[j,k]$ and $A[k,i] = A[k,j]$.

The Euclidean distance between two vertices now gives us a measure to see to what extent two vertices are structurally equivalent. Consider the graph shown in Figure 9.19(a). It is not difficult to see that $v_1$ and $v_2$ are structurally equivalent, but it would also appear that $v_3$ and $v_4$ are structurally very similar. If we compute the Euclidean distances, shown in Figure 9.19(b), we see that indeed $v_3$ and $v_4$ are relatively close to each other in comparison to other pairs of nonequivalent vertices. We leave it as an exercise to the reader to actually compute the various Euclidean distances.

      v1      v2      v3      v4      v5      v6
v1    0.000   0.000   2.236   2.646   2.236   2.236
v2    0.000   0.000   2.236   2.646   2.236   2.236
v3    2.236   2.236   0.000   1.414   2.828   2.449
v4    2.646   2.646   1.414   0.000   2.449   2.000
v5    2.236   2.236   2.828   2.449   0.000   1.414
v6    2.236   2.236   2.449   2.000   1.414   0.000

Figure 9.19: (a) A directed graph and (b) the Euclidean distances between its vertices.

Note 9.8
To get an impression of what the chances are of being structurally equivalent, let's consider a directed ER(n, p) random graph for which $p$ indicates the probability that there is an arc $\overrightarrow{\langle u, v \rangle}$ for an arbitrarily chosen pair of vertices $u$ and $v$. The probability that two vertices $u$ and $v$ have an arc to the same vertex $w$ is obviously $p^2$. If both have outdegree $k_{out}$, then the probability that they have exactly the same set of out-neighbors is equal to

$$\binom{n-2}{k_{out}} (p^2)^{k_{out}} (1 - p^2)^{n-2-k_{out}}.$$

Likewise, if they both have indegree $k_{in}$, the probability of having exactly the same set of in-neighbors is equal to

$$\binom{n-2}{k_{in}} (p^2)^{k_{in}} (1 - p^2)^{n-2-k_{in}}.$$

Given that even the probability of having the same vertex degree can be rather low, it is not hard to see that the probability of finding two structurally equivalent vertices in a directed graph is indeed very low. Therefore, finding such nodes in real networks suggests that something interesting may be going on.

Figure 9.20: The distribution of distances in a directed ER(500, 0.25) random graph.
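The Euclidean distance of Definition 9.15 is straightforward to compute from an adjacency matrix. The following is a minimal sketch; the helper names `adjacency_matrix` and `euclidean_distance`, the 0-based vertex numbering, and the small four-vertex example are assumptions made for illustration and are not the graph of Figure 9.19.

```python
import math


def adjacency_matrix(n, arcs):
    """Build the n x n adjacency matrix of a strict directed graph."""
    A = [[0] * n for _ in range(n)]
    for u, v in arcs:
        A[u][v] = 1  # arc from vertex u to vertex v
    return A


def euclidean_distance(A, i, j):
    """Euclidean distance between vertices i and j (Definition 9.15)."""
    total = 0
    for k in range(len(A)):
        # Row terms compare out-neighbors, column terms compare in-neighbors.
        total += (A[i][k] - A[j][k]) ** 2 + (A[k][i] - A[k][j]) ** 2
    return math.sqrt(total)


# Hypothetical example: vertices 0 and 1 both point to 2 and are both
# pointed to by 3, so they are structurally equivalent.
A = adjacency_matrix(4, [(0, 2), (1, 2), (3, 0), (3, 1), (2, 3)])
print(euclidean_distance(A, 0, 1))  # structurally equivalent pair
print(euclidean_distance(A, 0, 2))  # nonequivalent pair
```

In this example vertices 0 and 1 have identical rows and columns in `A`, so their distance is 0, whereas vertices 0 and 2 differ in five entries and end up at distance $\sqrt{5}$.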
As a further illustration, Figure 9.20 shows the distribution of Euclidean distances between pairs of vertices in a directed ER(500, 0.25) random graph. We conclude that only very few vertices lie close to each other when taking the Euclidean distance as metric. Again, this means that if we do find vertices close to each other, then this should be treated as quite exceptional, which is exactly what we hope to find when looking for what could be called structural similarity.

Automorphic equivalence

As mentioned, structural equivalence is rather strict as it demands that the neighbor sets of two vertices are exactly the same. In effect, two structurally equivalent vertices are considered to be interchangeable and have the same position in a network. However, we are often looking for nodes in a social network that have similar roles (see also Wasserman and Faust [1994]). For example, we may want to identify who the teachers in a school are. The basic assumption underlying such an identification is that we should look at the structure of the subgraph surrounding specific vertices. Indeed, this brings us to considering graph isomorphisms again, which we discussed in Section 2.2. In particular, we are looking for a way to exchange two vertices, along with their respective neighbors, such that the resulting graph remains "the same."

To make this more precise, recall first the definition of graph isomorphism:

Definition 9.16: Consider two graphs $G = (V, E)$ and $G^* = (V^*, E^*)$. $G$ and $G^*$ are isomorphic if there exists a one-to-one mapping $\phi: V \rightarrow V^*$ such that for every edge $e \in E$ with $e = \langle u, v \rangle$, there is a unique edge $e^* \in E^*$ with $e^* = \langle \phi(u), \phi(v) \rangle$.

Keeping a graph "the same" is essentially asking whether a graph is isomorphic with itself, but using a nontrivial remapping of vertices. Nontrivial means that at least some vertices are not mapped onto themselves. Formally, we speak of an automorphism, which is defined as follows:

Definition 9.17: Consider an undirected graph $G = (V, E)$. An automorphism is a one-to-one mapping $\phi: V \rightarrow V$ such that for every edge $e \in E$ with $e = \langle u, v \rangle$, there is a unique edge $e^* \in E$ with $e^* = \langle \phi(u), \phi(v) \rangle$. An automorphism $\phi$ is called nontrivial if for at least one vertex $v \in V$ we have that $\phi(v) \neq v$.

Note that the definition of automorphism can be easily extended to directed graphs. We can now define when two nodes in a social network play the same role by considering the associated (directed or undirected) graph:

Definition 9.18: Consider a graph $G$. Two distinct vertices $u$ and $v$ are automorphically equivalent if and only if there is an automorphism $\phi$ for $G$ with $\phi(u) = v$.

To illustrate the idea of automorphic equivalence, consider the social network shown in Figure 9.21. In this example, it is not difficult to see that the two subgraphs $H_1$ and $H_2$ are not only isomorphic, but that they can also be "swapped" to obtain essentially the same graph. In particular, the mapping $\phi(u_i) = v_i$ will do the job. This also means that the vertices of each pair $(u_i, v_i)$ are automorphically equivalent. Finally, note that just as in the case of graph isomorphism, finding a (nontrivial) automorphism may be a difficult task to accomplish.

Figure 9.21: An example of a directed graph with automorphically equivalent vertices.

Regular equivalence

Both structural and automorphic equivalence have relatively simple graph-theoretical formulations, yet may be rather difficult to use in practice. As it turns out, for sociological research, another type of equivalence is often more important as it more naturally reflects the notion of a role [Hanneman and Riddle, 2005]: regular equivalence.

Informally, two nodes in a social network are regularly equivalent if they fulfill the same role. The latter is decided by taking a look at the nodes to which the two nodes are linked: if the respective destinations are also regularly equivalent, then so are the sources. For example, two people may be identified as regularly equivalent because both have a link to two nurses, who had already been identified as being regularly equivalent. In this case, the two sources may turn out to be doctors.

An issue with this definition is that it is recursive: being regularly equivalent depends on the equivalence of the targets. Formally, we have:
Definition 9.19: Let $G$ be an undirected graph. Two vertices $u_1$ and $u_2$ are said to be regularly equivalent if for all edges $\langle u_1, v_1 \rangle \in E(G)$ there is an edge $\langle u_2, v_2 \rangle \in E(G)$ such that $v_1$ and $v_2$ are also regularly equivalent.

Another way of looking at regular equivalence is coloring the vertices of a graph such that if two vertices $u$ and $v$ have the same color, then for each neighbor of $u$ there will be a neighbor of $v$ with the same color. Consider the graph shown in Figure 9.22(a), taken from Borgatti and Everett [1992]. Clearly, each black-colored vertex is adjacent to either another black-colored vertex or a white-colored vertex. An interesting case is formed by the white-colored vertices. Clearly, each such vertex may be joined with a vertex of any color. However, what's important is that for every white-colored vertex joined with any vertex of color $c$, another white-colored vertex will be joined with a vertex of color $c$ as well. This is the essence of being regularly equivalent.

Figure 9.22: (a) Coloring the vertices of a graph to identify regular equivalence. (b) An alternative coloring that also reflects structural equivalence.

Figure 9.22(b) shows an alternative coloring that also reflects structurally equivalent vertices. In general, if two vertices are structurally equivalent, they will also be regularly equivalent.

CONCLUSIONS
We have come a long way, and for some the road has inevitably been rough. Summarizing, there are essentially three major topics that should have been picked up by now:

1. Basic graph theory
2. Metrics for graph analysis
3. Basics of complex network theory

Let's consider each of these briefly.

Graph theory

Chapters 2 to 5 cover the basic material that you would find in most introductory courses on graph theory. We have discussed the foundations of graphs, including their representations and embeddings in the plane, allowing us to speak more accurately about graphs that are the same. Understanding the foundations is important in order to make any steps in understanding real-world networks.

Chapter 3 can be considered somewhat of a collection of randomly chosen general topics on graph theory, which nevertheless form essential extensions to the foundations discussed in the preceding chapter. Again, we see that real-world networks can be modeled much more easily when edges can be directed or have a weight. When it comes to coloring, notably vertex coloring proves to come in handy when certain properties of networks need to be shown.

There are many topics that we have not discussed that could easily be categorized as extensions to our foundations. Examples include matchings and independent sets, as well as a discussion of many special graphs. Also important is the topic of network flows, which we have completely ignored. There are several more for which it can be argued that they deserve a place in any introductory book on graph theory. However, many of these topics are less relevant in light of understanding real-world networks. Instead, they often turn out to be particularly useful in the context of optimization problems, which lie at the heart of a field of mathematics known as operations research.

In this light, one may argue that discussing Euler tours and Hamilton cycles, the topics of Chapter 4, could equally well have been skipped. However, these topics were felt to be so fundamental that skipping them would not have been the right thing to do. Notably the subtle difference between the two concepts is important, and when realizing that the traveling salesman problem alone is a whole field of research by itself, and important for information and computer scientists, skipping it is not really an option.
In Chapter 5 we discussed trees, which form a recurring subject in many courses taught to IT students. Notably the issue of routing, i.e., shortest paths, is fundamental to understanding how information may be disseminated through a network.

Graph analysis

When we started to discuss metrics for graphs, we started to deviate from classical texts on graph theory. Although concepts such as eccentricity and the center of a graph can be found in many standard textbooks on graph theory, they are not emphasized as much as they are here. Understanding and being able to say anything sensible about real-world networks requires a thorough understanding of graph-theoretical metrics. It is through these metrics that it becomes possible, for example, to assess the complexity of a network.

It is actually surprising how few metrics are generally used. What should be less surprising is that with discrete structures such as graphs, finding metrics that lead to a "soft" classification of networks is much more difficult. For example, finding the center of a graph may not be very useful for a random network, as it may easily consist of only one or very few nodes. More important is that we would be able to identify all the nodes in a center by a more relaxed criterion than having minimal eccentricity. This is especially true for large networks, but it may also hold for relatively small networks.

We bumped into this phenomenon a few times, notably in the case of social networks when discussing cohesive subgroups and later structural equivalence. What is needed are metrics that can adequately capture the fuzziness of what we tend to call in our daily lives "cliques," the "most important" people or organizations, and so on. This book has only briefly touched upon some of the attempts at grasping such metrics.

Complex networks

Finally, we have completely deviated from standard introductory textbooks on graph theory with the material covered by Chapters 7 through 9, with the exception of some material about social networks. Complex networks in many senses capture what we can observe in real life and for that reason alone they are important to study. Graph theory forms the foundations for understanding complex networks and the first part of the book should be sufficient to make a next step.

Chapter 7 provides the fundamentals for going into the real-world networks discussed
later in the book. The three types of random networks (Erdős-Rényi graphs, Watts-Strogatz graphs, and scale-free networks) form the basis for classifying real-world networks. Notably the combination of small-world properties and high clustering as witnessed in the case of scale-free networks is important.

We actually discussed only very few real-world complex networks, with the Internet and Web from Chapter 8 being the most illustrative, along with the structured and unstructured peer-to-peer networks. There are many examples of complex networks to be found in fields such as transportation, neurology, biology, financial markets, language, etc. With the examples given in Chapter 8 it should not be too difficult to take further steps in understanding such networks. Again, it is important to consider how exactly complex networks are measured. We saw that in many cases the networks can be so large that we need to resort to sampling techniques, which immediately brings up the problem of data validity. In other words, is our sample good enough to be representative for the entire network? We saw that in the case of the Web, answering this question may be far from trivial.

Social networks, discussed in Chapter 9, in many ways brought us back to more traditional graph theory. One could consider many social network tools to form an extension to graph theory as discussed in Chapter 2, but targeted to a specific field of study. What we have not discussed is the link between traditional social networks and social online communities, an emerging subdiscipline in the field of network science. As may be expected, social online communities exhibit many of the properties common to complex networks, yet at the same time become interesting when we attempt to discover social structures. Again, with the material covered in the second half of the book, the reader should be able to easily follow the literature on complex social networks.

Next steps and suggested textbooks

After having ploughed through this book, a wide range of topics lies open for further exploration. In the first place, for those who have become more interested in math and graph theory, there are many excellent textbooks that can be picked up from here. A good starting point is formed by West [2001], although many will still appreciate the somewhat outdated, yet excellent work described in Bondy and Murty [1976]. Another good and certainly gentle introduction is provided by Aldous and Wilson [2000], who put much less emphasis on formal notations than we have done. When it comes to understanding proofs and mathematical notations, Velleman [2006] is highly recommended.

There are a few topics in graph theory to which entire books have been devoted. Some of the ones that will now make a lot of sense and will certainly be appreciated are the following two. First, Wilson [2004] provides a very nice and interesting historical read on the four-color problem. The importance of the traveling salesman problem should have become clear by now. An excellent description of what it takes to put it into practice is given in Applegate et al. [2007], although it does require going through some more serious math.

There are not many books that concentrate entirely on network analysis. Brandes and Erlebach [2005] contains a collection of articles that describe different aspects of network analysis, including a chapter on various metrics. It may be useful as background material and at the same time a concise reference to various graph-theoretical foundations.

When it comes to complex networks, an excellent starting point is formed by Barabási [2002], an exceptionally well-written book that will most likely trigger further interest in the topic. Equally recommended is Watts [2003], which also concentrates on complex networks. Complexity in general is discussed in Mitchell [2009], a very accessible read into the fascinating field of what is called complexity research. When it comes to getting an overview of all the important publications that gave form to the research into complex networks, Newman et al. [2006] is the place to go. This edited book is a collection of original papers on random graphs, scale-free networks, the structure of the Web and so on. Going through some serious math is sometimes needed, but rewarding.

Finally, for those who have picked up an interest in social network analysis, a good point to start is Knoke and Yang [2008]. Scott [2000] provides an excellent overview of different topics, treating them in independent chapters. The definitive guide to social network analysis, however, remains Wasserman and Faust [1994], an extensive, yet reasonably accessible piece of work. Finally, going from structure to content, Christakis and Fowler [2009] concentrate less on the structure of social networks and instead attempt to discover and explain the meaning behind links in social networks and what information can be derived from those networks.

MATHEMATICAL NOTATIONS

Basic set notations

$\mathbb{N}$    The set of natural numbers.
$\mathbb{R}$    The set of real numbers.
$|S|$    The size of a (finite) set $S$.
$\min S$    The smallest value found in set $S$.
$\max S$    The largest value found in set $S$.
$\forall$    The universal quantifier, used in statements such as "for all ...".
$\exists$    The existential quantifier, used in statements such as "there exists ...".
$x \in S$    Element $x$ is a member of set $S$.
$V \setminus W$    The set $V$ excluding elements that are also members of $W$.
$V \subseteq W$    Denotes that the set $V$ is a subset of $W$, and possibly equal to $W$.
$V \subset W$    Denotes that $V$ is a proper subset of $W$, i.e., $V \subseteq W$ and $V \neq W$.
$V \cap W$    The intersection of the two sets $V$ and $W$.
$\bigcap_{i=1}^{n} V_i$    The intersection of $n$ sets: $V_1 \cap V_2 \cap \cdots \cap V_n$.
$V \cup W$    The union of the two sets $V$ and $W$.
$\bigcup_{i=1}^{n} V_i$    The union of $n$ sets: $V_1 \cup V_2 \cup \cdots \cup V_n$.

General mathematical notations

$\lceil x \rceil$    The smallest natural number greater or equal to $x$.
$\lfloor x \rfloor$    The largest natural number smaller or equal to $x$.
$n!$    To be pronounced as n factorial: $n! \stackrel{\text{def}}{=} n \cdot (n-1) \cdot (n-2) \cdots 1$.
$n \gg k$    The fact that $n$ is much larger than $k$.
$\sum$    Summation, such as $\sum_{i=1}^{n} x_i$, meaning $x_1 + x_2 + \cdots + x_n$.
$\prod$    Multiplication, such as $\prod_{i=1}^{n} x_i$, meaning $x_1 \cdot x_2 \cdots x_n$.
$[a_1, a_2, \ldots, a_n]$    The (ordered) sequence of elements $a_1, a_2, \ldots, a_n$.
$x \leftarrow S$    $x$ takes the value resulting from the expression $S$, pronounced as "x becomes S".
$f(x) \in O(g(x))$    $f(x)$ is bounded by $g(x)$: $\exists M\ \forall x > x_0: |f(x)| < M \cdot |g(x)|$.
$f(x) \in \Omega(g(x))$    $f(x)$ is bounded from below by $g(x)$: $\exists M\ \forall x > x_0: |f(x)| > M \cdot |g(x)|$. This also means that $g(x) \in O(f(x))$.
$f(x) \in \Theta(g(x))$    $f(x)$ follows the same form as $g(x)$: $\exists M, M'\ \forall x > x_0: M' \cdot |g(x)| < |f(x)| < M \cdot |g(x)|$.

[...] 0 vertices.
$K_{m,n}$    The complete bipartite graph with two vertex sets of size $m$ and $n$, respectively.
$\overline{G}$    The complement of graph $G$, i.e., the graph obtained from $G$ by removing its edges and joining vertices that were nonadjacent in $G$.
$H_{k,n}$    A k-connected graph with $n$ vertices and a minimal number of edges: a Harary graph.
$N(v)$    The set of neighbors of vertex $v$.
$N_{in}(v)$    The set of in-neighbors of vertex $v$.
$N_{out}(v)$    The set of out-neighbors of vertex $v$.
$\delta(v)$    The degree of vertex $v$, i.e., the number of incident edges.
$\delta_{in}(v)$    The indegree of vertex $v$, i.e., the number of incoming arcs at $v$.
$\delta_{out}(v)$    The outdegree of vertex $v$, i.e., the number of outgoing arcs from $v$.
$\Delta(G)$    The maximal degree of any vertex in graph $G$: $\max\{\delta(v) \mid v \in V(G)\}$.

Metrics on graphs

$d(u,v)$    The geodesic distance between vertex $u$ and $v$. This is the length of a minimal-length $(u,v)$-path or the weight of a minimal-weight $(u,v)$-path.
$\epsilon(u)$    The eccentricity of vertex $u$: the maximum distance of $u$ to any other vertex.
$\tau(G)$    The network transitivity of graph $G$: the ratio between the number of triangles and triples in $G$.
$c_C(u)$    The closeness of vertex $u$ (in a graph $G$), measured as the reciprocal of the total distance $u$ has to the other vertices of $G$.
$c_B(u)$    The betweenness centrality of vertex $u$: the ratio of shortest paths between two vertices that go through $u$.
$c_E(u)$    The vertex centrality of $u$: the reciprocal of its eccentricity.
$diam(G)$    The diameter of graph $G$: the length of the longest shortest path between any two vertices, i.e., the maximal eccentricity among the vertices of $G$.
$rad(G)$    The radius of graph $G$: the minimal eccentricity among its vertices.
$C(G)$    The center of graph $G$: the set of vertices for which the eccentricity is the same as the radius of $G$.
$cc(v)$    The clustering coefficient of vertex $v$.
$CC(G)$    The average clustering coefficient measured over all vertices of graph $G$.
$\omega(G)$    The number of components of graph $G$.
$\kappa(G)$    The size of a minimal vertex cut of graph $G$.
$\lambda(G)$    The size of a minimal edge cut of graph $G$.
$\chi'(G)$    The edge chromatic number of $G$: the minimal $k$ for which graph $G$ is k-edge colorable.
$\chi(G)$    The chromatic number of $G$: the minimal $k$ for which graph $G$ is k-vertex colorable.

Probabilities

$P[\delta = k]$    The probability that the degree (of an arbitrarily chosen vertex) is equal to $k$.
$P[k]$    An abbreviation for $P[\delta = k]$.
$E[X]$    The expected value of the random variable $X$ (often corresponding to the mean).

Special classes of graphs

$ER(n,p)$    The collection of Erdős-Rényi random graphs with $n$ vertices and probability $p$ that two distinct vertices are joined.
$WS(n,k,p)$    The collection of Watts-Strogatz random graphs with $n$ vertices, initial vertex degree $k$ and rewiring probability $p$.
$BA(n,n_0,m)$    The collection of Barabási-Albert random graphs with $n$ vertices, $n_0$ initial vertices and a growth of $m$ edges at each step.
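Several of the distance-based metrics listed above (eccentricity, radius, diameter, and center) follow directly from the geodesic distance. As a rough sketch, they can be computed with breadth-first search for an unweighted, connected, undirected graph. The adjacency-list dictionary and all function names below are assumptions made for this illustration.

```python
from collections import deque


def distances_from(adj, source):
    """Geodesic distances from `source` to every reachable vertex (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricity(adj, u):
    """Maximum distance from u to any other vertex."""
    dist = distances_from(adj, u)
    if len(dist) < len(adj):
        raise ValueError("eccentricity is only computed for connected graphs")
    return max(dist.values())


def radius(adj):
    """Minimal eccentricity among the vertices."""
    return min(eccentricity(adj, u) for u in adj)


def diameter(adj):
    """Maximal eccentricity among the vertices."""
    return max(eccentricity(adj, u) for u in adj)


def center(adj):
    """Set of vertices whose eccentricity equals the radius."""
    ecc = {u: eccentricity(adj, u) for u in adj}
    smallest = min(ecc.values())
    return {u for u, value in ecc.items() if value == smallest}


# Hypothetical example: a path on five vertices a - b - c - d - e.
path = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
print(radius(path), diameter(path), center(path))
```

For the five-vertex path the middle vertex `c` has eccentricity 2 and the end vertices have eccentricity 4, so the radius is 2, the diameter is 4, and the center consists of `c` only.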
