
Harnessing network science to reveal our digital footprints
Jukka-Pekka “JP” Onnela
Harvard University

University of Waterloo; January 26, 2011


Analysis and modeling of social networks

Metrics and methods for network analysis

Network theory

Online social systems and social media



Introduction
Part I: Social network structure and function
Part II: Network community structure
Part III: Online social networks
Conclusions


Phone calls and texts in a European network

Animation by Mikko Kivelä, Helsinki University of Technology, Finland.


Tie strengths in social networks
The weak ties hypothesis
• The stronger the tie between A and B, the higher the fraction of common friends they have

Mark Granovetter, The strength of weak ties, American Journal of Sociology 78, 1360, 1973


Tie strengths in social networks
Revisiting the hypothesis with cell phone data
• Tie strength?
• Fraction of friends in common?

[Figure: example call network with ties labeled by call duration (7 min, 15 min, 5 min, 3 min); one tie is annotated "3 calls"]

Onnela, Saramäki, Hyvönen, Szabó, Lazer, Kaski, Kertész, Barabási, Structure and tie strengths in mobile communication networks, PNAS 104, 7332, 2007
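One way to make "fraction of friends in common" concrete is the neighborhood overlap of a tie, O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij), where n_ij is the number of neighbors shared by i and j and k_i is the degree of i. The sketch below assumes an undirected edge list whose weight is total call time; the function name and the toy data are illustrative, not taken from the talk.

```python
from collections import defaultdict

def neighborhood_overlap(edges):
    """Return {(u, v): overlap} for an undirected list of (u, v, weight) edges."""
    neighbors = defaultdict(set)
    for u, v, _ in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    overlap = {}
    for u, v, _ in edges:
        common = len(neighbors[u] & neighbors[v])
        # Count neighbors of u or v, excluding u and v themselves.
        denom = (len(neighbors[u]) - 1) + (len(neighbors[v]) - 1) - common
        overlap[(u, v)] = common / denom if denom > 0 else 0.0
    return overlap

# Hypothetical toy data: (caller, callee, total minutes of calls).
calls = [("A", "B", 15), ("A", "C", 7), ("B", "C", 5), ("C", "D", 3)]
overlaps = neighborhood_overlap(calls)
for (u, v, minutes) in calls:
    print(u, v, minutes, overlaps[(u, v)])
```

Plotting overlap against tie weight over all ties is then a direct test of the hypothesis: under the weak ties hypothesis, overlap should tend to increase with weight.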


Tie strengths in social networks



Tie strengths in social networks

               mean      std       max
degree k       3.3       2.5       144
weight wN      15.4      37.3      3,610
weight wD      41 min    206 min   663 h
strength sN    51        75        3,644
strength sD    135 min   386 min   690 h


Tie strengths in social networks

Onnela, Saramäki, Hyvönen, Szabó, Lazer, Kaski, Kertész, Barabási, Structure and tie strengths in mobile communication networks, PNAS 104, 7332, 2007


Tie strengths in social networks
Initial connected network


Tie strengths in social networks
80% of the strongest links removed


Tie strengths in social networks
Initial connected network


Tie strengths in social networks
80% of the weakest links removed


[Figure: link-removal results with regions labeled "Connected phase" and "Disconnected phase"]


Tie strengths in social networks
Qualitative difference at the global level
• Phase transition when weak ties (red) are removed first
• No phase transition when strong ties (black) are removed first
• Quantitative division between weak and strong:
  Order parameter R_LCC (fraction of nodes in the largest connected component, LCC)
  Susceptibility S (average cluster size excluding the LCC)
  Phase 1: Connected phase
  Phase 2: Disconnected phase
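The link-removal experiment can be sketched in a few lines: sort the ties by weight, delete a given fraction from the weak or the strong end, and measure R_LCC and S on what remains. This is a minimal sketch on a hypothetical toy graph. S is taken here as the plain mean size of the clusters other than the LCC, which is an assumption about the exact normalization used in the talk.

```python
def component_sizes(nodes, edges):
    """Sizes of connected components (largest first), via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)

def remove_links(nodes, edges, fraction, weakest_first=True):
    """Remove a fraction of links in weight order; return (R_LCC, S)."""
    ordered = sorted(edges, key=lambda e: e[2], reverse=not weakest_first)
    kept = ordered[int(round(fraction * len(ordered))):]
    sizes = component_sizes(nodes, kept)
    r_lcc = sizes[0] / len(nodes)
    rest = sizes[1:]  # all clusters except the LCC
    s = sum(rest) / len(rest) if rest else 0.0
    return r_lcc, s

# Hypothetical toy graph: two tightly knit triangles joined by one weak tie.
nodes = list(range(6))
edges = [(0, 1, 10), (1, 2, 10), (0, 2, 10),
         (3, 4, 10), (4, 5, 10), (3, 5, 10),
         (2, 3, 1)]
print(remove_links(nodes, edges, 1 / 7, weakest_first=True))
print(remove_links(nodes, edges, 1 / 7, weakest_first=False))
```

In the toy graph, removing the single weakest tie splits the network in two, while removing one strong tie leaves it connected, which mirrors the qualitative difference described above.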


Implications for spreading processes on networks?



Diffusion in social networks
Most studies of diffusion are based on small, binary networks
• Use the observed network as a platform to study weighted diffusion (SI)
• Start with one infected node; infect neighbors with given probabilities
• Weighted model
• Unweighted model

[Figure: two panels, number of infected nodes vs. time and fraction of infected nodes vs. time, each comparing the weighted and unweighted models]
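A discrete-time SI process of this kind can be sketched as follows. The transmission rule is an assumption made for illustration: in the weighted model the per-step probability is proportional to the tie weight, and in the unweighted control every tie uses the mean weight. The scale factor `x`, the toy network, and all names are hypothetical.

```python
import random

def si_spread(neighbors, prob, seed_node, steps, rng):
    """Discrete-time SI model; returns the infected count after each step."""
    infected = {seed_node}
    counts = [1]
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in neighbors[u]:
                # Each infected node tries to infect each susceptible neighbor.
                if v not in infected and rng.random() < prob(u, v):
                    newly.add(v)
        infected |= newly
        counts.append(len(infected))
    return counts

# Hypothetical toy network: a path 0-1-2-3 with a weak tie in the middle.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
weights = {(0, 1): 10.0, (1, 2): 1.0, (2, 3): 10.0}

def weight(u, v):
    return weights[(min(u, v), max(u, v))]

mean_w = sum(weights.values()) / len(weights)
x = 0.05  # assumed scale factor keeping probabilities below 1

weighted = si_spread(neighbors, lambda u, v: x * weight(u, v),
                     0, 20, random.Random(1))
unweighted = si_spread(neighbors, lambda u, v: x * mean_w,
                       0, 20, random.Random(1))
print(weighted)
print(unweighted)
```

In the weighted run the weak middle tie acts as a bottleneck, so the infection tends to stay inside the first strongly tied pair for longer than in the unweighted control.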


Diffusion in social networks
Where do individuals get their information from?
• Unweighted: Infections via “weak” ties
• Weighted: Infections via “intermediate” ties
• Weak ties (WTs) have access to new information
• WTs have low transmission rates
• Strong ties (STs) have high transmission rates
• STs rarely have access to new information

[Figure: two panels labeled "Unweighted" and "Weighted"]


Part II: Network community structure


Network community
• “A group of nodes that are relatively densely connected to each other but relatively sparsely connected to other nodes in the network”
• Communities are thought to have a strong bearing on functional units in networks (e.g. social)
• Community detection is one of the most active areas of research in network science

Porter, Onnela, Mucha, Communities in networks, Notices of the American Mathematical Society 56, 1082, 2009


Example community
• Zachary Karate Club network describes the friendships between 34 members of a karate club at a U.S. university in the 1970s
• After an internal dispute, the club split in two, and members chose preferentially to be with their friends
• Node color indicates post-split club affiliation
• Community detection: What’s the “best” way to split the group?


Modularity maximization
• Modularity maximization is the most commonly used method
• Assign nodes to communities to maximize modularity (algorithmic definition)
• More within-community edges than one would expect at random

Newman, Modularity and community structure in networks, PNAS 103, 8577, 2006
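For reference, the standard Newman-Girvan modularity of a partition is Q = (1/2m) Σ_ij [A_ij - γ k_i k_j / (2m)] δ(c_i, c_j), with γ = 1 by default. The sketch below evaluates Q for a given assignment; it does not search for the maximizing partition. The toy graph is hypothetical, and the optional `gamma` argument is included only as the usual resolution-parameter generalization.

```python
def modularity(adj, labels, gamma=1.0):
    """Modularity of a partition of an undirected graph.

    adj    : symmetric adjacency matrix as a list of lists
    labels : labels[i] is the community of node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed edge weight minus its expectation in the null model.
                q += adj[i][j] - gamma * degree[i] * degree[j] / two_m
    return q / two_m

# Hypothetical toy graph: two triangles joined by a single edge.
edge_list = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edge_list:
    adj[u][v] = adj[v][u] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
print(q_split, q_single)
```

Splitting the toy graph into its two triangles gives a positive Q, while lumping all nodes together gives zero, which is the sense in which a good partition has "more within-community edges than one would expect at random".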


Multislice community detection
Goal: Extend modularity maximization to deal with
• Time-dependent networks: Nodes and ties may change in time
• Multiscale networks: Structure simultaneously present at multiple scales
• Multiplex networks: Multiple types of ties

Mucha, Richardson, Macon, Porter, Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science 328, 876, 2010


Multislice community detection
• Introduce multiple slices
• Connect slices by connecting nodes across slices
• Null model?

[Figure: inter-slice coupling schemes. ORDERED: neighbors; CATEGORICAL: all to all]


Modularity from a dynamical process
• Quality of a partition in terms of its “stability”, which is an autocovariance function of an ergodic Markov process on the network:

$R_M(t) = \sum_C \left[ P(C, t) - P(C, \infty) \right]$

$\dot{p}_i = \sum_j \frac{A_{ij}}{k_j} p_j - p_i$

• Expansion of the matrix exponential to first order in time recovers Newman-Girvan modularity with a “resolution parameter”

Lambiotte, Delvenne, Barahona, arXiv:0812.1770


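Multislice formulation
• Undirected network slices: $A_{ijs} = A_{jis}$
• Undirected couplings: $C_{jrs} = C_{jsr}$
• Define multislice strength: $\kappa_{js} = k_{js} + c_{js}$
• Density of random walkers in node i at slice s:

$\dot{p}_{is} = \sum_{jr} \left( A_{ijs} \delta_{sr} + \delta_{ij} C_{jsr} \right) p_{jr} / \kappa_{jr} - p_{is}$

  The term $A_{ijs} \delta_{sr}$ acts within a slice; the term $\delta_{ij} C_{jsr}$ acts between slices.

• Steady state: $p^*_{jr} = \kappa_{jr} / (2\mu)$, where $2\mu = \sum_{jr} \kappa_{jr}$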


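Multislice formulation
• Null model: Probability of sampling node-slice is conditional on whether the multislice structure allows one to step from node j at slice r to node i at slice s:

$\rho_{is|jr} \, p^*_{jr} = \left[ \frac{k_{is}}{2 m_s} \frac{k_{jr}}{\kappa_{jr}} \delta_{sr} + \frac{C_{jsr}}{c_{jr}} \frac{c_{jr}}{\kappa_{jr}} \delta_{ij} \right] \frac{\kappa_{jr}}{2\mu}$

• Subtracting this conditional joint probability from the linear-in-time approximation of the exponential describing the Laplacian dynamics gives

$Q_{\text{multislice}} = \frac{1}{2\mu} \sum_{ijsr} \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + \delta_{ij} C_{jsr} \right] \delta(c_{is}, c_{jr})$

  Inside the final Kronecker delta, $c_{is}$ denotes the community assignment of node i in slice s (not the coupling strength $c_{js}$ used above).

• Each slice has its own resolution parameter $\gamma_s$
• Inter-slice couplings: $C_{jsr} \in \{0, \omega\}$

Mucha, Richardson, Macon, Porter, Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science 328, 876, 2010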
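To make the multislice quality function concrete, the sketch below evaluates Q_multislice for a given assignment of node-slice pairs to communities. It is an illustration under simplifying assumptions, not the authors' code: every node appears in every slice, every slice has at least one edge, and the coupling is the same value ω for every node, either between all pairs of slices (categorical) or between neighboring slices only (ordered). All names and the toy data are hypothetical.

```python
def multislice_modularity(slices, omega, gamma, labels, ordered=False):
    """Evaluate Q_multislice for a fixed assignment.

    slices : list of symmetric adjacency matrices, one per slice
    omega  : inter-slice coupling strength
    gamma  : list of resolution parameters, one per slice
    labels : labels[s][i] is the community of node i in slice s
    """
    n_slices, n = len(slices), len(slices[0])

    def coupling(s, r):
        # C_jsr, identical for every node j in this sketch.
        if s == r or (ordered and abs(s - r) != 1):
            return 0.0
        return omega

    k = [[sum(row) for row in a] for a in slices]  # k[s][i]
    two_m = [sum(ks) for ks in k]                  # 2 m_s per slice
    total_coupling = sum(coupling(s, r)
                         for s in range(n_slices) for r in range(n_slices))
    two_mu = sum(two_m) + n * total_coupling       # sum of kappa_jr

    total = 0.0
    for s, a in enumerate(slices):                 # intra-slice term
        for i in range(n):
            for j in range(n):
                if labels[s][i] == labels[s][j]:
                    total += a[i][j] - gamma[s] * k[s][i] * k[s][j] / two_m[s]
    for s in range(n_slices):                      # inter-slice term
        for r in range(n_slices):
            for i in range(n):
                if labels[s][i] == labels[r][i]:
                    total += coupling(s, r)
    return total / two_mu

# Hypothetical toy data: two identical slices, each with edges 0-1 and 2-3.
slice_a = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
same = [[0, 0, 1, 1], [0, 0, 1, 1]]
q_coupled = multislice_modularity([slice_a, slice_a], 1.0, [1.0, 1.0], same)
q_uncoupled = multislice_modularity([slice_a, slice_a], 0.0, [1.0, 1.0], same)
print(q_coupled, q_uncoupled)
```

With ω = 0 the value reduces to an average of the single-slice modularities, and a positive ω rewards keeping a node in the same community across slices.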


Application I: College students (multiplex)
• “Tastes, ties, and time” multiplex network of 1640 college students
• Examine the following symmetrized ties from one wave of data:
  1. Facebook friendships
  2. Facebook picture friendships (upload & tag a photo)
  3. Roommates (share dormitory room, creating clusters of 1-6 students)
  4. Housing group (preference to be placed in same upper-class residence)
• Slices are categorical, hence inter-slice coupling from all slices to all slices


Application I: College students (multiplex)
• When omega = 0, individuals (must be) placed in four separate communities
• Increasing omega causes communities to merge across slices, especially if the patterns of connection are similar between slices (tie types)
• For intermediate omega, most individuals are placed in 1 or 2 communities, indicating their networks maintain group-level similarities across tie types
• Small minority maintain 4 separate assignments => different positions in slices

                        # communities per individual
ω      # communities    1        2        3        4
0      1036             0        0        0        100%
0.1    122              14%      40.5%    37.3%    8.2%
0.2    66               19.9%    49.1%    25.3%    5.7%
0.3    49               26.2%    48.3%    21.6%    3.9%
0.4    36               31.8%    47%      18.4%    2.8%
0.5    31               39.3%    42.4%    16.8%    1.5%
1      16               100%     0        0        0


Application II: Karate club (multiscale)
• Zachary Karate Club consists of 34 members of a 1970s university club
• An internal dispute led to the schism of the karate club into two smaller clubs
• Sociologist Wayne Zachary studied the club’s friendships when the schism occurred
• Realized he might have been able to predict the split in advance
• Classic small-scale social network and typical small-scale benchmark
• Color = actual post-split affiliation
• Dashed lines = divisions


Application II: Karate club (multiscale)
• Keep the same unweighted 34 x 34 adjacency matrix across all 16 ordered slices
• Resolution dictated by a specified sequence of resolution parameters gamma = {0.25, 0.5, ..., 4}
• Communities shown for inter-slice coupling omega = 0 (top) and omega = 0.1 (bottom)
• Colors correspond to communities (repeat colors in the top panel across uncoupled slices)
• Dashed lines partition the network into four communities at the default resolution of modularity (gamma = 1)


Application III: US Senate (longitudinal)
• 100 Senators serving staggered six-year terms
• Study Congresses 1 - 110, covering 1789-2008, with 1884 individual Senators
• Define weighted connections between each pair of Senators in terms of similarity of their voting dynamics (independently for each two-year Congress)
• Define adjacency matrices based on roll-call votes:

$A_{ij} = (1/b_{ij}) \sum_k \alpha_{ijk}$

  where $\alpha_{ijk}$ equals unity if and only if i and j voted the same on bill k, and $b_{ij}$ is the total number of bills on which both legislators voted
• Ordered inter-slice coupling from each Senator to himself only when in consecutive Congresses
• Note that link strengths and nodes change from one slice to another
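The roll-call similarity A_ij defined above can be computed directly from a vote table. This is a minimal sketch with a hypothetical encoding ('Y', 'N', or None for "did not vote") and made-up votes. Leaving the diagonal at zero and assigning zero to pairs with no shared votes are assumptions of the sketch.

```python
def voting_similarity(votes):
    """A_ij = (number of shared bills voted alike) / (number of shared bills).

    votes[i][k] is 'Y', 'N', or None if legislator i did not vote on bill k.
    """
    n = len(votes)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Keep only bills on which both legislators voted (b_ij of them).
            shared = [(x, y) for x, y in zip(votes[i], votes[j])
                      if x is not None and y is not None]
            if shared:
                agree = sum(1 for x, y in shared if x == y)
                a[i][j] = a[j][i] = agree / len(shared)
    return a

# Hypothetical votes of three legislators on four bills.
votes = [
    ["Y", "Y", "N", None],
    ["Y", "N", "N", "Y"],
    ["N", "N", "Y", "Y"],
]
similarity = voting_similarity(votes)
for row in similarity:
    print(row)
```

One such matrix per Congress gives one slice; the ordered inter-slice coupling then links each Senator to himself in consecutive slices.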


Application III: US Senate (longitudinal)
• Obtain 9 communities (color coded) using inter-slice coupling omega = 0.5
• Dark blue and red correspond to modern Democratic and Republican parties
• Vertical gray bars indicate Congresses in which three communities appeared

Nominal party affiliations:
• Pro-Administration (PA)
• Anti-Administration (AA)
• Federalist (F)
• Democratic-Republican (DR)
• Whig (W)
• Anti-Jackson (AJ)
• Adams (A)
• Jackson (J)
• Democratic (D)
• Republican (R)


Application III: US Senate (longitudinal)
• Obtain 9 communities (color coded) using inter-slice coupling omega = 0.5
• Dark blue and red correspond to modern Democratic and Republican parties
• Vertical gray bars indicate Congresses in which three communities appeared

Gray areas:
• 4th and 5th: First with political parties
• 10th and 11th: Vice President Aaron Burr's indictment for treason
• 14th and 15th: Changing structures in Democratic-Republican party
• 31st: Compromise of 1850
• 37th: Beginning of the American Civil War
• 73rd and 74th: Landslide 1932 election amidst the Great Depression
• 85th to 88th: Brought the major American civil rights acts


But these are proofs of concept. What can we do with this for real?



Application: Health care

Onnela et al, Impact of physician communities for healthcare costs, working paper, 2011


Part III: Online social networks


Social influence
• Ways in which people affect each other’s beliefs, feelings, and behaviors
• Traditionally the domain of social psychology
• Prominent in contagion in sociology, herding behavior in economics, speculative bubbles in financial markets, public health, etc.
• Online social systems provide a complementary perspective
  • Closed and data-rich systems
  • Access to complete populations of agents without sampling
• Platform?
• Behavior?


Social influence and Facebook
• Facebook has free applications or “apps”
• Focus on a simple, observable behavior: Facebook “app” installation
• Installation is not use!
• Since apps are free, why would influence matter?
• Popular applications:
  • Readily discoverable (low search cost)
  • High quality (exhaustively tested)
  • High functionality (superior features)


Social influence and Facebook

[Figure: two panels, "Local information" (entries labeled with placeholder names John Doe I-II and Jane Doe I-IV) and "Global information"]


Social influence and Facebook
• Each installation contributes to both local and global information
• Each installation is a microscopic social stimulus
• Superposition of 104 million application installations
• Possibility of cascades, or adoption ripples, in the network


Data
• Time period June 25, 2007 - August 14, 2007
• Hourly data for 2,705 applications, T=1,208 time steps
• Number of application i users at time t denoted by n_i(t)

Sample of the hourly data (timestamp, number of users):

Time             Users
7/4/07 0:01      1820
7/4/07 1:01      1836
7/4/07 2:01      1839
7/4/07 3:01      1847
7/4/07 4:01      1852
7/4/07 5:01      1860
7/4/07 6:01      1867
7/4/07 7:01      1874
7/4/07 8:01      1880
7/4/07 9:03      1889
7/4/07 10:01     1899
7/4/07 11:02     1908
7/4/07 12:02     1921
7/4/07 13:02     1931
7/4/07 14:01     1949
7/4/07 15:01     1964
7/4/07 16:01     1987
7/4/07 17:03     2000
7/4/07 18:03     2014
7/4/07 19:03     2025
7/4/07 20:03     2036
7/4/07 21:03     2048
7/4/07 22:03     2060
7/4/07 23:03     2071


We wanted to learn about individual behavior, but these are aggregate data?



Fluctuation scaling
• CORRELATED: $\sigma_i \sim \mu_i$
• INDEPENDENT: $\sigma_i \sim \mu_i^{1/2}$
• Fluctuation scaling can be used to study collective behavior
• Facebook “coins”, one per app per user, are coupled via local and global signals
• What is the slope, i.e. extent of social influence, on Facebook?


Fluctuation scaling

$\sigma_i \sim \mu_i^{\alpha}$, with $\alpha \in [0.5, 1]$

• Individual regime: $\alpha_I \approx 0.55$
• Collective regime: $\alpha_C \approx 0.85$
• 0.36 = 55 installations a day

Onnela, Reed-Tsochas, The spontaneous emergence of social influence in online systems, PNAS 107, 18375, 2010
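The exponent α can be estimated from aggregate series alone. The sketch below assumes one plausible recipe, which may differ in detail from the published analysis: take the hourly increments of each n_i(t), compute their mean μ_i and standard deviation σ_i per application, and fit a least-squares line to log σ_i against log μ_i. The function name and the synthetic series are hypothetical.

```python
import math
from statistics import mean, pstdev

def fluctuation_exponent(series):
    """Least-squares slope of log10(sigma_i) vs. log10(mu_i).

    series : list of user-count time series n_i(t), one list per application
    """
    xs, ys = [], []
    for counts in series:
        # Increments: change in the number of users between time steps.
        increments = [b - a for a, b in zip(counts, counts[1:])]
        mu, sigma = mean(increments), pstdev(increments)
        if mu > 0 and sigma > 0:  # logarithms need positive values
            xs.append(math.log10(mu))
            ys.append(math.log10(sigma))
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Synthetic check: increments alternate between c and 3c, so sigma_i is
# proportional to mu_i and the fitted exponent should be 1 (fully correlated).
synthetic = [[0, c, 4 * c, 5 * c, 8 * c] for c in (1, 10, 100)]
alpha = fluctuation_exponent(synthetic)
print(alpha)
```

Fitting separately below and above a threshold in μ_i would give two slopes, which is how distinct individual and collective regimes can show up in the same data.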


And then to something different...



Mislove, Lehmann, Ahn, Onnela, Rosenquist, Understanding the demographics of Twitter users, submitted, 2010.



Conclusions


Conclusions and Outlook
• Structure of large-scale human social networks
  • Local, global, diffusion
• Community detection in multislice networks
  • Multiscale, multiplex, time-dependent
• Online social networks
  • Social influence from aggregate data; Content


Conclusions and Outlook
• Structure of large-scale human social networks
  • Local, global, diffusion
  • Cell phones as diagnostic tools
• Community detection in multislice networks
  • Multiscale, multiplex, time-dependent
  • Communication within evolving organizations
• Online social networks
  • Social influence from aggregate data; Content
  • Public health applications & consumer confidence


Thank you

