Statistical Analysis of Network Data with R (2nd Edition): Hands-On Practice 2

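3.2 Elements of Graph Visualization

At its core, graph visualization represents vertices with symbols such as points, circles, and rectangles, and represents edges with smooth curves. The design considerations are drawing conventions, aesthetics, and constraints.

3.3 Graph Layouts

Layout is the heart of graph visualization: it is the placement (coordinates) of the vertices and edges in space.

(1) Circular layout

The simplest layout: all vertices are spaced evenly around the circumference of a circle, and the edges connecting them are drawn across the circle. First, a lattice network: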

library(sand)
igraph.options(vertex.size=3,vertex.label=NA,edge.arrow.size=0.5)
g <- make_lattice(c(5,5,5))
plot(g,layout=layout.circle)
title("lattice")
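To make the idea concrete, here is a minimal base-R sketch of what a circular layout amounts to: n points spaced evenly on the unit circle, stored as an n-by-2 coordinate matrix. The helper name circle_coords is hypothetical, and the starting angle and direction used by igraph's own circular layout may differ.

```r
# Hypothetical helper: evenly spaced (x, y) coordinates on the unit circle.
circle_coords <- function(n) {
  theta <- 2 * pi * (seq_len(n) - 1) / n   # n equally spaced angles
  cbind(x = cos(theta), y = sin(theta))    # one row per vertex
}

coords <- circle_coords(8)
dim(coords)                 # 8 rows (vertices), 2 columns (x and y)
range(rowSums(coords^2))    # every point lies at distance 1 from the origin
```

A matrix of this shape can be passed to plot() through the layout argument in place of a layout function.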

A general network:

> par(mfrow=c(1,2))
> data("aidsblog")
> plot(aidsblog)
> title("common")
> plot(aidsblog,layout=layout.circle)
> title("circle")

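(2) Spring-embedder layout:

Edges are treated as physical forces, which come in two kinds: attractive and repulsive. Attraction pulls connected vertices as close together as possible, while repulsion pushes vertices as far apart as possible. A spring-embedder layout can be computed with the Fruchterman-Reingold algorithm (named after its two authors).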

> data("aidsblog")
> igraph.options(vertex.size=3,vertex.label=NA,edge.arrow.size=0.5)
> plot(aidsblog,layout=layout.fruchterman.reingold)

Fruchterman-Reingold is also the default layout in Pajek, as shown in the figure below.

The Kamada-Kawai layout can be used as well:

plot(aidsblog,layout=layout.kamada.kawai)
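Both layout functions return an ordinary coordinate matrix, which can be computed once and reused. The sketch below assumes a small 5 x 5 lattice instead of the aidsblog data and an arbitrary seed; as far as I know these force-directed layouts start from random positions, so fixing the seed and storing the result is a simple way to keep a figure stable between runs.

```r
library(igraph)

g <- make_lattice(c(5, 5))      # small 5 x 5 grid, 25 vertices

set.seed(42)                    # assumption: any fixed seed will do
l_fr <- layout_with_fr(g)       # Fruchterman-Reingold coordinates
l_kk <- layout_with_kk(g)       # Kamada-Kawai coordinates

dim(l_fr)                       # one (x, y) row per vertex
dim(l_kk)

# Reusing a stored matrix keeps vertex positions identical across plots:
# plot(g, layout = l_fr)
```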

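(3) Tree layout:

gt <- graph_from_literal(1-2,1-3,1-4,2-5,2-6,
                         2-7,3-8,3-9,4-10)
igraph.options(vertex.size=30,vertex.label=NULL,edge.arrow.size=0.5)
plot(gt)

The figure above uses R's default layout, which is not very intuitive. Note also that with vertex.label=NULL in the code, the plot shows the vertex numbers; with NA, no labels are shown. A tree layout is more intuitive:

plot(gt,layout=layout_as_tree)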
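To see why the tree layout reads better, it helps to look at the coordinates it produces. The sketch below uses make_tree() to build a regular 3-ary tree with 10 vertices (an assumption for illustration; its shape differs slightly from gt) and checks that vertices at the same depth share the same vertical position.

```r
library(igraph)

tr <- make_tree(10, children = 3, mode = "undirected")  # root is vertex 1
l  <- layout_as_tree(tr, root = 1)

# The second column holds the vertical position; vertices at the same
# depth share the same value, so the number of distinct values is the
# number of levels in the tree.
levels_y <- sort(unique(l[, 2]))
length(levels_y)

# Vertex ids grouped by level
split(seq_len(vcount(tr)), l[, 2])
```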

(4) Bipartite layout:

gb <- graph_from_literal(a1:a2:a3,c1:c2,a1:a2-c1,a2:a3-c2)
V(gb)$type <- grepl("^a",V(gb)$name)
print_all(gb,v=T)
plot(gb, layout= -layout_as_bipartite(gb)[,2:1],
     vertex.size=60,
     vertex.shape=ifelse(V(gb)$type, "rectangle", "circle"),
     vertex.label.cex=1.75,
     vertex.color=ifelse(V(gb)$type, "red", "cyan"))
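The expression -layout_as_bipartite(gb)[,2:1] is easy to skim past, so here is a small sketch of what it does to the coordinates. layout_as_bipartite() places the two vertex types on two horizontal rows; swapping the columns and negating turns those rows into two vertical columns. The variable names l and l_rot are mine.

```r
library(igraph)

gb <- graph_from_literal(a1:a2:a3, c1:c2, a1:a2 - c1, a2:a3 - c2)
V(gb)$type <- grepl("^a", V(gb)$name)

l <- layout_as_bipartite(gb)   # two rows: column 2 takes only two values
unique(l[, 2])

l_rot <- -l[, 2:1]             # swap x and y, then negate both
unique(l_rot[, 1])             # now column 1 takes only two values
```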

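3.4 Decorating Graph Layouts

The basic layout of a graph fixes the positions of its vertices and edges. Beyond that, you can also change the size, shape, and color of the vertices and edges. Take the karate club network as an example; first look at the original graph:

Method 1: set vertex and edge attributes directly (these attributes belong to the graph itself)

(1) The name attribute of the vertices in the original graph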

> V(karate)$name
 [1] "Mr Hi"    "Actor 2"  "Actor 3"  "Actor 4"  "Actor 5"  "Actor 6" 
 [7] "Actor 7"  "Actor 8"  "Actor 9"  "Actor 10" "Actor 11" "Actor 12"
[13] "Actor 13" "Actor 14" "Actor 15" "Actor 16" "Actor 17" "Actor 18"
[19] "Actor 19" "Actor 20" "Actor 21" "Actor 22" "Actor 23" "Actor 24"
[25] "Actor 25" "Actor 26" "Actor 27" "Actor 28" "Actor 29" "Actor 30"
[31] "Actor 31" "Actor 32" "Actor 33" "John A"

(2) Generate the label attribute from the name attribute

> V(karate)$label <- sub("Actor ","",V(karate)$name)
> V(karate)$label
 [1] "Mr Hi"  "2"      "3"      "4"      "5"      "6"      "7"     
 [8] "8"      "9"      "10"     "11"     "12"     "13"     "14"    
[15] "15"     "16"     "17"     "18"     "19"     "20"     "21"    
[22] "22"     "23"     "24"     "25"     "26"     "27"     "28"    
[29] "29"     "30"     "31"     "32"     "33"     "John A"

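(3) Add a shape attribute to the vertices: set a general default first, then override it for the special cases

V(karate)$shape <- "circle"
V(karate)[c("Mr Hi","John A")]$shape <- "rectangle"

(4) The original data carry a Faction attribute on each vertex; set the vertex color according to it

V(karate)[Faction==1]$color="red"
V(karate)[Faction==2]$color="dodgerblue"

(5) Size each vertex according to the weights of its edges; first set the vertex size attribute

strength() sums the weights of the edges incident to each vertex. In this case the edge weights are already known.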

> edge.attributes(karate)
$weight
 [1] 4 5 3 3 3 3 2 2 2 3 1 3 2 2 2 2 6 3 4 5 1 2 2 2 3 4 5 1 3 2 2 2 3 3 3
[36] 2 3 5 3 3 3 3 3 4 2 3 3 2 3 4 1 2 1 3 1 2 3 5 4 3 5 4 2 3 2 7 4 2 4 2
[71] 2 4 2 3 3 4 4 5

> V(karate)$size <- 4*sqrt(strength(karate))
> V(karate)$size
 [1] 25.922963 21.540659 22.978251 16.970563 11.313708 14.966630 14.422205
 [8] 14.422205 16.492423  6.928203 11.313708  6.928203  8.000000 16.492423
[15]  8.944272 10.583005  9.797959  6.928203  6.928203  8.944272  8.000000
[22]  8.000000  8.944272 18.330303 10.583005 14.966630  9.797959 14.422205
[29]  9.797959 14.422205 13.266499 18.330303 24.657656 27.712813
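A tiny worked example makes the size rule easier to check by hand. The three-vertex graph and its weights below are made up for illustration; only strength() and the 4*sqrt(...) scaling come from the session above.

```r
library(igraph)

g <- graph_from_literal(A - B, A - C)   # vertices are created in the order A, B, C
E(g)$weight <- c(4, 5)                  # A-B has weight 4, A-C has weight 5

s <- strength(g)                        # uses the "weight" edge attribute by default
s                                       # A = 9, B = 4, C = 5
4 * sqrt(s)                             # the size rule used above: 12, 8, about 8.94
```

The square root keeps heavily connected vertices from dwarfing the rest: a vertex with nine times the strength is drawn only three times as large.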

(6) Add a width attribute to the edges; this is the line width of each edge, set equal to the edge weight

> edge.attributes(karate)
$weight
 [1] 4 5 3 3 3 3 2 2 2 3 1 3 2 2 2 2 6 3 4 5 1 2 2 2 3 4 5 1 3 2 2 2 3 3 3
[36] 2 3 5 3 3 3 3 3 4 2 3 3 2 3 4 1 2 1 3 1 2 3 5 4 3 5 4 2 3 2 7 4 2 4 2
[71] 2 4 2 3 3 4 4 5
> E(karate)$width <- E(karate)$weight

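(7) An edge is identified by its two endpoints, so to set edge attributes, select the edges through their vertices

f1 <- V(karate)[Faction==1]
f2 <- V(karate)[Faction==2]
# Both endpoints belong to f1
E(karate)[f1%--%f1]$color="pink"
# Both endpoints belong to f2
E(karate)[f2%--%f2]$color="lightblue"
# One endpoint belongs to f1 and the other to f2
E(karate)[f1%--%f2]$color="yellow"

(8) Set the label offset and plot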
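The %--% selector is easier to follow on a graph small enough to check by eye. The four-vertex path and the grp attribute below are hypothetical stand-ins for karate and Faction.

```r
library(igraph)

g <- graph_from_literal(a1 - a2, a2 - b1, b1 - b2)  # a path a1, a2, b1, b2
V(g)$grp <- c(1, 1, 2, 2)                           # hypothetical group attribute

f1 <- V(g)[grp == 1]
f2 <- V(g)[grp == 2]

E(g)[f1 %--% f1]$color <- "pink"        # both endpoints in group 1
E(g)[f2 %--% f2]$color <- "lightblue"   # both endpoints in group 2
E(g)[f1 %--% f2]$color <- "yellow"      # one endpoint in each group

E(g)$color                              # "pink" "yellow" "lightblue"
```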

V(karate)$label.dist <- ifelse(V(karate)$size >=10,0,0.75)
plot(karate,layout=layout.kamada.kawai)

Method 2: set the arguments of plot() (this does not change the attributes of the graph itself)

> data("lazega")
> # lazega is a graph object
> vertex.attributes(lazega)
$name
 [1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10" "V11"
[12] "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20" "V21" "V22"
[23] "V23" "V24" "V25" "V26" "V27" "V28" "V29" "V30" "V31" "V32" "V33"
[34] "V34" "V35" "V36"

$Seniority
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[24] 24 25 26 27 28 29 30 31 32 33 34 35 36

$Status
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[36] 1

$Gender
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1
[36] 1

$Office
 [1] 1 1 2 1 2 2 2 1 1 1 1 1 1 2 3 1 1 2 1 1 1 1 1 1 2 1 1 2 1 2 2 2 2 1 2
[36] 1

$Years
 [1] 31 32 13 31 31 29 29 28 25 25 23 24 22  1 21 20 23 18 19 19 17  9 16
[24] 15 15 15 13 11 10  7  8  8  8  8  8  5

$Age
 [1] 64 62 67 59 59 55 63 53 53 53 50 52 57 56 48 46 50 45 46 49 43 49 45
[24] 44 43 41 47 38 38 39 34 33 37 36 33 43

$Practice
 [1] 1 2 1 2 1 1 2 1 2 2 1 2 1 2 2 2 2 1 2 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 2
[36] 1

$School
 [1] 1 1 1 3 2 1 3 3 1 3 1 2 2 1 3 1 1 2 1 1 2 3 2 2 2 3 1 2 3 3 2 3 3 2 3
[36] 3

(1) Assign a different color to each office location

Because this vector is of integer type, it can be used directly as an index

> colbar <- c("red","dodgerblue","goldenrod")
> class(V(lazega)$Office)
[1] "integer"
> V(lazega)$Office %>% unique()
[1] 1 2 3
> v_colors <- colbar[V(lazega)$Office]
> # Keep in mind that the result is a partition: one entry per vertex
> v_colors
 [1] "red"        "red"        "dodgerblue" "red"        "dodgerblue"
 [6] "dodgerblue" "dodgerblue" "red"        "red"        "red"       
[11] "red"        "red"        "red"        "dodgerblue" "goldenrod" 
[16] "red"        "red"        "dodgerblue" "red"        "red"       
[21] "red"        "red"        "red"        "red"        "dodgerblue"
[26] "red"        "red"        "dodgerblue" "red"        "dodgerblue"
[31] "dodgerblue" "dodgerblue" "dodgerblue" "red"        "dodgerblue"
[36] "red"
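The indexing trick needs nothing beyond base R. Here it is on a made-up vector of five office codes (the office values are hypothetical, not taken from lazega):

```r
colbar <- c("red", "dodgerblue", "goldenrod")
office <- c(1L, 1L, 2L, 3L, 2L)   # hypothetical office codes for five vertices

v_colors <- colbar[office]        # element i is the color for vertex i
v_colors
length(v_colors) == length(office)
```

This works only because the codes are the integers 1, 2, 3; codes of another type would first have to be converted to such integers.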

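(2) In the same way, map each vertex's Practice to a different shape

> v_shape <- c("circle","square")[V(lazega)$Practice]

(3) Set the vertex size according to the Years attribute

> v_size <- 3.5 * sqrt(V(lazega)$Years)

(4) Use the Seniority attribute as the vertex label

> v_label <- V(lazega)$Seniority

(5) Pass the variables above to plot() as arguments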

> l <- layout_with_fr(lazega)
> plot(lazega,layout=l,
+      vertex.color=v_colors,
+      vertex.shape=v_shape,
+      vertex.size=v_size,
+      vertex.label=v_label)
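The practical difference between the two methods is where the styling lives. The sketch below, on a made-up five-vertex ring, checks that a color passed to plot() is not stored on the graph, whereas assigning V(g)$color is. The call to pdf(NULL) is only there so the example can run without opening a window.

```r
library(igraph)

g <- make_ring(5)

pdf(NULL)                                  # null device so this also runs non-interactively
plot(g, vertex.color = "red")              # Method 2: color supplied only for this call
invisible(dev.off())

before <- vertex_attr_names(g)             # no "color" attribute was added
before

V(g)$color <- "red"                        # Method 1: color stored on the graph
after <- vertex_attr_names(g)
after
```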

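3.5 Visualizing Large Networks

When a graph has thousands of vertices, the plot turns into a "hairball". The graph below has only 192 vertices and 1,431 edges, and it is already hard to read.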

> summary(fblog)
IGRAPH 3e87bca UN-- 192 1431 -- 
+ attr: name (v/c), PolParty (v/c)
> plot(fblog)

Try distinguishing the political parties by color

> party_name <- sort(unique(V(fblog)$PolParty))
> party_name
[1] " Cap21"                   " Commentateurs Analystes"
[3] " Les Verts"               " liberaux"               
[5] " Parti Radical de Gauche" " PCF - LCR"              
[7] " PS"                      " UDF"                    
[9] " UMP"                    
> party_name %>% class()
[1] "character"
> # Convert the party names from character to another type
> # Converting directly to a factor gives messy-looking factor levels
> party_name %>% as.factor()
[1]  Cap21                    Commentateurs Analystes
[3]  Les Verts                liberaux               
[5]  Parti Radical de Gauche  PCF - LCR              
[7]  PS                       UDF                    
[9]  UMP                    
9 Levels:  Cap21  Commentateurs Analystes  Les Verts ...  UMP
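> # Convert the party names from character to factor, then to integer
> party_name %>% as.factor() %>% as.numeric()
[1] 1 2 3 4 5 6 7 8 9
> # Colors are matched to vertices by position, so code the full PolParty
> # vector (192 values) rather than the 9 unique names
> party_color <- V(fblog)$PolParty %>% as.factor() %>% as.numeric()
> # Plot
> plot(fblog,vertex.color=party_color,vertex.size=3,vertex.label=NA)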
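Vertex colors are matched to vertices by position, so the color vector should have one entry per vertex. A base-R sketch with four made-up party labels shows the difference between coding the full vector and coding only its unique values:

```r
party <- c("PS", "UMP", "PS", "UDF")          # hypothetical labels for four vertices

codes <- as.numeric(as.factor(party))          # levels are sorted: PS, UDF, UMP
codes                                          # 1 3 1 2
length(codes) == length(party)                 # one code per vertex

short <- as.numeric(as.factor(unique(party)))  # only one code per distinct label
length(short)                                  # 3, shorter than the number of vertices
```

A vector like short cannot line up with the vertices one to one, so it is the per-vertex codes that belong in vertex.color.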

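The relationships among the vertices are still hard to see. The DrL algorithm, which is based on VxOrd, is designed for large graphs. The textbook says it clusters the vertices, but I don't find the result ideal either.

> l <- layout_with_drl(fblog)
> # Plot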
> plot(fblog,vertex.color=party_color,vertex.size=5,vertex.label=NA,layout=l)

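Still, this plot can serve as the starting point for further analysis: first use the DrL algorithm to see that the vertices fall into clusters, then apply graph partitioning methods. igraph has a dedicated function for this, contract(). From its documentation: "This function creates a new graph, by merging several vertices into one. The vertices in the new graph correspond to sets of vertices in the input graph."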

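# Note: the argument used to group the vertices must be a partition with one
# entry per vertex (192 values), not a vector of the 9 distinct party names
party_num <- V(fblog)$PolParty %>% as.factor() %>% as.numeric()
fblog_c <- contract(fblog,party_num)
plot(fblog_c,vertex.label=NA)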

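After contraction, the edges of the original graph are carried over to the merged vertices, so the result is a multigraph. The label of each new vertex also combines the labels of the original vertices it replaced.

The simplify(fblog.c) part of the book's code failed to run, and I could not find a fix, so it is omitted.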
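For reference, here is how contraction followed by simplification behaves on a toy graph. This is only a sketch on made-up data (a complete graph on six vertices and a hypothetical three-group partition), not a reproduction of the book's fblog code or a fix for the failure above.

```r
library(igraph)

g   <- make_full_graph(6)            # 6 vertices, 15 edges
grp <- c(1, 1, 2, 2, 3, 3)           # hypothetical partition, one entry per vertex

gc <- contract(g, grp)               # 3 merged vertices, all 15 edges kept
c(vcount(gc), ecount(gc))

E(gc)$weight <- 1                    # give every edge unit weight
gs <- simplify(gc, edge.attr.comb = list(weight = "sum", "ignore"))

ecount(gs)                           # 3 edges, one per pair of groups
E(gs)$weight                         # 4 4 4: each pair of groups shared 2 x 2 edges
```

Summing unit weights while merging parallel edges turns the multigraph into a weighted simple graph in which each weight counts the original edges between two groups; edges inside a group become loops and are dropped.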

Extracting induced subgraphs from a network

> # Survey each vertex's immediate neighborhood (including the vertex itself)
> k_nbhds <- make_ego_graph(karate,order = 1)
> sapply(k_nbhds,vcount)
 [1] 17 10 11  7  4  5  5  5  6  3  4  2  3  6  3  3  3  3  3  4  3  3  3
[24]  6  4  4  3  5  4  5  5  7 13 18
> # Pick the two largest
> k1 <- k_nbhds[[1]]
> k2 <- k_nbhds[[34]]
> par(mfrow=c(1,2))
> plot(k1)
> plot(k2)
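The same steps can be checked on a graph whose neighborhoods are obvious. The sketch below assumes a five-vertex star: the center's order-1 neighborhood is the whole graph, while each leaf's neighborhood contains only itself and the center.

```r
library(igraph)

g <- make_star(5, mode = "undirected")    # vertex 1 is the center, 2..5 are leaves

nb <- make_ego_graph(g, order = 1)        # one subgraph per vertex
sizes <- sapply(nb, vcount)
sizes                                     # 5 2 2 2 2

biggest <- nb[[which.max(sizes)]]         # the center's neighborhood is the whole star
ecount(biggest)                           # 4
```

Using which.max() (or order() for the top few) avoids hard-coding indices such as 1 and 34.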
