3.2 Elements of graph visualization
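At its core, graph visualization represents vertices with symbols such as points, circles, or rectangles, and edges with smooth curves. How the drawing is made is guided by three things: drawing conventions, aesthetics, and constraints.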
3.3 Graph layout
Layout is the heart of graph visualization: it fixes the positions (coordinates) of the vertices and edges in space.
(1) Circular layout
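The simplest layout: all vertices are spaced evenly around a circle, and the edges connecting them are drawn across the circle, so many of them cross. Example with a regular lattice network: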
library(sand)
igraph.options(vertex.size=3, vertex.label=NA, edge.arrow.size=0.5)
g <- make_lattice(c(5,5,5))
plot(g, layout=layout_in_circle)  # layout.circle in older igraph versions
title("lattice")
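A layout in igraph is just a numeric matrix with one row of (x, y) coordinates per vertex, which plot() accepts through its layout argument. The hand-rolled function below (circle_layout is a hypothetical helper, not part of igraph) sketches the idea behind the circular layout: vertex i is placed at angle 2*pi*(i-1)/n on the unit circle. That igraph uses the same starting angle and direction is an assumption not checked here.

```r
# Hand-rolled sketch of a circular layout (not igraph's own code):
# vertex i gets angle 2*pi*(i-1)/n on the unit circle.
circle_layout <- function(n) {
  theta <- 2 * pi * (seq_len(n) - 1) / n
  cbind(x = cos(theta), y = sin(theta))
}

# one (x, y) row per vertex for an 8-vertex graph
xy <- circle_layout(8)
round(xy, 3)
```

With igraph loaded, plot(g, layout = circle_layout(vcount(g))) should give a picture similar to the circular plot above.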
A real-world network (the AIDS blog network), default layout vs. circular layout:
> par(mfrow=c(1,2))
> data("aidsblog")
> plot(aidsblog)
> title("default")
> plot(aidsblog, layout=layout_in_circle)
> title("circle")
(2) Spring (force-directed) layout:
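Model the graph as a physical system with two kinds of force: attraction along the edges, which pulls connected vertices as close together as possible, and repulsion between vertices, which pushes them as far apart as possible. A spring layout can be computed with the Fruchterman-Reingold algorithm (named after its two authors, Fruchterman and Reingold):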
> data("aidsblog")
> igraph.options(vertex.size=3, vertex.label=NA, edge.arrow.size=0.5)
> plot(aidsblog, layout=layout_with_fr)  # layout.fruchterman.reingold in older igraph versions
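The igraph call above hides the mechanics. Below is a simplified, hand-written sketch of the Fruchterman-Reingold force model: with ideal edge length k, the attractive force along an edge is d^2/k and the repulsive force between any two vertices is k^2/d. The frame size, cooling schedule, and iteration count are assumptions chosen for illustration, the function names are hypothetical, and igraph's layout_with_fr() differs in many details.

```r
# Hand-rolled sketch of the Fruchterman-Reingold force model (not igraph's
# implementation). k is the ideal edge length, d the distance between two vertices.
fr_attract <- function(d, k) d^2 / k  # acts only along edges
fr_repel   <- function(d, k) k^2 / d  # acts between every pair of vertices

# Bare-bones layout loop. Assumptions: unit-area frame, k = sqrt(area / n),
# linear cooling from temp0, no clipping to the frame, fixed iteration count.
fr_layout <- function(edges, n, iters = 200, temp0 = 0.1, seed = 1) {
  set.seed(seed)
  k <- sqrt(1 / n)
  pos <- matrix(runif(2 * n), ncol = 2)  # random start positions
  for (it in seq_len(iters)) {
    disp <- matrix(0, n, 2)
    # repulsion: every vertex is pushed away from every other vertex
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (i == j) next
        delta <- pos[i, ] - pos[j, ]
        d <- max(sqrt(sum(delta^2)), 1e-9)
        disp[i, ] <- disp[i, ] + delta / d * fr_repel(d, k)
      }
    }
    # attraction: the two endpoints of each edge are pulled together
    for (e in seq_len(nrow(edges))) {
      a <- edges[e, 1]
      b <- edges[e, 2]
      delta <- pos[a, ] - pos[b, ]
      d <- max(sqrt(sum(delta^2)), 1e-9)
      f <- delta / d * fr_attract(d, k)
      disp[a, ] <- disp[a, ] - f
      disp[b, ] <- disp[b, ] + f
    }
    # move each vertex along its net force, by at most the current temperature
    len <- pmax(sqrt(rowSums(disp^2)), 1e-9)
    temp <- temp0 * (1 - (it - 1) / iters)
    pos <- pos + disp / len * pmin(len, temp)
  }
  pos
}

# usage on a hypothetical 4-cycle: returns a 4 x 2 coordinate matrix
pos <- fr_layout(rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 1)), n = 4)
```

Because the start positions are random, the seed argument makes this sketch reproducible; igraph's force-directed layouts likewise change from run to run unless a seed is set.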
Fruchterman-Reingold is also the default layout in Pajek, as shown in the figure below.
The Kamada-Kawai layout is another option:
plot(aidsblog, layout=layout_with_kk)  # layout.kamada.kawai in older igraph versions
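Kamada-Kawai takes a different view: it wants the Euclidean distance between every pair of vertices to match their shortest-path distance in the graph, and it minimises a spring energy. The sketch below only evaluates that energy for a given layout, using the form from the original paper (ideal length L*d_ij, spring strength 1/d_ij^2) with its constants set to 1, which is an assumption. all_pairs_hops and kk_energy are hypothetical helpers, and the graph is assumed connected.

```r
# Hand-rolled sketch of the Kamada-Kawai energy (constants simplified;
# igraph's layout_with_kk() implementation details may differ).

# all-pairs hop distances via Floyd-Warshall on a 0/1 adjacency matrix
all_pairs_hops <- function(adj) {
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (m in seq_len(nrow(adj))) {
    d <- pmin(d, outer(d[, m], d[m, ], "+"))
  }
  d
}

# energy = sum over pairs of 0.5 * (1 / d_ij^2) * (|p_i - p_j| - L * d_ij)^2
kk_energy <- function(pos, adj, L = 1) {
  dg <- all_pairs_hops(adj)
  n <- nrow(pos)
  energy <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      eucl <- sqrt(sum((pos[i, ] - pos[j, ])^2))
      energy <- energy + 0.5 * (eucl - L * dg[i, j])^2 / dg[i, j]^2
    }
  }
  energy
}

# hypothetical path graph 1-2-3
adj <- matrix(0, 3, 3)
adj[1, 2] <- adj[2, 1] <- 1
adj[2, 3] <- adj[3, 2] <- 1
straight <- cbind(c(0, 1, 2), c(0, 0, 0))  # distances equal hop counts
bent     <- cbind(c(0, 1, 1), c(0, 0, 1))  # vertices 1 and 3 too close
kk_energy(straight, adj)  # 0
kk_energy(bent, adj)      # positive
```

The straight drawing reproduces the graph distances exactly and has zero energy; bending the path brings vertices 1 and 3 closer than their hop distance and raises the energy.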
(3) Tree layout:
gt <- graph_from_literal(1-2, 1-3, 1-4, 2-5, 2-6, 2-7, 3-8, 3-9, 4-10)
igraph.options(vertex.size=30, vertex.label=NULL, edge.arrow.size=0.5)
plot(gt)
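The plot above uses igraph's default layout, which does not make the tree structure obvious. Note also the vertex.label setting in the code: with NULL, the vertex names (here the numbers) are drawn; with NA, no labels are shown at all. Specifying a tree layout is much more intuitive:
plot(gt, layout=layout_as_tree)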
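igraph's layout_as_tree() uses the Reingold-Tilford algorithm. The much simpler sketch below only captures the basic idea of a layered layout: a breadth-first search from the root gives each vertex a depth, the depth becomes the y coordinate, and vertices at the same depth are spread evenly along x. It makes no attempt to keep subtrees together, and tree_layout is a hypothetical helper, not an igraph function. The edge list reproduces the tree gt from above.

```r
# Hand-rolled sketch of a layered tree layout (not Reingold-Tilford):
# y = -depth from the root, x spread evenly and centred within each level.
tree_layout <- function(edges, root) {
  n <- max(edges)
  depth <- rep(NA_integer_, n)
  depth[root] <- 0L
  queue <- root
  # breadth-first search assigns each vertex its distance from the root
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    nbrs <- c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1])
    for (u in nbrs) {
      if (is.na(depth[u])) {
        depth[u] <- depth[v] + 1L
        queue <- c(queue, u)
      }
    }
  }
  # centre the vertices of each level around x = 0
  x <- numeric(n)
  for (lev in unique(depth)) {
    idx <- which(depth == lev)
    x[idx] <- seq_along(idx) - (length(idx) + 1) / 2
  }
  cbind(x = x, y = -depth)
}

# same edges as gt above, one edge per row
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(2, 5), c(2, 6),
               c(2, 7), c(3, 8), c(3, 9), c(4, 10))
pos <- tree_layout(edges, root = 1)
pos
```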
(4) Bipartite layout:
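gb <- graph_from_literal(a1:a2:a3, c1:c2, a1:a2-c1, a2:a3-c2)
# type is TRUE for the "a" vertices and FALSE for the "c" vertices
V(gb)$type <- grepl("^a", V(gb)$name)
print_all(gb, v=TRUE)
# layout_as_bipartite() puts the two types on two rows;
# swapping the columns and negating turns the rows into two columns
plot(gb, layout=-layout_as_bipartite(gb)[,2:1],
     vertex.size=60,
     vertex.shape=ifelse(V(gb)$type, "rectangle", "circle"),
     vertex.label.cex=1.75,
     vertex.color=ifelse(V(gb)$type, "red", "cyan"))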
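layout_as_bipartite() places the two vertex types on two rows and then reorders the vertices within each row to reduce edge crossings (using the Sugiyama method). The sketch below builds a two-column arrangement directly from the logical type vector, without any crossing reduction. bipartite_columns is a hypothetical helper, and the vertex order a1, a2, a3, c1, c2 is assumed.

```r
# Hand-rolled sketch of a two-column bipartite layout:
# x = 0 for type TRUE, x = 1 for type FALSE; each column stacked top to bottom.
bipartite_columns <- function(type) {
  x <- ifelse(type, 0, 1)
  y <- numeric(length(type))
  for (t in c(TRUE, FALSE)) {
    idx <- which(type == t)
    y[idx] <- rev(seq_along(idx))  # first vertex of each type on top
  }
  cbind(x = x, y = y)
}

# assumed vertex order: a1, a2, a3, c1, c2
pos <- bipartite_columns(c(TRUE, TRUE, TRUE, FALSE, FALSE))
pos
```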
3.4 Decorating graph layouts
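The layout fixes where the vertices and edges go; beyond that, we can also change the size, shape, and color of vertices and edges. Take the karate club network (data(karate)) as an example; first, the undecorated plot: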
Method 1: set vertex and edge attributes directly (the attributes become part of the graph object itself)
(1) The vertex name attribute of the original graph:
> V(karate)$name
 [1] "Mr Hi" "Actor 2" "Actor 3" "Actor 4" "Actor 5" "Actor 6"
 [7] "Actor 7" "Actor 8" "Actor 9" "Actor 10" "Actor 11" "Actor 12"
[13] "Actor 13" "Actor 14" "Actor 15" "Actor 16" "Actor 17" "Actor 18"
[19] "Actor 19" "Actor 20" "Actor 21" "Actor 22" "Actor 23" "Actor 24"
[25] "Actor 25" "Actor 26" "Actor 27" "Actor 28" "Actor 29" "Actor 30"
[31] "Actor 31" "Actor 32" "Actor 33" "John A"
(2) Derive a label attribute from the name attribute:
> V(karate)$label <- sub("Actor ", "", V(karate)$name)
> V(karate)$label
 [1] "Mr Hi" "2" "3" "4" "5" "6" "7"
 [8] "8" "9" "10" "11" "12" "13" "14"
[15] "15" "16" "17" "18" "19" "20" "21"
[22] "22" "23" "24" "25" "26" "27" "28"
[29] "29" "30" "31" "32" "33" "John A"
(3) Add a shape attribute: a common value for ordinary vertices and a special value for the two leaders, Mr Hi and John A:
# default shape for ordinary members
V(karate)$shape <- "circle"
# the two leaders get a different shape
V(karate)[c("Mr Hi", "John A")]$shape <- "rectangle"
(4) The original data include a vertex attribute Faction; color the vertices according to it:
V(karate)[Faction == 1]$color <- "red"
V(karate)[Faction == 2]$color <- "dodgerblue"
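(5) Size the vertices by the weights of their edges. First set a vertex size attribute from strength(), which sums the weights of the edges incident to each vertex. In this network the edge weights are already given: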
> edge.attributes(karate)
$weight
 [1] 4 5 3 3 3 3 2 2 2 3 1 3 2 2 2 2 6 3 4 5 1 2 2 2 3 4 5 1 3 2 2 2 3 3 3
[36] 2 3 5 3 3 3 3 3 4 2 3 3 2 3 4 1 2 1 3 1 2 3 5 4 3 5 4 2 3 2 7 4 2 4 2
[71] 2 4 2 3 3 4 4 5
> V(karate)$size <- 4*sqrt(strength(karate))
> V(karate)$size
 [1] 25.922963 21.540659 22.978251 16.970563 11.313708 14.966630 14.422205
 [8] 14.422205 16.492423 6.928203 11.313708 6.928203 8.000000 16.492423
[15] 8.944272 10.583005 9.797959 6.928203 6.928203 8.944272 8.000000
[22] 8.000000 8.944272 18.330303 10.583005 14.966630 9.797959 14.422205
[29] 9.797959 14.422205 13.266499 18.330303 24.657656 27.712813
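What strength() computes can be spelled out by hand: each edge adds its weight to both of its endpoints. The sketch below does this on a hypothetical 4-vertex edge list (not the karate data) and then applies the same 4*sqrt() sizing as above. One plausible reason for the square root is that the drawn area of a vertex then grows roughly in proportion to its strength rather than to its square.

```r
# Hypothetical weighted, undirected edge list (one edge per position).
edge_from <- c(1, 1, 2, 3)
edge_to   <- c(2, 3, 3, 4)
weight    <- c(4, 5, 3, 2)
n <- 4

# strength = sum of the weights of the edges incident to each vertex
strength_vec <- numeric(n)
for (e in seq_along(weight)) {
  strength_vec[edge_from[e]] <- strength_vec[edge_from[e]] + weight[e]
  strength_vec[edge_to[e]]   <- strength_vec[edge_to[e]]   + weight[e]
}
strength_vec  # 9 7 10 2

# same sizing rule as used for the karate vertices
size <- 4 * sqrt(strength_vec)
size
```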
(6) Add an edge width attribute (the line width of each edge), set equal to the edge weight:
> E(karate)$width <- E(karate)$weight
(7) An edge is determined by its two endpoints, so to set edge attributes by group we select the edges through their vertices:
f1 <- V(karate)[Faction == 1]
f2 <- V(karate)[Faction == 2]
# edges with both endpoints in f1
E(karate)[f1 %--% f1]$color <- "pink"
# edges with both endpoints in f2
E(karate)[f2 %--% f2]$color <- "lightblue"
# edges with one endpoint in f1 and the other in f2
E(karate)[f1 %--% f2]$color <- "yellow"
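The three %--% selections amount to classifying every edge by the groups of its two endpoints. The base-R sketch below does the same classification on a hypothetical 4-vertex toy graph, so the logic can be checked without igraph.

```r
# Hypothetical toy graph: faction of vertices 1..4 and one edge per row.
faction <- c(1, 1, 2, 2)
edges <- rbind(c(1, 2), c(3, 4), c(2, 3))

# faction of each edge's two endpoints
fa <- faction[edges[, 1]]
fb <- faction[edges[, 2]]

# within faction 1 -> pink, within faction 2 -> lightblue, between -> yellow
edge_color <- ifelse(fa == 1 & fb == 1, "pink",
              ifelse(fa == 2 & fb == 2, "lightblue", "yellow"))
edge_color  # "pink" "lightblue" "yellow"
```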
(8) Set the label offset and draw the plot:
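# labels stay centred on large vertices (size >= 10) and are moved beside small ones
V(karate)$label.dist <- ifelse(V(karate)$size >= 10, 0, 0.75)
plot(karate, layout=layout_with_kk)  # layout.kamada.kawai in older igraph versions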
Method 2: pass the settings as arguments to plot() (the graph's own attributes stay unchanged)
> data("lazega")
> # lazega is an igraph graph object
> vertex.attributes(lazega)
$name
[1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11"
[12] "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20" "V21" "V22"
[23] "V23" "V24" "V25" "V26" "V27" "V28" "V29" "V30" "V31" "V32" "V33"
[34] "V34" "V35" "V36"
$Seniority
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[24] 24 25 26 27 28 29 30 31 32 33 34 35 36
$Status
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[36] 1
$Gender
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1
[36] 1
$Office
[1] 1 1 2 1 2 2 2 1 1 1 1 1 1 2 3 1 1 2 1 1 1 1 1 1 2 1 1 2 1 2 2 2 2 1 2
[36] 1
$Years
[1] 31 32 13 31 31 29 29 28 25 25 23 24 22 1 21 20 23 18 19 19 17 9 16
[24] 15 15 15 13 11 10 7 8 8 8 8 8 5
$Age
[1] 64 62 67 59 59 55 63 53 53 53 50 52 57 56 48 46 50 45 46 49 43 49 45
[24] 44 43 41 47 38 38 39 34 33 37 36 33 43
$Practice
[1] 1 2 1 2 1 1 2 1 2 2 1 2 1 2 2 2 2 1 2 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 2
[36] 1
$School
[1] 1 1 1 3 2 1 3 3 1 3 1 2 2 1 3 1 1 2 1 1 2 3 2 2 2 3 1 2 3 3 2 3 3 2 3
[36] 3
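(1) Give each office location its own color.
Office is an integer vector (values 1, 2 and 3), so it can be used directly to index a vector of colors: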
> colbar <- c("red","dodgerblue","goldenrod")
> class(V(lazega)$Office)
[1] "integer"
> V(lazega)$Office %>% unique()
[1] 1 2 3
> v_colors <- colbar[V(lazega)$Office]
> # keep in mind: the result has one color per vertex, i.e. it encodes a partition of the vertices
> v_colors
[1] "red" "red" "dodgerblue" "red" "dodgerblue"
[6] "dodgerblue" "dodgerblue" "red" "red" "red"
[11] "red" "red" "red" "dodgerblue" "goldenrod"
[16] "red" "red" "dodgerblue" "red" "red"
[21] "red" "red" "red" "red" "dodgerblue"
[26] "red" "red" "dodgerblue" "red" "dodgerblue"
[31] "dodgerblue" "dodgerblue" "dodgerblue" "red" "dodgerblue"
[36] "red"
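Indexing a palette with integer codes is what produces one color per vertex. The sketch below repeats the idea on hypothetical office codes and adds a check worth keeping in mind: codes outside 1..length(palette) would silently misbehave (0 drops an element, a too-large code gives NA).

```r
# Palette and hypothetical integer office codes for five vertices.
colbar <- c("red", "dodgerblue", "goldenrod")
office <- c(1L, 1L, 2L, 3L, 2L)

# guard against codes that do not point into the palette
stopifnot(all(office %in% seq_along(colbar)))

# code k picks the k-th color, giving one color per vertex
v_colors <- colbar[office]
v_colors  # "red" "red" "dodgerblue" "goldenrod" "dodgerblue"
```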
(2) In the same way, map the Practice attribute to vertex shapes:
> v_shape <- c("circle","square")[V(lazega)$Practice]
(3) Set the vertex size from the Years attribute:
> v_size <- 3.5 * sqrt(V(lazega)$Years)
(4) Use the Seniority attribute as vertex labels:
> v_label <- V(lazega)$Seniority
(5) Pass these vectors as arguments to plot():
> l <- layout_with_fr(lazega)
> plot(lazega,layout=l,
+ vertex.color=v_colors,
+ vertex.shape=v_shape,
+ vertex.size=v_size,
+ vertex.label=v_label)
3.5 Visualizing large networks
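When a network has thousands of vertices, its plot turns into a "hairball". Even the network below, with only 192 vertices and 1431 edges, is already hard to read: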
> summary(fblog)
IGRAPH 3e87bca UN-- 192 1431 --
+ attr: name (v/c), PolParty (v/c)
> plot(fblog)
Try coloring the vertices by political party:
> party_name <- sort(unique(V(fblog)$PolParty))
> party_name
[1] " Cap21" " Commentateurs Analystes"
[3] " Les Verts" " liberaux"
[5] " Parti Radical de Gauche" " PCF - LCR"
[7] " PS" " UDF"
[9] " UMP"
> party_name %>% class()
[1] "character"
> # next, convert the party names from character
> # converting directly to a factor leaves unwieldy level names (note the leading spaces)
> party_name %>% as.factor()
[1] Cap21 Commentateurs Analystes
[3] Les Verts liberaux
[5] Parti Radical de Gauche PCF - LCR
[7] PS UDF
[9] UMP
9 Levels: Cap21 Commentateurs Analystes Les Verts ... UMP
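> # character -> factor -> numeric code
> party_name %>% as.factor() %>% as.numeric()
[1] 1 2 3 4 5 6 7 8 9
> # the colors must be coded per vertex (192 values), not per party (9 values);
> # a 9-element vector would just be recycled over the 192 vertices
> party_color <- V(fblog)$PolParty %>% as.factor() %>% as.numeric()
> # plot
> plot(fblog, vertex.color=party_color, vertex.size=3, vertex.label=NA)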
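The key step is turning a per-vertex character attribute into per-vertex integer codes. The sketch below does this on hypothetical party names that carry leading spaces, as V(fblog)$PolParty does: trimws() removes the spaces and match() against the sorted distinct names gives one code per vertex. as.integer(factor(party_clean)) would give the same codes here.

```r
# Hypothetical per-vertex party names with leading spaces.
party <- c(" PS", " UMP", " PS", " Les Verts", " UMP")

party_clean <- trimws(party)                 # drop the leading spaces
levels_sorted <- sort(unique(party_clean))   # "Les Verts" "PS" "UMP"
party_code <- match(party_clean, levels_sorted)
party_code  # 2 3 2 1 3  (one code per vertex, not one per party)
```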
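The relationships between vertices are still hard to see. The DrL algorithm, based on VxOrd, is designed for large graphs; the textbook says it clusters the vertices, but I don't find the result ideal either: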
> l <- layout_with_drl(fblog)
> # plot
> plot(fblog,vertex.color=party_color,vertex.size=5,vertex.label=NA,layout=l)
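Still, this plot is a useful starting point for further analysis: DrL suggests that the vertices form clusters, so the next step is to use graph partitioning methods. igraph has a dedicated function for collapsing groups of vertices, contract():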
This function creates a new graph, by merging several vertices into one. The vertices in the new graph correspond to sets of vertices in the input graph.
# party_num must assign a group to every vertex (192 entries), i.e. a partition
# of the vertices, not one entry per party (9)
party_num <- V(fblog)$PolParty %>% as.factor() %>% as.numeric()
fblog_c <- contract(fblog, party_num)
plot(fblog_c, vertex.label=NA)
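In the contracted graph every edge of the original graph is kept, so edges between parties become multiple edges (and edges within a party become loops): the result is a multigraph. The name of each new vertex combines the names of the original vertices merged into it.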
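Collapsing the parallel edges of a contracted graph (as simplify() does) loses how many original edges joined each pair of groups, unless those counts are kept, for example by giving each edge a weight of 1 and summing it through simplify()'s edge.attr.comb argument. The base-R sketch below computes the counts directly on hypothetical toy data and drops the within-group loops, as simplify() does by default.

```r
# Hypothetical toy graph: group of vertices 1..5 and one edge per row.
group <- c(1, 1, 2, 2, 3)
edges <- rbind(c(1, 2), c(1, 3), c(2, 4), c(4, 5), c(3, 4))

# group of each edge's endpoints after contraction
g_from <- group[edges[, 1]]
g_to   <- group[edges[, 2]]

# within-group edges would become loops; drop them
keep <- g_from != g_to

# undirected: write each group pair with the smaller id first, then count
key <- paste(pmin(g_from, g_to), pmax(g_from, g_to), sep = "-")[keep]
edge_counts <- table(key)
edge_counts
```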
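The book then calls simplify(fblog.c) (the graph is named fblog_c here). Part of that code failed to run and I found no fix, so it is omitted.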
Selecting induced subgraphs from a network
> # collect the immediate neighbourhood of every vertex (including the vertex itself)
> k_nbhds <- make_ego_graph(karate,order = 1)
> sapply(k_nbhds,vcount)
[1] 17 10 11 7 4 5 5 5 6 3 4 2 3 6 3 3 3 3 3 4 3 3 3
[24] 6 4 4 3 5 4 5 5 7 13 18
> # pick the two largest (vertices 1 and 34)
> k1 <- k_nbhds[[1]]
> k2 <- k_nbhds[[34]]
> par(mfrow=c(1,2))
> plot(k1)
> plot(k2)
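make_ego_graph(karate, order = 1) returns, for each vertex, the subgraph induced by that vertex and its direct neighbours, so in a simple graph its vcount is the degree plus one (17 for Mr Hi, whose degree is 16). The base-R sketch below computes the same sizes and members from the adjacency matrix of a hypothetical 4-vertex star.

```r
# Hypothetical star graph: vertex 1 is joined to vertices 2, 3 and 4.
adj <- matrix(0, 4, 4)
adj[1, 2:4] <- 1
adj[2:4, 1] <- 1

# order-1 ego network size = number of neighbours + the vertex itself
ego_size <- rowSums(adj) + 1
ego_size  # 4 2 2 2

# members of each ego network (the vertex plus its neighbours)
ego_members <- lapply(seq_len(nrow(adj)),
                      function(v) sort(c(v, which(adj[v, ] > 0))))
ego_members[[1]]  # 1 2 3 4
```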