R Language Study Notes (17): Using the melt and dcast Functions in the data.table Package

2018-04-22 Source: 博客园 (cnblogs) / 嘻呵呵

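The melt function converts wide data to long data: the columns named in measure.vars are stacked into a pair of columns, one holding the original column name and one holding the value.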

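The dcast function converts long data back to wide data: the formula puts the identifier columns on the left and, on the right, the column whose values become the new column names.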

> library(data.table)
> DT <- fread("melt_default.csv")
> DT
   family_id age_mother dob_child1 dob_child2 dob_child3
1:         1         30 1998-11-26 2000-01-29         NA
2:         2         27 1996-06-22         NA         NA
3:         3         26 2002-07-11 2004-04-05 2007-09-02
4:         4         32 2004-10-10 2009-08-27 2012-07-21
5:         5         29 2000-12-05 2005-02-28         NA
> DT.m1 <- melt(DT, measure.vars = c("dob_child1", "dob_child2", "dob_child3"),
+               variable.name = "child", value.name = "dob")
> DT.m1
    family_id age_mother      child        dob
 1:         1         30 dob_child1 1998-11-26
 2:         2         27 dob_child1 1996-06-22
 3:         3         26 dob_child1 2002-07-11
 4:         4         32 dob_child1 2004-10-10
 5:         5         29 dob_child1 2000-12-05
 6:         1         30 dob_child2 2000-01-29
 7:         2         27 dob_child2         NA
 8:         3         26 dob_child2 2004-04-05
 9:         4         32 dob_child2 2009-08-27
10:         5         29 dob_child2 2005-02-28
11:         1         30 dob_child3         NA
12:         2         27 dob_child3         NA
13:         3         26 dob_child3 2007-09-02
14:         4         32 dob_child3 2012-07-21
15:         5         29 dob_child3         NA
> dcast(DT.m1, family_id + age_mother ~ child, value.var = "dob")
   family_id age_mother dob_child1 dob_child2 dob_child3
1:         1         30 1998-11-26 2000-01-29         NA
2:         2         27 1996-06-22         NA         NA
3:         3         26 2002-07-11 2004-04-05 2007-09-02
4:         4         32 2004-10-10 2009-08-27 2012-07-21
5:         5         29 2000-12-05 2005-02-28         NA
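
Since melt_default.csv is not included with these notes, the round trip above can be reproduced with an in-memory table. The following is a minimal sketch, assuming the dob columns are plain character strings (fread may parse them as dates depending on the data.table version). The names DT.long and DT.wide are introduced here for illustration only. It also uses melt's na.rm = TRUE argument, which drops the rows whose value is missing.

```r
library(data.table)

# In-memory stand-in for melt_default.csv (values copied from the printout above;
# storing the dates as character strings is an assumption)
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Wide -> long; na.rm = TRUE removes the rows where dob is NA
DT.long <- melt(DT,
                id.vars       = c("family_id", "age_mother"),
                measure.vars  = c("dob_child1", "dob_child2", "dob_child3"),
                variable.name = "child",
                value.name    = "dob",
                na.rm         = TRUE)

# Long -> wide; combinations absent from DT.long are filled with NA again
DT.wide <- dcast(DT.long, family_id + age_mother ~ child, value.var = "dob")

print(DT.long)
print(DT.wide)
```

Dropping the NA rows in the long table loses nothing here, because dcast fills the family/child combinations that are absent with NA when it rebuilds the wide table.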

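For more complex data, where several groups of columns (here the dob and gender columns) must be melted at the same time, pass patterns() to melt with one value.name per group, and give dcast several columns in value.var: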

> DT <- fread("melt_enhanced.csv")
> DT
   family_id age_mother dob_child1 dob_child2 dob_child3 gender_child1 gender_child2 gender_child3
1:         1         30 1998-11-26 2000-01-29         NA             1             2            NA
2:         2         27 1996-06-22         NA         NA             2            NA            NA
3:         3         26 2002-07-11 2004-04-05 2007-09-02             2             2             1
4:         4         32 2004-10-10 2009-08-27 2012-07-21             1             1             1
5:         5         29 2000-12-05 2005-02-28         NA             2             1            NA
> DT.m2 <- melt(DT, measure = patterns("^dob","^gender"), value.name = c("dob", "gender"))
> DT.m2
    family_id age_mother variable        dob gender
 1:         1         30        1 1998-11-26      1
 2:         2         27        1 1996-06-22      2
 3:         3         26        1 2002-07-11      2
 4:         4         32        1 2004-10-10      1
 5:         5         29        1 2000-12-05      2
 6:         1         30        2 2000-01-29      2
 7:         2         27        2         NA     NA
 8:         3         26        2 2004-04-05      2
 9:         4         32        2 2009-08-27      1
10:         5         29        2 2005-02-28      1
11:         1         30        3         NA     NA
12:         2         27        3         NA     NA
13:         3         26        3 2007-09-02      1
14:         4         32        3 2012-07-21      1
15:         5         29        3         NA     NA
> DT.c2 <- dcast(DT.m2, family_id + age_mother ~ variable, value.var = c("dob","gender"))
> DT.c2
   family_id age_mother      dob_1      dob_2      dob_3 gender_1 gender_2 gender_3
1:         1         30 1998-11-26 2000-01-29         NA        1        2       NA
2:         2         27 1996-06-22         NA         NA        2       NA       NA
3:         3         26 2002-07-11 2004-04-05 2007-09-02        2        2        1
4:         4         32 2004-10-10 2009-08-27 2012-07-21        1        1        1
5:         5         29 2000-12-05 2005-02-28         NA        2        1       NA
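
The second example can likewise be run without melt_enhanced.csv. This is a minimal sketch under the same assumption (dates stored as character strings, gender as integer codes), with the table built in memory from the printout above. Using variable.name = "child" to rename the default variable column is an addition for readability and is not part of the original notes.

```r
library(data.table)

# In-memory stand-in for melt_enhanced.csv (values copied from the printout above)
DT <- data.table(
  family_id     = 1:5,
  age_mother    = c(30L, 27L, 26L, 32L, 29L),
  dob_child1    = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2    = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3    = c(NA, NA, "2007-09-02", "2012-07-21", NA),
  gender_child1 = c(1L, 2L, 2L, 1L, 2L),
  gender_child2 = c(2L, NA, 2L, 1L, 1L),
  gender_child3 = c(NA, NA, 1L, 1L, NA)
)

# Each pattern selects one group of columns; each group becomes one value column.
# The "child" column holds the position of the column within its group (1, 2, 3).
DT.m2 <- melt(DT,
              id.vars       = c("family_id", "age_mother"),
              measure.vars  = patterns("^dob", "^gender"),
              variable.name = "child",
              value.name    = c("dob", "gender"))

# Casting several value columns at once yields names of the form <value>_<child>
DT.c2 <- dcast(DT.m2, family_id + age_mother ~ child,
               value.var = c("dob", "gender"))

print(DT.m2)
print(DT.c2)
```

Note that the index column is the position of each column within its pattern group, not a number parsed from the column name, so the dob and gender columns should appear in the same child order in the wide table for the pairing to be correct.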