data.table - R converting to long format, pattern -


i convert data.table 1 wide format long.

set.seed(1) dt <- data.table(      id = c(1:5, na),   name = c("bob","ana","smith","sam","big","lulu"),   kind_2001 = factor(sample(c(letters[1:3], na), 6, true)),   kind_2002 = factor(sample(c(letters[1:3], na), 6, true)),   kind_2003 = factor(sample(c(letters[1:3], na), 6, true)),   conc_2001 = sample(99,6),   conc_2002 = sample(79,6),   conc_2003 = sample(49,6)   )   id  name kind_2001 kind_2002 kind_2003 conc_2001 conc_2002 conc_2003 1   bob         b        na         c        38        22        24 2   ana         b         c         b        77        31        29 3 smith         c         c        na        91         2        49 4   sam        na                 b        21        30         9 5   big                         c        62        66        38 na  lulu       na                na        12        26        30 

and this:

id   name   year    kind   conc     1    bob    2001    b      38 1    bob    2002    na     22 1    bob    2003    c      24 2    ana    2001    b      77 2    ana    2002    c      31 2    ana    2003    b      29 ... 

the real table has many more variables, i'm looking solution without explicitly saying every column name or number, detecting automatically pattern.
have 2 kind of columns, ending underscore , 4 digit year, such _2001, , other without ending.
can have underscore in middle of name (this kept untransformed).
transform columns ending year long format.

i've tried with

melt(dt, id=1:2, variable.name = "year") or  melt(dt, id=1:2, measure=patterns("_2[0-9][0-9][0-9]$")) 

but i'm not getting want.

maybe first need filter names gsub.

pd: i've found solution.

posi <- grep("_[0-9][0-9][0-9][0-9]$",colnames(dt)) work <- unique(gsub("_[0-9][0-9][0-9][0-9]$","",colnames(dt)[posi])) melt(dt, measure=patterns(paste0("^",work)), variable="year", value.name=work) 

it works year column not populated properly. i'm missing or it's bug. , i'm sure written simpler.

id  name year kind conc  1   bob    1    b   38  2   ana    1    b   77  3 smith    1    c   91  4   sam    1   na   21  5   big    1      62 na  lulu    1   na   12  1   bob    2   na   22  2   ana    2    c   31  3 smith    2    c    2  4   sam    2      30  5   big    2      66 na  lulu    2      26  1   bob    3    c   24  2   ana    3    b   29  3 smith    3   na   49  4   sam    3    b    9  5   big    3    c   38 na  lulu    3   na   30 

regards

i've tried eddi solution database , error:

"error: cannot allocate vector of size 756.5 mb" though have 16gb of memory.

here's option that's more robust respect order of columns, missing/extra years:

dcast(melt(dt, id.vars = c("id", "name"))         [, .(id, name, sub('_.*', '', variable), sub('.*_', '', variable), value)],       id + name + v4 ~ v3) #    id  name   v4 conc kind # 1:  1   bob 2001   38    b # 2:  1   bob 2002   22   na # 3:  1   bob 2003   24    c # 4:  2   ana 2001   77    b # 5:  2   ana 2002   31    c # 6:  2   ana 2003   29    b # 7:  3 smith 2001   91    c # 8:  3 smith 2002    2    c # 9:  3 smith 2003   49   na #10:  4   sam 2001   21   na #11:  4   sam 2002   30    #12:  4   sam 2003    9    b #13:  5   big 2001   62    #14:  5   big 2002   66    #15:  5   big 2003   38    c #16: na  lulu 2001   12   na #17: na  lulu 2002   26    #18: na  lulu 2003   30   na 

edit many id columns:

idvars = grep("_", names(dt), invert = true) dcast(melt(dt, id.vars = idvars)         [, `:=`(var      = sub('_.*', '', variable),                 year     = sub('.*_', '', variable),                 variable = null)],       ... ~ var, value.var='value') 

Comments

Popular posts from this blog

angular - Is it possible to get native element for formControl? -

unity3d - Rotate an object to face an opposite direction -

javascript - Why jQuery Select box change event is now working? -