python - Loop over groups in a Pandas DataFrame and get sum/count
I am using a pandas structure to process my data. This code gave me my current DataFrame:
(data[['time_bucket', 'beginning_time', 'bitrate', 2, 3]].groupby(['time_bucket', 'beginning_time', 2, 3])).aggregate(np.mean)
Now I want to have the sum (ideally, the sum and the count) of the 'bitrate' values grouped in the same time_bucket. For example, for the first time_bucket (2016-07-08 02:00:00, 2016-07-08 02:05:00), the result must be 93750000 as the sum and 25 as the count for 'bitrate'.
I did this:
data[['time_bucket', 'bitrate']].groupby(['time_bucket']).agg(['sum', 'count'])
That gives me the result as a separate DataFrame, but I want to have all the data in one DataFrame.
Can I do a simple loop over 'time_bucket' and apply a function that calculates the sum of all the bitrates? Any ideas? Thanks!
I think you need merge, but both DataFrames need the same index levels, so use reset_index. Finally, restore the original MultiIndex with set_index:
import numpy as np
import pandas as pd

data = pd.DataFrame({'a':[1,1,1,1,1,1],
                     'b':[4,4,4,5,5,5],
                     'c':[3,3,3,1,1,1],
                     'd':[1,3,1,3,1,3],
                     'e':[5,3,6,5,7,1]})
print (data)
   a  b  c  d  e
0  1  4  3  1  5
1  1  4  3  3  3
2  1  4  3  1  6
3  1  5  1  3  5
4  1  5  1  1  7
5  1  5  1  3  1

# mean of e per (a, b, c, d) group
df1 = data[['a', 'b', 'c', 'd','e']].groupby(['a', 'b', 'c', 'd']).aggregate(np.mean)
print (df1)
           e
a b c d
1 4 3 1  5.5
      3  3.0
  5 1 1  7.0
      3  3.0

# sum and count of c per a group, computed from the original data
df2 = data[['a', 'c']].groupby(['a'])['c'].agg(['sum', 'count'])
print (df2)
   sum  count
a
1   12      6

# leave only level a in the index of df1, merge on it, then restore the MultiIndex
print (pd.merge(df1.reset_index(['b','c','d']), df2, left_index=True, right_index=True)
         .set_index(['b','c','d'], append=True))
           e  sum  count
a b c d
1 4 3 1  5.5   12      6
      3  3.0   12      6
  5 1 1  7.0   12      6
      3  3.0   12      6
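A possible shortcut, offered only as a sketch: DataFrame.join can align a single-level index with a MultiIndex on a shared level name, so the reset_index / set_index round trip may not be needed. The example reuses the sample data from this answer, calls .mean() in place of np.mean, and the name joined is just illustrative. It assumes a pandas version that supports this kind of join.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 1, 1, 1, 1, 1],
                     'b': [4, 4, 4, 5, 5, 5],
                     'c': [3, 3, 3, 1, 1, 1],
                     'd': [1, 3, 1, 3, 1, 3],
                     'e': [5, 3, 6, 5, 7, 1]})

# Mean of e per (a, b, c, d) group; the index is a MultiIndex
df1 = data.groupby(['a', 'b', 'c', 'd'])[['e']].mean()

# Sum and count of c per a group; the index is the single level a
df2 = data.groupby('a')['c'].agg(['sum', 'count'])

# join matches the index of df2 against level a of df1's MultiIndex
joined = df1.join(df2)
print(joined)
```

The result keeps the four index levels of df1 and gains the sum and count columns, repeated for every row that shares the same value of a.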
I tried to find a solution that works from the output df1, but it is already aggregated, so it is impossible to get the right data from it. If you sum level c, you get 8 instead of 12.
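To make the 8 versus 12 difference concrete, here is a small check on the same sample data. The variable names are only illustrative, and .mean() stands in for np.mean.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 1, 1, 1, 1, 1],
                     'b': [4, 4, 4, 5, 5, 5],
                     'c': [3, 3, 3, 1, 1, 1],
                     'd': [1, 3, 1, 3, 1, 3],
                     'e': [5, 3, 6, 5, 7, 1]})

df1 = data.groupby(['a', 'b', 'c', 'd'])[['e']].mean()

# After aggregation each (a, b, c, d) group is a single row,
# so level c contributes 3 + 3 + 1 + 1
sum_from_aggregated = df1.reset_index()['c'].sum()

# The original rows contribute 3 + 3 + 3 + 1 + 1 + 1
sum_from_original = data['c'].sum()

print(sum_from_aggregated, sum_from_original)  # 8 12
```

That is why the sum and count have to be computed from the original data and merged in afterwards.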