Python 3.x - Pandas Label Duplicates
Given the following data frame:
import pandas as pd

d = pd.DataFrame({'label': [1, 2, 2, 2, 3, 4, 4],
                  'values': [3, 5, 7, 2, 5, 8, 3]})
d

   label  values
0      1       3
1      2       5
2      2       7
3      2       2
4      3       5
5      4       8
6      4       3
I know how to count how many times each label occurs like this:
d['dup']=d.groupby('label')['label'].transform('count')
which results in:
   label  values  dup
0      1       3    1
1      2       5    3
2      2       7    3
3      2       2    3
4      3       5    1
5      4       8    2
6      4       3    2
But I'd like a column that has the following values: 1 if there is only 1 unique row for that label, 2 if there are duplicates and the row in question is the first of them, and 0 if the row is a duplicate of an original. Like this:
   label  values  dup  status
0      1       3    1       1
1      2       5    3       2
2      2       7    3       0
3      2       2    3       0
4      3       5    1       1
5      4       8    2       2
6      4       3    2       0

Thanks in advance!
I think you can use loc with a condition created by the function duplicated:
d['status'] = 2
d.loc[d.dup == 1, 'status'] = 1
d.loc[d.label.duplicated(), 'status'] = 0
print(d)

   label  values  dup  status
0      1       3    1       1
1      2       5    3       2
2      2       7    3       0
3      2       2    3       0
4      3       5    1       1
5      4       8    2       2
6      4       3    2       0
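To see why the last assignment only hits the repeats and leaves the first row of each group alone, it may help to look at the masks that duplicated produces. This is just a small sketch on the same sample data; the variable names repeat_mask and in_group_mask are mine, not from the question.

```python
import pandas as pd

d = pd.DataFrame({'label': [1, 2, 2, 2, 3, 4, 4],
                  'values': [3, 5, 7, 2, 5, 8, 3]})

# keep='first' is the default: the first occurrence of each label is False,
# and every later repeat of that label is True.
repeat_mask = d['label'].duplicated()
print(repeat_mask.tolist())
# [False, False, True, True, False, False, True]

# keep=False marks every row whose label occurs more than once,
# including the first occurrence.
in_group_mask = d['label'].duplicated(keep=False)
print(in_group_mask.tolist())
# [False, True, True, True, False, True, True]
```

So with the default keep='first', rows 2, 3 and 6 are flagged, which are exactly the rows that should get status 0.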
Or a double numpy.where:
import numpy as np

d['status1'] = np.where(d.dup == 1, 1,
                        np.where(d.label.duplicated(), 0, 2))
print(d)

   label  values  dup  status  status1
0      1       3    1       1        1
1      2       5    3       2        2
2      2       7    3       0        0
3      2       2    3       0        0
4      3       5    1       1        1
5      4       8    2       2        2
6      4       3    2       0        0
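If you would rather not depend on the dup helper column, one possible variant (not the approach above, just an alternative sketch) is to build both conditions from duplicated and pass them to numpy.select. The column name status2 and the mask names are only for illustration.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'label': [1, 2, 2, 2, 3, 4, 4],
                  'values': [3, 5, 7, 2, 5, 8, 3]})

# Later occurrences of a label (the first occurrence is not flagged).
is_repeat = d['label'].duplicated()
# Any row whose label appears more than once, first occurrence included.
in_dup_group = d['label'].duplicated(keep=False)

# np.select uses the first condition that is True for each row, so repeats
# get 0, the remaining rows of a duplicated group (the first ones) get 2,
# and labels that occur only once fall through to the default of 1.
d['status2'] = np.select([is_repeat, in_dup_group], [0, 2], default=1)
print(d['status2'].tolist())
# [1, 2, 0, 0, 1, 2, 0]
```

The order of the conditions matters here: is_repeat has to come before in_dup_group, otherwise the repeats would be labelled 2 as well.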