python - pandas: how to drop duplicates selectively

I need to look at the rows in column ['b'] and, if a row is non-empty, go to the corresponding row in column ['c'] and drop the duplicates of that particular value from the other rows in ['c'], while preserving that particular index. I came across drop_duplicates, but I was unable to find a way to drop only the duplicates of the highlighted row, as opposed to all duplicates in the column. I can't use drop_duplicates on the whole column because I want to retain duplicates in that column that may correspond to empty values in column ['b'].

So the possible scenarios would be: if I find a non-empty value in ['b'], I go to the current index in ['c'], find all duplicates of that one index, and drop those. These duplicates may correspond to empty or non-empty values in ['b']. If I find an empty value in ['b'], I skip to the next index. This way, rows with empty values in ['b'] can still be removed indirectly, because they are duplicates in ['c'] of a row with a non-empty ['b'] value.

Edited sample data:

    import pandas as pd

    df1 = pd.DataFrame([['', 'ccch'], ['chc', 'ccch'], ['cchcc', 'cnhcc'], ['', 'ccch'],
                        ['cnhcc', 'cnoch'], ['', 'nch'], ['', 'nch']],
                       columns=['b', 'c'])

    df1
           b      c
    0          ccch
    1    chc   ccch
    2  cchcc  cnhcc
    3          ccch
    4  cnhcc  cnoch
    5           nch
    6           nch

After processing and dropping the correct duplicates:

    df2 = pd.DataFrame([['chc', 'ccch'], ['cchcc', 'cnhcc'], ['cnhcc', 'cnoch'],
                        ['', 'nch'], ['', 'nch']],
                       columns=['b', 'c'], index=[1, 2, 4, 5, 6])

    df2
           b      c
    1    chc   ccch
    2  cchcc  cnhcc
    4  cnhcc  cnoch
    5           nch
    6           nch

In the result above, the rows removed are rows 0 and 3, since they are duplicates in column ['c'] of row 1, which has a non-empty 'b' value. Rows 5 and 6 are kept even though they are duplicates of each other in column ['c'], because neither has a non-empty 'b' value. Rows 2 and 4 are kept because they are not duplicates in column ['c'].

So the logic would be: go through each row in column 'b'; if it is empty, move down a row and continue. If it is not empty, go to the corresponding row in column 'c' and drop the duplicates of that column 'c' value while preserving that index, then continue to the next row until the logic has been applied to all values in column 'b'.

column b value empty --> look at the next value in column b

| or, if not empty |

column b not empty --> column c --> drop duplicates of that row's column c value while keeping the current index --> look at the next value in column b
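Read literally, the rule above can be sketched with boolean masks instead of an explicit row-by-row loop. This is only a minimal sketch, assuming empty cells are empty strings (not NaN) and that when several non-empty 'b' rows share a 'c' value only the first one is kept; `drop_selectively` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["", "ccch"], ["chc", "ccch"], ["cchcc", "cnhcc"], ["", "ccch"],
     ["cnhcc", "cnoch"], ["", "nch"], ["", "nch"]],
    columns=["b", "c"],
)


def drop_selectively(df):
    """Hypothetical helper: keep the first non-empty-'b' row of each 'c'
    value, drop its duplicates in 'c', and leave untouched any 'c' value
    whose rows all have an empty 'b'."""
    non_empty = df["b"] != ""
    # True for every row whose 'c' value appears on some non-empty-'b' row
    has_owner = df["c"].isin(df.loc[non_empty, "c"])
    # True only for the first non-empty-'b' row of each 'c' value
    is_first_owner = non_empty & ~df["c"].where(non_empty).duplicated()
    # Boolean indexing keeps the original index labels
    return df[~has_owner | is_first_owner]


result = drop_selectively(df1)
print(result)  # rows with index 1, 2, 4, 5, 6 remain
```

Because the rows are selected with a mask rather than rebuilt, the surviving rows keep their original index labels.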

I'd say group the dataframe according to the 'c' column, and check each group for the existence of a non-empty entry in the 'b' column:

  • if there is no such entry, return the entire group

  • otherwise, return the group restricted to the non-empty entries in 'b', with duplicates dropped

In code:

    def remove_duplicates(g):
        # keep the whole group if every 'b' is empty; otherwise keep only
        # the non-empty 'b' rows, with duplicate 'b' values dropped
        return g if (g.b == '').all() else g[g.b != ''].drop_duplicates(subset='b')

    >>> df1.groupby(df1.c).apply(remove_duplicates)['b'].reset_index()[['b', 'c']]
           b      c
    0    chc   ccch
    1  cchcc  cnhcc
    2  cnhcc  cnoch
    3           nch
    4           nch
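Note that the output above is renumbered 0 to 4, while the question asks for the original index to be preserved. One possible variant of the same group-wise rule, shown here only as a sketch, builds a boolean mask and filters `df1` directly. It assumes empty cells are empty strings; the names `group_has_value` and `dup_in_group` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["", "ccch"], ["chc", "ccch"], ["cchcc", "cnhcc"], ["", "ccch"],
     ["cnhcc", "cnoch"], ["", "nch"], ["", "nch"]],
    columns=["b", "c"],
)

non_empty = df1["b"] != ""
# For each row: does its 'c' group contain at least one non-empty 'b'?
group_has_value = non_empty.groupby(df1["c"]).transform("any")
# Repeated (b, c) pairs, i.e. duplicate 'b' values inside one 'c' group
dup_in_group = df1.duplicated(subset=["b", "c"])

# Keep all-empty groups whole; elsewhere keep distinct non-empty 'b' rows
keep = ~group_has_value | (non_empty & ~dup_in_group)
df2 = df1[keep]
print(df2)  # index labels 1, 2, 4, 5, 6 are preserved
```

The mask avoids `apply`, so there is no regrouped index to reset, and the row labels match the expected result shown in the question.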

