How to remove duplicate rows in a pandas dataframe in Python?

Daidalos June 16, 2020


Examples of how to remove duplicate rows in a pandas dataframe in Python:

1 -- Create a dataframe

Let's first create the following example dataframe:

import pandas as pd

data = {'Name':['Ben','Anna','Anna','Anna','Zoe','Zoe','Tom','John','Steve'], 
        'Age':[20,27,27,27,43,43,30,12,21], 
        'Sex':[1,0,0,0,0,0,1,1,1]}

df = pd.DataFrame(data)

print(df)

which prints:

    Name  Age  Sex
0    Ben   20    1
1   Anna   27    0
2   Anna   27    0
3   Anna   27    0
4    Zoe   43    0
5    Zoe   43    0
6    Tom   30    1
7   John   12    1
8  Steve   21    1

2 -- Drop duplicate rows

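To drop duplicate rows, one solution is to use the pandas method drop_duplicates. With no subset argument, rows are compared across all columns; keep = 'first' keeps the first occurrence of each duplicated row, and inplace=True modifies df directly instead of returning a new dataframe: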

df.drop_duplicates(keep = 'first', inplace=True)

df then contains:

    Name  Age  Sex
0    Ben   20    1
1   Anna   27    0
4    Zoe   43    0
6    Tom   30    1
7   John   12    1
8  Steve   21    1
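
The other values of keep, and the variant without inplace, can be compared with a minimal sketch like the one below. It assumes the dataframe from step 1; the variable names mask, df_first, df_last and df_none are only illustrative.

```python
import pandas as pd

data = {'Name':['Ben','Anna','Anna','Anna','Zoe','Zoe','Tom','John','Steve'],
        'Age':[20,27,27,27,43,43,30,12,21],
        'Sex':[1,0,0,0,0,0,1,1,1]}

df = pd.DataFrame(data)

# Boolean mask: True for each row that repeats an earlier row
mask = df.duplicated(keep='first')

# Without inplace=True, drop_duplicates returns a new dataframe
# and leaves df unchanged
df_first = df.drop_duplicates(keep='first')  # keep the first occurrence
df_last = df.drop_duplicates(keep='last')    # keep the last occurrence
df_none = df.drop_duplicates(keep=False)     # drop every duplicated row

print(mask.sum())            # 3 rows would be dropped
print(list(df_first.index))  # [0, 1, 4, 6, 7, 8]
print(list(df_last.index))   # [0, 3, 5, 6, 7, 8]
print(list(df_none.index))   # [0, 6, 7, 8]
```

Checking df.duplicated() first is a cheap way to see how many rows would be removed before modifying the dataframe.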

3 -- Drop duplicate rows based on a column

Another example with the following dataframe

data = {'Name':['Ben','Anna','Anna','Anna','Zoe','Zoe','Tom','John','Steve'], 
        'Customer id':['0001','0005','0005','0005','0023','0023','0008','0009','0012'], 
        'Age':[20,27,23,24,43,43,30,12,21], 
        'Sex':[1,0,0,0,0,0,1,1,1]}

df = pd.DataFrame(data)

which gives:

    Name Customer id  Age  Sex
0    Ben        0001   20    1
1   Anna        0005   27    0
2   Anna        0005   23    0
3   Anna        0005   24    0
4    Zoe        0023   43    0
5    Zoe        0023   43    0
6    Tom        0008   30    1
7   John        0009   12    1
8  Steve        0012   21    1

To remove duplicate rows based on the column named 'Customer id', add the subset argument. For example:

df.drop_duplicates(subset ="Customer id", keep = 'first', inplace=True)

df then contains:

    Name Customer id  Age  Sex
0    Ben        0001   20    1
1   Anna        0005   27    0
4    Zoe        0023   43    0
6    Tom        0008   30    1
7   John        0009   12    1
8  Steve        0012   21    1
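
The subset argument also accepts a list of columns, and the index can be renumbered after dropping rows. The following is a minimal sketch under the assumption that it starts from the same four-column dataframe; the names by_id_and_age and last_per_id are hypothetical.

```python
import pandas as pd

data = {'Name':['Ben','Anna','Anna','Anna','Zoe','Zoe','Tom','John','Steve'],
        'Customer id':['0001','0005','0005','0005','0023','0023','0008','0009','0012'],
        'Age':[20,27,23,24,43,43,30,12,21],
        'Sex':[1,0,0,0,0,0,1,1,1]}

df = pd.DataFrame(data)

# Rows count as duplicates only when both 'Customer id' and 'Age' match,
# so the three Anna rows (ages 27, 23, 24) are all kept
by_id_and_age = df.drop_duplicates(subset=['Customer id', 'Age'], keep='first')

# Keep the last row per customer, then renumber the index from 0
last_per_id = (df.drop_duplicates(subset='Customer id', keep='last')
                 .reset_index(drop=True))

print(list(by_id_and_age.index))   # [0, 1, 2, 3, 4, 6, 7, 8]
print(list(last_per_id['Age']))    # [20, 24, 43, 30, 12, 21]
```

Note that with subset="Customer id" alone, rows that differ in other columns (here the Age of Anna) are still removed, so the choice of keep decides which values survive.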

4 -- References