How to extract the column names of a pandas DataFrame in Python?

Daidalos October 20, 2019


In pandas, the column names of a dataframe are available through the columns attribute (ref):

>>> DataFrame.columns
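
As a minimal sketch of the attribute in isolation, the snippet below builds a small toy dataframe and reads its column labels. The dataframe `df` and its values are illustrative only; the column names are borrowed from the dataset used later in this article.

```python
import pandas as pd

# Toy data for illustration only (not read from train.csv).
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Street": ["Pave", "Pave", "Grvl"],
    "SalePrice": [208500, 181500, 223500],
})

# columns is an attribute, not a method: no parentheses.
names = df.columns
print(names)
```

The labels come back in the order in which the columns appear in the dataframe.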

Examples of applications:

Read a CSV data file and create a dataframe with pandas

Let's consider the CSV data file train.csv (which can be downloaded from kaggle):

>>> import pandas as pd
>>> data = pd.read_csv('train.csv')
>>> data.head()
   Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
0   1          60       RL         65.0     8450   Pave   NaN      Reg   
1   2          20       RL         80.0     9600   Pave   NaN      Reg   
2   3          60       RL         68.0    11250   Pave   NaN      IR1   
3   4          70       RL         60.0     9550   Pave   NaN      IR1   
4   5          60       RL         84.0    14260   Pave   NaN      IR1

  LandContour Utilities    ...     PoolArea PoolQC Fence MiscFeature MiscVal  \
0         Lvl    AllPub    ...            0    NaN   NaN         NaN       0   
1         Lvl    AllPub    ...            0    NaN   NaN         NaN       0   
2         Lvl    AllPub    ...            0    NaN   NaN         NaN       0   
3         Lvl    AllPub    ...            0    NaN   NaN         NaN       0   
4         Lvl    AllPub    ...            0    NaN   NaN         NaN       0

  MoSold YrSold  SaleType  SaleCondition  SalePrice  
0      2   2008        WD         Normal     208500  
1      5   2007        WD         Normal     181500  
2      9   2008        WD         Normal     223500  
3      2   2006        WD        Abnorml     140000  
4     12   2008        WD         Normal     250000

[5 rows x 81 columns]
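
If only the column names are needed, it may be enough to parse the header line without loading the rows. The sketch below assumes a small in-memory CSV (`csv_text`, a hypothetical stand-in for train.csv) so that it runs without the file; with the real file, the path 'train.csv' would be passed instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical stand-in for train.csv: a header line and two data rows.
csv_text = "Id,MSSubClass,SalePrice\n1,60,208500\n2,20,181500\n"

# nrows=0 asks read_csv to parse the header and read no data rows.
header_only = pd.read_csv(io.StringIO(csv_text), nrows=0)

header_names = list(header_only.columns)
print(header_names)
```

The result is an empty dataframe that still carries the column names, which can be useful for large files.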

Get dataframe column names

Get the data frame column names:

>>> data.columns

Example:

>>> columns = data.columns
>>> columns
Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
       'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',
       'SaleCondition', 'SalePrice'],
      dtype='object')

>>> type(columns)
<class 'pandas.indexes.base.Index'>
>>> columns[5]
'Street'
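
The Index object returned by columns supports a few other common operations besides positional access. The following sketch uses an empty toy dataframe (`df`, an assumption for illustration) with the first six column names of the dataset; it converts the names to a plain Python list, tests whether a name is present, and looks up the position of a name.

```python
import pandas as pd

# Empty toy dataframe carrying only the first six column names of the dataset.
df = pd.DataFrame(columns=["Id", "MSSubClass", "MSZoning",
                           "LotFrontage", "LotArea", "Street"])
columns = df.columns

# Convert the Index to a plain Python list of names.
name_list = columns.tolist()

# Membership test: is there a column with this name?
has_street = "Street" in columns

# Position of a given name (the inverse of columns[5]).
street_pos = columns.get_loc("Street")  # 5

# isinstance is a more stable check than the printed type, because the
# module path shown by type() can differ between pandas versions.
is_index = isinstance(columns, pd.Index)
```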

Select one or several columns

Example of how to select one column

>>> data['SalePrice']
0       208500
1       181500
2       223500
3       140000
4       250000
5       143000
6       307000
7       200000
8       129900
9       118000
10      129500
11      345000
12      144000
13      279500
14      157000
15      132000
16      149000
17       90000
18      159000
19      139000
20      325300
21      139400
22      230000
23      129900
24      154000
25      256300
26      134800
27      306000
28      207500
29       68500
         ...  
1430    192140
1431    143750
1432     64500
1433    186500
1434    160000
1435    174000
1436    120500
1437    394617
1438    149700
1439    197000
1440    191000
1441    149300
1442    310000
1443    121000
1444    179600
1445    129000
1446    157900
1447    240000
1448    112000
1449     92000
1450    136000
1451    287090
1452    145000
1453     84500
1454    185000
1455    175000
1456    210000
1457    266500
1458    142125
1459    147500
Name: SalePrice, dtype: int64
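
Selecting with a single name returns a Series, while selecting with a one-element list returns a dataframe with one column. A small sketch on toy data (the dataframe `df` below is illustrative, not the full dataset):

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "SalePrice": [208500, 181500],
    "BldgType": ["1Fam", "1Fam"],
})

one_col = df["SalePrice"]          # single brackets: a Series
one_col_frame = df[["SalePrice"]]  # double brackets: a one-column DataFrame

print(type(one_col).__name__)
print(type(one_col_frame).__name__)
```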

Example of how to select two columns

>>> data[['SalePrice','BldgType']]
      SalePrice BldgType
0        208500     1Fam
1        181500     1Fam
2        223500     1Fam
3        140000     1Fam
4        250000     1Fam
5        143000     1Fam
6        307000     1Fam
7        200000     1Fam
8        129900     1Fam
9        118000   2fmCon
10       129500     1Fam
11       345000     1Fam
12       144000     1Fam
13       279500     1Fam
14       157000     1Fam
15       132000     1Fam
16       149000     1Fam
17        90000   Duplex
18       159000     1Fam
19       139000     1Fam
20       325300     1Fam
21       139400     1Fam
22       230000     1Fam
23       129900   TwnhsE
24       154000     1Fam
25       256300     1Fam
26       134800     1Fam
27       306000     1Fam
28       207500     1Fam
29        68500     1Fam
...         ...      ...
1430     192140     1Fam
1431     143750   TwnhsE
1432      64500     1Fam
1433     186500     1Fam
1434     160000     1Fam
1435     174000     1Fam
1436     120500     1Fam
1437     394617     1Fam
1438     149700     1Fam
1439     197000     1Fam
1440     191000     1Fam
1441     149300   TwnhsE
1442     310000     1Fam
1443     121000     1Fam
1444     179600     1Fam
1445     129000     1Fam
1446     157900     1Fam
1447     240000     1Fam
1448     112000     1Fam
1449      92000    Twnhs
1450     136000   Duplex
1451     287090     1Fam
1452     145000   TwnhsE
1453      84500     1Fam
1454     185000     1Fam
1455     175000     1Fam
1456     210000     1Fam
1457     266500     1Fam
1458     142125     1Fam
1459     147500     1Fam

[1460 rows x 2 columns]
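
The list used for selection can also be built from the column names themselves. As a sketch, assuming one wants every column whose name starts with 'Garage' (the toy dataframe `df` and the chosen prefix are illustrative):

```python
import pandas as pd

# Toy data for illustration only; names borrowed from the dataset.
df = pd.DataFrame({
    "GarageCars": [2, 2],
    "GarageArea": [548, 460],
    "SalePrice": [208500, 181500],
})

# Build the list of wanted names by filtering df.columns.
garage_cols = [name for name in df.columns if name.startswith("Garage")]

# Select those columns in one step.
subset = df[garage_cols]
print(subset.columns.tolist())
```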

References

columns (pandas)
How to get column names in Pandas dataframe (geeksforgeeks)
House Prices: Advanced Regression Techniques (kaggle)
