Categorical variable encoding in Pandas
Three ways of encoding categorical variables in Pandas
I found three ways of encoding categorical variables using only Pandas functions. Let's discuss them one by one.
- pd.Categorical(column_name).codes
- pd.get_dummies(column_name)
- pd.factorize(column_name)[0]
First, import the Pandas module and load the training data:
import pandas as pd
train = pd.read_csv('train.csv')
Find the categorical variables in the training data. To select categorical variables, use df.select_dtypes(include=['object']), where df is a DataFrame.
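As a minimal sketch of this selection step, the snippet below uses a small hand-made DataFrame in place of train.csv. The column names and values are hypothetical and are not taken from the original data.

```python
import pandas as pd

# Hypothetical stand-in for train.csv; columns and values are illustrative only.
# dtype="object" is set explicitly because newer pandas versions may infer a
# dedicated string dtype for text columns instead of object.
df = pd.DataFrame({
    "customer_id": pd.Series(["C1", "C2", "C3"], dtype="object"),
    "name": pd.Series(["Ann", "Bob", "Cy"], dtype="object"),
    "gender": pd.Series(["F", "M", "M"], dtype="object"),
    "age": [34, 45, 29],
})

# Keep only the object-dtype (string) columns; the numeric 'age' is left out.
categorical_cols = df.select_dtypes(include=["object"])
print(categorical_cols.columns.tolist())
```

The result is a new DataFrame holding only the object-typed columns, so the numeric columns stay untouched in df.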

- ‘Customer_id’ and ‘name’ add no information to the model, so they are not encoded and these columns are dropped.
Select the required variables and make a list of the categorical variables that need to be encoded, as below:
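One possible way to build that list is sketched here. The names categorical_cols and columns_list match the ones used later in this post, while the sample columns and the helper list id_cols are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical categorical columns; the real ones come from train.csv.
categorical_cols = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3"],
    "name": ["Ann", "Bob", "Cy"],
    "gender": ["F", "M", "M"],
    "city": ["Pune", "Delhi", "Pune"],
})

# Identifier-like columns that should not be encoded (assumed names).
id_cols = ["customer_id", "name"]

# Every remaining categorical column goes into the list to encode.
columns_list = [col for col in categorical_cols.columns if col not in id_cols]
print(columns_list)
```

Keeping the identifier columns out of columns_list means the encoding loops never touch them, whether or not they are physically dropped from the DataFrame.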

1. Use of pd.Categorical(column_name).codes
This code iterates over each categorical column, encodes it, and saves the result in a new DataFrame named cat.
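A minimal sketch of such a loop could look like the following. It assumes a small hypothetical categorical_cols DataFrame and columns_list, and it is not necessarily the exact code used for the original data.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
categorical_cols = pd.DataFrame({
    "gender": ["F", "M", "M", "F"],
    "city": ["Pune", "Delhi", "Pune", "Agra"],
})
columns_list = ["gender", "city"]

# New DataFrame that will hold the integer codes.
cat = pd.DataFrame(index=categorical_cols.index)
for column in columns_list:
    # .codes gives one integer per row; categories are sorted before
    # numbering, and missing values are coded as -1.
    cat[column] = pd.Categorical(categorical_cols[column]).codes

print(cat)
```

Because the categories are sorted first, 'Agra' gets code 0 here even though it appears last in the column.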

2. Use of pd.get_dummies()
Here, a general function create_dummies creates dummy columns from a categorical column. The code then iterates over each categorical variable and encodes it using this function.
def create_dummies(df, column_name):
    # One-hot encode df[column_name] and append the dummy columns to df
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    df = pd.concat([df, dummies], axis=1)
    return df

cat_dummy = categorical_cols.copy()
for column in columns_list:
    cat_dummy = create_dummies(cat_dummy, column)
cat_dummy
The output of cat_dummy is as below:

Instead of converting the categorical variables one by one, one can convert all of them at once, as below:
cat_dummy2 = categorical_cols.copy()
pd.get_dummies(cat_dummy2.iloc[:,2:])
The code above copies the categorical columns DataFrame into a DataFrame named cat_dummy2, then uses iloc to exclude the first two columns, ‘name’ and ‘customer_id’.
3. Use of pd.factorize()
In the code below, each categorical variable is encoded one by one using the factorize method.
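A loop of this kind might be sketched as follows, again on hypothetical data. The DataFrame name cat_fact is an assumption introduced only for this illustration.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
categorical_cols = pd.DataFrame({
    "gender": ["F", "M", "M", "F"],
    "city": ["Pune", "Delhi", "Pune", "Agra"],
})
columns_list = ["gender", "city"]

# cat_fact is a hypothetical name for the DataFrame of factorized codes.
cat_fact = pd.DataFrame(index=categorical_cols.index)
for column in columns_list:
    # pd.factorize returns (codes, uniques); [0] keeps only the codes.
    # Codes follow order of first appearance; missing values become -1.
    cat_fact[column] = pd.factorize(categorical_cols[column])[0]

print(cat_fact)
```

Unlike pd.Categorical(...).codes, which numbers the sorted categories, factorize numbers values in the order they first appear, so the same column can receive different integers under the two methods.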

Let me know of any improvements or enhancements. The code and data are available here.
I hope you enjoyed reading !
Don’t forget to appreciate and clap.