Categorical variable encoding in Pandas

Anshu Trivedi
2 min read · Dec 28, 2021


Three ways of encoding categorical variables in Pandas

I found three ways of encoding categorical variables using only Pandas functions. Let's discuss them one by one.

  1. pd.Categorical(column_name).codes
  2. pd.get_dummies(column_name)
  3. pd.factorize(column_name)[0]

First, import the Pandas module and load the train data:

import pandas as pd
train = pd.read_csv('train.csv')

Find the categorical variables in the train data. To select categorical variables, use df.select_dtypes(include=['object']), where df is a DataFrame.

  • ‘Customer_id’ and ‘name’ add no information to the model, so they are dropped rather than encoded.

Select the required variables and make a list of the categorical variables that need to be encoded.
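The selection step might look like the sketch below. The small DataFrame is a hypothetical stand-in for train.csv, and its column names and values are assumptions made for illustration; only categorical_cols and columns_list are names taken from the code later in this post.

```python
import pandas as pd

# Hypothetical stand-in for train.csv (illustrative columns and values only).
# dtype=object keeps the text columns as 'object', the dtype select_dtypes looks for here.
train = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "C4"],
    "name": ["Asha", "Ben", "Chen", "Dia"],
    "gender": ["F", "M", "M", "F"],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "age": [31, 45, 27, 38],
}, dtype=object)
train["age"] = train["age"].astype("int64")

# Keep only the object-dtype (text) columns.
categorical_cols = train.select_dtypes(include=["object"])

# Identifier-like columns are left out of the list of columns to encode.
id_like = ["customer_id", "name"]
columns_list = [c for c in categorical_cols.columns if c not in id_like]

print(columns_list)  # ['gender', 'city']
```

Numeric columns such as age are skipped by select_dtypes, so only text columns reach columns_list.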

1. Use of pd.Categorical(column_name).codes

This approach iterates over each categorical column, encodes it, and saves the result in a new DataFrame named cat.
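A minimal sketch of that loop is shown here. The sample data is hypothetical; cat, categorical_cols and columns_list follow the names used in this post.

```python
import pandas as pd

# Hypothetical data standing in for the categorical columns of train.csv.
categorical_cols = pd.DataFrame({
    "gender": ["F", "M", "M", "F"],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})
columns_list = ["gender", "city"]

cat = pd.DataFrame(index=categorical_cols.index)
for column in columns_list:
    # pd.Categorical sorts the distinct values into categories;
    # .codes gives each row the position of its category.
    cat[column] = pd.Categorical(categorical_cols[column]).codes

print(cat)
#    gender  city
# 0       0     2
# 1       1     0
# 2       1     2
# 3       0     1
```

Because the categories are sorted first, 'Delhi' gets code 0 even though 'Pune' appears first in the data. Missing values are coded as -1.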

2. Use of pd.get_dummies()

Here, a general function create_dummies creates dummy columns from a categorical column. We then iterate over each categorical variable and encode it using this function.

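def create_dummies(df, column_name):
    # One indicator column per category, prefixed with the column name
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    # Append the dummy columns to the original DataFrame
    df = pd.concat([df, dummies], axis=1)
    return df

cat_dummy = categorical_cols.copy()
for column in columns_list:
    cat_dummy = create_dummies(cat_dummy, column)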

cat_dummy

cat_dummy now holds the original columns plus one dummy column for each category.

Instead of converting the categorical variables one by one, you can convert all of them at once as below:

cat_dummy2 = categorical_cols.copy()
pd.get_dummies(cat_dummy2.iloc[:,2:])

The code above copies the categorical columns DataFrame into a DataFrame named cat_dummy2, then uses iloc to exclude the first two columns, ‘name’ and ‘customer_id’.
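The positional iloc slice relies on the two identifier columns coming first. As an alternative sketch, they can be dropped by name instead; the data and the exact column names below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; the first two columns are the identifiers to exclude.
categorical_cols = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "C4"],
    "name": ["Asha", "Ben", "Chen", "Dia"],
    "gender": ["F", "M", "M", "F"],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

cat_dummy2 = categorical_cols.copy()

# Dropping by name does not depend on column order, unlike iloc[:, 2:].
encoded = pd.get_dummies(cat_dummy2.drop(columns=["customer_id", "name"]))

print(encoded.columns.tolist())
# ['gender_F', 'gender_M', 'city_Delhi', 'city_Mumbai', 'city_Pune']
```

When given a whole DataFrame, pd.get_dummies encodes every text or category column and uses the column name as the prefix, so no loop is needed.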

3. Use of pd.factorize()

Here, each categorical variable is encoded one by one using the factorize method.
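A sketch of that loop follows. The sample data and the name cat_factorized are hypothetical; categorical_cols and columns_list follow the names used earlier in this post.

```python
import pandas as pd

# Hypothetical data standing in for the categorical columns of train.csv.
categorical_cols = pd.DataFrame({
    "gender": ["F", "M", "M", "F"],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})
columns_list = ["gender", "city"]

cat_factorized = pd.DataFrame(index=categorical_cols.index)
for column in columns_list:
    # factorize returns (codes, uniques); element [0] is the integer codes.
    codes, uniques = pd.factorize(categorical_cols[column])
    cat_factorized[column] = codes

print(cat_factorized)
#    gender  city
# 0       0     0
# 1       1     1
# 2       1     0
# 3       0     2
```

By default, factorize numbers the values in order of first appearance, so 'Pune' gets code 0 here, whereas pd.Categorical(...).codes would sort the categories first. Missing values are coded as -1.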

Let me know about any improvements or enhancements. Code and data are available here.

I hope you enjoyed reading!
Don’t forget to appreciate and clap.

Written by Anshu Trivedi

Data Scientist-Analyst|Data science|Computer Vision
