
PSmatching

Do green drivers also drive longer? Causal identification using the propensity score approach

Install / Use

/learn @ccubc/PSmatching
README

Does Owning an Energy Efficient Vehicle Lead to Longer Driving Distance?

This project explores the question "Does owning an energy efficient vehicle lead to longer driving distance?" Using data from the 2017 National Household Travel Survey (NHTS), I examine how households' driving patterns correlate with owning an energy efficient vehicle, a category that includes hybrid electric vehicles (HEV), plug-in hybrid electric vehicles (PHEV), electric vehicles (EV), and other alternative fuel vehicles.

The question is of interest to policy makers who provide financial incentives for purchasing energy efficient vehicles. They promote these vehicles in the hope of reducing the environmental impact of driving. However, if there is a rebound effect, meaning that owning a green vehicle leads to more driving, the environmental benefit of driving a green vehicle is partly offset. It would therefore be beneficial for policy makers to detect and quantify such a rebound effect.

A main difficulty in quantifying the rebound effect is selection bias: households that expect to drive more have a greater incentive to purchase energy efficient vehicles because of the fuel cost savings. Ignoring this issue leads to an overestimate of the rebound effect. To alleviate this concern, I use propensity score matching: I first pair up households that have similar characteristics and are equally likely to purchase an energy efficient vehicle, then compare their driving distances. Because the paired households are believed to be equally likely to purchase an energy efficient vehicle, the purchase decision becomes quasi-random within each pair. To the extent that the purchase decision depends only on the observed characteristics, this removes the selection bias.
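The matching idea described above can be sketched on synthetic data. This is only an illustration under stated assumptions, not the pymatch-based procedure used later in this project: the single covariate `x`, the assumed true effect of 500 miles, and the helper names `fit_logit`, `propensity` and `matched_effect` are all hypothetical. The sketch fits a logistic model for the purchase decision, matches each treated unit to the control unit with the closest propensity score (with replacement), and compares the matched estimate with the naive difference in means.

```python
import numpy as np

def fit_logit(X, t, n_iter=25):
    """Fit a logistic regression by Newton's method (intercept first)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (t - p)                          # score vector
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd  # information matrix
        beta += np.linalg.solve(hess, grad)
    return beta

def propensity(X, beta):
    """Predicted probability of treatment for each row of X."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def matched_effect(score, t, y):
    """Average effect on the treated, nearest-neighbour matching with replacement."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    # for each treated unit, position of the control with the closest score
    gaps = np.abs(score[treated][:, None] - score[control][None, :])
    nearest = gaps.argmin(axis=1)
    return float(np.mean(y[treated] - y[control][nearest]))

# Synthetic data (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)                       # hypothetical driving need
p_true = 1.0 / (1.0 + np.exp(-(x - 1.5)))    # higher need => more likely to buy green
t = (rng.random(n) < p_true).astype(int)     # 1 = owns a green vehicle
true_effect = 500.0                          # assumed rebound effect in miles
y = 10000 + 2000 * x + true_effect * t + rng.normal(scale=500, size=n)

naive = float(y[t == 1].mean() - y[t == 0].mean())
score = propensity(x[:, None], fit_logit(x[:, None], t))
psm = matched_effect(score, t, y)

print(f"naive difference:  {naive:.0f}")
print(f"matched estimate:  {psm:.0f}")
```

Because households with a higher `x` both drive more and buy green vehicles more often, the naive difference overstates the assumed effect, while the matched estimate should land much closer to it.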

The dataset contains information on:

  • households' size, income, state, urban/rural area, number of adults, number of vehicles, etc.
  • vehicles' fuel type, size, annual mileage, etc.

The following code first imports and cleans the dataset, then uses propensity score matching to estimate how much extra mileage is caused by owning a green vehicle.


import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams.update({'figure.autolayout': True})
import warnings
warnings.filterwarnings('ignore')
from pymatch.Matcher import Matcher
import statsmodels.api as sm
import seaborn as sns

Import and clean dataset

data = pd.read_csv('/Users/chengchen/Dropbox/NHTS_2018/data/NHTS2017/vehpub.csv')
data.head()
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>HOUSEID</th> <th>VEHID</th> <th>VEHYEAR</th> <th>VEHAGE</th> <th>MAKE</th> <th>MODEL</th> <th>FUELTYPE</th> <th>VEHTYPE</th> <th>WHOMAIN</th> <th>OD_READ</th> <th>...</th> <th>HH_CBSA</th> <th>HBHTNRNT</th> <th>HBPPOPDN</th> <th>HBRESDN</th> <th>HTEEMPDN</th> <th>HTHTNRNT</th> <th>HTPPOPDN</th> <th>HTRESDN</th> <th>SMPLSRCE</th> <th>WTHHFIN</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>30000007</td> <td>1</td> <td>2007</td> <td>10</td> <td>49</td> <td>49032</td> <td>1</td> <td>1</td> <td>3</td> <td>69000</td> <td>...</td> <td>XXXXX</td> <td>20</td> <td>1500</td> <td>750</td> <td>750</td> <td>50</td> <td>1500</td> <td>750</td> <td>2</td> <td>187.31432</td> </tr> <tr> <th>1</th> <td>30000007</td> <td>2</td> <td>2004</td> <td>13</td> <td>49</td> <td>49442</td> <td>1</td> <td>2</td> <td>-8</td> <td>164000</td> <td>...</td> <td>XXXXX</td> <td>20</td> <td>1500</td> <td>750</td> <td>750</td> <td>50</td> <td>1500</td> <td>750</td> <td>2</td> <td>187.31432</td> </tr> <tr> <th>2</th> <td>30000007</td> <td>3</td> <td>1998</td> <td>19</td> <td>19</td> <td>19014</td> <td>1</td> <td>1</td> <td>1</td> <td>120000</td> <td>...</td> <td>XXXXX</td> <td>20</td> <td>1500</td> <td>750</td> <td>750</td> <td>50</td> <td>1500</td> <td>750</td> <td>2</td> <td>187.31432</td> </tr> <tr> <th>3</th> <td>30000007</td> <td>4</td> <td>1997</td> <td>20</td> <td>19</td> <td>19021</td> <td>1</td> <td>1</td> <td>2</td> <td>-88</td> <td>...</td> <td>XXXXX</td> <td>20</td> <td>1500</td> <td>750</td> <td>750</td> <td>50</td> <td>1500</td> <td>750</td> <td>2</td> <td>187.31432</td> </tr> <tr> <th>4</th> <td>30000007</td> <td>5</td> <td>1993</td> <td>24</td> <td>20</td> <td>20481</td> <td>1</td> <td>4</td> <td>2</td> <td>300000</td> <td>...</td> <td>XXXXX</td> <td>20</td> <td>1500</td> <td>750</td> <td>750</td> <td>50</td> <td>1500</td> <td>750</td> <td>2</td> <td>187.31432</td> </tr> </tbody> </table> <p>5 
rows × 49 columns</p> </div>
# drop the observations where some information is missing
data = data[data.FUELTYPE<4]
data = data[data.FUELTYPE>0]
data = data[data.ANNMILES>0]
data = data[data.HHFAMINC>0]
data = data[data.HOMEOWN>0]
data = data[data.VEHAGE>0]
data = data[data.VEHTYPE>0]
# map each survey income category to an approximate dollar amount for that category
income_dic = {1:5000, 2:12500, 3:20000, 4:30000, 5:42500, 6:62500, 7:87500, 8:112500, 9: 137500, 10: 175000, 11:225000}
data['income'] = data['HHFAMINC'].map(income_dic)
# 1: household owns its home  0: household does not own its home
home_dic = {1:1, 2:0, 97:0}
data['homeown'] = data['HOMEOWN'].map(home_dic)
urban_dic = {1:'urban_area',2:'urban_cluster',3:'near_urban',4:'not_urban'}
data['urban'] = data['URBAN'].map(urban_dic)
vehtype_dic = {1: 'car',2: 'van',3: 'SUV',4: 'pickup',5: 'truck',6: 'RV',7: 'motorcycle',97: 'else'}
data['vehtype'] = data['VEHTYPE'].map(vehtype_dic)
fueltype_dic = {1: 'gas', 2: 'diesel', 3: 'hybrid/electric/alternative'}
data['fueltype'] = data['FUELTYPE'].map(fueltype_dic)
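The cleaning steps above can also be sketched on a toy frame. Filtering a DataFrame and then assigning new columns to the result, as done above, can trigger pandas' SettingWithCopyWarning (suppressed here by `warnings.filterwarnings('ignore')`). A minimal sketch, assuming that non-positive codes mark missing answers: combine the conditions into one boolean mask and take an explicit `.copy()`. The rows in `toy` are made up, and only three of the columns are shown.

```python
import pandas as pd

# Toy rows with made-up values; column names follow the vehicle file used above.
toy = pd.DataFrame({
    'FUELTYPE': [1, 3, -8, 2],
    'ANNMILES': [9000, 12000, 5000, -9],
    'HHFAMINC': [6, 11, 3, 7],
})

# One boolean mask instead of a chain of filters.
valid = (
    toy['FUELTYPE'].between(1, 3)   # same as FUELTYPE > 0 and FUELTYPE < 4
    & (toy['ANNMILES'] > 0)
    & (toy['HHFAMINC'] > 0)
)

# .copy() makes the result an independent frame, so adding columns is unambiguous.
clean = toy[valid].copy()

income_dic = {1: 5000, 2: 12500, 3: 20000, 4: 30000, 5: 42500, 6: 62500,
              7: 87500, 8: 112500, 9: 137500, 10: 175000, 11: 225000}
clean['income'] = clean['HHFAMINC'].map(income_dic)

print(clean)
```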

Summary Statistics

# Frequency of FuelType
data.groupby('fueltype')['fueltype'].count()
fueltype
diesel                           5362
gas                            170384
hybrid/electric/alternative      4966
Name: fueltype, dtype: int64
# Average Driving Distance Grouped by FuelType
data.groupby('fueltype')['ANNMILES'].mean()
fueltype
diesel                         11247.242260
gas                             9620.500235
hybrid/electric/alternative    12308.237213
Name: ANNMILES, dtype: float64

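The output above shows the mean annual mileage of vehicles by fuel type. Hybrid/electric/alternative fuel vehicles are driven the most on average (about 12,300 miles per year), followed by diesel vehicles (about 11,200) and gasoline vehicles (about 9,600). This pattern is consistent with lower fuel cost going together with more driving.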

More Summary Statistics

htype_dic = {1:  'biodiesel', 2:  'plug-in hybrid', 3:  'electric', 4:  'hybrid', -9: 'NA', -8: 'NA', -1: 'NA', 97: 'NA'}
data['hfuel'] = data['HFUEL'].map(htype_dic)
data.groupby(['fueltype','hfuel'])['ANNMILES'].mean()
fueltype                     hfuel         
diesel                       NA                11247.242260
gas                          NA                 9620.500235
hybrid/electric/alternative  NA                12387.113208
                             biodiesel         12541.944444
                             electric           8235.281720
                             hybrid            12874.606100
                             plug-in hybrid    11516.482587
Name: ANNMILES, dtype: float64

Within the category of energy efficient vehicles, the annual mileage ranking is: <br> hybrid > biodiesel > plug-in hybrid > electric <br> (Battery size probably plays a role in limiting the mileage of pure electric vehicles.) <br>

# average annual driving distance of each region (state + Core Based Statistical Area)
area_avg_mile = data.groupby(['HHSTATE','HH_CBSA'])['ANNMILES'].mean().to_frame()
# add the regional average as a new column in the dataframe
data = pd.merge(data,area_avg_mile, how = 'right', left_on = ['HHSTATE','HH_CBSA'], right_index = True)
data = data.rename(columns = {'ANNMILES_x': 'ANNMILES', 'ANNMILES_y': 'area_avg_mile'})
# relative driving mileage, compared to the average level of the local area
data['relative_mile'] = data['ANNMILES']/data['area_avg_mile']
data.groupby(['fueltype','hfuel'])['relative_mile'].mean()

fueltype                     hfuel         
diesel                       NA                1.173388
gas                          NA                0.985955
hybrid/electric/alternative  NA                1.284829
                             biodiesel         1.306946
                             electric          0.888567
                             hybrid            1.351362
                             plug-in hybrid    1.223351
Name: relative_mile, dtype: float64

With the exception of pure electric vehicles (0.89), energy efficient vehicles are driven more than their local average, which further confirms the driving pattern seen above. Up to this point we have only looked at general patterns in the data without handling the selection bias issue. We will try to deal with this problem using propensity score matching.
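The relative mileage used in this section is each vehicle's annual mileage divided by the average mileage of its region. As a compact alternative to the merge-and-rename steps, the same quantity can be sketched with `groupby(...).transform('mean')`, which returns the group mean aligned to the original rows. The frame `toy` and its values, including the area labels, are made up for illustration.

```python
import pandas as pd

# Toy data with made-up values; two regions with two vehicles each.
toy = pd.DataFrame({
    'HHSTATE': ['CA', 'CA', 'TX', 'TX'],
    'HH_CBSA': ['area_1', 'area_1', 'area_2', 'area_2'],
    'ANNMILES': [8000, 12000, 15000, 5000],
})

# Regional average mileage, broadcast back to every row of the region.
toy['area_avg_mile'] = (
    toy.groupby(['HHSTATE', 'HH_CBSA'])['ANNMILES'].transform('mean')
)

# Mileage relative to the local average: above 1 means more driving than the region.
toy['relative_mile'] = toy['ANNMILES'] / toy['area_avg_mile']

print(toy)
```

This keeps the original row order and index, and avoids the `_x` / `_y` column suffixes that the merge produces.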

Propensity Score Matching

The following section will match the treatment/control groups:

  • treatment group: hybrid/electric/alternative vehicles
  • control group: gasoline/diesel vehicles

They will be matched by both household and vehicle charac
