Sub-optimal control policies in transportation systems harm mobility, the environment, and human health. Intersection traffic signal controllers are a crucial part of today's transportation infrastructure: sub-optimal signal policies can lead to traffic jams and, as a result, to increased air pollution and wasted time. Many adaptive traffic signal controllers have been proposed in the literature, but research on their relative performance is limited. Moreover, to the best of our knowledge, no prior work directly targets CO2 emission reduction, even though pollution is currently a critical issue. In this paper, we propose a reward shaping scheme for reinforcement learning (RL) algorithms that not only lowers CO2 emissions but also produces respectable outcomes on other metrics such as travel time. We compare multiple RL algorithms (SARSA and A2C) across diverse scenarios featuring a mix of road users that emit varying amounts of pollution.