Combinatorial optimization is applied to a wide variety of real-world problems such as job scheduling, capacity planning, and supply-chain management. These problems are usually modelled as Mixed Integer Programming (MIP) problems and solved using the Branch and Bound (B\&B) paradigm. The Branch and Bound method recursively partitions the solution space into constrained sub-problems (branching) and uses relaxation-based bounds to prune sub-problems that cannot contain an optimal solution (bounding), so that only the most promising subsets of the solution space are explored. The efficiency of the Branch and Bound method in finding optimal solutions is heavily influenced by the variable and node selection heuristics used for branching. In this paper, we propose a novel deep reinforcement learning based variable selection strategy. The proposed solution shows significant improvement over Strong Branching (SB) strategies, which have traditionally been used for variable selection, and also outperforms current state-of-the-art RL-based branching strategies such as PPO and DQN. Our experimental results show that the proposed solution is robust and scales to different kinds of problems.
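To make the branching/bounding interplay concrete, the sketch below is a minimal, illustrative branch-and-bound solver for a tiny 0/1 knapsack instance, with the variable-selection heuristic exposed as a pluggable function (the hook that this paper's learned strategy would replace). It is not the paper's method or any solver's API; all names (`fractional_bound`, `branch_and_bound`, `best_density`) are hypothetical, and the bound is a simple fractional (LP-style) relaxation.

```python
def fractional_bound(values, weights, capacity, fixed):
    # Upper bound via the fractional relaxation: respect fixed variables,
    # then fill remaining capacity greedily by value density.
    val = sum(values[i] for i, x in fixed.items() if x == 1)
    cap = capacity - sum(weights[i] for i, x in fixed.items() if x == 1)
    if cap < 0:
        return float("-inf")  # partial assignment already infeasible
    free = [i for i in range(len(values)) if i not in fixed]
    for i in sorted(free, key=lambda i: values[i] / weights[i], reverse=True):
        take = min(1.0, cap / weights[i])  # last item may be fractional
        val += take * values[i]
        cap -= take * weights[i]
        if cap <= 0:
            break
    return val

def branch_and_bound(values, weights, capacity, select_var):
    # select_var(free, values, weights) -> branching variable index:
    # the heuristic a learned policy would replace.
    n = len(values)
    best_val, best_sol = 0, {i: 0 for i in range(n)}
    stack = [dict()]  # each node is a partial 0/1 assignment
    while stack:
        fixed = stack.pop()
        # Bounding: prune any subtree whose bound cannot beat the incumbent.
        if fractional_bound(values, weights, capacity, fixed) <= best_val:
            continue
        free = [i for i in range(n) if i not in fixed]
        if not free:  # leaf: all variables fixed and feasible
            leaf_val = sum(values[i] for i in fixed if fixed[i] == 1)
            if leaf_val > best_val:
                best_val, best_sol = leaf_val, fixed
            continue
        j = select_var(free, values, weights)
        for x in (0, 1):  # branching: split the node on x_j = 0 / x_j = 1
            child = dict(fixed)
            child[j] = x
            stack.append(child)
    return best_val, best_sol

def best_density(free, values, weights):
    # A simple hand-crafted heuristic: branch on the densest free variable.
    return max(free, key=lambda i: values[i] / weights[i])
```

Swapping `best_density` for a learned policy (or a strong-branching score) changes which subtrees are explored first, and hence how quickly the bound prunes the rest of the tree.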