Workshop: Machine Learning Meets Econometrics (MLECON)

How informative is the Order Book Beyond the Best Levels? Machine Learning Perspective

Dat T Tran


Research on limit order book markets has been rapidly growing and nowadays high-frequency full order book data is widely available for researchers and practitioners. However, it is common that research papers use the best level data only, which motivates us to ask whether the exclusion of the quotes deeper in the book over multiple price levels causes performance degradation. In this paper, we address this question by using modern Machine Learning (ML) techniques to predict mid-price movements without assuming that limit order book markets represent a linear system. We provide a number of results that are robust across ML prediction models, feature selection algorithms, data sets, and prediction horizons. We find that the best bid and ask levels are systematically identified not only as the most informative levels in the order books, but also to carry most of the information needed for good prediction performance. On the other hand, even if the top-of-the-book levels contain most of the relevant information, to maximize models' performance one should use all data across all the levels. Additionally, the informativeness of the order book levels clearly decreases from the first to the fourth level while the rest of the levels are approximately equally important.