Given a target binary function, binary code search retrieves the top-K most similar functions from a repository, where similar functions are those compiled from the same source code. Searching binary code is particularly challenging due to the wide variation in compiler toolchains, compilation options, and CPU architectures, as well as the sheer number of binary functions in the repository. Moreover, existing binary code search schemes suffer from pivotal limitations, including inaccurate text-based or token-based analysis, slow graph matching, and complex deep learning pipelines. In this paper, we present Codee, an unsupervised tensor embedding scheme that performs code search efficiently and accurately at the binary function level. First, we use an NLP-based neural network to generate semantic-aware token embeddings. Second, we propose an efficient basic-block embedding algorithm based on a network representation learning model, which captures both the semantics of instructions and the structure of the control flow; the basic-block embeddings of a function are then combined into a variable-length function feature vector. Third, we build a tensor from these feature vectors and generate function embeddings via tensor singular value decomposition, which compresses the variable-length vectors into short fixed-length vectors to enable efficient search. We further propose a dynamic tensor compression algorithm to incrementally update the function embedding database. Finally, we use locality-sensitive hashing to find the top-K matching functions in the repository. Compared with state-of-the-art cross-platform and cross-optimization-level code search schemes, our scheme achieves higher average search accuracy, shorter feature vectors, and faster feature generation on four datasets: OpenSSL, Coreutils, libgmp, and libcurl.
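To make the retrieval pipeline concrete, the sketch below illustrates only the last two stages described in the abstract, under simplifying assumptions: the tensor singular value decomposition is replaced by a plain truncated matrix SVD as a stand-in, and the top-K lookup uses random-hyperplane (cosine) locality-sensitive hashing. This is not the authors' implementation; all dimensions, function names (function_feature, compress, lsh_signature), and the toy data are hypothetical.

```python
import numpy as np

EMBED_DIM = 64      # hypothetical basic-block embedding size
TARGET_DIM = 16     # hypothetical fixed length of the final function embedding
NUM_PLANES = 32     # hypothetical number of LSH hyperplanes

def function_feature(block_embeddings, max_blocks=32):
    """Flatten a function's basic-block embeddings into one zero-padded vector."""
    mat = np.zeros((max_blocks, EMBED_DIM))
    n = min(len(block_embeddings), max_blocks)
    mat[:n] = block_embeddings[:n]
    return mat.ravel()

def compress(features, k=TARGET_DIM):
    """Project padded feature vectors to k dimensions with a truncated SVD
    (a simplified stand-in for the tensor SVD used in the paper)."""
    X = np.vstack(features)              # (num_functions, max_blocks * EMBED_DIM)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k].T                     # top-k right singular vectors
    return X @ basis, basis

def lsh_signature(vec, planes):
    """Random-hyperplane (cosine) LSH signature, returned as a hashable tuple."""
    return tuple(bool(b) for b in (planes @ vec) > 0)

# Toy repository: random stand-ins for the basic-block embeddings of 200 functions.
rng = np.random.default_rng(0)
repo_features = [
    function_feature(rng.normal(size=(rng.integers(3, 30), EMBED_DIM)))
    for _ in range(200)
]
repo_emb, basis = compress(repo_features)

# Bucket every function embedding by its LSH signature.
planes = rng.normal(size=(NUM_PLANES, TARGET_DIM))
buckets = {}
for idx, emb in enumerate(repo_emb):
    buckets.setdefault(lsh_signature(emb, planes), []).append(idx)

# Query with one function: hash it, then rank its bucket by cosine similarity.
query = repo_features[7] @ basis
candidates = buckets.get(lsh_signature(query, planes), range(len(repo_emb)))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

top_k = sorted(((i, cosine(repo_emb[i], query)) for i in candidates),
               key=lambda t: -t[1])[:5]
print(top_k)   # top-5 most similar functions; index 7 should rank first
```

In Codee itself, the compression step operates on a tensor of function features and supports incremental updates via the dynamic tensor compression algorithm; the stand-in here only illustrates how short fixed-length embeddings plus LSH buckets yield fast top-K retrieval.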
Author Information
Jia Yang (Huazhong University of Sci. & Technology)
Cai Fu (Huazhong University of Science and Technology)
Xiao-Yang Liu (Columbia University)
More from the Same Authors
- 2021 : GPU-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning »
  Xiao-Yang Liu · Zhuoran Yang · Zhaoran Wang · Anwar Walid · Jian Guo · Michael Jordan
- 2021 : Graph-Tensor Singular Value Decomposition for Data Recovery »
  Lei Deng · Haifeng Zheng · Xiao-Yang Liu
- 2021 : High Performance Hierarchical Tucker Tensor Learning Using GPU Tensor Cores »
  hao huang · Xiao-Yang Liu · Weiqin Tong · Tao Zhang · Anwar Walid
- 2021 : Deep variational reinforcement learning by optimizing Hamiltonian equation »
  Zeliang Zhang · Xiao-Yang Liu
- 2021 : Spectral Tensor Layer for Model-Parallel Deep Neural Networks »
  Zhiyuan Wang · Xiao-Yang Liu
- 2023 Poster: Classical Simulation of Quantum Circuits: Parallel Environments and Benchmark »
  Xiao-Yang Liu · Zeliang Zhang
- 2022 Poster: Homomorphic Matrix Completion »
  Xiao-Yang Liu · Zechu (Steven) Li · Xiaodong Wang
- 2022 Poster: FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning »
  Xiao-Yang Liu · Ziyi Xia · Jingyang Rui · Jiechao Gao · Hongyang Yang · Ming Zhu · Christina Wang · Zhaoran Wang · Jian Guo
- 2021 : Discussion Pannel »
  Xiao-Yang Liu · Qibin Zhao · Chao Li · Guillaume Rabusseau
- 2021 : High Performance Computation for Tensor Networks Learning »
  Anwar Walid · Xiao-Yang Liu
- 2021 Workshop: Second Workshop on Quantum Tensor Networks in Machine Learning »
  Xiao-Yang Liu · Qibin Zhao · Ivan Oseledets · Yufei Ding · Guillaume Rabusseau · Jean Kossaifi · Khadijeh Najafi · Anwar Walid · Andrzej Cichocki · Masashi Sugiyama
- 2021 : Opening Remarks »
  Xiao-Yang Liu
- 2020 : Closing Remarks »
  Xiao-Yang Liu
- 2020 : Panel Discussion 2: Software and High Performance Implementation »
  Glen Evenbly · Martin Ganahl · Paul Springer · Xiao-Yang Liu
- 2020 : Panel Discussion 1: Theoretical, Algorithmic and Physical »
  Jacob Biamonte · Ivan Oseledets · Jens Eisert · Nadav Cohen · Guillaume Rabusseau · Xiao-Yang Liu
- 2020 Workshop: First Workshop on Quantum Tensor Networks in Machine Learning »
  Xiao-Yang Liu · Qibin Zhao · Jacob Biamonte · Cesar F Caiafa · Paul Pu Liang · Nadav Cohen · Stefan Leichenauer
- 2020 : Opening Remarks »
  Xiao-Yang Liu
- 2019 : Coffee + Posters »
  Changhao Chen · Nils Gählert · Edouard Leurent · Johannes Lehner · Apratim Bhattacharyya · Harkirat Singh Behl · Teck Yian Lim · Shiho Kim · Jelena Novosel · Błażej Osiński · Arindam Das · Ruobing Shen · Jeffrey Hawke · Joachim Sicking · Babak Shahian Jahromi · Theja Tulabandhula · Claudio Michaelis · Evgenia Rusak · WENHANG BAO · Hazem Rashed · JP Chen · Amin Ansari · Jaekwang Cha · Mohamed Zahran · Daniele Reda · Jinhyuk Kim · Kim Dohyun · Ho Suk · Junekyo Jhung · Alexander Kister · Matthias Fahrland · Adam Jakubowski · Piotr Miłoś · Jean Mercat · Bruno Arsenali · Silviu Homoceanu · Xiao-Yang Liu · Philip Torr · Ahmad El Sallab · Ibrahim Sobh · Anurag Arnab · Krzysztof Galias
- 2018 : Posters and Open Discussions (see below for poster titles) »
  Ramya Malur Srinivasan · Miguel Perez · Yuanyuan Liu · Ben Wood · Dan Philps · Kyle Brown · Daniel Martin · Mykola Pechenizkiy · Luca Costabello · Rongguang Wang · Suproteem Sarkar · Sangwoong Yoon · Zhuoran Xiong · Enguerrand Horel · Zhu (Drew) Zhang · Ulf Johansson · Jonathan Kochems · Gregory Sidier · Prashant Reddy · Lana Cuthbertson · Yvonne Wambui · Christelle Marfaing · Galen Harrison · Irene Unceta Mendieta · Thomas Kehler · Mark Weber · Li Ling · Ceena Modarres · Abhinav Dhall · Arash Nourian · David Byrd · Ajay Chander · Xiao-Yang Liu · Hongyang Yang · Shuang (Sophie) Zhai · Freddy Lecue · Sirui Yao · Rory McGrath · Artur Garcez · Vangelis Bacoyannis · Alexandre Garcia · Lukas Gonon · Mark Ibrahim · Melissa Louie · Omid Ardakanian · Cecilia Sönströd · Kojin Oshiba · Chaofan Chen · Suchen Jin · aldo pareja · Toyo Suzumura