Poster in Workshop: Machine Learning for Systems

CloudEval-YAML: A Realistic and Scalable Benchmark for Cloud Configuration Generation

Yifei Xu ⋅ Yuning Chen ⋅ Xumiao Zhang ⋅ Xianshang Lin ⋅ Pan Hu ⋅ Yunfei Ma ⋅ Songwu Lu ⋅ Wan Du ⋅ Zhuoqing Morley Mao ⋅ Ennan Zhai ⋅ Dennis Cai

Abstract

Amid the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarking for code generation in cloud-native applications. In response to this need, we present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity challenge by focusing on YAML, the de facto standard of numerous cloud-native tools. We develop the CloudEval-YAML benchmark with practicality in mind: the dataset consists of hand-written problems with unit tests targeting practical scenarios. To improve practicality during evaluation, we build a scalable evaluation platform for CloudEval-YAML that achieves a 20-fold speedup over a single machine. To the best of our knowledge, the CloudEval-YAML dataset is the first hand-written dataset targeting cloud-native applications. We present an in-depth evaluation of 13 LLMs, leading to a deeper understanding of the problems and LLMs, as well as effective methods to improve task performance and reduce cost. The codebase is released at https://github.com/alibaba/CloudEval-YAML.
