Workshop: Instruction Tuning and Instruction Following

Oral Presentations

Fri 15 Dec, 2 p.m. to 3:20 p.m. PST

  1. Understanding Hidden Context in Preference Learning: Consequences for RLHF
  2. Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
  3. Understanding the Effects of RLHF on LLM Generalisation and Diversity
  4. Learning Interactive Real-World Simulators
  5. Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks
  6. Self-RAG: Self-reflective Retrieval Augmented Generation
  7. Delve into PPO: Implementation Matters for Stable RLHF
  8. FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

