Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

¹ National University of Singapore,  ² UT Austin,  ³ University of St. Gallen,  ⁴ Oxford University
(* Zhiyuan, Zhangyang, and Kai are core contributors to this work.)
Code (coming soon) · Paper · Twitter · BibTeX
We introduce Drag-and-Drop LLMs (DnD) 🥳, a prompt-conditioned parameter generator that enables training-free adaptation of large language models. Leveraging a lightweight text encoder and a cascaded hyper-convolutional decoder, DnD produces task-specific LoRA matrices from unlabeled task prompts in seconds. It achieves up to 12,000× lower adaptation overhead than full fine-tuning, outperforms the strongest training LoRAs by up to 30% on zero-shot common-sense reasoning, math, coding, and multimodal benchmarks, and generalizes robustly across domains, all while requiring only unlabeled data prompts. In short, DnD offers a powerful, flexible, and efficient alternative to traditional fine-tuning for rapid model specialization.

Customize Your LLMs w/o Training in seconds!


Video 1: Parameter-efficient fine-tuning often takes hours and requires separate tuning for every downstream task. We propose Drag-and-Drop LLMs: generating task-specific parameters in seconds, using only data prompts as conditions.

Drag-and-Drop Your LLMs

Motivation

Despite the strong zero-shot competence endowed by pre-training, Large Language Models (LLMs) still require task-specific customization for real-world applications. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA address this by introducing a small set of trainable parameters while keeping the original weights frozen. However, they reduce but do not eliminate the cost of per-task tuning, creating a major bottleneck for large-scale deployment.
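As a quick reminder of what DnD has to produce, LoRA freezes the pretrained weight matrix and trains only a low-rank update; the formulation below is the standard LoRA notation rather than anything specific to this work:

```latex
% Standard LoRA reparameterization: W_0 stays frozen, only B and A are trained.
W \;=\; W_0 + \Delta W \;=\; W_0 + B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

The per-layer factors (A, B) are exactly the parameters DnD generates in place of gradient descent.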


Figure 1: Left: Parameter-efficient methods such as LoRA need hours of optimization to adapt LLMs to novel datasets. Right: Our method adapts LLMs by directly generating LoRA matrices for novel datasets in seconds, without any tuning.

Implementation

We observe that a LoRA adapter is nothing more than a function of its training data: gradient descent “drags” the base weights towards a task-specific optimum. If that mapping from prompts to weights can be learned directly, we could bypass gradient descent altogether.
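Written out (in our own notation, not the paper's), the idea is to learn a generator g_φ that maps encoded prompts to LoRA weights by regressing onto checkpoints collected in advance, following the recipe described in the next paragraph:

```latex
% p_i    : a batch of unlabeled prompts from dataset i (the condition)
% theta_i: LoRA weights previously fine-tuned on dataset i (the target)
% E(.)   : a frozen, off-the-shelf text encoder
\min_{\phi} \;\; \mathbb{E}_{(p_i,\, \theta_i)}
\Big\| \, g_{\phi}\big(E(p_i)\big) - \theta_i \, \Big\|_2^2
```

At test time, a single forward pass g_φ(E(p_new)) yields tailored LoRA weights for an unseen task, with no gradient steps.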


Figure 2: Our approach obtains the "drag-and-drop" ability via two stages: preparing training data (upper left) and training the parameter generator (upper right). When preparing training data, we explicitly pair parameters with dataset-specific conditions. During training, DnD takes the condition as input and generates parameters, using the original parameters as supervision.
Building on these insights, we propose Drag-and-Drop LLMs to generate task-specific weights without any tuning. We first train and save LoRA adapters on a variety of datasets. To develop the "drag-and-drop" capability, we take prompts from these datasets and randomly pair them with the collected checkpoints to form DnD's training data: prompt–parameter pairs. The generator is a decoder consisting of cascaded convolutional blocks. During training, we employ an off-the-shelf text encoder to extract prompt embeddings and feed them into the generator. The generator outputs model weights as its prediction, and we optimize it with an MSE loss between the generated and original weights. During inference, we simply feed prompts from novel datasets (not seen during training) to DnD and obtain tailored parameters in a single forward pass.
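The PyTorch sketch below illustrates this training loop. It is a minimal illustration, not the authors' released code: the module and function names (`CascadedConvDecoder`, `train_step`), block widths, and pooling choice are our assumptions; only the overall recipe (frozen text encoder → cascaded convolutional decoder → MSE against saved LoRA weights) follows the description above.

```python
# Minimal sketch of a DnD-style prompt-to-LoRA generator (hypothetical names and
# hyperparameters; only the overall recipe follows the description above).
import torch
import torch.nn as nn


class CascadedConvDecoder(nn.Module):
    """Maps prompt embeddings to a flattened vector of LoRA parameters."""

    def __init__(self, embed_dim: int, num_lora_params: int, hidden: int = 512):
        super().__init__()
        # Cascaded 1-D convolutional blocks over the prompt-token axis.
        self.blocks = nn.Sequential(
            nn.Conv1d(embed_dim, hidden, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.GELU(),
        )
        # Project pooled features to the flattened LoRA weights.
        self.head = nn.Linear(hidden, num_lora_params)

    def forward(self, prompt_emb: torch.Tensor) -> torch.Tensor:
        # prompt_emb: (batch, tokens, embed_dim)
        x = self.blocks(prompt_emb.transpose(1, 2))  # (batch, hidden, tokens)
        x = x.mean(dim=-1)                           # pool over tokens
        return self.head(x)                          # (batch, num_lora_params)


def train_step(generator, text_encoder, prompts, target_lora_flat, optimizer):
    """One optimization step: regress generated weights onto the saved LoRA
    checkpoint randomly paired with this batch of task prompts."""
    with torch.no_grad():                # the text encoder stays frozen
        emb = text_encoder(prompts)      # (batch, tokens, embed_dim)
    pred = generator(emb)
    loss = nn.functional.mse_loss(pred, target_lora_flat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference, one forward pass of the generator on prompts from an unseen dataset produces the flattened weights, which are then reshaped into per-layer LoRA matrices and merged into the base model.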

Evaluations

Zero-Shot Results


Table 1: Generalization on novel (test) datasets. Our approach significantly outperforms LoRAs used in training in terms of accuracy (%) across all unseen datasets. Bold entries are the best results.

Table 2: DnD can generate parameters for more complex tasks like math, code and multimodal question answering. Our method continues to show strong zero-shot ability on these tasks.

Table 3: DnD surpasses foundation LLMs across various tasks, showing the “drag-and-drop” effect.

Table 4: DnD scales well to larger 7B foundation models and maintains strong performance on the more complex LiveCodeBench benchmark.

Utilizing fine-tuned LoRAs as training data, DnD establishes connections between input data prompts and model parameters. We test DnD's zero-shot ability by feeding it prompts from datasets unseen during training and having it generate parameters for these novel datasets. Our method delivers substantial improvements over the average of the training LoRAs on zero-shot test sets, generalizes to multiple real-world tasks, and scales across LLM sizes.

Comparison with Other Tuning Methods


Figure 3: DnD reaches comparable or even better performance than full-shot tuning while being 2,500∼12,000× faster, and outperforms popular few-shot and in-context learning methods with fewer than 256 shots, all without relying on answers.

To further underscore DnD's efficiency, we compare it with full-shot tuning, few-shot tuning (FS), and in-context learning (ICL). Surprisingly, DnD surpasses the full-shot performance of the training LoRAs with a 2,500× speedup. With more iterations, full-shot tuning eventually outperforms DnD, but at the cost of 12,000× higher latency. DnD also consistently outperforms FS and ICL with fewer than 256 shots. Notably, both FS and ICL rely on answers to the problems, whereas DnD requires only unlabeled prompts.


Related works

  • Konstantin Schürholt, Boris Knyazev, Xavier Giró-i-Nieto, and Damian Borth. Hyper-representations for pre-training and transfer learning. In NeurIPS, 2022.
  • Konstantin Schürholt, Michael W. Mahoney, and Damian Borth. Towards scalable and versatile weight space learning. arXiv, 2024.
  • Kai Wang, Dongwen Tang, Boya Zeng, Yida Yin, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, and Yang You. Neural network diffusion. arXiv, 2024.
  • Zexi Li, Lingzhi Gao, and Chao Wu. Text-to-model: Text-conditioned neural network diffusion for train-once-for-all personalization. arXiv, 2024.
  • Xiaolong Jin, Kai Wang, Dongwen Tang, Wangbo Zhao, Yukun Zhou, Junshu Tang, and Yang You. Conditional LoRA parameter generation. arXiv, 2024.
  • Kai Wang, Dongwen Tang, Wangbo Zhao, Konstantin Schürholt, Zhangyang Wang, and Yang You. Recurrent diffusion for large-scale parameter generation. arXiv, 2025.
  • Zhuang Liu and Kaiming He. A decade's battle on dataset bias: Are we there yet? In ICLR, 2025.
Acknowledgments

We sincerely appreciate Yuxiang Li, Jiaxin Wu, Bohan Zhuang, Ziheng Qin, Zangwei Zheng, Zihan Qiu, Zexi Li, Gongfan Fang, Xinyin Ma, and Qinglin Lu for valuable discussions and feedback during this work.

BibTeX

    @misc{liang2025draganddropllmszeroshotprompttoweights,
          title={Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights}, 
          author={Zhiyuan Liang and Dongwen Tang and Yuhao Zhou and Xuanlei Zhao and Mingjia Shi and Wangbo Zhao and Zekai Li and Peihao Wang and Konstantin Schürholt and Damian Borth and Michael M. Bronstein and Yang You and Zhangyang Wang and Kai Wang},
          year={2025},
          eprint={2506.16406},
          archivePrefix={arXiv},
          primaryClass={cs.LG},
          url={https://arxiv.org/abs/2506.16406}, 
    }