4
Wisecrack
276d

I wonder if anyone has considered building a large language model trained to consume and generate token sequences that are themselves the actual weights or matrix values of other large language models?

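A minimal sketch of what that tokenization could look like, assuming you quantize a checkpoint's raw floats into a fixed vocabulary of bins (PyTorch here; the bin count, clipping range, and helper names are all made up for illustration):

```python
# Hypothetical sketch: turn another model's weights into a token stream
# by flattening every tensor and bucketing values into a fixed vocabulary.
import torch

def weights_to_tokens(state_dict, n_bins=65536, clip=3.0):
    """Flatten every tensor, clip outliers, and bucket values into token ids."""
    flat = torch.cat([t.flatten().float() for t in state_dict.values()])
    flat = flat.clamp(-clip, clip)  # bound the value range before bucketing
    return ((flat + clip) / (2 * clip) * (n_bins - 1)).round().long()

def tokens_to_weights(ids, shapes, n_bins=65536, clip=3.0):
    """Inverse map: token ids back to floats, reshaped to the original tensors."""
    flat = ids.float() / (n_bins - 1) * (2 * clip) - clip
    out, offset = {}, 0
    for name, shape in shapes.items():
        n = 1
        for d in shape:
            n *= d
        out[name] = flat[offset:offset + n].reshape(shape)
        offset += n
    return out
```
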
Run LoRA to tune it to find and generate plausible subgraphs for specific tasks (effectively a search for the weights most likely to be initialized by chance near their ideal values, i.e. the winning tickets of the lottery ticket hypothesis).

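If such a base model existed, that task-specific tuning could look like ordinary LoRA fine-tuning. A sketch using the peft library; the checkpoint name is a placeholder, not a real release:

```python
# Hypothetical sketch: LoRA-tune the Meta-LLM so it specializes in emitting
# weight tokens for one downstream task family.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llm-on-weights")  # placeholder name
config = LoraConfig(
    r=16,                                  # low-rank update dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # typical attention-projection targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # only the adapter weights are trained
```
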
The whole thing could even be used to prune existing LLM weights in a generative-adversarial setup.

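One way the adversarial part could work, purely as a sketch: a small critic learns to score windows of weight tokens as "looks like a real trained network" versus "pruned/generated", and the pruner keeps only the chunks the critic still accepts. Everything here is illustrative:

```python
# Hypothetical sketch: a critic over windows of quantized weight tokens.
import torch
import torch.nn as nn

class WeightCritic(nn.Module):
    def __init__(self, n_bins=65536, d=256):
        super().__init__()
        self.embed = nn.Embedding(n_bins, d)
        self.head = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, token_window):            # [B, T] weight tokens
        pooled = self.embed(token_window).mean(dim=1)
        return self.head(pooled).squeeze(-1)     # realness score per window
```
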
Shit, there's enough embedding and weight data to train a Meta-LLM from scratch at this point.
The sum total of trillions of parameters in models floating around the internet could be used as training data.

If the models and weights are designed to predict the next token, there shouldn't be anything to prevent another model, trained on this sort of distribution, from generating new plausible models.

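Concretely, the training side would just be a standard next-token objective over weight tokens instead of text tokens. A sketch with a small GPT-2-style model from transformers; the hyperparameters are arbitrary:

```python
# Hypothetical sketch: pretrain a small causal transformer whose "corpus" is
# the weight-token streams produced by weights_to_tokens() above.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

N_BINS = 65536  # must match the quantization vocabulary used for tokenization
config = GPT2Config(vocab_size=N_BINS, n_positions=2048, n_embd=768, n_layer=12, n_head=12)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_step(batch):
    """batch: LongTensor [B, T] of weight tokens from many donor checkpoints."""
    out = model(input_ids=batch, labels=batch)  # standard next-token objective
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```
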
You could even do task-prompt-to-model embeddings by training on the weights of task-specific models, do vector searches to mix models, etc., and generate *new* models:
not new text, not new imagery, but new *models*.

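A rough sketch of the retrieval-and-mixing part, assuming you already have embeddings for known checkpoints; the "mix" here is just parameter-wise interpolation (model-soup style), which is only one possible reading of the idea:

```python
# Hypothetical sketch: retrieve task-relevant checkpoints by embedding similarity,
# then interpolate them parameter-wise into a new model.
import torch

def nearest_models(prompt_vec, model_vecs, k=2):
    """prompt_vec: [D] task-prompt embedding; model_vecs: [N, D] checkpoint embeddings."""
    sims = torch.nn.functional.cosine_similarity(prompt_vec.unsqueeze(0), model_vecs, dim=-1)
    return sims.topk(k).indices.tolist()

def mix_models(state_dicts, coeffs):
    """Weighted average of several state_dicts; assumes all share the same architecture."""
    mixed = {}
    for name in state_dicts[0]:
        mixed[name] = sum(c * sd[name].float() for c, sd in zip(coeffs, state_dicts))
    return mixed
```
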
It'd be a model for training/inferring/optimizing/generating other models.

Comments
  • 3
    What's the output of the model ingesting embeddings?
Just more embeddings? And then do you put that back into the first model to see what it made?
  • 2
    @iSwimInTheC the inputs for training are the actual floating-point values that compose other models' weights.
    We embed those, train, validate, etc.

    Inference generates new weights, sort of like autocomplete, given partial weights as input. The goal is to generate plausible weights that are within the desired distribution for some task.

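    A sketch of that autocomplete step, assuming the Meta-LLM is a causal LM over quantized weight tokens (the sampling parameters are arbitrary):

    ```python
    # Hypothetical sketch: "autocomplete" for weights. Feed the Meta-LLM a prefix
    # of an existing checkpoint's weight tokens and sample a continuation.
    import torch

    @torch.no_grad()
    def complete_weights(model, prefix_tokens, n_new, temperature=0.8, ctx=2048):
        """prefix_tokens: LongTensor [1, T] of quantized weight tokens."""
        ids = prefix_tokens
        for _ in range(n_new):
            logits = model(input_ids=ids[:, -ctx:]).logits[:, -1, :]  # respect context window
            probs = torch.softmax(logits / temperature, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)
            ids = torch.cat([ids, next_id], dim=1)
        return ids[:, prefix_tokens.shape[1]:]  # only the newly generated weight tokens
    ```
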
    If successful, the output weights are then packaged into a model, or applied via parameter-efficient fine-tuning (LoRA), which is then used for the given task.

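    Packaging could be as simple as de-quantizing back into a state_dict and loading it into a target architecture, reusing the tokens_to_weights sketch from above; "gpt2" here is just a stand-in target:

    ```python
    # Hypothetical sketch: de-quantize generated weight tokens and load them
    # into a target architecture, or keep them as deltas for a LoRA-style adapter.
    from transformers import AutoModelForCausalLM

    target = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in target architecture
    shapes = {k: tuple(v.shape) for k, v in target.state_dict().items()}
    new_state = tokens_to_weights(generated.squeeze(0), shapes)  # `generated` from the sampling sketch
    target.load_state_dict(new_state, strict=False)
    ```
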
    I hope I'm making sense.

    Basically it's a model trained on the network weights of other models, instead of on a text or image corpus.
    It tries to predict the next token, or weight, given the previous weights as input.
  • 2
    @kobenz was thinking the same. Don't have the resources to train anything like that though.
  • 2
    @kobenz not familiar. Give me the gist of the idea.