4
Wisecrack
276d

I wonder if anyone has considered building a large language model trained to consume and generate token sequences that are themselves the actual weights or matrix values of other large language models?

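A minimal sketch of what that tokenization could look like, assuming you quantize a checkpoint's raw floats into a fixed vocabulary of bins (PyTorch here; the bin count, clipping range, and helper names are all made up for illustration):

```python
# Hypothetical sketch: turn another model's weights into a token stream
# by flattening every tensor and bucketing values into a fixed vocabulary.
import torch

def weights_to_tokens(state_dict, n_bins=65536, clip=3.0):
    """Flatten every tensor, clip outliers, and bucket values into token ids."""
    flat = torch.cat([t.flatten().float() for t in state_dict.values()])
    flat = flat.clamp(-clip, clip)  # bound the value range before bucketing
    return ((flat + clip) / (2 * clip) * (n_bins - 1)).round().long()

def tokens_to_weights(ids, shapes, n_bins=65536, clip=3.0):
    """Inverse map: token ids back to floats, reshaped to the original tensors."""
    flat = ids.float() / (n_bins - 1) * (2 * clip) - clip
    out, offset = {}, 0
    for name, shape in shapes.items():
        n = 1
        for d in shape:
            n *= d
        out[name] = flat[offset:offset + n].reshape(shape)
        offset += n
    return out
```
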
Run LoRA to tune it to find and generate plausible subgraphs for specific tasks (effectively a search for the weights most likely to be initialized by chance near their ideal values, i.e. the winning tickets of the lottery ticket hypothesis).

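If such a base model existed, that task-specific tuning could look like ordinary LoRA fine-tuning. A sketch using the peft library; the checkpoint name is a placeholder, not a real release:

```python
# Hypothetical sketch: LoRA-tune the Meta-LLM so it specializes in emitting
# weight tokens for one downstream task family.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llm-on-weights")  # placeholder name
config = LoraConfig(
    r=16,                                  # low-rank update dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # typical attention-projection targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # only the adapter weights are trained
```
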
The whole thing could even be used to prune existing LLM weights in a generative-adversarial setup.

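One way the adversarial part could work, purely as a sketch: a small critic learns to score windows of weight tokens as "looks like a real trained network" versus "pruned/generated", and the pruner keeps only the chunks the critic still accepts. Everything here is illustrative:

```python
# Hypothetical sketch: a critic over windows of quantized weight tokens.
import torch
import torch.nn as nn

class WeightCritic(nn.Module):
    def __init__(self, n_bins=65536, d=256):
        super().__init__()
        self.embed = nn.Embedding(n_bins, d)
        self.head = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, token_window):            # [B, T] weight tokens
        pooled = self.embed(token_window).mean(dim=1)
        return self.head(pooled).squeeze(-1)     # realness score per window
```
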
Shit, there's enough embedding and weight data to train a Meta-LLM from scratch at this point.
The sum total of trillions of parameters in models floating around the internet could be used as training data.

If the models and weights are designed to predict the next token, there shouldn't be anything to prevent another model, trained on this sort of distribution, from generating new plausible models.

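Concretely, the training side would just be a standard next-token objective over weight tokens instead of text tokens. A sketch with a small GPT-2-style model from transformers; the hyperparameters are arbitrary:

```python
# Hypothetical sketch: pretrain a small causal transformer whose "corpus" is
# the weight-token streams produced by weights_to_tokens() above.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

N_BINS = 65536  # must match the quantization vocabulary used for tokenization
config = GPT2Config(vocab_size=N_BINS, n_positions=2048, n_embd=768, n_layer=12, n_head=12)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_step(batch):
    """batch: LongTensor [B, T] of weight tokens from many donor checkpoints."""
    out = model(input_ids=batch, labels=batch)  # standard next-token objective
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```
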
You could even do task-prompt-to-model embeddings by training on the weights of task-specific models, do vector searches to mix models, etc., and generate *new* models:
not new text, not new imagery, but new *models*.

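A rough sketch of the retrieval-and-mixing part, assuming you already have embeddings for known checkpoints; the "mix" here is just parameter-wise interpolation (model-soup style), which is only one possible reading of the idea:

```python
# Hypothetical sketch: retrieve task-relevant checkpoints by embedding similarity,
# then interpolate them parameter-wise into a new model.
import torch

def nearest_models(prompt_vec, model_vecs, k=2):
    """prompt_vec: [D] task-prompt embedding; model_vecs: [N, D] checkpoint embeddings."""
    sims = torch.nn.functional.cosine_similarity(prompt_vec.unsqueeze(0), model_vecs, dim=-1)
    return sims.topk(k).indices.tolist()

def mix_models(state_dicts, coeffs):
    """Weighted average of several state_dicts; assumes all share the same architecture."""
    mixed = {}
    for name in state_dicts[0]:
        mixed[name] = sum(c * sd[name].float() for c, sd in zip(coeffs, state_dicts))
    return mixed
```
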
It'd be a model for training/inferring/optimizing/generating other models.

Comments
  • 3
    What's the output of the model ingesting embeddings?
Just more embeddings? And then do you put that back into the first model to see what it made?
  • 2
    @iSwimInTheC the inputs for training are the actual floating-point values that compose other models' weights.
    We embed those, train, validate, etc.

    Inference generates new weights, sort of like autocomplete, given partial weights as input. The goal is to generate plausible weights that are within the desired distribution for some task.

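    A sketch of that autocomplete step, assuming the Meta-LLM is a causal LM over quantized weight tokens (the sampling parameters are arbitrary):

    ```python
    # Hypothetical sketch: "autocomplete" for weights. Feed the Meta-LLM a prefix
    # of an existing checkpoint's weight tokens and sample a continuation.
    import torch

    @torch.no_grad()
    def complete_weights(model, prefix_tokens, n_new, temperature=0.8, ctx=2048):
        """prefix_tokens: LongTensor [1, T] of quantized weight tokens."""
        ids = prefix_tokens
        for _ in range(n_new):
            logits = model(input_ids=ids[:, -ctx:]).logits[:, -1, :]  # respect context window
            probs = torch.softmax(logits / temperature, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)
            ids = torch.cat([ids, next_id], dim=1)
        return ids[:, prefix_tokens.shape[1]:]  # only the newly generated weight tokens
    ```
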
    If successful, the output weights are then packaged into a model, or applied via parameter-efficient fine-tuning (LoRA), which is then used for the given task.

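    Packaging could be as simple as de-quantizing back into a state_dict and loading it into a target architecture, reusing the tokens_to_weights sketch from above; "gpt2" here is just a stand-in target:

    ```python
    # Hypothetical sketch: de-quantize generated weight tokens and load them
    # into a target architecture, or keep them as deltas for a LoRA-style adapter.
    from transformers import AutoModelForCausalLM

    target = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in target architecture
    shapes = {k: tuple(v.shape) for k, v in target.state_dict().items()}
    new_state = tokens_to_weights(generated.squeeze(0), shapes)  # `generated` from the sampling sketch
    target.load_state_dict(new_state, strict=False)
    ```
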
    I hope I'm making sense.

    Basically it's a model trained on the network weights of other models, instead of on a text or image corpus.
    It tries to predict the next token, or weight, given the previous weights as input.
  • 2
    @kobenz was thinking the same. Don't have the resources to train anything like that though.
  • 2
    @kobenz not familiar. Give me the gist of the idea.