Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "image-generation"
-
Wrote a image generation api which uses a headless browser within 2 days ^^ it generates an image like the one below in around 500ms while also loading external resources like that avatar and the icons. ^^2
-
The next step for improving large language models (if not diffusion) is hot-encoding.
The idea is pretty straightforward:
Generate many prompts, or take many prompts as a training and validation set. Do partial inference, and find the intersection of best overall performance with least computation.
Then save the state of the network during partial inference, and use that for all subsequent inferences. Sort of like LoRa, but for inference, instead of fine-tuning.
Inference, after-all, is what matters. And there has to be some subset of prompt-based initializations of a network, that perform, regardless of the prompt, (generally) as well as a full inference step.
Likewise with diffusion, there likely exists some priors (based on the training data) that speed up reconstruction or lower the network loss, allowing us to substitute a 'snapshot' that has the correct distribution, without necessarily performing a full generation.
Another idea I had was 'semantic centering' instead of regional image labelling. The idea is to find some patch of an object within an image, and ask, for all such patches that belong to an object, what best describes the object? if it were a dog, what patch of the image is "most dog-like" etc. I could see it as being much closer to how the human brain quickly identifies objects by short-cuts. The size of such patches could be adjusted to minimize the cross-entropy of classification relative to the tested size of each patch (pixel-sized patches for example might lead to too high a training loss). Of course it might allow us to do a scattershot 'at a glance' type lookup of potential image contents, even if you get multiple categories for a single pixel, it greatly narrows the total span of categories you need to do subsequent searches for.
In other news I'm starting a new ML blackbook for various ideas. Old one is mostly outdated now, and I think I scanned it (and since buried it somewhere amongst my ten thousand other files like a digital hoarder) and lost it.
I have some other 'low-hanging fruit' type ideas for improving existing and emerging models but I'll save those for another time.6 -
It's been a while since i stopped programming.....
It's been so busy with all the school work/assignments/ and the most important part is that school ends at 10pm, arrive home at 11pm, prepare for tomorrow school stuff, sleep at 2am, wake up at 7am next morning, and again ends at 10pm 5 days a week...
It is exhausting, but I am getting used to this routine.
Studying my own programming skills or working on a side project? Not sure when to do it... The only way to continue studying is at breaks at school, or sleep less and study....
But it is impossible....
I have some great projects that are waiting to go out to the world, to list a few:
- cloud gaming
- cloud storage with live streaming
- complete school schedule management
- home automation framework in dotnet
- deepfakes and ai image generation algorithm (~18 months of training till now)
- game cheat engine (20GB total omfg ^^)
- and more
and I don't have time to finish it. lol
I think it will see the bright world after 3 years of high school... By then, my projects will be ancient, probably....
TIme is really short.
24 hours equally, but feels like 8 hours a day....
Should I abandon the project rn and focus on studying? (probably should)
or should i sell the project or open source it?
Also, how do you manage your time between work(study) and side projects (especially big ones)?4 -
Anyone tried converting speech waveforms to some type of image and then using those as training data for a stable diffusion model?
Hypothetically it should generate "ultrarealistic" waveforms for phonemes, for any given style of voice. The training labels are naturally the words or phonemes themselves, in text format (well, embedding vectors fwiw)
After that it's a matter of testing text-to-image, which should generate the relevant phonemes as images of waveforms (or your given visual representation, however you choose to pack it)
I would have tried this myself but I only have 3gb vram.
Even rudimentary voice generation that produces recognizable words from text input, would be interesting to see implemented and maybe a first for SD.
In other news:
Implementing SQL for an identity explorer. Basically the system generates sets of values for given known identities, and stores the formulas as strings, along with the values.
For any given value test set we can then cross reference to look up equivalent identities. And then we can test if these same identities hold for other test sets of actual variable values. If not, the identity string cam be removed, or gophered elsewhere in the database for further exploration and experimentation.
I'm hoping by doing this, I can somewhat automate the process of finding identities, instead of relying on logs and using the OS built-in text search for test value (which I can then look up in the files that show up, and cross reference the logged equations that produced those values), which I use to find new identities.
I was even considering processing the logs of equations and identities as some form of training data perhaps for a ML system that generates plausible new identities but that's a little outside my reach I think.
Finally, now that I know the new modular function converts semiprimes into numbers with larger factor trees, I'm thinking of writing a visual browser that maps the connections from factor tree to factor tree, making them expandable and collapsible, andallowong adjusting the formula and regenerating trees on the fly.7 -
So I thought to myself.
Hey I'll go ahead and use python, it will make this easier than using c++.
So I start looking at python.
And I start looking at specific common functions that c/c++ and .net all offer.
Like writing a fucking png image.
And I start seeing 3rd party libs that are at version 0.2
And so I say, this is supposedly the language data people love. which would include searching gis data too right ?
Everybody touts this level for ai and machine learning and all this other bullshit but I can't even create a fucking image ? And every document points to this same lib where it comes to creating this image ? at version 0.2 ?? 20 years or more after PNG was created ?
So I look up geotiff, and see 0.4........ so..... what is this language good for again ? I can parse json in javascript and do the other things I want...
Oh scatterplot generation ? What is it being displayed in jpeg ? Maybe the jpeg implementation is good. because you know i just use scatterplots constantly. yup. most of the data I require to analyze uses scatterplots. not risk.
fun.
oh and look django.... who the fuck uses django ?
and omg it makes me format my text or the run bombs.....
jesus. rpg much ?
I'm just... I'm not seeing...
WHY ?????????
and then I have zimmermans voice buzzing in my head about just using goddamn .net26