In the long run, sure.
In the near term? No, not by a long shot.
There are some tasks we can automate, and that will happen. That’s been a very long-running trend, though; it’s nothing new. People generally don’t write machine language by physically flipping switches these days; many decades of automation have happened since then.
I also don’t think that a slightly-tweaked latent diffusion model, of the present “generative AI” form, will get all that far, either. The fundamental problem, taking an incomplete specification in human language and translating it into a precise set of rules in machine language while making use of knowledge of the real world, isn’t something that I expect you can do very effectively by training on an existing corpus.
The existing generative AIs work well on tasks where you have a large training corpus that maps from something like human language to an image. The resulting images don’t have a lot by way of hard constraints on their precision; you can illustrate that by generating a batch of ten images for a given prompt. They might all look different, but a fair number will look decent enough.
I think that some of that is because humans typically process images and language in a way that is pretty permissive of errors; we rely heavily on context and our past knowledge about the real world to come up with the correct meaning. An image just needs to “cue” our memories and understanding of the world. We can see images that are distorted or stylized, or see pixel art, and recognize it for what it is.
But…that’s not what a CPU does. Machine language is not very tolerant of errors.
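To make that concrete, here’s a minimal sketch in Python. It’s a toy example of my own: the `average` function, the `load` helper, and the data are all made up for illustration. The same size of slip, one character, leaves a sentence perfectly readable to a human but silently changes what a program computes.

```python
# Hypothetical example: a one-character slip in prose versus in code.

# A human reader still understands this sentence despite the typo.
prose = "Teh average is the sum divided by the count."

correct_src = "def average(xs):\n    return sum(xs) / len(xs)\n"
# Change a single character: "/" becomes "*".
typo_src = correct_src.replace("/", "*")


def load(src):
    """Execute source text and return the average() function it defines."""
    namespace = {}
    exec(src, namespace)
    return namespace["average"]


data = [2, 4, 6]
good = load(correct_src)(data)
bad = load(typo_src)(data)

print(good)  # 4.0
print(bad)   # 36
```

Nothing crashes and nothing warns you; the second version just runs and returns the wrong answer.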
So I’d expect a generative AI to be decent at putting out content intended to be consumed by humans – and we have, in fact, had a number of impressive examples of that working. But I’d expect it to be less good at putting out content intended to be consumed by a CPU.
I think that this lack of tolerance for error, plus the need to pull in information from the real world, is going to make translating human language to machine language a poorer fit for this approach than translating human language to human language, or human language to a human-consumable image.