While learning to train diffusion models, a crucial insight struck me: all generative models are predictive. We have long known this about LLMs: essentially, they predict the next most likely token. But many people don't realise that the same is true of generative image models, which seem to be entirely creative. They are trained, essentially, to remove noise from an image. During training, the model receives images and their captions; noise is added to each image, and the model is tasked with denoising it. More precisely, it learns to predict which part of the noisy image is noise. To generate a new image, the model starts from a noisy image and outputs a prediction of the noise, itself in the form of an image, which is then subtracted from the noisy image. This is repeated iteratively until all the noise has been removed. It works like Michelangelo making a sculpture: he looks at a block of marble and removes parts of it, the noise, until only a beautifully sculpted figure is left.
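To make the loop above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the names make_schedule, add_noise, oracle_noise_predictor and denoise are hypothetical, the "image" is a list of five numbers, and the noise schedule is a toy one. Most importantly, the noise predictor is not a trained network. It is a stand-in that cheats by knowing the clean image, so that the iterative removal can be followed end to end without training anything. The update inside the loop is a deterministic, DDIM-style step, which is one of several ways the removal can be done.

```python
import math
import random

def make_schedule(steps):
    # Toy schedule (an assumption): alpha_bar[t] is the share of the clean
    # signal left at noise level t. It falls from 1.0 (clean) to 0.001.
    return [1.0 - 0.999 * t / steps for t in range(steps + 1)]

def add_noise(clean, noise, a_bar):
    # Forward process: blend the clean image with Gaussian noise.
    return [math.sqrt(a_bar) * c + math.sqrt(1.0 - a_bar) * n
            for c, n in zip(clean, noise)]

def training_loss(predicted_noise, true_noise):
    # Training objective: mean squared error between predicted and real noise.
    return sum((p - n) ** 2
               for p, n in zip(predicted_noise, true_noise)) / len(true_noise)

def oracle_noise_predictor(clean):
    # Stand-in for a trained network. A real model never sees the clean
    # image; it has to learn this prediction from data and the caption.
    def predict(noisy, a_bar):
        return [(x - math.sqrt(a_bar) * c) / math.sqrt(1.0 - a_bar)
                for x, c in zip(noisy, clean)]
    return predict

def denoise(noisy, predict_noise, alpha_bar):
    x = list(noisy)
    # Walk from the noisiest level down to the clean one.
    for t in range(len(alpha_bar) - 1, 0, -1):
        a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
        eps = predict_noise(x, a_t)  # predict which part is noise
        # Remove the predicted noise to estimate the clean image.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
                  for xi, e in zip(x, eps)]
        # Put back a smaller amount of that noise: one level less noisy.
        x = [math.sqrt(a_prev) * c + math.sqrt(1.0 - a_prev) * e
             for c, e in zip(x0_hat, eps)]
    return x

random.seed(0)
clean_image = [0.0, 0.25, 0.5, 0.75, 1.0]
alpha_bar = make_schedule(steps=50)
pure_noise = [random.gauss(0.0, 1.0) for _ in clean_image]
noisy_image = add_noise(clean_image, pure_noise, alpha_bar[-1])

predictor = oracle_noise_predictor(clean_image)
loss = training_loss(predictor(noisy_image, alpha_bar[-1]), pure_noise)
restored = denoise(noisy_image, predictor, alpha_bar)
```

Because the predictor here is perfect, restored matches clean_image up to rounding error and the loss is effectively zero. A trained model only approximates the noise, so what it reveals is a plausible image instead of a copy of one particular training image.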
It is fascinating to me that an iterative predictive process is used to generate new images. At no point is the computer really generating a new image; it is revealing one from noise, by denoising an image that starts out completely noisy.
And this raises the question: how is this different from human creativity? Are we fundamentally different? Do we actually create, or do our neural networks follow a similar predictive process, hidden from our consciousness? For example, when painting or drawing with a pencil, are we predicting which next stroke would best match the image we have in mind?
This opens a bigger question: is all creation a predictive process, a matter of predicting which next stroke is best? Or is human creativity fundamentally different?
At the moment, we don't have the answer, and anyone who claims to have it is probably misinformed. We don't yet know exactly how creative processes work in the brain, or how our neural networks generate new visions.
One can imagine that, in order to produce, say, a painting of a never-before-seen landscape, our brain recalls what landscapes look like, combines and transforms them internally, and guides conscious action toward the strokes that bring the creation closest to that result. In some cases, the creation is completely improvisational: there is no mental image that we are trying to match. But perhaps the brain is still predicting which stroke would best satisfy our artistic and aesthetic sense.
I think we will learn more about the brain, and eventually be able to answer precisely whether there are fundamental differences between predictive generation and what our brain does to make a beautiful work of art.