That’s perhaps why image generators are comparatively better than text generators. But there’s still something off: by your example, it seems that the model cannot reliably use clues like position to understand “this is a «leg»”. And I don’t know much about image generators, but I think that they’re still statistics- and probability-based.