A text originates from pictures in the author's mind.
The author, of course, knows the correspondence between the text and the pictures.
But the text itself does not imply the pictures.
For readers, building an image from a text is a blind leap.
In practice, only those who already possess an image similar to the author's can truly read the text.
(Describing my coffee cup in words would be a hard task, and I could not expect others to imagine a similar coffee cup.)
If pictures can be presented as pictures, that is best.
And this is where multimedia comes in!