In traditional polygon-based worlds, much of the detail you see is captured in textures. The same textures are reused many times across a scene, and since each texture needs to be stored only once, this adds a lot of detail at little memory cost.
Let's say you have a column that appears many times in a scene. You would create a triangle mesh for the column with just enough triangles to capture its silhouette, and then use textures to capture everything else. A process called UV-mapping links each triangle in the mesh to a region of the texture by giving each vertex a (u, v) coordinate in that texture.
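To make the UV idea concrete, here is a minimal sketch of what happens inside a single triangle: each corner carries a UV, and any point in the triangle gets its texture coordinate by blending the three corner UVs with barycentric weights. For simplicity the triangle is given in 2D, and the texture is a plain list of rows sampled with nearest-texel lookup; real renderers do this on the GPU with filtering, so treat the function names here as illustrative.

```python
def barycentric(p, a, b, c):
    """Barycentric weights (wa, wb, wc) of 2D point p inside triangle abc."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def interpolate_uv(weights, uv_a, uv_b, uv_c):
    """Blend the per-vertex UVs with the barycentric weights."""
    wa, wb, wc = weights
    return (wa * uv_a[0] + wb * uv_b[0] + wc * uv_c[0],
            wa * uv_a[1] + wb * uv_b[1] + wc * uv_c[1])


def sample_nearest(texture, uv):
    """Nearest-texel lookup; texture is a list of rows, uv is in [0, 1]."""
    h, w = len(texture), len(texture[0])
    x = min(int(uv[0] * w), w - 1)  # clamp so u == 1.0 stays in range
    y = min(int(uv[1] * h), h - 1)
    return texture[y][x]


# The same small texture can be shared by every copy of the column;
# only the per-vertex UVs decide which part of it each triangle shows.
tri = ((0, 0), (1, 0), (0, 1))
uvs = ((0.1, 0.2), (0.9, 0.2), (0.1, 0.8))
uv = interpolate_uv(barycentric((1 / 3, 1 / 3), *tri), *uvs)
```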
This is how virtual worlds and games get to look their best. If we had to represent everything as geometry, it would be too much information even for top-of-the-line systems. Given the choice, nobody would use UV-mapping, but there really is no choice.
Voxels have the same problem. If you want to capture every detail using voxels, you need far too many of them. This may become possible in the near future, perhaps with different hardware generations and architectures, but I would not bet on voxels becoming smaller than pixels anytime soon.
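A quick back-of-the-envelope calculation shows why. A dense grid grows with the cube of its resolution, so every time you halve the voxel size you need eight times as many voxels. The 100 m region and the 4 bytes per voxel below are illustrative assumptions, not figures from our engine.

```python
def dense_voxel_count(region_m: float, voxel_m: float) -> int:
    """Voxels needed to fill a cube of side region_m with cubes of side voxel_m."""
    per_axis = round(region_m / voxel_m)
    return per_axis ** 3


def dense_bytes(region_m: float, voxel_m: float, bytes_per_voxel: int = 4) -> int:
    """Raw storage for that dense grid, assuming a fixed size per voxel."""
    return dense_voxel_count(region_m, voxel_m) * bytes_per_voxel


# Each 10x reduction in voxel size multiplies the voxel count by 1000.
for voxel_m in (1.0, 0.1, 0.01):
    n = dense_voxel_count(100.0, voxel_m)
    print(f"{voxel_m:>5} m voxels: {n:.3e} voxels, {dense_bytes(100.0, voxel_m):.3e} bytes")
```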
The beauty of voxels is that they can encode anything you want. We realized it would be possible to keep voxel resolutions low and still bring a lot of detail into the scene if we encoded UV-mapping into the voxels, just as vertices carry it in traditional polygon systems.
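One way to picture this, as a hypothetical sketch rather than how our engine actually does it: if each voxel stores a UV next to its density, then when a mesher places a surface vertex on the edge between an inside voxel and an outside voxel (as in Marching Cubes), it can interpolate the UV with the same factor it used to position the vertex. The `Voxel` type and the "negative means inside" density convention are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Voxel:
    density: float                            # assumed: < 0 inside, > 0 outside
    uv: Optional[Tuple[float, float]] = None  # texture coordinate, if this voxel has one


def edge_vertex(p0, p1, v0: Voxel, v1: Voxel):
    """Place a surface vertex where density crosses zero on the edge p0-p1,
    and carry the UV along with the same interpolation factor t."""
    assert (v0.density < 0) != (v1.density < 0), "edge must cross the surface"
    t = v0.density / (v0.density - v1.density)  # zero crossing of a linear ramp
    pos = tuple(a + t * (b - a) for a, b in zip(p0, p1))
    if v0.uv is None or v1.uv is None:
        uv = None  # no UV here; a renderer could fall back to triplanar mapping
    else:
        uv = (v0.uv[0] + t * (v1.uv[0] - v0.uv[0]),
              v0.uv[1] + t * (v1.uv[1] - v0.uv[1]))
    return pos, uv


pos, uv = edge_vertex((0, 0, 0), (1, 0, 0),
                      Voxel(-1.0, (0.0, 0.0)), Voxel(3.0, (1.0, 0.5)))
```

Because the UV rides along with the vertex position, the texture can hold far more detail than the voxel grid itself.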
You can see some very early results in this video:
Luckily, we only need to store UVs for the voxels on the surface, so the amount of data is manageable. Procedural objects can use UVs too. The rocks we instance over the terrain could use detailed textures instead of triplanar mapping, and the same goes for trees and even man-made elements like buildings and ruins. For procedural voxels, UVs add little overhead, since nothing has to be stored for them anyway.
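Here is a minimal sketch of why storing only surface UVs keeps the data small. It treats a voxel as "surface" if it is solid and has at least one empty face neighbour, and keeps UVs in a sparse dictionary keyed by voxel coordinates. The set-of-coordinates model and the `uv_for` callback are illustrative assumptions; for procedural voxels you would simply call `uv_for` while meshing instead of building a table at all.

```python
# The six face neighbours of a voxel.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def surface_voxels(solid):
    """solid: set of (x, y, z) integer coordinates of filled voxels.
    Returns the filled voxels that touch at least one empty neighbour."""
    return {p for p in solid
            if any((p[0] + dx, p[1] + dy, p[2] + dz) not in solid
                   for dx, dy, dz in NEIGHBOURS)}


def build_uv_table(solid, uv_for):
    """Sparse UV storage: only surface voxels get an entry."""
    return {p: uv_for(p) for p in surface_voxels(solid)}


# A solid 10x10x10 block: 1000 voxels, but only the outer shell needs UVs.
n = 10
block = {(x, y, z) for x in range(n) for y in range(n) for z in range(n)}
table = build_uv_table(block, lambda p: (p[0] / n, p[1] / n))  # hypothetical planar mapping
```

For this block, 488 of the 1000 voxels get a UV entry; the 512 interior voxels get none. The fraction drops further as objects get larger, because the shell grows with the square of the size while the volume grows with the cube.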
Using UVs is optional. The engine can merge UV-mapped voxels with triplanar-mapped voxels on the fly. You can carve pieces out of these models, or merge them with procedural voxels, and still get a single watertight mesh:
As you can see, the leopard's legs do not go underground in the rendered mesh. Everything connects properly.
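A common way to get this kind of seamless merge, and the assumption behind this sketch (not a description of our engine's internals), is to combine shapes as signed-distance values before meshing: union keeps the smaller distance, subtraction keeps max(dA, -dB). Since everything ends up in one field, a single mesh extracted from it is watertight. Each sample here also carries a tag saying whether its surface is UV-mapped or triplanar-mapped; where a carving shape exposes interior material that has no stored UV, the sketch falls back to triplanar mapping. The `Sample` type, the tags and `triplanar_weights` are hypothetical names.

```python
from dataclasses import dataclass

UV, TRIPLANAR = "uv", "triplanar"


@dataclass(frozen=True)
class Sample:
    d: float      # signed distance: negative inside, positive outside (assumed)
    mapping: str  # texturing scheme for the surface this sample belongs to


def union(a: Sample, b: Sample) -> Sample:
    """CSG union: keep the smaller distance and inherit that shape's mapping."""
    return a if a.d <= b.d else b


def subtract(a: Sample, b: Sample) -> Sample:
    """CSG difference A - B = max(dA, -dB). Where the carver wins, the new
    surface is former interior with no stored UV, so fall back to triplanar."""
    return a if a.d >= -b.d else Sample(-b.d, TRIPLANAR)


def triplanar_weights(normal, sharpness=4.0):
    """Blend weights for texture projections along X, Y and Z: a surface
    facing mostly along one axis takes most of its color from that projection."""
    w = [abs(c) ** sharpness for c in normal]
    s = sum(w)
    return tuple(x / s for x in w)


leopard = Sample(-1.0, UV)          # a point inside the UV-mapped model
terrain = Sample(2.0, TRIPLANAR)    # the same point is above the procedural terrain
merged = union(leopard, terrain)    # stays inside the leopard, keeps its UVs
carved = subtract(leopard, Sample(-0.5, TRIPLANAR))  # point inside the carving shape
```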
Here is an earlier video that I don't think I have linked before:
Why go to all this trouble? UV-mapping took polygonal models to a whole new level of visual quality and performance, and voxels are now going through the same phase.
The encoding we developed for UVs also opens the door to interesting new applications, like animation. If you think about it, animation is not very different from UV-mapping. Instead of mapping vertices to a texture we map them to bones, but it is pretty much the same idea. So, yes, that zebra could move one day.
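To show how close the two ideas are, here is a sketch of linear blend skinning, the standard way vertices are mapped to bones in polygon engines. Each vertex stores a few (bone, weight) pairs, much as it stores a UV, and its animated position is the weighted sum of the vertex transformed by each bone. This is the textbook technique, not our engine's implementation. To keep it short, bones are 2D affine transforms written as ((a, b, tx), (c, d, ty)) and are assumed to already include the inverse bind pose; real engines use 4x4 matrices.

```python
def apply(m, p):
    """Apply a 2D affine transform ((a, b, tx), (c, d, ty)) to point p."""
    (a, b, tx), (c, d, ty) = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


def skin(rest_pos, influences, bone_matrices):
    """Linear blend skinning for one vertex.
    influences: list of (bone_index, weight) pairs whose weights sum to 1."""
    x = y = 0.0
    for bone, w in influences:
        px, py = apply(bone_matrices[bone], rest_pos)
        x += w * px
        y += w * py
    return (x, y)


IDENTITY = ((1, 0, 0), (0, 1, 0))
SHIFT_RIGHT = ((1, 0, 2), (0, 1, 0))  # a bone that moved 2 units along x

# A vertex halfway between two bones follows each of them halfway.
moved = skin((1.0, 1.0), [(0, 0.5), (1, 0.5)], [IDENTITY, SHIFT_RIGHT])
```

In a voxel setting, the (bone, weight) pairs would be just another per-voxel attribute, encoded the same way as the UVs.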