The term "Big Data" is frequently thrown around these days. I don't know about you, but this is what comes to mind:
Big Data is when you have so much information that traditional storage and management techniques begin to lose their edge.
One of the promises of Procedural Generation is to cut the amount of data required to render virtual worlds. In theory it sounds perfect. Just describe the world with a few classes, pick a few seeds, and let everything grow from there. It amounts to storing metadata instead of data. The actual generation happens on the player's system; data is created on-demand.
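To make the "metadata instead of data" idea concrete, here is a minimal sketch of deterministic generation from a seed. The function name and the hash-based scheme are mine, purely for illustration: given the same seed, every machine derives the same terrain column, so only the seed ever needs to be stored or sent.

```python
import hashlib

def height_at(seed: int, x: int, z: int) -> float:
    """Deterministic pseudo-height for world column (x, z).

    Every client that hashes the same coordinates with the same seed
    recovers the same value, so only the seed needs storing."""
    digest = hashlib.sha256(f"{seed}:{x}:{z}".encode()).digest()
    # Map the first 8 bytes of the hash to a height in [0, 256).
    return int.from_bytes(digest[:8], "big") / 2**64 * 256.0

# The same inputs yield the same terrain on any machine.
a = height_at(1234, 10, 20)
b = height_at(1234, 10, 20)
```

A real generator would layer coherent noise (Perlin or simplex) on top of this kind of hashing so that neighbouring columns vary smoothly; a raw hash alone produces white noise.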
The problem is how to generate compelling content. The player's machine may not be powerful enough for this. Odds are you will end up targeting the lowest common denominator, and your options will become seriously limited. For instance, Minecraft generates the entire world locally, which limits its complexity: blocks are huge and there is little diversity. It works for Minecraft, but not many games could get away with it. If you want any kind of realistic graphics, it is probably out of the question.
It seems, then, that data must be generated away from the user's system. One possible approach is to use procedural techniques to create a static world. Whatever data constraints we have can be built in as parameters to the procedural generation; for instance, we could make sure the entire world definition fits on a DVD.
On the other hand, procedural techniques can produce huge datasets full of unique features. If you don't really need to fit on a DVD, Blu-ray or any other media you can keep around the house, you could create very large worlds. This is not new: Google Earth already does it. You don't need to download the entire dataset to start browsing; it all comes on-demand. A game world would not be as large as our real planet, but it would be a lot more complex and detailed. Its topology, for sure, would be more than just a sphere.
Then it becomes about Big Data. While generating and storing that much data may be simple and affordable, moving it becomes a real challenge. What is the point of being so big if you cannot fit through the door anymore?
One option is not to move any data at all and render the user's point of view on the server. You would stream these views down to the player's machine pretty much like you would stream a movie. This is what the OnLive service does for conventional games. It is appealing because you can deal with bandwidth using generic techniques that are not really tied to the type of content you produce, like the magic codecs from OnLive and its competitors. But there are a few issues with this. Every user creates significant load on the servers, and bandwidth costs may be high: every frame experienced by every user must go down your wires.
The other option is to do it like Google Earth: send just enough data to the client so the virtual world can be rendered there. This approach requires you to walk a thin line. You must decide which portions of data to send over the wire and which ones can still be created on the client at runtime. Then, whatever you choose to send, you must devise ways to compress it as much as possible.
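One crude way to frame that decision, sketched here with a hypothetical rule of my own (the function, thresholds, and example numbers are illustrative, not from any real engine): generate an asset on the client when it fits a per-frame CPU budget, otherwise pick whichever of downloading or generating is faster.

```python
def stream_or_generate(gen_cost_ms: float, size_bytes: int,
                       bandwidth_bps: float,
                       frame_budget_ms: float = 5.0) -> str:
    """Decide whether the client should synthesize an asset locally
    or download it from the server. All thresholds are illustrative."""
    if gen_cost_ms <= frame_budget_ms:
        return "generate"  # cheap enough to make locally, never ship it
    download_ms = size_bytes * 8 / bandwidth_bps * 1000
    # Too slow to generate within a frame: take the faster path.
    return "stream" if download_ms < gen_cost_ms else "generate"

# Grass placement: trivial to compute, never worth downloading.
print(stream_or_generate(1.0, 10_000_000, 10_000_000))   # generate
# A 100 KB hand-tuned terrain patch on a 10 Mbps link: cheaper to ship.
print(stream_or_generate(500.0, 100_000, 10_000_000))    # stream
```

In practice the split is decided per content layer at design time rather than per asset at runtime, but the same trade-off (generation cost versus transfer cost) drives it.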
It is not only about network bandwidth. In some cases the data will be so big that traditional polygon rendering may not make sense anymore. When large numbers of triangles become smaller than individual pixels once projected onto the screen, it is best to consider other forms of rendering, like ray-tracing. For instance, this is what these guys found:
This is taken from their realtime terrain engine: "This dataset covers a 56 km × 85 km area of the Alps at a resolution of 1 m and 12.5 cm for the DEM and the photo texture, respectively. It amounts to over 860 GB of data."
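Those figures are easy to sanity-check. Assuming 16-bit height samples and 24-bit RGB texels (my guesses; the engine's actual storage formats may differ), the raw data works out to roughly the quoted total:

```python
# Back-of-envelope check of the quoted dataset size.
# Assumed formats: 16-bit DEM samples, 24-bit RGB texels.
area_x, area_z = 56_000, 85_000               # metres
dem_bytes = area_x * area_z * 2               # one sample per metre
tex_bytes = (area_x * 8) * (area_z * 8) * 3   # one texel per 12.5 cm
total_gb = (dem_bytes + tex_bytes) / 2**30
print(f"{total_gb:.0f} GB")                   # ~860 GB
```

Note that almost all of it is the photo texture; the DEM itself is under 10 GB.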
I am a hobbyist; dedicating servers to render the world for users is not even an option. I'd rather pay my mortgage. If I want anyone to experience the virtual world I'm creating, I will need to stream data to end users and render there. I have not figured out a definitive solution, but I have something I think will work. My next post will be about how I'm dealing with my big data.