Comments on Procedural World: Storage Matters (Miguel Cepero)

Anonymous (2014-04-07 14:34):
The RocksDB team just released 2.8 today! Note that it's optimized for flash, so I'm not sure whether it's ideal for client-side use in a general-purpose engine, though you could definitely use it on the server side.

Regarding compression, Facebook uses in-memory compression all over the place. Whether it helps is highly application-dependent, but for large caches it's almost always a win: the choice is usually between spending some CPU time decompressing the data, after which it will be in the L1, L2, and/or L3 cache, or spending even more CPU time doing nothing while you wait for the data to come back from main memory.

As for caching, do you really expect your app to be stable enough that you're willing to throw out the cache entirely on a crash? Then again, you'd probably want to invalidate the cache on a crash anyway, so perhaps that's not a big deal.

Incidentally, letting the kernel handle your paging for you is not always the best idea. Sometimes it will end up prioritizing your cache over, say, program code or other data you might want to keep close at hand. You can also run into a situation where you're reading a lot of data sequentially, and the kernel assumes you're going to read a whole lot more, so it evicts other things from RAM to make room.
This can happen whether or not the memory you're accessing is mapped to a file; it could just be memory that hasn't been touched since you allocated it.

It's possible, at least on Unix-like systems (not a Windows person myself), to handle your own page faults, passing any that don't belong to your caching system on to the kernel. This may not matter much for pointer-free structures, but you can always turn pointer-free structures into pointer-ful ones. What you do is allocate space for the root of your structure, then protect it so that it can be neither read nor written. When you get a page fault for that space, you map the root in, "unswizzling" any pointers from their persistent form into pointers to more newly allocated space, which you also protect. Repeat recursively ad infinitum. Now that you're in control of the paging mechanism (which is no slower than letting the kernel do it, apart from unswizzling the pointers and possibly decompressing now that you have less in RAM), you can do all your allocations from a pre-allocated pool, which you can then lock into memory so it never gets paged to disk. See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.648 for a description of this technique, which works for all sorts of huge data structures, even on 32-bit hardware, and even when your "source" address space spans the network.

Miguel Cepero (2014-04-05 13:23):
Yes, the data chunks have no pointers; all the data is local, so in case of corruption the damage is constrained to one chunk. This DB is mostly for implementing caches. For long-term persistence I recommend using something more reliable.
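The user-level paging trick the anonymous commenter describes can be sketched on POSIX with mprotect and a SIGSEGV handler. This is a minimal, Linux-style illustration with a single page and no real unswizzling; the function name and the placeholder fill value are mine:

```cpp
#include <cassert>
#include <csignal>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

static size_t g_page_size = 0;

// Fault handler: the first touch of a protected page lands here. We make the
// page accessible and "load" its contents; a real system would read the
// persistent form here and unswizzle pointers into freshly reserved space.
static void on_fault(int, siginfo_t* info, void*) {
    char* page = reinterpret_cast<char*>(
        reinterpret_cast<uintptr_t>(info->si_addr) & ~(g_page_size - 1));
    mprotect(page, g_page_size, PROT_READ | PROT_WRITE);
    memset(page, 42, g_page_size);  // stand-in for decompress/unswizzle
}

// Reserve an inaccessible page, install the handler, then read from the page.
// The read faults once, the handler maps the data in, and the retried load
// succeeds.
int read_with_demand_paging() {
    g_page_size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* region = mmap(nullptr, g_page_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct sigaction sa = {};
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, nullptr);
    return static_cast<char*>(region)[10];
}
```

In a real implementation, faults outside the cache's reserved range would be forwarded to the previously installed handler; here every fault is assumed to belong to the cache.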
I will take a look at RocksDB, thanks for the suggestion.

Anonymous (2014-04-05 12:13):
MongoDB uses a similar strategy (mmapping files into RAM) to achieve its speed. The cost is data corruption on crash: if you let the OS choose the order of writes, you can't guarantee that whatever is on disk after a write is consistent. You can do tricks with sentinels per page, but most people who start with this strategy end up abandoning it in favor of some form of write-ahead logging or a log-structured store.

You are already a step ahead of most applications, since I'm assuming your data is pointer-free and has a lot of locality. Otherwise, the mmap approach would be a lot less attractive to you. Unless you really don't care about losing your data in a crash, you will end up implementing most of what RocksDB gives you for free, so you might as well just bite the bullet and use RocksDB. Your keys can just be the octree node addresses, using a simple prefix scheme.

I have experience doing this kind of storage at huge scale, so I'm happy to give pointers and critique ideas.

Rob (2013-07-15 15:15):
I'm still on my "quest" to read your whole blog from start to finish. Almost there.

I have to reply to this one, though, as I was conceptualizing this exact thing a month ago (keep in mind I'm a newbie programmer).

When you say P2P network, I always think of torrents, though I know that's not quite how they operate. As you know, they split the data up into what I call "data chunks", and you leech them from everyone else who is seeding.
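The "octree node address as key with a simple prefix scheme" idea works because a child's key extends its parent's key, so a prefix scan enumerates an entire subtree, which an LSM store like RocksDB handles efficiently. A minimal sketch (the exact encoding is my assumption, not anything from the engine):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode the path from the root as one printable digit (0-7) per level.
// A node's key is always a prefix of every descendant's key, so a prefix
// scan starting at the key visits the whole subtree in order.
std::string octree_key(const std::vector<uint8_t>& path) {
    std::string key;
    for (uint8_t octant : path) {
        assert(octant < 8);
        key.push_back(static_cast<char>('0' + octant));
    }
    return key;
}

// True if `node_key` lies inside the subtree rooted at `ancestor_key`.
bool in_subtree(const std::string& node_key, const std::string& ancestor_key) {
    return node_key.compare(0, ancestor_key.size(), ancestor_key) == 0;
}
```

With keys like these, fetching every stored chunk under an octant is a single iterator seek to the octant's key followed by a scan while the prefix matches.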
This is how I picture a P2P online game or MMO. Could you have every player seeding/leeching the whole world this way, but with "connection conditions" based on players being close to each other? I.e., Player A is near players 1, 2, and 3, so that is who he seeds most of his data chunks to (say 80% of his total upload). Then players 1, 2, and 3 seed what they leeched from Player A (say 15% of their total) to the next closest players, and so on as the distance ring gets larger. (Players that are far from everyone would have either a random connection to other "alone" players or a connection to the server.)

If your world has more players, then there are more seeders and leechers, and it stays in equilibrium. As players get closer, their transfer rates and interaction increase. And all players have some of the new "torrented" data ready with them, so no upload/download/processing is wasted between players.

Anyway, this is just something that went around my head for a while. I researched it heavily, but found limited information relating to game networking, and then decided I'm not experienced enough for it.

I just thought I'd share one of my concepts. I've got many, and you're a guy that may appreciate them or even find one useful :) Cheers!

James (2013-04-12 02:12):
While your post doesn't address several important questions, it nevertheless is a good start. Most of the issues that do arise have already been solved or can be dodged with clever thinking.

For starters there is the issue of bandwidth, which is more of a factor than CPU/cache performance, HDD read/write times, and RAM performance combined.
You already mentioned one solution: initially transmit level/environmental data, and from there only transmit world updates for things that are immediately visible. Visibility calculations for determining exactly what is within the radius of a given player can be done client-side, and the server can throttle this visibility radius as a simple precaution. Obviously everything not above the ground can remain unsent.

More important is compression. If we are talking about chunk encoding of voxel data, then we don't need XYZ coordinates for data within a given chunk (let's say 8x8x8 subchunks). We can encode each chunk and subchunk as one long chain of bytes. The engine has the dimensions of a chunk preprogrammed, so if each voxel is a byte, then the first 8 voxels, say, are arranged in a line starting at the top of the chunk in the northwest corner and ending at the top northeast corner. Move one voxel-width south and start a new line, repeating the same NW-to-NE sequence until you have a layer. Shift down one layer and continue. Rinse and repeat until a whole chunk is reassembled. The position of each voxel is then relative to the absolute position of the containing chunk, so you only have to transmit the chunk XYZ and the voxel data, not the XYZ of individual voxels.

Anonymous (2013-04-09 04:57):
As long as you have the time and skills to do this! Ambitious dude! :D
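The chunk layout James describes above (west-to-east rows, rows advancing southward to finish a layer, layers stacked top-down) amounts to a single index formula. A sketch; the axis names are my assumption:

```cpp
#include <cassert>
#include <cstddef>

constexpr int kChunkDim = 8;  // 8x8x8 voxels per subchunk

// Flatten (east, south, down) coordinates into the position of a voxel in
// the transmitted byte stream. Rows run west->east, rows advance southward
// to complete a layer, and layers stack from the top down.
constexpr std::size_t voxel_index(int east, int south, int down) {
    return (static_cast<std::size_t>(down) * kChunkDim + south) * kChunkDim
           + static_cast<std::size_t>(east);
}
```

Only the chunk's own XYZ travels with the stream; each voxel's world position is implied by its byte offset.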
Jezter (2013-04-02 06:51):
Oh! And huge fan of your work; I look forward to one day seeing a retail game using the engine!

Jezter (2013-04-02 06:50):
Looking forward to the day PCs with WebGL are capable of running environments like this xD

May I ask how you went about the database? An already existing database solution or a custom-crafted one? And if so, which language/platform was it written in/for?

Miguel Cepero (2013-03-31 22:28):
Bitcoin spends a lot of time and energy asserting trust. I'm not sure that is feasible for a realtime environment.

The engine is called Voxel Farm for a reason: world blocks can be generated anywhere, on servers, clients, or even peer clients. The issue with P2P is speed. A P2P network may not be able to respond fast enough to random queries. Usually you need to come up with each world chunk in 30 milliseconds or so. Either a dedicated server or local generation can do that; a P2P network, I'm not so sure.

Anonymous (2013-03-31 21:04):
The crowdsourced trust sounds a little like the system behind Bitcoin (http://en.wikipedia.org/wiki/Bitcoin).

And when it comes to processing the changes, couldn't you do something like a render farm, where the processing is split up among all connected clients?
So instead of doing the same processing on multiple clients, which seems like unnecessary overhead, you would divide each processing task into chunks spread out over multiple clients. And I'm guessing reading the changes would still be the same for each client.

Oskars Par Kko (2013-03-30 17:04):
Wow, you are really not sitting still; you're doing a lot of work here. The size a game world takes is really important. One good example of world-size reduction was the game Fuel: it had a huge world (mostly useless, but still) and it took really little space on the HD. Keeping both performance and size good matters, because it makes the community that can work with the finished project bigger.

This engine is best for huge open-world/sandbox areas. That is the thing I like most in the gaming industry, the feeling of an open world around you, and it seems your work is doing a good job of that.

Good luck :)

Miguel Cepero (2013-03-29 14:36):
What I have right now is equivalent to libuv, although it does not cover Unix. It uses the same principles: very few worker threads, efficiently managed by the OS completion ports, with an asynchronous message model on top of that.

Libuv does fit the profile, and I did consider it, but I would not miss the fun of creating similar components for anything in the world.
You can count it as some sort of inspiration.

Anonymous (2013-03-29 04:33):
Have you considered using node.js as your server technology? (Or just libuv?)

Syben (2013-03-28 18:16):
This looks great; I've been thinking something like this should exist. So I'm proud of you, with the apparent skill and ability, for doing it. Hope this doesn't die here :)

Miguel Cepero (2013-03-28 14:56):
A single instance of the database can only scale as much as the hardware allows, so there is a hard limit on what one single DB can do. The thing is, this is an octree-based system, so you can have different databases serving different octants. The same applies to tasks other than storage.

Like you said, for some applications only user changes need to be transmitted and stored. If something drastic changes on the servers, like the world definition for mountains, the user-created data would not apply anymore. If you had a house on a hill, the hill may not be there anymore; your house would be floating.
If your application is like that, you need to make sure the main world features do not change once people have started playing in it.

Now, if that is somehow part of the gameplay you want, then on removing a hill your server game logic could also clear all the user-built stuff on it and place it as a pile of rubble at the new location.

That would make for a nice god game, actually.

Anonymous (2013-03-28 08:09):
I'd kinda like to take part in this conversation, but my programming credentials are sadly lacking.

Being artistically inclined, I keep forgetting there is a lot of essential unseen work like this that needs to go on to make this kind of persistent editable world happen.

I'll come back later when we're talking about making things look pretty ;-)

Keep up the good work.

PS: As someone who has read through the whole blog, I can confirm Miguel does refer a number of times to the octree principle the terrain data is built on. Even I vaguely understood it!

Kamica (2013-03-28 08:00):
Well, when something changes on the server, you could rebuild the client-side worlds, or at least only change the things that changed. With other online games (multiplayer or otherwise) you also need to pause often to update the client.
If you change the world generation algorithm, you'd probably want to shut down the server and update the clients anyway.

Anonymous (2013-03-28 06:08):
The database seems nice so far, but will it be able to "scale up" as an option? Also, how about the network load? It's only necessary to send user-manipulated chunks, but if something on the server changes, like the world generation algorithm, wouldn't that leave the data unsynced?

pdurdin (2013-03-28 05:27):
Client/server trust has been lurking in the back of my mind when thinking about my own multiplayer game processes. I really like this crowdsourcing idea, and it would never have occurred to me! :)

Miguel Cepero (2013-03-28 01:04):
No need to slow down. You ask questions, I provide answers, and others benefit from the exchange.

This may have been an introductory post on the system. There is no reason why it should mention how locks are handled, etc. That is up to me to disclose.

I get the feeling you have tried something like this in the past, but probably a lot simpler, which may justify some of the assumptions you have made.
Saving voxel blocks on your own, by any chance?

Afforess (2013-03-28 00:40):
>If you have an octree database like this one

Whoa, slow down. You mentioned octree databases; the original blog post just said "shiny". I don't think we're even arguing about the same things. I said one thread without synchronization for fast access. You clearly have information about the database implementation not given in the above blog post. That's kind of important. Maybe you should have had a blog post on that instead?

I'm not trying to question your credibility or decisions here; I agreed in my first comment that your solution was indeed the way to go. My point is that just because Windows Task Manager shows low RAM usage doesn't mean that's actually what's happening. And if you did more volume reading/writing in a shorter timespan, you'd notice that too.

Miguel Cepero (2013-03-28 00:22):
Your point about being limited to one thread, because otherwise you would end up with critical sections, applies to every field of computer science, not only data storage.

Multithreading implies some degree of synchronization. How much it hurts is a matter of how many serial aspects your process has (see Amdahl's law).

If you have an octree database like this one, you can have queries for different octants go in parallel, as the very spatial hashing you use guarantees no collisions.

What is even better, you can make the write-lock resolution adaptive. If you see some octants getting a lot of write activity, you can refine the octant into sub-octants, producing a new set of locks that allows more operations to go in parallel.

Only when you have a collision within an octant do you need to serialize, but that is a fraction of cases.
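The per-octant locking Miguel describes could be sketched with one std::shared_mutex per octant; the lock table keyed by octant address is my own simplification, and adaptive refinement is only noted in a comment:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// One reader/writer lock per octant, keyed by the octant's address. Queries
// touching different octants never contend; within one octant, reads share
// the lock and only writes take it exclusively. A write-hot octant could be
// refined by registering locks for its eight children and retiring the
// parent's lock.
class OctantLocks {
public:
    std::shared_mutex& lock_for(const std::string& octant_key) {
        std::lock_guard<std::mutex> g(table_mutex_);
        return table_[octant_key];  // creates the lock on first use
    }

private:
    std::mutex table_mutex_;                          // guards the table itself
    std::map<std::string, std::shared_mutex> table_;  // node addresses are stable
};
```

A reader would take `std::shared_lock` on `lock_for(key)` and a writer `std::unique_lock`, so in a read-heavy workload almost all queries proceed in parallel.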
Even in this case you would have a reader/writer type of lock, so all reads go in parallel and only writing blocks reads. For a system that reads a lot more than it writes, this has incredible gains.

Regarding the RAM cost of compression/decompression: have you looked at how completion ports work? The reality for any server is that you have a very limited number of threads doing your work anyway. For obvious reasons you store the data compressed; only when you need to use it do you decompress it. There is no point in having decompressed data you are not using. Since you have a very limited number of threads, you are also using a very limited set of data at any time. That set is what you need to keep in RAM.

In my case the uncompressed data for a chunk is 250K. A server with 4 cores would have 8 worker threads. That means less than 2 MB worth of RAM to deal with the decompression. Also, most of the decompression actually happens on the clients, as you want to send compressed data down the wire as well.

Afforess (2013-03-27 23:36):
>Like I said this is happening from multiple threads, 12 in this particular case. Not sure where you get this idea of the OS paging system having undefined behavior with multithreading. This is actually the recommended way to do IPC in Windows, with processes and many threads participating at the same time.

Sure. With locking and proper synchronization, which significantly hurts the IO performance.

http://stackoverflow.com/questions/9463403/shared-memory-c-read-and-write-synchronisation

>Note it is 10 GB read and 10 GB written, simultaneously from multiple threads.
>It is a DB, not just flat storage, so there are spatial queries going on in every case. It involved updating a lot of existing records, managing fragmentation, and other databasey things. It is not just appending information, which would be straightforward; you would just write to the HD in that case.

The very nature of OS paging is completely file-based. You might have a database abstraction on top, but at its core it is very much flat files.

>As for compression and decompression, it does not increase RAM use at all. Data can only be used by the cores you have available; if you have 8 cores, the maximum number of worker threads you should have is 16. This is 16 uncompressed buffers, which is peanuts.

To decompress the data, you take the compressed data (from the database) and apply the algorithm to the stream, which creates a new copy of the data. So you now have two copies: the compressed one from the database and the uncompressed one from the decompression algorithm. In the best case, your RAM use is doubled.

>This notion of my compressed 10GB being equivalent to a 50MB Minecraft dataset, I'm not sure where it is coming from. Is that bistromatics?

Not at all. http://en.wikipedia.org/wiki/Zip_bomb

Cameron (2013-03-27 23:33):
This comment has been removed by the author.

Miguel Cepero (2013-03-27 22:09):
I don't mind criticism; I'm just having a bit of trouble with your logic and information.

Note it is 10 GB read and 10 GB written, simultaneously from multiple threads.
It is a DB, not just flat storage, so there are spatial queries going on in every case. It involved updating a lot of existing records, managing fragmentation, and other databasey things. It is not just appending information, which would be straightforward; you would just write to the HD in that case.

Like I said, this is happening from multiple threads, 12 in this particular case. I'm not sure where you get this idea of the OS paging system having undefined behavior with multithreading. This is actually the recommended way to do IPC in Windows, with processes and many threads participating at the same time.

As for compression and decompression, it does not increase RAM use at all. Data can only be used by the cores you have available; if you have 8 cores, the maximum number of worker threads you should have is 16. This is 16 uncompressed buffers, which is peanuts.

This notion of my compressed 10GB being equivalent to a 50MB Minecraft dataset, I'm not sure where it is coming from. Is that bistromatics?