Release of Foundry (previously known as rlife) 0.2.0

These past weeks, I’ve been working a lot on my side project, and I’ve just made a new release of it. First of all, the project has been renamed “Foundry” (instead of “rlife”). I wanted a better name for it, and since the project is now actually based on Vulkan (which was my primary objective when I started it), I thought it would be a good idea to give it a name related to Vulkan. Plus, there was no crate already named “Foundry”.

The biggest change is that the computations for going from one generation of a grid to the next are no longer done on the CPU but on the GPU, via the Vulkan API. To add Vulkan support, I’ve used vulkano. The grid is represented as a Vec where each cell is a u8. Instead of computing the new states of the cells sequentially and writing them into a new grid, there are now two grids (one for the current generation and one for the next) stored as images in the Vulkan device’s memory (ideally the graphics card’s memory, when the machine has one), and the device launches parallel computations to determine and write the next state of every cell.
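To make the two-grid scheme concrete, here is a minimal CPU-side sketch. It assumes a toroidal grid stored as a flat Vec<u8> (0 = dead, 1 = alive) and Conway’s B3/S23 rule encoded as birth/survival bitmasks; the names Grid, next_cell and step are hypothetical and are not Foundry’s API. Generation n is only read, generation n+1 is only written, and the two buffers are swapped afterwards. next_cell is the per-cell work that, in a Vulkan version, each compute-shader invocation would perform independently and in parallel.

```rust
/// A toroidal grid stored row-major as a flat Vec<u8>, one byte per cell
/// (0 = dead, 1 = alive). Hypothetical type, for illustration only.
struct Grid {
    width: usize,
    height: usize,
    cells: Vec<u8>,
}

impl Grid {
    fn new(width: usize, height: usize) -> Self {
        Grid { width, height, cells: vec![0; width * height] }
    }

    fn get(&self, x: usize, y: usize) -> u8 {
        self.cells[y * self.width + x]
    }

    fn set(&mut self, x: usize, y: usize, state: u8) {
        self.cells[y * self.width + x] = state;
    }
}

/// Computes the next state of one cell. Bit n of `birth` (resp. `survival`)
/// set means a dead (resp. live) cell with n live neighbours is alive next.
/// Offsets of `w - 1` / `h - 1` wrap around, i.e. act as -1 on the torus.
/// Intended for grids of at least 3x3 cells.
fn next_cell(current: &Grid, x: usize, y: usize, birth: u16, survival: u16) -> u8 {
    let (w, h) = (current.width, current.height);
    let mut alive = 0u32;
    for dy in [h - 1, 0, 1] {
        for dx in [w - 1, 0, 1] {
            if dx == 0 && dy == 0 {
                continue; // skip the cell itself
            }
            if current.get((x + dx) % w, (y + dy) % h) != 0 {
                alive += 1;
            }
        }
    }
    let mask = if current.get(x, y) != 0 { survival } else { birth };
    ((mask >> alive) & 1) as u8
}

/// Writes the next generation of `current` into `next`, then swaps the two
/// buffers so that `current` holds the new generation (double buffering).
fn step(current: &mut Grid, next: &mut Grid, birth: u16, survival: u16) {
    for y in 0..current.height {
        for x in 0..current.width {
            let state = next_cell(current, x, y, birth, survival);
            next.set(x, y, state);
        }
    }
    std::mem::swap(current, next);
}
```

For Conway’s rule, birth is `1 << 3` and survival is `(1 << 2) | (1 << 3)`; a horizontal blinker placed on a 5x5 grid becomes vertical after one call to step. Swapping the buffers instead of allocating a fresh grid each generation is what keeps the scheme cheap.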

So what are the results in terms of performance? It turned out that there are huge gains in the time taken to compute the next generations of a grid, especially when computing a lot of generations at once and/or for large grids. Here are the results I got on my machine, which has an Intel Core i7-6700 CPU and an AMD Radeon RX 480 GPU (for each test, I first generated a randomly filled grid of the requested size and then ran the computations):

    • calculating 1000 generations of a toroidal grid with a width of 1024 cells and a height of 1024 cells with the CPU: 74.040 seconds
    • doing the same with the GPU: 0.754 seconds
    • calculating 1000 generations of a resizable grid with a width of 1024 cells and a height of 1024 cells with the CPU: 100.603 seconds
    • doing the same with the GPU: 1.968 seconds
    • calculating 1 generation of a toroidal grid with a width of 16384 cells and a height of 16384 cells with the CPU: 18.917 seconds
    • doing the same with the GPU: 0.083 seconds
    • calculating 1 generation of a resizable grid with a width of 16384 cells and a height of 16384 cells with the CPU: 25.903 seconds
    • doing the same with the GPU: 0.243 seconds
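To put the list in perspective, the speedups can be computed directly from the timings above. The sketch below divides each CPU time by the matching GPU time and, as an extra normalization, converts a run into cell updates per second (width × height × generations ÷ seconds). The constant and function names are illustrative only.

```rust
/// CPU and GPU timings in seconds, copied from the list above.
const RESULTS: [(&str, f64, f64); 4] = [
    ("1000 generations, toroidal 1024x1024", 74.040, 0.754),
    ("1000 generations, resizable 1024x1024", 100.603, 1.968),
    ("1 generation, toroidal 16384x16384", 18.917, 0.083),
    ("1 generation, resizable 16384x16384", 25.903, 0.243),
];

/// How many times faster the GPU run was than the CPU run.
fn speedup(cpu_secs: f64, gpu_secs: f64) -> f64 {
    cpu_secs / gpu_secs
}

/// Throughput of a run: total cells updated divided by elapsed seconds.
fn cell_updates_per_sec(width: u64, height: u64, generations: u64, secs: f64) -> f64 {
    (width * height * generations) as f64 / secs
}
```

With these numbers the GPU comes out roughly 98×, 51×, 228× and 107× faster respectively, and the toroidal 1024×1024 GPU run corresponds to about 1.39 billion cell updates per second. These are informal measurements, so the ratios carry whatever noise the timings have.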

I will soon write proper benchmarks for Foundry to get more reliable measurements.

Obviously, this is the very first implementation of the Vulkan API support so there are a lot of optimizations left to do.

See the PR that adds Vulkan support for more details.

The next goal is to find a way to render the grid using Vulkan, so that Foundry can be used within GUI applications.


4 thoughts on “Release of Foundry (previously known as rlife) 0.2.0”

  1. Sounds super good. I did something similar last year, experimenting with Rust threads, channels, Open MPI and OpenCL. You can get even faster by using one bit per cell, since copying grids is time-consuming. I never checked in that branch though, and then I lost it.
    https://gitlab.com/Luke-Nukem/rs_life

    Even faster is the hash-life algorithm 😀


    1. What do you mean by using bits per cell?
      I am using integers instead of booleans so that Foundry can support cellular automata with more than two states (up to 256 as long as it stays with u8). But yeah, I see a number of possible optimizations related to decreasing the amount of memory transfers and allocations. It will be pretty easy with toroidal grids (as their size never changes, we can keep reusing roughly the same couple of buffers/images when doing computations), but less so with resizable grids, as there will always be times when reallocations are necessary… The buffers currently used are also not the best (for example, the GPU checks the survival/birth rules from memory that isn’t located on the graphics card), etc…

      What you did looks cool! I will take a look at it later (for now I don’t have the dependencies on my computer :P)

      I don’t know how the hash-life algorithm works or even whether it can be parallelized to run on a GPU (it would be super fast if it could!). If you know of resources about it, I would be thankful if you shared them with me 😀


      1. Bits per cell being on/off only. I went with u8 for the same reason you did – it’s quite pretty seeing the birth and death of cells along with their liveness.

        Some of the references here are good reads – http://www.conwaylife.com/wiki/HashLife
        I’ve never considered trying to parallelise memoisation before; it should be a nice challenge. I guess concurrent access to the memoisation array would work well. Clueless about using it on a GPU though 🙂

        (Same person here as first post).

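The one-bit-per-cell layout suggested in the comments only applies to two-state automata such as Conway’s Game of Life, which is why the post keeps u8 cells. For the two-state case, a minimal sketch of a bit-packed grid follows, assuming a flat row-major layout with 64 cells per u64 word; BitGrid and its methods are hypothetical and not part of Foundry. For a 1024×1024 grid it takes 128 KiB instead of the 1 MiB needed with one u8 per cell, i.e. eight times less memory to copy or transfer per generation.

```rust
/// Two-state grid packing one cell per bit, 64 cells per u64 word,
/// row-major. Hypothetical type, for illustration only.
struct BitGrid {
    width: usize,
    height: usize,
    words: Vec<u64>,
}

impl BitGrid {
    fn new(width: usize, height: usize) -> Self {
        let cells = width * height;
        // Round up so a partially filled last word is still allocated.
        BitGrid { width, height, words: vec![0; (cells + 63) / 64] }
    }

    /// Maps (x, y) to a word index and a bit position inside that word.
    fn locate(&self, x: usize, y: usize) -> (usize, u32) {
        let i = y * self.width + x;
        (i / 64, (i % 64) as u32)
    }

    fn get(&self, x: usize, y: usize) -> bool {
        let (word, bit) = self.locate(x, y);
        ((self.words[word] >> bit) & 1) == 1
    }

    fn set(&mut self, x: usize, y: usize, alive: bool) {
        let (word, bit) = self.locate(x, y);
        if alive {
            self.words[word] |= 1u64 << bit;
        } else {
            self.words[word] &= !(1u64 << bit);
        }
    }

    /// Size of the cell storage in bytes.
    fn bytes(&self) -> usize {
        self.words.len() * std::mem::size_of::<u64>()
    }
}
```

The trade-off is that reading a neighbour now costs a shift and a mask, and the layout cannot represent more than two states; whole-word bitwise tricks can recover the lost speed, but they are beyond this sketch.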
