Recursive graph or not? Or, infinite data structure or not?
While working on implementing a graph database in my spare time, I had troubles choosing the right data structure for the graph. That’s the cornerstone of the whole thing, so some serious thinking was needed to get this base design decision right, because the rest of the implementation is going to depend a lot on it.
The critical part was to choose between a recursive or a non recursive representation. In the presence of cycles, the recursive version is infinite. Below is the conclusion of what I chose, after evaluating the strong points of each approach.
- Traversing a graph can be infinite when cycles are present. The data structure should reflect that. Having a recursive data structure supports it automatically. - It maps precisely the concept. - I can include a graph in a graph, which can enable many interesting algorithms. - It opens a whole new horizons of applications. Infinite data structures can be used to describe processes, assembly lines, fractals, etc. - It's new to me, i.e. interesting and fun!
- It’s simple to understand, and probably quick to implement. At least the basic operations on it. Instances of Functor, Applicative, Monad, etc, are easy to write. - I guess it’s more efficient by default: the underlying containers are optimised, and the direct access to elements is indeed faster than a traversal. - It’s also easy to work with as it’s finite. I can print it entirely on the screen, without having to slice it, in opposition to an infinite data structure. - I am more likely to have termination, even if it’s not granted. And hopefully I would say; I still want to be able to describe and model cycles. - Storage on disk looks much simpler, because it maps the data structure; I don’t have to convert it before writing it.
Verdict: non recursive. I know, disappointing. But here’s why I chose it.
First, it’s simpler to work with. At least for me, i.e. the whole team. I know better the world of finite containers; I can focus more on the choice and tuning of algorithms than on implementing them in the first place.
I want to move on and now I’m kinda stuck, so let’s put a bookmark there and go forward. Anyway, now I’m aware of the trade-offs, which is also an important aspect. It’ll help me reason about the code.
Avoiding recursivity can however be limiting later down the road. Or to put it differently, I don’t know exactly what I am leaving behind me. I have a good feeling, but still. You don’t know what you don’t know. Time will tell.
I can always switch back later on if needed. If this happens, it’s going to be expensive in time, but it’s a home project, and that’ll be a good opportunity to try refactoring large parts of a Haskell code base. What this tells me is that I have to carefully design my interfaces to allow such a change down the road.
What do you think? Would you have made the same decision?