Connect Four Programming Challenge Pt 2.
Continuing from Part 1...
So we've discussed the design decision behind the board, the pieces, and the representation of the board. Let's continue with talking about the algorithmic optimizations for checking the victory condition.
#Winning
Ultimately the original problem of checking for victory has a relatively quick solution - with size 7x5, there's an easy upper limit of 8*35*4 = 1120 accesses to check all possible arrangements for victory (8 directions, 4 pieces in each direction, 35 starting points). In reality it's even less because you can't check 4 pieces away in every direction (e.g. corners, edges) - this is just a rough back-of-the-envelope calculation.
This is something that's solved well under a second on modern computers! However, just because we don't have to think of optimizations doesn't mean we shouldn't! For example if this memory system was not RAM but instead some archaic physical system where each bit of memory is a light switch and someone is transcribing this by hand... I digress. In any case where this delay would be prohibitive, optimizing the number of accesses could net a huge improvement!
Additionally, looking at the algorithm, you can see that it is O(xyn) by the 35 and 4 in the calculation - this solution scales with the number of position in the grid! This would be terrible for larger boards.
Thinning the Herd
The first simple optimization is barely an optimization at all - it's just not checking the boundaries where there are less than four (n in the general case) in that direction to check. Or rather, it's about making a better estimate of our naive algorithm to start with. If we limit our algorithm to valid boundaries and consider that on the same set of four from the grid, direction does not matter (up/down/left/right, etc.)
For (x,y,n) = (7,5,4), we get 200. This is much less than the original 1120 so that's already better! Why bother with optimization if for a board of 35 we only have to check 200?
Well we can see the formula for the number of accesses is:
The derivation is left as an exercise for the reader :)
With some assumptions, namely that x,y,n are fairly similar in size to n, we see that the equation reduces to O(n^3 + n^2 + n^2) = O(n^3). More specifically its the leading term O(xyn). Even though we've saved some accesses, the order of magnitude is still the same! We check the whole board and at least n times the number of spaces in the board. We can do better.
We recognize one fact to optimize further:
A row containing a four in a row is still a valid victory so there's no need to quadruple our work for values in the middle. This optimization is a form of temporal locality
The middle values are read in duplicate, whereas a single read will accomplish the same thing
We simply need to make a counter that counts the # of repeats as it goes! If it is larger than n (4), then we've won! This only reads any given row/col/diagonal once! We save a whole factor of n - for a board of 7x5, we're down to 122 accesses!
We're down to O(xy)!
Another One
Unfortunately, we're still not done! We can do even better. Technically this seems like the best be can do given a static board in isolation. However, in context, we are given more information to work with. Presumably the use case of a victory checking algorithm is checking each round whether or not a user has won yet! Therefore, we can safely make the assumption that we only really need to check locally around the last played piece rather than the whole board. In fact, we can actually make this guarantee of correctness for any board where you're not playing after someone has already won. (Here's looking at you weird test cases)
Like a pretty flower!
If we're only checking boards that haven't won yet, this solution has accesses on the order of O(n) - 8n-4! This is great! Even for large boards, we only check locally - the only value read in multiple times is the last played one - at the cost of O(1) space to store the index of the last piece. This is a form of spatial locality for array accesses
Taking It Even Further
This is where I get off the train for optimization - I didn't do these ideas in less than two hours nor do I think I can. I'm just going to list some concepts and some potential ideas instead of fleshing them out. Additionally, given that my language of choice is python some of these memory management options might be out of control / difficult to do.
- Read in our local areas in chunks/blocks to take advantage of cache lines
- Do our best to keep the relevant 'chunk surface' cached (all the chunks surrounding the possible moves can be read in advance if n is small)
- In line with the above, we can ensure to drop the bottom part of the board (which is basically dead if it is not relevant to winning in a large board late game, farther than n-1 from the top).
- The above is true for caching, but also true for UI. We could continue to display the rest of the board to the user as the board fills up but for all intents and purposes the bottom of the board is also useless (think very tall y)
To be continued in Pt 3