Advent of code 2023 in under a second in python (ish!)

Advent of Code 2023

Warning, if you plan to complete Advent of Code 2023, this post contains spoilers so don’t read it until you’re done.

This was my second year completing Advent of Code live during December. I liked it so much last year that I completed 2019, 2020 and 2021’s puzzles (not live, obviously) in early 2023. You can read more about my first-time 2022 experience here. I really enjoyed doing AoC again this year, so here’s a quick post on what I learned.

All my solutions, written in Python, are available on github.

I’m really impressed that Eric Wastl has managed to produce this whole contest for so many years running - it must take an incredible amount of effort throughout the year to devise puzzles which are accessible to most programmers but also stretch the capabilities of the best and separate the field. You can hear him talk about how he puts the whole thing together in a talk he gave a few months ago on YouTube

This year’s competition started out a bit harder than usual (e.g. part 2 of day 1, which involved finding numbers spelled out in words within the input had non-trivial edge cases which required a little thought – this problem wasn’t hard, but it was unexpectedly complex for a day 1 problem). Several of the problems required a bit more reading than we’ve been used to in previous years too. There was much speculation that Eric was purposefully trying to prevent LLMs automatically solving the puzzles, but he has specifically clarified that puzzles were not designed with this in mind.

My favorites

Across all the days, my favorite problems were:

  • Day 20: Pulse Propagation - This problem models something a bit like digital logic gates, which I enjoyed. I modeled all the signals to be propagated in a deque which allows efficient popleft and append operations and wrote a relatively elegant method to add signals to the queue and bookkeep the counts required for the answer to part 1. Part 2 involved introspecting the supplied input (I like this kind of AoC puzzle, many others hate it!) to understand the structure of the network formed and find loops for a tractable runtime.
  • Day 19: Aplenty - This problem was not only fun because the input data fields spelled out XMAS but also because part 2 required a kind of recursive range chopping in X,M,A,S ranges to implement efficiently. I’ve previously found these kind of problems quite tedious, but this one came out very nicely given how cleanly it is structured.
  • Day 17: Clumsy Crucible - OK, this one was not my best performance in terms of getting it implemented quickly. It came out on a Saturday night which also happened to be date night for my wife and I. We enjoyed a lovely dinner with wine pairings and got back late. I made little progress that night, but woke up bright and early and enjoyed steering a rickety crucible around a maze using Dijkstra’s algorithm. It only required a tiny tweak to adapt my part 1 solution to part 2, which was gratifying.

Best lessons learned

Shoelace formula & Pick’s theorem

One of the things I like about doing AoC live is that you can learn a lot from the rest of the community. For instance, Day 10: Pipe Maze involved following a pipe around a messy grid and then computing the interior area of the figure it traced. I used the even/odd rule to solve this one. However, some of my colleagues used a technique that I hadn’t been familiar with before - namely the Shoelace formula and Pick’s theorem. The shoelace formula is a really neat mathematical algorithm to determine the area of a simple polygon whose vertices are described by their (x,y) coordinates in the plane. It’s this simple:

    def shoelace(verts):
        res = 0
        for l,r in pairs(verts):
            res += l[0]*r[1] - l[1] * r[0]
        return res

Pick’s theorem provides a formula for the area of a simple polygon with integer vertex coordinates, in terms of the number of integer points within it and on its boundary. You could use these together to solve day 10.

Imagine how happy I was to have just learned about the shoelace formula when Day 18: Lavaduct Lagoon came along. This problem requires that you calculate the area of a polygon described by the path which encloses it. In part 1 the shape is small and it’s easily solved with a breadth-first search [BFS] / flood fill, but part 2 scales the shape up so it is truly massive and BFS is no longer effective. The solution is to use Shoelace and Pick’s :-). There’s an excellent video by hyperneutrino that explains it in more detail.

networkx and sympy

NetworkX is a Python library for studying graphs and networks. SymPy is an Python library for symbolic computation. I have seen people use both of them to solve AoC problems in previous years but had not used them myself. Completing day 24 and 25 quickly ~required using libraries like these and I’m glad I took the opportunity to learn them (though it would have been convenient to have played with them more beforehand vs learning while trying to solve a live puzzle).

Part 2 of Day 24: Never Tell Me The Odds involved solving what amounted to a simultaneous equation of nine variables in nine unknowns. I was able to see the simultaneous equation quickly, and even had a stab at solving it on paper, which was not going to happen quickly. I grabbed sympy and was quite easily able to express my equations in the correct syntax to compute the answer. Note that I later tried doing this in Octave only to discover that Octave’s symbolic logic package itself delegates to sympy! I later replaced sympy with Z3 for faster runtime (read on).

sx,sy,sz = sym.Symbol('sx'), sym.Symbol('sy'), sym.Symbol('sz')
dx,dy,dz = sym.Symbol('dx'), sym.Symbol('dy'), sym.Symbol('dz')

eqns = []
vars = [sx, sy, sz, dx, dy, dz]
for i,s in enumerate(stones[:3]):
    nn = sym.Symbol("n"+str(i))
    eqns.append(sym.Eq(sx + nn * dx, s[0] + nn * s[3]))
    eqns.append(sym.Eq(sy + nn * dy, s[1] + nn * s[4]))
    eqns.append(sym.Eq(sz + nn * dz, s[2] + nn * s[5]))
    vars.append(nn)

result = sym.solve(eqns, *vars)[0]

Day 25: Snowverload required finding a minimum-cut in a graph. I wrote a brute force solution which worked for the example and, being day 25 (which is usually easy, it is Christmas after all), I expected would be enough for the real input. It was not! I didn’t know the graph algorithm for min-cut but did remember seeing stuff about cutting graphs when I had messed with networkx briefly last year. A quick Google search [graph cut networkx] helped me find the right features to use to solve my puzzle. Later I enjoyed hyper-neutrino’s video showing I only scratched the surface of what networkx can do. I’ve since explored the library in a lot more detail.

I later updated my day 25 solution to use karger’s algorithm to improve the runtime. I learned about this from one of my colleagues. I was more impressed by her solution than any of the ones using networkx but it’s great to have such a powerful library to solve problems like these quickly.

Optimizing

I enjoyed solving the problems, but like last year, I probably learned more from optimizing my solutions to achieve the best runtime. I was really happy to be using Python again - it enables enchantingly terse but still readable code and has a tremendous standard library. However, Python is definitely not the most performant language for AoC problems – there were plenty of days where bruteforce solutions in languages like Rust were possible long after Python had tapped out. There’s a well known blog post about solving all of AoC 2019 in Rust in under one second (on a single core). I set about optimizing my code so each individual solution ran in under a second on my 2021 M1 MacBook Pro.

It wasn’t immediately obvious that this was going to be possible – my baseline day 23 solution, where I felt I had found a really neat approach to save time, took 17 seconds to complete. But, here’s how I fared:

Problem baseline (s) CPU runtime (s) notes
day01 0.126 96% cpu 0.044 switched to a non-regex soln for 3x speedup
day02 0.040 97% cpu 0.039
day03 0.211 96% cpu 0.030
day04 0.035 96% cpu 0.034
day05 0.037 96% cpu 0.037
day06 0.140 96% cpu 0.032
day07 0.088 96% cpu 0.082 poker, minor speedups
day08 0.041 97% cpu 0.041
day09 0.076 92% cpu 0.064 copy vs deepcopy
day10 0.062 97% cpu 0.061
day11 0.073 96% cpu 0.042 minor speedups
day12 0.413 98% cpu 0.281 DP, seems close to optimal, tried minor tweaks
day13 0.047 97% cpu 0.046
day14 0.514 99% cpu 0.334 stones roll on grid, soln optimized
day15 0.048 97% cpu 0.048
day16 0.940 337% cpu 0.297 beams, 2x via “.” follow, then sharded across cpus
day17 1.057 99% cpu 0.471 Clumsy Crucible, used custom Bucket Heap
day18 0.040 96% cpu 0.033
day19 0.065 95% cpu 0.045 Already super fast
day20 0.188 98% cpu 0.107 Algo tweaks
day21 8.463 97% cpu 0.108 Huge speedup by reusing state from previous steps
day22 2.412 575% cpu 0.376 Minor algo tweaks, sharded
day23 16.810 557% cpu 0.819 Remapped graph to 2^n, sharded
day24 1.110 99% cpu 0.252 Moved to Z3, faster than sympy
day25 0.300 98% cpu 0.132 Implemented Karger’s instead of networkx

I did it - every solution completes in under a second. Here are a few neat things I learned:

Theoretical performance vs real-world performance

My solution to Day 17: Clumsy Crucible used Dijkstra’s algorithm. At the heart of Dijkstra’s is a priority queue to ensure we always handle the lowest cost nodes first. My initial implementation used the builtin heapq module which implements a Binary Heap which has an average-case O(1) insert and O(log n) delete-min [those are the operations priority queues need]. That’s great for solving most problems, but could a simpler data structure speed things up in practice? I replaced the heapq with a very simple Bucket Queue which I wrote myself, resulting in a ~2x speed up. Here’s the whole bucket heap implementation:

class BucketHeap:
    def __init__(self):
        self.heap = defaultdict(list)
        self.min_bucket = math.inf

    def insert(self, element):
        self.heap[element[0]].append(element)
        if element[0] < self.min_bucket:
            self.min_bucket = element[0]

    def delete_min(self):
        # Note this is a LIFO within the bucket which works for this
        # problem but may not be what you expect.
        min_element = self.heap[self.min_bucket].pop()
        if len(self.heap[self.min_bucket]) == 0:
            del self.heap[self.min_bucket]
            try:
                self.min_bucket = min(self.heap.keys())
            except:
                self.min_bucket = math.inf
        return min_element

Bitwise operations

Even in python, bitwise operations on numeric values are much faster than using set or similar to keep track of state. Day 23: A long walk was a relatively fiendish, exhaustive search problem to find the longest path from start to finish in a maze. A naive depth first search was too slow and you had to take advantage of the problem structure to simplify the state space of the maze (which had long passageways and relatively few intersections) into a graph with fewer nodes to search exhaustively. All this I had done already to get my 17 second runtime version. However, I found that remapping the entire graph from nodes named by their x,y coordinates to simple integer names where each integer was a unique power of two meant that I could keep track of visited nodes in an integer rather than a set during my search, resulting in a 3-4x speedup. Marking a node visited is a simple bitwise OR visited |= node, checking if it was visited is a bitwise AND visited & node and marking unvisited is a bitwise XOR visited ^= node

    # Remap the whole graph to power of two integer node names so we
    # can mark visited really quickly.
    gg = {}
    map_ = {}
    for i,k in enumerate(graph.keys()):
        map_[k] = pow(2,i)
    for k,v in graph.items():
        gg[map_[k]] = [(map_[(x[0],x[1])],x[2]) for x in v]
    return gg,map_[start],map_[end]

Multiprocessing in Python!??

So, I’d pulled all the obvious algorithmic tricks and yet had a couple of solutions running in 2-3 seconds, not my desired sub-second runtime. BTW if you have seen sub 1s single core solutions in python to day 22 and day 23, please let me know! You know what my M1 Max processor has? More than one core. Could I exploit those cores to solve every puzzle in less than a second each?

While this would be very simple in a language like C++, Rust, Java or Go. Python is notorious for not enabling concurrent parallelism due to its Global Interpreter Lock. However, there is a concurrent.futures module which provides a ProcessPoolExecutor which can marshall code running in entirely separate python processes (using the same source code file) to parallelize progress. You can’t communicate across the processes, but results can be merged back together on the parent process effectively. While this is by far the jankiest concurrency primitive I’ve ever used, it was quite fun to string it all together.

For day 23, I unrolled the first few of layers of my depth-first search and sharded them across processes. For day 22, I partitioned the test brick indices across shards and combined those results in the parent.

import concurrent.futures

if __name__ == "__main__":
    part2_data = partition()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(start_dfs, part2_data)
        print(max(results))

And, for good measure…

Can we parallelize all the solutions and achieve all of AoC 2023 in python in less than 1s total? Naively: not quite! I ran all of the solutions simultaneously under GNU parallel and they completed in 1.53 seconds – close! So… I messed around with optimizing this - e.g. there is a lot of overhead in starting the python interpreter, indeed my solutions to days 1-15 and 18-21 can all run on one core in ~980 ms in the same source file, so this felt tantalizingly close to possible. Taking some extreme liberties, I pulled all the solutions into a single source file, allowed input to be loaded into RAM (but not parsed) and then kicked off everything in parallel:

🚀 pypy3 allatonce.py
**** total time (s): 0.995249

So… You can complete all of AoC 2023 in python in less than a second! (across 10 cores on a farily high end though not state of the art machine)

Conclusion

This was a lot of fun. Eric Wastl is a genius and yes, you can complete the whole set of puzzles in under 1s in python if you squint a little.

Finally, though, doing Advent of Code live takes a lot of time! I did most of the puzzles soon after they came out. While I didn’t make it onto the global leaderboard (my best performance was #187 for day 21 part one), I think I’m quite fast at completing them. I still spent in aggregate an estimated 25h19m completing the puzzles. Finding a whole extra day during the busiest time of the year requires a lot of commitment. On reflection, I found the years I completed asynchronously more fun and a slightly better learning experience so I don’t plan to do AoC live again. If you are thinking about trying it yourself, I strongly recommend starting with 2019 which I found by far the most fun - you’ll build a VM over multiple days and use it to play a game of breakout to solve a puzzle!

Citations

gnu parallel

Tange, O. (2023, December 22). GNU Parallel 20231222 (‘Sundhnúkagígur’). Zenodo. https://doi.org/10.5281/zenodo.10428184