Incremental steps toward a self-contained Lightning Node testing environment

Since I last wrote, I've been tackling a few of the challenges identified in my previous posts. In the process, I've learned more about the Go language and about operator implementation, and a lot about btcd, the Go implementation of a Bitcoin node.

To recap, my goal has been to develop a Kubernetes operator that is useful for managing Lightning nodes. Lightning is a layer 2 protocol built on the Bitcoin protocol which enables Bitcoin transactions to occur without necessarily writing to the Bitcoin blockchain (at least not immediately). So far, I've spent the majority of my effort supporting management of Bitcoin nodes. The reason is that in order to experiment with the Lightning protocol, I need to create a simulation of the Bitcoin network. I'm not quite ready to test my software using real Bitcoin, so instead I must create my own network where I can transact with "fake" Bitcoin (coins that are only meaningful within my own testing environment). Fortunately, I'm not the only one who has faced this challenge. The developers of lnd (the current mainstream Lightning node implementation) have built features into btcd that facilitate simulation of the Bitcoin network. Specifically, nodes can be run in simnet mode, which creates a private, isolated network.

Aside from isolating my own Bitcoin network, another challenge is simulating block production. We need some block production in order to a) mature the blockchain to the block height required to unlock features, and b) support opening and closing Lightning channels.

I've implemented the following features in the kiln operator to support block production in a test environment (a sketch of the underlying RPC calls follows the list):

  1. The ability to read the current block height from a deployed Bitcoin node
  2. The ability to generate a number of blocks on-demand in order to achieve the user-specified minimum block height
  3. The ability to activate CPU mining in order to continuously generate blocks
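
For the first two items, btcd's rpcclient package does the heavy lifting. Here's a minimal, self-contained sketch of the two calls involved; the host, credentials, and target height are placeholders rather than kiln's actual configuration:

package main

import (
	"log"

	"github.com/btcsuite/btcd/rpcclient"
)

func main() {
	// Connect to the btcd RPC server in HTTP POST mode (TLS disabled for
	// brevity; host and credentials below are placeholders).
	client, err := rpcclient.New(&rpcclient.ConnConfig{
		Host:         "bitcoin-node:18556",
		User:         "rpcuser",
		Pass:         "rpcpass",
		HTTPPostMode: true,
		DisableTLS:   true,
	}, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Shutdown()

	// Feature 1: read the current block height from the deployed node.
	height, err := client.GetBlockCount()
	if err != nil {
		log.Fatal(err)
	}

	// Feature 2: generate however many blocks are needed to reach a
	// user-specified minimum block height (400 is just an example).
	const minHeight = 400
	if height < minHeight {
		if _, err := client.Generate(uint32(minHeight - height)); err != nil {
			log.Fatal(err)
		}
	}
}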

In the process of implementing this small set of features, I learned some things about implementing operators and about btcd.

Operator implementation notes

This may seem like a very simple revelation, but my first major note-to-self has to do with the power of starting the reconcile process over again from the top with something like the following statement:

return ctrl.Result{Requeue: true}, nil

This means exit the current reconciliation and start from the beginning. This simple technique is powerful if used in the right places. You certainly wouldn't want to "requeue" in an infinite loop; however, when a reconciliation step results in a change, it can be a good idea to start from the beginning again. The idea is that the second time around, the conditional branch which contained the requeue will not be executed, because the previous reconcile loop already made the necessary change. Reconciliation may loop like this several times if several changes are needed, but eventually it will follow a conditional path that finishes without a requeue and returns success:

return ctrl.Result{}, nil

It already seemed intuitive that reconciliation steps should always be idempotent, and this only emphasizes the importance of that practice.
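
To make the pattern concrete, here's a stripped-down sketch of such a reconcile loop. The BitcoinNode type and the ensureDeployment helper are hypothetical stand-ins, not kiln's actual code:

// Sketch of a controller-runtime Reconcile loop built from idempotent steps.
func (r *BitcoinNodeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var node kilnv1.BitcoinNode
	if err := r.Get(ctx, req.NamespacedName, &node); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// If this step had to change something (e.g. create a missing Deployment),
	// stop and requeue. On the next pass this branch is skipped because the
	// change has already been made, so the loop terminates as long as the
	// step is idempotent.
	changed, err := r.ensureDeployment(ctx, &node)
	if err != nil {
		return ctrl.Result{}, err
	}
	if changed {
		return ctrl.Result{Requeue: true}, nil
	}

	// ...more idempotent steps, each following the same pattern...

	// Nothing left to change: finish without a requeue.
	return ctrl.Result{}, nil
}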

I noted previously that I was having some trouble with the timing of certain reconciliation steps, and after some fiddling, I realized this technique could also be used to implement retries, and even to add a delay before the retry. For example, I was able to make the initial block count query retry failed connections after 10 seconds, like this:

return ctrl.Result{RequeueAfter: time.Second * 10}, nil
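
In context, that retry looks roughly like the fragment below (taken from inside a Reconcile function; rpcClient and logger are assumed to be set up by the surrounding reconciler):

// Ask btcd for its block height; if the RPC server isn't reachable yet
// (e.g. the pod is still starting), try again in 10 seconds instead of
// returning an error.
blockCount, err := rpcClient.GetBlockCount()
if err != nil {
	logger.Info("btcd RPC not reachable yet; retrying", "after", "10s", "error", err.Error())
	return ctrl.Result{RequeueAfter: time.Second * 10}, nil
}
logger.Info("current block height", "height", blockCount)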

Though this technique is powerful, I can also see how it can be dangerous. It would be quite easy to accidentally create an infinite loop under certain conditions. I also suspect that triggering too many or continuous reconciliation events could degrade Kubernetes performance. I'm quite sure my design so far could be optimized, and I'll be keeping these concerns in mind.

Bitcoin simnet block production notes

It seemed like things were going really well interacting with the btcd RPC API from the kiln operator. Commands generally worked well and behaved as described in the documentation, but one behavior was tripping me up. I wanted to enable continuous block production by enabling mining in my simnet environment. Turning CPU mining on was easy using SetGenerate(true, 1), which tells the Bitcoin node to start mining using only 1 CPU core. Node logs were showing me that mining was enabled, but nothing was happening.
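
For reference, against the same rpcclient connection shown in the earlier sketch, the call is just:

// Enable CPU mining, limiting the miner to a single core.
if err := client.SetGenerate(true, 1); err != nil {
	log.Fatal(err)
}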

Reading through btcd code comments, I finally found a clue: the CPU miner won't actually start working until the node has at least one connected peer, apparently because there would be nowhere to relay a found block.

Once I introduced a second Bitcoin node to the network, blocks started flying. This immediately brought me to my next challenge:

How can I control the pace of block production? I don't want it to be too slow, since channel opens and closes need new blocks to confirm, but I also don't want it to be too fast, or my Lightning nodes will be constantly syncing blocks and unable to process transactions in between block syncs.

This brings me back to an issue I've already documented: as a user, I should be able to control the allocation of compute resources like CPU and memory to my operands (in this case the Bitcoin node). It may be enough to slow block production down by starving the miner of CPU. Since it's a goal already, I'll make that my next step. Ideally I can tune the CPU limit down to a sweet spot where a block is produced every 30-60 seconds. It's a theory for now, but I'll put it to the test this afternoon.
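
If the theory holds, the operator would end up setting something like the following on the btcd container it deploys. This is just a sketch using the standard Kubernetes API types; the image name and the "250m" limit are placeholders I'd still need to tune:

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// buildBtcdContainer sketches how the operator could cap the miner's CPU so
// it can't produce blocks as fast as a full core would allow.
func buildBtcdContainer() corev1.Container {
	return corev1.Container{
		Name:  "btcd",
		Image: "btcd:latest",
		Resources: corev1.ResourceRequirements{
			Limits: corev1.ResourceList{
				// A quarter of a core as a starting guess for ~30-60s blocks.
				corev1.ResourceCPU: resource.MustParse("250m"),
			},
		},
	}
}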
