In my previous post, I outlined some feature implementation goals for the Bitcoin/Lightning Kubernetes operator I am developing, called kiln. These goals will require the operator to interact with Bitcoin nodes in order to read state and execute operations. The Bitcoin node implementation that I'm using (btcd) exposes an RPC API. For the past couple of days, I've been exploring that API and how I can integrate the kiln operator with this RPC endpoint.
Spoiler alert: btcd is written in Go, so it was not surprising to find some useful client and utility libraries and even a few Go examples. Since I am using the Go operator SDK, using the native btcd packages was a natural choice.
But before jumping into code, I had some questions along the way, not the least of which was:
What exactly is this Bitcoin RPC API?
In my observation, the primary function of this API is to support observability of the node state. The kiln operator will need to make observations about nodes, so that's promising. For example, the procedure called GetBlockCount() was an easy place to get started.
The API also supports some operations that result in node state changes. One category of operations relates to mining. These operations are not useful in production environments because if you are serious about mining, you will be running on specialized equipment, not commodity hardware. Profitable Bitcoin mining on Kubernetes will probably never be a thing, and mining is not the target use case for kiln. However, on the simnet (see previous post), being able to issue arbitrary commands to generate blocks is useful.
The btcd RPC API is not an implementation of a standard Bitcoin specification, as far as I know. So while it may provide some utility for observing and managing the btcd node that the kiln operator deploys, I don't think I can expect to manage and observe other node implementations in this way. For now, then, the kiln operator will not be interoperable with all Bitcoin node implementations. I will stick to btcd, but I think node interoperability may be a topic worth revisiting. It comes down to a set of philosophical questions for me:
- Should there be a standard API specification for Bitcoin node management? Does this exist?
- Should we instead embrace the differences in operational practices and procedures between the various Bitcoin node implementations, since that diversity is arguably the main reason several exist?
- How sufficient is the Bitcoin protocol itself for supporting observation and management?
Okay, that's enough pontificating. Hopefully this gives the reader some background on the particular RPC endpoint I'm using.
Getting started by observing the block count
As an incremental step toward my goals for managing block production in a simnet setting, I started by attempting to log the node's block count to the Status field of the BitcoinNode CR instance that owns the node deployment.
The first thing I need is for my operator to be able to trust the RPC server's certificate. Since the BitcoinNode CR already tells the operator the name of the RPC certificate secret, I can assume that the secret is a standard Kubernetes TLS secret with a copy of the CA's certificate in a key called ca.crt.
Now, with the CA certificate in hand, I can configure the RPC client and invoke the GetBlockCount procedure.
Finally, I can write the block count value to the BitcoinNode CR instance.
So this was good incremental progress. At least the operator is interacting directly with the Bitcoin node.
Some notes about testing
As I mentioned on Twitter, a developer tool called kubefwd was really useful for testing. I could access the Bitcoin node's Kubernetes service URL while running the operator on my local machine.
Aside from running the operator locally, kubefwd also allowed me to programmatically invoke arbitrary operations to support my testing scenarios.
For example, I could issue a command to my Bitcoin node to generate some blocks in order to test that the new block count is published to the BitcoinNode CR instance status.
Challenges and next steps
As many of us know, in software the solution to one problem is often the cause of another. In this case, when I added the logic to report block count, I introduced a race condition when creating new BitcoinNodes: the operator tries to query the RPC server before the Bitcoin pod is fully up and running. In general, I get the impression that Kubernetes operator development is going to involve a lot of thinking about the timing of events and operations. This part isn't ironed out yet, so the implementation of the block count status update remains rough around the edges, though I'm hopeful that I can find some good practices demonstrated in other operator repositories.