Thanks to ChatGPT! This version was polished with help from ChatGPT.
Recently, I have been working on upgrading some Kubernetes clusters. Initially, the upgrade was not a complex task, but it became challenging when I had to migrate the runtime from Docker to Containerd. Throughout this process, I learned quite a bit.
The first thing I considered when planning the migration was:
- whether we have workloads relying on Docker
After going through the manifests in our code base, I found one application that depended on Docker. I involved the corresponding service team to plan the migration, and I thought we were ready to proceed...
However, I later came across a blog from the community that I should have read before the operation. https://kubernetes.io/blog/2022/02/17/dockershim-faq/#what-should-i-look-out-for-when-changing-cri-implementations
The blog listed certain things to consider before migrating, the first of which was something I had missed. We have Fluent Bit in our cluster to collect logs, and the “json” parser worked perfectly with Docker logs. After migrating to hosts running Containerd, the parser stopped working, and tons of logs with unexpected names were sent to the logging server.
I was unaware of this until a developer reached out to ask what had happened. Although it had been some time since I last worked on the Fluent Bit configuration, I replied that the Fluent Bit configuration was fine... However, my arrogance was quickly interrupted when I was told that the names of the logs should be “stdout” or “stderr”.
I began to think about how the logs were sent to the server and how the name was generated. The name came from the “Stackdriver” output plugin, but what caused the output plugin to send logs with a different name from before? The answer was a log parsing failure. When I Googled “Fluent-bit Containerd,” I found two solutions.
Both of the solutions I found use a “regex” parser to extract the fields “time”, “stream”, “tag”, and “log”. I merged one of them into our own config, and fortunately, it worked. But it was a regular expression, and I wondered whether it would cover every scenario and reliably extract these fields every time. The answer lay in the design of the CRI logging format. Eventually, I found the implementation of the Containerd logger, which confirmed that the regular expression would work.
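To make the extraction concrete, here is a minimal Python sketch of the kind of regular expression involved. It assumes the CRI log line layout of a timestamp, the stream name, a tag (“F” for a full line, “P” for a partial one), and the message, separated by single spaces. The pattern, the function name `parse_cri_line`, and the sample line are illustrative assumptions, not the exact parser from our Fluent Bit config.

```python
import re

# Assumed CRI log line layout: "<time> <stream> <tag> <log>".
# The group names mirror the fields mentioned in the text.
CRI_LINE = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<tag>[^ ]*) (?P<log>.*)$"
)


def parse_cri_line(line):
    """Return the four fields as a dict, or None if the line does not match."""
    match = CRI_LINE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, made up for illustration.
sample = "2023-03-01T10:15:30.123456789Z stderr F connection refused: retrying"
fields = parse_cri_line(sample)
print(fields)
```

Because the first three fields never contain spaces, everything after the third space lands in “log”, even when the message itself contains spaces.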
Additionally, unlike the two solutions above, I did not add the “cri” parser directly to the “tail” input plugin, because we needed to stay backward compatible with nodes still running the Docker runtime. Instead, I added a separate filter with two parsers, “docker” and “cri.”
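The idea of trying two parsers in turn can be sketched in Python as follows. This is only an illustration of the fallback logic, not Fluent Bit configuration: it assumes Docker's JSON log lines carry “log”, “stream”, and “time” keys and that CRI lines follow the space-separated layout. The function name `parse_container_log` and the extra “runtime” key are hypothetical.

```python
import json
import re

# Assumed CRI log line layout: "<time> <stream> <tag> <log>".
CRI_LINE = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<tag>[^ ]*) (?P<log>.*)$"
)


def parse_container_log(line):
    """Try the Docker JSON format first, then fall back to the CRI format."""
    try:
        record = json.loads(line)
        if isinstance(record, dict) and "log" in record:
            return {"runtime": "docker", **record}
    except json.JSONDecodeError:
        pass  # Not JSON, so it may be a CRI line.

    match = CRI_LINE.match(line)
    if match:
        return {"runtime": "cri", **match.groupdict()}
    return None  # Neither parser matched.


# Hypothetical sample lines, made up for illustration.
docker_line = '{"log":"hello\\n","stream":"stdout","time":"2023-03-01T10:15:30.123456789Z"}'
cri_line = "2023-03-01T10:15:30.123456789Z stdout F hello"

print(parse_container_log(docker_line)["runtime"])
print(parse_container_log(cri_line)["runtime"])
```

With this ordering, nodes still on Docker keep working unchanged, and lines from Containerd nodes are picked up by the second parser instead of going through unparsed.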
Denied permission to bind privileged ports when the app is not running as root
Applications such as Nginx that did not run as root ran into permission problems in this scenario. There is a similar issue reported in the community.
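As background, the Linux rule behind this kind of error can be modeled in a few lines of Python. This is a simplified sketch of general kernel behavior, not a diagnosis of our specific incident: a process may bind a port below the `net.ipv4.ip_unprivileged_port_start` sysctl (1024 by default) only if it holds the `CAP_NET_BIND_SERVICE` capability, which root normally has. The function name `can_bind` and its parameters are hypothetical.

```python
# Kernel default for the net.ipv4.ip_unprivileged_port_start sysctl.
DEFAULT_UNPRIVILEGED_PORT_START = 1024


def can_bind(port, has_net_bind_service,
             unprivileged_port_start=DEFAULT_UNPRIVILEGED_PORT_START):
    """Simplified model: may a process bind this TCP/UDP port?

    Ports at or above the threshold are open to any process; ports below it
    require the CAP_NET_BIND_SERVICE capability.
    """
    if port >= unprivileged_port_start:
        return True
    return has_net_bind_service


# A non-root process without the capability cannot bind port 80 by default.
print(can_bind(80, has_net_bind_service=False))
# The same process can bind port 8080.
print(can_bind(8080, has_net_bind_service=False))
# Lowering the threshold to 0 makes every port unprivileged.
print(can_bind(80, has_net_bind_service=False, unprivileged_port_start=0))
```

The three calls correspond to the usual ways around the problem: grant the capability, listen on a high port, or lower the threshold for the pod.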
As part of the infrastructure platform, our goal is to provide a stable, easy-to-use, and efficient platform to our users.
- We should think not only about what relies on an infra dependency but also about what it produces, in order to evaluate the impact of a change
- We should have observability for the resources that are not expected to change
- We should build up solid knowledge regarding the critical infra components