Stabilising performance after a major kernel revision

A topic on upstreaming patches from kernel forks for embedded platforms is currently being discussed for Kernel Summit 2016. It is an age-old debate about whether it is better to work upstream and backport, or to apply patches to a product-specific kernel and worry about forward-porting later. The points being raised have not changed over the years and still come down to getting something out the door quickly versus long-term maintenance overhead. I’m not directly affected, so I had nothing new to add to the thread.

However, I’ve had recent experience stabilising the performance of an upstream kernel after a major kernel revision in the context of a distribution kernel. The kernel in question follows an upstream-first-and-then-backport policy with very rare exceptions. The backports are almost always related to hardware enablement, but performance-related patches are also cherry-picked, which is my primary concern as Performance Team Lead. The difficulty we face is that the distribution kernel must be faster than both the baseline upstream stable kernel it is based on and the mainline kernel we rebase to for a new release. There are usually multiple root causes and, because of the cherry-picking, it’s not a simple case of bisecting.
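Because the backports are cherry-picked rather than merged in mainline order, `git bisect` on the product branch cannot replay upstream history faithfully. A rough alternative, sketched below with hypothetical branch names (`distro/4.4` and `upstream/master` are placeholders, not the actual tree), is to use git’s patch-id matching to list which upstream commits are still missing from the product branch and then test those candidates directly:

```shell
# Hypothetical sketch: enumerate upstream commits whose patch content is
# absent from the distribution branch. `--cherry-pick` compares the commits
# on the two sides of the symmetric difference by patch-id and drops those
# that appear on both, so cherry-picked backports are filtered out even
# though their SHA1s differ. Branch names are placeholders.
git log --oneline --cherry-pick --right-only distro/4.4...upstream/master
```

Each commit this prints is a candidate that has not yet been backported. The patch-id comparison is content-based, so commits that needed non-trivial rework during backporting will still show up and have to be checked by hand.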

Performance is always workload- and hardware-specific, so I’m not going to get into the performance figures and profiles used to make decisions, but the patches in question are in a public git tree if someone were sufficiently motivated. There may be an attempt to get some of them into the -stable kernel, without a guarantee they’ll be picked up. Right now it’s still a work in progress, but this list gives an idea of the number of patches involved:

  • 6 months stabilisation effort spread across 8 people
  • 89 patches related to performance that could be in -stable
  • More patches already merged to -stable
  • +5 patches reducing debugging overhead
  • +4 patches related to vmstat handling
  • +2 patches related to workqueues
  • +8 patches related to Transparent Huge Page overhead
  • +3 patches related to NUMA balancing
  • +30 patches related to scheduler
  • +70 patches related to locking
  • Over 4000 patches related to feature and hardware enablement

This is an incomplete list and it’s a single case that may or may not apply to other people and products. I do have anecdotal evidence that other companies carry far fewer patches when stabilising performance, but in many cases those same companies have a fixed set of well-known workloads, whereas this is a distribution kernel for general use.

This is unrelated to the difficulties embedded vendors have when shipping a product, but let’s just say that I have a certain degree of sympathy when a major kernel revision is required. That said, my experience suggests that the effort required to stabilise a major release periodically is lower than carrying ever-increasing numbers of backports that get progressively harder to apply.