Compare commits

3 Commits

3 changed files with 396 additions and 0 deletions
@@ -0,0 +1,142 @@
# Changelog: March 9 Release Roundup
**Posted**: March 11, 2026 | **Applies to**: Headlamp Plugins (all versions)
Five days after the March 4 release cycle, four more plugins shipped point releases. This post covers what changed, why it matters, and what broke along the way (spoiler: not much).
---
## The Releases
### Rook v0.2.7
**What changed**:
- Improved OSD status visibility across distributed Ceph clusters
- Better handling of pool rebalancing edge cases
- Fixed a rendering bug when a cluster had >32 OSDs
- Updated Ceph API compatibility to 0.94.x range
**Why it matters**:
If you're running Ceph at any real scale, you've hit the "how many OSDs are rebalancing right now?" question. The dashboard now shows OSD-level state transitions in the UI, so you're not digging through `ceph status` output looking for the one node that's resynchronizing. The >32 OSD fix exists because someone running a 48-OSD cluster hit the rendering bug and reported it.
**Backwards compatibility**: ✅ Full. Existing deployments see UI improvements immediately, no config changes needed.
---
### Sealed Secrets v0.2.23
**What changed**:
- Updated to sealed-secrets upstream v0.27.0
- Improved secret rotation workflow UX (showing cleartext preview warning)
- Fixed a bug where the secret values list didn't sort properly by age
- Added copy-to-clipboard for encrypted values (for debugging)
**Why it matters**:
The main win is the upstream sync. Sealed Secrets v0.27.0 fixed a subtle bug where RSA key rotation could leave orphaned secrets. The UX improvements matter because operators rotating secrets now see a clear warning before they preview plaintext — "this is temporary for debugging," not "let me read passwords." The copy-to-clipboard thing is tiny, but debugging encrypted values at scale is annoying enough that people asked for it.
**Backwards compatibility**: ✅ Full. Keys from v0.2.22 work unchanged.
---
### Intel GPU v0.4.2
**What changed**:
- Fixed node-level GPU memory tracking (was under-reporting available VRAM on some driver versions)
- Added per-workload GPU utilization chart (alpha, feedback welcome)
- Improved support for mixed-generation GPU clusters (Xe + Arc)
- Better error messages when Intel GPU drivers are misconfigured
**Why it matters**:
The memory tracking fix is the critical one. If your scheduler wasn't placing workloads on GPU nodes, there's a decent chance your available VRAM metric was wrong and workloads were failing silently. The per-workload chart is alpha because we're still figuring out what's useful here (people want different things: power draw vs. FLOP utilization vs. memory pressure). The error messaging helps catch GPU driver issues at the dashboard level instead of in Kubernetes scheduler logs you won't read.
**Backwards compatibility**: ✅ Full. Existing dashboards update silently. The new chart is opt-in via the plugin settings.
---
### TrueNAS CSI v0.2.6
**What changed**:
- Fixed a race condition in the storage pool detail view under high-frequency metric updates
- Added historical IOPS trend (last 7 days, if your monitoring retention supports it)
- Improved error handling when the CSI driver's API is temporarily unavailable
- Updated deployment docs for TrueNAS SCALE 24.04
**Why it matters**:
The race condition was rare but nasty—under sustained I/O load, the pool view would flicker or show stale metrics. This is now fixed. The 7-day trend is useful for "is my throughput degrading" analysis without requiring external tools. The API timeout handling means if your TrueNAS box restarts, the dashboard degrades gracefully instead of erroring.
**Backwards compatibility**: ✅ Full. Config unchanged, UX improvements automatic.
---
## What Wasn't Shipped
A few things were on the table but didn't make March 9:
- **Kube-vip v0.2.0** (major refactor) — held for more testing, targeting early April
- **Polaris v0.7.0** (policy templates) — still in review, no timeline yet
- **Multi-cluster federation** (experimental feature) — code is there, docs aren't, holding until docs are done right
These things will ship when they're done, not before.
---
## Breaking Changes
None. All four plugins maintain backwards compatibility.
---
## How to Upgrade
For each plugin, in your Headlamp installation:
```bash
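# 1. Check your current version (replace plugin-name with your release name)
helm list -n headlamp | grep plugin-name
# 2. Update the plugin. With --repo, Helm expects the bare chart name,
#    not a repo-prefixed reference such as headlamp/plugin-name.
helm upgrade plugin-name plugin-name \
  --repo https://artifacthub.io/packages/helm \
  --namespace headlamp
# 3. Verify that the rollout finished
kubectl rollout status deployment/headlamp-plugin-name -n headlamp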
```
If you're not using Helm, download the latest manifest from each plugin's GitHub release page.
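The three steps above repeat for every plugin, so a small wrapper can save some typing. What follows is a minimal dry-run sketch, not an official script. The release names in `PLUGINS` are hypothetical placeholders (check `helm list -n headlamp` for the real ones), and it assumes each release name matches its chart name and that each deployment is named `headlamp-<release>`, as in the example above. By default it only prints the commands it would run.

```bash
#!/bin/sh
set -eu

# Hypothetical release names; replace with what `helm list -n headlamp` shows.
PLUGINS="rook sealed-secrets intel-gpu tns-csi"
NAMESPACE="headlamp"
# Repository URL copied from the steps above; adjust if yours differs.
REPO_URL="https://artifacthub.io/packages/helm"
# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

upgrade_plugin() {
  # Assumes release name == chart name, and deployment == headlamp-<release>.
  run helm upgrade "$1" "$1" --repo "$REPO_URL" --namespace "$NAMESPACE"
  run kubectl rollout status "deployment/headlamp-$1" -n "$NAMESPACE"
}

for plugin in $PLUGINS; do
  upgrade_plugin "$plugin"
done
```

Run it once as-is to review the printed commands, then rerun with `DRY_RUN=0` once the names look right for your installation.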
---
## Known Issues
**TrueNAS CSI**: If your CSI driver is on an older API version (pre-24.02), some metrics may not appear. This is logged as a warning. Upgrade the driver if you need the features.
**Intel GPU**: Multi-node GPU scheduling still requires manual node labeling. A future release will handle label discovery automatically.
**Rook**: The OSD visualization can take 30 seconds to update on first load in very large clusters (>64 OSDs). We know. We're working on it.
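For the TrueNAS CSI note above, one quick way to tell whether a version string falls before 24.02 is a version-aware sort. This is only a sketch under assumptions: it expects you to already have the version as a string (how you read it from your driver is not covered here), `version_lt` is a hypothetical helper name, and it relies on `sort -V` as provided by GNU coreutils.

```bash
# Succeeds (exit 0) when version $1 is strictly older than version $2.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

MIN_VERSION="24.02"
# Placeholder default; pass your own version string as the first argument.
current="${1:-24.04}"

if version_lt "$current" "$MIN_VERSION"; then
  echo "version $current is older than $MIN_VERSION: some metrics may not appear"
else
  echo "version $current is $MIN_VERSION or newer"
fi
```

A plain string comparison would also work for two-part versions like these, but `sort -V` keeps behaving sensibly if a patch component (for example `24.02.1`) shows up.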
---
## What's Next
- April 9: Kube-vip v0.2.0 (major refactor)
- Ongoing: Polaris v0.7.0 (no date yet, serious scope)
- Ongoing: Community feedback on Intel GPU utilization charts (please file issues if the metrics aren't useful)
---
## Feedback
Found a bug? File an issue in the relevant repo:
- [github.com/privilegedescalation/headlamp-rook-plugin](https://github.com/privilegedescalation/headlamp-rook-plugin)
- [github.com/privilegedescalation/headlamp-sealed-secrets-plugin](https://github.com/privilegedescalation/headlamp-sealed-secrets-plugin)
- [github.com/privilegedescalation/headlamp-intel-gpu-plugin](https://github.com/privilegedescalation/headlamp-intel-gpu-plugin)
- [github.com/privilegedescalation/headlamp-tns-csi-plugin](https://github.com/privilegedescalation/headlamp-tns-csi-plugin)
---
## Credits
Thanks to everyone who reported issues between March 4 and March 9. You're the reason these releases matter.
Special shout-out to @puretensor for running these plugins in production and telling us what actually breaks.
@@ -0,0 +1,84 @@
# Slow Burn Post - 2026-03-11
## Strategic Summary
Plant a question that makes people curious about Kubernetes observability and operational maturity without revealing the answer. The goal is to get people wondering "what's the right way to do this" before they know what Headlamp is. Works as a callback to the "Why We Built These" educational batch posted days earlier, reminding operators why those pain points matter.
---
## 1. Ready to Post
### Post: "The Dashboard You Don't Know You Need"
**Platform**: Twitter/X
**Post**:
Every mature Kubernetes environment has a moment: someone asks a question about the cluster, and everyone agrees it's a good question, and nobody knows how to answer it quickly.
Usually because the answer lives in four different CLI tools, three different dashboards, and someone's grep history.
**CMO Note**: This post plants the seed that there's a maturity gap: most K8s teams have experienced the "good question, no quick answer" moment. It doesn't pitch a solution; it just names the problem that serious operators have. Works well 1-2 weeks after the "Why We Built These" batch posts as a cooling period reminder. Tone is acknowledgment + dry acceptance, not mocking.
---
### Post: Bluesky Variant
**Platform**: Bluesky
**Post**:
You have a really good question about your cluster. Everyone agrees it's good. Nobody can answer it in under 2 minutes without splitting a terminal into four panes and running grep until they find it.
That's not a flaw in your question. That's a flaw in how K8s visibility tools are designed.
**CMO Note**: Slightly longer and more conversational for Bluesky's format. Same core message—empathy + problem naming—but with more room to breathe. This version gets slightly more pointed at the tooling itself.
---
### Post: Mastodon Variant
**Platform**: Mastodon
**Post**:
The hardest part of Kubernetes maturity isn't resource management or networking or pod placement. It's the moment when you realize you have observability *data* everywhere, but visibility nowhere — and combining them requires knowing which three tools to juggle.
There's a better way to design this.
**CMO Note**: Most technical framing, appeals to operators who think about K8s maturity as a journey. "Data vs. visibility" distinction is specific enough to resonate. Concluding line is subtle: suggests a different approach without naming the product.
---
## 2. Risky but Worth Discussing
None for this batch — slow-burn posts are inherently low-risk since they're empathy-first, not pitch-first.
---
## 3. Backlog (Evergreen)
This post works evergreen (true anytime after the "Why We Built These" batch has gone out) and can be part of a longer narrative arc.
Suggested follow-up posts (future):
- "3 ways teams solve this today": comparative without favoring ours
- "Observability vs. visibility: what's the difference?": educational explainer
- "What does the right K8s dashboard look like?": leading question that sets up product reveal
---
## Why This Works
1. **No product mention** — Pure empathy for the operator experience
2. **Specific frustration** — Not "we make things better," but "this common moment sucks"
3. **Opens door for follow-up** — The next post can be "here's one solution," but this one just raises the question
4. **Sets up narrative arc** — "Why We Built These" explains pain points; this post reminds people why those pain points matter
5. **Conversational tone** — Not a complaint, not a pitch, just "yeah, that moment is real"
---
## Timing Notes
- Post 1-2 weeks after "Why We Built These" batch (suggest March 18-25 timeframe)
- Otherwise evergreen: once that batch is out, this can be scheduled anytime
- Ideal as part of the slow-burn curiosity campaign (PR #15) before the KubeCon push
@@ -0,0 +1,170 @@
# Social Media Batch — Industry Commentary on Kubernetes Operations Culture
## Strategic Summary
Hot takes on the absurdities, failures, and genuine problems in how teams actually operate Kubernetes. Not product pitches. Not tutorials. Just observations about the gap between "Kubernetes best practices" and "what we're doing at 2am when the cluster's on fire." These posts position Privileged Escalation as operators who understand the actual pain, not consultants selling the dream.
Timing: Evergreen content, works anytime. Some posts pair well with technical releases; others stand alone. This batch establishes credibility and voice before the KubeCon campaign.
---
## 1. Ready to Post
### Post 1: The Observability Checklist Theater
**Platform**: Twitter/X
**Post**:
"We have observability," she said, pointing to three separate dashboards, four different APIs, and a grep script she's afraid to touch.
Observability isn't having data. It's knowing what to do with it when you're 30 seconds into an incident.
Most Kubernetes deployments have the first part. Nobody has the second.
**CMO Note**: Identifies a real gap (data vs visibility vs actionability) without mocking the teams experiencing it. Dry acknowledgment of a universal pain point. Sets up future posts about "what good observability looks like." No product mention needed — the empathy does the work.
---
### Post 2: The 3-Line PR That Ate 6 Weeks
**Platform**: Bluesky
**Post**:
Your platform team has a 3-line PR waiting for maintainer approval. It's been 42 days.
Not because the code is bad. Not because they're thinking about it. They're just... busy.
This is why every infrastructure team ends up maintaining their own forks of things. Not because they wanted to. Because the alternative was waiting forever.
**CMO Note**: Speaks directly to the maintainer bottleneck that creates fragmentation in the ecosystem. Bluesky audience (more irony-literate) appreciates the "42 days" specificity. The "own forks" insight is a callback to our sealed-secrets fork acknowledgment from earlier batches.
---
### Post 3: The README That Became the Docs
**Platform**: Mastodon
**Post**:
Kubernetes documentation is a weird thing. The official docs are great. The best practices docs are aspirational. The actual documentation for how to run your thing is 40 lines in a README written by someone at 11pm because they're shipping in the morning.
Three years later, that README is the only source of truth. It hasn't been updated in two years. A new hire reads it and wonders why nothing matches.
This is fine.
**CMO Note**: Honest observation about documentation drift and pragmatism in ops. "This is fine" ending is dark humor that Mastodon audience gets. Not judgmental, just realistic. Positions us as people who understand that perfect documentation is a fantasy.
---
## 2. Risky but Worth Discussing
### Post 4: The Consolidation Trap (Skip for Safety)
**Platform**: Twitter/X
**Post**:
Every infrastructure company eventually ships one tool that tries to do everything.
It is always bloated. It is always slow. It is always worse at specific things than the 6 single-purpose tools it replaced.
We had the opportunity to do this. We didn't. We're weird that way.
**CMO Note**: RISKY. This is a soft dig at competitors (Lens, Rancher, etc.) and might read as salty if we're not careful. However, it's grounded in our actual decision (6 plugins instead of one). Could land well with operators who are exhausted by consolidation fantasies. Recommend getting CMO sign-off before posting. If approved, schedule it after the "Why We Built These" batch so context is fresh.
---
### Post 5: Observability Theater: The Security Checkbox
**Platform**: Bluesky
**Post**:
"We have visibility into our supply chain."
Translation: We run a scanner once a quarter and it generates a report nobody reads.
"We monitor resource usage."
Translation: Prometheus metrics exist. We haven't looked at them in a year but they're technically there.
Real observability isn't a checkbox. It's a practice. It takes work. Most teams don't do it.
**CMO Note**: MILDLY RISKY. Could read as too critical of teams trying their best. However, it's honest about the gap between aspirational and actual practices. Strong with engineering leaders who are frustrated with their own observability theater. Recommend pairing with a constructive follow-up post about "what real visibility looks like."
---
## 3. Backlog (Evergreen, Lower Urgency)
### Post 6: The Dependency Management Hellscape
**Platform**: Twitter/X
**Post**:
You have 1,247 transitive dependencies in your Kubernetes cluster.
You know what 14 of them do.
Nobody knows who maintains the other 1,233. Nobody knows what version of OpenSSL they actually use. If one of them breaks, you have a 2-week-long blame game ahead of you.
This is the cloud native era.
**CMO Note**: BACKLOG. Evergreen tech-industry grumbling. Node_modules meme energy. Good for reaching people who are deep in dependency hell and looking for commiseration. Can post anytime without losing relevance. Low risk, high relatability.
---
### Post 7: The Platform Team as Glorified Operators
**Platform**: LinkedIn
**Post**:
The job listing said "Platform Engineer."
What they meant: "You will spend 60% of your time un-breaking things that broke automatically, 30% fighting with vendors, and 10% actually building the platform you were hired to build."
Platform engineering is good work. But we're not honest about what it is yet. It's operations wearing a different hat.
**CMO Note**: BACKLOG. Professional tone for LinkedIn. Speaks to platform engineering leaders who are exhausted. The honesty about role mismatch will resonate with our actual audience. Low controversy, high empathy. Works anytime. Could be paired with recruitment/community posts about "what good platform engineering support looks like."
---
## Post Selection Recommendation
**For This Week** (Pre-KubeCon): Posts 1, 2, 3
- Establish credibility as operators who get it
- Set tone before educational batches ("Why We Built These")
- Build audience affinity
**For Next Week** (During KubeCon): Hold — focus on KubeCon campaign posts
**For Later** (March 28+): Posts 4, 5, 6, 7
- Post 4 only if CMO approves and timing feels right
- Posts 5, 6, 7 are genuinely evergreen — use as filler/buffer content
---
## Voice Check
✅ Dry observations, not punchlines
✅ Mild grievances, not venting
✅ Credibility through specificity ("42 days," "1,247 dependencies," "11pm")
✅ Empathy for teams, not mockery
✅ Operator perspective (not vendor, not consultant)
✅ No corporate language, no "exciting to announce"
✅ Each post stands alone; no threading needed
---
## Tags & Platform Consistency
- Twitter/X: Short, punchy, no threads
- Bluesky: Slightly longer, conversation-friendly
- LinkedIn: Professional tone, slightly longer form
- Mastodon: More technical, darker humor accepted
All posts include implicit "this resonates because you've lived it" angle rather than "you should agree with us."
---
## Dependencies
- Posts 1-3: Self-contained, can post anytime
- Post 4: Requires "Why We Built These" context (posted 1+ week prior)
- Posts 5-7: Evergreen, zero dependencies