Observability
One operational view for every customer deployment.
When your software runs inside customer clouds, the usual self-hosting playbook breaks. There are no shared dashboards, no central log storage, and no way to answer "is this customer healthy?" without opening a support ticket. Alien is built so you can answer that question without leaving your control plane.
Observability in Alien covers three signal families across every deployment:
- Health and lifecycle — release version, platform, region, deployment status.
- Application telemetry — logs, metrics, and traces produced by your code.
- Runtime signals — container health, HTTP traffic, queue depth, and other operational metrics gathered by Alien's data plane.
All three are correlated by deployment, so when something looks off you can move from "all deployments" down to "this customer, this release" without piecing it together by hand.
Deployment health
The manager already knows about every deployment, the release each is running, the cloud and region it's pinned to, and when it last checked in. That state is exposed on the dashboard and through the manager API, so support and engineering teams can:
- See which customer environments are healthy, stale, updating, or pinned to an older release.
- Spot deployments that haven't picked up the latest release and narrow down the likely cause (push vs. pull update model, network reachability, configuration drift).
- Trigger pinned rollbacks for one deployment or all of them — see Releases.
There's no need to scrape customer cloud consoles or wait for screenshots to know what's running.
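As a rough illustration of the kind of check this state makes possible, the sketch below flags deployments that have stopped checking in or are running an older release. The `Deployment` fields, the 15-minute staleness threshold, and the sample data are all assumptions made for the example; they are not the manager API's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


# Hypothetical record shape; the real manager API may expose different fields.
@dataclass
class Deployment:
    id: str
    customer: str
    release: str
    last_check_in: datetime


def needs_attention(deployments, latest_release, now, stale_after=timedelta(minutes=15)):
    """Return (deployment id, reason) pairs for stale or out-of-date deployments."""
    findings = []
    for d in deployments:
        if now - d.last_check_in > stale_after:
            # No recent check-in: the reported release may itself be outdated.
            findings.append((d.id, "stale"))
        elif d.release != latest_release:
            findings.append((d.id, "behind"))
    return findings


# Made-up sample fleet, with a fixed "now" so the result is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fleet = [
    Deployment("dep-a", "acme", "v2", now - timedelta(minutes=1)),
    Deployment("dep-b", "globex", "v1", now - timedelta(minutes=2)),
    Deployment("dep-c", "initech", "v2", now - timedelta(hours=3)),
]
report = needs_attention(fleet, latest_release="v2", now=now)
print(report)
```

Staleness is checked first on purpose: a deployment that has not checked in recently cannot be trusted to report its current release.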
Application telemetry
Inside customer environments, your code produces ordinary logs, metrics, and traces. Alien forwards those signals out of the customer cloud over OpenTelemetry-compatible pipelines.
You set the destination once in the manager configuration, pointing it at your existing observability backend (Datadog, Grafana, Honeycomb, S3, and so on). Every active deployment then tags its telemetry automatically with:
- Deployment id, customer label, and cloud/region
- Release version and stack name
- Resource and runtime metadata (function name, container name, replica)
See Self-Hosting → Configuration for the exporter settings.
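To make the tagging concrete, here is a minimal sketch of what merging deployment context into a telemetry record could look like. The `cloud.provider`, `cloud.region`, and `service.version` keys come from the OpenTelemetry semantic conventions; the `alien.*` keys, both function names, and the record shape are hypothetical and only illustrate the idea, not Alien's actual attribute names.

```python
def deployment_attributes(deployment_id, customer, cloud, region, release, stack):
    """Build the attribute set shared by every signal from one deployment."""
    return {
        # Keys defined by the OpenTelemetry semantic conventions.
        "cloud.provider": cloud,
        "cloud.region": region,
        "service.version": release,
        # Hypothetical keys, invented for this example.
        "alien.deployment.id": deployment_id,
        "alien.customer": customer,
        "alien.stack": stack,
    }


def tag(record, attributes):
    """Return a copy of a telemetry record with deployment attributes merged in.

    Attributes already present on the record take precedence, so
    resource-level details such as a container name are preserved.
    """
    merged = dict(attributes)
    merged.update(record.get("attributes", {}))
    return {**record, "attributes": merged}


attrs = deployment_attributes("dep-a", "acme", "aws", "eu-west-1", "v2", "prod")
log_line = {"body": "request failed", "attributes": {"container.name": "api"}}
tagged = tag(log_line, attrs)
print(tagged["attributes"])
```

Because `tag` returns a new record rather than mutating its input, the same attribute set can be reused across every signal a deployment emits.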
Runtime signals
You don't have to instrument everything by hand. Alien's data plane already produces a small set of useful signals out of the box:
- Container and replica health, restarts, and last error.
- HTTP-level metrics (request counts, status classes, latency) for ingress-facing containers.
- Queue depth and consumer lag for managed queues.
- Storage and KV usage where the underlying cloud service exposes it.
These travel through the same telemetry pipeline as your application data and arrive tagged with the same deployment context.
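For readers unfamiliar with HTTP-level metrics, the sketch below shows how request counts, status classes, and latency could be derived from raw request samples. It is an illustration of the general idea only: the function names and the `(status_code, latency_ms)` input shape are assumptions, and Alien's data plane may aggregate these signals differently.

```python
from collections import Counter


def status_class(code):
    """Map an HTTP status code to its class, for example 404 to "4xx"."""
    return f"{code // 100}xx"


def summarize(requests):
    """Summarize an iterable of (status_code, latency_ms) samples."""
    requests = list(requests)
    classes = Counter(status_class(code) for code, _ in requests)
    total = len(requests)
    return {
        "total": total,
        "classes": dict(classes),
        # Share of requests that ended in a server error (5xx).
        "error_rate": classes["5xx"] / total if total else 0.0,
        "max_latency_ms": max((ms for _, ms in requests), default=0),
    }


samples = [(200, 12), (201, 20), (404, 8), (500, 130), (503, 90)]
summary = summarize(samples)
print(summary)
```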
Operational, not customer data
Observability data is deliberately scoped to runtime, infrastructure, and product health rather than the customer's application data. The goal is to keep operational telemetry safe for your support engineers to look at, without dragging regulated customer records back to your control plane.
If you do need to inspect customer-side data on demand, the right tool is a remote command — it runs inside the customer environment, returns only what you ask for, and leaves an audit trail.
Putting it together
A typical investigation looks like this:
- The dashboard shows a deployment marked unhealthy after a new release.
- You filter telemetry by that deployment and that release — error rate spikes, latency rises.
- Logs show the offending request pattern; a trace narrows it down to a single Function.
- You pin the deployment back to the previous release while you ship a fix.
No customer hand-holding, no asking "can you send us the logs", no version drift. Every deployment behaves like part of one operational surface.
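The filtering step in that investigation can be sketched as follows: take telemetry records that carry deployment and release tags, keep one deployment, and compare the error rate per release. The record fields (`deployment_id`, `release`, `error`) and the sample data are hypothetical; in practice this query would run in whichever observability backend you forward telemetry to.

```python
from collections import defaultdict


def error_rate_by_release(records, deployment_id):
    """Return {release: error rate} for a single deployment."""
    totals = defaultdict(lambda: [0, 0])  # release -> [errors, requests]
    for r in records:
        if r["deployment_id"] != deployment_id:
            continue  # ignore other customers' deployments
        counts = totals[r["release"]]
        counts[0] += 1 if r["error"] else 0
        counts[1] += 1
    return {release: errors / total for release, (errors, total) in totals.items()}


# Made-up records: dep-a has 1 error in 4 requests on v1 and 2 in 4 on v2.
records = (
    [{"deployment_id": "dep-a", "release": "v1", "error": e} for e in (False, False, False, True)]
    + [{"deployment_id": "dep-a", "release": "v2", "error": e} for e in (True, False, True, False)]
    + [{"deployment_id": "dep-b", "release": "v2", "error": False}]
)
rates = error_rate_by_release(records, "dep-a")
print(rates)
```

A clear jump in the error rate between two releases of the same deployment is the kind of signal that would justify pinning it back to the previous release.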