Self-Hosted Performance Platform

Deploy a private performance monitoring and alerting platform with trend dashboards and threshold-based alerts

Build and deploy a private, internal performance monitoring platform. This guide covers the complete architecture — from browser-side data collection through storage, trend visualization (with time-range selection), and configurable threshold alerting.

Architecture Overview

┌──────────────────────────────────────────────────────────────────┐
│  Browser (Real Users)                                            │
│  ┌───────────────┐                                               │
│  │  web-vitals   │── sendBeacon ──┐                              │
│  │  SDK snippet  │                │                              │
│  └───────────────┘                │                              │
└───────────────────────────────────┼──────────────────────────────┘

┌──────────────────────────────────────────────────────────────────┐
│  Ingestion Layer                                                 │
│  ┌──────────────────────────────────────────┐                    │
│  │  Collector API  (Node.js / Go)           │                    │
│  │  - validate & enrich events              │                    │
│  │  - batch write to storage                │                    │
│  └──────────┬───────────────────────────────┘                    │
│             │                                                    │
│             ▼                                                    │
│  ┌──────────────────┐   ┌──────────────────┐                     │
│  │  ClickHouse      │   │  Redis           │                     │
│  │  (metrics store) │   │  (alert state)   │                     │
│  └──────────────────┘   └──────────────────┘                     │
│             │                    │                               │
│             ▼                    ▼                               │
│  ┌──────────────────────────────────────────┐                    │
│  │  Query API  (Node.js / Go)               │                    │
│  │  - trend queries with time range         │                    │
│  │  - percentile aggregations               │                    │
│  │  - alert rule evaluation                 │                    │
│  └──────────┬───────────────────────────────┘                    │
│             │                                                    │
│             ▼                                                    │
│  ┌──────────────────────────────────────────┐                    │
│  │  Dashboard UI  (Next.js)                 │                    │
│  │  - trend charts with time picker         │                    │
│  │  - alert rule configuration              │                    │
│  │  - team notifications                    │                    │
│  └──────────────────────────────────────────┘                    │
└──────────────────────────────────────────────────────────────────┘

Why self-host?

  • Full data ownership — metrics never leave your network
  • No per-seat or per-event pricing
  • Custom dimensions (business context, experiments, feature flags)
  • Integration with internal alerting systems (Slack, PagerDuty, WeCom, DingTalk)
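The "custom dimensions" point deserves a concrete example: the event payload built in section 1 can be widened with business context before it is sent. A sketch, where the CustomDims shape and the withDims helper are illustrative and not part of the SDK below:

```typescript
// Illustrative: attach custom business dimensions to a metric event
interface CustomDims {
  experiment?: string;     // A/B test variant
  featureFlags?: string[];
  tenant?: string;         // business unit / customer segment
}

function withDims<T extends object>(event: T, dims: CustomDims): T & { dims: CustomDims } {
  return { ...event, dims };
}

const enriched = withDims(
  { metric: 'LCP', value: 1800 },
  { experiment: 'checkout-v2', featureFlags: ['new-header'] },
);
```

On the storage side these extra fields would become additional columns (or a Map column) in the metrics table.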

Choosing an Architecture: ClickHouse vs ELK

This guide provides two complete architectures. Choose based on your team's situation.

|                       | ClickHouse + Grafana                                      | ELK Stack                                                  |
|-----------------------|-----------------------------------------------------------|------------------------------------------------------------|
| Best for              | Pure performance analytics, high-volume metrics           | Unified observability (logs + metrics + traces)            |
| Aggregation speed     | Extremely fast — columnar storage with native quantile()  | Slower for percentile aggregations at scale                |
| Storage efficiency    | ~10:1 compression ratio (columnar)                        | ~3:1 (inverted index overhead)                             |
| Dashboard             | Grafana (powerful, needs configuration) or custom UI      | Kibana (rich out-of-box, drag-and-drop, zero code)         |
| Alerting              | Custom engine or Grafana Alerting                         | ElastAlert2 or Kibana Alerting (built-in)                  |
| Full-text search      | Not supported                                             | Native — search through error logs alongside metrics       |
| APM integration       | Metrics only (pair with Jaeger for traces)                | Elastic APM covers browser, server, DB traces in one place |
| Setup complexity      | Lower (fewer components)                                  | Higher (more services, more RAM)                           |
| RAM requirement       | ~2 GB minimum                                             | ~4–8 GB minimum (ES is memory-hungry)                      |
| Team already has ELK? | Need to deploy separately                                 | Reuse existing cluster — just add APM + index              |

Recommendation

  • Choose ClickHouse if your primary goal is fast metrics dashboards and you want minimal infrastructure
  • Choose ELK if your team already runs ELK, or you want unified logs + metrics + traces + error tracking in one platform with Kibana's drag-and-drop dashboards
  • Both architectures share the same Browser SDK (section 1 below)

Architecture A: ClickHouse Stack

1. Data Collection (Browser SDK)

A lightweight SDK that reports Core Web Vitals and custom metrics to your collector.

// sdk/perf-sdk.ts
import { onCLS, onINP, onLCP, onFCP, onTTFB, type Metric } from 'web-vitals';

interface PerfEvent {
  metric: string;
  value: number;
  rating: string;
  page: string;
  device: string;
  connection: string;
  timestamp: number;
  sessionId: string;
  appVersion: string;
}

const SESSION_ID = crypto.randomUUID();

function getDevice(): string {
  if (/Mobi|Android/i.test(navigator.userAgent)) return 'mobile';
  if (/Tablet|iPad/i.test(navigator.userAgent)) return 'tablet';
  return 'desktop';
}

function send(metric: Metric) {
  const event: PerfEvent = {
    metric: metric.name,
    value: metric.value,
    rating: metric.rating,
    page: location.pathname,
    device: getDevice(),
    connection: (navigator as any).connection?.effectiveType ?? 'unknown',
    timestamp: Date.now(),
    sessionId: SESSION_ID,
    appVersion: document.querySelector('meta[name="app-version"]')?.getAttribute('content') ?? 'unknown',
  };

  const blob = new Blob([JSON.stringify(event)], { type: 'application/json' });
  navigator.sendBeacon('/api/perf/collect', blob);
}

export function initPerfSDK() {
  onLCP(send);
  onINP(send);
  onCLS(send);
  onFCP(send);
  onTTFB(send);
}
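At higher traffic you may prefer to queue metrics client-side and send a single beacon when the page becomes hidden, rather than one beacon per metric. A framework-agnostic sketch of such a batcher; the createBatcher helper is illustrative and not part of the SDK above:

```typescript
// Illustrative client-side batcher: queue metrics, flush them in one payload
type QueuedMetric = { metric: string; value: number };

function createBatcher(send: (batch: QueuedMetric[]) => void) {
  const queue: QueuedMetric[] = [];
  return {
    push(m: QueuedMetric) {
      queue.push(m);
    },
    // Drains the queue and invokes `send` once; returns how many were sent
    flush(): number {
      if (queue.length === 0) return 0;
      const batch = queue.splice(0, queue.length);
      send(batch);
      return batch.length;
    },
  };
}

// In the SDK you would flush when the page is hidden, e.g.:
// document.addEventListener('visibilitychange', () => {
//   if (document.visibilityState === 'hidden') batcher.flush();
// });
```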

Usage in Next.js (load the SDK lazily from a client component, then render that component inside app/layout.tsx):

// app/perf-provider.tsx
'use client';
import { useEffect } from 'react';

export function PerfProvider({ children }: { children: React.ReactNode }) {
  useEffect(() => {
    // Dynamic import keeps web-vitals out of the initial bundle
    import('../sdk/perf-sdk').then(({ initPerfSDK }) => initPerfSDK());
  }, []);
  return <>{children}</>;
}

2. Ingestion Service (Collector API)

Receives events, validates them, and batch-inserts into ClickHouse.

ClickHouse Table Schema

-- Create the metrics table.
-- Partitioned by day for efficient time-range queries;
-- ordered by (metric, page, timestamp) for fast aggregation.
CREATE TABLE perf_metrics (
  metric      LowCardinality(String),   -- LCP, INP, CLS, FCP, TTFB
  value       Float64,
  rating      LowCardinality(String),   -- good, needs-improvement, poor
  page        String,
  device      LowCardinality(String),   -- mobile, tablet, desktop
  connection  LowCardinality(String),   -- 4g, 3g, 2g, slow-2g, unknown
  app_version LowCardinality(String),
  session_id  String,
  timestamp   DateTime64(3)
) ENGINE = MergeTree()
  PARTITION BY toYYYYMMDD(timestamp)
  ORDER BY (metric, page, timestamp);

-- Materialized view: pre-aggregate hourly percentiles
CREATE MATERIALIZED VIEW perf_hourly_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(hour)
ORDER BY (metric, page, device, hour)
AS
SELECT
  metric,
  page,
  device,
  toStartOfHour(timestamp) AS hour,
  quantileState(0.50)(value) AS p50_state,
  quantileState(0.75)(value) AS p75_state,
  quantileState(0.95)(value) AS p95_state,
  countState()              AS count_state
FROM perf_metrics
GROUP BY metric, page, device, hour;
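Because the materialized view stores partial aggregate states (quantileState, countState), reading it requires the matching -Merge combinators. A sketch of a trend query against the view, using the names from the schema above:

```sql
-- Hourly LCP p75 trend read from the pre-aggregated view
SELECT
  hour,
  quantileMerge(0.75)(p75_state) AS p75,
  countMerge(count_state)        AS samples
FROM perf_hourly_mv
WHERE metric = 'LCP'
GROUP BY hour
ORDER BY hour;
```

The Query API below queries the raw table for flexibility; switching hot dashboard queries to the view trades a little freshness for much cheaper scans.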

Collector API (Node.js)

// server/collector.ts
import { createClient } from '@clickhouse/client';
import { Router, json } from 'express';

const clickhouse = createClient({
  url: process.env.CLICKHOUSE_URL ?? 'http://localhost:8123',
  database: 'perf',
});

const VALID_METRICS = new Set(['LCP', 'INP', 'CLS', 'FCP', 'TTFB']);
const BATCH_SIZE = 500;
const FLUSH_INTERVAL_MS = 5000;

let buffer: PerfEvent[] = [];

interface PerfEvent {
  metric: string;
  value: number;
  rating: string;
  page: string;
  device: string;
  connection: string;
  timestamp: number;
  sessionId: string;
  appVersion: string;
}

async function flush() {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length);

  await clickhouse.insert({
    table: 'perf_metrics',
    values: batch.map(e => ({
      metric: e.metric,
      value: e.value,
      rating: e.rating,
      page: e.page,
      device: e.device,
      connection: e.connection,
      app_version: e.appVersion,
      session_id: e.sessionId,
      timestamp: new Date(e.timestamp).toISOString(),
    })),
    format: 'JSONEachRow',
  });

  console.log(`Flushed ${batch.length} events to ClickHouse`);
}

// Flush periodically; wrap so a failed insert doesn't become an unhandled rejection
setInterval(() => flush().catch(console.error), FLUSH_INTERVAL_MS);

export const collectorRouter = Router();

collectorRouter.post('/collect', json(), (req, res) => {
  const event: PerfEvent = req.body;

  // Validate
  if (!VALID_METRICS.has(event.metric)) {
    return res.status(400).json({ error: 'Invalid metric' });
  }
  if (typeof event.value !== 'number' || event.value < 0) {
    return res.status(400).json({ error: 'Invalid value' });
  }

  buffer.push(event);

  // Flush if buffer is full
  if (buffer.length >= BATCH_SIZE) {
    flush().catch(console.error);
  }

  res.status(204).end();
});

3. Query API (Trend + Alerting Data)

Provide endpoints for the dashboard to query trends and for the alerting engine to evaluate thresholds.

// server/query.ts
import { createClient } from '@clickhouse/client';
import { Router } from 'express';

const clickhouse = createClient({
  url: process.env.CLICKHOUSE_URL ?? 'http://localhost:8123',
  database: 'perf',
});

export const queryRouter = Router();

// ── Trend API ───────────────────────────────────────────────────
// GET /api/perf/trend?metric=LCP&page=/&start=2026-01-01&end=2026-02-01&granularity=hour
queryRouter.get('/trend', async (req, res) => {
  const { metric, page, device, start, end, granularity = 'hour' } = req.query;

  const granularityFn = granularity === 'day' ? 'toStartOfDay' :
                         granularity === 'hour' ? 'toStartOfHour' :
                         'toStartOfFifteenMinutes';

  const conditions: string[] = ['1 = 1'];
  const params: Record<string, string> = {};

  if (metric)  { conditions.push('metric = {metric:String}');   params.metric = metric as string; }
  if (page)    { conditions.push('page = {page:String}');       params.page = page as string; }
  if (device)  { conditions.push('device = {device:String}');   params.device = device as string; }
  if (start)   { conditions.push('timestamp >= {start:String}'); params.start = start as string; }
  if (end)     { conditions.push('timestamp <= {end:String}');   params.end = end as string; }

  const query = `
    SELECT
      ${granularityFn}(timestamp) AS time,
      quantile(0.50)(value) AS p50,
      quantile(0.75)(value) AS p75,
      quantile(0.95)(value) AS p95,
      count()               AS samples,
      countIf(rating = 'good')  / count() * 100 AS good_pct,
      countIf(rating = 'poor')  / count() * 100 AS poor_pct
    FROM perf_metrics
    WHERE ${conditions.join(' AND ')}
    GROUP BY time
    ORDER BY time
  `;

  const result = await clickhouse.query({ query, query_params: params, format: 'JSONEachRow' });
  const data = await result.json();
  res.json(data);
});

// ── Overview API ────────────────────────────────────────────────
// GET /api/perf/overview?start=2026-01-01&end=2026-02-01
queryRouter.get('/overview', async (req, res) => {
  const { start, end } = req.query;

  const query = `
    SELECT
      metric,
      page,
      device,
      quantile(0.50)(value) AS p50,
      quantile(0.75)(value) AS p75,
      quantile(0.95)(value) AS p95,
      count()               AS samples,
      countIf(rating = 'good')  / count() * 100 AS good_pct,
      countIf(rating = 'poor')  / count() * 100 AS poor_pct
    FROM perf_metrics
    WHERE timestamp BETWEEN {start:String} AND {end:String}
    GROUP BY metric, page, device
    ORDER BY metric, page, device
  `;

  const result = await clickhouse.query({
    query,
    query_params: { start: start as string, end: end as string },
    format: 'JSONEachRow',
  });
  const data = await result.json();
  res.json(data);
});

// ── Current Percentile (for alerting) ───────────────────────────
// GET /api/perf/current?metric=LCP&percentile=75&windowMinutes=60
queryRouter.get('/current', async (req, res) => {
  const { metric, percentile = '75', windowMinutes = '60', page } = req.query;

  // Optional page filter so alert rules scoped to a page (rule.page) work
  const query = `
    SELECT
      quantile({p:Float64} / 100)(value) AS value,
      count() AS samples
    FROM perf_metrics
    WHERE metric = {metric:String}
      AND timestamp >= now() - INTERVAL {window:UInt32} MINUTE
      ${page ? 'AND page = {page:String}' : ''}
  `;

  const result = await clickhouse.query({
    query,
    query_params: {
      metric: metric as string,
      p: percentile as string,
      window: windowMinutes as string,
      ...(page ? { page: page as string } : {}),
    },
    format: 'JSONEachRow',
  });
  const [row] = await result.json<{ value: number; samples: number }>();
  res.json(row);
});

4. Alerting Engine

A scheduled evaluator that checks configured thresholds and sends notifications.

Alert Rule Configuration

// server/alert-rules.ts
export interface AlertRule {
  id: string;
  name: string;
  metric: string;              // LCP, INP, CLS, FCP, TTFB
  percentile: number;          // 50, 75, 95
  threshold: number;           // metric value threshold
  windowMinutes: number;       // time window to evaluate
  cooldownMinutes: number;     // minimum gap between repeated alerts
  severity: 'warning' | 'critical';
  channels: AlertChannel[];    // where to send notifications
  enabled: boolean;
  page?: string;               // optional: scope to a specific page
}

export interface AlertChannel {
  type: 'slack' | 'webhook' | 'email' | 'wecom' | 'dingtalk';
  target: string;              // webhook URL, email address, etc.
}

// Default rules — can be overridden via the dashboard UI
export const defaultRules: AlertRule[] = [
  {
    id: 'lcp-warning',
    name: 'LCP p75 > 2.5s',
    metric: 'LCP',
    percentile: 75,
    threshold: 2500,
    windowMinutes: 60,
    cooldownMinutes: 120,
    severity: 'warning',
    channels: [{ type: 'slack', target: process.env.SLACK_WEBHOOK_URL! }],
    enabled: true,
  },
  {
    id: 'lcp-critical',
    name: 'LCP p75 > 4s',
    metric: 'LCP',
    percentile: 75,
    threshold: 4000,
    windowMinutes: 15,
    cooldownMinutes: 30,
    severity: 'critical',
    channels: [{ type: 'slack', target: process.env.SLACK_WEBHOOK_URL! }],
    enabled: true,
  },
  {
    id: 'inp-warning',
    name: 'INP p75 > 200ms',
    metric: 'INP',
    percentile: 75,
    threshold: 200,
    windowMinutes: 60,
    cooldownMinutes: 120,
    severity: 'warning',
    channels: [{ type: 'slack', target: process.env.SLACK_WEBHOOK_URL! }],
    enabled: true,
  },
  {
    id: 'cls-warning',
    name: 'CLS p75 > 0.1',
    metric: 'CLS',
    percentile: 75,
    threshold: 0.1,
    windowMinutes: 60,
    cooldownMinutes: 120,
    severity: 'warning',
    channels: [{ type: 'slack', target: process.env.SLACK_WEBHOOK_URL! }],
    enabled: true,
  },
];

Alert Evaluator

// server/alert-evaluator.ts
import Redis from 'ioredis';
import { AlertRule, AlertChannel } from './alert-rules';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

interface MetricsClient {
  getPercentile(metric: string, percentile: number, windowMinutes: number, page?: string): Promise<{ value: number; samples: number }>;
}

export class AlertEvaluator {
  constructor(
    private metricsClient: MetricsClient,
    private rules: AlertRule[]
  ) {}

  // Run on a schedule (e.g. every 1 minute via setInterval or cron)
  async evaluate() {
    for (const rule of this.rules) {
      if (!rule.enabled) continue;

      try {
        const { value, samples } = await this.metricsClient.getPercentile(
          rule.metric, rule.percentile, rule.windowMinutes, rule.page
        );

        // Need enough samples to be meaningful
        if (samples < 30) continue;

        if (value > rule.threshold) {
          await this.fire(rule, value, samples);
        } else {
          // Clear alert state when recovered
          await redis.del(`alert:fired:${rule.id}`);
        }
      } catch (err) {
        console.error(`Alert evaluation failed for rule ${rule.id}:`, err);
      }
    }
  }

  private async fire(rule: AlertRule, value: number, samples: number) {
    const cooldownKey = `alert:fired:${rule.id}`;
    const lastFired = await redis.get(cooldownKey);

    if (lastFired) return; // Still in cooldown

    // Set cooldown
    await redis.setex(cooldownKey, rule.cooldownMinutes * 60, Date.now().toString());

    // Store alert history
    await redis.lpush('alert:history', JSON.stringify({
      ruleId: rule.id,
      ruleName: rule.name,
      metric: rule.metric,
      value,
      threshold: rule.threshold,
      samples,
      severity: rule.severity,
      timestamp: new Date().toISOString(),
    }));
    await redis.ltrim('alert:history', 0, 999); // Keep last 1000

    // Send notifications
    for (const channel of rule.channels) {
      await this.notify(channel, rule, value, samples);
    }
  }

  private async notify(channel: AlertChannel, rule: AlertRule, value: number, samples: number) {
    const formatValue = (metric: string, v: number) =>
      metric === 'CLS' ? v.toFixed(3) : `${v.toFixed(0)} ms`;

    const message = {
      title: `[${rule.severity.toUpperCase()}] ${rule.name}`,
      body: [
        `**${rule.metric}** p${rule.percentile} = **${formatValue(rule.metric, value)}** (threshold: ${formatValue(rule.metric, rule.threshold)})`,
        `Window: ${rule.windowMinutes} min | Samples: ${samples}`,
        rule.page ? `Page: ${rule.page}` : 'All pages',
        `Time: ${new Date().toISOString()}`,
      ].join('\n'),
    };

    switch (channel.type) {
      case 'slack':
        await fetch(channel.target, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            text: `${message.title}\n${message.body}`,
          }),
        });
        break;

      case 'wecom':
        await fetch(channel.target, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            msgtype: 'markdown',
            markdown: { content: `${message.title}\n${message.body}` },
          }),
        });
        break;

      case 'dingtalk':
        await fetch(channel.target, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            msgtype: 'markdown',
            markdown: { title: message.title, text: message.body },
          }),
        });
        break;

      case 'webhook':
        await fetch(channel.target, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ ...message, rule, value, samples }),
        });
        break;
    }
  }
}

Run the Evaluator

// server/index.ts
import { AlertEvaluator } from './alert-evaluator';
import { defaultRules } from './alert-rules';

// Minimal MetricsClient backed by the Query API's /current endpoint
const metricsClient = {
  async getPercentile(metric: string, percentile: number, windowMinutes: number, page?: string) {
    const params = new URLSearchParams({ metric, percentile: String(percentile), windowMinutes: String(windowMinutes) });
    if (page) params.set('page', page);
    const res = await fetch(`http://localhost:3001/api/perf/current?${params}`);
    return res.json() as Promise<{ value: number; samples: number }>;
  },
};

const evaluator = new AlertEvaluator(metricsClient, defaultRules);

// Evaluate every 60 seconds
setInterval(() => {
  evaluator.evaluate().catch(console.error);
}, 60_000);

5. Dashboard UI

A Next.js dashboard with time-range trend charts and alert rule management.

Trend Chart Component

// components/TrendChart.tsx
'use client';
import { useState, useEffect } from 'react';
import {
  LineChart, Line, XAxis, YAxis, Tooltip, CartesianGrid,
  ResponsiveContainer, ReferenceLine,
} from 'recharts';

interface TrendPoint {
  time: string;
  p50: number;
  p75: number;
  p95: number;
  good_pct: number;
}

interface TrendChartProps {
  metric: string;
  page?: string;
  threshold?: number;
}

const TIME_RANGES = [
  { label: '1H',  value: '1h',  granularity: '15min' },
  { label: '6H',  value: '6h',  granularity: '15min' },
  { label: '24H', value: '24h', granularity: 'hour'  },
  { label: '7D',  value: '7d',  granularity: 'hour'  },
  { label: '30D', value: '30d', granularity: 'day'   },
];

function getStartDate(range: string): string {
  const now = Date.now();
  const ms: Record<string, number> = {
    '1h': 3600000,
    '6h': 21600000,
    '24h': 86400000,
    '7d': 604800000,
    '30d': 2592000000,
  };
  return new Date(now - (ms[range] ?? 86400000)).toISOString();
}

export function TrendChart({ metric, page, threshold }: TrendChartProps) {
  const [range, setRange] = useState('24h');
  const [data, setData] = useState<TrendPoint[]>([]);
  const [loading, setLoading] = useState(true);

  const granularity = TIME_RANGES.find(r => r.value === range)?.granularity ?? 'hour';

  useEffect(() => {
    setLoading(true);
    const params = new URLSearchParams({
      metric,
      start: getStartDate(range),
      end: new Date().toISOString(),
      granularity,
    });
    if (page) params.set('page', page);

    fetch(`/api/perf/trend?${params}`)
      .then(r => r.json())
      .then(setData)
      .finally(() => setLoading(false));
  }, [metric, page, range, granularity]);

  const formatValue = (v: number) =>
    metric === 'CLS' ? v.toFixed(3) : `${v.toFixed(0)} ms`;

  return (
    <div>
      <div style={{ display: 'flex', justifyContent: 'space-between', marginBottom: 16 }}>
        <h3>{metric} Trend</h3>
        <div style={{ display: 'flex', gap: 4 }}>
          {TIME_RANGES.map(r => (
            <button
              key={r.value}
              onClick={() => setRange(r.value)}
              style={{
                padding: '4px 12px',
                borderRadius: 4,
                border: '1px solid #ddd',
                background: range === r.value ? '#0070f3' : '#fff',
                color: range === r.value ? '#fff' : '#333',
                cursor: 'pointer',
              }}
            >
              {r.label}
            </button>
          ))}
        </div>
      </div>

      {loading ? (
        <div style={{ height: 300, display: 'flex', alignItems: 'center', justifyContent: 'center' }}>
          Loading...
        </div>
      ) : (
        <ResponsiveContainer width="100%" height={300}>
          <LineChart data={data}>
            <CartesianGrid strokeDasharray="3 3" />
            <XAxis
              dataKey="time"
              tickFormatter={(t) => new Date(t).toLocaleTimeString([], { hour: '2-digit', minute: '2-digit' })}
            />
            <YAxis tickFormatter={formatValue} />
            <Tooltip
              labelFormatter={(t) => new Date(t as string).toLocaleString()}
              formatter={(v: number) => formatValue(v)}
            />
            <Line type="monotone" dataKey="p50" stroke="#10b981" name="p50" dot={false} />
            <Line type="monotone" dataKey="p75" stroke="#f59e0b" name="p75" strokeWidth={2} dot={false} />
            <Line type="monotone" dataKey="p95" stroke="#ef4444" name="p95" dot={false} />
            {threshold && (
              <ReferenceLine y={threshold} stroke="#ef4444" strokeDasharray="5 5" label="Threshold" />
            )}
          </LineChart>
        </ResponsiveContainer>
      )}
    </div>
  );
}

Alert Rule Management UI

// components/AlertRuleEditor.tsx
'use client';
import { useState } from 'react';

interface AlertRule {
  id: string;
  name: string;
  metric: string;
  percentile: number;
  threshold: number;
  windowMinutes: number;
  cooldownMinutes: number;
  severity: 'warning' | 'critical';
  channels: { type: string; target: string }[];
  enabled: boolean;
}

export function AlertRuleEditor({ rules: initial }: { rules: AlertRule[] }) {
  const [rules, setRules] = useState(initial);
  const [editing, setEditing] = useState<string | null>(null);

  async function saveRule(rule: AlertRule) {
    await fetch('/api/perf/alerts/rules', {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(rule),
    });
    setEditing(null);
  }

  async function toggleRule(id: string, enabled: boolean) {
    setRules(prev => prev.map(r => r.id === id ? { ...r, enabled } : r));
    await fetch(`/api/perf/alerts/rules/${id}/toggle`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ enabled }),
    });
  }

  const formatThreshold = (metric: string, value: number) =>
    metric === 'CLS' ? value.toFixed(2) : `${value} ms`;

  return (
    <div>
      <h3>Alert Rules</h3>
      <table style={{ width: '100%', borderCollapse: 'collapse' }}>
        <thead>
          <tr>
            <th>Status</th>
            <th>Name</th>
            <th>Metric</th>
            <th>Percentile</th>
            <th>Threshold</th>
            <th>Window</th>
            <th>Severity</th>
            <th>Actions</th>
          </tr>
        </thead>
        <tbody>
          {rules.map(rule => (
            <tr key={rule.id}>
              <td>
                <input
                  type="checkbox"
                  checked={rule.enabled}
                  onChange={(e) => toggleRule(rule.id, e.target.checked)}
                />
              </td>
              <td>{rule.name}</td>
              <td>{rule.metric}</td>
              <td>p{rule.percentile}</td>
              <td>{formatThreshold(rule.metric, rule.threshold)}</td>
              <td>{rule.windowMinutes} min</td>
              <td>
                <span style={{
                  padding: '2px 8px',
                  borderRadius: 4,
                  background: rule.severity === 'critical' ? '#fecaca' : '#fef3c7',
                  color: rule.severity === 'critical' ? '#dc2626' : '#d97706',
                }}>
                  {rule.severity}
                </span>
              </td>
              <td>
                <button onClick={() => setEditing(rule.id)}>Edit</button>
              </td>
            </tr>
          ))}
        </tbody>
      </table>
    </div>
  );
}

Dashboard Page

// app/dashboard/page.tsx
import { TrendChart } from '@/components/TrendChart';
import { AlertRuleEditor } from '@/components/AlertRuleEditor';

export default async function DashboardPage() {
  const rules = await fetch(`${process.env.API_URL}/api/perf/alerts/rules`).then(r => r.json());

  return (
    <div style={{ maxWidth: 1200, margin: '0 auto', padding: 24 }}>
      <h1>Performance Dashboard</h1>

      <section>
        <h2>Core Web Vitals Trends</h2>
        <div style={{ display: 'grid', gap: 24 }}>
          <TrendChart metric="LCP" threshold={2500} />
          <TrendChart metric="INP" threshold={200} />
          <TrendChart metric="CLS" threshold={0.1} />
        </div>
      </section>

      <section style={{ marginTop: 48 }}>
        <h2>Alerting</h2>
        <AlertRuleEditor rules={rules} />
      </section>
    </div>
  );
}

6. Docker Compose Deployment

One-command deployment for the entire stack.

# docker-compose.yml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24
    ports:
      - "8123:8123"    # HTTP interface
      - "9000:9000"    # Native interface
    volumes:
      - clickhouse-data:/var/lib/clickhouse
      - ./init-db.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      CLICKHOUSE_DB: perf
      CLICKHOUSE_USER: perf
      CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD}
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes

  collector:
    build:
      context: .
      dockerfile: Dockerfile.server
    ports:
      - "3001:3001"
    environment:
      CLICKHOUSE_URL: http://clickhouse:8123
      REDIS_URL: redis://redis:6379
      PORT: 3001
    depends_on:
      - clickhouse
      - redis

  dashboard:
    build:
      context: .
      dockerfile: Dockerfile.dashboard
    ports:
      - "3000:3000"
    environment:
      API_URL: http://collector:3001
      NEXT_PUBLIC_API_URL: http://localhost:3001
    depends_on:
      - collector

  alerter:
    build:
      context: .
      dockerfile: Dockerfile.server
    command: ["node", "dist/alert-runner.js"]
    environment:
      CLICKHOUSE_URL: http://clickhouse:8123
      REDIS_URL: redis://redis:6379
      SLACK_WEBHOOK_URL: ${SLACK_WEBHOOK_URL}
    depends_on:
      - clickhouse
      - redis

volumes:
  clickhouse-data:
  redis-data:
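Note that plain depends_on only waits for containers to start, not for ClickHouse to accept connections. One way to gate startup on readiness is a healthcheck against ClickHouse's /ping endpoint; a sketch to merge into the services above, assuming wget is available in the ClickHouse image:

```yaml
services:
  clickhouse:
    # ...as above, plus:
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8123/ping"]
      interval: 5s
      timeout: 3s
      retries: 12

  collector:
    # ...as above, but wait for ClickHouse to report healthy:
    depends_on:
      clickhouse:
        condition: service_healthy
      redis:
        condition: service_started
```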

Start the Platform

# Create .env with your secrets
cat > .env << 'EOL'
CLICKHOUSE_PASSWORD=your-secure-password
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx/yyy/zzz
EOL

# Start all services
docker compose up -d

# Verify
docker compose ps
# clickhouse   running   0.0.0.0:8123->8123/tcp
# redis        running   0.0.0.0:6379->6379/tcp
# collector    running   0.0.0.0:3001->3001/tcp
# dashboard    running   0.0.0.0:3000->3000/tcp
# alerter      running

Nginx Reverse Proxy (Production)

# /etc/nginx/conf.d/perf-platform.conf
upstream dashboard {
  server 127.0.0.1:3000;
}

upstream collector {
  server 127.0.0.1:3001;
}

server {
  listen 443 ssl;
  server_name perf.internal.example.com;

  ssl_certificate     /etc/ssl/certs/perf.crt;
  ssl_certificate_key /etc/ssl/private/perf.key;

  # Dashboard
  location / {
    proxy_pass http://dashboard;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }

  # Collector API — accessed by browser SDK
  location /api/perf/collect {
    proxy_pass http://collector/collect;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;

    # Rate limiting
    limit_req zone=collector burst=100 nodelay;
  }

  # Query API — accessed by dashboard
  location /api/perf/ {
    proxy_pass http://collector/;
    proxy_set_header Host $host;

    # Internal only
    allow 10.0.0.0/8;
    allow 172.16.0.0/12;
    deny all;
  }
}

# Shared-memory zone referenced by limit_req above; conf.d files are
# included inside http{}, so this sits at the required http level
limit_req_zone $binary_remote_addr zone=collector:10m rate=50r/s;

7. Data Retention and Maintenance

-- Recommended: declarative TTL; ClickHouse drops expired parts
-- automatically in the background. (DROP PARTITION does not accept a
-- WHERE clause, so time-based expiry belongs in a TTL.)
ALTER TABLE perf_metrics MODIFY TTL timestamp + INTERVAL 90 DAY;

-- Optional: force a full merge during off-peak hours (resource-heavy;
-- do not stop merges first, since OPTIMIZE is itself a merge)
OPTIMIZE TABLE perf_metrics FINAL;

# Alternative to TTL: drop the 90-day-old daily partition via cron.
# Partition ids follow the toYYYYMMDD partition key, e.g. '20251101'.
# Crontab entry: 0 3 * * * /usr/local/bin/drop-old-partitions.sh
PARTITION=$(date -d '90 days ago' +%Y%m%d)
clickhouse-client --query "ALTER TABLE perf.perf_metrics DROP PARTITION '$PARTITION'"

Architecture B: ELK Stack

Use Elasticsearch for storage, Kibana for zero-code dashboards, and ElastAlert2 for threshold alerting. If your team already has an ELK cluster, you only need to add the APM Server and configure new indices.

ELK Architecture Overview

┌────────────────────────────────────────────────────────┐
│  Browser                                               │
│  ┌────────────────────┐                                │
│  │  Elastic APM RUM   │── beacon ──┐                   │
│  │  Agent (or custom  │            │                   │
│  │  web-vitals SDK)   │            │                   │
│  └────────────────────┘            │                   │
└────────────────────────────────────┼───────────────────┘

┌────────────────────────────────────────────────────────┐
│  ┌──────────────────┐   ┌──────────────────────────┐   │
│  │  APM Server      │   │  Logstash (optional)     │   │
│  │  (ingest RUM +   │   │  (ingest custom events)  │   │
│  │  server traces)  │   │                          │   │
│  └────────┬─────────┘   └────────────┬─────────────┘   │
│           │                          │                 │
│           ▼                          ▼                 │
│  ┌─────────────────────────────────────────────┐       │
│  │  Elasticsearch                              │       │
│  │  - apm-* indices (RUM + APM data)           │       │
│  │  - perf-metrics-* indices (custom events)   │       │
│  │  - ILM for automatic retention              │       │
│  └────────────────────┬────────────────────────┘       │
│           ┌───────────┴───────────┐                    │
│           ▼                       ▼                    │
│  ┌─────────────────┐   ┌─────────────────────┐         │
│  │  Kibana         │   │  ElastAlert2        │         │
│  │  - Lens charts  │   │  - threshold rules  │         │
│  │  - time picker  │   │  - Slack / WeCom /  │         │
│  │  - APM UI       │   │    DingTalk / email │         │
│  └─────────────────┘   └─────────────────────┘         │
└────────────────────────────────────────────────────────┘

1. Elastic APM RUM Agent

The official Elastic RUM agent automatically captures page loads, Core Web Vitals, user interactions, and JS errors.

// lib/elastic-rum.ts
import { init as initApm } from '@elastic/apm-rum';

const apm = initApm({
  serviceName: 'my-frontend',
  serverUrl: process.env.NEXT_PUBLIC_APM_SERVER_URL!, // e.g. https://apm.internal.example.com
  serviceVersion: process.env.NEXT_PUBLIC_APP_VERSION,
  environment: process.env.NODE_ENV,

  // Core Web Vitals are captured automatically
  // Additional configuration:
  distributedTracingOrigins: ['https://api.example.com'], // Correlate frontend → backend traces
  transactionSampleRate: 1.0,   // 100% of page loads (adjust for high-traffic sites)
});

export default apm;
// components/elastic-rum-provider.tsx — a client component rendered from app/layout.tsx
// (the root layout itself must stay a Server Component, so it can't carry 'use client')
'use client';
import { useEffect } from 'react';

export function ElasticRUMProvider({ children }: { children: React.ReactNode }) {
  useEffect(() => {
    import('../lib/elastic-rum');
  }, []);
  return <>{children}</>;
}

What the RUM agent captures automatically:

  • Page load transactions — navigation timing, resource loading
  • Core Web Vitals — LCP, CLS, INP, FCP, TTFB (as transaction marks)
  • User interactions — click, route change (as transactions)
  • JS errors — uncaught exceptions and promise rejections
  • HTTP requests — XHR and Fetch with timing and correlation IDs

2. Alternative: Custom SDK → Logstash

If you prefer the lightweight web-vitals SDK from Architecture A, route events through Logstash into Elasticsearch.

// Reuse perf-sdk.ts from section 1 unchanged — just have the reverse proxy
// route this path to the Logstash HTTP input (port 5044) instead of the collector
navigator.sendBeacon('https://perf.internal.example.com/api/perf/collect', blob);
# logstash/pipeline/perf.conf
input {
  http {
    port => 5044
    codec => json
  }
}

filter {
  # Parse and enrich
  date {
    match => ["timestamp", "UNIX_MS"]
    target => "@timestamp"
  }

  mutate {
    # Bracket notation creates real nested fields ([perf][metric] → perf.metric in ES);
    # a literal dotted name like "perf.metric" breaks Logstash field references
    rename => { "metric" => "[perf][metric]" }
    rename => { "value"  => "[perf][value]"  }
    rename => { "rating" => "[perf][rating]" }
    rename => { "page"   => "[url][path]"    }
    rename => { "device" => "[user_agent][device]" }
  }

  # Add good/poor buckets for visualization
  if [perf][metric] == "LCP" {
    if [perf][value] <= 2500 { mutate { add_field => { "[perf][bucket]" => "good" } } }
    else if [perf][value] <= 4000 { mutate { add_field => { "[perf][bucket]" => "needs-improvement" } } }
    else { mutate { add_field => { "[perf][bucket]" => "poor" } } }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    user => "elastic"
    password => "${ES_PASSWORD}"
    # Write through the ILM rollover alias so the policy's rollover action
    # actually fires (fixed daily index names would bypass it)
    ilm_enabled => true
    ilm_rollover_alias => "perf-metrics"
    ilm_pattern => "{now/d}-000001"
    ilm_policy => "perf-metrics-ilm"
  }
}
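The Logstash conditional above only buckets LCP. The same classification can instead run client-side before the beacon is sent, covering every Core Web Vital — a sketch in TypeScript using the published good/needs-improvement thresholds (the `bucketFor` helper is illustrative, not part of the SDK from section 1):

```typescript
// Upper bounds for "good" and "needs-improvement" per metric.
// Milliseconds, except CLS which is unitless.
const THRESHOLDS: Record<string, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  FCP: [1800, 3000],
  TTFB: [800, 1800],
};

type Bucket = 'good' | 'needs-improvement' | 'poor';

function bucketFor(metric: string, value: number): Bucket {
  const t = THRESHOLDS[metric];
  if (!t) return 'good'; // unknown metric: don't penalize
  if (value <= t[0]) return 'good';
  if (value <= t[1]) return 'needs-improvement';
  return 'poor';
}

console.log(bucketFor('LCP', 2000)); // good
console.log(bucketFor('LCP', 3100)); // needs-improvement
console.log(bucketFor('INP', 600));  // poor
```

Tagging events at the source keeps the Logstash filter down to the `date` and `rename` stages.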

3. Elasticsearch Index Template

Define mappings and ILM for automatic retention.

// PUT _index_template/perf-metrics
{
  "index_patterns": ["perf-metrics-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "index.lifecycle.name": "perf-metrics-ilm",
      "index.lifecycle.rollover_alias": "perf-metrics"
    },
    "mappings": {
      "properties": {
        "@timestamp":        { "type": "date" },
        "perf.metric":       { "type": "keyword" },
        "perf.value":        { "type": "float" },
        "perf.rating":       { "type": "keyword" },
        "perf.bucket":       { "type": "keyword" },
        "url.path":          { "type": "keyword" },
        "user_agent.device": { "type": "keyword" },
        "connection":        { "type": "keyword" },
        "app_version":       { "type": "keyword" },
        "session_id":        { "type": "keyword" }
      }
    }
  }
}
// PUT _ilm/policy/perf-metrics-ilm
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_size": "10gb"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "forcemerge": { "max_num_segments": 1 },
          "shrink":     { "number_of_shards": 1 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
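One gotcha: the hot phase's rollover only fires when writes go through the `perf-metrics` alias, and that alias must be bootstrapped onto an initial index. A setup sketch with curl, assuming Elasticsearch on localhost:9200 and the two JSON bodies above saved as `perf-metrics-ilm.json` and `perf-metrics-template.json`:

```shell
# Apply the ILM policy and index template (bodies from the JSON above)
curl -u "elastic:$ES_PASSWORD" -X PUT 'http://localhost:9200/_ilm/policy/perf-metrics-ilm' \
  -H 'Content-Type: application/json' -d @perf-metrics-ilm.json

curl -u "elastic:$ES_PASSWORD" -X PUT 'http://localhost:9200/_index_template/perf-metrics' \
  -H 'Content-Type: application/json' -d @perf-metrics-template.json

# Bootstrap the first write index behind the rollover alias
# (URL-encoded form of <perf-metrics-{now/d}-000001>)
curl -u "elastic:$ES_PASSWORD" -X PUT \
  'http://localhost:9200/%3Cperf-metrics-%7Bnow%2Fd%7D-000001%3E' \
  -H 'Content-Type: application/json' \
  -d '{"aliases": {"perf-metrics": {"is_write_index": true}}}'
```

After this, writers (Logstash or the collector) index into `perf-metrics`, and ILM creates `-000002`, `-000003`, … as rollovers occur.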

4. Kibana Dashboards (Zero Code)

Kibana provides time-range trend charts, device breakdowns, and percentile visualizations without writing any code.

Creating a Web Vitals Dashboard in Kibana:

  1. Go to Kibana → Analytics → Dashboard → Create

  2. LCP Trend (Line chart)

    • Visualization type: Lens → Line
    • Index: perf-metrics-*
    • X-axis: @timestamp (date histogram, auto interval)
    • Y-axis: Percentile of perf.value → p50, p75, p95
    • Filter: perf.metric: LCP
    • Add Reference line at y=2500 (threshold)
  3. Good / Needs Improvement / Poor (Donut chart)

    • Visualization type: Lens → Donut
    • Slice by: perf.bucket
    • Filter: perf.metric: LCP
  4. Metrics by Page (Table)

    • Visualization type: Lens → Table
    • Rows: url.path
    • Metrics: Percentile 75 of perf.value, Count
    • Filter by metric using dashboard-level filter control
  5. Device Breakdown (Bar chart)

    • X-axis: user_agent.device
    • Y-axis: Percentile 75 of perf.value
    • Split series by: perf.metric

The Kibana time picker (top right) works automatically — users can switch between last 1 hour, 24 hours, 7 days, 30 days, or any custom range. All panels update together.
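For debugging, or for embedding the same numbers outside Kibana, the LCP p75 trend panel corresponds to this raw Elasticsearch aggregation (field names assume the Logstash pipeline from section 2; paste into Kibana Dev Tools):

```json
POST perf-metrics-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term":  { "perf.metric": "LCP" } },
        { "range": { "@timestamp": { "gte": "now-24h" } } }
      ]
    }
  },
  "aggs": {
    "trend": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "10m" },
      "aggs": {
        "lcp": { "percentiles": { "field": "perf.value", "percents": [50, 75, 95] } }
      }
    }
  }
}
```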

If using Elastic APM RUM Agent, Kibana's built-in APM UI (Kibana → Observability → APM) provides:

  • Service overview with throughput and latency
  • Transaction details with waterfall view
  • Core Web Vitals dashboard (pre-built)
  • Error tracking with stack traces
  • Service map showing frontend → backend dependencies

5. ElastAlert2 for Threshold Alerting

ElastAlert2 is an open-source alerting framework that queries Elasticsearch on a schedule and fires alerts based on rules.

# elastalert/config.yaml
rules_folder: /opt/elastalert/rules
run_every:
  minutes: 1
buffer_time:
  minutes: 60
es_host: elasticsearch
es_port: 9200
es_username: elastic
es_password: ${ES_PASSWORD}
writeback_index: elastalert_status
# elastalert/rules/lcp-warning.yaml
name: "LCP p75 exceeds 2.5s"
type: metric_aggregation
index: perf-metrics-*

# Query filter
filter:
  - term:
      perf.metric: LCP

metric_agg_key: perf.value
metric_agg_type: percentiles
percentile_range: 75
max_threshold: 2500
min_doc_count: 30

# Aggregation window — metric_aggregation uses buffer_time, not timeframe
buffer_time:
  minutes: 60

# Cooldown — don't re-alert for 2 hours
realert:
  hours: 2

# Slack notification
alert:
  - slack
slack_webhook_url: "${SLACK_WEBHOOK_URL}"
slack_channel_override: "#perf-alerts"
slack_username_override: "PerfBot"
slack_msg_color: "warning"
alert_subject: "LCP p75 exceeded 2.5s threshold"
alert_text: |
  *LCP p75 exceeded threshold*
  Current p75: {0[75.0]} ms
  Threshold: 2500 ms
  Window: last 60 minutes
  Time: {match[@timestamp]}
alert_text_type: alert_text_only
# elastalert/rules/lcp-critical.yaml
name: "LCP p75 exceeds 4s (critical)"
type: metric_aggregation
index: perf-metrics-*

filter:
  - term:
      perf.metric: LCP

metric_agg_key: perf.value
metric_agg_type: percentiles
percentile_range: 75
max_threshold: 4000
min_doc_count: 30

buffer_time:
  minutes: 15

realert:
  minutes: 30

alert:
  - slack
slack_webhook_url: "${SLACK_WEBHOOK_URL}"
slack_channel_override: "#perf-alerts"
slack_msg_color: "danger"
alert_subject: "CRITICAL: LCP p75 exceeded 4s"
# elastalert/rules/inp-warning.yaml
name: "INP p75 exceeds 200ms"
type: metric_aggregation
index: perf-metrics-*

filter:
  - term:
      perf.metric: INP

metric_agg_key: perf.value
metric_agg_type: percentiles
percentile_range: 75
max_threshold: 200
min_doc_count: 30

buffer_time:
  minutes: 60

realert:
  hours: 2

alert:
  - slack
slack_webhook_url: "${SLACK_WEBHOOK_URL}"
slack_channel_override: "#perf-alerts"
slack_msg_color: "warning"
alert_subject: "INP p75 exceeded 200ms threshold"

For WeCom / DingTalk, use the post alert type:

# elastalert/rules/lcp-wecom.yaml
name: "LCP alert (WeCom)"
# ... same metric_aggregation config as above ...

alert:
  - post
http_post_url: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_KEY"
http_post_headers:
  Content-Type: "application/json"
http_post_payload:
  msgtype: "markdown"
  markdown:
    content: >
      **[WARNING] LCP p75 exceeded threshold**
      > Current: {0[75.0]} ms | Threshold: 2500 ms
      > Window: 60 min | Time: {match[@timestamp]}

6. ELK Docker Compose

# docker-compose-elk.yml
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=${ES_PASSWORD}
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    ports:
      - "9200:9200"
    volumes:
      - es-data:/usr/share/elasticsearch/data
    ulimits:
      memlock:
        soft: -1
        hard: -1

  kibana:
    image: docker.elastic.co/kibana/kibana:8.17.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  apm-server:
    image: docker.elastic.co/apm/apm-server:8.17.0
    # Beats-style settings are passed as -E flags, not environment variables
    command: >
      apm-server -e
        -E output.elasticsearch.hosts=["http://elasticsearch:9200"]
        -E output.elasticsearch.username=elastic
        -E output.elasticsearch.password=${ES_PASSWORD}
        -E apm-server.rum.enabled=true
        -E apm-server.rum.allow_origins=['*']
        -E apm-server.rum.allow_headers=["Content-Type"]
    ports:
      - "8200:8200"
    depends_on:
      - elasticsearch

  # Optional: only needed if using custom SDK instead of Elastic APM RUM
  logstash:
    image: docker.elastic.co/logstash/logstash:8.17.0
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    environment:
      - ES_PASSWORD=${ES_PASSWORD}
    ports:
      - "5044:5044"
    depends_on:
      - elasticsearch

  elastalert:
    image: jertel/elastalert2:latest
    volumes:
      - ./elastalert/config.yaml:/opt/elastalert/config.yaml
      - ./elastalert/rules:/opt/elastalert/rules
    environment:
      - ES_PASSWORD=${ES_PASSWORD}
      - SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL}
    depends_on:
      - elasticsearch

volumes:
  es-data:
# Start the ELK stack
cat > .env << 'EOL'
ES_PASSWORD=your-secure-password
KIBANA_PASSWORD=your-kibana-password
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx/yyy/zzz
EOL

docker compose -f docker-compose-elk.yml up -d

# Set the kibana_system password once Elasticsearch is up —
# Kibana cannot authenticate until it matches KIBANA_PASSWORD
docker compose -f docker-compose-elk.yml exec elasticsearch \
  bin/elasticsearch-reset-password -u kibana_system -i

# Verify
# Elasticsearch: http://localhost:9200
# Kibana:        http://localhost:5601
# APM Server:    http://localhost:8200

7. ELK Data Retention

ILM (Index Lifecycle Management) handles retention automatically — configured in the index template above. No cron jobs needed.

Index Lifecycle (min_age counts from rollover, not index creation):
hot    → rollover at 7 d or 10 GB per primary shard
warm   → 30 d after rollover: force merge, shrink
delete → 90 d after rollover: index deleted

To adjust retention, update the ILM policy in Kibana: Kibana → Stack Management → Index Lifecycle Policies → perf-metrics-ilm
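To confirm the policy is actually being applied, ask Elasticsearch which lifecycle step each index is in (a quick check, assuming the cluster from the compose file above):

```shell
# Shows phase, action, and step per index — look for "phase": "hot"
# on the current write index and "warm"/"delete" on older ones
curl -s -u "elastic:$ES_PASSWORD" \
  'http://localhost:9200/perf-metrics-*/_ilm/explain?pretty'
```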


Best Practices

Self-Hosted Platform Guidelines

  1. Start simple — both ClickHouse and a single-node ELK can handle millions of events per day
  2. Batch writes — buffer events and flush periodically; never write one row at a time
  3. Pre-aggregate — ClickHouse materialized views or ES transforms speed up dashboard queries
  4. Set data retention — ClickHouse partition drops or ES ILM; 90 days is a good default
  5. Rate-limit the collector — protect against SDK bugs or traffic spikes flooding your storage
  6. Cooldown on alerts — avoid alert fatigue by enforcing minimum gaps between repeated notifications
  7. Require minimum samples — skip alerting when data volume is too low to be statistically meaningful
  8. Secure the platform — put the dashboard and query API behind your internal network or VPN
  9. Choose based on your team — reuse existing ELK if available; otherwise ClickHouse is leaner
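Guideline 2 (batch writes) applies on the client as well as in the collector. A minimal sketch of the idea in TypeScript — the `EventBuffer` class is illustrative, not part of any SDK above; in a browser you would also flush on `visibilitychange`:

```typescript
// Queue events and hand them to a flush callback in batches,
// instead of one network call per event.
class EventBuffer<T> {
  private queue: T[] = [];

  constructor(
    private flushFn: (batch: T[]) => void,
    private maxSize = 20,
  ) {}

  push(event: T) {
    this.queue.push(event);
    if (this.queue.length >= this.maxSize) this.flush();
  }

  flush() {
    if (this.queue.length === 0) return;
    const batch = this.queue.splice(0, this.queue.length); // drain the queue
    this.flushFn(batch);
  }
}

// Usage: batches of 3 for the demo; the last partial batch goes out on flush()
const sent: number[][] = [];
const buf = new EventBuffer<number>((b) => sent.push(b), 3);
[1, 2, 3, 4].forEach((e) => buf.push(e));
buf.flush();
console.log(sent); // [[1, 2, 3], [4]]
```

The same shape works server-side in the collector: accumulate rows, then issue one bulk insert per batch.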
