Managing path cleaning rules
Path cleaning rules normalize $pathname and $entry_pathname so that pages
sharing the same template (/users/123/profile, /users/456/profile, …) collapse
into one row (/users/<id>/profile) in Web analytics tiles, Paths insights, and
any HogQL query that calls apply_path_cleaning. They are the right answer when
a breakdown is fragmented across thousands of near-identical URLs.
This skill teaches you how to:
- recognize when path cleaning is the right tool
- inspect real paths to find what needs cleaning
- write
regex+aliasrules in re2 syntax with the project's placeholder convention - test rules before saving them
- order rules so specific patterns aren't swallowed by generic ones
- apply the rules via MCP
Data model
Team.path_cleaning_filters is a JSON list of PathCleaningFilter objects:
{
"regex": "/users/\\d+/profile",
"alias": "/users/<id>/profile",
"order": 0
}
regex— a re2 pattern. No need to escape/. Anchor with^/$when you mean it.alias— the literal replacement. Use angle-bracket placeholders (<id>,<slug>,<uuid>,<date>) by convention so the cleaned path stays human-readable. The alias is not a regex template — backreferences are not supported.order— integer. Rules apply sequentially inorderascending, each rule's output feeds the next.
Application is replaceRegexpAll(pathname, regex, alias) per rule, chained.
Source: posthog/hogql/property.py:613.
Workflow
1. Confirm path cleaning is the right move
Ask yourself: is the user complaining about cardinality (too many distinct paths
in a chart), or do they want a per-URL drill-down? Path cleaning is for the
former. If they want per-URL data, suggest a property filter on $pathname
instead.
2. Inspect the real paths
Don't guess at patterns — query them. With the execute-sql MCP tool:
SELECT properties.$pathname AS path, count() AS views
FROM events
WHERE event = '$pageview'
AND timestamp > now() - INTERVAL 7 DAY
GROUP BY path
ORDER BY views DESC
LIMIT 200
Scan the result for:
- numeric IDs:
/users/123,/orders/4242 - UUIDs:
/sessions/8f3c1a3b-… - slugs:
/posts/why-i-love-posthog - dates:
/archive/2024-09-12 - locales:
/en-US/,/fr-FR/ - pagination:
?page=3,/page/3/
3. Draft regex + alias
| Pattern | Example match | regex |
alias |
|---|---|---|---|
| Numeric segment | /users/123/profile |
/users/\d+/profile |
/users/<id>/profile |
| UUID v4 | /sessions/8f3c1a3b-… |
/sessions/[0-9a-f-]{36} |
/sessions/<uuid> |
| Slug | /posts/why-posthog |
/posts/[a-z0-9-]+$ |
/posts/<slug> |
| ISO date | /archive/2024-09-12 |
/archive/\d{4}-\d{2}-\d{2} |
/archive/<date> |
| Locale prefix | /en-US/about |
^/[a-z]{2}-[A-Z]{2}/ |
/<locale>/ |
| Trailing query/page | /blog?page=3 |
\?page=\d+$ |
(empty alias drops it) |
Anchoring rules of thumb:
- start the regex with
^only when the segment must be at the beginning of the path - end with
$to keep a generic rule (e.g.\d+$) from matching mid-path segments
4. Test before saving
Three options, pick one:
Settings page tester:
/settings/project#path_cleaninghas a built-in "test path" input that replays the full ordered chain.Project HogQL (via
execute-sql):SELECT replaceRegexpAll('/users/42/profile', '/users/\d+/profile', '/users/<id>/profile')Chain
replaceRegexpAllcalls in the same order the rules will run if you want to verify multi-rule interaction.Built-in AI helper: there is already an
AiRegexHelpermodal accessible from the rule editor (Help me with Regexbutton) that turns natural language into a regex. Suggest it to the user when they say "I don't know regex" — but always validate the output against real paths via the tester.
5. Order rules from most-specific to most-general
Sequential application means a generic rule placed first will swallow everything that should have hit a specific rule.
order=0 /users/me/profile → /users/me/profile (specific, runs first)
order=1 /users/\d+/profile → /users/<id>/profile
order=2 /users/[a-z0-9-]+ → /users/<slug> (catch-all, runs last)
If /users/[a-z0-9-]+ ran first it would also match /users/me/profile and
make the more specific rule unreachable.
6. Apply via MCP
Use the project-settings-update tool with the full list (the field is
replaced, not merged):
{
"path_cleaning_filters": [
{ "regex": "/users/me/profile", "alias": "/users/me/profile", "order": 0 },
{ "regex": "/users/\\d+/profile", "alias": "/users/<id>/profile", "order": 1 },
{ "regex": "/users/[a-z0-9-]+", "alias": "/users/<slug>", "order": 2 }
]
}
Always read the existing rules first (project settings include
path_cleaning_filters) and merge — overwriting silently destroys whatever the
team has already configured.
Where the rules apply
When the user (or a HogQL query) opts in:
- Web analytics: the Path cleaning toggle in the page header
(
PathCleaningToggle.tsx) - Paths insights: the path cleaning toggle in the insight filters
- HogQL: any query that calls
apply_path_cleaning(path_expr, team)
The rules are stored once per project — they are not insight-scoped.
Common pitfalls
- Backreferences in
aliasneed double-escaping — ClickHouse'sreplaceRegexpAllsupports\0(whole match) and\1–\9(capture groups). In a JSON field or SQL string literal the backslash must be doubled, so use\\1inpath_cleaning_filters/ HogQL to get the\1backreference at the ClickHouse layer. - Forgetting
$—\d+without an end anchor matches every numeric run in any path, so/blog/2024-09-12/postbecomes/blog/<num>-<num>-<num>/postwhen you only meant to match the year segment. Use\d+$or\d+(/|$)depending on intent. - Escaping
/— re2 does not require it.\/works but adds noise. - Case sensitivity — re2 is case-sensitive by default. Use
(?i)at the start of the pattern for case-insensitive matching, e.g.(?i)/users/\d+. - Replacing the whole list —
path_cleaning_filtersis overwrite, not append. Always start from the current list. - Rules apply globally — adding a rule can change historical numbers in every Web analytics / Paths chart that has cleaning enabled. Warn the user before applying anything destructive.