jpq

jq-style JSON filtering with Python expressions.

2026-05-13

jpq is a tiny CLI I wrote because I have jq reflexes but a Python brain. JSON comes in on stdin, gets bound to `this`, an expression runs against it, and the result goes back out as JSON. No DSL to memorise, just Python.

Last updated 2026-05-15, against jpq v0.1.5.

Note
Output is now syntax-highlighted automatically when stdout is a TTY — no flag needed. Pass --no-color or set NO_COLOR=1 to opt out; piping or redirecting still produces plain JSON, so downstream tools won’t see ANSI escape codes.
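
For the curious, the decision described in the note can be sketched in a few lines of Python. This is an illustration of the usual TTY-plus-`NO_COLOR` logic, not jpq's actual source; the function name `should_colorize` and its parameters are made up for the example.

```python
import os
import sys


def should_colorize(stream=None, environ=None, no_color_flag=False):
    """Return True only for an interactive terminal with no opt-out.

    Hypothetical helper: jpq's real check may differ in detail.
    """
    stream = sys.stdout if stream is None else stream
    environ = os.environ if environ is None else environ
    if no_color_flag:            # stands in for the --no-color flag
        return False
    if environ.get("NO_COLOR"):  # any non-empty value opts out
        return False
    return stream.isatty()       # pipes and redirects are not TTYs
```

Because a pipe or a redirected file reports `isatty()` as false, downstream tools get plain JSON without any extra flag.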

Install

uv tool install jpq

pipx and pip also work — see the README.

The model

Three things to remember, and then you’re done:

1. Stdin JSON is parsed and bound to `this`.
2. The expression you pass is eval’d with `re`, `os`, `collections`, `itertools`, `statistics`, `math`, `datetime`, `pathlib` pre-imported, plus every builtin, plus an `env("NAME")` helper for env vars.
3. The result is dumped back to stdout as JSON. `set`, `datetime`, and `pathlib.Path` are coerced automatically; `-c` / `--compact` strips the indentation when you want to pipe further.

That’s it. Anything you can write as a single Python expression, you can pass to jpq.
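
To make the three steps concrete, here is a rough sketch of the same model in plain Python. It is not jpq's source: `run_expression` and `_coerce` are hypothetical names, and the exact coercions (sets to sorted lists, dates to ISO strings, paths to strings) and the `env` behaviour for missing variables are assumptions.

```python
import collections
import datetime
import itertools
import json
import math
import os
import pathlib
import re
import statistics


def _coerce(obj):
    """Fallback for values json cannot encode natively (assumed rules)."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj, key=repr)  # deterministic order for the sketch
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    if isinstance(obj, pathlib.PurePath):
        return str(obj)
    raise TypeError(f"not JSON-serialisable: {type(obj).__name__}")


def run_expression(expression, stdin_text, compact=False):
    """Parse JSON, bind it to `this`, eval the expression, dump JSON."""
    scope = {
        "this": json.loads(stdin_text),
        "env": os.environ.get,  # assumption: missing variables give None
        "re": re, "os": os, "collections": collections,
        "itertools": itertools, "statistics": statistics, "math": math,
        "datetime": datetime, "pathlib": pathlib,
    }
    # Builtins are added to the globals dict by eval itself.
    result = eval(expression, scope)
    if compact:
        return json.dumps(result, default=_coerce, separators=(",", ":"))
    return json.dumps(result, default=_coerce, indent=2)
```

Passing everything as a single globals dict matters: names used inside comprehensions and generator expressions are looked up there, so `this` stays visible in nested loops.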

A demo dataset

The examples below run against a small fake log file — 18 events across four days. Stick the curl behind an alias so the commands stay readable:

alias events='curl -s https://gist.githubusercontent.com/rayannott/8c29315b01c55903c25c0a4337a2e041/raw/d0ccb1a979c889c968fa014c14124f92ae2d245f/logs.json'

events | jpq 'len(this["events"])'
# 18

The schema, as a 3-event preview (one auth, one build, one deploy):

{
  "fetched_at": "2026-05-13T14:00:00+00:00",
  "events": [
    {"id": "evt_001", "ts": "2026-05-10T08:14:22+00:00", "level": "INFO",  "category": "auth",   "score": 0.91, "message": "user alice logged in from 10.0.0.42",             "tags": ["mfa", "prod"]},
    {"id": "evt_013", "ts": "2026-05-12T14:20:00+00:00", "level": "ERROR", "category": "build",  "score": 0.20, "message": "build #1239 failed after 420s on agent 10.0.0.7", "tags": ["nightly", "dev"]},
    {"id": "evt_015", "ts": "2026-05-13T06:00:00+00:00", "level": "INFO",  "category": "deploy", "score": 0.93, "message": "deploy v0.4.3 to staging by dave",                "tags": ["dev"]}
  ]
}

Examples

1. Count the events — builtins

events | jpq 'len(this["events"])'

2. Level distribution — `collections.Counter`

events | jpq 'collections.Counter(e["level"] for e in this["events"])'

{
  "INFO": 12,
  "WARN": 4,
  "ERROR": 2
}

3. Tag frequency — `Counter` over a flattened generator

events | jpq 'collections.Counter(t for e in this["events"] for t in e["tags"])'

{
  "mfa": 7,
  "prod": 8,
  "nightly": 3,
  "flaky": 2,
  "dev": 10,
  "pr": 4,
  "canary": 1,
  "new-user": 1,
  "rollback": 1
}
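
The keys above come out in first-seen order, not by frequency, because a `Counter` is a dict subclass and keeps insertion order. If you want the most frequent tags first, `most_common()` does the sorting. A small plain-Python illustration on made-up tags (not the demo dataset):

```python
import collections

# Made-up tag stream for illustration only.
tags = ["mfa", "prod", "dev", "prod", "dev", "dev"]
counts = collections.Counter(tags)

# Plain dict view: keys stay in the order they were first seen.
first_seen = dict(counts)

# most_common() returns (key, count) pairs sorted by count, descending.
by_frequency = dict(counts.most_common())

print(list(first_seen))    # ['mfa', 'prod', 'dev']
print(list(by_frequency))  # ['dev', 'prod', 'mfa']
```

Since this is an ordinary expression, wrapping the example in `dict(collections.Counter(...).most_common())` should carry over to jpq unchanged.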

4. Mean score per category — `itertools.groupby` + `statistics`

events | jpq '{k: round(statistics.mean(e["score"] for e in g), 3)
 for k, g in itertools.groupby(
     sorted(this["events"], key=lambda e: e["category"]),
     key=lambda e: e["category"])}'

{
  "auth": 0.604,
  "build": 0.649,
  "deploy": 0.775
}

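Note: `itertools.groupby` only groups consecutive equal keys. Skip the `sorted()` call and a category that reappears later starts a fresh group, which in a dict comprehension silently overwrites the earlier one.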
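
The consecutive-keys behaviour is easy to see on a three-event toy list (made-up data, not the demo dataset):

```python
import itertools

# Toy events: "auth" appears twice, but not next to each other.
toy_events = [
    {"category": "auth", "score": 0.9},
    {"category": "build", "score": 0.2},
    {"category": "auth", "score": 0.5},
]


def category(event):
    return event["category"]


# Without sorting: groupby yields auth, build, auth as three groups,
# and the second "auth" group overwrites the first in the dict.
unsorted_groups = {
    k: [e["score"] for e in g]
    for k, g in itertools.groupby(toy_events, key=category)
}

# With sorting: equal keys are adjacent, so each key gets one group.
sorted_groups = {
    k: [e["score"] for e in g]
    for k, g in itertools.groupby(sorted(toy_events, key=category), key=category)
}

print(unsorted_groups)  # {'auth': [0.5], 'build': [0.2]}
print(sorted_groups)    # {'auth': [0.9, 0.5], 'build': [0.2]}
```

No error is raised in the unsorted case, which is what makes the mistake easy to miss.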

5. Daily buckets — `groupby` keyed by `datetime.date`

events | jpq '{str(d): [e["id"] for e in g]
 for d, g in itertools.groupby(
     sorted(this["events"], key=lambda e: e["ts"]),
     key=lambda e: datetime.datetime.fromisoformat(e["ts"]).date())}'

{
  "2026-05-10": ["evt_001", "evt_002", "evt_003", "evt_004", "evt_005"],
  "2026-05-11": ["evt_006", "evt_007", "evt_008", "evt_009"],
  "2026-05-12": ["evt_010", "evt_011", "evt_012", "evt_013", "evt_014"],
  "2026-05-13": ["evt_015", "evt_016", "evt_017", "evt_018"]
}

6. Parse build messages — `re.search` with named groups

Free-text messages become structured records.

events | jpq '[re.search(r"build #(?P<num>\d+).*?(?P<dur>\d+)s", e["message"]).groupdict()
 for e in this["events"]
 if e["category"] == "build" and e["level"] != "INFO"]'

[
  {"num": "1234", "dur": "312"},
  {"num": "1237", "dur": "510"},
  {"num": "1239", "dur": "420"}
]
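
One thing to watch with this pattern: `re.search` returns `None` when a message does not match, and `None.groupdict()` raises `AttributeError`. The filter above avoids that for the demo data; for messier logs, a guarded variant using an assignment expression (Python 3.8+) skips non-matching messages instead. A plain-Python sketch, where the first two messages come from the schema preview and the third is hypothetical:

```python
import re

BUILD_RE = re.compile(r"build #(?P<num>\d+).*?(?P<dur>\d+)s")

messages = [
    "build #1239 failed after 420s on agent 10.0.0.7",  # from the preview
    "user alice logged in from 10.0.0.42",              # from the preview
    "build #1240 queued",                               # hypothetical, no duration
]

# Keep only messages where the pattern actually matched.
records = [m.groupdict() for msg in messages if (m := BUILD_RE.search(msg))]

print(records)  # [{'num': '1239', 'dur': '420'}]
```

The same comprehension shape should work as a jpq expression, assuming jpq runs on a Python version that supports `:=`.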

7. IP-bearing events, partitioned by `level`

Filter to events whose message contains an IP, then bucket by whether they’re INFO or not:

events | jpq '{k: [e["id"] for e in this["events"]
     if re.search(r"\d+\.\d+\.\d+\.\d+", e["message"])
     and ("info" if e["level"] == "INFO" else "non-info") == k]
 for k in ("info", "non-info")}'

{
  "info": ["evt_001", "evt_007", "evt_010", "evt_012", "evt_018"],
  "non-info": ["evt_003", "evt_013", "evt_014"]
}

8. Score summary — `statistics` + `math`

events | jpq '{"avg": round(statistics.mean(e["score"] for e in this["events"]), 3),
 "stdev": round(statistics.stdev(e["score"] for e in this["events"]), 3),
 "log2_n": round(math.log2(len(this["events"])), 3)}'

{
  "avg": 0.659,
  "stdev": 0.285,
  "log2_n": 4.17
}

9. Pipe `jpq` into `jpq` to keep each stage small

jpq’s output is JSON, which makes it valid jpq input. Splitting a heavy transformation across two pipes keeps each stage trivially debuggable (run the first one alone and eyeball the result) and lets you reuse the intermediate shape.

Compare aggregating build durations as one expression versus as two pipes. Stage 1 parses the messages into structured records; stage 2 aggregates over them:

events \
  | jpq '[re.search(r"build #(?P<num>\d+).*?(?P<dur>\d+)s", e["message"]).groupdict()
          for e in this["events"] if e["category"] == "build"]' \
  | jpq '{"mean_dur_s": round(statistics.mean(int(r["dur"]) for r in this), 1),
          "max_dur_s": max(int(r["dur"]) for r in this),
          "stdev_dur_s": round(statistics.stdev(int(r["dur"]) for r in this), 1),
          "n_builds": len(this)}'

{
  "mean_dur_s": 334.3,
  "max_dur_s": 510,
  "stdev_dur_s": 95.4,
  "n_builds": 7
}

Drop the second | jpq ... and you see what stage 1 produced — a clean list of {"num": ..., "dur": ...} records — which is also the answer to “why is my final number wrong?”. Try writing the same logic as a single nested expression and you’ll appreciate the pipe.

10. JSONL on stdin — slurp with `jq -s`

jpq expects a single JSON value on stdin, not JSONL (one JSON object per line). I deliberately didn’t add a flag for this: jq -s already turns a stream of values into a list, and there’s no reason to reinvent it.

cat events.jsonl | jq -s '.' | jpq 'collections.Counter(e["level"] for e in this)'

Same expression as example 2, just with a different input shape. The jq -s step is the only thing that changes.
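
If jq is not installed, the slurp step is small enough to approximate in Python. The sketch below handles the common one-object-per-line case only; `jq -s` is more general and also accepts values that span several lines. The helper name `slurp_jsonl` and the sample lines are made up for illustration.

```python
import json


def slurp_jsonl(text):
    """Turn JSONL text (one JSON value per line) into a list of values."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical two-line JSONL input with a blank line in between.
sample = '{"level": "INFO"}\n\n{"level": "ERROR"}\n'
events = slurp_jsonl(sample)

print(json.dumps(events))  # [{"level": "INFO"}, {"level": "ERROR"}]
```

The printed list is a single JSON value, which is the input shape jpq expects.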

Tips

-c / --compact strips indentation. Useful when piping into another tool that wants single-line JSON.
Exit codes mean something: 3 on bad stdin, 4 on an expression that fails to parse or raises at runtime, 5 on a non-JSON-serialisable result. Scripts can branch on these.
Colourisation is on by default when stdout is a TTY (see the note at the top). jpq | jq still works if you prefer jq’s palette — the two are friends, not rivals.
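
As an illustration of branching on those exit codes, here is a small Python wrapper. The meanings of 3, 4, and 5 come from the list above; treating 0 as success and placing `-c` before the expression are assumptions, and `run_jpq` / `explain_exit` are hypothetical helper names.

```python
import subprocess

# 3, 4, 5 as documented above; 0 is assumed to mean success.
EXIT_MEANINGS = {
    0: "ok",
    3: "bad stdin",
    4: "expression failed to parse or raised at runtime",
    5: "result is not JSON-serialisable",
}


def explain_exit(code):
    """Map a jpq exit code to a short human-readable reason."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_jpq(expression, stdin_text):
    """Run jpq on the given input; requires jpq on PATH.

    Returns (exit_code, stdout). The argument order is an assumption.
    """
    proc = subprocess.run(
        ["jpq", "-c", expression],
        input=stdin_text,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout
```

A caller can then do `code, out = run_jpq('len(this)', '[1, 2, 3]')` and log `explain_exit(code)` when `code` is non-zero.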

Tip
If a one-liner gets long, break it across multiple lines inside the same quoted string. Python ignores line breaks and indentation inside parentheses, brackets, and braces (wrap the expression in parentheses if it has none), and your shell will keep the quote open until you close it.
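
A quick way to check where line breaks are safe, using plain Python's own compiler. This assumes jpq hands the string to Python unchanged, which is not confirmed here; the expressions and the `[1, 2, 3]` input are made up.

```python
# Line breaks inside braces are fine: Python joins the lines implicitly.
bracketed = """{
    "n": len(this),
    "total": sum(this)
}"""
result = eval(bracketed, {"this": [1, 2, 3]})

# A bare expression split across lines is a syntax error...
bare = "len(this) +\nsum(this)"
try:
    compile(bare, "<expr>", "eval")
    bare_compiles = True
except SyntaxError:
    bare_compiles = False

# ...but wrapping it in parentheses makes the same text valid.
wrapped_result = eval("(" + bare + ")", {"this": [1, 2, 3]})

print(result)          # {'n': 3, 'total': 6}
print(bare_compiles)   # False
print(wrapped_result)  # 9
```

Every multi-line example in this post starts with `{` or `[`, which is why they all split cleanly.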
