Project 02 · Phase A · Data fluency · Hardware: Laptop
Loops: COLLECT · CURATE

11 — MCAP & ROS 2 Plumbing

A small, focused project that fills the data plumbing gap in the rest of this portfolio: producing, reading, slicing, querying, and visualizing the actual log formats that modern AV/robotics teams ship — MCAP and ROS 2 bag2 — and building a scenario-tag mining CLI that is the kernel of what Applied Intuition's Logfile Studio does at industrial scale.


Goal

By the end of this project you will be able to:

  1. Produce a multi-topic MCAP file from scratch using Foxglove well-known schemas (/camera, /lidar, /imu, /tf).
  2. Read an MCAP file efficiently — stream messages, decode by schema, and time-align topics.
  3. Round-trip a ROS 2 bag2 to MCAP and back via ros2 bag convert (optional Linux path).
  4. Inspect logs visually in Foxglove Studio (drag-and-drop, no cloud needed).
  5. Mine a corpus of N MCAP files for scenario tags via a CLI tool: e.g. "find every clip where the ego stopped at a four-way stop in rain at night."
  6. Map MCAP topics + schemas to ASAM OpenLABEL annotation slots — the bridge from raw logs to labels.

This is the smallest end-to-end loop that exercises the layer Applied Intuition's Data Intelligence team works on every day.


Loops touched

This project explicitly exercises two of the five canonical data-engine loops:

  • COLLECT. We synthesize multi-topic MCAP files with realistic schemas, validate they index correctly, and verify they round-trip through ROS 2 bag2. This is the same shape as ingesting fleet logs into a centralized store. The synthetic generator simulates a 60-second drive with /camera, /lidar, /imu, and /tf channels; the per-clip scenario metadata is embedded as MCAP Metadata records (sketched in code after this list) — the same mechanism a real fleet uploader uses to attach vehicle_id, firmware_version, route_id, and similar provenance keys.
  • CURATE. The tag_mining.py CLI scans a corpus, applies rule-based + small-model-based filters, and emits a CSV of hits keyed by (file, start_ns, end_ns, tags). This is curation: turning a haystack into a search index. The CLI fuses two sources of tags — channel-derived metadata (cheap, deterministic) and detector-derived metadata (a function evaluated over real messages) — and preserves provenance per tag so a reviewer can debug why any given clip was flagged.
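
A minimal sketch of that Metadata mechanism, using the low-level mcap Python writer; the key names here are illustrative, not a fixed schema:

from mcap.writer import Writer

with open("data/corpus/clip_0001.mcap", "wb") as f:
    writer = Writer(f)
    writer.start()
    # Illustrative provenance keys; a real fleet uploader would attach
    # vehicle_id, firmware_version, route_id, etc. the same way.
    writer.add_metadata("scenario", {
        "weather": "rain",
        "time_of_day": "night",
        "intersection_type": "four_way_stop",
    })
    # ... register schemas/channels and write messages here ...
    writer.finish()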

The other three loops (LABEL, TRAIN, EVAL) are downstream consumers of the curated index this project produces. In particular, the OpenLABEL JSON we emit in step 10 of the notebook is the artifact a labeling tool would consume: scenario tags become frame_intervals, sensors become streams, and ego/world frames become coordinate_systems. That is the seam between log and label, and Logfile Studio sits squarely on it.


Why this matters for AI Data Intelligence

Most "AV foundation model" or "scenario mining" demos assume the data is already in pandas or parquet. It isn't. Real autonomy logs live in:

  • MCAP — the de facto serialization-agnostic container for modern robotics. Foxglove created it, ROS 2 Iron made it the default bag format, and every serious AV company stores logs in MCAP or something MCAP-shaped (compressed, indexed, multi-topic, schema-self-describing).
  • ROS 2 bag2 — the source format for nearly every robot log on Earth that goes through ROS. Even teams that re-encode to MCAP usually ingest from bag2 first.

Applied Intuition's Logfile Studio is fundamentally a tool for reading + slicing + searching this layer. Without fluency here, every higher-level pitch about "data intelligence" is hand-waving. You cannot mine scenarios you cannot decode.

This project closes that gap. It is the prerequisite skill stack for the role.

A more concrete way to see the gap: the other ten projects in this portfolio touch FiftyOne, SAM 2, CARLA, Cosmos, OpenVLA, LIBERO, BDD100K, BEVFormer, Bench2Drive, and Gaussian Splatting. None of them open an MCAP file. Every one of them assumes the data is already a folder of PNGs, a pandas.DataFrame, or a webdataset shard. That assumption is the output of the layer we are building here, not the input. If a data-intelligence role is graded on "can you reason about logs end-to-end," the missing link is exactly the link this project provides.

A second way to frame it: most candidates for the role will come in fluent in one layer — either model training or labeling tooling. Fluency in the format and indexing layer between them is rarer, harder to bluff, and most directly aligned with the actual day-to-day work. That is the leverage point.


Prerequisites

  • Python 3.10+
  • Familiarity with argparse / click, basic protobuf concepts, and one of pandas/polars.
  • Optional: ROS 2 Jazzy (Ubuntu 24.04) or Humble (Ubuntu 22.04) for the round-trip path. Not required for the MCAP-only path.

Hardware

  • Laptop, no GPU. Everything runs on CPU.
  • Linux strongly preferred if you want the ROS 2 round-trip step (ros2 bag convert). Ubuntu 24.04 + ROS 2 Jazzy is the easiest combination as of 2026.
  • macOS works for the MCAP-only path. The Foxglove Studio desktop app is cross-platform (macOS, Linux, Windows). You will skip the ros2 bag step on Mac, which is fine — the CLI tool and notebook still demonstrate the full data flow.
  • ~2 GB free disk for the synthetic corpus.

Setup

cd projects/02-mcap-ros2-plumbing
bash setup.sh
source .venv/bin/activate

setup.sh creates a venv and installs from requirements.txt. It does not install ROS 2 — that is a separate, distro-specific step. See the ROS 2 install pointer at the end of setup.sh if you want the round-trip path.

Verified-working versions (May 2026):

Package | Pinned version | Purpose
mcap | 1.3.1 | Core reader/writer
mcap-protobuf-support | 0.5.4 | Protobuf encode/decode helpers
mcap-ros2-support | 0.5.7 | ROS 2 IDL decode
foxglove-schemas-protobuf | 0.3.0 | Well-known schema bindings
click | 8.1+ | CLI framework
pandas, numpy, opencv-python | latest | Tagging logic & frame extraction

For visualization, install Foxglove desktop app from https://foxglove.dev/download. Free tier; no account or cloud needed for local files.

For the optional ROS 2 path: install ROS 2 Jazzy following https://docs.ros.org/en/jazzy/Installation.html. Then apt install ros-jazzy-rosbag2-storage-mcap.


Steps

  1. Run setup.sh. Activate the venv. Run python -c "import mcap; print(mcap.__version__)". Should print 1.3.x.
  2. Install Foxglove desktop. Verify it opens.
  3. Open notebook.py in Jupytext / VS Code. Walk top-to-bottom. Section 1 verifies installs.
  4. Synthesize a multi-topic MCAP. Section 2 emits data/synthetic_drive.mcap with a 60-second drive containing /camera/front, /lidar/top, /imu, /tf — using Foxglove well-known schemas (RawImage, PointCloud, Imu mapped to sensor_msgs/Imu-shape, FrameTransform).
  5. Read it back. Section 3 opens the file with mcap.reader.make_reader, iterates with iter_decoded_messages, and time-aligns the camera + lidar streams to within 10 ms. (A compressed write/read sketch follows this list.)
  6. Drag the MCAP into Foxglove. Confirm the 3D panel renders the lidar; the Image panel renders the camera; the Plot panel shows IMU traces. Take a screenshot, save under docs/foxglove_screenshot.png.
  7. (Optional, Linux + ROS 2.) Section 4 records a tiny live bag with ros2 bag record -a -s mcap, then runs a ros2 bag convert round-trip. This confirms that the files our writer emits are directly usable in the ROS 2 ecosystem.
  8. Synthesize a corpus. Section 5 generates 20 MCAP files with varying scenario metadata (weather, time-of-day, intersection-type, ego-motion). This is your test corpus.
  9. Run tag_mining.py. From the project root: python tag_mining.py --corpus data/corpus/ --filter "rain AND night AND four_way_stop" --out hits.csv. You should get a CSV with the hits.
  10. Map to OpenLABEL. Section 6 of the notebook walks through how each MCAP topic maps to an OpenLABEL slot (objects, frames, streams, tags). This is the bridge from logs to labels.
  11. Reflection. Section 7 explicitly maps what we built to what Logfile Studio does at industrial scale: same primitives, three orders of magnitude bigger.
  12. (User TODO.) Extend the tagger with a CLIP-text query over front-camera frames extracted from MCAP. The notebook scaffolds the entry point.
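
For orientation before opening the notebook, here is a compressed sketch of the write/read loop from steps 4-5, using the mcap-protobuf-support helpers pinned above. It is a stand-in for the notebook's sections 2 and 3, not the actual code; the tiny 4x4 RawImage exists only to keep the example self-contained:

import time

from foxglove_schemas_protobuf.RawImage_pb2 import RawImage
from mcap.reader import make_reader
from mcap_protobuf.decoder import DecoderFactory
from mcap_protobuf.writer import Writer

# Write: one /camera/front channel of Foxglove RawImage messages at ~30 Hz.
t0 = time.time_ns()
with open("data/mini.mcap", "wb") as f, Writer(f) as writer:
    for i in range(30):
        img = RawImage(width=4, height=4, encoding="rgb8", step=12, data=bytes(48))
        writer.write_message(
            topic="/camera/front",
            message=img,
            log_time=t0 + i * 33_000_000,
            publish_time=t0 + i * 33_000_000,
        )

# Read: stream decoded messages back, filtered by topic.
with open("data/mini.mcap", "rb") as f:
    reader = make_reader(f, decoder_factories=[DecoderFactory()])
    for schema, channel, message, decoded in reader.iter_decoded_messages(
        topics=["/camera/front"]
    ):
        print(channel.topic, message.log_time, decoded.width)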

Done criterion

You can answer the following question via your CLI, end-to-end, on a corpus of MCAP files:

"Given a corpus of N MCAP files, find me all clips where the ego stopped at a four-way stop in rain at night."

Your tool returns a CSV with (file, start_ns, end_ns, matched_tags, confidence) rows. The Foxglove inspection round-trip confirms the hits are real.
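
Illustrative shape of that CSV (every value below is invented; real rows come from your corpus):

file,start_ns,end_ns,matched_tags,confidence
data/corpus/clip_07.mcap,1715000000000000000,1715000012500000000,rain;night;four_way_stop;ego_stopped,0.91
data/corpus/clip_13.mcap,1715000031000000000,1715000040200000000,rain;night;four_way_stop;ego_stopped,0.78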


Common pitfalls

  1. Schema vs. message type confusion. MCAP separates schemas (structure definitions) from channels (named topic + schema pairs) from messages (payloads). A common bug: registering a schema once but accidentally creating two channels with the same topic name. Always reuse the channel ID returned by writer.register_channel; pitfalls 1, 4, and 5 are sketched in code after this list.
  2. Time-sync issues. log_time and publish_time are not the same. publish_time is when the message was originally produced; log_time is when the writer wrote it. For replay/sync logic you almost always want log_time. For "real-world causality" you want publish_time. Pick one and document it.
  3. ROS 2 distro / DDS install gotchas. Both Humble (Ubuntu 22.04) and Jazzy (Ubuntu 24.04) ship Fast DDS as the default RMW. If ros2 bag record hangs with no error, your DDS layer is broken — check the RMW_IMPLEMENTATION env var. Also: apt install ros-${ROS_DISTRO}-rosbag2-storage-mcap is a separate package from rosbag2.
  4. MCAP indexing for fast random access. A non-indexed (a.k.a. "streaming") MCAP file forces a full scan to find a topic in a time range. For a corpus of large files, always finalize with writer.finish() (which flushes the index) rather than os._exit. Verify with mcap info <file> — the chunk count it reports should be non-zero.
  5. Large-file streaming. iter_messages(topics=[...]) with topic filtering uses the chunk index to seek; without filtering, you're scanning. For multi-GB files, always pass topics= and a start_time/end_time window. Don't materialize whole streams as lists.
  6. Foxglove well-known schemas vs. ROS 2 messages. Foxglove's foxglove.RawImage and sensor_msgs/Image are similar but not identical (field names, encoding strings). Pick one and stick with it per topic. The Foxglove schemas are recommended for non-ROS pipelines because they don't require a ROS install to decode.
  7. ros2 bag convert needs a YAML output spec, not just CLI flags. The convert subcommand takes -i <input_bag> plus -o <options.yaml>, where the YAML lists the output bags with explicit storage IDs. Don't expect ros2 bag convert in.db3 out.mcap to work directly.
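
Pitfalls 1, 4, and 5 in one minimal sketch. It uses the raw mcap Python API with a stand-in JSON schema to stay short; the project itself takes the protobuf path:

import json

from mcap.reader import make_reader
from mcap.writer import Writer

with open("data/imu.mcap", "wb") as f:
    writer = Writer(f)
    writer.start()
    # Pitfall 1: register schema and channel once, reuse the returned IDs.
    schema_id = writer.register_schema(
        name="Imu", encoding="jsonschema",
        data=json.dumps({"type": "object"}).encode(),
    )
    channel_id = writer.register_channel(
        topic="/imu", message_encoding="json", schema_id=schema_id,
    )
    for i in range(1000):  # 100 Hz for 10 s
        t = i * 10_000_000
        writer.add_message(
            channel_id=channel_id, log_time=t, publish_time=t,
            data=json.dumps({"accel_z": 9.81}).encode(),
        )
    writer.finish()  # Pitfall 4: flushes the chunk index; never skip this.

# Pitfall 5: topic + time-window filtering seeks via the chunk index,
# so only the chunks overlapping [2 s, 3 s) are read and decompressed.
with open("data/imu.mcap", "rb") as f:
    reader = make_reader(f)
    for schema, channel, message in reader.iter_messages(
        topics=["/imu"], start_time=2_000_000_000, end_time=3_000_000_000
    ):
        pass  # ~100 messages touched, not the whole file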



Interview prep — questions this project should let me answer

A short list of questions a Data Intelligence interviewer might reasonably ask, and what doing this project should let me say:

  • "How would you index 100 PB of MCAP for fast scenario queries?" — Two-tier: chunk-level metadata in the file (already there via Statistics + chunk indexes), and a separate columnar metadata store (Iceberg or DuckDB-on-Parquet) with one row per (file, channel, chunk) carrying the message-count, time range, and any pre-computed scenario tags. Hot queries hit the columnar store and only fetch chunks that pass; cold queries fall back to scanning. The MCAP file is the source of truth; the columnar store is a cache.
  • "Why MCAP and not bag2?" — MCAP is serialization-agnostic (protobuf, ROS 1, ROS 2, JSON, Flatbuffers, custom — all in one container), self-describing (schemas embedded), and natively chunked + indexed for time-windowed reads. ROS 2 bag2 (sqlite3) is a SQLite database that happens to hold messages — fine for single-machine, awkward for distributed. As of ROS 2 Iron, MCAP became the default rosbag2 storage plugin, which is the strongest possible tooling-side endorsement.
  • "How do you decide a clip is 'a four-way stop in rain at night'?" — Three signals fused: (1) recorded scenario metadata if the fleet upload tagged it, (2) rule-based detection on real messages (e.g. ego_stopped from /imu integrated against /tf), (3) learned signals (CLIP-text similarity over front-camera frames, or a small classifier on the lidar BEV). Confidence becomes the union of evidence; precision is tuned by the AND/OR/threshold structure of the filter.
  • "What's the failure mode of a metadata-only pipeline?" — Metadata is always lying or missing. Fleets upload firmware that drops half the tags; labelers tag inconsistently; rare scenarios have no metadata at all. The whole point of detector-derived tags is to recover from that. The whole point of preserving evidence per tag is so when the pipeline is wrong, you can find out why fast.
  • "What does Logfile Studio actually do, in your model?" — Two things at the core: (1) a queryable index over a fleet of MCAP, and (2) a viewer (Foxglove-style) that opens any clip the query returns. Everything else — scenario tagging, label hand-off, replay, debugging — is built on those two primitives.

Notes on framing for AI interviews

Three sentences I want to be able to say cold:

  1. "MCAP is to robotics logs what Parquet is to analytics: a chunked, indexed, schema-self-describing container — and like Parquet, the value isn't the file format, it's the queries it enables at scale."
  2. "The interesting question isn't 'can you parse MCAP?' — it's 'given 100 PB of MCAP, can you find the 30 seconds your model is failing on?' That's a search problem, and the right index is the difference between a 4-hour cron and a 4-second query."
  3. "OpenLABEL is the schema, MCAP is the container, and scenario tagging is the join. Logfile Studio is fundamentally a query engine over that join."

If I can defend those three sentences with the artifacts in this folder, the project has done its job.

Files in this project

  • README.md
  • notebook.py
  • requirements.txt
  • setup.sh
  • tag_mining.py

Notebook (notebook.py) is in jupytext percent format — open in VS Code or convert with jupytext --to notebook.