Blackbox Robotics: Incident Packetization, Replay, and Command-Ownership Attribution for Deployed Robot Fleets
Figure: Six-layer incident architecture. Sensor, command, operator, and safety traces are converted into a bounded packet, then aligned into a timeline, attributed across command authorities, and rendered into audience-specific reports.
Abstract
Deployed robot fleets increasingly operate in warehouses, hotels, clinics, offices, sidewalks, factories, and other environments where autonomy, remote operators, safety controllers, site policy, and bystanders interact in the same physical event. When an incident occurs, the evidence required to understand it is usually fragmented across camera files, ROS2 bags, MCAP logs, fleet dashboards, teleoperation sessions, safety controller traces, support tickets, operator notes, and customer reports. This fragmentation creates a reconstruction gap: robot teams may possess raw data, but lack a standard evidence object that explains what happened, who controlled the robot, which sources support each claim, and what can be shared externally.
We present Blackbox Robotics, an incident packetization and replay framework for deployed robot fleets. The framework defines the Blackbox Incident Packet (BIP), a bounded, schema-validated evidence object containing incident metadata, stream manifests, timeline events, command-ownership intervals, privacy policy, hash metadata, and stakeholder reports. We also introduce Command-Ownership Attribution (COA), an interval model that separates decision source, execution authority, constraint source, and intervention source during robot incidents. The system architecture is organized as a six-layer incident stack: recorder and trigger, evidence manifest, temporal alignment, ownership attribution, replay and report, and fleet incident graph.
We evaluate the v0.3 prototype using a seeded incident corpus covering three deployment archetypes: a warehouse near miss, a hotel service-cart contact event, and a clinic handoff abort caused by command authority conflict. The public Dock Aisle Near-Miss artifact contains 4 packet files, 6 evidence streams, 6 timeline events, 3 command-ownership intervals, and a 330 s evidence window. The current JSON validator passes the packet schema in 2.87 s on a developer workstation. We report source coverage, timeline consistency, ownership coverage, schema validity, baseline comparison, ablation analysis, response-time decomposition, packet-quality scoring, and deployment failure modes. The system is a prototype and does not claim legal fault assignment, safety certification, real customer deployment, or incident prevention. Its purpose is to make robot incident reconstruction testable, reviewable, and shareable.
Keywords: robot incident analysis, robot fleet operations, ROS2, MCAP, teleoperation, robot observability, command ownership, evidence packet, post-incident replay, safety review, service robots.
I. Introduction
Robot fleets are moving from controlled demonstrations into continuous operations. Professional service robots now work in logistics, cleaning, inspection, hospitality, retail, healthcare support, and delivery. The International Federation of Robotics reported 9% growth in professional service robot sales in 2024, more than 199,000 professional service robots in its sample, and 31% growth in Robotics-as-a-Service fleet size. As the deployed base grows, robot incidents become operational events rather than isolated engineering bugs.
A single robot incident can involve multiple layers:
- perception confidence changes
- route replanning
- local controller commands
- remote operator takeover
- network delay or recovery
- safety controller gating
- site-policy violations
- customer-facing impact
- privacy-sensitive video evidence
- post-event support escalation
In current workflows, these traces are usually examined through separate systems. Engineers inspect ROS bags or MCAP files. Operations teams review fleet dashboards. Remote-assist teams inspect operator session logs. Safety teams ask for event details. Customer-success teams write a report. Insurance or compliance reviewers may request evidence after the fact. The result is slow and inconsistent reconstruction.
This paper argues that deployed robot fleets need an incident system of record. The core artifact should not be a folder of logs or a dashboard screenshot. It should be a bounded packet that preserves evidence, reconstructs the event timeline, attributes command authority, and produces audience-specific reports.
We make four contributions.
- Blackbox Incident Packet (BIP). We define a versioned packet abstraction for robot incidents, including metadata, stream manifest, timeline, command ownership, privacy metadata, hashes, retention policy, and report views.
- Command-Ownership Attribution (COA). We introduce an interval model for attributing decision source, execution authority, constraint source, and intervention source during incidents.
- Six-layer incident architecture. We describe a recorder-to-report stack for robot fleets that integrates with existing ROS2, MCAP, teleoperation, safety, and fleet operations systems.
- Seeded evaluation protocol. We evaluate the prototype on three incident archetypes and one public packet, reporting schema validity, source coverage, timeline consistency, ownership coverage, baseline comparison, ablation, and failure modes.
II. Related Work
II-A. Event Data Recorders and Incident Reconstruction
Vehicle event data recorders provide a useful analogy for bounded evidence capture. NHTSA describes event data recorders as systems that record technical information for a short period before, during, and after a crash. They are not full surveillance systems; they are event-triggered reconstruction artifacts. Robot fleets need a similar concept, but with additional complexity: robots may combine autonomous decision-making, teleoperation, cloud orchestration, safety gating, and human-robot interaction.
Blackbox borrows the bounded-window principle while extending the evidence model to multimodal robotics logs and command authority transitions.
II-B. Robot Data Formats and Observability
ROS2 bag tooling supports recording and playback of topic data. MCAP provides an open container format for timestamped multimodal log data. Robot observability tools such as Foxglove, Formant, and InOrbit help teams visualize logs, inspect telemetry, manage fleets, and debug behavior.
Blackbox is not a replacement for these tools. It is a packetization layer above them. A BIP can reference ROS2 bags, MCAP files, video streams, safety events, teleoperation logs, and external attachments while exposing incident-level semantics: trigger, timeline, ownership, evidence quality, privacy status, and report views.
II-C. Human-Robot Incident Analysis
Human-robot interaction safety research increasingly treats incidents as systemic events rather than single-component failures. Accident analysis often involves human behavior, environment changes, robot policy, interaction timing, and system design. Recent work on HRI incident archetypes reinforces the need to preserve both machine traces and human-facing context.
Blackbox operationalizes this by making timeline events source-referenced and by keeping customer-facing impact separate from internal debug detail.
II-D. Teleoperation and Privacy
Remote operation changes the reconstruction problem. In supervised autonomy, an event may include autonomy, remote operator input, stale command packets, network recovery, and safety overrides. Privacy research in teleoperated robots also shows that remote presence in human environments creates sensitive data exposure risks.
Blackbox treats teleoperation and privacy as first-class packet properties. Operator commands are not buried in unstructured logs, and raw video is not assumed to be externally shareable.
II-E. Safety Standards and Compliance Boundaries
Safety standards such as ISO 10218-1:2025 provide requirements for industrial robot safety. Blackbox does not replace certified safety systems or legal review. Its role is post-event technical reconstruction. The packet can support safety review by preserving evidence, but it does not itself certify a robot or assign legal fault.
II-F. Agentic Robotics and System Papers
Recent robotics system papers, including SafeGuard ASF, use a layered architecture, scenario-driven methods, explicit metrics, baseline comparisons, and simulation or real-world evaluation to establish credibility. Blackbox follows this systems-paper style while focusing on incident evidence infrastructure rather than humanoid hazard response.
III. Problem Formulation
Let an incident be:
$$I = (r, s, t_0, W, C, E)$$
where `r` is the robot, `s` is the site, `t0` is the trigger time, `W = [t0 - pre, t0 + post]` is the evidence window, `C` is the incident class, and `E` is the set of raw evidence sources.
The output is a Blackbox Incident Packet:
$$\mathrm{BIP} = \{ M, S, T, O, P, H, A, R \}$$
where:
- `M` is incident metadata.
- `S` is the stream manifest.
- `T` is the synchronized incident timeline.
- `O` is the command-ownership interval set.
- `P` is privacy and retention policy.
- `H` is hash, seal, and provenance metadata.
- `A` is annotations.
- `R` is stakeholder report output.
The reconstruction objective is to maximize review utility while satisfying bounded evidence, source traceability, privacy separation, and schema validity.
III-A. Evidence Coverage
For incident class `c`, let `Req(c)` be the required stream roles and `Obs(I)` be observed stream roles. Source coverage is:
$$C_{\text{source}}(I) = \frac{|\mathrm{Req}(c) \cap \mathrm{Obs}(I)|}{|\mathrm{Req}(c)|}$$
III-B. Timeline Consistency
For ordered timeline events `e_i`, monotonicity requires:
$$t(e_i) \le t(e_{i+1})$$
Each event must include source references. We define timeline validity:
$$V_{\text{timeline}} = 1 - \frac{N_{\text{time\_violations}} + N_{\text{unreferenced\_events}}}{N_{\text{events}}}$$
III-C. Command-Ownership Coverage
Let `A = [t_a, t_b]` be the action-critical interval and `O` be the set of ownership intervals. Coverage is:
$$C_{\text{owner}} = \frac{\mathrm{duration}\left(\left(\bigcup_{o \in O} o\right) \cap A\right)}{\mathrm{duration}(A)}$$
III-D. Ownership Conflict
At time `t`, let `Active(t)` be the set of active command authorities. A conflict occurs when:
$$|\mathrm{Active}(t)| > 1 \;\land\; \mathrm{arbitration}(\mathrm{Active}(t), \mathrm{mode}_t) = \text{undefined}$$
For example, a remote joystick packet overlapping with an autonomous handoff routine after network recovery is a conflict unless an explicit arbitration rule yields control.
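The definitions in Sections III-A through III-D translate into small functions. The following is a minimal Python sketch, not the prototype's code: the function names, the event fields `t` and `source_refs`, the arbitration callable, and the 5 s action-critical window in the usage note are all assumptions made for illustration.

```python
from typing import Callable, Optional

def source_coverage(required: set[str], observed: set[str]) -> float:
    """C_source: share of required stream roles that were actually observed."""
    if not required:
        return 1.0  # assumption: an empty requirement set counts as covered
    return len(required & observed) / len(required)

def timeline_validity(events: list[dict]) -> float:
    """V_timeline = 1 - (N_time_violations + N_unreferenced_events) / N_events."""
    if not events:
        return 0.0  # assumption: an empty timeline is treated as invalid
    violations = sum(1 for a, b in zip(events, events[1:]) if a["t"] > b["t"])
    unreferenced = sum(1 for e in events if not e.get("source_refs"))
    return 1.0 - (violations + unreferenced) / len(events)

def ownership_coverage(intervals: list[tuple[float, float]],
                       window: tuple[float, float]) -> float:
    """C_owner: duration of (union of intervals, clipped to A) over duration of A.

    Assumes the window has non-zero duration.
    """
    t_a, t_b = window
    covered, cursor = 0.0, t_a
    for start, end in sorted(intervals):
        # Clip to the window and to what has already been counted.
        start, end = max(start, cursor), min(end, t_b)
        if end > start:
            covered += end - start
            cursor = end
    return covered / (t_b - t_a)

# Hypothetical arbitration signature: returns the winning authority or None.
Arbitration = Callable[[frozenset, str], Optional[str]]

def is_conflict(active: set[str], mode: str, arbitration: Arbitration) -> bool:
    """Conflict: more than one active authority and no arbitration outcome."""
    return len(active) > 1 and arbitration(frozenset(active), mode) is None
```

As a usage illustration, three contiguous ownership intervals `(-300, -2.2)`, `(-2.2, -0.26)`, and `(-0.26, 0)` evaluated against a hypothetical action-critical window `(-5, 0)` leave no gap, so `ownership_coverage` returns a value of 1 up to floating-point rounding.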
III-E. Incident Evidence Graph
We model packet reconstruction as an evidence graph:
$$G_I = (V, E_g)$$
where vertices include raw streams, derived events, command intervals, annotations, redaction objects, and report claims:
$$V = V_{\text{stream}} \cup V_{\text{event}} \cup V_{\text{owner}} \cup V_{\text{policy}} \cup V_{\text{claim}}$$
Edges encode support relationships. A report claim is admissible only when it is connected to at least one event and one stream reference:
$$\mathrm{admissible}(\mathrm{claim}_j) = \exists\, e_i, s_k \;\text{such that}\; s_k \to e_i \to \mathrm{claim}_j$$
This graph prevents a customer report from becoming a free-form narrative detached from packet evidence. In v0.3 the graph is implicit in JSON references; in production it should be materialized as an indexed provenance graph.
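The admissibility rule can be checked with two lookups. The sketch below is illustrative only: it assumes the implicit graph is available as two plain dictionaries (claim to supporting events, event to source streams), and every identifier in the example data is hypothetical.

```python
def admissible(claim_id: str,
               claim_events: dict[str, list[str]],
               event_streams: dict[str, list[str]],
               manifest_roles: set[str]) -> bool:
    """True when some path stream -> event -> claim exists in the packet."""
    for event_id in claim_events.get(claim_id, ()):
        for stream in event_streams.get(event_id, ()):
            if stream in manifest_roles:
                return True
    return False

# Hypothetical packet fragments for illustration.
manifest_roles = {"front_rgb", "operator_input", "safety_events"}
event_streams = {
    "ev_takeover": ["operator_input"],
    "ev_unsourced": [],                      # event with no stream reference
}
claim_events = {
    "claim_operator_paused": ["ev_takeover"],
    "claim_floor_was_wet": ["ev_unsourced"],
    "claim_free_text": [],                   # narrative with no event at all
}

supported = [c for c in claim_events
             if admissible(c, claim_events, event_streams, manifest_roles)]
```

Only the claim that reaches a manifest stream through an event survives the filter; the other two would have to be dropped or reported as unsupported.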
III-F. Reconstruction Objective
Given raw evidence `E`, the reconstruction process selects a packet `BIP*` that maximizes a weighted utility:
$$\begin{aligned}
\mathrm{BIP}^{*} = \arg\max_{\mathrm{BIP}} \big[\, & \alpha\, C_{\text{source}} + \beta\, V_{\text{timeline}} + \gamma\, C_{\text{owner}} \\
& + \delta\, Q_{\text{privacy}} + \eta\, Q_{\text{integrity}} - \lambda\, C_{\text{exposure}} \,\big]
\end{aligned}$$
where `Q_privacy` rewards correct privacy separation, `Q_integrity` rewards valid hashes and schemas, and `C_exposure` penalizes unnecessary raw-data exposure. The weights are site-specific. For a clinic, `delta` and `lambda` should be high. For an internal warehouse engineering review, `alpha`, `beta`, and `gamma` dominate.
III-G. Packet Quality Score
We define a prototype packet quality score:
$$\begin{aligned}
Q_{\text{packet}} = {} & 0.25\, C_{\text{source}} + 0.20\, V_{\text{timeline}} + 0.20\, C_{\text{owner}} \\
& + 0.15\, Q_{\text{schema}} + 0.10\, Q_{\text{privacy}} + 0.10\, Q_{\text{report}}
\end{aligned}$$
The score is not a safety score and not a legal confidence score. It is a review-readiness score. A low score means the packet should not be shared externally without manual review.
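The weighted sum above can be computed directly. This is a minimal sketch using the weights from the formula; the requirement that every component lies in [0, 1], the function names, and the review threshold of 0.8 are assumptions, since the paper does not state a cutoff.

```python
WEIGHTS = {
    "C_source": 0.25,
    "V_timeline": 0.20,
    "C_owner": 0.20,
    "Q_schema": 0.15,
    "Q_privacy": 0.10,
    "Q_report": 0.10,
}

def packet_quality(components: dict[str, float]) -> float:
    """Weighted review-readiness score; expects every component in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return sum(weight * components[name] for name, weight in WEIGHTS.items())

def needs_manual_review(score: float, threshold: float = 0.8) -> bool:
    """The threshold is a hypothetical site setting, not a value from the paper."""
    return score < threshold

example = {
    "C_source": 1.0, "V_timeline": 1.0, "C_owner": 1.0,
    "Q_schema": 1.0, "Q_privacy": 0.5, "Q_report": 0.0,
}
score = packet_quality(example)
```

In this example the packet is complete on evidence but has a partial privacy check and no report, which costs 0.05 and 0.10 respectively and leaves a score of about 0.85.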
III-H. Clock Alignment
Robot incident reconstruction depends on clock quality. Let `t_s` be source-local time and `t_r` be packet-relative time. For each stream `s`, alignment uses:
$$t_r = t_s + \mathrm{offset}_s + \mathrm{drift}_s \,(t_s - t_{\text{ref}})$$
In v0.3, seeded events are already aligned. A production recorder must estimate `offset_s` and `drift_s` from heartbeat messages, ROS2 header timestamps, MCAP message index time, NTP/PTP synchronization data, or post-hoc event anchors.
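One way to estimate the two parameters from post-hoc event anchors is an ordinary least-squares fit. The sketch below is an assumption about how such an estimator could look, not the method used by the prototype; it treats each anchor as a pair (source-local time, packet time) and takes `t_ref` to be the time at which the offset is defined.

```python
def to_packet_time(t_s: float, offset_s: float, drift_s: float, t_ref: float) -> float:
    """Apply t_r = t_s + offset_s + drift_s * (t_s - t_ref)."""
    return t_s + offset_s + drift_s * (t_s - t_ref)

def fit_offset_drift(anchors: list[tuple[float, float]],
                     t_ref: float) -> tuple[float, float]:
    """Least-squares fit of (t_packet - t_source) = offset + drift * (t_source - t_ref).

    anchors: (t_source, t_packet) pairs for events visible in both clocks.
    """
    if len(anchors) < 2:
        raise ValueError("need at least two anchors to estimate drift")
    xs = [t_src - t_ref for t_src, _ in anchors]
    ys = [t_pkt - t_src for t_src, t_pkt in anchors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("anchors must have distinct source times")
    drift = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    offset = mean_y - drift * mean_x
    return offset, drift

# Synthetic anchors generated with offset 0.5 s and drift 1e-4 (illustrative).
anchors = [(100.0, 100.5), (200.0, 200.51), (300.0, 300.52)]
offset, drift = fit_offset_drift(anchors, t_ref=100.0)
```

With only two anchors the fit is exact and carries no error estimate, so a production recorder would likely want more anchors and a residual check before trusting the drift term.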
III-I. Evidence Confidence
Each claim receives an evidence confidence score:
$$Q_{\text{claim}} = w_s\, Q_{\text{source}} + w_t\, Q_{\text{time}} + w_o\, Q_{\text{owner}} + w_p\, Q_{\text{policy}}$$
where `Q_source` measures source availability, `Q_time` measures timestamp alignment, `Q_owner` measures command authority clarity, and `Q_policy` measures privacy/export eligibility. The packet should expose low-confidence claims rather than hiding them. This is especially important for insurance and customer-facing review, where unsupported certainty is worse than explicit uncertainty.
III-J. Missing Evidence Semantics
Missing evidence is represented explicitly:
`missing(role, reason, effect)`
Examples include camera unavailable, planner trace not logged, operator session id redacted, safety event hash mismatch, or external CCTV not attached. The effect field describes how the missing source limits reconstruction. A packet with missing evidence can still be useful, but it must not pretend to be complete.
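A missing-evidence record can be carried as a small immutable object. The sketch below is one possible shape under stated assumptions: the class name, the `notes` lookup, and the fallback text are hypothetical and are not part of the packet schema described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MissingEvidence:
    role: str     # stream role that was expected
    reason: str   # why it is absent
    effect: str   # how the absence limits reconstruction

def list_missing(required: set[str],
                 observed: set[str],
                 notes: dict[str, tuple[str, str]]) -> list[MissingEvidence]:
    """Emit one explicit record per required role that was not observed."""
    records = []
    for role in sorted(required - observed):
        # Fall back to an explicit "not determined" rather than guessing.
        reason, effect = notes.get(role, ("not determined", "not determined"))
        records.append(MissingEvidence(role, reason, effect))
    return records

records = list_missing(
    required={"front_rgb", "planner_trace", "safety_events"},
    observed={"front_rgb"},
    notes={"planner_trace": ("planner trace not logged",
                             "route choice cannot be explained")},
)
```

Roles without a recorded explanation still appear in the output, which keeps the packet from silently presenting itself as complete.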
IV. System Architecture
IV-A. System Overview
Blackbox is a six-layer incident system.
- L1 Recorder and Trigger: camera, depth, lidar, control, operator, safety, network
- L2 Evidence Manifest: stream roles, hashes, retention, privacy, export scope
- L3 Temporal Alignment: relative incident timeline, clock offsets, source references
- L4 Command-Ownership Attribution: decision source, execution authority, constraint source, intervention source
- L5 Replay and Report: engineering replay, operations summary, customer-safe report, insurer export
- L6 Fleet Incident Graph: recurring incident types, site patterns, policy failures, design-partner metrics
IV-B. Deployment Model
The intended deployment has three modes:
- Offline conversion. Existing logs, videos, and support records are converted into BIP files. This is the first design-partner workflow.
- Edge recorder. A robot-side or gateway-side recorder maintains a rolling buffer and creates packets when triggers fire.
- Fleet incident graph. Packets across sites are aggregated to expose recurring failure patterns.
IV-C. Reference Integration Platform
Blackbox is designed for mixed robot fleets rather than a single hardware platform. The reference integration assumes:
| Component | Reference configuration | Purpose |
|---|---|---|
| Robot runtime | ROS2 or ROS-compatible bridge | topic discovery, message capture, replay export |
| Log container | MCAP or ROS2 bag | timestamped stream preservation |
| Edge host | robot compute, fleet gateway, or nearby NUC-class recorder | rolling buffer and trigger capture |
| Video streams | RGB, depth, site CCTV attachment | visual reconstruction and redaction |
| Command streams | cmd_vel, joint command, planner branch, autonomy mode | command authority reconstruction |
| Human input | teleop joystick, operator pause, manual mark | shared-autonomy attribution |
| Safety stream | e-stop, bumper, zone gate, force threshold | safety-event timeline |
| Report store | packet object storage plus database index | stakeholder retrieval and audit |
The framework intentionally avoids depending on a particular robot morphology. It applies to AMRs, hotel delivery robots, sidewalk robots, mobile manipulators, inspection robots, and humanoid fleets when they produce timestamped operational traces.
IV-D. Stream Budget and Capture Policy
The recorder must bound storage. For each incident class `c`, a capture profile defines required roles, preferred roles, pre-window, post-window, and privacy default.
| Incident class | Pre-window | Post-window | Required roles | Privacy default |
|---|---|---|---|---|
| Near miss | 300 s | 30 s | video, depth/lidar, planner, command, operator, safety | redacted external |
| Low-speed contact | 180 s | 60 s | video, bumper/force, command, localization, site context | redacted external |
| Remote takeover | 180 s | 45 s | autonomy mode, operator input, command, network, safety | internal by default |
| Command conflict | 180 s | 45 s | command authority, teleop session, autonomy state, safety gate | internal by default |
| Task failure | 120 s | 120 s | mission state, operator notes, customer event, video optional | customer-safe summary |
In production, the capture profile should be configurable per customer site. A hospital, hotel, warehouse, and sidewalk deployment should not have the same export policy.
IV-D1. Storage Model
For a stream set `S` and evidence window `W`, the packet storage budget is:
$$\mathrm{Bytes}(\mathrm{BIP}) = \sum_{s \in S} \mathrm{bitrate}_s \cdot \mathrm{duration}_s + \text{metadata} + \text{indexes}$$
For event-only streams such as safety events or operator commands, the cost is small. Video and depth streams dominate. A practical recorder therefore uses:
- continuous low-rate metadata capture
- bounded high-rate video/depth ring buffer
- event-triggered manifest sealing
- post-trigger downsampling for external report views
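A worked instance of the storage formula shows why video and depth dominate. The sketch below assumes rates are expressed in bytes per second so that the product is a byte count; the stream rates, metadata size, and index size are made-up illustrative numbers, not measurements from the prototype. The 330 s duration matches a 300 s pre-window plus a 30 s post-window.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamBudget:
    role: str
    bytes_per_second: int   # assumption: rate already converted to bytes/s
    duration_s: int

def packet_bytes(streams: list[StreamBudget],
                 metadata_bytes: int = 0,
                 index_bytes: int = 0) -> int:
    """Sum of rate * duration over streams, plus metadata and indexes."""
    payload = sum(s.bytes_per_second * s.duration_s for s in streams)
    return payload + metadata_bytes + index_bytes

def dominant_stream(streams: list[StreamBudget]) -> str:
    """Role of the stream that contributes the most bytes."""
    return max(streams, key=lambda s: s.bytes_per_second * s.duration_s).role

# Hypothetical rates for a 330 s near-miss window.
streams = [
    StreamBudget("front_rgb", 500_000, 330),
    StreamBudget("depth_cloud", 1_000_000, 330),
    StreamBudget("safety_events", 50, 330),
]
total = packet_bytes(streams, metadata_bytes=20_000, index_bytes=5_000)
```

Under these assumed rates the event stream contributes 16,500 bytes out of roughly 495 MB, which is the reason a bounded ring buffer is applied to the high-rate streams while metadata can be captured continuously.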
IV-D2. Retention Model
Retention is split into raw, redacted, and metadata retention:
`retention(packet) = { raw_days, redacted_days, metadata_days }`
For example, a clinic packet may retain raw video for 7 days, redacted video for 90 days, and metadata for 365 days. A warehouse packet may retain raw data longer if customer privacy risk is lower. This difference must be policy-driven rather than hardcoded.
IV-D3. Export Eligibility
External export is allowed only when:
$$\text{export\_ok} = \text{schema\_pass} \land \text{seal\_valid} \land \text{privacy\_pass} \land \text{no\_required\_source\_missing} \land \text{audience\_scope\_allows}$$
This condition is intentionally stricter than internal replay. Engineering teams may inspect incomplete packets; customers and insurers should not receive packets with unresolved privacy or integrity failures.
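The export condition is a plain conjunction, so a gate can also report which checks failed. The following is a minimal sketch; the class and function names are assumptions, while the five field names follow the condition above.

```python
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class ExportChecks:
    schema_pass: bool
    seal_valid: bool
    privacy_pass: bool
    no_required_source_missing: bool
    audience_scope_allows: bool

def blocking_reasons(checks: ExportChecks) -> list[str]:
    """Names of the checks that failed, in declaration order."""
    return [name for name, ok in asdict(checks).items() if not ok]

def export_ok(checks: ExportChecks) -> bool:
    """External export is allowed only when every check passes."""
    return not blocking_reasons(checks)

# A packet that validates and is sealed but still has an unresolved privacy check.
checks = ExportChecks(
    schema_pass=True,
    seal_valid=True,
    privacy_pass=False,
    no_required_source_missing=True,
    audience_scope_allows=True,
)
reasons = blocking_reasons(checks)
```

Returning the failed check names rather than a bare boolean lets the internal replay view explain why a customer or insurer export was blocked.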
IV-E. Recorder and Trigger Layer (L1)
The recorder maintains a bounded ring buffer for allowlisted streams. It is triggered by:
- safety stop
- bumper event
- human-proximity threshold
- remote takeover
- command conflict
- mission failure
- customer escalation
- manual operator mark
Algorithm 1 shows the trigger and seal process.
Algorithm 1: Triggered Packet Seal
Input: stream buffers B, trigger rule g, incident class c, window W
Output: packet index BIP
1. continuously append allowlisted streams to B
2. if g(B_t) fires at time t0:
3.     freeze interval [t0 - pre(c), t0 + post(c)]
4.     enumerate stream roles and source availability
5.     compute hashes for immutable references
6.     attach trigger metadata and site context
7.     emit packet draft
8.     run schema and policy checks
9.     seal packet if checks pass
IV-F. Evidence Manifest Layer (L2)
The manifest is the packet's evidence index. It records modality, source role, duration, sample rate, storage reference, hash, retention policy, privacy class, and export scope. The manifest allows a reviewer to know which evidence exists without exposing raw data to every audience.
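A manifest entry with the fields listed above might look as follows. This is a hedged sketch: the class name, field names, audience labels, and the storage reference are hypothetical, and the SHA-256 hash computed with Python's standard `hashlib` stands in for whatever hash the prototype records.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamEntry:
    role: str
    modality: str
    duration_s: float
    sample_rate_hz: float
    storage_ref: str
    sha256: str
    retention_days: int
    privacy_class: str
    export_scope: tuple[str, ...]   # audiences allowed to dereference raw data

def sha256_hex(payload: bytes) -> str:
    """Hex digest used as the immutable reference for a stored stream."""
    return hashlib.sha256(payload).hexdigest()

def manifest_view(entries: list[StreamEntry], audience: str) -> list[dict]:
    """Show that evidence exists; reveal storage_ref only to scoped audiences."""
    view = []
    for e in entries:
        row = {"role": e.role, "modality": e.modality, "duration_s": e.duration_s,
               "sha256": e.sha256, "privacy_class": e.privacy_class}
        if audience in e.export_scope:
            row["storage_ref"] = e.storage_ref
        view.append(row)
    return view

entry = StreamEntry(
    role="front_rgb", modality="video", duration_s=330.0, sample_rate_hz=15.0,
    storage_ref="store://example/dock-17/front_rgb",   # hypothetical reference
    sha256=sha256_hex(b"example video bytes"),
    retention_days=7, privacy_class="restricted",
    export_scope=("engineering",),
)
customer_rows = manifest_view([entry], "customer")
engineering_rows = manifest_view([entry], "engineering")
```

The customer view still lists the stream and its hash, so a reviewer can see that the evidence exists and is sealed without being handed a pointer to raw video.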
IV-G. Temporal Alignment Layer (L3)
Blackbox maps all evidence to relative incident time, where `t = 0` is the trigger. The timeline is not a raw dump of every event. It is a structured reconstruction with source references and confidence levels.
Algorithm 2 gives temporal alignment.
Algorithm 2: Timeline Alignment
Input: stream manifest S, raw event candidates E, trigger time t0
Output: aligned timeline T
1. estimate clock offset and drift for each source stream
2. transform source timestamps into packet-relative time
3. group event candidates within merge window epsilon
4. attach source references and confidence values
5. remove duplicate low-confidence events
6. sort timeline by relative time
7. flag gaps, missing references, and time reversals
8. emit T with quality metrics
IV-H. Command-Ownership Attribution Layer (L4)
COA represents ownership as intervals:
$$o_i = (t_{\text{start}}, t_{\text{end}}, d_i, x_i, c_i, h_i, q_i)$$
where:
- `d_i` is decision source.
- `x_i` is execution authority.
- `c_i` is constraint source.
- `h_i` is human intervention source.
- `q_i` is evidence quality.
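The interval tuple can be represented as a small record, and intervals can be built by walking authority transitions in time order. The sketch below is an illustration under assumptions: the class and function names, the constant quality value, and the authority tuples are hypothetical. Only the three time boundaries follow the Dock Aisle Near-Miss packet.

```python
from dataclasses import dataclass
from typing import Optional

Authority = tuple[str, str, str, Optional[str]]  # (decision, execution, constraint, intervention)

@dataclass(frozen=True)
class OwnershipInterval:
    t_start: float
    t_end: float
    decision: str
    execution: str
    constraint: str
    intervention: Optional[str]
    quality: float

def build_intervals(initial: Authority,
                    transitions: list[tuple[float, Authority]],
                    t_begin: float, t_end: float,
                    quality: float = 1.0) -> list[OwnershipInterval]:
    """Close an interval at each authority change inside [t_begin, t_end]."""
    intervals = []
    current, start = initial, t_begin
    for t, authority in sorted(transitions, key=lambda tr: tr[0]):
        if authority == current:
            continue  # unchanged tuple: same effect as merging adjacent intervals
        if t > start:
            intervals.append(OwnershipInterval(start, t, *current, quality))
        current, start = authority, t
    if t_end > start:
        intervals.append(OwnershipInterval(start, t_end, *current, quality))
    return intervals

# Hypothetical authority tuples for the three owners in the public packet.
AUTONOMY: Authority = ("autonomy", "autonomy", "site_policy", None)
OPERATOR: Authority = ("remote_operator", "remote_operator", "site_policy", "remote_operator")
SAFETY: Authority = ("safety_system", "safety_system", "safety_system", None)

intervals = build_intervals(AUTONOMY, [(-2.2, OPERATOR), (-0.26, SAFETY)], -300.0, 0.0)
```

The sketch omits conflict marking and per-interval evidence references, both of which a full attribution pass would need.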
Algorithm 3 describes interval construction.
Algorithm 3: Command-Ownership Attribution
Input: autonomy state A, controller commands U, operator events H, safety events S
Output: ownership interval set O
1. collect all authority transition candidates from A, U, H, S
2. sort candidates by relative time
3. initialize current owner from autonomy mode
4. for each candidate k:
5.     close previous interval at t_k
6.     assign decision source, execution authority, and constraint source
7.     attach evidence references
8.     mark conflict if two active authorities lack arbitration
9. merge adjacent intervals with identical authority tuple
10. emit O with coverage and confidence metrics
IV-I. Replay and Report Layer (L5)
The replay surface has audience-specific views:
- Engineering view: raw stream refs, event sources, command buffers, schema validation.
- Operations view: incident class, intervention, downtime, next action.
- Customer view: redacted timeline, explanation, corrective action.
- Insurance or safety view: source integrity, timeline, ownership intervals, report provenance.
Report generation follows Algorithm 4.
Algorithm 4: Redacted Report Generation
Input: sealed packet BIP, audience a, policy P
Output: report R_a
1. select claims whose policy scope includes audience a
2. replace raw stream references with redacted summaries when required
3. include missing-source and confidence notes
4. include timeline, command ownership, and corrective action
5. attach packet id, seal status, and report generation time
6. block export if privacy or integrity checks fail
IV-J. Fleet Incident Graph Layer (L6)
The graph aggregates packets across a fleet. Nodes include robots, sites, incident classes, stream roles, ownership conflicts, trigger policies, privacy classes, and corrective actions. The goal is not only to replay one incident, but to detect repeated patterns across deployments.
IV-K. Tool Categories
Similar to a robotics tool orchestration system, Blackbox can be decomposed into incident tools.
| Category | Count in reference design | Example tools |
|---|---|---|
| Capture | 7 | record_video, record_mcap, record_cmd, record_operator, record_safety, attach_cctv, manual_mark |
| Parse | 6 | parse_ros2, parse_mcap, parse_teleop, parse_safety, parse_network, parse_ticket |
| Align | 4 | estimate_offset, merge_events, check_monotonicity, detect_gap |
| Attribute | 5 | infer_owner, detect_conflict, resolve_arbitration, score_confidence, merge_interval |
| Privacy | 5 | mask_face, mask_badge, mask_screen, restrict_export, retention_check |
| Report | 5 | engineering_report, customer_report, insurer_report, corrective_action, provenance_export |
| Validate | 5 | schema_check, hash_check, policy_check, completeness_check, seal_packet |
This decomposition makes the product build path concrete. The first implementation does not need every tool, but the paper defines the target system boundary.
V. Incident Reconstruction Methods
V-A. Scenario 1: Human-Proximity Near Miss
V-A1. Detection Phase
The trigger fires when a human-proximity alert, route deviation, operator takeover, or safety stop occurs within a configured time window. Required evidence roles include front RGB, depth or lidar, planner trace, command output, operator input, and safety event.
Near-miss severity uses distance, time-to-contact, robot speed, intervention latency, and human track confidence:
$$\begin{aligned}
S_{\text{near}} = {} & w_d \exp(-d_{\min} / \sigma_d) + w_t \exp(-\mathrm{ttc}_{\min} / \sigma_t) + w_v \min(v_{\text{robot}} / v_{\text{ref}},\, 1) \\
& + w_o\, \text{operator\_delay} + w_h\, \text{human\_confidence}
\end{aligned}$$
where `d_min` is closest approach, `ttc_min` is minimum predicted time-to-contact, and `operator_delay` is normalized delay between proximity alert and intervention. This is not a legal risk score. It is a packet triage score used to determine review priority.
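The severity expression can be evaluated as written. In the sketch below the weights, `sigma_d`, `sigma_t`, and `v_ref` are placeholder values chosen so the weights sum to 1; the paper does not specify them, and both `operator_delay` and `human_confidence` are assumed to be normalized to [0, 1].

```python
import math

def near_miss_severity(d_min: float, ttc_min: float, v_robot: float,
                       operator_delay: float, human_confidence: float, *,
                       weights: tuple[float, float, float, float, float] = (0.3, 0.3, 0.2, 0.1, 0.1),
                       sigma_d: float = 1.0, sigma_t: float = 2.0,
                       v_ref: float = 1.5) -> float:
    """Triage score: larger for closer, faster, later-handled encounters."""
    w_d, w_t, w_v, w_o, w_h = weights
    return (w_d * math.exp(-d_min / sigma_d)
            + w_t * math.exp(-ttc_min / sigma_t)
            + w_v * min(v_robot / v_ref, 1.0)
            + w_o * operator_delay
            + w_h * human_confidence)

# Two illustrative encounters that differ only in closest approach.
close_call = near_miss_severity(0.4, 1.0, 1.2, 0.5, 0.9)
wide_pass = near_miss_severity(2.0, 1.0, 1.2, 0.5, 0.9)
```

With weights that sum to 1 and normalized inputs the score stays in [0, 1], which makes it easier to compare review priority across sites that tune the scale parameters differently.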
The trigger predicate is:
$$\text{trigger\_near} = (d_{\min} < d_{\text{warn}}) \lor (\mathrm{ttc}_{\min} < \tau_{\text{warn}}) \lor \text{takeover\_after\_proximity} \lor \text{safety\_stop\_after\_proximity}$$
V-A2. Attribution Phase
The near-miss reconstruction separates environment, perception, planning, operator, and safety layers. In the public Dock Aisle Near-Miss packet, autonomy owns the interval from -300 s to -2.2 s, the remote operator owns the pause interval from -2.2 s to -0.26 s, and the safety system owns the final stop interval from -0.26 s to 0 s.
The reviewer is expected to answer four questions:
- Did the robot perceive the human or obstruction?
- Did planning select a legal but uncomfortable route?
- Did a human operator intervene before or after safety escalation?
- Did the safety layer constrain motion as expected?
The packet should support each answer with a stream reference rather than narrative memory.
V-A3. Report Phase
The customer report excludes raw unredacted video. It summarizes the trigger, closest approach, intervention, and corrective action while preserving the internal evidence chain.
For a near-miss report, the recommended external fields are:
| Field | External report | Internal replay |
|---|---|---|
| Incident class | yes | yes |
| Closest approach | yes, rounded | exact |
| Raw RGB video | no | yes, restricted |
| Operator session id | no | yes |
| Planner branches | summary | full trace |
| Corrective action | yes | yes |
V-B. Scenario 2: Low-Speed Contact
V-B1. Contact Detection
A low-speed contact packet is triggered by bumper trace, force threshold, velocity discontinuity, or operator mark. Required sources include robot camera, bumper or force trace, command stream, localization, and optional external CCTV.
The contact trigger combines physical and semantic signals:
$$\text{trigger\_contact} = \text{bumper\_event} \lor (\text{force}_z > F_{\text{limit}}) \lor \big(|v_{\text{cmd}} - v_{\text{odom}}| > \tau_v \land \text{obstacle\_range} < d_{\text{contact}}\big) \lor \text{manual\_contact\_mark}$$
The packet distinguishes contact from near miss by requiring an impact, bumper, force, or post-event operator/customer mark.
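The contact predicate can be sketched as a single boolean function. The parameter names mirror the symbols in the predicate; the threshold values in the usage example are placeholders, not values from the prototype.

```python
def trigger_contact(*, bumper_event: bool, force_z: float,
                    v_cmd: float, v_odom: float, obstacle_range: float,
                    manual_contact_mark: bool,
                    f_limit: float, tau_v: float, d_contact: float) -> bool:
    """Fire on bumper, force, commanded-vs-measured velocity gap near an obstacle, or manual mark."""
    velocity_gap_near_obstacle = (abs(v_cmd - v_odom) > tau_v
                                  and obstacle_range < d_contact)
    return bool(bumper_event
                or force_z > f_limit
                or velocity_gap_near_obstacle
                or manual_contact_mark)

# Placeholder thresholds for illustration only.
THRESHOLDS = dict(f_limit=30.0, tau_v=0.2, d_contact=0.10)

# The robot is commanded forward but odometry shows it stopped against something.
stalled = trigger_contact(bumper_event=False, force_z=5.0, v_cmd=0.5, v_odom=0.0,
                          obstacle_range=0.05, manual_contact_mark=False, **THRESHOLDS)

# Same velocity gap in open space (for example wheel slip) does not fire.
slipping = trigger_contact(bumper_event=False, force_z=5.0, v_cmd=0.5, v_odom=0.0,
                           obstacle_range=1.50, manual_contact_mark=False, **THRESHOLDS)
```

Requiring the obstacle-range condition alongside the velocity gap is what separates a probable contact from ordinary tracking error.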
V-B2. Evidence Alignment
The method aligns robot-side streams with site-side evidence. For customer-facing environments such as hotels, the report must separate robot perception failure from temporary site obstruction or third-party interference.
Contact attribution uses four candidate contributors:
`contributors = {perception, planning_margin, site_state, operator_action}`
Each contributor receives a technical contribution label: `primary`, `secondary`, `observed`, or `not_supported`. The label must include source references. If evidence is missing, the packet should say "not determined" instead of inventing a cause.
V-B3. Root-Cause Draft
The report does not assign legal fault. It identifies technical contributors: perception reclassification, route clearance, site obstruction, operator absence, or safety trigger timing.
The root-cause draft is generated only after manifest, timeline, and ownership checks pass. If checks fail, the report is downgraded to evidence summary.
V-C. Scenario 3: Command Authority Conflict
V-C1. Conflict Detection
Command conflict occurs when autonomy and teleoperation are simultaneously active without a valid arbitration rule. The clinic handoff abort scenario models network recovery during an autonomous handoff routine.
Let `u_a(t)` be autonomy command activity, `u_h(t)` be human operator command activity, and `g(t)` be the arbitration state. Conflict is:
$$\mathrm{conflict}(t) = u_a(t) \land u_h(t) \land \big(g(t) = \text{undefined}\big)$$
The trigger condition is:
$$\text{trigger\_conflict} = \exists\, t \in W \;\text{such that}\; \mathrm{conflict}(t) = \text{true}$$
V-C2. Ownership Lock
When conflict is detected, the packet records the overlapping authorities, session IDs, safety gate action, and blocked motion. This helps distinguish an autonomy failure from a remote-assist concurrency failure.
The ownership-lock protocol is:
- freeze motion if active control authority is ambiguous
- record autonomy mode and teleop session id
- attach latest command packet from each authority
- record arbitration table state
- mark safety gate decision
- emit conflict interval
V-C3. Privacy Response
For clinical or home environments, the packet must apply privacy masks and restrict raw exports by default.
The clinic scenario requires `privacy_default = restricted`. Any external report must remove patient identifiers, operator identifiers, screen content, and raw room audio unless explicitly approved.
V-D. Redaction and Privacy Policy
Algorithm 5 shows the privacy gate applied before any external export.
Algorithm 5: Privacy Gate
Input: sealed packet BIP, audience a, policy P
Output: allow_export or block_export
1. check packet seal and schema status
2. enumerate streams referenced by report claims
3. for each stream s:
4.     check privacy class, retention policy, and audience scope
5.     require redaction proof for sensitive visual/audio streams
6.     block export if any required proof is missing
7. emit allow_export only when all checks pass
V-E. Packet Integrity
The prototype uses JSON schema validation and demonstration hashes. A production implementation should use signed manifests, append-only audit trails, key management, and export provenance.
VI. Prototype Implementation
The v0.3 project includes:
- Next.js website
- public technical paper route
- public whitepaper route
- seeded incident replay console
- three incident scenarios
- public Dock Aisle Near-Miss packet
- JSON schemas for packet, stream manifest, timeline, and command ownership
- validation script
- redacted customer report sample
- internal pitch deck kept outside the public website
VI-A. Public Packet Artifact
The public packet is located at:
public/samples/dock-17/
    packet.json
    streams-manifest.json
    timeline.json
    command-ownership.json
The public report is located at:
public/reports/dock-17-customer-report.md
VI-B. Tool Categories
Table I summarizes the incident stack.
| Category | Count in v0.3 | Examples |
|---|---|---|
| Evidence streams | 6 | front_rgb, depth_cloud, cmd_vel, planner_trace, operator_input, safety_events |
| Timeline events | 6 | route deviation, confidence decay, merge command, worker entry, takeover, safety seal |
| Ownership intervals | 3 | autonomy, remote_operator, safety_system |
| Packet files | 4 | packet, manifest, timeline, command ownership |
| Public reports | 1 | redacted customer report |
VI-C. Packet File Schema
The prototype stores one incident across four JSON files.
| File | Primary objects | Validation role |
|---|---|---|
| packet.json | incident id, robot, site, trigger, summary, privacy, seal state | packet-level completeness |
| streams-manifest.json | stream list, modality, sample rate, storage ref, hash, export scope | evidence availability |
| timeline.json | relative time events, actors, detail, source refs | reconstruction order |
| command-ownership.json | intervals, owner, control surface, transition reason, evidence refs | authority attribution |
The schema intentionally separates stream existence from timeline claims. A stream can be present without being used in a report, and a report claim is not allowed unless it points back to evidence.
VI-D. Topic Role Mapping
Robot fleets use different topic names. Blackbox maps fleet-specific topics into stable incident roles.
| Incident role | Example ROS2/robot source | Packet role |
|---|---|---|
| front_rgb | /camera/front/color/image_raw | visual scene |
| depth_cloud | /camera/depth/points or /scan | geometry and proximity |
| cmd_vel | /cmd_vel or base command | executed base command |
| planner_trace | /planner/debug, branch log, route plan | decision support |
| operator_input | teleop joystick or remote pause event | human intervention |
| autonomy_mode | /mode, mission state, behavior tree state | decision source |
| safety_events | e-stop, bumper, zone gate, force threshold | constraint source |
| network_state | teleop connection, latency, reconnect | command conflict context |
The mapping file is a deployment artifact. Without it, teams can record data but cannot reliably compare incident packets across robots.
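A mapping of this kind could look roughly as follows. The topic names come from the table above; the dictionary layout and the `map_topics` helper are assumptions for illustration, not the prototype's mapping-file format.

```python
from typing import Dict, List, Tuple

# Fleet-specific topic -> stable incident role (topic names from the table above).
ROLE_MAP: Dict[str, str] = {
    "/camera/front/color/image_raw": "front_rgb",
    "/camera/depth/points": "depth_cloud",
    "/scan": "depth_cloud",
    "/cmd_vel": "cmd_vel",
    "/planner/debug": "planner_trace",
    "/mode": "autonomy_mode",
}

def map_topics(topics: List[str], role_map: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group recorded topics by incident role and list topics with no declared role."""
    roles: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for topic in topics:
        role = role_map.get(topic)
        if role is None:
            unmapped.append(topic)
        else:
            roles.setdefault(role, []).append(topic)
    return roles, unmapped

# Hypothetical recording from one robot.
recorded = ["/camera/front/color/image_raw", "/scan", "/camera/depth/points",
            "/cmd_vel", "/diagnostics"]
roles, unmapped = map_topics(recorded, ROLE_MAP)
print(roles)
print(unmapped)
```

Reporting unmapped topics explicitly, rather than dropping them silently, keeps gaps in the mapping file visible during deployment.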
VI-E. Packet Lifecycle State Machine
The packet lifecycle is:
buffering -> triggered -> assembling -> validating -> sealed -> reported -> archived
State transitions have explicit failure paths:
| State | Failure condition | Result |
|---|---|---|
| triggered | missing pre-window | packet marked partial |
| assembling | stream hash unavailable | packet marked unsealed |
| validating | schema failure | report export blocked |
| sealed | hash mismatch after seal | packet invalidated |
| reported | privacy check failure | customer export blocked |
| archived | retention expired | raw data removed, metadata retained |
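The lifecycle and its failure paths can be written down as a small lookup structure. The following is a sketch under assumptions: the state names and failure results are copied from the text and table above, but the helper names `next_state` and `failure_result` are hypothetical, and the sketch does not decide whether a packet that hits a failure path continues through the lifecycle.

```python
from typing import Optional

# Lifecycle order as listed in the text.
LIFECYCLE = ["buffering", "triggered", "assembling", "validating",
             "sealed", "reported", "archived"]

# Failure paths from the table: state -> (failure condition, result).
FAILURE_PATHS = {
    "triggered": ("missing pre-window", "packet marked partial"),
    "assembling": ("stream hash unavailable", "packet marked unsealed"),
    "validating": ("schema failure", "report export blocked"),
    "sealed": ("hash mismatch after seal", "packet invalidated"),
    "reported": ("privacy check failure", "customer export blocked"),
    "archived": ("retention expired", "raw data removed, metadata retained"),
}

def next_state(state: str) -> str:
    """Return the state that follows `state` on the success path."""
    i = LIFECYCLE.index(state)  # raises ValueError for an unknown state
    if i == len(LIFECYCLE) - 1:
        raise ValueError("archived is the terminal state")
    return LIFECYCLE[i + 1]

def failure_result(state: str, condition: str) -> Optional[str]:
    """Return the declared result for a failure condition, or None if undeclared."""
    entry = FAILURE_PATHS.get(state)
    if entry is not None and entry[0] == condition:
        return entry[1]
    return None

print(next_state("validating"))
print(failure_result("sealed", "hash mismatch after seal"))
```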
VI-F. Report Views
Blackbox produces audience-specific views from the same packet.
| View | Audience | Included | Excluded by default |
|---|---|---|---|
| Engineering replay | robotics engineering | raw refs, timeline, command intervals, validation errors | customer-only notes |
| Operations summary | fleet ops | trigger, downtime, operator action, next step | model internals unless needed |
| Customer report | customer / site owner | redacted timeline, corrective action, evidence summary | raw video, operator identity |
| Insurer/safety export | insurer or safety reviewer | timeline, evidence manifest, integrity status | unnecessary private footage |
This view separation is central to the product. The same incident must be technically useful internally and externally safe to share.
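One way to realize "excluded by default" is a per-view allowlist, so that any field not explicitly listed never reaches that audience. The sketch below assumes hypothetical field and view keys derived from the table; it is not the prototype's report generator.

```python
from typing import Any, Dict

# Per-view allowlists; keys are hypothetical names derived from the table above.
VIEW_ALLOWLIST = {
    "engineering_replay": {"raw_refs", "timeline", "command_intervals", "validation_errors"},
    "operations_summary": {"trigger", "downtime", "operator_action", "next_step"},
    "customer_report": {"redacted_timeline", "corrective_action", "evidence_summary"},
    "insurer_export": {"timeline", "evidence_manifest", "integrity_status"},
}

def render_view(packet: Dict[str, Any], view: str) -> Dict[str, Any]:
    """Keep only the fields that the requested view explicitly allows."""
    allowed = VIEW_ALLOWLIST[view]
    return {key: value for key, value in packet.items() if key in allowed}

# Hypothetical packet content for illustration only.
packet = {
    "raw_refs": ["front_rgb.mcap"],
    "timeline": ["takeover at -2.2 s"],
    "redacted_timeline": ["operator intervention before stop"],
    "corrective_action": "review aisle merge margin",
    "evidence_summary": "6 streams, 6 events",
    "operator_identity": "op-17",
}

customer = render_view(packet, "customer_report")
print(sorted(customer))
```

An allowlist is used here instead of a denylist so that a newly added packet field stays internal until someone deliberately adds it to a view.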
VII. Experiments
VII-A. Experimental Setup
We evaluate the prototype on a seeded incident corpus, not real customer logs. The corpus contains three deployment archetypes:
- Dock Aisle Near-Miss: warehouse AMR and worker proximity.
- Service Cart Contact: hotel delivery robot and temporary obstruction.
- Clinic Handoff Abort: mobile manipulator and command authority conflict.
Only the Dock Aisle Near-Miss currently has a complete public packet. The other two scenarios are represented in the replay console and define the next conversion targets.
The evaluation environment is the local project workspace. It includes a Next.js replay surface, JSON packet artifacts, schemas, and a Node.js validation script. Because this is not a production recorder, we do not evaluate camera encoding throughput, ROS2 subscription backpressure, or signed-manifest latency. Instead, the experiment evaluates whether the packet artifact can represent and validate a complete incident reconstruction.
VII-A1. Seeded Corpus Construction
Each seeded incident is constructed from five elements:
- incident narrative
- evidence source list
- timeline events
- responsibility or ownership trace
- stakeholder report target
The corpus is intentionally scenario-diverse. The near miss stresses human proximity and operator takeover. The contact event stresses external site evidence and property-damage explanation. The clinic abort stresses command conflict and privacy constraints.
VII-A2. Required Source Roles
| Scenario | Required source roles |
|---|---|
| Dock Aisle Near-Miss | front_rgb, depth_cloud, cmd_vel, planner_trace, operator_input, safety_events |
| Service Cart Contact | robot_camera, bumper_trace, cmd_vel, localization, site_cctv |
| Clinic Handoff Abort | arm_joints, command_owner, teleop_packets, privacy_masks, safety_gate |
VII-A3. Evaluation Boundaries
The experiment does not claim that Blackbox detects incidents in the wild. It evaluates post-incident packet reconstruction. The current prototype assumes that seed events and streams exist. Production work must add live triggers, robust buffering, and signed sealing.
VII-B. Evaluation Metrics
We use six metrics.
| Metric | Definition | Purpose |
|---|---|---|
| Source coverage | Required evidence roles present in manifest | Measures whether review has enough streams |
| Timeline consistency | Monotonic events with source references | Measures reconstruction coherence |
| Ownership coverage | Action window covered by ownership intervals | Measures command authority traceability |
| Schema validity | JSON schema validator result | Measures machine-checkable packet structure |
| Report readiness | Whether stakeholder report can be generated | Measures external communication readiness |
| Privacy separation | Raw and redacted evidence access separated | Measures shareability risk |
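The first two metrics can be computed mechanically. The sketch below is one possible reading of the definitions in the table: source coverage as the fraction of required roles present, and timeline consistency as the fraction of events that carry at least one source reference and do not precede an earlier event. The exact formulas are assumptions, and the event times in the example are illustrative rather than taken from the public packet.

```python
from typing import Iterable, List, Sequence, Tuple

def source_coverage(required: Iterable[str], present: Iterable[str]) -> float:
    """Fraction of required evidence roles that appear in the manifest."""
    required = set(required)
    return len(required & set(present)) / len(required)

def timeline_consistency(events: Sequence[Tuple[float, List[str]]]) -> float:
    """Fraction of (time, source_refs) events that are referenced and in time order."""
    if not events:
        return 0.0
    ok = 0
    latest = float("-inf")
    for t, refs in events:
        if t >= latest and refs:
            ok += 1
        latest = max(latest, t)
    return ok / len(events)

REQUIRED = ["front_rgb", "depth_cloud", "cmd_vel",
            "planner_trace", "operator_input", "safety_events"]

print(source_coverage(REQUIRED, REQUIRED))
print(source_coverage(REQUIRED, REQUIRED[:-1]))  # safety_events missing

# Illustrative timestamps in seconds relative to the trigger.
good = [(-40.0, ["planner_trace"]), (-12.5, ["front_rgb"]), (-2.2, ["operator_input"])]
bad = [(-2.2, ["operator_input"]), (-12.5, ["planner_trace"]), (0.0, [])]
print(timeline_consistency(good), timeline_consistency(bad))
```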
VII-C. Public Packet Results
| Artifact | Value |
|---|---|
| Evidence window | 330 s |
| Packet files | 4 |
| Evidence streams | 6 |
| Timeline events | 6 |
| Ownership intervals | 3 |
| Redacted customer reports | 1 |
| Validator result | pass |
| Validator wall time | 2.87 s |
The public packet reaches full source coverage for its declared required roles:
C_source = 6 / 6 = 1.00
The timeline contains six events and no timestamp reversals in the seeded artifact:
V_timeline = 1.00
The command-ownership intervals cover the full action window from -300 s to 0 s:
C_owner = 1.00
Using the prototype quality score with all declared checks passing:
Q_packet = 1.00
This value should be read carefully. It means the seeded packet is internally complete under its declared schema. It does not mean that a real deployment packet would always score 1.00.
VII-D. Scenario Coverage
| Scenario | Required source roles | Present in v0.3 | Timeline status | Ownership status | Report status |
|---|---|---|---|---|---|
| Dock Aisle Near-Miss | 6 | 6 | complete | 3 intervals | customer-safe report |
| Service Cart Contact | 5 | 5 in console | seeded | planned packet | engineering draft |
| Clinic Handoff Abort | 5 | 5 in console | seeded | planned conflict intervals | privacy-focused draft |
VII-E. Baselines
We compare against three practical baselines.
| Baseline | Description | Limitation |
|---|---|---|
| Manual log review | Engineer searches logs, video, tickets, and dashboards | Slow, non-repeatable, hard to share |
| Observability-only replay | Visualize ROS/MCAP streams | Strong debug tool, weak stakeholder packet |
| Support-ticket narrative | Operator writes incident summary | No machine-checkable evidence chain |
VII-F. Baseline Comparison
| Review task | Manual logs | Observability replay | Blackbox packet |
|---|---|---|---|
| Identify event window | operator-dependent | possible but manual | explicit trigger window |
| Locate sources | scattered | stream browser | manifest |
| Reconstruct sequence | manual | visual playback | timeline with references |
| Attribute authority | inferred | partial | ownership intervals |
| Share externally | hand-written | rarely safe | redacted report |
| Validate completeness | ad hoc | ad hoc | schema validation |
VII-G. Ablation Study
We evaluate the packet concept by removing major components.
| Configuration | Source coverage | Timeline validity | Ownership coverage | External reportability | Expected failure mode |
|---|---|---|---|---|---|
| Full packet | 1.00 | 1.00 | 1.00 | pass | reviewable packet |
| Without manifest | undefined | 1.00 | 1.00 | blocked | reviewer cannot tell which sources exist or are missing |
| Without timeline | 1.00 | undefined | 1.00 | weak | raw streams remain hard to explain to non-engineering stakeholders |
| Without ownership intervals | 1.00 | 1.00 | undefined | weak | remote takeover and safety gating collapse into vague autonomy state |
| Without privacy policy | 1.00 | 1.00 | 1.00 | blocked | customer report risks exposing raw video or operator data |
| Without schema validation | unknown | unknown | unknown | blocked | packet completeness becomes manual and inconsistent |
VII-H. Response Time Decomposition
The prototype benchmark separates reconstruction time into conceptual phases. Only schema validation is measured directly in v0.3; the other phases are design targets for the production recorder.
| Phase | v0.3 status | Measurement or target |
|---|---|---|
| Trigger detection | design target | < 250 ms after trigger event |
| Buffer freeze | design target | < 1 s for bounded window index |
| Manifest assembly | prototype artifact | manual/seeded |
| Timeline alignment | prototype artifact | manual/seeded |
| Ownership attribution | prototype artifact | manual/seeded |
| Schema validation | measured | 2.87 s |
| Customer report generation | prototype artifact | manual/seeded |
The goal for a production v1 system is not real-time root-cause analysis. It is reliable packet sealing soon enough that evidence is not lost and review can begin quickly.
VII-I. Command-Ownership Case Study
The Dock Aisle Near-Miss packet contains three intervals:
| Interval | Owner | Control surface | Evidence |
|---|---|---|---|
| -300.000 s to -2.200 s | autonomy | planner plus mobile base controller | planner_trace, cmd_vel |
| -2.200 s to -0.260 s | remote_operator | teleop_pause | operator_input, cmd_vel |
| -0.260 s to 0.000 s | safety_system | supervised_safety_stop | safety_events, cmd_vel |
This decomposition prevents a misleading single-cause narrative. The event involves site obstruction, perception confidence decay, planning margin, operator intervention, and safety stop confirmation.
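The interval table can be checked for coverage and queried for the owner at a given time. This is a minimal sketch: the interval boundaries and owners come from the table above, while the helper names and the half-open convention (start inclusive, end exclusive) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float, str]  # (start_s, end_s, owner), relative to the trigger

INTERVALS: List[Interval] = [
    (-300.0, -2.2, "autonomy"),
    (-2.2, -0.26, "remote_operator"),
    (-0.26, 0.0, "safety_system"),
]

def ownership_coverage(intervals: List[Interval], start: float, end: float) -> float:
    """Fraction of [start, end] covered by the union of ownership intervals."""
    covered = 0.0
    cursor = start
    for a, b, _owner in sorted(intervals):
        a, b = max(a, cursor), min(b, end)
        if b > a:  # count only the part not already covered
            covered += b - a
            cursor = b
    return covered / (end - start)

def owner_at(intervals: List[Interval], t: float) -> Optional[str]:
    """Owner at time t under a half-open [start, end) convention, or None in a gap."""
    for a, b, owner in intervals:
        if a <= t < b:
            return owner
    return None

print(round(ownership_coverage(INTERVALS, -300.0, 0.0), 6))
print(owner_at(INTERVALS, -1.0), owner_at(INTERVALS, -0.1))
```

Dropping the middle interval in this sketch leaves a 1.94 s gap, so coverage falls below 1.00 and `owner_at` returns `None` for times inside the gap, which is the situation the conflict marker in the failure analysis is meant to flag.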
VII-J. Computational Performance
The current prototype measures packet validation, not full stream ingestion. Running:
/usr/bin/time -p npm run validate:packet
produced:
real 2.87
user 0.95
sys 0.35
This result is not a production latency benchmark. It only verifies that the current packet schema and validator operate within a practical development workflow.
VII-K. Comparison with Baseline Systems
| System | Evidence boundary | Ownership attribution | Privacy-aware export | Machine validation | Review repeatability |
|---|---|---|---|---|---|
| Support ticket only | no | no | manual | no | low |
| Raw ROS/MCAP logs | partial | manual | no | partial | medium |
| Fleet dashboard | event state | partial | no | no | medium |
| Observability replay | streams | manual | no | partial | medium-high |
| Blackbox packet | yes | yes | yes | yes | high |
This comparison is functional rather than commercial. Blackbox should integrate with observability replay; the comparison shows why replay alone is not an incident system of record.
VII-L. Real-World Validation Plan
The next evaluation requires design-partner incidents. The study should collect three incidents per partner:
- one near miss or safety stop
- one remote takeover or command conflict
- one customer-facing task failure or low-speed contact
For each incident, reviewers compare the existing workflow against the Blackbox packet. Metrics should include:
- time-to-understanding
- reviewer agreement
- missing-source rate
- report approval rate
- customer-shareability rating
- corrective-action clarity
VII-M. Proposed Reviewer Study
The first reviewer study involves no external human subjects and can be conducted internally with robotics engineers and operations reviewers. Each reviewer receives the same seeded incident in two formats: baseline logs and report fragments, and the Blackbox packet. The reviewer answers:
- What happened?
- Which system layer most needs engineering review?
- Who or what controlled the robot during the critical interval?
- Which evidence source supports the claim?
- Could this summary be shared with a customer?
The primary metric is time-to-understanding. Secondary metrics are answer agreement, unsupported-claim count, and confidence calibration.
| Study variable | Baseline condition | Blackbox condition |
|---|---|---|
| Evidence | scattered logs and notes | packet manifest and timeline |
| Ownership | inferred manually | interval table |
| Privacy | reviewer judgment | explicit export policy |
| Output | free-form answer | structured report |
VII-N. Production Recorder Benchmark Targets
The production recorder should be evaluated separately from the current packet prototype.
| Benchmark | Target |
|---|---|
| Recorder CPU overhead | < 10% on edge host |
| Dropped command messages | 0 during incident window |
| Video frame retention | > 99% in bounded window |
| Packet trigger-to-freeze time | < 1 s |
| Manifest creation time | < 5 s for 5 min window |
| Customer report generation | < 60 s after packet seal |
| Hash verification | pass before export |
These targets are design constraints, not achieved v0.3 results.
VIII. Discussion
VIII-A. Why Packetization Matters
Logs are necessary but not sufficient. A packet makes the incident a shared object. Engineering, operations, customer success, insurance, and safety teams can discuss the same evidence boundary instead of building separate narratives.
VIII-B. Why Command Ownership Matters
Autonomy state is too coarse for modern robot fleets. Many incidents occur under shared autonomy, remote assist, cloud orchestration, and safety gating. COA exposes authority transitions that would otherwise be hidden.
VIII-C. Privacy as a Reconstruction Constraint
In homes, clinics, hotels, offices, and public spaces, raw replay is often not shareable. Redaction, retention, and access scope must be packet properties, not post-hoc edits.
VIII-D. Relationship to Existing Robotics Tools
Blackbox should integrate with ROS2, MCAP, Foxglove-style visualization, fleet operations dashboards, and support systems. The product wedge is not generic observability. It is incident evidence assembly, attribution, and reportability.
IX. Limitations and Failure Analysis
IX-A. Current Limitations
- The current system is a prototype.
- The seeded incidents are realistic but fictional.
- Only one public packet is complete.
- No real customer incident corpus has been evaluated.
- Hashes and sealing are demonstration-grade.
- Video redaction is represented as metadata and report policy, not production video processing.
- The system does not assign legal fault.
- The system does not certify safety compliance.
- The system does not prevent incidents.
IX-B. Failure Modes
| Failure mode | Impact | Mitigation |
|---|---|---|
| Missing stream | Timeline cannot support a claim | explicit missing-source flag |
| Clock drift | Events appear out of order | offset estimation and confidence scoring |
| Ambiguous command authority | Ownership interval cannot be resolved | conflict marker and arbitration table |
| Privacy mask failure | Report cannot be shared externally | raw export block and manual review |
| Hash mismatch | Packet integrity compromised | seal invalidation and audit trail |
| Overconfident root-cause draft | Reviewer may treat technical contribution as legal fault | explicit non-fault language |
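The hash-mismatch mitigation can be sketched as a re-check of stream bytes against the hashes recorded at seal time. The paper does not specify a hash algorithm; SHA-256 is assumed here, and the function names and in-memory payloads are hypothetical stand-ins for real stream files.

```python
import hashlib
from typing import Dict, List

def sha256_hex(data: bytes) -> str:
    """Hex digest of the payload (SHA-256 is an assumption, not the paper's choice)."""
    return hashlib.sha256(data).hexdigest()

def seal(payloads: Dict[str, bytes]) -> Dict[str, str]:
    """Record one hash per stream at seal time."""
    return {stream_id: sha256_hex(data) for stream_id, data in payloads.items()}

def verify_seal(sealed: Dict[str, str], payloads: Dict[str, bytes]) -> List[str]:
    """Return stream ids that are missing or whose bytes no longer match the seal."""
    mismatched = []
    for stream_id, expected in sealed.items():
        data = payloads.get(stream_id)
        if data is None or sha256_hex(data) != expected:
            mismatched.append(stream_id)
    return mismatched

# Hypothetical in-memory payloads standing in for stream files.
payloads = {"front_rgb": b"frame-bytes", "cmd_vel": b"twist-bytes"}
sealed = seal(payloads)

tampered = dict(payloads, cmd_vel=b"edited-twist-bytes")
print(verify_seal(sealed, payloads))
print(verify_seal(sealed, tampered))
```

A non-empty result would correspond to the "packet invalidated" path in the lifecycle table. An unsigned hash only detects accidental or naive modification, which is why signed manifests and key management remain listed as production work.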
IX-C. Deployment Considerations
A production system must solve:
- edge recorder reliability
- low-overhead stream selection
- MCAP/ROS2 ingest
- signed packet manifests
- key management
- retention policy enforcement
- customer-specific privacy rules
- reviewer audit trails
- fleet-scale storage cost
X. Future Work
Future work includes:
- production ROS2 and MCAP ingest
- edge recorder with rolling buffer
- signed manifests and append-only audit logs
- video and image redaction service
- command arbitration table for teleoperation systems
- Foxglove-compatible export
- insurer-facing packet view
- fleet incident graph
- design-partner user study
- time-to-understanding benchmark
- reviewer agreement benchmark
XI. Conclusion
Blackbox Robotics proposes incident packetization, replay, and command-ownership attribution for deployed robot fleets. The central claim is that robot incidents should produce structured, schema-valid evidence packets rather than scattered logs and informal narratives. By defining BIP and COA, the system makes incident reconstruction more explicit: what happened, when it happened, which evidence supports it, who controlled the robot, what privacy constraints apply, and what can be shared externally.
The current prototype demonstrates the artifact and review workflow through a public Dock Aisle Near-Miss packet and two additional seeded scenarios. The next proof is field validation with design partners. If validated, Blackbox can become a practical evidence layer between robot operations, engineering debugging, customer trust, insurance review, and safety governance.
References
[1] International Federation of Robotics, "Executive Summary World Robotics 2025 - Service Robots," 2025. https://ifr.org/img/worldrobotics/Executive_Summary_WR_2025_Service_Robots.pdf
[2] ISO, "ISO 10218-1:2025 Robotics - Safety requirements - Part 1: Industrial robots," 2025. https://www.iso.org/standard/73933.html
[3] ROS 2 Documentation, "Recording a bag from a node." https://docs.ros.org/en/rolling/Tutorials/Advanced/Recording-A-Bag-From-Your-Own-Node-CPP.html
[4] MCAP, "Open source container file format for multimodal log data." https://mcap.dev/
[5] NHTSA, "Event Data Recorder." https://www.nhtsa.gov/research-data/event-data-recorder
[6] D. J. Butler, J. Huang, F. Roesner, and M. Cakmak, "The Privacy-Utility Tradeoff for Remotely Teleoperated Robots," ACM/IEEE International Conference on Human-Robot Interaction, 2015. https://hcrlab.cs.washington.edu/publications/butler2015hri/
[7] "Identifying human-robot interaction incident archetypes: a system and network analysis of accidents," Safety Science, Volume 191, 2025. https://www.sciencedirect.com/science/article/pii/S0925753525001845
[8] Foxglove, robotics observability and visualization. https://foxglove.dev/
[9] Formant, fleet observability. https://docs.formant.io/docs/fleet-observability
[10] InOrbit, robot operations platform. https://www.inorbit.ai/
[11] T. N. Canh, T. T. Viet, T. T. Tran, and B. W. Lim, "SafeGuard ASF: SR Agentic Humanoid Robot System for Autonomous Industrial Safety," arXiv:2603.25353, 2026. https://arxiv.org/html/2603.25353v1