Technical Paper v0.3

Blackbox Robotics: Incident Packetization, Replay, and Command-Ownership Attribution for Deployed Robot Fleets

Blackbox Robotics Research Team · Independent project · June 2026

systems-paper structure · explicit limitations · public BIP v0.3

Figure 1. Six-layer incident architecture: L1 Recorder and Trigger, L2 Evidence Manifest, L3 Temporal Alignment, L4 Command Ownership, L5 Replay and Report, L6 Fleet Incident Graph. Sensor, command, operator, and safety traces are converted into a bounded packet, then aligned into a timeline, attributed across command authorities, and rendered into audience-specific reports.

Abstract

Deployed robot fleets increasingly operate in warehouses, hotels, clinics, offices, sidewalks, factories, and other environments where autonomy, remote operators, safety controllers, site policy, and bystanders interact in the same physical event. When an incident occurs, the evidence required to understand it is usually fragmented across camera files, ROS2 bags, MCAP logs, fleet dashboards, teleoperation sessions, safety controller traces, support tickets, operator notes, and customer reports. This fragmentation creates a reconstruction gap: robot teams may possess raw data, but lack a standard evidence object that explains what happened, who controlled the robot, which sources support each claim, and what can be shared externally.

We present Blackbox Robotics, an incident packetization and replay framework for deployed robot fleets. The framework defines the Blackbox Incident Packet (BIP), a bounded, schema-validated evidence object containing incident metadata, stream manifests, timeline events, command-ownership intervals, privacy policy, hash metadata, and stakeholder reports. We also introduce Command-Ownership Attribution (COA), an interval model that separates decision source, execution authority, constraint source, and intervention source during robot incidents. The system architecture is organized as a six-layer incident stack: recorder and trigger, evidence manifest, temporal alignment, ownership attribution, replay and report, and fleet incident graph.

We evaluate the v0.3 prototype using a seeded incident corpus covering three deployment archetypes: a warehouse near miss, a hotel service-cart contact event, and a clinic handoff abort caused by command authority conflict. The public Dock Aisle Near-Miss artifact contains 4 packet files, 6 evidence streams, 6 timeline events, 3 command-ownership intervals, and a 330 s evidence window. The current JSON validator passes the packet schema in 2.87 s on a developer workstation. We report source coverage, timeline consistency, ownership coverage, schema validity, baseline comparison, ablation analysis, response-time decomposition, packet-quality scoring, and deployment failure modes. The system is a prototype and does not claim legal fault assignment, safety certification, real customer deployment, or incident prevention. Its purpose is to make robot incident reconstruction testable, reviewable, and shareable.

Keywords: robot incident analysis, robot fleet operations, ROS2, MCAP, teleoperation, robot observability, command ownership, evidence packet, post-incident replay, safety review, service robots.

I. Introduction

Robot fleets are moving from controlled demonstrations into continuous operations. Professional service robots now work in logistics, cleaning, inspection, hospitality, retail, healthcare support, and delivery. The International Federation of Robotics reported that professional service robot sales grew 9% in 2024, with more than 199,000 professional service robots in its sample, and that Robotics-as-a-Service fleet size grew 31%. As the deployed base grows, robot incidents become operational events rather than isolated engineering bugs.

A single robot incident can involve multiple layers:

perception confidence changes
route replanning
local controller commands
remote operator takeover
network delay or recovery
safety controller gating
site-policy violations
customer-facing impact
privacy-sensitive video evidence
post-event support escalation

In current workflows, these traces are usually examined through separate systems. Engineers inspect ROS bags or MCAP files. Operations teams review fleet dashboards. Remote-assist teams inspect operator session logs. Safety teams ask for event details. Customer-success teams write a report. Insurance or compliance reviewers may request evidence after the fact. The result is slow and inconsistent reconstruction.

This paper argues that deployed robot fleets need an incident system of record. The core artifact should not be a folder of logs or a dashboard screenshot. It should be a bounded packet that preserves evidence, reconstructs the event timeline, attributes command authority, and produces audience-specific reports.

We make four contributions.

Blackbox Incident Packet (BIP). We define a versioned packet abstraction for robot incidents, including metadata, stream manifest, timeline, command ownership, privacy metadata, hashes, retention policy, and report views.

Command-Ownership Attribution (COA). We introduce an interval model for attributing decision source, execution authority, constraint source, and intervention source during incidents.

Six-layer incident architecture. We describe a recorder-to-report stack for robot fleets that integrates with existing ROS2, MCAP, teleoperation, safety, and fleet operations systems.

Seeded evaluation protocol. We evaluate the prototype on three incident archetypes and one public packet, reporting schema validity, source coverage, timeline consistency, ownership coverage, baseline comparison, ablation, and failure modes.

II. Related Work

II-A. Event Data Recorders and Incident Reconstruction

Vehicle event data recorders provide a useful analogy for bounded evidence capture. NHTSA describes event data recorders as systems that record technical information for a short period before, during, and after a crash. They are not full surveillance systems; they are event-triggered reconstruction artifacts. Robot fleets need a similar concept, but with additional complexity: robots may combine autonomous decision-making, teleoperation, cloud orchestration, safety gating, and human-robot interaction.

Blackbox borrows the bounded-window principle while extending the evidence model to multimodal robotics logs and command authority transitions.

II-B. Robot Data Formats and Observability

ROS2 bag tooling supports recording and playback of topic data. MCAP provides an open container format for timestamped multimodal log data. Robot observability tools such as Foxglove, Formant, and InOrbit help teams visualize logs, inspect telemetry, manage fleets, and debug behavior.

Blackbox is not a replacement for these tools. It is a packetization layer above them. A BIP can reference ROS2 bags, MCAP files, video streams, safety events, teleoperation logs, and external attachments while exposing incident-level semantics: trigger, timeline, ownership, evidence quality, privacy status, and report views.

II-C. Human-Robot Incident Analysis

Human-robot interaction safety research increasingly treats incidents as systemic events rather than single-component failures. Accident analysis often involves human behavior, environment changes, robot policy, interaction timing, and system design. Recent work on HRI incident archetypes reinforces the need to preserve both machine traces and human-facing context.

Blackbox operationalizes this by making timeline events source-referenced and by keeping customer-facing impact separate from internal debug detail.

II-D. Teleoperation, Shared Autonomy, and Privacy

Remote operation changes the reconstruction problem. In supervised autonomy, an event may include autonomy, remote operator input, stale command packets, network recovery, and safety overrides. Privacy research in teleoperated robots also shows that remote presence in human environments creates sensitive data exposure risks.

Blackbox treats teleoperation and privacy as first-class packet properties. Operator commands are not buried in unstructured logs, and raw video is not assumed to be externally shareable.

II-E. Safety Standards and Compliance Boundaries

Safety standards such as ISO 10218-1:2025 provide requirements for industrial robot safety. Blackbox does not replace certified safety systems or legal review. Its role is post-event technical reconstruction. The packet can support safety review by preserving evidence, but it does not itself certify a robot or assign legal fault.

II-F. Agentic Robotics and System Papers

Recent robotics system papers, including SafeGuard ASF, use a layered architecture, scenario-driven methods, explicit metrics, baseline comparisons, and simulation or real-world evaluation to establish credibility. Blackbox follows this systems-paper style while focusing on incident evidence infrastructure rather than humanoid hazard response.

III. Problem Formulation

Let an incident be:

I = (r, s, t0, W, C, E)

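where `r` is the robot, `s` is the site, `t0` is the trigger time, `W = [t0 - pre, t0 + post]` is the evidence window defined by the pre-trigger and post-trigger durations `pre` and `post`, `C` is the incident class (written `c` when a specific class is meant), and `E` is the set of raw evidence sources.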

The output is a Blackbox Incident Packet:

BIP = { M, S, T, O, P, H, A, R }

where:

`M` is incident metadata.
`S` is the stream manifest.
`T` is the synchronized incident timeline.
`O` is the command-ownership interval set.
`P` is privacy and retention policy.
`H` is hash, seal, and provenance metadata.
`A` is annotations.
`R` is stakeholder report output.

The reconstruction objective is to maximize review utility while satisfying bounded evidence, source traceability, privacy separation, and schema validity.

III-A. Evidence Coverage

For incident class `c`, let `Req(c)` be the required stream roles and `Obs(I)` be observed stream roles. Source coverage is:

C_source(I) = |Req(c) intersect Obs(I)| / |Req(c)|
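
A minimal Python sketch of this ratio, assuming stream roles are plain strings. The function name and the example role sets are illustrative and are not taken from the BIP schema.

```python
def source_coverage(required: set[str], observed: set[str]) -> float:
    """C_source = |Req(c) intersect Obs(I)| / |Req(c)|."""
    if not required:
        # Assumption: a class with no required roles counts as fully covered.
        return 1.0
    return len(required & observed) / len(required)


# Hypothetical near-miss requirement set; the planner trace was not logged.
required = {"video", "depth", "planner", "command", "operator", "safety"}
observed = {"video", "depth", "command", "operator", "safety", "network"}
coverage = source_coverage(required, observed)  # 5 of 6 required roles present
```

Extra observed roles (here `network`) do not raise the score, since only required roles enter the numerator.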

III-B. Timeline Consistency

For ordered timeline events `e_i`, monotonicity requires:

t(e_i) <= t(e_{i+1})

Each event must include source references. We define timeline validity:

V_timeline = 1 - (N_time_violations + N_unreferenced_events) / N_events
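
The validity measure can be sketched as follows. The `Event` type is a hypothetical stand-in for a timeline entry and carries only the two fields the formula needs.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    t: float  # packet-relative time in seconds
    source_refs: list[str] = field(default_factory=list)


def timeline_validity(events: list[Event]) -> float:
    """V_timeline = 1 - (time violations + unreferenced events) / N_events."""
    if not events:
        return 1.0  # assumption: an empty timeline has nothing to violate
    time_violations = sum(
        1 for prev, cur in zip(events, events[1:]) if cur.t < prev.t
    )
    unreferenced = sum(1 for e in events if not e.source_refs)
    return 1.0 - (time_violations + unreferenced) / len(events)


events = [
    Event(-12.0, ["planner_trace"]),
    Event(-2.2, ["operator_input"]),
    Event(-3.0, ["front_rgb"]),  # out of order relative to the previous event
    Event(0.0),                  # no source reference
]
validity = timeline_validity(events)  # 1 - (1 + 1) / 4
```

An event can be counted twice, once for ordering and once for a missing reference, so the value can fall below zero on badly damaged timelines. Clamping to `[0, 1]` would be a possible refinement.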

III-C. Command-Ownership Coverage

Let `A = [t_a, t_b]` be the action-critical interval and `O` be the set of ownership intervals. Coverage is:

C_owner = duration(union(O) intersect A) / duration(A)
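
A short sketch of the coverage computation, assuming intervals are `(start, end)` pairs in packet-relative seconds. The helper names are illustrative. The first example reuses the three Dock Aisle ownership intervals; the action-critical interval `[-5, 0]` is a hypothetical choice.

```python
Interval = tuple[float, float]


def merge_intervals(intervals: list[Interval]) -> list[Interval]:
    """Union of intervals as a sorted list of disjoint spans."""
    merged: list[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def ownership_coverage(owned: list[Interval], critical: Interval) -> float:
    """duration(union(owned) intersect critical) / duration(critical)."""
    a, b = critical
    if b <= a:
        raise ValueError("action-critical interval must have positive duration")
    covered = sum(
        max(0.0, min(end, b) - max(start, a))
        for start, end in merge_intervals(owned)
    )
    return covered / (b - a)


dock = [(-300.0, -2.2), (-2.2, -0.26), (-0.26, 0.0)]
full = ownership_coverage(dock, (-5.0, 0.0))
gappy = ownership_coverage([(-5.0, -3.0), (-1.0, 0.0)], (-4.0, 0.0))
```

Merging before intersecting avoids double-counting overlapping ownership intervals, which would otherwise push coverage above 1.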

III-D. Ownership Conflict

At time `t`, let `Active(t)` be the set of active command authorities. A conflict occurs when:

|Active(t)| > 1 and arbitration(Active(t), mode_t) = undefined

For example, a remote joystick packet overlapping with an autonomous handoff routine after network recovery is a conflict unless an explicit arbitration rule yields control.

III-E. Incident Evidence Graph

We model packet reconstruction as an evidence graph:

G_I = (V, E_g)

where vertices include raw streams, derived events, command intervals, annotations, redaction objects, and report claims:

V = V_stream union V_event union V_owner union V_policy union V_claim

Edges encode support relationships. A report claim is admissible only when it is connected to at least one event and one stream reference:

admissible(claim_j) = exists e_i, s_k such that s_k -> e_i -> claim_j

This graph prevents a customer report from becoming a free-form narrative detached from packet evidence. In v0.3 the graph is implicit in JSON references; in production it should be materialized as an indexed provenance graph.
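
The admissibility rule can be checked directly on JSON-style references. The sketch below assumes a `supports` mapping from each node to the nodes that directly support it; the node identifiers are invented for illustration.

```python
def admissible(
    claim: str,
    supports: dict[str, set[str]],
    streams: set[str],
    events: set[str],
) -> bool:
    """True when some chain stream -> event -> claim exists."""
    for event in supports.get(claim, set()) & events:
        if supports.get(event, set()) & streams:
            return True
    return False


streams = {"stream:operator_input"}
events = {"event:takeover"}
supports = {
    "claim:operator_paused": {"event:takeover"},
    "event:takeover": {"stream:operator_input"},
    "claim:floor_was_wet": set(),  # narrative claim with no evidence chain
}
ok = admissible("claim:operator_paused", supports, streams, events)
rejected = admissible("claim:floor_was_wet", supports, streams, events)
```

A claim that points only at a stream, skipping the event layer, is rejected by this check, which matches the two-hop requirement in the definition.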

III-F. Reconstruction Objective

Given raw evidence `E`, the reconstruction process selects a packet `BIP*` that maximizes a weighted utility:

BIP* = argmax_BIP [
  alpha C_source
  + beta V_timeline
  + gamma C_owner
  + delta Q_privacy
  + eta Q_integrity
  - lambda C_exposure
]

where `Q_privacy` rewards correct privacy separation, `Q_integrity` rewards valid hashes and schemas, and `C_exposure` penalizes unnecessary raw-data exposure. The weights are site-specific. For a clinic, `delta` and `lambda` should be high. For an internal warehouse engineering review, `alpha`, `beta`, and `gamma` dominate.

III-G. Packet Quality Score

We define a prototype packet quality score:

Q_packet = 0.25 C_source
         + 0.20 V_timeline
         + 0.20 C_owner
         + 0.15 Q_schema
         + 0.10 Q_privacy
         + 0.10 Q_report

The score is not a safety score and not a legal confidence score. It is a review-readiness score. A low score means the packet should not be shared externally without manual review.
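
As a worked example, the score is a fixed-weight sum. The weights below are the ones in the formula above; the component values are hypothetical.

```python
WEIGHTS = {
    "C_source": 0.25,
    "V_timeline": 0.20,
    "C_owner": 0.20,
    "Q_schema": 0.15,
    "Q_privacy": 0.10,
    "Q_report": 0.10,
}


def packet_quality(components: dict[str, float]) -> float:
    """Weighted review-readiness score; every component is expected in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical packet: full coverage, half-valid timeline, no report yet.
score = packet_quality({
    "C_source": 1.0,
    "V_timeline": 0.5,
    "C_owner": 1.0,
    "Q_schema": 1.0,
    "Q_privacy": 1.0,
    "Q_report": 0.0,
})
```

Because the weights sum to 1, the score stays in `[0, 1]` whenever each component does.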

III-H. Clock Alignment

Robot incident reconstruction depends on clock quality. Let `t_s` be source-local time and `t_r` be packet-relative time. For each stream `s`, alignment uses:

t_r = t_s + offset_s + drift_s * (t_s - t_ref)

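Here `offset_s` is the clock offset of stream `s` at the reference time `t_ref`, and `drift_s` is its drift rate relative to the packet clock. In v0.3, seeded events are already aligned. A production recorder must estimate `offset_s` and `drift_s` from heartbeat messages, ROS2 header timestamps, MCAP message index time, NTP/PTP synchronization data, or post-hoc event anchors.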
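
One simple way to obtain the two parameters, shown here only as a sketch, is to fit them from two post-hoc event anchors, each pairing a source-local timestamp with a known packet-relative time. The two-anchor fit and the `ClockModel` type are assumptions; a production recorder would likely use more anchors and a robust estimator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockModel:
    offset: float  # offset_s, seconds, measured at t_ref
    drift: float   # drift_s, seconds of error per second of source time
    t_ref: float   # source-local reference time

    def to_packet_time(self, t_s: float) -> float:
        """t_r = t_s + offset_s + drift_s * (t_s - t_ref)."""
        return t_s + self.offset + self.drift * (t_s - self.t_ref)


def fit_two_anchors(
    a: tuple[float, float], b: tuple[float, float]
) -> ClockModel:
    """Each anchor is (source-local time, packet-relative time)."""
    (s1, r1), (s2, r2) = a, b
    if s2 == s1:
        raise ValueError("anchors must have distinct source times")
    offset = r1 - s1                           # exact at the first anchor
    drift = ((r2 - s2) - offset) / (s2 - s1)   # residual growth per second
    return ClockModel(offset=offset, drift=drift, t_ref=s1)


# Hypothetical stream whose clock gains 30 ms over a 300 s window.
model = fit_two_anchors((1000.0, -300.0), (1300.0, 0.03))
```

By construction the fitted model reproduces both anchors, so any remaining error comes from nonlinearity in the real clock.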

III-I. Evidence Confidence

Each claim receives an evidence confidence score:

Q_claim = w_s Q_source + w_t Q_time + w_o Q_owner + w_p Q_policy

where `Q_source` measures source availability, `Q_time` measures timestamp alignment, `Q_owner` measures command authority clarity, and `Q_policy` measures privacy/export eligibility. The packet should expose low-confidence claims rather than hiding them. This is especially important for insurance and customer-facing review, where unsupported certainty is worse than explicit uncertainty.

III-J. Missing Evidence Semantics

Missing evidence is represented explicitly:

missing(role, reason, effect)

Examples include camera unavailable, planner trace not logged, operator session id redacted, safety event hash mismatch, or external CCTV not attached. The effect field describes how the missing source limits reconstruction. A packet with missing evidence can still be useful, but it must not pretend to be complete.

IV. System Architecture

IV-A. System Overview

Blackbox is a six-layer incident system.

L1 Recorder and Trigger
   camera, depth, lidar, control, operator, safety, network

L2 Evidence Manifest
   stream roles, hashes, retention, privacy, export scope

L3 Temporal Alignment
   relative incident timeline, clock offsets, source references

L4 Command-Ownership Attribution
   decision source, execution authority, constraint source, intervention source

L5 Replay and Report
   engineering replay, operations summary, customer-safe report, insurer export

L6 Fleet Incident Graph
   recurring incident types, site patterns, policy failures, design-partner metrics

IV-B. Deployment Model

The intended deployment has three modes:

Offline conversion. Existing logs, videos, and support records are converted into BIP files. This is the first design-partner workflow.

Edge recorder. A robot-side or gateway-side recorder maintains a rolling buffer and creates packets when triggers fire.

Fleet incident graph. Packets across sites are aggregated to expose recurring failure patterns.

IV-C. Reference Integration Platform

Blackbox is designed for mixed robot fleets rather than a single hardware platform. The reference integration assumes:

Component	Reference configuration	Purpose
Robot runtime	ROS2 or ROS-compatible bridge	topic discovery, message capture, replay export
Log container	MCAP or ROS2 bag	timestamped stream preservation
Edge host	robot compute, fleet gateway, or nearby NUC-class recorder	rolling buffer and trigger capture
Video streams	RGB, depth, site CCTV attachment	visual reconstruction and redaction
Command streams	cmd_vel, joint command, planner branch, autonomy mode	command authority reconstruction
Human input	teleop joystick, operator pause, manual mark	shared-autonomy attribution
Safety stream	e-stop, bumper, zone gate, force threshold	safety-event timeline
Report store	packet object storage plus database index	stakeholder retrieval and audit

The framework intentionally avoids depending on a particular robot morphology. It applies to AMRs, hotel delivery robots, sidewalk robots, mobile manipulators, inspection robots, and humanoid fleets when they produce timestamped operational traces.

IV-D. Stream Budget and Capture Policy

The recorder must bound storage. For each incident class `c`, a capture profile defines required roles, preferred roles, pre-window, post-window, and privacy default.

Incident class	Pre-window	Post-window	Required roles	Privacy default
Near miss	300 s	30 s	video, depth/lidar, planner, command, operator, safety	redacted external
Low-speed contact	180 s	60 s	video, bumper/force, command, localization, site context	redacted external
Remote takeover	180 s	45 s	autonomy mode, operator input, command, network, safety	internal by default
Command conflict	180 s	45 s	command authority, teleop session, autonomy state, safety gate	internal by default
Task failure	120 s	120 s	mission state, operator notes, customer event, video optional	customer-safe summary

In production, the capture profile should be configurable per customer site. A hospital, hotel, warehouse, and sidewalk deployment should not have the same export policy.

IV-D1. Storage Model

For a stream set `S` and evidence window `W`, the packet storage budget is:

Bytes(BIP) = sum_{s in S} bitrate_s * duration_s + metadata + indexes

For event-only streams such as safety events or operator commands, the cost is small. Video and depth streams dominate. A practical recorder therefore uses:

continuous low-rate metadata capture
bounded high-rate video/depth ring buffer
event-triggered manifest sealing
post-trigger downsampling for external report views
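
A back-of-the-envelope sketch of the storage budget, with `bitrate_s` expressed in bytes per second so that the formula applies directly. All rates and the overhead figure are hypothetical; only the 330 s window comes from the Dock Aisle packet.

```python
def packet_bytes(
    streams: dict[str, tuple[int, int]], overhead_bytes: int = 0
) -> int:
    """streams maps role -> (bytes per second, retained duration in seconds)."""
    payload = sum(rate * duration for rate, duration in streams.values())
    return payload + overhead_bytes


WINDOW_S = 330
streams = {
    "front_rgb": (500_000, WINDOW_S),      # compressed video, assumed rate
    "depth_cloud": (1_000_000, WINDOW_S),  # assumed rate
    "cmd_vel": (2_000, WINDOW_S),          # event-like stream, assumed rate
}
total = packet_bytes(streams, overhead_bytes=50_000)  # metadata + indexes
video_depth = packet_bytes(
    {k: streams[k] for k in ("front_rgb", "depth_cloud")}
)
share = video_depth / total
```

With these assumed rates, video and depth account for more than 99% of the packet, which is why the ring buffer and downsampling strategies target those streams.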

IV-D2. Retention Model

Retention is split into raw retention, redacted retention, and metadata retention:

retention(packet) = { raw_days, redacted_days, metadata_days }

For example, a clinic packet may retain raw video for 7 days, redacted video for 90 days, and metadata for 365 days. A warehouse packet may retain raw data longer if customer privacy risk is lower. This difference must be policy-driven rather than hardcoded.

IV-D3. Export Eligibility

External export is allowed only when:

export_ok = schema_pass
          and seal_valid
          and privacy_pass
          and no_required_source_missing
          and audience_scope_allows

This condition is intentionally stricter than internal replay. Engineering teams may inspect incomplete packets; customers and insurers should not receive packets with unresolved privacy or integrity failures.
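
The export condition is a plain conjunction, sketched below. The `ExportChecks` type and the `blocking_reasons` helper are illustrative additions; returning the failed checks makes a blocked export explainable to the reviewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportChecks:
    schema_pass: bool
    seal_valid: bool
    privacy_pass: bool
    no_required_source_missing: bool
    audience_scope_allows: bool


def blocking_reasons(checks: ExportChecks) -> list[str]:
    """Names of the checks that currently fail."""
    return [name for name, passed in vars(checks).items() if not passed]


def export_ok(checks: ExportChecks) -> bool:
    return not blocking_reasons(checks)


internal_only = ExportChecks(
    schema_pass=True,
    seal_valid=True,
    privacy_pass=False,  # e.g. redaction not yet applied
    no_required_source_missing=True,
    audience_scope_allows=True,
)
reasons = blocking_reasons(internal_only)
```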

IV-E. Recorder and Trigger Layer (L1)

The recorder maintains a bounded ring buffer for allowlisted streams. It is triggered by:

safety stop
bumper event
human-proximity threshold
remote takeover
command conflict
mission failure
customer escalation
manual operator mark

Algorithm 1 shows the trigger and seal process.

Algorithm 1: Triggered Packet Seal
Input: stream buffers B, trigger rule g, incident class c, window W
Output: packet index BIP

1. continuously append allowlisted streams to B
2. if g(B_t) fires at time t0:
3.     freeze interval [t0 - pre(c), t0 + post(c)]
4.     enumerate stream roles and source availability
5.     compute hashes for immutable references
6.     attach trigger metadata and site context
7.     emit packet draft
8.     run schema and policy checks
9.     seal packet if checks pass
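
Algorithm 1 can be sketched with a time-bounded deque. This is an illustrative reduction, not the prototype's recorder: messages are `(time, role, payload)` tuples, the draft is a plain dictionary, SHA-256 stands in for the hashing step, and the only policy check is that required roles are present.

```python
import hashlib
from collections import deque


class RingRecorder:
    def __init__(self, horizon_s: float):
        self.horizon_s = horizon_s
        self.buffer: deque[tuple[float, str, bytes]] = deque()

    def append(self, t: float, role: str, payload: bytes) -> None:
        """Step 1: append, then evict anything older than the horizon."""
        self.buffer.append((t, role, payload))
        while self.buffer and self.buffer[0][0] < t - self.horizon_s:
            self.buffer.popleft()

    def freeze(self, t0: float, pre: float, post: float) -> dict:
        """Steps 3-7: cut the window, list roles, hash, emit a draft."""
        window = [m for m in self.buffer if t0 - pre <= m[0] <= t0 + post]
        digest = hashlib.sha256()
        for t, role, payload in window:
            digest.update(f"{t:.6f}|{role}|".encode() + payload)
        return {
            "t0": t0,
            "window": [t0 - pre, t0 + post],
            "roles": sorted({role for _, role, _ in window}),
            "sha256": digest.hexdigest(),
            "sealed": False,
        }


def seal(draft: dict, required_roles: set[str]) -> dict:
    """Steps 8-9: seal only when every required role is present."""
    missing = sorted(required_roles - set(draft["roles"]))
    return {**draft, "missing_roles": missing, "sealed": not missing}


rec = RingRecorder(horizon_s=400.0)
rec.append(0.0, "front_rgb", b"f0")
rec.append(100.0, "front_rgb", b"f1")
rec.append(350.0, "safety_events", b"stop")  # trigger at t0 = 350
rec.append(360.0, "cmd_vel", b"v=0")
draft = rec.freeze(t0=350.0, pre=300.0, post=30.0)
sealed = seal(draft, {"front_rgb", "safety_events"})
```

In a live recorder, `freeze` would have to be deferred until the post-window has elapsed; here it is called after all messages have arrived.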

IV-F. Evidence Manifest Layer (L2)

The manifest is the packet's evidence index. It records modality, source role, duration, sample rate, storage reference, hash, retention policy, privacy class, and export scope. The manifest allows a reviewer to know which evidence exists without exposing raw data to every audience.

IV-G. Temporal Alignment Layer (L3)

Blackbox maps all evidence to relative incident time, where `t = 0` is the trigger. The timeline is not a raw dump of every event. It is a structured reconstruction with source references and confidence levels.

Algorithm 2 gives temporal alignment.

Algorithm 2: Timeline Alignment
Input: stream manifest S, raw event candidates E, trigger time t0
Output: aligned timeline T

1. estimate clock offset and drift for each source stream
2. transform source timestamps into packet-relative time
3. group event candidates within merge window epsilon
4. attach source references and confidence values
5. remove duplicate low-confidence events
6. sort timeline by relative time
7. flag gaps, missing references, and time reversals
8. emit T with quality metrics
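
A compact sketch of the alignment steps, under simplifying assumptions: each stream has a constant offset (drift is omitted), duplicates are candidates with the same label within `epsilon` seconds, and sorting happens before merging so that duplicates are adjacent. The `Candidate` type and all values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    stream: str
    t_source: float  # source-local timestamp in seconds
    label: str
    confidence: float


def align(
    candidates: list[Candidate],
    offsets: dict[str, float],
    t0: float,
    epsilon: float = 0.05,
) -> list[dict]:
    # Steps 1-2 and 6: apply per-stream offsets, make times trigger-relative, sort.
    timed = sorted(
        ((c.t_source + offsets[c.stream] - t0, c) for c in candidates),
        key=lambda pair: pair[0],
    )
    timeline: list[dict] = []
    for t_rel, c in timed:
        last = timeline[-1] if timeline else None
        if last and last["label"] == c.label and t_rel - last["t"] <= epsilon:
            # Steps 3-5: merge the duplicate, keep refs and the higher confidence.
            last["source_refs"].append(c.stream)
            last["confidence"] = max(last["confidence"], c.confidence)
        else:
            timeline.append({
                "t": t_rel,
                "label": c.label,
                "source_refs": [c.stream],
                "confidence": c.confidence,
            })
    return timeline


timeline = align(
    [
        Candidate("safety_events", 1000.00, "safety_stop", 0.9),
        Candidate("cmd_vel", 1002.51, "safety_stop", 0.6),  # clock 2.5 s ahead
        Candidate("operator_input", 997.80, "operator_pause", 0.8),
    ],
    offsets={"safety_events": 0.0, "cmd_vel": -2.5, "operator_input": 0.0},
    t0=1000.0,
)
```

The sketch compares only against the most recent event, so duplicates separated by an unrelated event inside the merge window would not be merged. Gap and reversal flagging (step 7) is left out.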

IV-H. Command-Ownership Attribution Layer (L4)

COA represents ownership as intervals:

o_i = (t_start, t_end, d_i, x_i, c_i, h_i, q_i)

where:

`d_i` is decision source.
`x_i` is execution authority.
`c_i` is constraint source.
`h_i` is human intervention source.
`q_i` is evidence quality.

Algorithm 3 describes interval construction.

Algorithm 3: Command-Ownership Attribution
Input: autonomy state A, controller commands U, operator events H, safety events S
Output: ownership interval set O

1. collect all authority transition candidates from A, U, H, S
2. sort candidates by relative time
3. initialize current owner from autonomy mode
4. for each candidate k:
5.     close previous interval at t_k
6.     assign decision source, execution authority, and constraint source
7.     attach evidence references
8.     mark conflict if two active authorities lack arbitration
9. merge adjacent intervals with identical authority tuple
10. emit O with coverage and confidence metrics
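
A reduced sketch of interval construction. It tracks a single owner per interval rather than the full `(d_i, x_i, c_i, h_i)` tuple and omits conflict marking; transition candidates are assumed to arrive as `(time, new_owner, evidence_ref)` tuples. The example reproduces the three intervals of the public Dock Aisle packet.

```python
def build_intervals(
    transitions: list[tuple[float, str, str]],
    window_start: float,
    window_end: float,
    initial_owner: str,
) -> list[dict]:
    intervals: list[dict] = []
    start, owner, refs = window_start, initial_owner, []
    for t, new_owner, ref in sorted(transitions):
        if new_owner == owner:
            # Same authority: extend the open interval (the merge step).
            refs.append(ref)
            continue
        # Close the previous interval at the transition time.
        intervals.append(
            {"start": start, "end": t, "owner": owner, "evidence_refs": refs}
        )
        start, owner, refs = t, new_owner, [ref]
    intervals.append(
        {"start": start, "end": window_end, "owner": owner, "evidence_refs": refs}
    )
    return intervals


intervals = build_intervals(
    transitions=[
        (-2.2, "remote_operator", "operator_input"),
        (-0.26, "safety_system", "safety_events"),
    ],
    window_start=-300.0,
    window_end=0.0,
    initial_owner="autonomy",
)
spans = [(i["start"], i["end"], i["owner"]) for i in intervals]
```

Because each interval closes exactly where the next one opens, the result has no gaps inside the window, so ownership coverage over any sub-interval is complete.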

IV-I. Replay and Report Layer (L5)

The replay surface has audience-specific views:

Engineering view: raw stream refs, event sources, command buffers, schema validation.
Operations view: incident class, intervention, downtime, next action.
Customer view: redacted timeline, explanation, corrective action.
Insurance or safety view: source integrity, timeline, ownership intervals, report provenance.

Report generation follows Algorithm 4.

Algorithm 4: Redacted Report Generation
Input: sealed packet BIP, audience a, policy P
Output: report R_a

1. select claims whose policy scope includes audience a
2. replace raw stream references with redacted summaries when required
3. include missing-source and confidence notes
4. include timeline, command ownership, and corrective action
5. attach packet id, seal status, and report generation time
6. block export if privacy or integrity checks fail

IV-J. Fleet Incident Graph Layer (L6)

The graph aggregates packets across a fleet. Nodes include robots, sites, incident classes, stream roles, ownership conflicts, trigger policies, privacy classes, and corrective actions. The goal is not only to replay one incident, but to detect repeated patterns across deployments.

IV-K. Tool Categories

Similar to a robotics tool orchestration system, Blackbox can be decomposed into incident tools.

Category	Count in reference design	Example tools
Capture	7	record_video, record_mcap, record_cmd, record_operator, record_safety, attach_cctv, manual_mark
Parse	6	parse_ros2, parse_mcap, parse_teleop, parse_safety, parse_network, parse_ticket
Align	4	estimate_offset, merge_events, check_monotonicity, detect_gap
Attribute	5	infer_owner, detect_conflict, resolve_arbitration, score_confidence, merge_interval
Privacy	5	mask_face, mask_badge, mask_screen, restrict_export, retention_check
Report	5	engineering_report, customer_report, insurer_report, corrective_action, provenance_export
Validate	5	schema_check, hash_check, policy_check, completeness_check, seal_packet

This decomposition makes the product build path concrete. The first implementation does not need every tool, but the paper defines the target system boundary.

V. Incident Reconstruction Methods

V-A. Scenario 1: Human-Proximity Near Miss

V-A1. Detection Phase

The trigger fires when a human-proximity alert, route deviation, operator takeover, or safety stop occurs within a configured time window. Required evidence roles include front RGB, depth or lidar, planner trace, command output, operator input, and safety event.

Near-miss severity uses distance, time-to-contact, robot speed, intervention latency, and human track confidence:

S_near = w_d exp(-d_min / sigma_d)
       + w_t exp(-ttc_min / sigma_t)
       + w_v min(v_robot / v_ref, 1)
       + w_o operator_delay
       + w_h human_confidence

where `d_min` is closest approach, `ttc_min` is minimum predicted time-to-contact, and `operator_delay` is normalized delay between proximity alert and intervention. This is not a legal risk score. It is a packet triage score used to determine review priority.

The trigger predicate is:

trigger_near = (d_min < d_warn)
            or (ttc_min < tau_warn)
            or takeover_after_proximity
            or safety_stop_after_proximity
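
The score and predicate translate directly into code. Every numeric default below (weights, `sigma_d`, `sigma_t`, `v_ref`, `d_warn`, `tau_warn`) is a placeholder chosen for illustration; the paper does not fix these values.

```python
import math


def near_miss_severity(
    d_min: float,             # closest approach, metres
    ttc_min: float,           # minimum predicted time-to-contact, seconds
    v_robot: float,           # robot speed, m/s
    operator_delay: float,    # normalized to [0, 1]
    human_confidence: float,  # human track confidence in [0, 1]
    *,
    weights: tuple[float, float, float, float, float] = (0.3, 0.3, 0.2, 0.1, 0.1),
    sigma_d: float = 1.0,
    sigma_t: float = 2.0,
    v_ref: float = 1.5,
) -> float:
    w_d, w_t, w_v, w_o, w_h = weights
    return (
        w_d * math.exp(-d_min / sigma_d)
        + w_t * math.exp(-ttc_min / sigma_t)
        + w_v * min(v_robot / v_ref, 1.0)
        + w_o * operator_delay
        + w_h * human_confidence
    )


def trigger_near(
    d_min: float,
    ttc_min: float,
    takeover_after_proximity: bool,
    safety_stop_after_proximity: bool,
    *,
    d_warn: float = 1.0,
    tau_warn: float = 2.0,
) -> bool:
    return (
        d_min < d_warn
        or ttc_min < tau_warn
        or takeover_after_proximity
        or safety_stop_after_proximity
    )


close = near_miss_severity(0.5, 1.0, 1.2, 0.4, 0.9)
far = near_miss_severity(2.0, 1.0, 1.2, 0.4, 0.9)
```

With weights summing to 1 and the normalized inputs above, the score is bounded by 1, which it reaches only at zero distance, zero time-to-contact, and saturated speed, delay, and confidence.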

V-A2. Attribution Phase

The near-miss reconstruction separates environment, perception, planning, operator, and safety layers. In the public Dock Aisle Near-Miss packet, autonomy owns the interval from -300 s to -2.2 s, the remote operator owns the pause interval from -2.2 s to -0.26 s, and the safety system owns the final stop interval from -0.26 s to 0 s.

The reviewer is expected to answer four questions:

Did the robot perceive the human or obstruction?
Did planning select a legal but uncomfortable route?
Did a human operator intervene before or after safety escalation?
Did the safety layer constrain motion as expected?

The packet should support each answer with a stream reference rather than narrative memory.

V-A3. Report Phase

The customer report excludes raw unredacted video. It summarizes the trigger, closest approach, intervention, and corrective action while preserving the internal evidence chain.

For a near-miss report, the recommended external fields are:

Field	External report	Internal replay
Incident class	yes	yes
Closest approach	yes, rounded	exact
Raw RGB video	no	yes, restricted
Operator session id	no	yes
Planner branches	summary	full trace
Corrective action	yes	yes

V-B. Scenario 2: Low-Speed Contact

V-B1. Contact Detection

A low-speed contact packet is triggered by bumper trace, force threshold, velocity discontinuity, or operator mark. Required sources include robot camera, bumper or force trace, command stream, localization, and optional external CCTV.

The contact trigger combines physical and semantic signals:

trigger_contact =
  bumper_event
  or (force_z > F_limit)
  or (abs(v_cmd - v_odom) > tau_v and obstacle_range < d_contact)
  or manual_contact_mark

The packet distinguishes contact from near miss by requiring an impact, bumper, force, or post-event operator/customer mark.

V-B2. Evidence Alignment

The method aligns robot-side streams with site-side evidence. For customer-facing environments such as hotels, the report must separate robot perception failure from temporary site obstruction or third-party interference.

Contact attribution uses four candidate contributors:

contributors = {perception, planning_margin, site_state, operator_action}

Each contributor receives a technical contribution label: `primary`, `secondary`, `observed`, or `not_supported`. The label must include source references. If evidence is missing, the packet should say "not determined" instead of inventing a cause.

V-B3. Root-Cause Draft

The report does not assign legal fault. It identifies technical contributors: perception reclassification, route clearance, site obstruction, operator absence, or safety trigger timing.

The root-cause draft is generated only after manifest, timeline, and ownership checks pass. If checks fail, the report is downgraded to evidence summary.

V-C. Scenario 3: Command Authority Conflict

V-C1. Conflict Detection

Command conflict occurs when autonomy and teleoperation are simultaneously active without a valid arbitration rule. The clinic handoff abort scenario models network recovery during an autonomous handoff routine.

Let `u_a(t)` be autonomy command activity, `u_h(t)` be human operator command activity, and `g(t)` be the arbitration state. Conflict is:

conflict(t) = u_a(t) and u_h(t) and (g(t) = undefined)

The trigger condition is:

trigger_conflict = exists t in W such that conflict(t) = true
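
On sampled data, the conflict predicate yields conflict intervals by grouping consecutive conflicting samples. The sketch assumes samples of the form `(t, u_a, u_h, g)`, where `g` is `None` when arbitration is undefined; this sample layout is an assumption, not the packet format.

```python
from typing import Optional

Sample = tuple[float, bool, bool, Optional[str]]


def conflict_intervals(samples: list[Sample]) -> list[tuple[float, float]]:
    intervals: list[tuple[float, float]] = []
    start: Optional[float] = None
    for t, u_a, u_h, g in samples:
        in_conflict = u_a and u_h and g is None
        if in_conflict and start is None:
            start = t                     # conflict begins
        elif not in_conflict and start is not None:
            intervals.append((start, t))  # conflict ends at the first clear sample
            start = None
    if start is not None:
        intervals.append((start, samples[-1][0]))  # still open at window end
    return intervals


samples: list[Sample] = [
    (-1.0, True, False, "autonomy"),
    (-0.6, True, True, None),  # teleop packets resume during autonomous handoff
    (-0.4, True, True, None),
    (-0.2, False, True, "remote_operator"),
    (0.0, False, True, "remote_operator"),
]
conflicts = conflict_intervals(samples)
trigger_conflict = bool(conflicts)
```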

V-C2. Ownership Lock

When conflict is detected, the packet records the overlapping authorities, session IDs, safety gate action, and blocked motion. This helps distinguish an autonomy failure from a remote-assist concurrency failure.

The ownership-lock protocol is:

freeze motion if active control authority is ambiguous
record autonomy mode and teleop session id
attach latest command packet from each authority
record arbitration table state
mark safety gate decision
emit conflict interval

V-C3. Privacy Response

For clinical or home environments, the packet must apply privacy masks and restrict raw exports by default.

The clinic scenario requires `privacy_default = restricted`. Any external report must remove patient identifiers, operator identifiers, screen content, and raw room audio unless explicitly approved.

V-D. Redaction and Privacy Policy

Algorithm 5 shows the privacy gate applied before any external export.

Algorithm 5: Privacy Gate
Input: sealed packet BIP, audience a, policy P
Output: allow_export or block_export

1. check packet seal and schema status
2. enumerate streams referenced by report claims
3. for each stream s:
4.     check privacy class, retention policy, and audience scope
5.     require redaction proof for sensitive visual/audio streams
6.     block export if any required proof is missing
7. emit allow_export only when all checks pass
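
A sketch of the gate as a function that returns both the decision and the reasons for a block. The `StreamPolicy` fields are a simplified, hypothetical view of the manifest's privacy metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamPolicy:
    role: str
    sensitive: bool                 # visual/audio stream needing redaction
    audiences: frozenset            # audiences allowed to see this stream
    retention_ok: bool = True
    redaction_proof: Optional[str] = None  # e.g. hash of the redacted file


def privacy_gate(
    sealed: bool,
    schema_ok: bool,
    streams: list[StreamPolicy],
    audience: str,
) -> tuple[bool, list[str]]:
    reasons: list[str] = []
    if not (sealed and schema_ok):
        reasons.append("packet seal or schema check failed")
    for s in streams:
        if audience not in s.audiences:
            reasons.append(f"{s.role}: audience '{audience}' out of scope")
        if not s.retention_ok:
            reasons.append(f"{s.role}: retention expired")
        if s.sensitive and s.redaction_proof is None:
            reasons.append(f"{s.role}: redaction proof missing")
    return (not reasons), reasons


scope = frozenset({"engineering", "customer"})
claim_streams = [
    StreamPolicy("front_rgb", sensitive=True, audiences=scope),
    StreamPolicy("safety_events", sensitive=False, audiences=scope),
]
allowed, why = privacy_gate(True, True, claim_streams, "customer")
```

Only streams referenced by report claims should be passed in, so an unredacted stream that no claim cites does not block the export.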

V-E. Packet Integrity

The prototype uses JSON schema validation and demonstration hashes. A production implementation should use signed manifests, append-only audit trails, key management, and export provenance.
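
As a small step toward that direction, a manifest can be sealed over a canonical JSON encoding. The sketch uses an HMAC from the Python standard library as a stand-in for a real signature; it is symmetric, so it does not provide the third-party verifiability that signed manifests would. The key and manifest contents are placeholders.

```python
import hashlib
import hmac
import json


def canonical_bytes(obj: dict) -> bytes:
    """Deterministic encoding: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_manifest(manifest: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, key: bytes, seal: str) -> bool:
    # compare_digest avoids leaking match position through timing.
    return hmac.compare_digest(seal_manifest(manifest, key), seal)


key = b"demo-key-not-for-production"
manifest = {"incident": "dock-17", "streams": ["front_rgb", "cmd_vel"]}
seal = seal_manifest(manifest, key)

tampered = {**manifest, "streams": ["front_rgb"]}
reordered = {"streams": ["front_rgb", "cmd_vel"], "incident": "dock-17"}
```

Canonical encoding matters because two JSON files with the same content but different key order would otherwise produce different seals.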

VI. Prototype Implementation

The v0.3 project includes:

Next.js website
public technical paper route
public whitepaper route
seeded incident replay console
three incident scenarios
public Dock Aisle Near-Miss packet
JSON schemas for packet, stream manifest, timeline, and command ownership
validation script
redacted customer report sample
internal pitch deck kept outside the public website

VI-A. Public Packet Artifact

The public packet is located at:

public/samples/dock-17/
  packet.json
  streams-manifest.json
  timeline.json
  command-ownership.json

The public report is located at:

public/reports/dock-17-customer-report.md

VI-B. Tool Categories

Table I summarizes the contents of the public packet artifact.

Category	Count in v0.3	Examples
Evidence streams	6	front_rgb, depth_cloud, cmd_vel, planner_trace, operator_input, safety_events
Timeline events	6	route deviation, confidence decay, merge command, worker entry, takeover, safety seal
Ownership intervals	3	autonomy, remote_operator, safety_system
Packet files	4	packet, manifest, timeline, command ownership
Public reports	1	redacted customer report

VI-C. Packet File Schema

The prototype stores one incident across four JSON files.

File	Primary objects	Validation role
packet.json	incident id, robot, site, trigger, summary, privacy, seal state	packet-level completeness
streams-manifest.json	stream list, modality, sample rate, storage ref, hash, export scope	evidence availability
timeline.json	relative time events, actors, detail, source refs	reconstruction order
command-ownership.json	intervals, owner, control surface, transition reason, evidence refs	authority attribution

The schema intentionally separates stream existence from timeline claims. A stream can be present without being used in a report, and a report claim is not allowed unless it points back to evidence.

VI-D. Topic Role Mapping

Robot fleets use different topic names. Blackbox maps fleet-specific topics into stable incident roles.

Incident role	Example ROS2/robot source	Packet role
front_rgb	/camera/front/color/image_raw	visual scene
depth_cloud	/camera/depth/points or /scan	geometry and proximity
cmd_vel	/cmd_vel or base command	executed base command
planner_trace	/planner/debug, branch log, route plan	decision support
operator_input	teleop joystick or remote pause event	human intervention
autonomy_mode	/mode, mission state, behavior tree state	decision source
safety_events	e-stop, bumper, zone gate, force threshold	constraint source
network_state	teleop connection, latency, reconnect	command conflict context

The mapping file is a deployment artifact. Without it, teams can record data but cannot reliably compare incident packets across robots.
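
A mapping file of this kind could be applied as follows. This is a sketch under assumptions: the topic names come from the table above, while the dictionary layout and the `map_topics` helper are hypothetical and do not describe the actual deployment artifact.

```python
# Fleet-specific topic -> stable incident role (layout is an assumption).
ROLE_MAP = {
    "/camera/front/color/image_raw": "front_rgb",
    "/camera/depth/points": "depth_cloud",
    "/scan": "depth_cloud",
    "/cmd_vel": "cmd_vel",
    "/planner/debug": "planner_trace",
    "/mode": "autonomy_mode",
}


def map_topics(
    topics: list[str], role_map: dict[str, str]
) -> tuple[dict[str, list[str]], list[str]]:
    """Group recorded topics by incident role and list topics with no mapping."""
    by_role: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for topic in topics:
        role = role_map.get(topic)
        if role is None:
            unmapped.append(topic)
        else:
            by_role.setdefault(role, []).append(topic)
    return by_role, unmapped


by_role, unmapped = map_topics(["/scan", "/cmd_vel", "/diagnostics"], ROLE_MAP)
print(by_role)   # {'depth_cloud': ['/scan'], 'cmd_vel': ['/cmd_vel']}
print(unmapped)  # ['/diagnostics']
```

Reporting unmapped topics separately keeps a missing mapping visible instead of silently dropping the stream.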

VI-E. Packet Lifecycle State Machine

The packet lifecycle is:

buffering -> triggered -> assembling -> validating -> sealed -> reported -> archived

State transitions have explicit failure paths:

State	Failure condition	Result
triggered	missing pre-window	packet marked partial
assembling	stream hash unavailable	packet marked unsealed
validating	schema failure	report export blocked
sealed	hash mismatch after seal	packet invalidated
reported	privacy check failure	customer export blocked
archived	retention expired	raw data removed, metadata retained
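
The lifecycle and its failure paths can be sketched as a table-driven state machine. The states, failure conditions, and results are copied from the text above; the `Packet` class and its methods are hypothetical. Whether a packet continues through the lifecycle after a failure is not specified here, so the sketch only records the result.

```python
ORDER = ["buffering", "triggered", "assembling", "validating", "sealed", "reported", "archived"]

# (state, failure condition) -> result recorded on the packet.
FAILURES = {
    ("triggered", "missing pre-window"): "packet marked partial",
    ("assembling", "stream hash unavailable"): "packet marked unsealed",
    ("validating", "schema failure"): "report export blocked",
    ("sealed", "hash mismatch after seal"): "packet invalidated",
    ("reported", "privacy check failure"): "customer export blocked",
    ("archived", "retention expired"): "raw data removed, metadata retained",
}


class Packet:
    def __init__(self) -> None:
        self.state = "buffering"
        self.flags: list[str] = []

    def advance(self) -> str:
        """Move to the next lifecycle state; archived is terminal."""
        index = ORDER.index(self.state)
        if index == len(ORDER) - 1:
            raise ValueError("archived is the final state")
        self.state = ORDER[index + 1]
        return self.state

    def fail(self, condition: str) -> str:
        """Record the result of a failure condition defined for the current state."""
        result = FAILURES.get((self.state, condition))
        if result is None:
            raise ValueError(f"no failure path for {condition!r} in state {self.state!r}")
        self.flags.append(result)
        return result


packet = Packet()
packet.advance()
print(packet.state, "->", packet.fail("missing pre-window"))
# triggered -> packet marked partial
```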

VI-F. Report Views

Blackbox produces audience-specific views from the same packet.

View	Audience	Included	Excluded by default
Engineering replay	robotics engineering	raw refs, timeline, command intervals, validation errors	customer-only notes
Operations summary	fleet ops	trigger, downtime, operator action, next step	model internals unless needed
Customer report	customer / site owner	redacted timeline, corrective action, evidence summary	raw video, operator identity
Insurer/safety export	insurer or safety reviewer	timeline, evidence manifest, integrity status	unnecessary private footage

This view separation is central to the product. The same incident packet must be technically useful internally and safe to share externally.
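
One way to picture view separation is as a default-exclusion filter over a single packet. This is a simplified sketch: the view names, field names, and packet content are placeholders, and a real customer view would also redact the timeline rather than only drop fields.

```python
from typing import Any

# Fields withheld by default per view (names are illustrative assumptions).
EXCLUDED_BY_DEFAULT = {
    "engineering_replay": {"customer_notes"},
    "customer_report": {"raw_video_refs", "operator_identity"},
}


def render_view(packet: dict[str, Any], view: str) -> dict[str, Any]:
    """Return a copy of the packet without the fields excluded for this view."""
    excluded = EXCLUDED_BY_DEFAULT[view]
    return {key: value for key, value in packet.items() if key not in excluded}


# Placeholder packet content, not the public sample.
packet = {
    "incident_id": "example-incident",
    "timeline": ["takeover", "safety seal"],
    "raw_video_refs": ["front_rgb"],
    "operator_identity": "operator-placeholder",
    "customer_notes": "placeholder note",
}
print(sorted(render_view(packet, "customer_report")))
# ['customer_notes', 'incident_id', 'timeline']
```

Both views are derived from the same object, so a correction to the packet propagates to every audience.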

VII. Experiments

VII-A. Experimental Setup

We evaluate the prototype on a seeded incident corpus, not real customer logs. The corpus contains three deployment archetypes:

Dock Aisle Near-Miss: warehouse AMR and worker proximity.
Service Cart Contact: hotel delivery robot and temporary obstruction.
Clinic Handoff Abort: mobile manipulator and command authority conflict.

Only the Dock Aisle Near-Miss currently has a complete public packet. The other two scenarios are represented in the replay console and are the next targets for conversion into full packets.

The evaluation environment is the local project workspace. It includes a Next.js replay surface, JSON packet artifacts, schemas, and a Node.js validation script. Because this is not a production recorder, we do not evaluate camera encoding throughput, ROS2 subscription backpressure, or signed-manifest latency. Instead, the experiment evaluates whether the packet artifact can represent and validate a complete incident reconstruction.

VII-A1. Seeded Corpus Construction

Each seeded incident is constructed from five elements:

incident narrative
evidence source list
timeline events
responsibility or ownership trace
stakeholder report target

The corpus is intentionally scenario-diverse. The near miss stresses human proximity and operator takeover. The contact event stresses external site evidence and property-damage explanation. The clinic abort stresses command conflict and privacy constraints.

VII-A2. Required Source Roles

Scenario	Required source roles
Dock Aisle Near-Miss	front_rgb, depth_cloud, cmd_vel, planner_trace, operator_input, safety_events
Service Cart Contact	robot_camera, bumper_trace, cmd_vel, localization, site_cctv
Clinic Handoff Abort	arm_joints, command_owner, teleop_packets, privacy_masks, safety_gate

VII-A3. Evaluation Boundaries

The experiment does not claim that Blackbox detects incidents in the wild. It evaluates post-incident packet reconstruction. The current prototype assumes that seed events and streams exist. Production work must add live triggers, robust buffering, and signed sealing.

VII-B. Evaluation Metrics

We use six metrics.

Metric	Definition	Purpose
Source coverage	Required evidence roles present in manifest	Measures whether review has enough streams
Timeline consistency	Monotonic events with source references	Measures reconstruction coherence
Ownership coverage	Action window covered by ownership intervals	Measures command authority traceability
Schema validity	JSON schema validator result	Measures machine-checkable packet structure
Report readiness	Whether stakeholder report can be generated	Measures external communication readiness
Privacy separation	Raw and redacted evidence access separated	Measures shareability risk
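
The first two metrics can be computed directly from a manifest and a timeline. The sketch below assumes hypothetical field names (`t`, `source_refs`) and returns timeline consistency as a boolean; it is not the prototype's scoring code, and the event times are illustrative.

```python
def source_coverage(required: set[str], present: set[str]) -> float:
    """Fraction of required evidence roles that appear in the manifest."""
    if not required:
        raise ValueError("no required roles declared")
    return len(required & present) / len(required)


def timeline_consistent(events: list[dict]) -> bool:
    """True when event times never decrease and every event cites a source."""
    times = [event["t"] for event in events]
    ordered = all(earlier <= later for earlier, later in zip(times, times[1:]))
    referenced = all(event["source_refs"] for event in events)
    return ordered and referenced


required = {"front_rgb", "depth_cloud", "cmd_vel", "planner_trace", "operator_input", "safety_events"}
events = [
    {"t": -2.200, "source_refs": ["operator_input"]},  # illustrative times in seconds
    {"t": -0.260, "source_refs": ["safety_events"]},
]
print(source_coverage(required, required))  # 1.0
print(timeline_consistent(events))          # True
```

Dropping one required role from `present` lowers coverage to 5/6, and reversing the two events makes the consistency check fail.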

VII-C. Public Packet Results

Artifact	Value
Evidence window	330 s
Packet files	4
Evidence streams	6
Timeline events	6
Ownership intervals	3
Redacted customer reports	1
Validator result	pass
Validator wall time	2.87 s

The public packet reaches full source coverage for its declared required roles:

C_source = 6 / 6 = 1.00

The timeline contains six events and no timestamp reversals in the seeded artifact:

V_timeline = 1.00

The command-ownership intervals cover the full action window from -300 s to 0 s:

C_owner = 1.00

Using the prototype quality score with all declared checks passing:

Q_packet = 1.00

This value should be read carefully. It means the seeded packet is internally complete under its declared schema. It does not mean that a real deployment packet would always score 1.00.

VII-D. Scenario Coverage

Scenario	Required source roles	Present in v0.3	Timeline status	Ownership status	Report status
Dock Aisle Near-Miss	6	6	complete	3 intervals	customer-safe report
Service Cart Contact	5	5 in console	seeded	planned packet	engineering draft
Clinic Handoff Abort	5	5 in console	seeded	planned conflict intervals	privacy-focused draft

VII-E. Baselines

We compare against three practical baselines.

Baseline	Description	Limitation
Manual log review	Engineer searches logs, video, tickets, and dashboards	Slow, non-repeatable, hard to share
Observability-only replay	Visualize ROS/MCAP streams	Strong debug tool, weak stakeholder packet
Support-ticket narrative	Operator writes incident summary	No machine-checkable evidence chain

VII-F. Baseline Comparison

Review task	Manual logs	Observability replay	Blackbox packet
Identify event window	operator-dependent	possible but manual	explicit trigger window
Locate sources	scattered	stream browser	manifest
Reconstruct sequence	manual	visual playback	timeline with references
Attribute authority	inferred	partial	ownership intervals
Share externally	hand-written	rarely safe	redacted report
Validate completeness	ad hoc	ad hoc	schema validation

VII-G. Ablation Study

We evaluate the packet concept by removing major components.

Configuration	Source coverage	Timeline validity	Ownership coverage	External reportability	Expected failure mode
Full packet	1.00	1.00	1.00	pass	reviewable packet
Without manifest	undefined	1.00	1.00	blocked	reviewer cannot tell which sources exist or are missing
Without timeline	1.00	undefined	1.00	weak	raw streams remain hard to explain to non-engineering stakeholders
Without ownership intervals	1.00	1.00	undefined	weak	remote takeover and safety gating collapse into vague autonomy state
Without privacy policy	1.00	1.00	1.00	blocked	customer report risks exposing raw video or operator data
Without schema validation	unknown	unknown	unknown	blocked	packet completeness becomes manual and inconsistent

VII-H. Response Time Decomposition

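The prototype benchmark separates reconstruction time into conceptual phases. Only schema validation is measured directly in v0.3. Trigger detection and buffer freeze are design targets for the production recorder, and the remaining phases are manual, seeded prototype artifacts with no timing measurement.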

Phase	v0.3 status	Measurement or target
Trigger detection	design target	< 250 ms after trigger event
Buffer freeze	design target	< 1 s for bounded window index
Manifest assembly	prototype artifact	manual/seeded
Timeline alignment	prototype artifact	manual/seeded
Ownership attribution	prototype artifact	manual/seeded
Schema validation	measured	2.87 s
Customer report generation	prototype artifact	manual/seeded

The goal for a production v1 system is not real-time root-cause analysis. It is reliable packet sealing soon enough that evidence is not lost and review can begin quickly.

VII-I. Command-Ownership Case Study

The Dock Aisle Near-Miss packet contains three intervals:

Interval	Owner	Control surface	Evidence
-300.000 s to -2.200 s	autonomy	planner plus mobile base controller	planner_trace, cmd_vel
-2.200 s to -0.260 s	remote_operator	teleop_pause	operator_input, cmd_vel
-0.260 s to 0.000 s	safety_system	supervised_safety_stop	safety_events, cmd_vel

This decomposition prevents a misleading single-cause narrative. The event involves site obstruction, perception confidence decay, planning margin, operator intervention, and safety stop confirmation.
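
The coverage claim for these three intervals can be checked with a short sweep over the action window. This is a minimal sketch: the interval bounds are taken from the table above, while the tuple layout and the `ownership_coverage` helper are assumptions.

```python
def ownership_coverage(
    intervals: list[tuple[float, float, str]], window_start: float, window_end: float
) -> float:
    """Fraction of [window_start, window_end] covered by the union of ownership intervals."""
    if window_end <= window_start:
        raise ValueError("window must have positive length")
    covered = 0.0
    cursor = window_start  # everything before the cursor is already counted
    for start, end, _owner in sorted(intervals):
        start = max(start, cursor)  # skip any overlap with earlier intervals
        end = min(end, window_end)
        if end > start:
            covered += end - start
            cursor = end
    return covered / (window_end - window_start)


intervals = [
    (-300.000, -2.200, "autonomy"),
    (-2.200, -0.260, "remote_operator"),
    (-0.260, 0.000, "safety_system"),
]
print(f"{ownership_coverage(intervals, -300.0, 0.0):.2f}")  # 1.00
```

Removing the remote_operator interval leaves a 1.94 s gap, so coverage drops below 1.00 and the unattributed span becomes visible.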

VII-J. Computational Performance

The current prototype measures packet validation, not full stream ingestion. Running:

/usr/bin/time -p npm run validate:packet

produced:

real 2.87
user 0.95
sys 0.35

This result is not a production latency benchmark. It only verifies that the current packet schema and validator operate within a practical development workflow.

VII-K. Comparison with Baseline Systems

System	Evidence boundary	Ownership attribution	Privacy-aware export	Machine validation	Review repeatability
Support ticket only	no	no	manual	no	low
Raw ROS/MCAP logs	partial	manual	no	partial	medium
Fleet dashboard	event state	partial	no	no	medium
Observability replay	streams	manual	no	partial	medium-high
Blackbox packet	yes	yes	yes	yes	high

This comparison is functional rather than commercial. Blackbox should integrate with observability replay; the comparison shows why replay alone is not an incident system of record.

VII-L. Real-World Validation Plan

The next evaluation requires design-partner incidents. The study should collect three incidents per partner:

one near miss or safety stop
one remote takeover or command conflict
one customer-facing task failure or low-speed contact

For each incident, reviewers compare the existing workflow against the Blackbox packet. Metrics should include:

time-to-understanding
reviewer agreement
missing-source rate
report approval rate
customer-shareability rating
corrective-action clarity

VII-M. Proposed Reviewer Study

The first reviewer study can be conducted internally with robotics engineers and operations reviewers, without recruiting external participants. Each reviewer receives the same seeded incident in two formats: baseline log and report fragments, and the Blackbox packet. The reviewer answers:

What happened?
Which system layer most needs engineering review?
Who or what controlled the robot during the critical interval?
Which evidence source supports the claim?
Could this summary be shared with a customer?

The primary metric is time-to-understanding. Secondary metrics are answer agreement, unsupported-claim count, and confidence calibration.

Study variable	Baseline condition	Blackbox condition
Evidence	scattered logs and notes	packet manifest and timeline
Ownership	inferred manually	interval table
Privacy	reviewer judgment	explicit export policy
Output	free-form answer	structured report

VII-N. Production Recorder Benchmark Targets

The production recorder should be evaluated separately from the current packet prototype.

Benchmark	Target
Recorder CPU overhead	< 10% on edge host
Dropped command messages	0 during incident window
Video frame retention	> 99% in bounded window
Packet trigger-to-freeze time	< 1 s
Manifest creation time	< 5 s for 5 min window
Customer report generation	< 60 s after packet seal
Hash verification	pass before export

These targets are design constraints, not achieved v0.3 results.
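
The numeric targets can be encoded as a machine-checkable gate for a future benchmark run. The sketch below is an assumption about how such a gate might look: the key names are hypothetical, the hash-verification target is omitted because it is pass/fail, and the measurements are placeholder numbers for illustration only, not results.

```python
import operator

# Benchmark name -> (comparison, bound), taken from the target table above.
TARGETS = {
    "recorder_cpu_overhead_pct": (operator.lt, 10.0),
    "dropped_command_messages": (operator.eq, 0),
    "video_frame_retention_pct": (operator.gt, 99.0),
    "trigger_to_freeze_s": (operator.lt, 1.0),
    "manifest_creation_s": (operator.lt, 5.0),
    "customer_report_generation_s": (operator.lt, 60.0),
}


def missed_targets(measured: dict[str, float]) -> list[str]:
    """Return benchmark names that are unmeasured or fail their target."""
    missed = []
    for name, (compare, bound) in TARGETS.items():
        value = measured.get(name)
        if value is None or not compare(value, bound):
            missed.append(name)
    return missed


# Placeholder numbers for illustration only; not measured results.
example_run = {
    "recorder_cpu_overhead_pct": 12.0,
    "dropped_command_messages": 0,
    "video_frame_retention_pct": 99.5,
    "trigger_to_freeze_s": 0.8,
    "manifest_creation_s": 4.2,
}
print(missed_targets(example_run))
# ['recorder_cpu_overhead_pct', 'customer_report_generation_s']
```

Treating an unmeasured benchmark as a miss keeps a partial run from being read as a pass.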

VIII. Discussion

VIII-A. Why Packetization Matters

Logs are necessary but not sufficient. A packet makes the incident a shared object. Engineering, operations, customer success, insurance, and safety teams can discuss the same evidence boundary instead of building separate narratives.

VIII-B. Why Command Ownership Matters

Autonomy state is too coarse for modern robot fleets. Many incidents occur under shared autonomy, remote assist, cloud orchestration, and safety gating. COA exposes authority transitions that would otherwise be hidden.

VIII-C. Privacy as a Reconstruction Constraint

In homes, clinics, hotels, offices, and public spaces, raw replay is often not shareable. Redaction, retention, and access scope must be packet properties, not post-hoc edits.

VIII-D. Relationship to Existing Robotics Tools

Blackbox should integrate with ROS2, MCAP, Foxglove-style visualization, fleet operations dashboards, and support systems. The product wedge is not generic observability. It is incident evidence assembly, attribution, and reportability.

IX. Limitations and Failure Analysis

IX-A. Current Limitations

The current system is a prototype.
The seeded incidents are realistic but fictional.
Only one public packet is complete.
No real customer incident corpus has been evaluated.
Hashes and sealing are demonstration-grade.
Video redaction is represented as metadata and report policy, not production video processing.
The system does not assign legal fault.
The system does not certify safety compliance.
The system does not prevent incidents.

IX-B. Failure Modes

Failure mode	Impact	Mitigation
Missing stream	Timeline cannot support a claim	explicit missing-source flag
Clock drift	Events appear out of order	offset estimation and confidence scoring
Ambiguous command authority	Ownership interval cannot be resolved	conflict marker and arbitration table
Privacy mask failure	Report cannot be shared externally	raw export block and manual review
Hash mismatch	Packet integrity compromised	seal invalidation and audit trail
Overconfident root-cause draft	Reviewer may treat technical contribution as legal fault	explicit non-fault language
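
The hash-mismatch mitigation can be illustrated with a re-verification pass over sealed files. This is a sketch under assumptions: SHA-256 is chosen here for illustration, since the hash algorithm is not named in this section, and the file contents are placeholders.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the given bytes using SHA-256 (algorithm choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def verify_seal(sealed_hashes: dict[str, str], current_files: dict[str, bytes]) -> list[str]:
    """Return names of files that are missing or no longer match their sealed hash."""
    mismatched = []
    for name, expected in sealed_hashes.items():
        data = current_files.get(name)
        if data is None or sha256_hex(data) != expected:
            mismatched.append(name)
    return mismatched


# Placeholder contents standing in for packet files.
files = {"packet.json": b'{"id": "example"}', "timeline.json": b"[]"}
sealed = {name: sha256_hex(data) for name, data in files.items()}

tampered = dict(files, **{"timeline.json": b"[1]"})
print(verify_seal(sealed, files))     # []
print(verify_seal(sealed, tampered))  # ['timeline.json']
```

A non-empty result corresponds to the "packet invalidated" outcome in the lifecycle table; an audit trail would additionally record when and by whom the check was run.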

IX-C. Deployment Considerations

A production system must solve:

edge recorder reliability
low-overhead stream selection
MCAP/ROS2 ingest
signed packet manifests
key management
retention policy enforcement
customer-specific privacy rules
reviewer audit trails
fleet-scale storage cost

X. Future Work

Future work includes:

production ROS2 and MCAP ingest
edge recorder with rolling buffer
signed manifests and append-only audit logs
video and image redaction service
command arbitration table for teleoperation systems
Foxglove-compatible export
insurer-facing packet view
fleet incident graph
design-partner user study
time-to-understanding benchmark
reviewer agreement benchmark

XI. Conclusion

Blackbox Robotics proposes incident packetization, replay, and command-ownership attribution for deployed robot fleets. The central claim is that robot incidents should produce structured, schema-valid evidence packets rather than scattered logs and informal narratives. By defining BIP and COA, the system makes incident reconstruction more explicit: what happened, when it happened, which evidence supports it, who controlled the robot, what privacy constraints apply, and what can be shared externally.

The current prototype demonstrates the artifact and review workflow through a public Dock Aisle Near-Miss packet and two additional seeded scenarios. The next proof is field validation with design partners. If validated, Blackbox can become a practical evidence layer between robot operations, engineering debugging, customer trust, insurance review, and safety governance.

References

[1] International Federation of Robotics, "Executive Summary World Robotics 2025 - Service Robots," 2025. https://ifr.org/img/worldrobotics/Executive_Summary_WR_2025_Service_Robots.pdf

[2] ISO, "ISO 10218-1:2025 Robotics - Safety requirements - Part 1: Industrial robots," 2025. https://www.iso.org/standard/73933.html

[3] ROS 2 Documentation, "Recording a bag from a node." https://docs.ros.org/en/rolling/Tutorials/Advanced/Recording-A-Bag-From-Your-Own-Node-CPP.html

[4] MCAP, "Open source container file format for multimodal log data." https://mcap.dev/

[5] NHTSA, "Event Data Recorder." https://www.nhtsa.gov/research-data/event-data-recorder

[6] D. J. Butler, J. Huang, F. Roesner, and M. Cakmak, "The Privacy-Utility Tradeoff for Remotely Teleoperated Robots," ACM/IEEE International Conference on Human-Robot Interaction, 2015. https://hcrlab.cs.washington.edu/publications/butler2015hri/

[7] "Identifying human-robot interaction incident archetypes: a system and network analysis of accidents," Safety Science, Volume 191, 2025. https://www.sciencedirect.com/science/article/pii/S0925753525001845

[8] Foxglove, robotics observability and visualization. https://foxglove.dev/

[9] Formant, fleet observability. https://docs.formant.io/docs/fleet-observability

[10] InOrbit, robot operations platform. https://www.inorbit.ai/

[11] T. N. Canh, T. T. Viet, T. T. Tran, and B. W. Lim, "SafeGuard ASF: SR Agentic Humanoid Robot System for Autonomous Industrial Safety," arXiv:2603.25353, 2026. https://arxiv.org/html/2603.25353v1