Technical Paper v0.3

Blackbox Robotics: Incident Packetization, Replay, and Command-Ownership Attribution for Deployed Robot Fleets

Blackbox Robotics Research Team
Independent project
June 2026
systems-paper structure · explicit limitations · public BIP v0.3
Figure 1

Six-layer incident architecture

L1 Recorder and Trigger
L2 Evidence Manifest
L3 Temporal Alignment
L4 Command Ownership
L5 Replay and Report
L6 Fleet Incident Graph

Sensor, command, operator, and safety traces are converted into a bounded packet, then aligned into a timeline, attributed across command authorities, and rendered into audience-specific reports.

Abstract

Deployed robot fleets increasingly operate in warehouses, hotels, clinics, offices, sidewalks, factories, and other environments where autonomy, remote operators, safety controllers, site policy, and bystanders interact in the same physical event. When an incident occurs, the evidence required to understand it is usually fragmented across camera files, ROS2 bags, MCAP logs, fleet dashboards, teleoperation sessions, safety controller traces, support tickets, operator notes, and customer reports. This fragmentation creates a reconstruction gap: robot teams may possess raw data, but lack a standard evidence object that explains what happened, who controlled the robot, which sources support each claim, and what can be shared externally.

We present Blackbox Robotics, an incident packetization and replay framework for deployed robot fleets. The framework defines the Blackbox Incident Packet (BIP), a bounded, schema-validated evidence object containing incident metadata, stream manifests, timeline events, command-ownership intervals, privacy policy, hash metadata, and stakeholder reports. We also introduce Command-Ownership Attribution (COA), an interval model that separates decision source, execution authority, constraint source, and intervention source during robot incidents. The system architecture is organized as a six-layer incident stack: recorder and trigger, evidence manifest, temporal alignment, ownership attribution, replay and report, and fleet incident graph.

We evaluate the v0.3 prototype using a seeded incident corpus covering three deployment archetypes: a warehouse near miss, a hotel service-cart contact event, and a clinic handoff abort caused by command authority conflict. The public Dock Aisle Near-Miss artifact contains 4 packet files, 6 evidence streams, 6 timeline events, 3 command-ownership intervals, and a 330 s evidence window. The current JSON validator passes the packet schema in 2.87 s on a developer workstation. We report source coverage, timeline consistency, ownership coverage, schema validity, baseline comparison, ablation analysis, response-time decomposition, packet-quality scoring, and deployment failure modes. The system is a prototype and does not claim legal fault assignment, safety certification, real customer deployment, or incident prevention. Its purpose is to make robot incident reconstruction testable, reviewable, and shareable.

Keywords: robot incident analysis, robot fleet operations, ROS2, MCAP, teleoperation, robot observability, command ownership, evidence packet, post-incident replay, safety review, service robots.

I. Introduction

Robot fleets are moving from controlled demonstrations into continuous operations. Professional service robots now work in logistics, cleaning, inspection, hospitality, retail, healthcare support, and delivery. The International Federation of Robotics reported 9% growth in professional service robot sales in 2024, more than 199,000 professional service robots in its sample, and 31% growth in Robotics-as-a-Service fleet size. As the deployed base grows, robot incidents become operational events rather than isolated engineering bugs.

A single robot incident can involve multiple layers:

  • perception confidence changes
  • route replanning
  • local controller commands
  • remote operator takeover
  • network delay or recovery
  • safety controller gating
  • site-policy violations
  • customer-facing impact
  • privacy-sensitive video evidence
  • post-event support escalation

In current workflows, these traces are usually examined through separate systems. Engineers inspect ROS bags or MCAP files. Operations teams review fleet dashboards. Remote-assist teams inspect operator session logs. Safety teams ask for event details. Customer-success teams write a report. Insurance or compliance reviewers may request evidence after the fact. The result is slow and inconsistent reconstruction.

This paper argues that deployed robot fleets need an incident system of record. The core artifact should not be a folder of logs or a dashboard screenshot. It should be a bounded packet that preserves evidence, reconstructs the event timeline, attributes command authority, and produces audience-specific reports.

We make four contributions.

  1. Blackbox Incident Packet (BIP). We define a versioned packet abstraction for robot incidents, including metadata, stream manifest, timeline, command ownership, privacy metadata, hashes, retention policy, and report views.
  2. Command-Ownership Attribution (COA). We introduce an interval model for attributing decision source, execution authority, constraint source, and intervention source during incidents.
  3. Six-layer incident architecture. We describe a recorder-to-report stack for robot fleets that integrates with existing ROS2, MCAP, teleoperation, safety, and fleet operations systems.
  4. Seeded evaluation protocol. We evaluate the prototype on three incident archetypes and one public packet, reporting schema validity, source coverage, timeline consistency, ownership coverage, baseline comparison, ablation, and failure modes.

II-A. Event Data Recorders and Incident Reconstruction

Vehicle event data recorders provide a useful analogy for bounded evidence capture. NHTSA describes event data recorders as systems that record technical information for a short period before, during, and after a crash. They are not full surveillance systems; they are event-triggered reconstruction artifacts. Robot fleets need a similar concept, but with additional complexity: robots may combine autonomous decision-making, teleoperation, cloud orchestration, safety gating, and human-robot interaction.

Blackbox borrows the bounded-window principle while extending the evidence model to multimodal robotics logs and command authority transitions.

II-B. Robot Data Formats and Observability

ROS2 bag tooling supports recording and playback of topic data. MCAP provides an open container format for timestamped multimodal log data. Robot observability tools such as Foxglove, Formant, and InOrbit help teams visualize logs, inspect telemetry, manage fleets, and debug behavior.

Blackbox is not a replacement for these tools. It is a packetization layer above them. A BIP can reference ROS2 bags, MCAP files, video streams, safety events, teleoperation logs, and external attachments while exposing incident-level semantics: trigger, timeline, ownership, evidence quality, privacy status, and report views.

II-C. Human-Robot Incident Analysis

Human-robot interaction safety research increasingly treats incidents as systemic events rather than single-component failures. Accident analysis often involves human behavior, environment changes, robot policy, interaction timing, and system design. Recent work on HRI incident archetypes reinforces the need to preserve both machine traces and human-facing context.

Blackbox operationalizes this by making timeline events source-referenced and by keeping customer-facing impact separate from internal debug detail.

II-D. Teleoperation, Shared Autonomy, and Privacy

Remote operation changes the reconstruction problem. In supervised autonomy, an event may include autonomy, remote operator input, stale command packets, network recovery, and safety overrides. Privacy research in teleoperated robots also shows that remote presence in human environments creates sensitive data exposure risks.

Blackbox treats teleoperation and privacy as first-class packet properties. Operator commands are not buried in unstructured logs, and raw video is not assumed to be externally shareable.

II-E. Safety Standards and Compliance Boundaries

Safety standards such as ISO 10218-1:2025 provide requirements for industrial robot safety. Blackbox does not replace certified safety systems or legal review. Its role is post-event technical reconstruction. The packet can support safety review by preserving evidence, but it does not itself certify a robot or assign legal fault.

II-F. Agentic Robotics and System Papers

Recent robotics system papers, including SafeGuard ASF, use a layered architecture, scenario-driven methods, explicit metrics, baseline comparisons, and simulation or real-world evaluation to establish credibility. Blackbox follows this systems-paper style while focusing on incident evidence infrastructure rather than humanoid hazard response.

III. Problem Formulation

Let an incident be:

I = (r, s, t0, W, C, E)

where `r` is the robot, `s` is the site, `t0` is the trigger time, `W = [t0 - pre, t0 + post]` is the evidence window, `C` is the incident class, and `E` is the set of raw evidence sources.

The output is a Blackbox Incident Packet:

BIP = { M, S, T, O, P, H, A, R }

where:

  • `M` is incident metadata.
  • `S` is the stream manifest.
  • `T` is the synchronized incident timeline.
  • `O` is the command-ownership interval set.
  • `P` is privacy and retention policy.
  • `H` is hash, seal, and provenance metadata.
  • `A` is annotations.
  • `R` is stakeholder report output.

The reconstruction objective is to maximize review utility while satisfying bounded evidence, source traceability, privacy separation, and schema validity.

III-A. Evidence Coverage

For incident class `c`, let `Req(c)` be the required stream roles and `Obs(I)` be observed stream roles. Source coverage is:

C_source(I) = |Req(c) intersect Obs(I)| / |Req(c)|
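
A minimal Python sketch of the coverage ratio follows. The role names and the example sets are illustrative assumptions, not the contents of any packet in the corpus.

```python
def source_coverage(required_roles, observed_roles):
    """C_source(I) = |Req(c) intersect Obs(I)| / |Req(c)|."""
    required = set(required_roles)
    if not required:
        raise ValueError("Req(c) must not be empty")
    return len(required & set(observed_roles)) / len(required)


# Hypothetical near-miss profile in which the planner trace was not recorded.
req = {"video", "depth_lidar", "planner", "command", "operator", "safety"}
obs = {"video", "depth_lidar", "command", "operator", "safety", "network"}
coverage = source_coverage(req, obs)  # five of six required roles observed
```

Extra observed roles (here `network`) do not raise the score, because only required roles enter the numerator.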

III-B. Timeline Consistency

For ordered timeline events `e_i`, monotonicity requires:

t(e_i) <= t(e_{i+1})

Each event must include source references. We define timeline validity:

V_timeline = 1 - (N_time_violations + N_unreferenced_events) / N_events
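
The validity metric can be sketched in Python as below. The event layout (a `t` field in relative seconds and a `refs` list) is an assumption made for illustration and is not taken from the packet schema.

```python
def timeline_validity(events):
    """V_timeline for events given in packet order.

    Each event is a dict with 't' (relative seconds) and 'refs' (source ids).
    """
    if not events:
        raise ValueError("timeline must contain at least one event")
    # A time violation is an event that is earlier than its predecessor.
    time_violations = sum(
        1 for prev, cur in zip(events, events[1:]) if cur["t"] < prev["t"]
    )
    unreferenced = sum(1 for e in events if not e.get("refs"))
    return 1 - (time_violations + unreferenced) / len(events)


timeline = [
    {"t": -12.0, "refs": ["planner_trace"]},
    {"t": -4.5, "refs": ["depth_cloud"]},
    {"t": -5.0, "refs": ["front_rgb"]},  # time reversal
    {"t": 0.0, "refs": []},              # no source reference
]
v = timeline_validity(timeline)  # 1 - (1 + 1) / 4
```

Because an event can be both out of order and unreferenced, the two counts can overlap; a stricter variant could count each faulty event once.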

III-C. Command-Ownership Coverage

Let `A_crit = [t_a, t_b]` be the action-critical interval (distinct from the annotation set `A` in the packet definition) and `O` be the set of ownership intervals. Coverage is:

C_owner = duration(union(O) intersect A_crit) / duration(A_crit)
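
The union in the numerator matters when ownership intervals overlap. The following Python sketch clips each interval to the action-critical interval and sweeps left to right so that overlapping time is counted once; the interval values are hypothetical.

```python
def ownership_coverage(intervals, critical):
    """Fraction of the action-critical interval covered by the union of intervals."""
    a_start, a_end = critical
    if a_end <= a_start:
        raise ValueError("action-critical interval must have positive duration")
    # Clip every ownership interval to the critical interval, dropping empty ones.
    clipped = sorted(
        (max(s, a_start), min(e, a_end))
        for s, e in intervals
        if min(e, a_end) > max(s, a_start)
    )
    covered, cursor = 0.0, a_start
    for s, e in clipped:
        s = max(s, cursor)  # skip time already counted by an earlier interval
        if e > s:
            covered += e - s
            cursor = e
    return covered / (a_end - a_start)


owners = [(-10.0, -2.0), (-3.0, -1.0)]  # overlapping, with a gap before t = 0
cov = ownership_coverage(owners, (-5.0, 0.0))  # 4 s covered out of 5 s
```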

III-D. Ownership Conflict

At time `t`, let `Active(t)` be the set of active command authorities. A conflict occurs when:

|Active(t)| > 1 and arbitration(Active(t), mode_t) = undefined

For example, a remote joystick packet overlapping with an autonomous handoff routine after network recovery is a conflict unless an explicit arbitration rule yields control.

III-E. Incident Evidence Graph

We model packet reconstruction as an evidence graph:

G_I = (V, E_g)

where vertices include raw streams, derived events, command intervals, annotations, redaction objects, and report claims:

V = V_stream union V_event union V_owner union V_policy union V_claim

Edges encode support relationships. A report claim is admissible only when it is connected to at least one event and one stream reference:

admissible(claim_j) = exists e_i, s_k such that s_k -> e_i -> claim_j

This graph prevents a customer report from becoming a free-form narrative detached from packet evidence. In v0.3 the graph is implicit in JSON references; in production it should be materialized as an indexed provenance graph.
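
As a sketch of how the admissibility rule could be checked over JSON-style references, the Python below treats the graph as two lookups: claim to events, and event to streams. The field names `event_refs` and `stream_refs` are assumptions for illustration, not the v0.3 schema.

```python
def admissible(claim, events):
    """True if some event cited by the claim cites at least one stream."""
    return any(
        events.get(event_id, {}).get("stream_refs")
        for event_id in claim.get("event_refs", [])
    )


events = {
    "ev_takeover": {"stream_refs": ["operator_input"]},
    "ev_orphan": {"stream_refs": []},  # event with no stream support
}
claims = [
    {"id": "c1", "event_refs": ["ev_takeover"]},
    {"id": "c2", "event_refs": ["ev_orphan"]},
    {"id": "c3", "event_refs": []},    # free-form narrative claim
]
result = {c["id"]: admissible(c, events) for c in claims}
```

Only `c1` has a complete stream-to-event-to-claim path; the other two claims would be excluded from a report under this rule.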

III-F. Reconstruction Objective

Given raw evidence `E`, the reconstruction process selects a packet `BIP*` that maximizes a weighted utility:

BIP* = argmax_BIP [
  alpha C_source
  + beta V_timeline
  + gamma C_owner
  + delta Q_privacy
  + eta Q_integrity
  - lambda C_exposure
]

where `Q_privacy` rewards correct privacy separation, `Q_integrity` rewards valid hashes and schemas, and `C_exposure` penalizes unnecessary raw-data exposure. The weights are site-specific. For a clinic, `delta` and `lambda` should be high. For an internal warehouse engineering review, `alpha`, `beta`, and `gamma` dominate.

III-G. Packet Quality Score

We define a prototype packet quality score:

Q_packet = 0.25 C_source
         + 0.20 V_timeline
         + 0.20 C_owner
         + 0.15 Q_schema
         + 0.10 Q_privacy
         + 0.10 Q_report

The score is not a safety score and not a legal confidence score. It is a review-readiness score. A low score means the packet should not be shared externally without manual review.
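
A short Python sketch of the score is given below. It assumes every component is normalized to [0, 1], which the text does not state explicitly; under that assumption the score also lies in [0, 1], since the weights sum to 1. The component values are made up.

```python
WEIGHTS = {
    "C_source": 0.25,
    "V_timeline": 0.20,
    "C_owner": 0.20,
    "Q_schema": 0.15,
    "Q_privacy": 0.10,
    "Q_report": 0.10,
}


def packet_quality(components):
    """Weighted review-readiness score; components are assumed to be in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical packet: one required source missing, privacy review half done.
score = packet_quality({
    "C_source": 0.8, "V_timeline": 1.0, "C_owner": 1.0,
    "Q_schema": 1.0, "Q_privacy": 0.5, "Q_report": 1.0,
})
```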

III-H. Clock Alignment

Robot incident reconstruction depends on clock quality. Let `t_s` be source-local time and `t_r` be packet-relative time. For each stream `s`, alignment uses:

t_r = t_s + offset_s + drift_s * (t_s - t_ref)

In v0.3, seeded events are already aligned. A production recorder must estimate `offset_s` and `drift_s` from heartbeat messages, ROS2 header timestamps, MCAP message index time, NTP/PTP synchronization data, or post-hoc event anchors.
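
To illustrate the post-hoc anchor option, the Python sketch below applies the alignment equation and fits `offset_s` and `drift_s` from two anchor events whose packet-relative times are known. Taking `t_ref` as the source time of the first anchor is an assumption, as are the anchor values; a production estimator would likely use more anchors and a robust fit.

```python
def to_packet_time(t_s, offset_s, drift_s, t_ref):
    """t_r = t_s + offset_s + drift_s * (t_s - t_ref)."""
    return t_s + offset_s + drift_s * (t_s - t_ref)


def fit_two_anchors(anchor_a, anchor_b):
    """Fit (offset_s, drift_s, t_ref) from two (t_source, t_packet) anchors."""
    (ts_a, tr_a), (ts_b, tr_b) = anchor_a, anchor_b
    if ts_b == ts_a:
        raise ValueError("anchors must have distinct source times")
    offset = tr_a - ts_a                        # exact at the first anchor
    drift = ((tr_b - ts_b) - offset) / (ts_b - ts_a)
    return offset, drift, ts_a                  # t_ref is the first anchor


# Hypothetical anchors: (source-local seconds, packet-relative seconds).
offset, drift, t_ref = fit_two_anchors((100.0, -20.0), (200.0, 80.5))
midpoint = to_packet_time(150.0, offset, drift, t_ref)
```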

III-I. Evidence Confidence

Each claim receives an evidence confidence score:

Q_claim = w_s Q_source + w_t Q_time + w_o Q_owner + w_p Q_policy

where `Q_source` measures source availability, `Q_time` measures timestamp alignment, `Q_owner` measures command authority clarity, and `Q_policy` measures privacy/export eligibility. The packet should expose low-confidence claims rather than hiding them. This is especially important for insurance and customer-facing review, where unsupported certainty is worse than explicit uncertainty.

III-J. Missing Evidence Semantics

Missing evidence is represented explicitly:

missing(role, reason, effect)

Examples include camera unavailable, planner trace not logged, operator session id redacted, safety event hash mismatch, or external CCTV not attached. The effect field describes how the missing source limits reconstruction. A packet with missing evidence can still be useful, but it must not pretend to be complete.

IV. System Architecture

IV-A. System Overview

Blackbox is a six-layer incident system.

L1 Recorder and Trigger
   camera, depth, lidar, control, operator, safety, network

L2 Evidence Manifest
   stream roles, hashes, retention, privacy, export scope

L3 Temporal Alignment
   relative incident timeline, clock offsets, source references

L4 Command-Ownership Attribution
   decision source, execution authority, constraint source, intervention source

L5 Replay and Report
   engineering replay, operations summary, customer-safe report, insurer export

L6 Fleet Incident Graph
   recurring incident types, site patterns, policy failures, design-partner metrics

IV-B. Deployment Model

The intended deployment has three modes:

  1. Offline conversion. Existing logs, videos, and support records are converted into BIP files. This is the first design-partner workflow.
  2. Edge recorder. A robot-side or gateway-side recorder maintains a rolling buffer and creates packets when triggers fire.
  3. Fleet incident graph. Packets across sites are aggregated to expose recurring failure patterns.

IV-C. Reference Integration Platform

Blackbox is designed for mixed robot fleets rather than a single hardware platform. The reference integration assumes:

| Component | Reference configuration | Purpose |
|---|---|---|
| Robot runtime | ROS2 or ROS-compatible bridge | topic discovery, message capture, replay export |
| Log container | MCAP or ROS2 bag | timestamped stream preservation |
| Edge host | robot compute, fleet gateway, or nearby NUC-class recorder | rolling buffer and trigger capture |
| Video streams | RGB, depth, site CCTV attachment | visual reconstruction and redaction |
| Command streams | cmd_vel, joint command, planner branch, autonomy mode | command authority reconstruction |
| Human input | teleop joystick, operator pause, manual mark | shared-autonomy attribution |
| Safety stream | e-stop, bumper, zone gate, force threshold | safety-event timeline |
| Report store | packet object storage plus database index | stakeholder retrieval and audit |

The framework intentionally avoids depending on a particular robot morphology. It applies to AMRs, hotel delivery robots, sidewalk robots, mobile manipulators, inspection robots, and humanoid fleets when they produce timestamped operational traces.

IV-D. Stream Budget and Capture Policy

The recorder must bound storage. For each incident class `c`, a capture profile defines required roles, preferred roles, pre-window, post-window, and privacy default.

| Incident class | Pre-window | Post-window | Required roles | Privacy default |
|---|---|---|---|---|
| Near miss | 300 s | 30 s | video, depth/lidar, planner, command, operator, safety | redacted external |
| Low-speed contact | 180 s | 60 s | video, bumper/force, command, localization, site context | redacted external |
| Remote takeover | 180 s | 45 s | autonomy mode, operator input, command, network, safety | internal by default |
| Command conflict | 180 s | 45 s | command authority, teleop session, autonomy state, safety gate | internal by default |
| Task failure | 120 s | 120 s | mission state, operator notes, customer event, video optional | customer-safe summary |

In production, the capture profile should be configurable per customer site. A hospital, hotel, warehouse, and sidewalk deployment should not have the same export policy.

IV-D1. Storage Model

For a stream set `S` and evidence window `W`, the packet storage budget is:

Bytes(BIP) = sum_{s in S} bitrate_s * duration_s + metadata + indexes

For event-only streams such as safety events or operator commands, the cost is small. Video and depth streams dominate. A practical recorder therefore uses:

  • continuous low-rate metadata capture
  • bounded high-rate video/depth ring buffer
  • event-triggered manifest sealing
  • post-trigger downsampling for external report views
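
The storage model can be evaluated directly, as in the Python sketch below. The per-stream rates are hypothetical round numbers chosen only to show that video and depth dominate; the 330 s duration matches the near-miss window (300 s pre, 30 s post). Rates are taken in bytes per second so that the product is already in bytes.

```python
def packet_bytes(streams, metadata_bytes=0, index_bytes=0):
    """Bytes(BIP) = sum(bitrate_s * duration_s) + metadata + indexes.

    streams maps a role name to (bytes_per_second, duration_seconds).
    """
    payload = sum(rate * duration for rate, duration in streams.values())
    return payload + metadata_bytes + index_bytes


# Hypothetical rates for a 330 s near-miss window.
streams = {
    "front_rgb": (500_000, 330),
    "depth_cloud": (1_000_000, 330),
    "cmd_vel": (2_000, 330),
}
total = packet_bytes(streams, metadata_bytes=50_000, index_bytes=10_000)
video_depth_share = (500_000 + 1_000_000) * 330 / total
```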

IV-D2. Retention Model

Retention is split into raw retention and metadata retention:

retention(packet) = { raw_days, redacted_days, metadata_days }

For example, a clinic packet may retain raw video for 7 days, redacted video for 90 days, and metadata for 365 days. A warehouse packet may retain raw data longer if customer privacy risk is lower. This difference must be policy-driven rather than hardcoded.

IV-D3. Export Eligibility

External export is allowed only when:

export_ok = schema_pass
          and seal_valid
          and privacy_pass
          and no_required_source_missing
          and audience_scope_allows

This condition is intentionally stricter than internal replay. Engineering teams may inspect incomplete packets; customers and insurers should not receive packets with unresolved privacy or integrity failures.
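
One way to express this conjunction in Python is shown below. Returning the list of failed checks alongside the boolean is a design choice of this sketch, not something the prototype is stated to do; it lets a reviewer see why an export was blocked.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExportChecks:
    schema_pass: bool
    seal_valid: bool
    privacy_pass: bool
    no_required_source_missing: bool
    audience_scope_allows: bool


def export_ok(checks):
    """Return (allowed, failed_check_names); export needs every check to pass."""
    failed = [name for name, ok in asdict(checks).items() if not ok]
    return (not failed, failed)


# Hypothetical packet that is sealed and valid but has not passed privacy review.
allowed, failed = export_ok(ExportChecks(True, True, False, True, True))
```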

IV-E. Recorder and Trigger Layer (L1)

The recorder maintains a bounded ring buffer for allowlisted streams. It is triggered by:

  • safety stop
  • bumper event
  • human-proximity threshold
  • remote takeover
  • command conflict
  • mission failure
  • customer escalation
  • manual operator mark

Algorithm 1 shows the trigger and seal process.

Algorithm 1: Triggered Packet Seal
Input: stream buffers B, trigger rule g, incident class c, window W
Output: packet index BIP

1. continuously append allowlisted streams to B
2. if g(B_t) fires at time t0:
3.     freeze interval [t0 - pre(c), t0 + post(c)]
4.     enumerate stream roles and source availability
5.     compute hashes for immutable references
6.     attach trigger metadata and site context
7.     emit packet draft
8.     run schema and policy checks
9.     seal packet if checks pass
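
Steps 1, 2, 3, and 5 of Algorithm 1 can be sketched with an in-memory ring buffer. This is an illustration under simplifying assumptions (a single process, samples small enough to hash as JSON, and the class name `RingRecorder` invented here); it is not the prototype's recorder.

```python
import hashlib
import json
from collections import deque


class RingRecorder:
    """Rolling buffer that keeps `pre_s` seconds of samples until a trigger fires."""

    def __init__(self, pre_s, post_s):
        self.pre_s, self.post_s = pre_s, post_s
        self.buffer = deque()
        self.t0 = None  # trigger time; None while buffering

    def append(self, t, role, payload):
        self.buffer.append((t, role, payload))
        if self.t0 is None:
            # Step 1: before a trigger, drop samples older than the pre-window.
            while self.buffer[0][0] < t - self.pre_s:
                self.buffer.popleft()

    def trigger(self, t0):
        if self.t0 is None:  # Step 2: keep only the first trigger
            self.t0 = t0

    def freeze(self):
        # Step 3: freeze the interval [t0 - pre, t0 + post].
        lo, hi = self.t0 - self.pre_s, self.t0 + self.post_s
        samples = [s for s in self.buffer if lo <= s[0] <= hi]
        # Step 5: hash the frozen samples so later edits are detectable.
        blob = json.dumps(samples, sort_keys=True).encode()
        return {"t0": self.t0, "samples": samples,
                "sha256": hashlib.sha256(blob).hexdigest()}


rec = RingRecorder(pre_s=3.0, post_s=1.0)
for t in range(11):
    rec.append(float(t), "cmd_vel", {"v": 0.5})
    if t == 6:
        rec.trigger(6.0)
draft = rec.freeze()
```

In this run the draft holds the samples from t = 3 s to t = 7 s. Schema checks, policy checks, and sealing (steps 8 and 9) are left out.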

IV-F. Evidence Manifest Layer (L2)

The manifest is the packet's evidence index. It records modality, source role, duration, sample rate, storage reference, hash, retention policy, privacy class, and export scope. The manifest allows a reviewer to know which evidence exists without exposing raw data to every audience.

IV-G. Temporal Alignment Layer (L3)

Blackbox maps all evidence to relative incident time, where `t = 0` is the trigger. The timeline is not a raw dump of every event. It is a structured reconstruction with source references and confidence levels.

Algorithm 2 gives temporal alignment.

Algorithm 2: Timeline Alignment
Input: stream manifest S, raw event candidates E, trigger time t0
Output: aligned timeline T

1. estimate clock offset and drift for each source stream
2. transform source timestamps into packet-relative time
3. group event candidates within merge window epsilon
4. attach source references and confidence values
5. remove duplicate low-confidence events
6. sort timeline by relative time
7. flag gaps, missing references, and time reversals
8. emit T with quality metrics
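
The Python sketch below covers steps 2 to 6 of Algorithm 2 under two stated simplifications: clock drift is ignored (offset only), and duplicates are defined as events with the same label within the merge window `epsilon`. The candidate fields and the values are hypothetical.

```python
def align_timeline(candidates, offsets, epsilon=0.05):
    """Shift candidates to packet-relative time, sort, and merge near-duplicates.

    candidates: dicts with 'stream', 't_source', 'label', 'confidence'.
    offsets: per-stream clock offset in seconds (drift is ignored here).
    """
    shifted = sorted(
        ({**c, "t": c["t_source"] + offsets[c["stream"]]} for c in candidates),
        key=lambda c: c["t"],
    )
    timeline = []
    for c in shifted:
        last = timeline[-1] if timeline else None
        if last and last["label"] == c["label"] and c["t"] - last["t"] <= epsilon:
            # Same event seen by another stream: keep one entry, add the reference.
            last["refs"].append(c["stream"])
            last["confidence"] = max(last["confidence"], c["confidence"])
        else:
            timeline.append({"t": c["t"], "label": c["label"],
                             "refs": [c["stream"]], "confidence": c["confidence"]})
    return timeline


candidates = [
    {"stream": "safety_events", "t_source": 100.0, "label": "safety_stop", "confidence": 0.9},
    {"stream": "cmd_vel", "t_source": 50.02, "label": "safety_stop", "confidence": 0.6},
    {"stream": "operator_input", "t_source": 47.8, "label": "operator_pause", "confidence": 0.8},
]
offsets = {"safety_events": -100.0, "cmd_vel": -50.0, "operator_input": -50.0}
timeline = align_timeline(candidates, offsets)
```

The two `safety_stop` candidates land 20 ms apart after shifting and are merged into one event that cites both streams.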

IV-H. Command-Ownership Attribution Layer (L4)

COA represents ownership as intervals:

o_i = (t_start, t_end, d_i, x_i, c_i, h_i, q_i)

where:

  • `d_i` is decision source.
  • `x_i` is execution authority.
  • `c_i` is constraint source.
  • `h_i` is human intervention source.
  • `q_i` is evidence quality.

Algorithm 3 describes interval construction.

Algorithm 3: Command-Ownership Attribution
Input: autonomy state A, controller commands U, operator events H, safety events S
Output: ownership interval set O

1. collect all authority transition candidates from A, U, H, S
2. sort candidates by relative time
3. initialize current owner from autonomy mode
4. for each candidate k:
5.     close previous interval at t_k
6.     assign decision source, execution authority, and constraint source
7.     attach evidence references
8.     mark conflict if two active authorities lack arbitration
9. merge adjacent intervals with identical authority tuple
10. emit O with coverage and confidence metrics
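
A reduced Python sketch of Algorithm 3 follows. It collapses the authority tuple to a single `owner` label and omits conflict marking and confidence, so it shows only the interval closing and merging logic (steps 2 to 5 and 9). The transition times reproduce the Dock Aisle Near-Miss intervals reported in Section V-A2; the evidence reference names are assumptions.

```python
def build_ownership(transitions, t_start, t_end, initial_owner):
    """Turn authority transitions into contiguous ownership intervals.

    transitions: list of (t, owner, evidence_ref) in relative seconds.
    """
    intervals = []
    current, cur_start, refs = initial_owner, t_start, []
    for t, owner, ref in sorted(transitions):
        if owner == current:
            refs.append(ref)  # same authority: extend the open interval
            continue
        # Close the previous interval at the transition time and open a new one.
        intervals.append({"start": cur_start, "end": t, "owner": current, "refs": refs})
        current, cur_start, refs = owner, t, [ref]
    intervals.append({"start": cur_start, "end": t_end, "owner": current, "refs": refs})
    return intervals


transitions = [
    (-0.26, "safety_system", "safety_events"),
    (-2.2, "remote_operator", "operator_input"),
]
ownership = build_ownership(transitions, t_start=-300.0, t_end=0.0,
                            initial_owner="autonomy")
```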

IV-I. Replay and Report Layer (L5)

The replay surface has audience-specific views:

  • Engineering view: raw stream refs, event sources, command buffers, schema validation.
  • Operations view: incident class, intervention, downtime, next action.
  • Customer view: redacted timeline, explanation, corrective action.
  • Insurance or safety view: source integrity, timeline, ownership intervals, report provenance.

Report generation follows Algorithm 4.

Algorithm 4: Redacted Report Generation
Input: sealed packet BIP, audience a, policy P
Output: report R_a

1. select claims whose policy scope includes audience a
2. replace raw stream references with redacted summaries when required
3. include missing-source and confidence notes
4. include timeline, command ownership, and corrective action
5. attach packet id, seal status, and report generation time
6. block export if privacy or integrity checks fail

IV-J. Fleet Incident Graph Layer (L6)

The graph aggregates packets across a fleet. Nodes include robots, sites, incident classes, stream roles, ownership conflicts, trigger policies, privacy classes, and corrective actions. The goal is not only to replay one incident, but to detect repeated patterns across deployments.

IV-K. Tool Categories

Similar to a robotics tool orchestration system, Blackbox can be decomposed into incident tools.

| Category | Count in reference design | Example tools |
|---|---|---|
| Capture | 7 | record_video, record_mcap, record_cmd, record_operator, record_safety, attach_cctv, manual_mark |
| Parse | 6 | parse_ros2, parse_mcap, parse_teleop, parse_safety, parse_network, parse_ticket |
| Align | 4 | estimate_offset, merge_events, check_monotonicity, detect_gap |
| Attribute | 5 | infer_owner, detect_conflict, resolve_arbitration, score_confidence, merge_interval |
| Privacy | 5 | mask_face, mask_badge, mask_screen, restrict_export, retention_check |
| Report | 5 | engineering_report, customer_report, insurer_report, corrective_action, provenance_export |
| Validate | 5 | schema_check, hash_check, policy_check, completeness_check, seal_packet |

This decomposition makes the product build path concrete. The first implementation does not need every tool, but the paper defines the target system boundary.

V. Incident Reconstruction Methods

V-A. Scenario 1: Human-Proximity Near Miss

V-A1. Detection Phase

The trigger fires when a human-proximity alert, route deviation, operator takeover, or safety stop occurs within a configured time window. Required evidence roles include front RGB, depth or lidar, planner trace, command output, operator input, and safety event.

Near-miss severity uses distance, time-to-contact, robot speed, intervention latency, and human track confidence:

S_near = w_d exp(-d_min / sigma_d)
       + w_t exp(-ttc_min / sigma_t)
       + w_v min(v_robot / v_ref, 1)
       + w_o operator_delay
       + w_h human_confidence

where `d_min` is closest approach, `ttc_min` is minimum predicted time-to-contact, and `operator_delay` is normalized delay between proximity alert and intervention. This is not a legal risk score. It is a packet triage score used to determine review priority.

The trigger predicate is:

trigger_near = (d_min < d_warn)
            or (ttc_min < tau_warn)
            or takeover_after_proximity
            or safety_stop_after_proximity
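
The severity score and trigger predicate can be written out as below. All weights, scale constants, and thresholds in the example are placeholder values chosen for illustration; the paper does not fix them. `operator_delay` and `human_confidence` are assumed to be normalized to [0, 1].

```python
import math


def near_miss_severity(d_min, ttc_min, v_robot, operator_delay, human_confidence,
                       *, w, sigma_d, sigma_t, v_ref):
    """S_near as a weighted sum; w holds the weights w_d, w_t, w_v, w_o, w_h."""
    return (
        w["d"] * math.exp(-d_min / sigma_d)
        + w["t"] * math.exp(-ttc_min / sigma_t)
        + w["v"] * min(v_robot / v_ref, 1.0)
        + w["o"] * operator_delay
        + w["h"] * human_confidence
    )


def trigger_near(d_min, ttc_min, d_warn, tau_warn,
                 takeover_after_proximity=False, safety_stop_after_proximity=False):
    """True if any of the four near-miss trigger conditions holds."""
    return (d_min < d_warn or ttc_min < tau_warn
            or takeover_after_proximity or safety_stop_after_proximity)


params = dict(w={"d": 0.2, "t": 0.2, "v": 0.2, "o": 0.2, "h": 0.2},
              sigma_d=0.5, sigma_t=1.0, v_ref=1.0)  # hypothetical tuning
worst = near_miss_severity(0.0, 0.0, 1.0, 1.0, 1.0, **params)
milder = near_miss_severity(2.0, 4.0, 0.3, 0.1, 1.0, **params)
```

With equal weights that sum to 1, the worst case (zero distance, zero time-to-contact, full speed, maximal delay) scores 1, and larger margins lower the score.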

V-A2. Attribution Phase

The near-miss reconstruction separates environment, perception, planning, operator, and safety layers. In the public Dock Aisle Near-Miss packet, autonomy owns the interval from -300 s to -2.2 s, the remote operator owns the pause interval from -2.2 s to -0.26 s, and the safety system owns the final stop interval from -0.26 s to 0 s.

The reviewer is expected to answer four questions:

  1. Did the robot perceive the human or obstruction?
  2. Did planning select a legal but uncomfortable route?
  3. Did a human operator intervene before or after safety escalation?
  4. Did the safety layer constrain motion as expected?

The packet should support each answer with a stream reference rather than narrative memory.

V-A3. Report Phase

The customer report excludes raw unredacted video. It summarizes the trigger, closest approach, intervention, and corrective action while preserving the internal evidence chain.

For a near-miss report, the recommended external fields are:

| Field | External report | Internal replay |
|---|---|---|
| Incident class | yes | yes |
| Closest approach | yes, rounded | exact |
| Raw RGB video | no | yes, restricted |
| Operator session id | no | yes |
| Planner branches | summary | full trace |
| Corrective action | yes | yes |

V-B. Scenario 2: Low-Speed Contact

V-B1. Contact Detection

A low-speed contact packet is triggered by bumper trace, force threshold, velocity discontinuity, or operator mark. Required sources include robot camera, bumper or force trace, command stream, localization, and optional external CCTV.

The contact trigger combines physical and semantic signals:

trigger_contact =
  bumper_event
  or (force_z > F_limit)
  or (abs(v_cmd - v_odom) > tau_v and obstacle_range < d_contact)
  or manual_contact_mark

The packet distinguishes contact from near miss by requiring an impact, bumper, force, or post-event operator/customer mark.
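
A direct Python transcription of the contact trigger is sketched below. The threshold values in the example are hypothetical and would be set per robot and per site.

```python
def trigger_contact(bumper_event, force_z, v_cmd, v_odom, obstacle_range,
                    manual_contact_mark, *, f_limit, tau_v, d_contact):
    """True if any physical or manual contact condition holds."""
    # Commanded and measured velocity disagree while an obstacle is very close.
    stalled_near_obstacle = (
        abs(v_cmd - v_odom) > tau_v and obstacle_range < d_contact
    )
    return bool(
        bumper_event
        or force_z > f_limit
        or stalled_near_obstacle
        or manual_contact_mark
    )


limits = dict(f_limit=20.0, tau_v=0.2, d_contact=0.1)  # hypothetical thresholds
# No bumper or force signal, but the base stalls 5 cm from an obstacle.
stalled = trigger_contact(False, 3.0, 0.5, 0.0, 0.05, False, **limits)
# Same velocity mismatch with the obstacle 1 m away does not trigger.
clear = trigger_contact(False, 3.0, 0.5, 0.0, 1.0, False, **limits)
```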

V-B2. Evidence Alignment

The method aligns robot-side streams with site-side evidence. For customer-facing environments such as hotels, the report must separate robot perception failure from temporary site obstruction or third-party interference.

Contact attribution uses four candidate contributors:

contributors = {perception, planning_margin, site_state, operator_action}

Each contributor receives a technical contribution label: `primary`, `secondary`, `observed`, or `not_supported`. The label must include source references. If evidence is missing, the packet should say "not determined" instead of inventing a cause.

V-B3. Root-Cause Draft

The report does not assign legal fault. It identifies technical contributors: perception reclassification, route clearance, site obstruction, operator absence, or safety trigger timing.

The root-cause draft is generated only after manifest, timeline, and ownership checks pass. If checks fail, the report is downgraded to evidence summary.

V-C. Scenario 3: Command Authority Conflict

V-C1. Conflict Detection

Command conflict occurs when autonomy and teleoperation are simultaneously active without a valid arbitration rule. The clinic handoff abort scenario models network recovery during an autonomous handoff routine.

Let `u_a(t)` be autonomy command activity, `u_h(t)` be human operator command activity, and `g(t)` be the arbitration state. Conflict is:

conflict(t) = u_a(t) and u_h(t) and g(t) = undefined

The trigger condition is:

trigger_conflict = exists t in W such that conflict(t) = true
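
Over sampled signals, the conflict predicate and trigger can be sketched as follows. The sample layout is an assumption: each row carries a relative time, two activity flags, and the arbitration state, with `None` standing for `undefined`.

```python
def conflict_intervals(samples):
    """Return [(start, end)] spans where both authorities are active without arbitration.

    samples: time-ordered (t, autonomy_active, operator_active, arbitration) tuples.
    """
    intervals, start = [], None
    for t, u_a, u_h, g in samples:
        in_conflict = u_a and u_h and g is None
        if in_conflict and start is None:
            start = t                      # conflict begins
        elif not in_conflict and start is not None:
            intervals.append((start, t))   # conflict ends at this sample
            start = None
    if start is not None:                  # conflict still open at the window end
        intervals.append((start, samples[-1][0]))
    return intervals


# Hypothetical trace: teleop reconnects at 0.1 s, arbitration resolves at 0.3 s.
samples = [
    (0.0, True, False, None),
    (0.1, True, True, None),
    (0.2, True, True, None),
    (0.3, True, True, "operator_priority"),
    (0.4, False, True, "operator_priority"),
]
spans = conflict_intervals(samples)
trigger_conflict = bool(spans)
```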

V-C2. Ownership Lock

When conflict is detected, the packet records the overlapping authorities, session IDs, safety gate action, and blocked motion. This helps distinguish an autonomy failure from a remote-assist concurrency failure.

The ownership-lock protocol is:

  1. freeze motion if active control authority is ambiguous
  2. record autonomy mode and teleop session id
  3. attach latest command packet from each authority
  4. record arbitration table state
  5. mark safety gate decision
  6. emit conflict interval

V-C3. Privacy Response

For clinical or home environments, the packet must apply privacy masks and restrict raw exports by default.

The clinic scenario requires `privacy_default = restricted`. Any external report must remove patient identifiers, operator identifiers, screen content, and raw room audio unless explicitly approved.

V-D. Redaction and Privacy Policy

Algorithm 5 shows the privacy gate applied before any external export.

Algorithm 5: Privacy Gate
Input: sealed packet BIP, audience a, policy P
Output: allow_export or block_export

1. check packet seal and schema status
2. enumerate streams referenced by report claims
3. for each stream s:
4.     check privacy class, retention policy, and audience scope
5.     require redaction proof for sensitive visual/audio streams
6.     block export if any required proof is missing
7. emit allow_export only when all checks pass
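
The gate in Algorithm 5 can be sketched in Python as below. The packet layout, the `audiences` list per stream, and the `redaction_proof` field are assumptions made for this example; the retention check in step 4 is omitted for brevity.

```python
SENSITIVE_MODALITIES = {"video", "audio"}


def privacy_gate(packet, audience):
    """Return 'allow_export' only if every stream cited by a claim passes."""
    # Step 1: the packet must be sealed and schema-valid.
    if not (packet["sealed"] and packet["schema_pass"]):
        return "block_export"
    streams = {s["role"]: s for s in packet["streams"]}
    # Step 2: collect the streams that report claims actually reference.
    referenced = {ref for claim in packet["claims"] for ref in claim["stream_refs"]}
    for role in referenced:
        stream = streams.get(role)
        # Step 4: the stream must exist and be in scope for this audience.
        if stream is None or audience not in stream["audiences"]:
            return "block_export"
        # Steps 5 and 6: sensitive streams need a redaction proof.
        if stream["modality"] in SENSITIVE_MODALITIES and not stream.get("redaction_proof"):
            return "block_export"
    return "allow_export"


packet = {
    "sealed": True,
    "schema_pass": True,
    "streams": [
        {"role": "front_rgb", "modality": "video",
         "audiences": ["engineering", "customer"], "redaction_proof": None},
        {"role": "safety_events", "modality": "event",
         "audiences": ["engineering", "customer"]},
    ],
    "claims": [{"id": "c1", "stream_refs": ["front_rgb", "safety_events"]}],
}
before = privacy_gate(packet, "customer")
packet["streams"][0]["redaction_proof"] = "mask-job-001"  # hypothetical proof id
after = privacy_gate(packet, "customer")
```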

V-E. Packet Integrity

The prototype uses JSON schema validation and demonstration hashes. A production implementation should use signed manifests, append-only audit trails, key management, and export provenance.

VI. Prototype Implementation

The v0.3 project includes:

  • Next.js website
  • public technical paper route
  • public whitepaper route
  • seeded incident replay console
  • three incident scenarios
  • public Dock Aisle Near-Miss packet
  • JSON schemas for packet, stream manifest, timeline, and command ownership
  • validation script
  • redacted customer report sample
  • internal pitch deck kept outside the public website

VI-A. Public Packet Artifact

The public packet is located at:

public/samples/dock-17/
  packet.json
  streams-manifest.json
  timeline.json
  command-ownership.json

The public report is located at:

public/reports/dock-17-customer-report.md

VI-B. Tool Categories

Table I summarizes the incident stack.

| Category | Count in v0.3 | Examples |
|---|---|---|
| Evidence streams | 6 | front_rgb, depth_cloud, cmd_vel, planner_trace, operator_input, safety_events |
| Timeline events | 6 | route deviation, confidence decay, merge command, worker entry, takeover, safety seal |
| Ownership intervals | 3 | autonomy, remote_operator, safety_system |
| Packet files | 4 | packet, manifest, timeline, command ownership |
| Public reports | 1 | redacted customer report |

VI-C. Packet File Schema

The prototype stores one incident across four JSON files.

| File | Primary objects | Validation role |
|---|---|---|
| packet.json | incident id, robot, site, trigger, summary, privacy, seal state | packet-level completeness |
| streams-manifest.json | stream list, modality, sample rate, storage ref, hash, export scope | evidence availability |
| timeline.json | relative time events, actors, detail, source refs | reconstruction order |
| command-ownership.json | intervals, owner, control surface, transition reason, evidence refs | authority attribution |

The schema intentionally separates stream existence from timeline claims. A stream can be present without being used in a report, and a report claim is not allowed unless it points back to evidence.

VI-D. Topic Role Mapping

Robot fleets use different topic names. Blackbox maps fleet-specific topics into stable incident roles.

| Incident role | Example ROS2/robot source | Packet role |
|---|---|---|
| front_rgb | /camera/front/color/image_raw | visual scene |
| depth_cloud | /camera/depth/points or /scan | geometry and proximity |
| cmd_vel | /cmd_vel or base command | executed base command |
| planner_trace | /planner/debug, branch log, route plan | decision support |
| operator_input | teleop joystick or remote pause event | human intervention |
| autonomy_mode | /mode, mission state, behavior tree state | decision source |
| safety_events | e-stop, bumper, zone gate, force threshold | constraint source |
| network_state | teleop connection, latency, reconnect | command conflict context |

The mapping file is a deployment artifact. Without it, teams can record data but cannot reliably compare incident packets across robots.
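One way to picture the mapping file is as a lookup from fleet-specific topic names to incident roles. The sketch below uses the example topics from the table; the dictionary layout and the `map_topics` helper are hypothetical and do not describe the actual file format.

```python
from typing import Dict, List, Tuple

# Hypothetical per-deployment mapping, built from the example topics in the table.
ROLE_MAP: Dict[str, str] = {
    "/camera/front/color/image_raw": "front_rgb",
    "/camera/depth/points": "depth_cloud",
    "/scan": "depth_cloud",
    "/cmd_vel": "cmd_vel",
    "/planner/debug": "planner_trace",
    "/mode": "autonomy_mode",
}


def map_topics(topics: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group recorded topics by incident role and list topics with no known role."""
    roles: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for topic in topics:
        role = ROLE_MAP.get(topic)
        if role is None:
            unmapped.append(topic)
        else:
            roles.setdefault(role, []).append(topic)
    return roles, unmapped


roles, unmapped = map_topics(["/scan", "/camera/depth/points", "/cmd_vel", "/diagnostics"])
print(roles)
print(unmapped)
```

Returning the unmapped topics explicitly keeps a recording gap visible instead of silently dropping data that has no role yet.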

VI-E. Packet Lifecycle State Machine

The packet lifecycle is:

buffering -> triggered -> assembling -> validating -> sealed -> reported -> archived

State transitions have explicit failure paths:

State | Failure condition | Result
triggered | missing pre-window | packet marked partial
assembling | stream hash unavailable | packet marked unsealed
validating | schema failure | report export blocked
sealed | hash mismatch after seal | packet invalidated
reported | privacy check failure | customer export blocked
archived | retention expired | raw data removed, metadata retained
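The lifecycle and its failure table can be encoded as plain data. The sketch below is illustrative only: the function names are hypothetical, and it deliberately does not model which state a packet moves to after a failure, because the table specifies only the result.

```python
from typing import Optional

# State order as listed in the lifecycle above.
LIFECYCLE = ["buffering", "triggered", "assembling", "validating",
             "sealed", "reported", "archived"]

# Result of the failure condition for each state, copied from the table.
FAILURE_RESULTS = {
    "triggered": "packet marked partial",
    "assembling": "packet marked unsealed",
    "validating": "report export blocked",
    "sealed": "packet invalidated",
    "reported": "customer export blocked",
    "archived": "raw data removed, metadata retained",
}


def next_state(state: str) -> str:
    """Return the successor state on the success path."""
    index = LIFECYCLE.index(state)  # raises ValueError for an unknown state
    if index == len(LIFECYCLE) - 1:
        raise ValueError("archived is the terminal state")
    return LIFECYCLE[index + 1]


def failure_result(state: str) -> Optional[str]:
    """Return the documented failure result for a state, or None if none is listed."""
    return FAILURE_RESULTS.get(state)


print(next_state("validating"))
print(failure_result("sealed"))
```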

VI-F. Report Views

Blackbox produces audience-specific views from the same packet.

View | Audience | Included | Excluded by default
Engineering replay | robotics engineering | raw refs, timeline, command intervals, validation errors | customer-only notes
Operations summary | fleet ops | trigger, downtime, operator action, next step | model internals unless needed
Customer report | customer / site owner | redacted timeline, corrective action, evidence summary | raw video, operator identity
Insurer/safety export | insurer or safety reviewer | timeline, evidence manifest, integrity status | unnecessary private footage

This view separation is central to the product. The same incident must be technically useful internally and externally safe to share.
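A simple way to realize this separation is an allow-list per view, so that a field reaches an audience only if it is named explicitly. The sketch below is an assumption about how such a projection could look; the field names and sample values are hypothetical and loosely follow the "Included" column.

```python
from typing import Any, Dict, List

# Hypothetical field names per view, following the "Included" column.
VIEW_FIELDS: Dict[str, List[str]] = {
    "customer_report": ["redacted_timeline", "corrective_action", "evidence_summary"],
    "operations_summary": ["trigger", "downtime", "operator_action", "next_step"],
}


def render_view(packet: Dict[str, Any], view: str) -> Dict[str, Any]:
    """Project a packet onto the fields allowed for one audience."""
    allowed = VIEW_FIELDS[view]
    return {key: packet[key] for key in allowed if key in packet}


# Made-up packet content for illustration.
packet = {
    "trigger": "safety_stop",
    "redacted_timeline": ["worker entry", "takeover", "safety seal"],
    "corrective_action": "placeholder corrective action",
    "evidence_summary": "placeholder evidence summary",
    "raw_video_ref": "placeholder-ref",     # never part of the customer view
    "operator_identity": "placeholder-id",  # never part of the customer view
}

customer = render_view(packet, "customer_report")
print(sorted(customer))
```

An allow-list fails closed: a field added to the packet later stays out of the customer report until someone lists it.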

VII. Experiments

VII-A. Experimental Setup

We evaluate the prototype on a seeded incident corpus, not real customer logs. The corpus contains three deployment archetypes:

  1. Dock Aisle Near-Miss: warehouse AMR and worker proximity.
  2. Service Cart Contact: hotel delivery robot and temporary obstruction.
  3. Clinic Handoff Abort: mobile manipulator and command authority conflict.

Only the Dock Aisle Near-Miss scenario currently has a complete public packet. The other two scenarios are represented in the replay console and are the next targets for conversion into full packets.

The evaluation environment is the local project workspace. It includes a Next.js replay surface, JSON packet artifacts, schemas, and a Node.js validation script. Because this is not a production recorder, we do not evaluate camera encoding throughput, ROS2 subscription backpressure, or signed-manifest latency. Instead, the experiment evaluates whether the packet artifact can represent and validate a complete incident reconstruction.

VII-A1. Seeded Corpus Construction

Each seeded incident is constructed from five elements:

  1. incident narrative
  2. evidence source list
  3. timeline events
  4. responsibility or ownership trace
  5. stakeholder report target

The corpus is intentionally scenario-diverse. The near miss stresses human proximity and operator takeover. The contact event stresses external site evidence and property-damage explanation. The clinic abort stresses command conflict and privacy constraints.

VII-A2. Required Source Roles

Scenario | Required source roles
Dock Aisle Near-Miss | front_rgb, depth_cloud, cmd_vel, planner_trace, operator_input, safety_events
Service Cart Contact | robot_camera, bumper_trace, cmd_vel, localization, site_cctv
Clinic Handoff Abort | arm_joints, command_owner, teleop_packets, privacy_masks, safety_gate

VII-A3. Evaluation Boundaries

The experiment does not claim that Blackbox detects incidents in the wild. It evaluates post-incident packet reconstruction. The current prototype assumes that seed events and streams exist. Production work must add live triggers, robust buffering, and signed sealing.

VII-B. Evaluation Metrics

We use six metrics.

Metric | Definition | Purpose
Source coverage | Required evidence roles present in manifest | Measures whether review has enough streams
Timeline consistency | Monotonic events with source references | Measures reconstruction coherence
Ownership coverage | Action window covered by ownership intervals | Measures command authority traceability
Schema validity | JSON schema validator result | Measures machine-checkable packet structure
Report readiness | Whether stakeholder report can be generated | Measures external communication readiness
Privacy separation | Raw and redacted evidence access separated | Measures shareability risk

VII-C. Public Packet Results

Artifact | Value
Evidence window | 330 s
Packet files | 4
Evidence streams | 6
Timeline events | 6
Ownership intervals | 3
Redacted customer reports | 1
Validator result | pass
Validator wall time | 2.87 s

The public packet reaches full source coverage for its declared required roles:

C_source = 6 / 6 = 1.00

The timeline contains six events and no timestamp reversals in the seeded artifact:

V_timeline = 1.00

The command-ownership intervals cover the full action window from -300 s to 0 s:

C_owner = 1.00

Using the prototype quality score with all declared checks passing:

Q_packet = 1.00

This value should be read carefully. It means the seeded packet is internally complete under its declared schema. It does not mean that a real deployment packet would always score 1.00.
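The three coverage values above can be reproduced with a few lines of code. The sketch below is not the project's validator. It assumes that timeline validity is the fraction of events that keep time order and carry at least one source reference, and the event field names `t` and `source_refs` are hypothetical. The required roles and ownership intervals are taken from the Dock Aisle Near-Miss tables in this paper.

```python
from typing import Dict, Iterable, List, Tuple


def source_coverage(required: Iterable[str], present: Iterable[str]) -> float:
    """Fraction of required evidence roles that appear in the manifest."""
    required = set(required)
    return len(required & set(present)) / len(required)


def timeline_validity(events: List[Dict]) -> float:
    """Fraction of events that keep time order and cite at least one source."""
    ok = 0
    previous_t = float("-inf")
    for event in events:
        if event["t"] >= previous_t and event.get("source_refs"):
            ok += 1
        previous_t = max(previous_t, event["t"])
    return ok / len(events)


def ownership_coverage(intervals: List[Tuple[float, float]],
                       start: float, end: float) -> float:
    """Fraction of the window [start, end] covered by the union of intervals."""
    covered = 0.0
    cursor = start
    for a, b in sorted(intervals):
        a, b = max(a, cursor), min(b, end)  # clip to the window, skip overlap
        if b > a:
            covered += b - a
            cursor = b
    return covered / (end - start)


REQUIRED = ["front_rgb", "depth_cloud", "cmd_vel",
            "planner_trace", "operator_input", "safety_events"]
INTERVALS = [(-300.0, -2.2), (-2.2, -0.26), (-0.26, 0.0)]

print(round(source_coverage(REQUIRED, REQUIRED), 2))
print(round(ownership_coverage(INTERVALS, -300.0, 0.0), 2))
```

Dropping one required role or leaving a gap between intervals lowers the corresponding value below 1.00, which is the behavior a real deployment packet would be expected to show.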

VII-D. Scenario Coverage

Scenario | Required source roles | Present in v0.2 | Timeline status | Ownership status | Report status
Dock Aisle Near-Miss | 6 | 6 | complete | 3 intervals | customer-safe report
Service Cart Contact | 5 | 5 in console | seeded | planned packet | engineering draft
Clinic Handoff Abort | 5 | 5 in console | seeded | planned conflict intervals | privacy-focused draft

VII-E. Baselines

We compare against three practical baselines.

Baseline | Description | Limitation
Manual log review | Engineer searches logs, video, tickets, and dashboards | Slow, non-repeatable, hard to share
Observability-only replay | Visualize ROS/MCAP streams | Strong debug tool, weak stakeholder packet
Support-ticket narrative | Operator writes incident summary | No machine-checkable evidence chain

VII-F. Baseline Comparison

Review task | Manual logs | Observability replay | Blackbox packet
Identify event window | operator-dependent | possible but manual | explicit trigger window
Locate sources | scattered | stream browser | manifest
Reconstruct sequence | manual | visual playback | timeline with references
Attribute authority | inferred | partial | ownership intervals
Share externally | hand-written | rarely safe | redacted report
Validate completeness | ad hoc | ad hoc | schema validation

VII-G. Ablation Study

We evaluate the packet concept by removing major components.

Configuration | Source coverage | Timeline validity | Ownership coverage | External reportability | Expected failure mode
Full packet | 1.00 | 1.00 | 1.00 | pass | reviewable packet
Without manifest | undefined | 1.00 | 1.00 | blocked | reviewer cannot tell which sources exist or are missing
Without timeline | 1.00 | undefined | 1.00 | weak | raw streams remain hard to explain to non-engineering stakeholders
Without ownership intervals | 1.00 | 1.00 | undefined | weak | remote takeover and safety gating collapse into vague autonomy state
Without privacy policy | 1.00 | 1.00 | 1.00 | blocked | customer report risks exposing raw video or operator data
Without schema validation | unknown | unknown | unknown | blocked | packet completeness becomes manual and inconsistent

VII-H. Response Time Decomposition

The prototype benchmark separates reconstruction time into conceptual phases. Only schema validation is measured directly in v0.3; the other phases are design targets for the production recorder.

Phase | v0.3 status | Measurement or target
Trigger detection | design target | < 250 ms after trigger event
Buffer freeze | design target | < 1 s for bounded window index
Manifest assembly | prototype artifact | manual/seeded
Timeline alignment | prototype artifact | manual/seeded
Ownership attribution | prototype artifact | manual/seeded
Schema validation | measured | 2.87 s
Customer report generation | prototype artifact | manual/seeded

The goal for a production v1 system is not real-time root-cause analysis. It is reliable packet sealing soon enough that evidence is not lost and review can begin quickly.

VII-I. Command-Ownership Case Study

The Dock Aisle Near-Miss packet contains three intervals:

Interval | Owner | Control surface | Evidence
-300.000 s to -2.200 s | autonomy | planner plus mobile base controller | planner_trace, cmd_vel
-2.200 s to -0.260 s | remote_operator | teleop_pause | operator_input, cmd_vel
-0.260 s to 0.000 s | safety_system | supervised_safety_stop | safety_events, cmd_vel

This decomposition prevents a misleading single-cause narrative. The event involves site obstruction, perception confidence decay, planning margin, operator intervention, and safety stop confirmation.
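The interval table supports two simple queries: who held command authority at a given time, and when authority changed hands. The sketch below uses the three intervals from the case study. Treating each interval as half-open, [start, end), is an assumption made here so that a handoff instant belongs to the incoming owner; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Interval = Tuple[float, float, str]  # (start_s, end_s, owner)

# Intervals from the Dock Aisle Near-Miss case study, sorted by start time.
INTERVALS: List[Interval] = [
    (-300.0, -2.2, "autonomy"),
    (-2.2, -0.26, "remote_operator"),
    (-0.26, 0.0, "safety_system"),
]


def owner_at(t: float, intervals: List[Interval] = INTERVALS) -> Optional[str]:
    """Return the owner whose half-open interval [start, end) contains t."""
    starts = [start for start, _, _ in intervals]
    index = bisect_right(starts, t) - 1
    if index < 0:
        return None  # t is before the first interval
    _, end, owner = intervals[index]
    return owner if t < end else None


def handoffs(intervals: List[Interval] = INTERVALS) -> List[Tuple[float, str, str]]:
    """List (time, from_owner, to_owner) for each change of authority."""
    return [(b[0], a[2], b[2])
            for a, b in zip(intervals, intervals[1:]) if a[2] != b[2]]


print(owner_at(-1.0))
print(handoffs())
```

Under the half-open convention, a query at exactly 0.000 s falls outside the last interval and returns `None`. A production model would need to state its boundary convention explicitly.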

VII-J. Computational Performance

The current prototype measures packet validation, not full stream ingestion. Running:

/usr/bin/time -p npm run validate:packet

produced:

real 2.87
user 0.95
sys 0.35

This result is not a production latency benchmark. It only verifies that the current packet schema and validator operate within a practical development workflow.

VII-K. Comparison with Baseline Systems

System | Evidence boundary | Ownership attribution | Privacy-aware export | Machine validation | Review repeatability
Support ticket only | no | no | manual | no | low
Raw ROS/MCAP logs | partial | manual | no | partial | medium
Fleet dashboard | event state | partial | no | no | medium
Observability replay | streams | manual | no | partial | medium-high
Blackbox packet | yes | yes | yes | yes | high

This comparison is functional rather than commercial. Blackbox should integrate with observability replay; the comparison shows why replay alone is not an incident system of record.

VII-L. Real-World Validation Plan

The next evaluation requires design-partner incidents. The study should collect three incidents per partner:

  1. one near miss or safety stop
  2. one remote takeover or command conflict
  3. one customer-facing task failure or low-speed contact

For each incident, reviewers compare the existing workflow against the Blackbox packet. Metrics should include:

  • time-to-understanding
  • reviewer agreement
  • missing-source rate
  • report approval rate
  • customer-shareability rating
  • corrective-action clarity

VII-M. Proposed Reviewer Study

The first reviewer study does not require external human subjects; it can be conducted internally with robotics engineers and operations reviewers. Each reviewer receives the same seeded incident in two formats: baseline logs and report fragments, and the Blackbox packet. The reviewer answers:

  1. What happened?
  2. Which system layer most needs engineering review?
  3. Who or what controlled the robot during the critical interval?
  4. Which evidence source supports the claim?
  5. Could this summary be shared with a customer?

The primary metric is time-to-understanding. Secondary metrics are answer agreement, unsupported-claim count, and confidence calibration.

Study variable | Baseline condition | Blackbox condition
Evidence | scattered logs and notes | packet manifest and timeline
Ownership | inferred manually | interval table
Privacy | reviewer judgment | explicit export policy
Output | free-form answer | structured report

VII-N. Production Recorder Benchmark Targets

The production recorder should be evaluated separately from the current packet prototype.

Benchmark | Target
Recorder CPU overhead | < 10% on edge host
Dropped command messages | 0 during incident window
Video frame retention | > 99% in bounded window
Packet trigger-to-freeze time | < 1 s
Manifest creation time | < 5 s for 5 min window
Customer report generation | < 60 s after packet seal
Hash verification | pass before export

These targets are design constraints, not achieved v0.3 results.

VIII. Discussion

VIII-A. Why Packetization Matters

Logs are necessary but not sufficient. A packet makes the incident a shared object. Engineering, operations, customer success, insurance, and safety teams can discuss the same evidence boundary instead of building separate narratives.

VIII-B. Why Command Ownership Matters

Autonomy state is too coarse for modern robot fleets. Many incidents occur under shared autonomy, remote assist, cloud orchestration, and safety gating. COA exposes authority transitions that would otherwise be hidden.

VIII-C. Privacy as a Reconstruction Constraint

In homes, clinics, hotels, offices, and public spaces, raw replay is often not shareable. Redaction, retention, and access scope must be packet properties, not post-hoc edits.

VIII-D. Relationship to Existing Robotics Tools

Blackbox should integrate with ROS2, MCAP, Foxglove-style visualization, fleet operations dashboards, and support systems. The product wedge is not generic observability. It is incident evidence assembly, attribution, and reportability.

IX. Limitations and Failure Analysis

IX-A. Current Limitations

  1. The current system is a prototype.
  2. The seeded incidents are realistic but fictional.
  3. Only one public packet is complete.
  4. No real customer incident corpus has been evaluated.
  5. Hashes and sealing are demonstration-grade.
  6. Video redaction is represented as metadata and report policy, not production video processing.
  7. The system does not assign legal fault.
  8. The system does not certify safety compliance.
  9. The system does not prevent incidents.

IX-B. Failure Modes

Failure mode | Impact | Mitigation
Missing stream | Timeline cannot support a claim | explicit missing-source flag
Clock drift | Events appear out of order | offset estimation and confidence scoring
Ambiguous command authority | Ownership interval cannot be resolved | conflict marker and arbitration table
Privacy mask failure | Report cannot be shared externally | raw export block and manual review
Hash mismatch | Packet integrity compromised | seal invalidation and audit trail
Overconfident root-cause draft | Reviewer may treat technical contribution as legal fault | explicit non-fault language
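The missing-stream and hash-mismatch rows can be illustrated with a small integrity check. This is a sketch under stated assumptions: the paper does not name a hash algorithm, so SHA-256 is assumed here, and the manifest field names `streams`, `id`, and `sha256` are hypothetical. Stream contents are in-memory placeholder bytes rather than real recordings.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex digest of a stream's bytes (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def verify_streams(manifest: Dict, blobs: Dict[str, bytes]) -> List[str]:
    """Return ids of streams that are missing or whose hash differs from the manifest."""
    failed = []
    for stream in manifest["streams"]:
        data = blobs.get(stream["id"])
        if data is None or sha256_hex(data) != stream["sha256"]:
            failed.append(stream["id"])
    return failed


# Placeholder stream contents, sealed into a manifest at "seal time".
blobs = {"cmd_vel": b"placeholder-cmd-vel", "safety_events": b"placeholder-safety"}
manifest = {"streams": [{"id": name, "sha256": sha256_hex(data)}
                        for name, data in blobs.items()]}

# Simulate a change after sealing.
tampered = dict(blobs, cmd_vel=b"modified-after-seal")

print(verify_streams(manifest, blobs))
print(verify_streams(manifest, tampered))
```

A non-empty result would correspond to the "seal invalidation" mitigation in the table. A plain hash comparison shows only that bytes changed, not who changed them, which is why signed manifests remain listed as production work.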

IX-C. Deployment Considerations

A production system must solve:

  • edge recorder reliability
  • low-overhead stream selection
  • MCAP/ROS2 ingest
  • signed packet manifests
  • key management
  • retention policy enforcement
  • customer-specific privacy rules
  • reviewer audit trails
  • fleet-scale storage cost

X. Future Work

Future work includes:

  • production ROS2 and MCAP ingest
  • edge recorder with rolling buffer
  • signed manifests and append-only audit logs
  • video and image redaction service
  • command arbitration table for teleoperation systems
  • Foxglove-compatible export
  • insurer-facing packet view
  • fleet incident graph
  • design-partner user study
  • time-to-understanding benchmark
  • reviewer agreement benchmark

XI. Conclusion

Blackbox Robotics proposes incident packetization, replay, and command-ownership attribution for deployed robot fleets. The central claim is that robot incidents should produce structured, schema-valid evidence packets rather than scattered logs and informal narratives. By defining BIP and COA, the system makes incident reconstruction more explicit: what happened, when it happened, which evidence supports it, who controlled the robot, what privacy constraints apply, and what can be shared externally.

The current prototype demonstrates the artifact and review workflow through a public Dock Aisle Near-Miss packet and two additional seeded scenarios. The next proof is field validation with design partners. If validated, Blackbox can become a practical evidence layer between robot operations, engineering debugging, customer trust, insurance review, and safety governance.

References

[1] International Federation of Robotics, "Executive Summary World Robotics 2025 - Service Robots," 2025. https://ifr.org/img/worldrobotics/Executive_Summary_WR_2025_Service_Robots.pdf

[2] ISO, "ISO 10218-1:2025 Robotics - Safety requirements - Part 1: Industrial robots," 2025. https://www.iso.org/standard/73933.html

[3] ROS 2 Documentation, "Recording a bag from a node." https://docs.ros.org/en/rolling/Tutorials/Advanced/Recording-A-Bag-From-Your-Own-Node-CPP.html

[4] MCAP, "Open source container file format for multimodal log data." https://mcap.dev/

[5] NHTSA, "Event Data Recorder." https://www.nhtsa.gov/research-data/event-data-recorder

[6] A. Butler, S. Izadi, and M. Cakmak, "The Privacy-Utility Tradeoff for Remotely Teleoperated Robots," ACM/IEEE International Conference on Human-Robot Interaction, 2015. https://hcrlab.cs.washington.edu/publications/butler2015hri/

[7] "Identifying human-robot interaction incident archetypes: a system and network analysis of accidents," Safety Science, Volume 191, 2025. https://www.sciencedirect.com/science/article/pii/S0925753525001845

[8] Foxglove, robotics observability and visualization. https://foxglove.dev/

[9] Formant, fleet observability. https://docs.formant.io/docs/fleet-observability

[10] InOrbit, robot operations platform. https://www.inorbit.ai/

[11] T. N. Canh, T. T. Viet, T. T. Tran, and B. W. Lim, "SafeGuard ASF: SR Agentic Humanoid Robot System for Autonomous Industrial Safety," arXiv:2603.25353, 2026. https://arxiv.org/html/2603.25353v1