µScope Trace Format Specification

Version: 0.3-draft
Magic: uSCP (0x75 0x53 0x43 0x50)
Byte order: Little-endian (all multi-byte integers throughout the file, including field values in event payloads, checkpoint slot data, and summary entries)
Alignment: All section offsets are 8-byte aligned

1. Overview

µScope is a binary trace format for cycle-accurate hardware introspection.

1.1 Layered Architecture

µScope is structured as two distinct layers:

flowchart TD
  proto["<b>Protocol Layer</b><br />Defines semantic meaning for a specific DUT type.<br />Contains reconstruction logic, decoders, and visualization rules.<br /><i>NOT part of this specification.</i>"]
  transport["<b>Transport Layer</b><br />The file format (this document). Knows about:<br />Storages · Events · Checkpoints + deltas · Summaries<br />Knows NOTHING about CPUs, pipelines, caches,<br />entities, counters, annotations, or any specific hardware."]
  proto --- transport

The protocol layer defines semantics, decoders, and visualization. The transport layer defines the binary format and read/write APIs.

1.2 Core Primitives

Primitive	What it models
Storage	A named array of typed slots
Event	A timestamped occurrence with a typed payload

All primitives are schema-defined. The transport layer imposes no assumptions about their fields, types, or semantics.

Everything else — entities, counters, annotations, dependencies, markers — is modeled using these two primitives and interpreted by the protocol layer.

1.3 String Representation

All human-readable strings in the format (field names, enum labels, DUT properties, etc.) are stored in a single string pool. Structures reference strings by uint16_t offset into the pool (max 64 KB, sufficient for any realistic schema).

The string pool consists of null-terminated UTF-8 strings packed sequentially and is stored at the end of the schema chunk payload. Both the DUT descriptor and the schema definitions reference it.

The optional string table section (§7) stores runtime strings referenced by FT_STRING_REF fields in storage slots and event payloads. An FT_STRING_REF value is a 0-based index into the string table's entries array.
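As an illustration of how a reader might resolve a uint16_t pool offset, here is a minimal sketch. It assumes the pool bytes are already in memory; the helper name pool_lookup and its bounds-checking policy are hypothetical and not part of this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not defined by the spec): resolve a uint16_t offset
 * into the schema string pool. Returns NULL if the offset lies outside the
 * pool or the string is not null-terminated within it. */
static const char *pool_lookup(const uint8_t *pool, size_t pool_size,
                               uint16_t offset)
{
    assert(pool != NULL);
    if ((size_t)offset >= pool_size)
        return NULL;
    /* memchr bounds the terminator search to the bytes left in the pool. */
    if (memchr(pool + offset, '\0', pool_size - offset) == NULL)
        return NULL;
    return (const char *)(pool + offset);
}
```

Checking for the terminator before returning keeps a corrupt pool from causing reads past the end of the schema chunk.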

1.4 File Layout

The file has two regions: a fixed preamble written at trace creation, and an append region that grows during simulation.

During simulation, only segments are appended. At close, finalization data is written and the file header is rewritten with final values.

1.5 Access Patterns

µScope supports three access patterns:

Pattern	When	Mechanism
Streaming write	During simulation	Append segments, update `tail_offset`
Live read	While writer is still running	Follow `tail_offset` → `prev` chain
Random access	After finalization	Binary search segment table

See §8 for details.

1.6 Time Model

All timestamps in µScope are in picoseconds (ps). This provides a universal time axis that accommodates multiple clock domains without conversion loss — every practical hardware clock period is an integer number of picoseconds (e.g., 5 GHz → 200 ps, 800 MHz → 1250 ps).

Cycle-frame deltas, segment boundaries, and summary buckets all use picosecond timestamps. The schema defines clock domains (§4.9), each with a name and period. Scopes are assigned to clock domains so the viewer can display domain-local cycle numbers (by dividing timestamps by the clock period).

Writers emit cycle-frame deltas equal to the clock period of the active domain (e.g., 200 for a 5 GHz clock). LEB128 encoding keeps this compact (1–2 bytes), and segment-level compression handles the repeating patterns efficiently.

2. File Header

Offset 0. Fixed size: 48 bytes.

typedef struct {
    uint8_t  magic[4];              // "uSCP" = {0x75, 0x53, 0x43, 0x50}
    uint16_t version_major;         // 0
    uint16_t version_minor;         // 3
    uint64_t flags;                 // §2.1
    uint64_t total_time_ps;         // total trace duration in picoseconds (0 until finalized)
    uint32_t num_segments;          // updated after each segment flush
    uint32_t preamble_end;          // file offset where segments begin
    uint64_t section_table_offset;  // 0 until finalized
    uint64_t tail_offset;           // file offset of last segment header (0 = none)
} file_header_t;                    // 48 bytes

The preamble (§2.3) immediately follows the header at offset 48 and extends to preamble_end. Readers scan preamble chunks to locate the DUT descriptor, schema, and trace configuration.
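The following sketch shows one way a reader could decode the 48-byte header from a byte buffer without relying on host endianness or struct layout. The type header_view_t and the function parse_file_header are illustrative names chosen here, not part of the format or its API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers: decode byte by byte so the result does not depend
 * on host endianness or alignment. */
static uint16_t rd_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint64_t rd_le64(const uint8_t *p)
{
    return (uint64_t)rd_le32(p) | ((uint64_t)rd_le32(p + 4) << 32);
}

/* Hypothetical in-memory view of file_header_t (field names follow §2). */
typedef struct {
    uint16_t version_major, version_minor;
    uint64_t flags, total_time_ps;
    uint32_t num_segments, preamble_end;
    uint64_t section_table_offset, tail_offset;
} header_view_t;

/* Parse the 48-byte header; returns false on short input or bad magic. */
static bool parse_file_header(const uint8_t *buf, size_t len, header_view_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 48 || memcmp(buf, "uSCP", 4) != 0)
        return false;
    out->version_major        = rd_le16(buf + 4);
    out->version_minor        = rd_le16(buf + 6);
    out->flags                = rd_le64(buf + 8);
    out->total_time_ps        = rd_le64(buf + 16);
    out->num_segments         = rd_le32(buf + 24);
    out->preamble_end         = rd_le32(buf + 28);
    out->section_table_offset = rd_le64(buf + 32);
    out->tail_offset          = rd_le64(buf + 40);
    return true;
}
```

The byte offsets follow directly from the field order in file_header_t; every field is naturally aligned, so the struct has no implicit padding.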

2.1 Flags

Bit	Name	Description
0	`F_COMPLETE`	Trace was cleanly finalized
1	`F_COMPRESSED`	Delta segments use compression
2	`F_HAS_STRINGS`	String table section present
3-5	`F_COMP_METHOD`	Compression method (0=LZ4, 1=ZSTD; 2–7 reserved, must not be used). LZ4 support is mandatory for all readers; ZSTD is optional.
6	`F_COMPACT_DELTAS`	Delta blobs may contain compact ops (§8.6.3). Ignored when `F_INTERLEAVED_DELTAS` is set.
7	`F_INTERLEAVED_DELTAS`	v0.2 interleaved frame format (§8.6.6). Ops and events use self-describing tags.
8-63	Reserved	Must be zero
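The flag layout above can be decoded with a few masks. The sketch below is one possible reading of the table; the macro and function names other than the F_* flag names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as listed in the table above. */
#define F_COMPLETE            (UINT64_C(1) << 0)
#define F_COMPRESSED          (UINT64_C(1) << 1)
#define F_HAS_STRINGS         (UINT64_C(1) << 2)
#define F_COMP_METHOD_SHIFT   3                   /* bits 3-5 */
#define F_COMP_METHOD_MASK    UINT64_C(0x7)
#define F_COMPACT_DELTAS      (UINT64_C(1) << 6)
#define F_INTERLEAVED_DELTAS  (UINT64_C(1) << 7)
#define F_RESERVED_MASK       (~UINT64_C(0xFF))   /* bits 8-63 */

/* Extract the 3-bit compression method (0 = LZ4, 1 = ZSTD). */
static unsigned comp_method(uint64_t flags)
{
    return (unsigned)((flags >> F_COMP_METHOD_SHIFT) & F_COMP_METHOD_MASK);
}

/* Hypothetical validity check: reserved bits must be zero and the
 * compression method must be one of the two defined values. */
static bool flags_acceptable(uint64_t flags)
{
    if (flags & F_RESERVED_MASK)
        return false;
    return comp_method(flags) <= 1;
}

/* F_COMPACT_DELTAS is ignored when F_INTERLEAVED_DELTAS is set. */
static bool compact_ops_possible(uint64_t flags)
{
    return (flags & F_COMPACT_DELTAS) && !(flags & F_INTERLEAVED_DELTAS);
}
```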

2.2 Header Lifecycle

Field	At open	After each segment	At close
`magic`	`uSCP`	—	—
`flags`	`F_COMPRESSED` etc	—	`F_COMPLETE` set
`total_time_ps`	0	—	final value
`num_segments`	0	incremented	final value
`preamble_end`	final value	—	—
`section_table_offset`	0	—	final offset
`tail_offset`	0	offset of new segment	offset of last segment

After fully writing a segment, the writer commits it in this order:

1. Write segment data (header + checkpoint + deltas) at EOF
2. Memory barrier / fsync
3. Write tail_offset (single naturally-aligned 8-byte write; this is the commit point)
4. Write num_segments (single naturally-aligned 4-byte write; advisory)

A live reader uses tail_offset as the sole authoritative indicator of new data. num_segments may lag by one during live reads.

2.3 Preamble Chunks

The preamble immediately follows the file header (offset 48) and consists of a sequence of typed chunks. Each chunk has an 8-byte header:

typedef struct {
    uint16_t type;                  // chunk type
    uint16_t flags;                 // must be 0 (reserved for future use)
    uint32_t size;                  // payload size in bytes
    // uint8_t payload[size];
    // padding to 8-byte alignment (0-filled)
} preamble_chunk_t;                 // 8 bytes + payload + padding

Chunk payloads are padded to 8-byte alignment. The next chunk starts at offset 8 + align8(size) from the current chunk header.

enum preamble_chunk_type : uint16_t {
    CHUNK_END          = 0x0000,    // terminates the preamble
    CHUNK_DUT_DESC     = 0x0001,    // DUT descriptor (§3)
    CHUNK_SCHEMA       = 0x0002,    // schema definition (§4)
    CHUNK_TRACE_CONFIG = 0x0003,    // trace session parameters (§2.4)
    // future: CHUNK_ELF, CHUNK_SOURCE_MAP, ...
};

Mandatory chunks: A valid file must contain exactly one each of CHUNK_DUT_DESC, CHUNK_SCHEMA, and CHUNK_TRACE_CONFIG. Readers must reject files missing any of these.

Unknown chunks: Readers must skip chunk types they do not recognize (advance by 8 + align8(size) bytes). This allows older readers to open files written by newer writers that add new chunk types.

Ordering: Writers should emit chunks in the order DUT → Schema → Trace Config, but readers must not depend on ordering.
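The chunk-skipping rule can be sketched as follows. The function find_chunk and its signature are hypothetical; it assumes the preamble region is available as an in-memory buffer and treats a truncated chunk as "not found".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Round n up to the next multiple of 8. */
static uint64_t align8(uint64_t n) { return (n + 7u) & ~(uint64_t)7u; }

/* Hypothetical helper: scan preamble chunks in buf[start, end) for `type`.
 * On success, stores the payload offset and size and returns true.
 * Unknown chunk types are skipped; CHUNK_END (0x0000) or a truncated
 * chunk stops the scan. */
static bool find_chunk(const uint8_t *buf, size_t start, size_t end,
                       uint16_t type, size_t *payload_off, uint32_t *size)
{
    assert(buf != NULL && payload_off != NULL && size != NULL);
    size_t pos = start;
    while (pos + 8 <= end) {
        uint16_t t  = rd_le16(buf + pos);
        uint32_t sz = rd_le32(buf + pos + 4);
        if (t == 0x0000)
            return false;                       /* CHUNK_END */
        if ((uint64_t)sz > (uint64_t)(end - pos - 8))
            return false;                       /* payload runs past the region */
        if (t == type) {
            *payload_off = pos + 8;
            *size = sz;
            return true;
        }
        pos += 8 + (size_t)align8(sz);          /* header + padded payload */
    }
    return false;
}
```

Because the scan never depends on chunk order, the same loop can locate the DUT descriptor, schema, and trace configuration chunks in any arrangement.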

2.4 Trace Configuration Chunk

Session-level parameters that govern how the trace was captured.

typedef struct {
    uint64_t checkpoint_interval_ps; // picoseconds between checkpoints
} trace_config_t;                    // 8 bytes (CHUNK_TRACE_CONFIG payload)

3. DUT Descriptor

CHUNK_DUT_DESC payload. Identifies what is being traced.

typedef struct {
    uint16_t num_properties;
    uint16_t reserved;              // must be 0
    // dut_property_t properties[num_properties];
} dut_desc_t;                       // 4 bytes + properties

3.1 DUT Properties

typedef struct {
    uint16_t key;                   // offset into string pool
    uint16_t value;                 // offset into string pool
} dut_property_t;                   // 4 bytes

Properties are opaque key-value pairs. The transport layer does not interpret them — only the protocol layer does. Protocol version, vendor, DUT name, and any domain-specific metadata are all properties.

Example properties for an OoO CPU (protocol-specific keys use a prefix to avoid collisions in multi-protocol traces):

Key	Value
`dut_name`	`boom_core_0`
`cpu.vendor`	`acme`
`cpu.protocol_version`	`0.1`
`cpu.isa`	`RV64IMAFDCV`
`cpu.pipeline_depth`	`12`
`cpu.elf_path`	`/path/to/fw.elf`

4. Schema

CHUNK_SCHEMA payload. The schema defines the structure of all data in the trace. Written once at trace creation, immutable thereafter. Fully self-describing — a viewer can parse and display data without a protocol plugin.

4.1 Schema Header

typedef struct {
    uint8_t  num_enums;             // max 255 enum types
    uint8_t  num_clock_domains;     // max 255 clock domains
    uint16_t num_scopes;
    uint16_t num_storages;
    uint16_t num_event_types;
    uint16_t num_summary_fields;
    uint16_t string_pool_offset;    // offset from schema start to string pool
    // Followed by, in order:
    //   clock_domain_def_t      clocks[num_clock_domains]
    //   scope_def_t             scopes[num_scopes]
    //   enum_def_t              enums[num_enums]              (variable-size)
    //   storage_def_t           storages[num_storages]        (variable-size)
    //   event_def_t             event_types[num_event_types]  (variable-size)
    //   summary_field_def_t     summary_fields[num_summary_fields]
    //   <string pool>
} schema_header_t;                  // 12 bytes

4.2 Field Types

enum field_type : uint8_t {
    FT_U8           = 0x01,
    FT_U16          = 0x02,
    FT_U32          = 0x03,
    FT_U64          = 0x04,
    FT_I8           = 0x05,
    FT_I16          = 0x06,
    FT_I32          = 0x07,
    FT_I64          = 0x08,
    FT_BOOL         = 0x09,         // 1 byte
    FT_STRING_REF   = 0x0A,         // uint32_t index into string table entries[]
    FT_ENUM         = 0x0B,         // uint8_t index into a named enum
};

4.3 Field Definition

typedef struct {
    uint16_t name;                  // offset into string pool
    uint8_t  type;                  // field_type (size derived from type)
    uint8_t  enum_id;               // if type==FT_ENUM, else 0
    uint8_t  reserved[4];
} field_def_t;                      // 8 bytes

Field size is derived from the type:

Type	Size (bytes)
FT_U8, FT_I8, FT_BOOL, FT_ENUM	1
FT_U16, FT_I16	2
FT_U32, FT_I32, FT_STRING_REF	4
FT_U64, FT_I64	8
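A reader typically needs this table as a function. The sketch below mirrors it; packed_size additionally sums field sizes the way §8.6.5 describes for slot data and event payloads. Both function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    FT_U8 = 0x01, FT_U16 = 0x02, FT_U32 = 0x03, FT_U64 = 0x04,
    FT_I8 = 0x05, FT_I16 = 0x06, FT_I32 = 0x07, FT_I64 = 0x08,
    FT_BOOL = 0x09, FT_STRING_REF = 0x0A, FT_ENUM = 0x0B
};

/* Size in bytes of a field of the given type, or 0 for an unknown type. */
static size_t field_size(uint8_t type)
{
    switch (type) {
    case FT_U8: case FT_I8: case FT_BOOL: case FT_ENUM:  return 1;
    case FT_U16: case FT_I16:                            return 2;
    case FT_U32: case FT_I32: case FT_STRING_REF:        return 4;
    case FT_U64: case FT_I64:                            return 8;
    default:                                             return 0;
    }
}

/* Packed size of a slot or event payload: the sum of its field sizes with
 * no padding. Returns false if any field type is unknown. */
static bool packed_size(const uint8_t *types, size_t n, size_t *out)
{
    assert(out != NULL);
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        size_t sz = field_size(types[i]);
        if (sz == 0)
            return false;
        total += sz;
    }
    *out = total;
    return true;
}
```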

4.4 Scope Definition

Scopes define a hierarchical tree for organizing storages and events. The schema must contain at least one scope: scope 0 is the root scope (conventionally named /).

typedef struct {
    uint16_t name;              // offset into string pool
    uint16_t scope_id;          // 0-based; scope 0 = root
    uint16_t parent_id;         // parent scope_id; 0xFFFF = no parent (valid only for scope 0, the root)
    uint16_t protocol;          // offset into string pool, 0xFFFF = no protocol
    uint8_t  clock_id;          // clock domain index (§4.9), 0xFF = inherit from parent
    uint8_t  reserved[3];
} scope_def_t;                  // 12 bytes

Each scope optionally declares a protocol — a string identifying which protocol layer applies (e.g., "cpu", "dma", "noc"). The viewer uses the protocol to select the appropriate plugin for that subtree. Scopes with protocol = 0xFFFF have no protocol and are rendered generically.

There is no protocol inheritance. Each scope that needs a protocol must declare it explicitly. The root scope typically has no protocol.

Protocol identifiers: Vendor-specific protocols use a dotted vendor prefix, e.g., axelera.loom_core. The protocol name generic (or no protocol at all) means the viewer renders raw schema data without interpretation.

4.5 Enum Definition

typedef struct {
    uint16_t name;                  // offset into string pool
    uint8_t  num_values;
    uint8_t  reserved;
    // enum_value_t values[num_values];
} enum_def_t;                       // 4 bytes + values

typedef struct {
    uint8_t  value;                 // numeric value
    uint8_t  reserved;
    uint16_t name;                  // offset into string pool
} enum_value_t;                     // 4 bytes

4.6 Storage Definition

typedef struct {
    uint16_t name;                  // offset into string pool
    uint16_t storage_id;            // 0-based
    uint16_t num_slots;
    uint16_t num_fields;
    uint16_t flags;                 // §4.6.1
    uint16_t scope_id;              // owning scope, 0xFFFF = root-level
    uint16_t num_properties;        // v0.3: number of storage-level properties
    uint16_t reserved;              // v0.3: must be 0
    // field_def_t fields[num_fields];
    // field_def_t properties[num_properties];   // v0.3
} storage_def_t;                    // 16 bytes + fields + properties

4.6.1 Storage Flags

Bit	Name	Description
0	`SF_SPARSE`	Checkpoints store only valid entries + bitmask
1	`SF_BUFFER`	Buffer storage: a sparse storage used as a named buffer (e.g., ROB, issue queue). The protocol layer uses this flag to detect buffer storages for dedicated visualization.
2-15	Reserved

For SF_SPARSE storages, slot validity is tracked by the transport:

DA_SLOT_SET on any field of an invalid slot implicitly marks it valid.
DA_SLOT_CLEAR marks a slot invalid.

Non-sparse storages have all slots always valid.

4.6.2 Storage Properties (v0.3)

Storage properties are named, typed scalar values attached to a storage (not per-slot). They are checkpointed and updated via DA_PROP_SET deltas. Use cases include buffer pointers (retire_ptr, allocate_ptr) and other storage-level metadata that changes each cycle.

Properties are defined in the schema as field_def_t entries appended after the slot field definitions. Each property has a name, type, and optional enum_id, following the same rules as slot fields.

4.7 Event Definition

typedef struct {
    uint16_t name;                  // offset into string pool
    uint16_t event_type_id;         // 0-based
    uint16_t num_fields;
    uint16_t scope_id;              // owning scope, 0xFFFF = root-level
    // field_def_t fields[num_fields];
} event_def_t;                      // 8 bytes + fields

4.8 Summary Field Definition

typedef struct {
    uint16_t name;                  // offset into string pool
    uint8_t  type;                  // field_type (size derived from type, see §4.3)
    uint8_t  reserved;
    uint16_t scope_id;              // owning scope (same field as in storage/event defs)
    uint16_t reserved2;
} summary_field_def_t;              // 8 bytes

Summary fields are scoped: in a multi-scope trace, each scope has its own set of summary fields. Fields with the same name in different scopes are independent (e.g., core0/committed vs. core1/committed).

Summary fields are opaque to the transport. The writer computes and writes values; the transport stores and retrieves them. What each field means (counter rate, storage occupancy, event frequency) is the protocol layer's concern.

4.9 Clock Domain Definition

typedef struct {
    uint16_t name;                  // offset into string pool (e.g., "core_clk")
    uint16_t clock_id;              // 0-based
    uint32_t period_ps;             // clock period in picoseconds (0 = unknown)
} clock_domain_def_t;               // 8 bytes

Each clock domain defines a named clock with a period in picoseconds. Scopes reference clock domains via clock_id in scope_def_t (§4.4).

The viewer uses period_ps to convert picosecond timestamps to domain-local cycle numbers for display (cycle = timestamp / period_ps).

A trace must define at least one clock domain. If the DUT has a single clock, one domain suffices. Multi-clock SoCs define one domain per distinct clock frequency.

Example	`period_ps`	Frequency
`core_clk`	200	5.0 GHz
`bus_clk`	1000	1.0 GHz
`mem_clk`	1250	800 MHz
`slow_periph_clk`	30000	33.3 MHz

5. Section Table

Written at finalization only (when F_COMPLETE is set).

enum section_type : uint16_t {
    SECTION_END              = 0x0000,
    SECTION_SUMMARY          = 0x0001,
    SECTION_STRINGS          = 0x0002,
    SECTION_SEGMENTS         = 0x0003,
    SECTION_COUNTER_SUMMARY  = 0x0010,  // trace summary (counter mipmaps + instruction density)
};

typedef struct {
    uint16_t type;
    uint16_t flags;
    uint32_t reserved;
    uint64_t offset;
    uint64_t size;
} section_entry_t;                  // 24 bytes

The table is terminated by a SECTION_END entry.

For incomplete files (F_COMPLETE not set), section_table_offset is 0 and the section table does not exist. Readers must use the segment chain (§8.2) to discover segments.

6. Trace Summary Section (TSUM)

Written at finalization into a SECTION_COUNTER_SUMMARY section. Contains instruction density mipmaps and per-counter mipmaps in a self-contained blob.

6.1 TSUM Wire Format

Offset  Size   Field
──────  ─────  ──────────────────────────────────
0       4      magic: b"TSUM" (0x54 0x53 0x55 0x4D)
4       4      base_interval_cycles (u32 LE)
8       4      fan_out (u32 LE)
12      8      total_instructions (u64 LE)
                                                    ─── 20 bytes fixed header ───

20      4      num_density_levels (u32 LE)
        ...    For each density level:
                 4 bytes: num_entries (u32 LE)
                 num_entries × 4 bytes: instruction counts (u32 LE each)

        4      num_counters (u32 LE)
        ...    For each counter:
                 4 bytes: name_len (u32 LE)
                 name_len bytes: name (UTF-8, not null-terminated)
                 2 bytes: storage_id (u16 LE)
                 4 bytes: num_levels (u32 LE)
                 For each level:
                   4 bytes: num_entries (u32 LE)
                   num_entries × 24 bytes: mipmap entries

6.2 Mipmap Entry

Each mipmap entry is 24 bytes:

Offset	Size	Field	Description
0	8	`min_delta`	Minimum per-cycle delta in this bucket
8	8	`max_delta`	Maximum per-cycle delta in this bucket
16	8	`sum`	Total delta accumulated in this bucket

6.3 Backward Compatibility (CSUM)

Readers must also accept the legacy CSUM magic (b"CSUM", 0x43 0x53 0x55 0x4D) which predates instruction density support. The CSUM layout is:

0       4      magic: b"CSUM"
4       4      base_interval_cycles (u32 LE)
8       4      fan_out (u32 LE)
12      4      num_counters (u32 LE)
        ...    Counter mipmaps (same format as TSUM)

When reading CSUM, set total_instructions = 0 and instruction_density = [].
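A reader could handle both magics with a single entry point. The sketch below parses only the fixed header and reports where the variable-length body starts; summary_header_t and parse_summary_header are names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint64_t rd_le64(const uint8_t *p)
{
    return (uint64_t)rd_le32(p) | ((uint64_t)rd_le32(p + 4) << 32);
}

/* Hypothetical parsed view of the fixed summary header. */
typedef struct {
    uint32_t base_interval_cycles;
    uint32_t fan_out;
    uint64_t total_instructions;  /* 0 for legacy CSUM */
    bool     has_density;         /* true for TSUM only */
    size_t   body_offset;         /* num_density_levels (TSUM) or num_counters (CSUM) */
} summary_header_t;

/* Accepts both TSUM (20-byte header) and legacy CSUM (12-byte header). */
static bool parse_summary_header(const uint8_t *b, size_t len,
                                 summary_header_t *out)
{
    assert(b != NULL && out != NULL);
    if (len < 12)
        return false;
    bool tsum = memcmp(b, "TSUM", 4) == 0;
    bool csum = memcmp(b, "CSUM", 4) == 0;
    if ((!tsum && !csum) || (tsum && len < 20))
        return false;
    out->base_interval_cycles = rd_le32(b + 4);
    out->fan_out              = rd_le32(b + 8);
    out->total_instructions   = tsum ? rd_le64(b + 12) : 0;
    out->has_density          = tsum;
    out->body_offset          = tsum ? 20 : 12;
    return true;
}
```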

7. String Table (Optional)

For runtime strings referenced by FT_STRING_REF fields in storage slots or event payloads. Written at finalization.

typedef struct {
    uint32_t num_entries;
    uint32_t reserved;
    // string_index_t entries[num_entries];
    // followed by packed null-terminated string data
} string_table_header_t;

typedef struct {
    uint32_t offset;                // byte offset into string data (relative to end of entries array)
    uint32_t length;                // string length in bytes (excluding null terminator)
} string_index_t;                   // 8 bytes

An FT_STRING_REF field value is a 0-based index into the entries[] array. The reader looks up entries[value] to get the offset and length of the string data. Writers assign sequential indices starting from 0.
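The lookup described above might be implemented along these lines. This is a sketch over an in-memory copy of the section; string_ref is a hypothetical name, and the strictness of its validation is a design choice rather than a requirement of the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: resolve an FT_STRING_REF index against a string
 * table section held in sec[0, sec_len). Returns a pointer to the
 * null-terminated string and stores its length, or NULL on any
 * out-of-range index, offset, or missing terminator. */
static const char *string_ref(const uint8_t *sec, size_t sec_len,
                              uint32_t index, uint32_t *length)
{
    assert(sec != NULL && length != NULL);
    if (sec_len < 8)
        return NULL;                            /* header: num_entries + reserved */
    uint32_t n = rd_le32(sec);
    if (index >= n || (uint64_t)n * 8 > sec_len - 8)
        return NULL;
    size_t data_off = 8 + (size_t)n * 8;        /* string data follows entries[] */
    size_t data_len = sec_len - data_off;
    const uint8_t *e = sec + 8 + (size_t)index * 8;
    uint32_t off = rd_le32(e);
    uint32_t len = rd_le32(e + 4);
    /* The string and its terminator must both fit in the data area. */
    if (off >= data_len || len >= data_len - off)
        return NULL;
    if (sec[data_off + off + len] != '\0')
        return NULL;
    *length = len;
    return (const char *)(sec + data_off + off);
}
```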

8. Segments

A segment is one checkpoint-interval's worth of data: a full state snapshot (checkpoint) followed by compressed cycle-by-cycle deltas.

8.1 Segment Header

Each segment is self-describing and linked to the previous segment, forming a backward chain.

typedef struct {
    uint32_t segment_magic;         // "uSEG" = {0x75, 0x53, 0x45, 0x47}
    uint32_t flags;
    uint64_t time_start_ps;         // segment start time in picoseconds
    uint64_t time_end_ps;           // exclusive
    uint64_t prev_segment_offset;   // file offset of previous segment (0 = first)
    uint32_t checkpoint_size;
    uint32_t deltas_compressed_size;
    uint32_t deltas_raw_size;
    uint32_t num_frames;            // number of cycle_frame records in decompressed delta blob
    uint32_t num_frames_active;     // frames with at least one op or event
    uint32_t reserved;
    // checkpoint data (checkpoint_size bytes)
    // compressed delta data (deltas_compressed_size bytes)
} segment_header_t;                 // 56 bytes

The segment_magic field allows validation when walking the chain and recovery of incomplete files.

8.2 Segment Chain

Segments form a singly-linked list via prev_segment_offset, traversable from tail_offset in the file header backward to the first segment (prev_segment_offset == 0).

flowchart LR
  S2["Segment 2<br />[400ns,600ns)<br />prev→S1"] -->|prev| S1["Segment 1<br />[200ns,400ns)<br />prev→S0"]
  S1 -->|prev| S0["Segment 0<br />[0,200ns)<br />prev=0"]
  tail(["tail_offset"]) -.->|points to| S2
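Walking the chain from tail_offset can be sketched as below. The example operates on an in-memory file image; walk_chain is a hypothetical helper, and the rule that each prev_segment_offset must be strictly smaller than the current offset is a defensive assumption (segments are appended, so earlier segments sit at lower offsets).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint64_t rd_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: follow prev_segment_offset links backward from
 * tail_offset, storing up to `cap` segment offsets newest-first.
 * Returns the number of segments visited, or -1 if a header is out of
 * bounds, has a bad magic, or the chain does not move strictly backward. */
static long walk_chain(const uint8_t *file, size_t len, uint64_t tail_offset,
                       uint64_t *offsets, size_t cap)
{
    assert(file != NULL);
    long n = 0;
    uint64_t off = tail_offset;                 /* 0 means "no segments yet" */
    while (off != 0) {
        if (off > len || len - off < 56)
            return -1;                          /* 56-byte header must fit */
        if (memcmp(file + off, "uSEG", 4) != 0)
            return -1;
        if ((size_t)n < cap)
            offsets[n] = off;
        n++;
        uint64_t prev = rd_le64(file + off + 24);  /* prev_segment_offset */
        if (prev >= off)
            return -1;                          /* guards against loops */
        off = prev;
    }
    return n;
}
```

A live reader would typically reverse the collected offsets to obtain a time-ordered index.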

8.3 Segment Table (Finalization Only)

At close, the writer builds a flat segment table for fast random access. This table is referenced by SECTION_SEGMENTS in the section table.

typedef struct {
    uint64_t offset;                // file offset of segment_header_t
    uint64_t time_start_ps;
    uint64_t time_end_ps;           // exclusive
} segment_index_entry_t;            // 24 bytes

Binary search on time_start_ps gives O(log n) seek to any timestamp.
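A possible implementation of that seek is shown below. It assumes the table is sorted by time_start_ps in ascending order with non-overlapping ranges, which the text implies but does not state; find_segment is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t offset;          /* file offset of segment_header_t */
    uint64_t time_start_ps;
    uint64_t time_end_ps;     /* exclusive */
} segment_index_entry_t;

/* Return the index of the segment whose [time_start_ps, time_end_ps)
 * range contains t, or -1 if no segment covers t. */
static long find_segment(const segment_index_entry_t *tab, size_t n,
                         uint64_t t)
{
    assert(tab != NULL || n == 0);
    size_t lo = 0, hi = n;    /* find the first entry with start > t */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].time_start_ps <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;                         /* t precedes the first segment */
    if (t >= tab[lo - 1].time_end_ps)
        return -1;                         /* t is in a gap or past the end */
    return (long)(lo - 1);
}
```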

8.4 Reading Strategies

Finalized file (F_COMPLETE set):

1. Read file header → preamble_end, section_table_offset
2. Scan preamble chunks → extract DUT, schema, trace config
3. Read section table → find SECTION_SEGMENTS
4. Binary search segment table for target timestamp → get segment offset
5. Read segment header + checkpoint + deltas at that offset

Live file (F_COMPLETE not set):

1. Read file header → preamble_end, tail_offset
2. Scan preamble chunks → extract DUT, schema, trace config
3. Read segment at tail_offset → follow prev_segment_offset chain
4. Build in-memory segment index (done once, O(n) in segments)
5. To check for new data: re-read tail_offset from file header

Streaming write (writer perspective):

1. Write file header (with tail_offset=0) + preamble chunks + CHUNK_END
2. Set preamble_end in file header
3. For each checkpoint interval:
   a. Write segment_header_t + checkpoint + compressed deltas at EOF
   b. Rewrite tail_offset and num_segments in file header
4. At close: write string table, summary, segment table, section table; set F_COMPLETE; rewrite file header with final values

8.5 Checkpoint Format

A checkpoint is a sequence of storage blocks, one per storage.

typedef struct {
    uint16_t storage_id;
    uint16_t reserved;
    uint32_t size;                  // payload size in bytes
    // payload
} checkpoint_block_t;               // 8 bytes

8.5.1 Sparse Storage Block

checkpoint_block_t { storage_id, size }
uint8_t  valid_mask[ceil(num_slots/8)];
// For each set bit: slot_data[slot_size]
// v0.3: property_data[property_data_size] (if num_properties > 0)

8.5.2 Dense Storage Block

checkpoint_block_t { storage_id, size }
// slot_data[slot_size] × num_slots
// v0.3: property_data[property_data_size] (if num_properties > 0)

For storages with num_properties > 0, property values are appended after slot data as tightly-packed field values (same packing rules as slot data). The size field in the checkpoint block covers the total payload including property data.
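To validate the size field of a checkpoint block, a reader can compute the expected payload size from the schema. The sketch below does this for both block kinds; the function names are hypothetical, and it assumes unused bits in the last valid_mask byte are zero, which the text does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for valid_mask: ceil(num_slots / 8). */
static size_t mask_bytes(uint16_t num_slots)
{
    return ((size_t)num_slots + 7) / 8;
}

/* Expected payload size of a sparse block: the mask, one slot_size record
 * per set bit, then the property data. Counting set bits does not depend
 * on the bit order within each mask byte. */
static size_t sparse_block_size(const uint8_t *valid_mask, uint16_t num_slots,
                                size_t slot_size, size_t property_data_size)
{
    assert(valid_mask != NULL || num_slots == 0);
    size_t valid = 0;
    for (size_t i = 0; i < mask_bytes(num_slots); i++) {
        /* Kernighan's popcount: each step clears the lowest set bit. */
        for (uint8_t m = valid_mask[i]; m != 0; m &= (uint8_t)(m - 1))
            valid++;
    }
    return mask_bytes(num_slots) + valid * slot_size + property_data_size;
}

/* Expected payload size of a dense block: every slot is present. */
static size_t dense_block_size(uint16_t num_slots, size_t slot_size,
                               size_t property_data_size)
{
    return (size_t)num_slots * slot_size + property_data_size;
}
```

Here slot_size and property_data_size are the packed sizes of the slot fields and property fields, computed as the sum of field sizes from §4.3.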

8.6 Delta Format

8.6.1 Cycle Frame

Wire format (variable-length — not representable as a C struct):

cycle_frame:
  [LEB128]   time_delta_ps   1–10 bytes, unsigned delta in ps from previous frame
  [uint8]    op_format       0 = wide (16B delta_op_t), 1 = compact (8B delta_op_compact_t)
  [uint8]    reserved        must be 0
  [uint16]   num_ops
  [uint16]   num_events
  [repeated] ops             × num_ops  (size per op depends on op_format)
  [repeated] events          × num_events (event_record_t, variable-size)

The op_format field is only meaningful when F_COMPACT_DELTAS is set in the file header. If the flag is not set, op_format must be 0 (wide) and readers may skip checking it.

The time delta uses unsigned LEB128 encoding (same as DWARF / protobuf). Values are in picoseconds. For a 5 GHz clock (200 ps period), consecutive cycles produce a repeating delta of 200:

Delta value	Encoded bytes	Typical scenario
0	1 (0x00)	Multiple frames at same timestamp
1–127	1	Sub-ns deltas (rare)
128–16383	2	Most clock periods (e.g., 200–1250)
16384+	3+	Large idle gaps

The first frame in each segment uses segment_header_t.time_start_ps as the base, so each segment is independently decodable without prior context.
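For reference, unsigned LEB128 stores 7 payload bits per byte, least-significant group first, with the high bit set on every byte except the last. The following sketch of an encoder and decoder uses illustrative function names; rejecting encodings longer than 10 bytes is a choice made here to match the 1–10 byte range stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v as unsigned LEB128 into out; returns the number of bytes
 * written (1 to 10 for a 64-bit value). */
static size_t leb128_encode(uint64_t v, uint8_t out[10])
{
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(v & 0x7F);
        v >>= 7;
        if (v != 0)
            byte |= 0x80;                 /* continuation bit */
        out[n++] = byte;
    } while (v != 0);
    return n;
}

/* Decode an unsigned LEB128 value from buf[0, len). Stores the value and
 * returns the bytes consumed, or 0 if the input is truncated or the
 * encoding exceeds 10 bytes. */
static size_t leb128_decode(const uint8_t *buf, size_t len, uint64_t *value)
{
    assert(buf != NULL && value != NULL);
    uint64_t v = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        v |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
        if ((buf[i] & 0x80) == 0) {
            *value = v;
            return i + 1;
        }
    }
    return 0;
}
```

A reader reconstructs absolute frame times by starting from time_start_ps and adding each decoded delta in turn.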

8.6.2 Delta Operations

enum delta_action : uint8_t {
    DA_SLOT_SET     = 0x01,         // set a field value
    DA_SLOT_CLEAR   = 0x02,         // mark slot invalid (sparse only)
    DA_SLOT_ADD     = 0x03,         // add value to field (for counters etc.)
    DA_PROP_SET     = 0x04,         // v0.3: set a storage-level property
};

typedef struct {
    uint8_t  action;
    uint8_t  reserved;
    uint16_t storage_id;
    uint16_t slot_index;
    uint16_t field_index;           // ignored for DA_SLOT_CLEAR; prop_index for DA_PROP_SET
    uint64_t value;                 // ignored for DA_SLOT_CLEAR
} delta_op_t;                       // 16 bytes
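The effect of the slot actions on reconstructed state can be illustrated with a toy model. Everything below is a sketch under assumptions: storage_state_t is a hypothetical reader-side structure with fixed dimensions, all field values are widened to uint64_t, DA_SLOT_ADD is assumed not to change slot validity, and DA_PROP_SET is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { DA_SLOT_SET = 0x01, DA_SLOT_CLEAR = 0x02, DA_SLOT_ADD = 0x03 };

#define NUM_SLOTS  4   /* toy dimensions chosen for the example */
#define NUM_FIELDS 2

/* Hypothetical reader-side model of one sparse storage. */
typedef struct {
    bool     valid[NUM_SLOTS];
    uint64_t field[NUM_SLOTS][NUM_FIELDS];
} storage_state_t;

static void storage_reset(storage_state_t *s)
{
    memset(s, 0, sizeof *s);
}

/* Apply one op to the model; returns false for out-of-range indices or an
 * unknown action. */
static bool apply_op(storage_state_t *s, uint8_t action, uint16_t slot,
                     uint16_t field, uint64_t value)
{
    assert(s != NULL);
    if (slot >= NUM_SLOTS)
        return false;
    switch (action) {
    case DA_SLOT_CLEAR:
        s->valid[slot] = false;            /* field and value are ignored */
        return true;
    case DA_SLOT_SET:
        if (field >= NUM_FIELDS)
            return false;
        s->field[slot][field] = value;
        s->valid[slot] = true;             /* SET implicitly validates */
        return true;
    case DA_SLOT_ADD:
        if (field >= NUM_FIELDS)
            return false;
        s->field[slot][field] += value;    /* validity unchanged (assumption) */
        return true;
    default:
        return false;
    }
}
```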

8.6.3 Compact Delta Variant

When the file header flag F_COMPACT_DELTAS is set, delta blobs may contain compact 8-byte ops. The op_format field in each cycle_frame (§8.6.1) determines which layout all ops in that frame use.

typedef struct {
    uint8_t  action;
    uint8_t  storage_id_lo;         // low 8 bits of storage_id
    uint16_t slot_index;
    uint16_t field_index;
    uint16_t value16;
} delta_op_compact_t;               // 8 bytes

Compact ops have the following limitations. If any op in a frame violates these, the writer must use wide format for the entire frame:

storage_id must be 0–255
value16 is zero-extended to 64 bits; values > 65535 cannot be represented
DA_SLOT_CLEAR ignores field_index and value16 (same as wide format)
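A writer's per-frame decision could look like the sketch below. It is deliberately conservative: it checks value even for DA_SLOT_CLEAR ops, where the value is ignored, which can only cause an unnecessary fallback to wide format. The names frame_fits_compact and encode_compact are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory copy of the wide op from §8.6.2. */
typedef struct {
    uint8_t  action;
    uint8_t  reserved;
    uint16_t storage_id;
    uint16_t slot_index;
    uint16_t field_index;
    uint64_t value;
} delta_op_t;

/* True if every op in the frame can be represented in compact form. */
static bool frame_fits_compact(const delta_op_t *ops, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ops[i].storage_id > 255 || ops[i].value > 65535)
            return false;
    }
    return true;
}

/* Serialize one op in the 8-byte compact layout, little-endian. */
static void encode_compact(const delta_op_t *op, uint8_t out[8])
{
    assert(op->storage_id <= 255 && op->value <= 65535);
    out[0] = op->action;
    out[1] = (uint8_t)op->storage_id;              /* storage_id_lo */
    out[2] = (uint8_t)(op->slot_index & 0xFF);
    out[3] = (uint8_t)(op->slot_index >> 8);
    out[4] = (uint8_t)(op->field_index & 0xFF);
    out[5] = (uint8_t)(op->field_index >> 8);
    out[6] = (uint8_t)(op->value & 0xFF);          /* value16 */
    out[7] = (uint8_t)(op->value >> 8);
}
```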

8.6.4 Event Records

typedef struct {
    uint16_t event_type_id;
    uint16_t reserved;              // must be 0
    uint32_t payload_size;
    // uint8_t payload[payload_size];
} event_record_t;

The payload_size must equal the sum of field sizes for this event type as defined in the schema. Writers must not emit a different size. Readers should validate this but may use payload_size to skip events with unrecognized event_type_id without consulting the schema.

8.6.5 Payload Wire Format

Event payloads and checkpoint slot data use the same packing rule: fields are concatenated in schema-definition order with no padding and no alignment. Multi-byte fields use little-endian byte order (as with all integers in the file). The total payload size equals the sum of all field sizes as derived from their types (see §4.3).

Checkpoint blocks (§8.5) and cycle frames within the delta blob are also tightly packed with no inter-block or intra-block padding.

8.6.6 Interleaved Frame Format (v0.2)

When the file header flag F_INTERLEAVED_DELTAS is set, cycle frames use a self-describing tagged item stream instead of separate op/event arrays. This preserves the exact call order of ops and events within a cycle, which the v0.1 format cannot represent.

cycle_frame_v2:
  [LEB128]   time_delta_ps   1–10 bytes, unsigned delta in ps
  [uint16]   num_items       total number of tagged items
  [repeated] items           × num_items (self-describing via tag byte)

Each item starts with a tag byte that determines its type and size:

Tag	Type	Total size	Layout
`0x01`	Wide op	16 bytes	`tag:u8 action:u8 storage_id:u16 slot:u16 field:u16 value:u64`
`0x02`	Compact op	9 bytes	`tag:u8 action:u8 storage_id_lo:u8 slot:u16 field:u16 value16:u16`
`0x03`	Event	8+N bytes	`tag:u8 reserved:u8 event_type_id:u16 payload_size:u32 payload[N]`

The tag byte is size-neutral for wide ops and events: it replaces the reserved byte in wide ops and one byte of the reserved:u16 in events. The frame header shrinks by 4 bytes compared to v0.1: the op_format, reserved, num_ops, and num_events fields (6 bytes) are replaced by a single 2-byte num_items.

Compact decision is per-frame, same logic as v0.1: if all ops in the frame satisfy storage_id ≤ 255 and value ≤ 65535, all ops use tag 0x02; otherwise all ops use tag 0x01. Events always use 0x03.

When F_INTERLEAVED_DELTAS is set, F_COMPACT_DELTAS is ignored.

Readers must support both v0.1 (§8.6.1) and v0.2 frame formats by checking the F_INTERLEAVED_DELTAS flag.

8.6.7 Compression

Each segment's delta blob is compressed as a single LZ4 or ZSTD block. The method is indicated by F_COMP_METHOD in the file header flags (§2.1). Readers must reject files with unknown F_COMP_METHOD values.

9. Writer API

// ── Lifecycle ──
uscope_writer_t* uscope_writer_open(const char* path,
                                     const dut_desc_t* dut,
                                     const schema_t* schema,
                                     uint32_t checkpoint_interval);
void             uscope_writer_close(uscope_writer_t* w);

// ── Per-cycle ──
void uscope_begin_cycle(uscope_writer_t* w, uint64_t time_ps);

void uscope_slot_set(uscope_writer_t* w, uint16_t storage_id,
                      uint16_t slot, uint16_t field, uint64_t value);
void uscope_slot_clear(uscope_writer_t* w, uint16_t storage_id,
                        uint16_t slot);
void uscope_slot_add(uscope_writer_t* w, uint16_t storage_id,
                      uint16_t slot, uint16_t field, uint64_t value);

void uscope_event(uscope_writer_t* w, uint16_t event_type_id,
                   const void* payload);

void uscope_end_cycle(uscope_writer_t* w);

// ── Checkpoints ──
typedef void (*uscope_checkpoint_fn)(uscope_writer_t* w, void* user_data);
void uscope_set_checkpoint_callback(uscope_writer_t* w,
                                     uscope_checkpoint_fn fn, void* ud);
void uscope_checkpoint_storage(uscope_writer_t* w, uint16_t storage_id,
                                const uint8_t* valid_mask,
                                const void* slot_data,
                                uint32_t num_valid_slots);

9.1 DPI Bridge

The transport-level DPI is generic. Protocol-specific convenience wrappers are defined by each protocol, not by this spec.

import "DPI-C" function chandle uscope_open(string path);
import "DPI-C" function void    uscope_close(chandle w);

import "DPI-C" function void    uscope_begin_cycle(chandle w, longint unsigned time_ps);
import "DPI-C" function void    uscope_end_cycle(chandle w);

import "DPI-C" function void    uscope_slot_set(
    chandle w, shortint unsigned storage_id, shortint unsigned slot,
    shortint unsigned field, longint unsigned value
);
import "DPI-C" function void    uscope_slot_clear(
    chandle w, shortint unsigned storage_id, shortint unsigned slot
);
import "DPI-C" function void    uscope_slot_add(
    chandle w, shortint unsigned storage_id, shortint unsigned slot,
    shortint unsigned field, longint unsigned value
);

import "DPI-C" function void    uscope_event_raw(
    chandle w, shortint unsigned event_type_id,
    input byte unsigned payload[]
);

10. Reader API

// ── Lifecycle ──
uscope_reader_t* uscope_reader_open(const char* path);
void             uscope_reader_close(uscope_reader_t* r);

// ── Metadata ──
const file_header_t*  uscope_header(const uscope_reader_t* r);
const dut_desc_t*     uscope_dut_desc(const uscope_reader_t* r);
const schema_t*       uscope_schema(const uscope_reader_t* r);
const char*           uscope_scope_protocol(const uscope_reader_t* r,
                                             uint16_t scope_id);
const char*           uscope_dut_property(const uscope_reader_t* r,
                                           const char* key);
bool                  uscope_is_complete(const uscope_reader_t* r);

// ── Summary (finalized files only) ──
uint32_t    uscope_summary_levels(const uscope_reader_t* r);
const void* uscope_summary_data(const uscope_reader_t* r, uint32_t level,
                                 uint32_t* out_count);

// ── State reconstruction ──
uscope_state_t* uscope_state_at(uscope_reader_t* r, uint64_t time_ps);
void            uscope_state_free(uscope_state_t* s);

bool     uscope_slot_valid(const uscope_state_t* s, uint16_t storage_id,
                            uint16_t slot);
uint64_t uscope_slot_field(const uscope_state_t* s, uint16_t storage_id,
                            uint16_t slot, uint16_t field);
uint32_t uscope_storage_occupancy(const uscope_state_t* s,
                                   uint16_t storage_id);

// ── Events ──
uscope_event_iter_t* uscope_events_in_range(uscope_reader_t* r,
                                             uint64_t time_start_ps,
                                             uint64_t time_end_ps);
bool uscope_event_next(uscope_event_iter_t* it, uint64_t* time_ps,
                        uint16_t* event_type_id, const void** payload);
void uscope_event_iter_free(uscope_event_iter_t* it);

// ── Live tailing ──
bool uscope_poll_new_segments(uscope_reader_t* r);

11. Konata Trace Reconstruction

This section demonstrates that µScope's two primitives (storages + events) carry all information needed to reconstruct a Konata-format pipeline visualization.

Konata cmd	µScope equivalent
`I` (create)	`DA_SLOT_SET` on entity catalog slot
`L` (label)	Entity catalog fields (`pc`, `inst_bits`) decoded by protocol plugin
`S` (stage start)	`stage_transition` event
`E` (stage end)	Next `stage_transition` or entity cleared/flushed
`R` (retire)	`DA_SLOT_CLEAR` on entity catalog slot
`W` (flush)	`flush` event with entity ID
`C` (cycle)	Absolute timestamp (segment base + cumulative LEB128 deltas)
Dependency arrows	`dependency` events linking entity IDs

See the cpu protocol specification for the full reconstruction algorithm (§9 of that document).
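
The `C` (cycle) row depends on rebuilding absolute timestamps from a segment base plus cumulative LEB128 deltas. The following is a minimal sketch of that step, assuming standard unsigned LEB128 (7 payload bits per byte, least-significant group first, high bit set on every byte except the last). The names `leb128_read` and `rebuild_timestamps` are illustrative and not part of the µScope API, and the input here is a bare run of deltas, whereas real cycle frames carry additional data between them.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value from buf[*pos..len).
 * Returns 1 on success and advances *pos; returns 0 on truncated
 * or over-long input. */
int leb128_read(const uint8_t *buf, size_t len, size_t *pos, uint64_t *out)
{
    uint64_t value = 0;
    unsigned shift = 0;
    while (*pos < len && shift < 64) {
        uint8_t byte = buf[(*pos)++];
        value |= (uint64_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {   /* last byte of this value */
            *out = value;
            return 1;
        }
        shift += 7;
    }
    return 0;
}

/* Hypothetical helper: turn back-to-back LEB128 deltas into absolute
 * picosecond timestamps, starting from the segment base.
 * Returns the number of timestamps written (at most max_out). */
size_t rebuild_timestamps(const uint8_t *buf, size_t len, uint64_t base_ps,
                          uint64_t *out, size_t max_out)
{
    size_t pos = 0, n = 0;
    uint64_t t = base_ps, delta;
    while (n < max_out && pos < len && leb128_read(buf, len, &pos, &delta)) {
        t += delta;                 /* deltas are cumulative */
        out[n++] = t;
    }
    return n;
}
```

A Konata exporter would emit one `C` command per reconstructed timestamp, after converting picoseconds to cycles for the relevant clock domain.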

12. Design Rationale

Two primitives: Storages and events are sufficient to model any time-series structured data. Entities, counters, annotations, and dependencies are protocol-level patterns built on top.
Two-layer architecture: Format never changes when adding DUT types. Only new protocol specs are written.
Schema-driven, self-describing: Unknown protocols render generically.
String pool: Arbitrary-length names, no wasted padding, smaller structs. One pool shared by DUT descriptor and schema.
No styling in transport: Colors, line styles, layout rules, display hints (hex, hidden, key) belong in the protocol layer or viewer configuration. The transport layer is pure data.
Append-only segments with backward chain: Segments are appended during simulation with no pre-allocated tables. A tail_offset in the file header lets readers discover new segments. At finalization, a flat segment table is built for fast random access. This supports streaming write, live read, and fast seek — all from a single file.
Checkpoint + delta: O(1) seek to segment, O(n) replay within segment. Timestamps are LEB128 delta-encoded, so a frame that closely follows its predecessor costs a byte or two instead of a fixed 8 bytes.
Mipmap summaries: O(screen_pixels) overview rendering. Summary semantics are opaque to the transport — the protocol layer defines what each field means.
Single file, section-based: Portable, self-locating sections. No sidecar files.
Chunked preamble: DUT descriptor, schema, and trace config are typed chunks with length headers. Older readers skip unknown chunk types, so new metadata can be added (embedded ELF, source maps, protocol config) without bumping the format version.
Scoped hierarchy with per-scope protocols: Storages and events are organized into a tree of scopes rooted at / (matching hardware hierarchy: SoC → tile → core). Each scope can declare its own protocol, enabling mixed-protocol traces (CPU + DMA + NoC in one file). No inheritance — each scope is explicit.
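
To illustrate the mipmap idea from the list above, here is a small sketch of building one coarser summary level from the level below it. Summation is an assumption made for the example: the transport treats summary semantics as opaque, and the protocol layer decides how each field is really aggregated. The function name `mipmap_reduce_sum` and the `fanout` parameter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Build one coarser mipmap level by summing groups of `fanout` buckets
 * from the finer level. A trailing partial group still produces a bucket.
 * Returns the number of coarse buckets written.
 * Assumption: the field aggregates by summation. */
size_t mipmap_reduce_sum(const uint32_t *fine, size_t fine_count,
                         size_t fanout, uint32_t *coarse)
{
    size_t out = 0;
    if (fanout == 0)
        return 0;
    for (size_t i = 0; i < fine_count; i += fanout) {
        uint32_t sum = 0;
        for (size_t j = i; j < fine_count && j < i + fanout; j++)
            sum += fine[j];
        coarse[out++] = sum;
    }
    return out;
}
```

Applying the function repeatedly yields the pyramid; a viewer then picks the level whose bucket count is closest to the number of screen pixels.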

13. Comparison with FST

13.1 Core Difference: Signals vs. Structures

Aspect	FST	µScope
Data model	Flat signals (bit-vectors)	Typed structures + protocol semantics
Semantics	None	Schema (transport) + protocol (domain)
Aggregation	External post-processing	Built-in mipmaps
Extensibility	New signals only	New protocols, same format

13.2 When FST is Better

Signal-level debugging (exact wire values)
RTL verification (waveform comparison)
Tool ecosystem (GTKWave, Surfer, DVT)
Zero instrumentation cost ($dumpvars)

13.3 When µScope is Better

Microarchitectural introspection
Large structures (1024-entry ROB = one sparse storage)
Performance analysis with built-in summaries
Billion-cycle interactive exploration
Non-RTL environments (architectural simulators)
Multiple DUT types with one format

13.4 Complementary Use

Phase	Tool	Why
RTL signal debug	FST + GTKWave	Bit-accurate, zero setup
Microarch exploration	µScope viewer	Structured, schema-aware
Performance analysis	µScope summaries	Multi-resolution aggregation
Bug root-cause	µScope → FST	Find cycle in µScope, drill into FST

14. Version History

Version	Date	Changes
1.0	2025-xx-xx	Initial draft (CPU-specific)
2.0	2025-xx-xx	Architecture-agnostic, schema-driven
3.0	2025-xx-xx	Transport/protocol layer separation
3.1	2025-xx-xx	String pools, styling removed, Konata proof
4.0	2025-xx-xx	Aggressive simplification:
		— Entities removed (modeled as storages)
		— Annotations removed (modeled as events)
		— Counters removed (modeled as 1-slot storages)
		— SF_CIRCULAR, SF_CAM removed (head/tail = fields)
		— DA_HEAD, DA_TAIL, DA_COUNTER_* removed
		— Field/event/counter display flags removed
		— Summary source/aggregation semantics removed
		— Two string pools merged into one
		— DUT descriptor simplified (vendor/version = properties)
		— Delta actions: 7 → 3 (SET, CLEAR, ADD); 4 in v0.3
		— Section types: 7 → 4
		— LEB128 delta-encoded cycle in cycle frames
		— Append-only segment chain with `tail_offset`
		— Live read support (no finalization required)
		— Segment table moved to finalization-only
4.1	2026-xx-xx	Chunked preamble:
		— File header: 64 → 48 bytes
		— DUT descriptor, schema, trace config become chunks
		— `dut_desc_offset`, `schema_offset` removed from header
		— `checkpoint_interval` moved to CHUNK_TRACE_CONFIG
		— `dut_desc_t.size`, `schema_header_t.size` removed
		— Unknown chunk types skipped (forward compatibility)
		— `storage_id` widened to uint16 throughout
		— `num_enums` narrowed to uint8
		— `field_def_t.size` removed (derived from type)
		— Compact deltas: per-frame `op_format` + file flag
		— `num_deltas` → `num_cycle_frames`
		— Payload wire format specified (tight packing, LE)
		— Live-read commit ordering specified
		— Scopes: hierarchical grouping of storages/events
		— Per-scope protocol assignment (multi-protocol traces)
4.2	2026-xx-xx	Picosecond time model:
		— All timestamps in picoseconds (universal time axis)
		— Clock domain definitions in schema (name + period)
		— Scopes assigned to clock domains
		— `total_cycles` → `total_time_ps`
		— `cycle_start`/`cycle_end` → `time_start_ps`/`time_end_ps`
		— `checkpoint_interval` → `checkpoint_interval_ps`
4.3	2026-03-21	Interleaved frame format (v0.2):
		— `F_INTERLEAVED_DELTAS` flag (bit 7)
		— Tagged item stream replaces separate op/event arrays
		— Preserves call order of ops and events within a cycle
		— `version_minor` bumped to 2
		— `F_COMPACT_DELTAS` ignored when interleaved is set
4.4	2026-03-29	Trace summary + buffer flag:
		— `SF_BUFFER` storage flag (bit 1)
		— `SECTION_COUNTER_SUMMARY` section type (0x0010)
		— TraceSummary (TSUM) replaces abstract summary §6
		— Instruction density mipmap + counter mipmaps
		— Backward-compatible CSUM reader for legacy files
4.5	2026-04-01	Storage properties (v0.3):
		— `StorageDef` header: 12 → 16 bytes
		— `num_properties` + `reserved` fields added
		— `FieldDef[num_properties]` appended after slot fields
		— `DA_PROP_SET` delta action (0x04) for property updates
		— Checkpoint blocks include property data after slot data
		— `version_minor` bumped to 3
		— `SchemaHeader` size corrected: 14 → 12 bytes

15. Glossary

Term	Definition
Checkpoint	Full snapshot of all storage state at a segment boundary. Enables random access without replaying from the start.
Chunk	A typed, length-prefixed block in the preamble. Unknown chunk types are skipped for forward compatibility.
Clock domain	A named clock with a period in picoseconds. Scopes are assigned to clock domains for cycle-number display.
Cycle frame	One timestamp's worth of delta operations and events. v0.1: separate op/event arrays (§8.6.1). v0.2: interleaved tagged items preserving call order (§8.6.6).
Delta	A single state change within a cycle frame (`DA_SLOT_SET`, `DA_SLOT_CLEAR`, `DA_SLOT_ADD`, or `DA_PROP_SET`).
DUT	Device Under Test. The hardware being traced.
Event	A timestamped occurrence with a schema-defined typed payload. Fire-and-forget (no persistent state).
Finalization	The process of writing summary, string table, segment table, and section table at trace close. Sets `F_COMPLETE`.
LEB128	Little-Endian Base 128. Variable-length unsigned integer encoding. Used for time deltas.
Mipmap	Multi-resolution summary pyramid. Each level aggregates the level below by a fan-out factor.
Preamble	The chunk stream between the file header and the first segment. Contains DUT, schema, and trace config.
Protocol layer	Defines semantic meaning for a specific DUT type (e.g. `cpu`). Assigned per-scope. Not part of this spec.
Schema	Immutable definition of all scopes, storages, events, enums, and summary fields. Written once at creation.
Scope	A named node in a hierarchical tree rooted at `/`. Groups storages and events. Optionally declares a protocol.
Segment	One checkpoint-interval's worth of data: a checkpoint followed by compressed deltas.
Slot	One entry in a storage array. Contains one value per field defined in the storage schema.
Storage	A named, fixed-size array of typed slots. State is mutated by deltas and snapshotted in checkpoints.
String pool	Packed, null-terminated UTF-8 strings referenced by `uint16_t` offsets. Shared by DUT descriptor and schema.
String table	Optional section for runtime strings (e.g. disassembled instructions) referenced by `FT_STRING_REF` fields.
Tail offset	File header field pointing to the last completed segment. Updated after each segment flush. Enables live reading.
Transport layer	The binary file format defined by this spec. Knows about storages, events, and segments — nothing else.

µScope `cpu` Protocol Specification

Version: 0.1-draft Protocol identifier: cpu Transport version: µScope 0.x

1. Overview

The cpu protocol defines conventions for tracing any pipelined CPU — in-order, out-of-order, VLIW, or multi-threaded — using the µScope transport layer. It does not prescribe a fixed schema. Instead, it defines semantic conventions that a DUT writer follows and a viewer relies on to render pipeline visualizations, occupancy charts, and performance summaries without prior knowledge of the specific microarchitecture.

1.1 Design Principles

Generic over specific. The protocol works for a 5-stage in-order core and a 20-stage OoO core alike. The DUT declares its structures; the viewer renders whatever it finds.
Convention over configuration. Semantics are conveyed through field names, storage shapes, and DUT properties — not through protocol-specific binary metadata.
Viewer decodes, trace stores data. The trace carries raw values (PC, instruction bits). The viewer decodes disassembly, register names, etc. using the ELF and ISA knowledge.
Entity-centric. Every in-flight instruction has a unique ID. All structures reference entities by ID. The viewer joins on this ID to build per-instruction timelines.

2. Concepts

2.1 Entities

An entity is an in-flight instruction (or micro-op). Each entity occupies a slot in the entity catalog storage and is referenced by its slot index throughout the pipeline.

Entity ID = slot index in the entity catalog (U32).
When an instruction is fetched, the writer allocates a slot (DA_SLOT_SET on its fields). When it retires or is flushed, the writer clears the slot (DA_SLOT_CLEAR). The slot can then be reused.
The entity catalog must be sparse.

2.2 Buffers

A buffer is any storage whose slots hold entity references — a hardware structure that entities pass through or reside in. Examples: ROB, issue queues, load/store queues, scoreboards, reservation stations.

A storage other than the entity catalog is recognized as a buffer if it contains a field named entity_id of type U32 (§4.2). The viewer automatically tracks entity membership in every buffer.

2.3 Stages

The viewer renders a per-entity Gantt chart showing which pipeline stage each instruction is in over time. Since an entity can occupy multiple buffers simultaneously (e.g., ROB + issue queue + executing), stage progression is tracked explicitly via stage_transition events (§5.1), not inferred from buffer membership.

Buffers and stages are orthogonal:

Buffers model where an entity physically resides (ROB slot 42, LQ slot 7). An entity can be in multiple buffers at once.
Stages model logical pipeline progress (fetch → decode → ... → retire). An entity is in exactly one stage at any time.

The DUT declares the stage ordering via cpu.pipeline_stages (§4.1) and emits a stage_transition event each time an entity advances. The viewer maintains a current_stage per entity and draws Gantt bars from stage entry/exit times.

2.4 Counters

A counter is a 1-slot, non-sparse storage with numeric fields, mutated via DA_SLOT_ADD. The viewer infers counters from this shape and renders them as line graphs or sparklines. No protocol markup is needed.

2.5 Events

Events model instantaneous occurrences attached to entities or to the timeline. The protocol defines standard event names (§5). The viewer renders recognized events with specific visualizations and unknown events generically.

3. Entity Catalog

3.1 Storage Convention

The entity catalog is a storage named entities.

Property	Value
Name	`entities`
Sparse	yes (`SF_SPARSE`)
Num slots	max concurrent in-flight entities (DUT-specific)

3.2 Required Fields

Field name	Type	Description
`entity_id`	`U32`	Unique entity ID (equals the slot index)
`pc`	`U64`	Program counter
`inst_bits`	`U32`	Raw instruction bits

3.3 Optional Fields

The DUT may add any additional fields. Common examples:

Field name	Type	Description
`thread_id`	`U16`	Hardware thread / hart ID
`is_compressed`	`BOOL`	Compressed instruction (RVC, Thumb, ...)
`priv_level`	`ENUM`	Privilege level at fetch

3.4 Entity Lifecycle

Fetch:   DA_SLOT_SET  entities[id].entity_id = id
         DA_SLOT_SET  entities[id].pc = ...
         DA_SLOT_SET  entities[id].inst_bits = ...

Retire:  DA_SLOT_CLEAR entities[id]

Flush:   DA_SLOT_CLEAR entities[id]
         (plus a flush event, §5.4)

The entity_id field is always equal to the slot index. It is stored explicitly so that buffer storages and events can reference it using a uniform U32 field, independent of the transport's slot indexing.

Slot reuse: After DA_SLOT_CLEAR, the slot may be reused for a new instruction. The new occupant is a logically distinct entity — the viewer treats each clear/set cycle as a new entity lifetime. The viewer must not carry state (stage, annotations, dependencies) across a clear boundary.
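
The slot-reuse rule can be sketched as a small piece of viewer-side bookkeeping. This is one possible approach, not a prescribed one: the `slot_track_t` struct, its `lifetime` counter, and the three functions are hypothetical names introduced for illustration.

```c
#include <stdint.h>

#define NO_STAGE (-1)

/* Viewer-side tracking for one entity-catalog slot (illustrative). */
typedef struct {
    int      live;      /* 1 while the slot holds an entity */
    uint32_t lifetime;  /* incremented for every new occupant */
    int      stage;     /* current pipeline stage, or NO_STAGE */
} slot_track_t;

void track_init(slot_track_t *s)
{
    s->live = 0;
    s->lifetime = 0;
    s->stage = NO_STAGE;
}

/* Called for each DA_SLOT_SET on the slot. Only the first set after a
 * clear starts a new lifetime; later sets update the same entity. */
void track_on_set(slot_track_t *s)
{
    if (!s->live) {
        s->live = 1;
        s->lifetime++;
        s->stage = NO_STAGE;   /* nothing carries over from the old occupant */
    }
}

/* Called for DA_SLOT_CLEAR: the entity lifetime ends here. */
void track_on_clear(slot_track_t *s)
{
    s->live = 0;
    s->stage = NO_STAGE;
}
```

Keying per-instruction timelines on the pair (slot index, lifetime) rather than on the slot index alone keeps two occupants of the same slot apart.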

4. Buffers and Stages

4.1 Stage Ordering via DUT Properties

The DUT declares pipeline stages using a DUT property (§8.1). The value is a comma-separated list of stage names, in pipeline order:

cpu.pipeline_stages = "fetch,decode,rename,dispatch,issue,execute,complete,retire"

4.2 Buffer Storage Convention

Any storage with a field named entity_id of type U32 is a buffer.

Property	Value
Sparse	yes (`SF_SPARSE`)
Num slots	hardware structure capacity

4.3 Required Buffer Fields

Field name	Type	Description
`entity_id`	`U32`	References entity catalog slot

4.4 Optional Buffer Fields

The DUT may add structure-specific fields:

Field name	Type	Description
`completed`	`BOOL`	Execution completed (ROB)
`addr`	`U64`	Memory address (LQ/SQ)
`ready`	`BOOL`	Operands ready (IQ/scoreboard)
`fu_type`	`ENUM`	Functional unit assigned

4.5 Buffer Operations

Insert:  DA_SLOT_SET  rob[slot].entity_id = id
Remove:  DA_SLOT_CLEAR rob[slot]
Update:  DA_SLOT_SET  rob[slot].completed = 1

5. Standard Events

The protocol defines the following event names. stage_transition is required for Gantt chart rendering; all others are optional. The viewer renders recognized events with specific visualizations and unknown events generically (name + fields in a tooltip).

5.1 `stage_transition`

Explicit pipeline stage change for an entity. The DUT emits this event each time an instruction advances to a new pipeline stage. Superscalar cores emit multiple stage_transition events in the same cycle frame (e.g., a 4-wide machine retiring 4 instructions produces 4 events).

Field name	Type	Description
`entity_id`	`U32`	Entity that advanced
`stage`	`ENUM(pipeline_stage)`	Stage the entity entered

The enum must be named pipeline_stage in the schema. Its values must match the names declared in the cpu.pipeline_stages DUT property (§4.1). For example:

Value	Name
0	`fetch`
1	`decode`
2	`rename`
3	`dispatch`
4	`issue`
5	`execute`
6	`complete`
7	`retire`

The enum is DUT-defined — an in-order core might have just fetch, decode, execute, memory, writeback.
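
A viewer may want to verify the consistency requirement between the pipeline_stage enum and the cpu.pipeline_stages property when it opens a trace. The sketch below assumes that enum value i corresponds to the i-th name in the property, as in the example table; the function name `stages_match_enum` and the way enum names are passed in are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Check that a comma-separated stage list names exactly the enum values
 * 0..n-1, in order. enum_names[i] is the label of enum value i.
 * Returns 1 on a match, 0 otherwise.
 * Assumption: enum value order equals property order. */
int stages_match_enum(const char *prop, const char *const *enum_names,
                      size_t n)
{
    const char *p = prop;
    for (size_t i = 0; i < n; i++) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len != strlen(enum_names[i]) ||
            strncmp(p, enum_names[i], len) != 0)
            return 0;               /* name differs from enum label */
        if (i + 1 < n) {
            if (!comma)
                return 0;           /* property has too few names */
            p = comma + 1;
        } else if (comma) {
            return 0;               /* property has extra names */
        }
    }
    return n > 0 || prop[0] == '\0';
}
```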

The viewer maintains a current_stage per entity. A Gantt bar for a stage spans from the time the entity entered it until the time it entered the next stage (or was cleared/flushed). Multi-cycle stages (e.g., a long-latency divide in execute) require no special handling — the entity simply stays in its current stage until the next stage_transition event.
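
The bar rule described above can be sketched for a single entity lifetime. The `transition_t` and `gantt_bar_t` types and the `build_bars` function are hypothetical; the sketch assumes the transitions are already sorted by time and that the caller knows when the entity's catalog slot was cleared.

```c
#include <stddef.h>
#include <stdint.h>

/* One stage_transition event for a single entity (illustrative). */
typedef struct {
    uint64_t time_ps;
    uint16_t stage;
} transition_t;

/* One Gantt bar: the entity was in `stage` during [start_ps, end_ps). */
typedef struct {
    uint16_t stage;
    uint64_t start_ps;
    uint64_t end_ps;
} gantt_bar_t;

/* Each bar runs from its own transition to the next one; the last bar
 * ends at death_ps, the time the entity was cleared or flushed.
 * bars must have room for n entries. Returns the number of bars. */
size_t build_bars(const transition_t *tr, size_t n, uint64_t death_ps,
                  gantt_bar_t *bars)
{
    for (size_t i = 0; i < n; i++) {
        bars[i].stage    = tr[i].stage;
        bars[i].start_ps = tr[i].time_ps;
        bars[i].end_ps   = (i + 1 < n) ? tr[i + 1].time_ps : death_ps;
    }
    return n;
}
```

A multi-cycle stage simply yields a longer bar, since nothing closes it until the next transition arrives.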

5.2 `annotate`

Free-text annotation attached to an entity.

Field name	Type	Description
`entity_id`	`U32`	Target entity
`text`	`STRING_REF`	Annotation text

Viewer: shows as a label on the entity's Gantt bar.

5.3 `dependency`

Data or structural dependency between two entities.

Field name	Type	Description
`src_id`	`U32`	Producer entity
`dst_id`	`U32`	Consumer entity
`dep_type`	`ENUM(dep_type)`	Dependency kind

Standard dep_type enum values:

Value	Name
0	`raw`
1	`war`
2	`waw`
3	`structural`

Viewer: draws an arrow from producer to consumer in the Gantt chart.

5.4 `flush`

Entity was squashed before retirement.

Field name	Type	Description
`entity_id`	`U32`	Flushed entity
`reason`	`ENUM(flush_reason)`	Cause

Standard flush_reason enum values:

Value	Name
0	`mispredict`
1	`exception`
2	`interrupt`
3	`pipeline_clear`

Viewer: marks the entity's Gantt bar with a squash indicator.

5.5 `stall`

Pipeline stall (not tied to a specific entity).

Field name	Type	Description
`reason`	`ENUM(stall_reason)`	Stall cause

The stall_reason enum values are DUT-defined. Common examples: rob_full, iq_full, lq_full, sq_full, fetch_miss, dcache_miss, frontend_stall.

Viewer: renders a colored band on the timeline.

6. Counters

Counters need no protocol convention beyond shape detection: a 1-slot, non-sparse storage is a counter, and the storage name is the counter label.

Common counters:

Storage name	Fields	Meaning
`committed_insns`	`count: U64`	Retired instructions
`bp_misses`	`count: U64`	Branch mispredictions
`dcache_misses`	`count: U64`	D-cache misses
`icache_misses`	`count: U64`	I-cache misses

Writer updates via DA_SLOT_ADD:

uscope_slot_add(w, STOR_COMMITTED_INSNS, 0, FIELD_COUNT, 4);  // retired 4 this cycle

7. Summary Fields

The protocol defines standard summary field names for mipmap rendering. The viewer recognizes these and aggregates them appropriately.

Field name	Type	Meaning
`committed`	`U32`	Instructions committed in bucket
`cycles_active`	`U32`	Non-idle cycles in bucket
`flushes`	`U16`	Flush events in bucket
`bp_misses`	`U16`	Branch mispredictions in bucket

Per-buffer occupancy summaries use the naming pattern <storage_name>_occ (e.g., rob_occ). The value is the sum of occupancy samples in the bucket; divide by cycles_active for average.
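
As a worked instance of the averaging rule, the sketch below derives the mean occupancy of one bucket. The function name `avg_occupancy` and the 64-bit width of the sum are assumptions for illustration.

```c
#include <stdint.h>

/* Mean buffer occupancy for one summary bucket: the <storage_name>_occ
 * sum divided by cycles_active. Returns 0.0 for a fully idle bucket
 * to avoid dividing by zero. */
double avg_occupancy(uint64_t occ_sum, uint32_t cycles_active)
{
    if (cycles_active == 0)
        return 0.0;
    return (double)occ_sum / (double)cycles_active;
}
```

For example, a bucket with rob_occ = 960 and cycles_active = 10 corresponds to an average of 96 occupied ROB slots.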

DUT-specific summary fields are rendered as generic bar charts.

8. DUT Properties

Protocol-specific properties use the cpu. key prefix so that they can coexist with the properties of other protocols in multi-protocol traces. The generic dut_name property is unprefixed.

8.1 Required Properties

Key	Description	Example
`dut_name`	DUT instance name	`boom_core_0`
`cpu.protocol_version`	Version of the `cpu` protocol	`0.1`
`cpu.isa`	Instruction set architecture	`RV64GC`
`cpu.pipeline_stages`	Comma-separated stage names, in order	`fetch,...,retire`

8.2 Optional Properties

Key	Description	Example
`cpu.fetch_width`	Instructions fetched per cycle	`4`
`cpu.commit_width`	Instructions retired per cycle	`4`
`cpu.elf_path`	Path to ELF for disassembly	`/path/to/fw.elf`
`cpu.vendor`	DUT vendor	`sifive`

9. Viewer Reconstruction

9.1 Opening a Trace

Read preamble → parse schema and DUT properties
Walk scope tree from root / → find all scopes with protocol = "cpu"; each is a core
Per core scope: identify entities storage (entity catalog), find all buffers (storages with entity_id field), identify counters (1-slot non-sparse storages)
Read cpu.pipeline_stages property → build ordered stage list
If cpu.elf_path property exists, load ELF for disassembly
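
The storage classification in the steps above can be sketched as follows. The `field_info_t` and `storage_info_t` structs are hypothetical stand-ins for whatever the parsed schema exposes, and the flags `is_u32` and `is_numeric` are assumed to be derived from the field type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a parsed schema. */
typedef struct {
    const char *name;
    int is_u32;       /* field type is U32 */
    int is_numeric;   /* field type is an integer type */
} field_info_t;

typedef struct {
    const char *name;
    int sparse;       /* SF_SPARSE set */
    unsigned num_slots;
    const field_info_t *fields;
    size_t num_fields;
} storage_info_t;

typedef enum { KIND_OTHER, KIND_CATALOG, KIND_BUFFER, KIND_COUNTER } storage_kind_t;

/* Classify one storage in a cpu scope by name and shape. */
storage_kind_t classify_storage(const storage_info_t *s)
{
    if (strcmp(s->name, "entities") == 0)
        return KIND_CATALOG;    /* checked first: it also has entity_id */

    for (size_t i = 0; i < s->num_fields; i++)
        if (s->fields[i].is_u32 &&
            strcmp(s->fields[i].name, "entity_id") == 0)
            return KIND_BUFFER;

    if (!s->sparse && s->num_slots == 1 && s->num_fields > 0) {
        for (size_t i = 0; i < s->num_fields; i++)
            if (!s->fields[i].is_numeric)
                return KIND_OTHER;
        return KIND_COUNTER;    /* 1-slot, non-sparse, numeric fields */
    }
    return KIND_OTHER;
}
```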

9.2 Gantt Chart Rendering

For a time range [T0, T1) in picoseconds:

Seek to segment covering T0 (binary search or chain walk)
Load checkpoint → initial state of all storages
Replay deltas and events T0..T1, tracking per-entity:
- Birth: entity slot becomes valid in entities
- Stage transitions: stage_transition event → record (entity_id, stage, timestamp)
- Death: entity slot cleared in entities (retire or flush)
For each entity, emit Gantt bars: each stage spans from its stage_transition timestamp until the next transition (or death)
Entity labels: read pc and inst_bits from entity catalog, decode via ISA disassembler
Dependency arrows: dependency events in the range
Flush markers: flush events in the range
Convert timestamps to domain-local cycle numbers for display using the scope's clock domain period
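
The final step, converting picosecond timestamps to domain-local cycle numbers, can be sketched as an integer division. This assumes the clock starts at time 0 with no phase offset, which the text above does not state; the function name `ps_to_cycle` and the 64-bit period type are also assumptions.

```c
#include <stdint.h>

/* Convert an absolute timestamp in picoseconds to a cycle number in a
 * clock domain with the given period.
 * Assumption: the domain's cycle 0 begins at time 0 (no phase offset).
 * Returns 0 if the period is 0, which would be a malformed schema. */
uint64_t ps_to_cycle(uint64_t time_ps, uint64_t period_ps)
{
    if (period_ps == 0)
        return 0;
    return time_ps / period_ps;
}
```

With a 500 ps period (2 GHz), a timestamp of 5000 ps is displayed as cycle 10.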

9.3 Occupancy View

For each buffer, count valid slots per cycle. The mipmap summary (<name>_occ fields) gives this at coarse granularity; delta replay gives exact per-cycle values when zoomed in.

9.4 Counter Graphs

Read counter storages at each cycle frame (via DA_SLOT_ADD deltas). Compute rates (delta / cycles) for display. Mipmap summaries provide pre-aggregated values for zoomed-out views.
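
The rate computation can be sketched as a replay of DA_SLOT_ADD deltas over a cycle window. The `counter_add_t` record and the `counter_rate` function are hypothetical; a real viewer would gather the add operations while replaying cycle frames and convert timestamps to cycles first.

```c
#include <stddef.h>
#include <stdint.h>

/* One DA_SLOT_ADD applied to a counter field (illustrative record). */
typedef struct {
    uint64_t cycle;   /* domain-local cycle of the cycle frame */
    uint64_t add;     /* value added to the counter */
} counter_add_t;

/* Average counter increase per cycle over the window [c0, c1).
 * Returns 0.0 for an empty or inverted window. */
double counter_rate(const counter_add_t *adds, size_t n,
                    uint64_t c0, uint64_t c1)
{
    uint64_t delta = 0;
    if (c1 <= c0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        if (adds[i].cycle >= c0 && adds[i].cycle < c1)
            delta += adds[i].add;
    return (double)delta / (double)(c1 - c0);
}
```

Applied to a retired-instruction counter, the result is the IPC over the chosen window.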

10. Example: BOOM-like OoO Core

10.1 DUT Properties

dut_name              = "boom_tile0_core0"
cpu.protocol_version  = "0.1"
cpu.isa               = "RV64GC"
cpu.fetch_width       = "4"
cpu.commit_width      = "4"
cpu.elf_path          = "/workspace/fw.elf"
cpu.pipeline_stages   = "fetch,decode,rename,dispatch,issue,execute,complete,retire"

10.2 Schema

Scopes:
  /       (id=0, root,      protocol=none)
  core0   (id=1, parent=0,  protocol="cpu")

Enums:
  pipeline_stage: fetch(0), decode(1), rename(2), dispatch(3),
                  issue(4), execute(5), complete(6), retire(7)
  dep_type:       raw(0), war(1), waw(2), structural(3)
  flush_reason:   mispredict(0), exception(1), interrupt(2)
  stall_reason:   rob_full(0), iq_full(1), lq_full(2), sq_full(3),
                  fetch_miss(4), dcache_miss(5)

Storages (all scope=core0):
  entities    (sparse, 512 slots):  entity_id:U32, pc:U64, inst_bits:U32
  rob         (sparse, 256 slots):  entity_id:U32, completed:BOOL
  iq_int      (sparse, 48 slots):   entity_id:U32
  iq_fp       (sparse, 32 slots):   entity_id:U32
  iq_mem      (sparse, 48 slots):   entity_id:U32
  lq          (sparse, 32 slots):   entity_id:U32, addr:U64
  sq          (sparse, 32 slots):   entity_id:U32, addr:U64
  committed   (dense, 1 slot):      count:U64
  bp_misses   (dense, 1 slot):      count:U64

Events (all scope=core0):
  stage_transition: entity_id:U32, stage:ENUM(pipeline_stage)
  annotate:         entity_id:U32, text:STRING_REF
  dependency:       src_id:U32, dst_id:U32, dep_type:ENUM(dep_type)
  flush:            entity_id:U32, reason:ENUM(flush_reason)
  stall:            reason:ENUM(stall_reason)

Note: transient stages (fetch, decode, execute, etc.) are modeled purely via stage_transition events — no storages needed. Only physical structures that hold entities (ROB, IQ, LQ, SQ) are storages.

10.3 Example: 5-Stage In-Order Core

Same protocol, minimal schema:

DUT properties:
  cpu.pipeline_stages  = "fetch,decode,execute,memory,writeback"

Scopes:
  /       (id=0, root,      protocol=none)
  core0   (id=1, parent=0,  protocol="cpu")

Enums:
  pipeline_stage: fetch(0), decode(1), execute(2), memory(3), writeback(4)

Storages (all scope=core0):
  entities    (sparse, 8 slots):    entity_id:U32, pc:U64, inst_bits:U32
  committed   (dense, 1 slot):      count:U64

Events (all scope=core0):
  stage_transition: entity_id:U32, stage:ENUM(pipeline_stage)

An in-order core may have no buffers at all — just the entity catalog and stage transitions. The viewer renders a Gantt chart purely from events.

10.4 Example: Dual-Core SoC

Multi-core uses transport-level scopes (§4.4 of the transport spec). Each core is a scope with protocol = "cpu". Storages and event types are defined per-scope, so entity IDs are per-core and no core_id field is needed in event payloads.

DUT properties:
  dut_name              = "my_soc"
  cpu.protocol_version  = "0.1"
  cpu.pipeline_stages   = "fetch,decode,rename,dispatch,issue,execute,complete,retire"
  cpu.isa               = "RV64GC"
  cpu.elf_path          = "/workspace/fw.elf"

Scopes:
  /            (id=0, root,         protocol=none)
  cpu_cluster  (id=1, parent=0,     protocol=none)
  core0        (id=2, parent=1,     protocol="cpu")
  core1        (id=3, parent=1,     protocol="cpu")

Enums (shared):
  pipeline_stage: fetch(0), decode(1), rename(2), dispatch(3),
                  issue(4), execute(5), complete(6), retire(7)
  flush_reason:   mispredict(0), exception(1), interrupt(2)

Storages:
  entities  (scope=core0, sparse, 512):  entity_id:U32, pc:U64, inst_bits:U32
  rob       (scope=core0, sparse, 256):  entity_id:U32
  committed (scope=core0, dense, 1):     count:U64

  entities  (scope=core1, sparse, 512):  entity_id:U32, pc:U64, inst_bits:U32
  rob       (scope=core1, sparse, 256):  entity_id:U32
  committed (scope=core1, dense, 1):     count:U64

Events:
  stage_transition (scope=core0): entity_id:U32, stage:ENUM(pipeline_stage)
  stage_transition (scope=core1): entity_id:U32, stage:ENUM(pipeline_stage)
  flush            (scope=core0): entity_id:U32, reason:ENUM(flush_reason)
  flush            (scope=core1): entity_id:U32, reason:ENUM(flush_reason)

The viewer finds all scopes with protocol = "cpu", renders a per-core pipeline view for each, and can show them side-by-side.

Storage names (entities, rob) repeat across scopes — the storage_id is globally unique, but the name + scope combination gives the viewer the display path (core0/entities, core1/rob).

Cross-core events (cache coherence, IPIs) can be defined at the cpu_cluster scope with fields referencing the relevant scope IDs.

11. Version History

Version	Date	Changes
0.1	2026-xx-xx	Initial draft

µScope `noc` Protocol Specification

Version: 0.1-draft Protocol identifier: noc Transport version: µScope 0.x

1. Overview

The noc protocol defines conventions for tracing any on-chip interconnect — crossbar, mesh, ring, tree, or point-to-point — using the µScope transport layer. It works with any bus protocol: AXI4, CHI, ACE, TileLink, UCIe, or proprietary fabrics.

Like the cpu protocol, it does not prescribe a fixed schema. Instead, it defines semantic conventions that a DUT writer follows and a viewer relies on to render transaction Gantt charts, topology maps, latency histograms, and traffic heatmaps without prior knowledge of the specific interconnect microarchitecture.

1.1 Design Principles

Generic over specific. The protocol works for a single-port AXI crossbar and a 64-node CHI mesh alike. The DUT declares its structures; the viewer renders whatever it finds.
Convention over configuration. Semantics are conveyed through field names, storage shapes, and scope properties — not through protocol-specific binary metadata.
Entity-centric. Every in-flight transaction has a unique ID in a transaction catalog. All buffers, events, and stages reference transactions by this ID. The viewer joins on it to build per-transaction timelines.
Topology-agnostic. The protocol does not encode topology in the data model. Topology is declared via scope properties; the viewer uses it for visualization only.

2. Concepts

2.1 Transactions (Entities)

A transaction is an in-flight bus operation (read, write, snoop, etc.). Each transaction occupies a slot in the transaction catalog storage and is referenced by its slot index throughout the interconnect.

Transaction ID = slot index in the transaction catalog (U32).
When a transaction is issued, the writer allocates a slot (DA_SLOT_SET on its fields). When it completes, the writer clears the slot (DA_SLOT_CLEAR). The slot can then be reused.
The transaction catalog must be sparse.

Transactions in the noc protocol are the direct analogue of entities in the cpu protocol (cpu spec §2.1).

2.2 Buffers

A buffer is any storage whose slots hold transaction references — a hardware structure that transactions pass through or reside in. Examples: virtual channel (VC) buffers, reorder buffers, outstanding request tables, credit pools.

A storage other than the transaction catalog is recognized as a buffer if it contains a field named txn_id of type U32 (§4.2). The viewer automatically tracks transaction membership in every buffer.

2.3 Stages

The viewer renders a per-transaction Gantt chart showing which pipeline stage each transaction is in over time. Since a transaction can occupy multiple buffers simultaneously (e.g., outstanding request table + VC buffer + arbitrating), stage progression is tracked explicitly via stage_transition events (§5.1), not inferred from buffer membership.

Buffers and stages are orthogonal:

Buffers model where a transaction physically resides (VC slot 3, ROB entry 7). A transaction can be in multiple buffers at once.
Stages model logical progression through the interconnect (issue → route → arbitrate → traverse → deliver → respond). A transaction is in exactly one stage at any time.

The DUT declares the stage ordering via noc.pipeline_stages (§4.1) and emits a stage_transition event each time a transaction advances. The viewer maintains a current_stage per transaction and draws Gantt bars from stage entry/exit times.

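2.4 Counters

Counters follow the same shape convention as in the cpu protocol (cpu spec §2.4): a 1-slot, non-sparse storage with numeric fields, mutated via DA_SLOT_ADD.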

2.5 Events

Events model instantaneous occurrences attached to transactions or to the timeline. The protocol defines standard event names (§5). The viewer renders recognized events with specific visualizations and unknown events generically.

2.6 Router Sub-Scopes

For multi-router interconnects, each router can be a child scope with protocol="noc.router". This enables per-router buffers, counters, and events while keeping the transaction catalog on the nearest ancestor noc scope.

/                     (protocol=none)
  noc0                (protocol="noc")        ← transaction catalog here
    router_0_0        (protocol="noc.router") ← per-router buffers/counters
    router_0_1        (protocol="noc.router")
    router_1_0        (protocol="noc.router")
    router_1_1        (protocol="noc.router")

A noc.router scope does not have its own transaction catalog. It references transactions from the parent noc scope's catalog via the txn_id field. The viewer resolves txn_id by walking up the scope tree to the nearest noc scope.
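
The scope-tree walk can be sketched as below. The `scope_info_t` layout and the `SCOPE_NONE` marker are hypothetical; the transport spec defines the actual scope encoding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCOPE_NONE 0xFFFFu   /* hypothetical "no parent" marker */

/* Hypothetical, simplified view of one scope in the parsed schema. */
typedef struct {
    uint16_t parent;         /* index of the parent scope, or SCOPE_NONE */
    const char *protocol;    /* NULL when the scope declares no protocol */
} scope_info_t;

/* Starting at `start`, walk towards the root and return the first scope
 * whose protocol is exactly "noc". That scope owns the transaction
 * catalog that txn_id values refer to.
 * Returns SCOPE_NONE if no such ancestor exists. */
uint16_t nearest_noc_scope(const scope_info_t *scopes, uint16_t start)
{
    uint16_t id = start;
    while (id != SCOPE_NONE) {
        const char *p = scopes[id].protocol;
        if (p != NULL && strcmp(p, "noc") == 0)
            return id;
        id = scopes[id].parent;
    }
    return SCOPE_NONE;
}
```

Because the comparison is an exact string match, a "noc.router" scope never resolves to itself and the walk continues to its parent.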

2.7 Cross-Scope Transaction Handoff

When a transaction crosses a scope boundary — e.g., a chiplet-to-chiplet transfer via a D2D link, or a protocol bridge (AXI→CHI) — it receives a new txn_id in the destination scope. The txn_handoff event (§5.7) stitches the two identities together, enabling end-to-end latency tracking across scope boundaries.

The txn_handoff event is emitted at a common ancestor scope of the source and destination scopes. The viewer joins on these events to build cross-scope transaction timelines.

3. Transaction Catalog

3.1 Storage Convention

The transaction catalog is a storage named transactions.

Property	Value
Name	`transactions`
Sparse	yes (`SF_SPARSE`)
Num slots	max concurrent in-flight transactions (DUT-specific)

3.2 Required Fields

Field name	Type	Description
`txn_id`	`U32`	Unique transaction ID (equals the slot index)
`opcode`	`ENUM`	Transaction type (read, write, snoop, etc.)
`addr`	`U64`	Target address
`len`	`U16`	Burst length (number of beats)
`size`	`U8`	Beat size (log2 bytes, e.g., 3 = 8 bytes)
`src_port`	`U16`	Source port / initiator ID
`dst_port`	`U16`	Destination port / target ID

3.3 Optional Fields

The DUT may add any additional fields. Common examples:

Field name	Type	Description
`qos`	`U8`	Quality-of-service priority
`txn_class`	`ENUM`	Transaction class (posted, non-posted, etc.)
`prot`	`U8`	Protection bits (privileged, secure, etc.)
`cache`	`U8`	Cache allocation hints
`snoop`	`U8`	Snoop attribute bits
`domain`	`ENUM`	Shareability domain
`excl`	`BOOL`	Exclusive access flag
`tag`	`U16`	Transaction tag (for reorder tracking)

3.4 Transaction Lifecycle

Issue:      DA_SLOT_SET  transactions[id].txn_id = id
            DA_SLOT_SET  transactions[id].opcode = ...
            DA_SLOT_SET  transactions[id].addr = ...
            DA_SLOT_SET  transactions[id].len = ...
            DA_SLOT_SET  transactions[id].size = ...
            DA_SLOT_SET  transactions[id].src_port = ...
            DA_SLOT_SET  transactions[id].dst_port = ...

Complete:   DA_SLOT_CLEAR transactions[id]

The txn_id field is always equal to the slot index. It is stored explicitly so that buffer storages and events can reference it using a uniform U32 field, independent of the transport's slot indexing.

4. Buffers and Stages

4.1 Stage Ordering via Scope Properties

Each noc scope declares pipeline stages using a scope property:

noc.pipeline_stages = "issue,route,arbitrate,traverse,deliver,respond"

The value is a comma-separated list in pipeline order (earliest first). The viewer uses this ordering for Gantt chart column layout and coloring. Stage names must match the values used in stage_transition events (§5.1). Each noc scope declares its own stages, enabling heterogeneous interconnects in the same trace.
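As a rough sketch, a viewer might turn the property value into an ordered stage list and a name-to-column lookup like this. The function names are hypothetical, and trimming whitespace and skipping empty entries are assumptions; the text above only specifies a comma-separated list.

```rust
use std::collections::HashMap;

/// Split the `noc.pipeline_stages` value into stage names, earliest first.
/// Assumption: surrounding whitespace is ignored and empty entries are dropped.
pub fn parse_pipeline_stages(value: &str) -> Vec<String> {
    value
        .split(',')
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect()
}

/// Map each stage name to its Gantt column index (its position in the list).
pub fn stage_columns(stages: &[String]) -> HashMap<&str, usize> {
    stages
        .iter()
        .enumerate()
        .map(|(i, s)| (s.as_str(), i))
        .collect()
}
```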

4.2 Buffer Storage Convention

Any storage with a field named txn_id of type U32 is a buffer.

Property	Value
Sparse	yes (`SF_SPARSE`)
Num slots	hardware structure capacity

4.3 Required Buffer Fields

Field name	Type	Description
`txn_id`	`U32`	References transaction catalog slot

4.4 Optional Buffer Fields

The DUT may add structure-specific fields:

Field name	Type	Description
`vc`	`U8`	Virtual channel assignment
`priority`	`U8`	Arbitration priority
`flit_type`	`ENUM`	Flit type (header, data, tail)
`credits`	`U8`	Available credits

4.5 Buffer Operations

Insert:  DA_SLOT_SET  vc_buf[slot].txn_id = id
Remove:  DA_SLOT_CLEAR vc_buf[slot]
Update:  DA_SLOT_SET  vc_buf[slot].credits = 3

4.6 Common Buffers

Buffer name	Models
`vc_buf_<port>`	Per-port virtual channel buffer
`rob`	Reorder buffer for out-of-order completion
`ort`	Outstanding request table / tracker
`snoop_filter`	Snoop filter entries
`retry_buf`	Transactions awaiting retry

4.7 Example Stage Sets

AXI4 crossbar:

noc.pipeline_stages = "ar_issue,route,arbitrate,transport,target_accept,r_data,r_last"

CHI mesh:

noc.pipeline_stages = "req_issue,req_accept,snoop_send,snoop_resp,dat_transfer,comp_ack"

TileLink ring:

noc.pipeline_stages = "acquire,route,grant,grant_ack"

5. Standard Events

5.1 `stage_transition`

Explicit stage change for a transaction. The DUT emits this event each time a transaction advances to a new pipeline stage.

Field name	Type	Description
`txn_id`	`U32`	Transaction that advanced
`stage`	`ENUM(pipeline_stage)`	Stage the transaction entered

The pipeline_stage enum values must match the names declared in the noc.pipeline_stages scope property (§4.1). For example (AXI4):

Value	Name
0	`ar_issue`
1	`route`
2	`arbitrate`
3	`transport`
4	`target_accept`
5	`r_data`
6	`r_last`

The enum is DUT-defined — a simple crossbar might have just issue, arbitrate, transfer, complete.

The viewer maintains a current_stage per transaction. A Gantt bar for a stage spans from the cycle the transaction entered it until the cycle it entered the next stage (or was cleared).

5.2 `beat`

Individual data beat in a burst transfer.

Field name	Type	Description
`txn_id`	`U32`	Parent transaction
`beat_num`	`U16`	Beat number within burst (0-based)
`data_bytes`	`U16`	Bytes transferred in this beat

Viewer: shows beat markers on the transaction's Gantt bar during the data transfer stage. Useful for identifying partial transfers and stalls between beats.

5.3 `retry`

Transaction retry — the target or interconnect rejected the transaction and it must be re-attempted.

Field name	Type	Description
`txn_id`	`U32`	Retried transaction
`reason`	`ENUM(retry_reason)`	Cause of retry

Standard retry_reason enum values:

Value	Name
0	`target_busy`
1	`no_credits`
2	`vc_full`
3	`arb_lost`
4	`protocol_retry`

Viewer: marks a retry indicator on the transaction's Gantt bar.

5.4 `timeout`

Watchdog timeout — a transaction exceeded the expected completion time.

Field name	Type	Description
`txn_id`	`U32`	Timed-out transaction
`threshold_cycles`	`U32`	Watchdog threshold that was exceeded

Viewer: marks a timeout indicator on the transaction's Gantt bar and highlights it in the topology view.

5.5 `link_credit`

Credit flow control update on a link.

Field name	Type	Description
`port`	`U16`	Port ID
`direction`	`ENUM(credit_direction)`	Credit grant or consume
`credits`	`U8`	Number of credits

Standard credit_direction enum values:

Value	Name
0	`grant`
1	`consume`

Viewer: renders credit level as a per-port sparkline.
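One way to derive the sparkline samples is to keep a running level per port. The sketch below assumes that `credits` is a delta (added on grant, subtracted on consume) and that the events have already been filtered to a single port; the section does not state either point, and the type and function names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CreditDirection {
    Grant,   // enum value 0
    Consume, // enum value 1
}

/// One link_credit event for a single port: (cycle, direction, credits).
pub type CreditEvent = (u64, CreditDirection, u8);

/// Running credit level after each event, starting from `initial`.
/// Assumption: `credits` is a delta, not an absolute level.
pub fn credit_levels(initial: i64, events: &[CreditEvent]) -> Vec<(u64, i64)> {
    let mut level = initial;
    events
        .iter()
        .map(|&(cycle, dir, n)| {
            match dir {
                CreditDirection::Grant => level += i64::from(n),
                CreditDirection::Consume => level -= i64::from(n),
            }
            (cycle, level)
        })
        .collect()
}
```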

5.6 `arb_decision`

Arbitration outcome — records which transaction won arbitration at a port.

Field name	Type	Description
`winner_txn`	`U32`	Transaction that won arbitration
`port`	`U16`	Port where arbitration occurred
`num_contenders`	`U8`	Number of competing transactions

Viewer: shows arbitration events in the timeline. High num_contenders values indicate congestion hotspots.

5.7 `txn_handoff`

Cross-scope transaction stitching — links a transaction in one scope to its continuation in another scope.

Field name	Type	Description
`src_scope`	`U16`	Scope ID of the source transaction
`src_txn_id`	`U32`	Transaction ID in the source scope
`dst_scope`	`U16`	Scope ID of the destination transaction
`dst_txn_id`	`U32`	Transaction ID in the destination scope

This event is emitted at a common ancestor scope of src_scope and dst_scope. It enables end-to-end latency tracking across chiplet boundaries, protocol bridges, or any other scope boundary where a transaction receives a new identity.

Viewer: draws a handoff arrow between the two transaction timelines and computes end-to-end latency by joining the linked transactions.

5.8 `annotate`

Free-text annotation attached to a transaction.

Field name	Type	Description
`txn_id`	`U32`	Target transaction
`text`	`STRING_REF`	Annotation text

Viewer: shows as a label on the transaction's Gantt bar.

6. Counters

Counters need no protocol convention beyond shape detection: any 1-slot, non-sparse storage is a counter, and the storage name is the counter label.

Common counters:

Storage name	Fields	Meaning
`bytes_tx`	`count: U64`	Bytes transmitted
`bytes_rx`	`count: U64`	Bytes received
`arb_conflicts`	`count: U64`	Arbitration conflicts (>1 contender)
`retries`	`count: U64`	Transaction retries
`txn_completed`	`count: U64`	Transactions completed

Writer updates via DA_SLOT_ADD:

uscope_slot_add(w, STOR_BYTES_TX, 0, FIELD_COUNT, 64);  // 64 bytes this cycle

For per-router counters, place the counter storage on the router's sub-scope (§2.6).

7. Summary Fields

The protocol defines standard summary field names for mipmap rendering. Each summary field is scoped to its noc scope (via scope_id in summary_field_def_t), so multi-interconnect traces have independent summaries without name collisions.

Field name	Type	Meaning
`txn_completed`	`U32`	Transactions completed in bucket
`bytes_transferred`	`U64`	Total bytes transferred in bucket
`avg_latency_ticks`	`U32`	Average transaction latency in bucket
`retries`	`U16`	Retry events in bucket

Per-buffer occupancy summaries use the naming pattern <storage_name>_occ (e.g., vc_buf_0_occ). The value is the sum of occupancy samples in the bucket; divide by active cycles for average.
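The averaging step is a single division. A small sketch, with a hypothetical function name and the assumption that the viewer already knows the number of active cycles in the bucket:

```rust
/// Average occupancy for one summary bucket.
/// `occ_sum` is the `<storage_name>_occ` value; `active_cycles` is assumed
/// to be known to the caller. Returns None for an empty bucket.
pub fn average_occupancy(occ_sum: u64, active_cycles: u64) -> Option<f64> {
    if active_cycles == 0 {
        None
    } else {
        Some(occ_sum as f64 / active_cycles as f64)
    }
}
```

For example, a bucket with vc_buf_0_occ = 96 over 32 active cycles averages 3 occupied slots.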

DUT-specific summary fields are rendered as generic bar charts.

8. Scope Properties

Properties are stored on each scope (transport spec §3.4.1). The noc protocol uses the noc. key prefix. Each noc scope carries its own properties, enabling heterogeneous interconnects in the same trace.

Properties that describe the overall trace (e.g., dut_name) belong on the root scope.

8.1 Required Properties (on each `noc` scope)

Key	Description	Example
`noc.protocol_version`	Version of the `noc` protocol	`0.1`
`noc.bus_protocol`	Underlying bus protocol	`AXI4`, `CHI`, `TileLink`, `UCIe`
`noc.topology`	Interconnect topology	`crossbar`, `mesh`, `ring`, `tree`, `p2p`
`noc.pipeline_stages`	Comma-separated stage names, in order	`issue,route,arbitrate,traverse,deliver,respond`
`clock.period_ps`	Clock period in picoseconds	`1000` (1 GHz)

8.2 Optional Properties (on each `noc` scope)

Key	Description	Example
`noc.dim_x`	Mesh X dimension	`4`
`noc.dim_y`	Mesh Y dimension	`4`
`noc.num_vcs`	Number of virtual channels per port	`4`
`noc.data_width`	Data bus width in bits	`128`
`noc.addr_width`	Address bus width in bits	`48`
`noc.num_ports`	Total number of ports	`16`
`noc.routing`	Routing algorithm	`xy`, `adaptive`

8.3 Root Scope Properties

Key	Description	Example
`dut_name`	DUT instance name	`my_soc`
`vendor`	DUT vendor (top-level)	`acme`

9. Viewer Reconstruction

9.1 Opening a Trace

Read preamble → parse schema (including scope properties)
Walk scope tree from root / → find all scopes with protocol = "noc"; each is an interconnect instance
Per noc scope:
a. Read scope properties → noc.pipeline_stages, noc.bus_protocol, noc.topology, etc.
b. Identify transactions storage (transaction catalog)
c. Find all buffers (storages with txn_id field)
d. Identify counters (1-slot non-sparse storages)
e. Find child scopes with protocol = "noc.router" for per-router detail
Per noc scope: build ordered stage list from noc.pipeline_stages
If noc.topology = "mesh", read noc.dim_x and noc.dim_y for topology rendering
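The storage classification in these steps can be sketched as a pure function over a storage definition. `StorageDef`, `Role`, and `classify` are hypothetical names, not part of any µScope API. Checking the catalog name before the buffer rule is an assumption, made because the transactions storage also carries a txn_id field.

```rust
#[derive(Debug, PartialEq)]
pub enum Role {
    TransactionCatalog,
    Buffer,
    Counter,
    Other,
}

/// Hypothetical, simplified view of a storage definition from the schema.
pub struct StorageDef<'a> {
    pub name: &'a str,
    pub num_slots: u32,
    pub sparse: bool,
    /// (field name, type name) pairs in schema order.
    pub fields: Vec<(&'a str, &'a str)>,
}

/// Shape detection: catalog by name, buffer by txn_id:U32, counter by
/// being a 1-slot non-sparse storage.
pub fn classify(s: &StorageDef<'_>) -> Role {
    let has_txn_id = s.fields.iter().any(|&(n, t)| n == "txn_id" && t == "U32");
    if s.name == "transactions" {
        Role::TransactionCatalog
    } else if has_txn_id {
        Role::Buffer
    } else if s.num_slots == 1 && !s.sparse {
        Role::Counter
    } else {
        Role::Other
    }
}
```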

9.2 Transaction Gantt Chart

For a cycle range [C0, C1):

Seek to segment covering C0 (binary search or chain walk)
Load checkpoint → initial state of all storages
Replay deltas and events C0..C1, tracking per-transaction:
- Birth: transaction slot becomes valid in transactions
- Stage transitions: stage_transition event → record (txn_id, stage, cycle)
- Death: transaction slot cleared in transactions (completion)
For each transaction, emit Gantt bars: each stage spans from its stage_transition cycle until the next transition (or death)
Transaction labels: read opcode, addr, src_port, dst_port from the transaction catalog
Retry markers: retry events in the range
Beat markers: beat events in the range
Timeout markers: timeout events in the range
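The bar-building part of the replay can be sketched as below. It assumes the replayed deltas and events have already been reduced to a single time-ordered stream of stage transitions and catalog clears; `Record`, `Bar`, and `gantt_bars` are hypothetical names. Transactions still in flight at the end of the stream are left without a closing bar.

```rust
use std::collections::HashMap;

/// Time-ordered replay records relevant to Gantt reconstruction.
pub enum Record {
    /// A stage_transition event.
    Stage { cycle: u64, txn_id: u32, stage: u8 },
    /// A DA_SLOT_CLEAR on the transactions catalog (death).
    Clear { cycle: u64, txn_id: u32 },
}

#[derive(Debug, PartialEq)]
pub struct Bar {
    pub txn_id: u32,
    pub stage: u8,
    pub start: u64,
    pub end: u64,
}

/// Each bar spans from a transition until the next transition or the clear.
pub fn gantt_bars(records: &[Record]) -> Vec<Bar> {
    // txn_id -> (current stage, cycle it was entered)
    let mut open: HashMap<u32, (u8, u64)> = HashMap::new();
    let mut bars = Vec::new();
    for rec in records {
        match *rec {
            Record::Stage { cycle, txn_id, stage } => {
                if let Some((prev, start)) = open.insert(txn_id, (stage, cycle)) {
                    bars.push(Bar { txn_id, stage: prev, start, end: cycle });
                }
            }
            Record::Clear { cycle, txn_id } => {
                if let Some((prev, start)) = open.remove(&txn_id) {
                    bars.push(Bar { txn_id, stage: prev, start, end: cycle });
                }
            }
        }
    }
    bars
}
```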

9.3 Topology View

Using the noc.topology scope property and src_port/dst_port fields from the transaction catalog:

Render the interconnect topology (mesh grid, ring, tree, etc.)
Animate transaction flow by mapping stage_transition events to router positions
Color links by utilization (bytes per cycle divided by the link width in bytes, i.e. noc.data_width / 8)
Highlight congestion hotspots using arb_decision contention data

For mesh topologies, map port IDs to (x, y) coordinates using noc.dim_x and noc.dim_y.
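The exact port-to-coordinate mapping is not specified here. The sketch below assumes row-major numbering (port = y * dim_x + x), which is only one plausible convention; the function name is hypothetical.

```rust
/// Map a port ID to mesh coordinates.
/// Assumption: ports are numbered row-major, port = y * dim_x + x.
/// Returns None if the port lies outside the dim_x * dim_y grid.
pub fn port_to_xy(port: u16, dim_x: u16, dim_y: u16) -> Option<(u16, u16)> {
    let total = u32::from(dim_x) * u32::from(dim_y);
    if dim_x == 0 || u32::from(port) >= total {
        return None;
    }
    Some((port % dim_x, port / dim_x))
}
```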

9.4 Latency Histogram

Compute per-transaction latency from birth-to-death ticks in the transactions catalog. Group by opcode, src_port, dst_port, or address range for drill-down analysis.
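A minimal sketch of the grouping step, keyed by opcode and latency bucket. `Lifetime` and `latency_histogram` are hypothetical names, and the birth and death ticks are assumed to have been collected during replay.

```rust
use std::collections::BTreeMap;

/// Birth and death ticks of one completed transaction (hypothetical record).
pub struct Lifetime {
    pub opcode: u8,
    pub birth: u64,
    pub death: u64,
}

/// Count transactions per (opcode, latency bucket start).
/// A bucket width of 0 is treated as 1 to avoid division by zero.
pub fn latency_histogram(txns: &[Lifetime], bucket_width: u64) -> BTreeMap<(u8, u64), usize> {
    let width = bucket_width.max(1);
    let mut hist = BTreeMap::new();
    for t in txns {
        let latency = t.death.saturating_sub(t.birth);
        let bucket = latency / width * width;
        *hist.entry((t.opcode, bucket)).or_insert(0) += 1;
    }
    hist
}
```

Grouping by src_port, dst_port, or address range follows the same pattern with a different key.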

9.5 Cross-Scope Stitching

Find txn_handoff events across all noc scopes
Join (src_scope, src_txn_id) to (dst_scope, dst_txn_id)
Build end-to-end transaction timelines spanning multiple scopes
Compute end-to-end latency by summing per-scope stage durations
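The join can be sketched as following handoff links from a starting (scope, txn_id) pair. The names below are hypothetical, and the sketch ignores slot reuse: it assumes each (scope, txn_id) key appears as a source at most once in the window being stitched.

```rust
use std::collections::HashMap;

/// (scope ID, transaction ID) identifies a transaction within the trace.
pub type TxnKey = (u16, u32);

/// Follow txn_handoff links from `start` and return the full chain.
/// The walk is bounded by the number of handoffs so a malformed cycle
/// terminates.
pub fn stitch_chain(handoffs: &[(TxnKey, TxnKey)], start: TxnKey) -> Vec<TxnKey> {
    let next: HashMap<TxnKey, TxnKey> = handoffs.iter().copied().collect();
    let mut chain = vec![start];
    let mut cur = start;
    for _ in 0..handoffs.len() {
        match next.get(&cur) {
            Some(&dst) => {
                chain.push(dst);
                cur = dst;
            }
            None => break,
        }
    }
    chain
}
```

A viewer would restrict the join to handoffs inside the time window of interest before chaining.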

9.6 Occupancy View

For each buffer, count valid slots per cycle. The mipmap summary (<name>_occ fields) gives this at coarse granularity; delta replay gives exact per-cycle values when zoomed in.

9.7 Counter Graphs

Read counter storages at each cycle frame (via DA_SLOT_ADD deltas). Compute rates (delta / cycles) for display. Mipmap summaries provide pre-aggregated values for zoomed-out views.
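The rate computation is a difference quotient over two cumulative samples. A small sketch with a hypothetical function name:

```rust
/// Rate of a cumulative counter between two samples.
/// `v0`/`v1` are counter values at cycles `c0`/`c1`. Returns None for an
/// empty or reversed window, or if the counter appears to have decreased.
pub fn counter_rate(v0: u64, v1: u64, c0: u64, c1: u64) -> Option<f64> {
    if c1 <= c0 || v1 < v0 {
        return None;
    }
    Some((v1 - v0) as f64 / (c1 - c0) as f64)
}
```

For bytes_tx, a rise from 1000 to 1640 over 10 cycles gives 64 bytes per cycle.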

10. Examples

10.1 AXI4 Crossbar

A simple single-scope NoC tracing an AXI4 crossbar with 4 initiator ports and 2 target ports.

Scopes:
  /           (id=0, root,      protocol=none)
    properties: dut_name="axi_xbar"
  noc0        (id=1, parent=0,  protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="AXI4",
                noc.topology="crossbar", noc.data_width="64",
                noc.num_ports="6", clock.period_ps="1000",
                noc.pipeline_stages="ar_issue,route,arbitrate,transport,target_accept,r_data,r_last"

Enums:
  opcode:         read(0), write(1), read_linked(2), write_cond(3)
  pipeline_stage: ar_issue(0), route(1), arbitrate(2), transport(3),
                  target_accept(4), r_data(5), r_last(6)
  retry_reason:   target_busy(0), no_credits(1), vc_full(2), arb_lost(3)
  credit_direction: grant(0), consume(1)

Storages (all scope=noc0):
  transactions  (sparse, 64 slots):   txn_id:U32, opcode:ENUM(opcode), addr:U64,
                                      len:U16, size:U8, src_port:U16, dst_port:U16,
                                      qos:U8
  ort           (sparse, 32 slots):   txn_id:U32
  bytes_tx      (dense, 1 slot):      count:U64
  bytes_rx      (dense, 1 slot):      count:U64
  arb_conflicts (dense, 1 slot):      count:U64
  txn_completed (dense, 1 slot):      count:U64

Events (all scope=noc0):
  stage_transition: txn_id:U32, stage:ENUM(pipeline_stage)
  beat:             txn_id:U32, beat_num:U16, data_bytes:U16
  retry:            txn_id:U32, reason:ENUM(retry_reason)
  arb_decision:     winner_txn:U32, port:U16, num_contenders:U8
  link_credit:      port:U16, direction:ENUM(credit_direction), credits:U8
  annotate:         txn_id:U32, text:STRING_REF

10.2 CHI Mesh NoC

A 4x4 CHI mesh with per-router sub-scopes. The transaction catalog lives on the parent noc scope; router sub-scopes hold local buffers and counters.

Scopes:
  /                 (id=0,  root,       protocol=none)
    properties: dut_name="chi_mesh_soc"
  noc0              (id=1,  parent=0,   protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="CHI",
                noc.topology="mesh", noc.dim_x="4", noc.dim_y="4",
                noc.num_vcs="4", noc.data_width="256",
                clock.period_ps="500",
                noc.pipeline_stages="req_issue,req_accept,snoop_send,snoop_resp,dat_transfer,comp_ack"
  router_0_0        (id=2,  parent=1,   protocol="noc.router")
  router_0_1        (id=3,  parent=1,   protocol="noc.router")
  ...
  router_3_3        (id=17, parent=1,   protocol="noc.router")

Enums:
  opcode:         read_no_snp(0), read_once(1), read_shared(2), read_unique(3),
                  write_no_snp(4), write_unique(5), snoop_shared(6),
                  snoop_unique(7), comp_data(8), comp_ack(9)
  pipeline_stage: req_issue(0), req_accept(1), snoop_send(2),
                  snoop_resp(3), dat_transfer(4), comp_ack(5)
  retry_reason:   target_busy(0), no_credits(1), vc_full(2),
                  arb_lost(3), protocol_retry(4)
  txn_class:      req(0), snp(1), dat(2), rsp(3)
  credit_direction: grant(0), consume(1)

Storages (scope=noc0):
  transactions  (sparse, 256 slots):  txn_id:U32, opcode:ENUM(opcode), addr:U64,
                                      len:U16, size:U8, src_port:U16, dst_port:U16,
                                      qos:U8, txn_class:ENUM(txn_class)

Storages (scope=router_0_0, one set per router):
  vc_buf_n      (sparse, 4 slots):    txn_id:U32, vc:U8
  vc_buf_s      (sparse, 4 slots):    txn_id:U32, vc:U8
  vc_buf_e      (sparse, 4 slots):    txn_id:U32, vc:U8
  vc_buf_w      (sparse, 4 slots):    txn_id:U32, vc:U8
  vc_buf_local  (sparse, 4 slots):    txn_id:U32, vc:U8
  bytes_fwd     (dense, 1 slot):      count:U64
  arb_conflicts (dense, 1 slot):      count:U64

Events (scope=noc0):
  stage_transition: txn_id:U32, stage:ENUM(pipeline_stage)
  retry:            txn_id:U32, reason:ENUM(retry_reason)
  annotate:         txn_id:U32, text:STRING_REF

Events (scope=router_0_0, one set per router):
  arb_decision:     winner_txn:U32, port:U16, num_contenders:U8
  link_credit:      port:U16, direction:ENUM(credit_direction), credits:U8

The viewer discovers all 16 routers as noc.router children of noc0, maps them to a 4x4 grid via noc.dim_x/noc.dim_y, and renders per-router buffer occupancy alongside the global transaction Gantt chart.

10.3 Multi-Chiplet with D2D

Two chiplets connected via a UCIe D2D link. Each chiplet has its own noc scope with an independent transaction catalog. The txn_handoff event on the SoC-level scope stitches transactions across the link.

Scopes:
  /                       (id=0, root,       protocol=none)
    properties: dut_name="multi_chiplet_soc"
  chiplet0                (id=1, parent=0,   protocol=none)
  chiplet0_noc            (id=2, parent=1,   protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="CHI",
                noc.topology="mesh", noc.dim_x="4", noc.dim_y="4",
                noc.pipeline_stages="req_issue,req_accept,dat_transfer,comp_ack",
                clock.period_ps="500"
  chiplet1                (id=3, parent=0,   protocol=none)
  chiplet1_noc            (id=4, parent=3,   protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="CHI",
                noc.topology="mesh", noc.dim_x="2", noc.dim_y="2",
                noc.pipeline_stages="req_issue,req_accept,dat_transfer,comp_ack",
                clock.period_ps="500"
  d2d_link                (id=5, parent=0,   protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="UCIe",
                noc.topology="p2p",
                noc.pipeline_stages="d2d_issue,phy_encode,link_traverse,phy_decode,d2d_deliver",
                clock.period_ps="500"

Storages:
  transactions (scope=chiplet0_noc, sparse, 256): txn_id:U32, opcode:ENUM, addr:U64,
                                                   len:U16, size:U8, src_port:U16, dst_port:U16
  transactions (scope=chiplet1_noc, sparse, 128): txn_id:U32, opcode:ENUM, addr:U64,
                                                   len:U16, size:U8, src_port:U16, dst_port:U16
  transactions (scope=d2d_link, sparse, 32):      txn_id:U32, opcode:ENUM, addr:U64,
                                                   len:U16, size:U8, src_port:U16, dst_port:U16

Events (scope=root):
  txn_handoff:  src_scope:U16, src_txn_id:U32, dst_scope:U16, dst_txn_id:U32

Handoff sequence for a cross-chiplet read:

Chiplet 0 issues a read → transactions[42] in chiplet0_noc
The read reaches the D2D egress port → DA_SLOT_CLEAR on chiplet0_noc.transactions[42]
D2D link picks it up → transactions[7] in d2d_link
Root scope emits txn_handoff(src_scope=2, src_txn_id=42, dst_scope=5, dst_txn_id=7)
D2D link delivers to chiplet 1 → DA_SLOT_CLEAR on d2d_link.transactions[7]
Chiplet 1 ingests the read → transactions[19] in chiplet1_noc
Root scope emits txn_handoff(src_scope=5, src_txn_id=7, dst_scope=4, dst_txn_id=19)
The viewer chains: chiplet0_noc:42 → d2d_link:7 → chiplet1_noc:19 and computes end-to-end latency

11. Version History

Version	Date	Changes
0.1	2026-xx-xx	Initial draft

Rust Crate API Reference

Crate: uscope
Location: crates/uscope/

1. Overview

The uscope Rust crate provides a complete reader and writer for the µScope trace format. It implements the transport layer (file header, preamble, schema, segments, checkpoints, deltas, string table, section table) and the CPU protocol layer (entity catalog, pipeline stages, typed events).

Dependencies

Crate	Purpose
`byteorder`	Little-endian integer read/write
`lz4_flex`	Pure-Rust LZ4 compression

No other runtime dependencies.

2. Schema Building

Use SchemaBuilder and DutDescBuilder to define the trace structure before writing.

2.1 SchemaBuilder

use uscope::schema::{SchemaBuilder, FieldSpec};
use uscope::types::SF_SPARSE;

let mut sb = SchemaBuilder::new();

// Clock domain: 5 GHz (200 ps period)
let clk = sb.clock_domain("core_clk", 200);

// Scope hierarchy
sb.scope("root", None, None, None);
let scope = sb.scope("core0", Some(0), Some("cpu"), Some(clk));

// Enum type
let stage_enum = sb.enum_type(
    "pipeline_stage",
    &["fetch", "decode", "execute", "writeback"],
);

// Storage (entity catalog)
let entities = sb.storage(
    "entities", scope, 512, SF_SPARSE,
    &[
        ("entity_id", FieldSpec::U32),
        ("pc",        FieldSpec::U64),
        ("inst_bits", FieldSpec::U32),
    ],
);

// Event type
let stage_ev = sb.event(
    "stage_transition", scope,
    &[
        ("entity_id", FieldSpec::U32),
        ("stage",     FieldSpec::Enum(stage_enum)),
    ],
);

let schema = sb.build();

Methods:

Method	Returns	Description
`clock_domain(name, period_ps)`	`u8`	Add a clock domain
`scope(name, parent, protocol, clock_id)`	`u16`	Add a scope
`enum_type(name, values)`	`u8`	Add an enum type
`storage(name, scope, slots, flags, fields)`	`u16`	Add a storage definition
`event(name, scope, fields)`	`u16`	Add an event type
`summary_field(name, type, scope)`	—	Add a summary field
`strings_mut()`	`&mut StringPoolBuilder`	Access the string pool
`build()`	`Schema`	Consume builder, produce schema

2.2 DutDescBuilder

use uscope::schema::DutDescBuilder;

let mut dut = DutDescBuilder::new();
dut.property("dut_name", "boom_core_0")
   .property("cpu.isa", "RV64GC")
   .property("cpu.pipeline_stages", "fetch,decode,execute,writeback");

// Build using the schema's shared string pool
let dut_desc = dut.build(sb.strings_mut());

2.3 FieldSpec

Variant	Wire type	Size
`FieldSpec::U8`	`FT_U8`	1
`FieldSpec::U16`	`FT_U16`	2
`FieldSpec::U32`	`FT_U32`	4
`FieldSpec::U64`	`FT_U64`	8
`FieldSpec::I8`	`FT_I8`	1
`FieldSpec::I16`	`FT_I16`	2
`FieldSpec::I32`	`FT_I32`	4
`FieldSpec::I64`	`FT_I64`	8
`FieldSpec::Bool`	`FT_BOOL`	1
`FieldSpec::StringRef`	`FT_STRING_REF`	4
`FieldSpec::Enum(id)`	`FT_ENUM`	1

3. Writer

Writer<W> writes µScope trace files in streaming, append-only fashion.

3.1 Creating a Writer

use uscope::writer::Writer;
use std::fs::File;

let file = File::create("trace.uscope")?;
let mut w = Writer::create(file, &dut_desc, &schema, checkpoint_interval_ps)?;

The checkpoint_interval_ps parameter controls how often a full checkpoint is written. Smaller intervals allow faster random-access seeks at the cost of larger files.

3.2 Writing Cycles

All storage mutations and events must occur within a begin_cycle / end_cycle pair. Time must be monotonically non-decreasing.

w.begin_cycle(time_ps);

// Mutate storage slots
w.slot_set(storage_id, slot, field, value);
w.slot_add(storage_id, slot, field, delta);
w.slot_clear(storage_id, slot);

// Emit events (payload is pre-serialized, fields concatenated LE)
w.event(event_type_id, &payload_bytes);

w.end_cycle()?;

Method	Description
`begin_cycle(time_ps)`	Start a cycle frame at the given time
`slot_set(storage, slot, field, value)`	Set a field value (marks slot valid)
`slot_clear(storage, slot)`	Mark slot invalid (sparse only)
`slot_add(storage, slot, field, delta)`	Add to a field value
`event(type_id, payload)`	Emit an event with raw payload
`end_cycle()`	Finish the cycle frame
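Because event() takes a pre-serialized payload, the caller has to pack the fields itself. The sketch below shows one way to do that for a stage_transition event with an entity_id:U32 and a stage enum field, using the sizes from §2.3 (4 bytes and 1 byte). It assumes fields are packed in schema order with no padding; the helper names are hypothetical and not part of the crate.

```rust
/// Pack (entity_id: U32, stage: ENUM) as little-endian bytes, in schema
/// order. Assumption: no padding between fields.
pub fn encode_stage_transition(entity_id: u32, stage: u8) -> Vec<u8> {
    let mut buf = Vec::with_capacity(5);
    buf.extend_from_slice(&entity_id.to_le_bytes());
    buf.push(stage);
    buf
}

/// Inverse of `encode_stage_transition`; None if the length is wrong.
pub fn decode_stage_transition(payload: &[u8]) -> Option<(u32, u8)> {
    if payload.len() != 5 {
        return None;
    }
    let id = u32::from_le_bytes([payload[0], payload[1], payload[2], payload[3]]);
    Some((id, payload[4]))
}
```

The resulting byte vector is what would be passed as the payload argument to event().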

3.3 String Table

For STRING_REF fields, insert strings into the writer's string table:

let text_idx = w.string_table.insert("addi x0, x0, 0");
// Use text_idx as the u32 value for a STRING_REF field in event payloads

3.4 Finalization

let file = w.close()?;  // Writes string table, segment table, section table

Calling close() sets F_COMPLETE, writes the section table, and returns the underlying writer. The file is then readable by Reader.

4. Reader

Reader opens µScope trace files for random-access reading.

4.1 Opening a File

use uscope::reader::Reader;

let mut r = Reader::open("trace.uscope")?;

Reader::open handles both finalized (F_COMPLETE) and in-progress files. For finalized files, the section table is used for fast segment lookup. For in-progress files, the segment chain is walked from tail_offset.

4.2 Metadata Access

let header = r.header();           // FileHeader
let schema = r.schema();           // Schema (clock domains, scopes, storages, events)
let dut = r.dut_desc();            // DutDesc (key-value properties)
let config = r.trace_config();     // TraceConfig (checkpoint_interval_ps)
let offsets = r.field_offsets();   // Precomputed field offsets per storage

// Look up a DUT property by key
let isa = r.dut_property("cpu.isa");  // Some("RV64GC")

// String table (for STRING_REF field values)
if let Some(st) = r.string_table() {
    let text = st.get(0);  // Some("addi x0, x0, 0")
}

4.3 State Reconstruction

Reconstruct the full storage state at any point in time. The reader finds the appropriate segment, loads its checkpoint, and replays deltas up to the target time.

let state = r.state_at(time_ps)?;

// Query storage state
let valid = state.slot_valid(storage_id, slot);
let value = state.slot_field(storage_id, slot, field_index, &offsets[storage_id]);

4.4 Event Queries

let events = r.events_in_range(t0_ps, t1_ps)?;
for ev in &events {
    println!("t={} type={} payload={:?}", ev.time_ps, ev.event_type_id, ev.payload);
}

4.5 Segment-Level Access

let n = r.segment_count();
let (storages, events, ops) = r.segment_replay(seg_idx)?;

segment_replay returns the storage state obtained by replaying all of the segment's deltas on top of its checkpoint, plus all events and storage operations (TimedOp) in the segment.

4.6 Live Tailing

For traces being written concurrently:

loop {
    if r.poll_new_segments()? {
        // New segments available: re-query events or state
    }
    std::thread::sleep(std::time::Duration::from_millis(100));
}

5. CPU Protocol Helpers

The protocols::cpu module provides higher-level APIs that implement the CPU protocol conventions on top of the transport-layer primitives.

5.1 CpuSchemaBuilder

Constructs a complete CPU-protocol schema with all standard enums, storages, and events.

use uscope::protocols::cpu::CpuSchemaBuilder;
use uscope::schema::FieldSpec;

let (dut_builder, mut schema_builder, ids) = CpuSchemaBuilder::new("core0")
    .isa("RV64GC")
    .pipeline_stages(&["fetch", "decode", "rename", "dispatch",
                        "issue", "execute", "complete", "retire"])
    .fetch_width(4)
    .commit_width(4)
    .entity_slots(512)
    .buffer("rob", 256, &[("completed", FieldSpec::Bool)])
    .buffer("iq_int", 48, &[])
    .counter("committed_insns")
    .counter("bp_misses")
    .build();

let dut = dut_builder.build(schema_builder.strings_mut());
let schema = schema_builder.build();

Builder methods:

Method	Description
`isa(name)`	Set ISA (e.g. `"RV64GC"`)
`pipeline_stages(names)`	Define pipeline stage enum
`fetch_width(n)`	Set fetch width DUT property
`commit_width(n)`	Set commit width DUT property
`entity_slots(n)`	Max in-flight entities (default: 512)
`elf_path(path)`	Set ELF path for disassembly
`vendor(name)`	Set vendor DUT property
`buffer(name, slots, fields)`	Add a hardware buffer storage
`counter(name)`	Add a counter (1-slot dense storage)
`stall_reasons(names)`	Override default stall reason enum

CpuIds — returned by build(), contains all assigned IDs:

Field	Type	Description
`scope_id`	`u16`	CPU scope ID
`entities_storage_id`	`u16`	Entity catalog storage ID
`stage_transition_event_id`	`u16`	Stage transition event type
`annotate_event_id`	`u16`	Annotation event type
`dependency_event_id`	`u16`	Dependency event type
`flush_event_id`	`u16`	Flush event type
`stall_event_id`	`u16`	Stall event type
`field_entity_id`	`u16`	Field index: entity_id
`field_pc`	`u16`	Field index: pc
`field_inst_bits`	`u16`	Field index: inst_bits
`buffers`	`Vec<(String, u16)>`	Buffer (name, storage_id) pairs
`counters`	`Vec<(String, u16, u16)>`	Counter (name, storage_id, field) triples

5.2 CpuWriter

Typed helpers that emit the correct transport-layer operations for CPU protocol semantics.

use uscope::protocols::cpu::CpuWriter;

let cpu = CpuWriter::new(ids);

w.begin_cycle(time_ps);

// Fetch: allocate entity in catalog
cpu.fetch(&mut w, entity_id, pc, inst_bits);

// Stage transition
cpu.stage_transition(&mut w, entity_id, stage_index);

// Retire: clear entity from catalog
cpu.retire(&mut w, entity_id);

// Flush: emit flush event + clear entity
cpu.flush(&mut w, entity_id, reason);

// Annotation: insert text into string table + emit event
cpu.annotate(&mut w, entity_id, "decoded: addi x1, x0, 1");

// Dependency: record data/structural dependency
cpu.dependency(&mut w, src_entity, dst_entity, dep_type);

// Stall
cpu.stall(&mut w, reason);

// Counter increment
cpu.counter_add(&mut w, "committed_insns", 1);

w.end_cycle()?;

Method	Transport ops	Description
`fetch(w, id, pc, bits)`	3 × `slot_set`	Allocate entity
`stage_transition(w, id, stage)`	1 × `event`	Pipeline stage change
`retire(w, id)`	1 × `slot_clear`	Normal retirement
`flush(w, id, reason)`	1 × `event` + 1 × `slot_clear`	Squash
`annotate(w, id, text)`	1 × `string_table.insert` + 1 × `event`	Text annotation
`dependency(w, src, dst, type)`	1 × `event`	Data or structural dependency
`stall(w, reason)`	1 × `event`	Pipeline stall
`counter_add(w, name, delta)`	1 × `slot_add`	Increment counter

6. Example: Full Write-Read Cycle

fn main() {
use uscope::protocols::cpu::{CpuSchemaBuilder, CpuWriter};
use uscope::writer::Writer;
use uscope::reader::Reader;
use std::fs::File;

// Build schema
let (dut_builder, mut sb, ids) = CpuSchemaBuilder::new("core0")
    .isa("RV64GC")
    .pipeline_stages(&["fetch", "decode", "execute", "writeback"])
    .entity_slots(16)
    .build();

let dut = dut_builder.build(sb.strings_mut());
let schema = sb.build();

// Write
let file = File::create("trace.uscope").unwrap();
let mut w = Writer::create(file, &dut, &schema, 10_000).unwrap();
let cpu = CpuWriter::new(ids.clone());

w.begin_cycle(0);
cpu.fetch(&mut w, 0, 0x8000_0000, 0x13);
cpu.stage_transition(&mut w, 0, 0);
w.end_cycle().unwrap();

w.begin_cycle(1000);
cpu.stage_transition(&mut w, 0, 1);
w.end_cycle().unwrap();

w.begin_cycle(2000);
cpu.stage_transition(&mut w, 0, 2);
w.end_cycle().unwrap();

w.begin_cycle(3000);
cpu.stage_transition(&mut w, 0, 3);
cpu.retire(&mut w, 0);
w.end_cycle().unwrap();

w.close().unwrap();

// Read
let mut r = Reader::open("trace.uscope").unwrap();
assert_eq!(r.header().total_time_ps, 3000);

let state = r.state_at(1500).unwrap();
assert!(state.slot_valid(ids.entities_storage_id, 0)); // still in-flight

let state = r.state_at(3000).unwrap();
assert!(!state.slot_valid(ids.entities_storage_id, 0)); // retired

let events = r.events_in_range(0, 3000).unwrap();
assert_eq!(events.len(), 4); // 4 stage transitions
}

uscope-cpu: CPU Protocol Library

Crate: uscope-cpu
Location: crates/uscope-cpu/

Overview

The uscope-cpu crate provides the CPU protocol interpretation layer on top of the uscope transport crate. It understands instruction lifecycles, pipeline stages, performance counters, and hardware buffers — concepts that the transport layer treats as opaque storages and events.

Architecture

uscope-cpu (this crate)          uscope (transport)
┌──────────────────────┐         ┌─────────────────┐
│ CpuTrace             │────────▶│ Reader           │
│  - instructions      │         │  - state_at()    │
│  - stages            │         │  - segment_replay│
│  - counters          │         │  - schema()      │
│  - buffers           │         └─────────────────┘
│  - lazy loading      │
│  - performance stats │
└──────────────────────┘

Dependencies

Crate	Purpose
`uscope`	Transport layer (Reader, Schema, state reconstruction)
`instruction-decoder`	RISC-V ISA decode (optional, behind `decode` feature)

CpuTrace

The main entry point. Opens a trace file, resolves the CPU protocol schema, and provides query methods.

Opening a trace

use uscope_cpu::CpuTrace;

let mut trace = CpuTrace::open("trace.uscope")?;

// File overview
let info = trace.file_info();
println!("Version: {}.{}", info.version_major, info.version_minor);
println!("Segments: {}", info.num_segments);
println!("Max cycle: {}", trace.max_cycle());
println!("Period: {} ps", trace.period_ps());

// Schema access
for (name, _) in trace.counter_names() {
    println!("Counter: {}", name);
}
for buf in trace.buffer_infos() {
    println!("Buffer: {} ({} slots)", buf.name, buf.capacity);
}

Counter queries

// Cumulative value at a cycle
let val = trace.counter_value_at(0, 100);

// Rate over a window (instructions per cycle)
let ipc = trace.counter_rate_at(0, 100, 64);

// Single-cycle delta
let delta = trace.counter_delta_at(0, 100);

// Downsample for sparkline rendering (min/max envelope)
let data = trace.counter_downsample(0, 0, 1000, 200);
for (min_rate, max_rate) in &data {
    // render bar from min to max
}
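The relationship between these counter queries can be illustrated on a plain list of (cycle, cumulative_value) samples. This is a minimal sketch, not the crate's implementation: the free functions `value_at`, `delta_at`, `rate_at`, and `downsample` are hypothetical names, and both the windowing convention and the bucket layout of the min/max envelope are assumptions.

```rust
/// Cumulative value at `cycle`: the last sample at or before it (0 if none).
/// Assumes `samples` is sorted by cycle.
fn value_at(samples: &[(u32, u64)], cycle: u32) -> u64 {
    // Number of samples whose cycle is <= the requested cycle.
    let n = samples.partition_point(|&(c, _)| c <= cycle);
    if n == 0 { 0 } else { samples[n - 1].1 }
}

/// Increase of the cumulative value during `cycle` alone.
fn delta_at(samples: &[(u32, u64)], cycle: u32) -> u64 {
    let before = if cycle == 0 { 0 } else { value_at(samples, cycle - 1) };
    value_at(samples, cycle).saturating_sub(before)
}

/// Average per-cycle rate over the `window` cycles ending at `cycle`.
fn rate_at(samples: &[(u32, u64)], cycle: u32, window: u32) -> f64 {
    let start = cycle.saturating_sub(window);
    let span = cycle - start;
    if span == 0 {
        return 0.0;
    }
    let delta = value_at(samples, cycle).saturating_sub(value_at(samples, start));
    delta as f64 / f64::from(span)
}

/// Min/max of the single-cycle deltas in each of `buckets` equal slices of
/// the half-open range [start, end).
fn downsample(samples: &[(u32, u64)], start: u32, end: u32, buckets: u32) -> Vec<(f64, f64)> {
    if buckets == 0 || end <= start {
        return Vec::new();
    }
    let total = u64::from(end - start);
    (0..u64::from(buckets))
        .map(|b| {
            let lo = start + (b * total / u64::from(buckets)) as u32;
            let hi = start + ((b + 1) * total / u64::from(buckets)) as u32;
            let (mut min, mut max) = (f64::INFINITY, f64::NEG_INFINITY);
            // Every bucket covers at least one cycle, even when buckets > total.
            for cycle in lo..hi.max(lo + 1) {
                let d = delta_at(samples, cycle) as f64;
                min = min.min(d);
                max = max.max(d);
            }
            (min, max)
        })
        .collect()
}
```

With a retired-instruction counter as input, `rate_at` over a window is the IPC for that window, and each `(min, max)` pair from `downsample` is one bar of a sparkline.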

Buffer state

let state = trace.buffer_state_at(0, 50)?;
println!("Capacity: {}", state.capacity);

// Occupied slots
for slot in &state.slots {
    println!("Slot 0x{:02x}: entity_id={}", slot.0, slot.1[0]);
}

// Storage-level properties (pointer pairs)
for prop in &state.properties {
    println!("{}: {} (role={}, pair_id={})",
        prop.name, prop.value, prop.role, prop.pair_id);
}

Lazy segment loading

// Load specific segments (instruction/stage data)
let result = trace.load_segments(&[0, 1, 2])?;
println!("Loaded {} instructions", result.instructions.len());

// Or load segments covering a cycle range
let loaded = trace.ensure_loaded(100, 200);

Metadata

for (key, value) in trace.metadata() {
    println!("{}: {}", key, value);
}

Types

InstructionData

pub struct InstructionData {
    pub id: u32,              // Entity ID
    pub sim_id: u64,          // Simulator-assigned ID
    pub thread_id: u16,
    pub rbid: Option<u32>,    // Retire buffer slot
    pub iq_id: Option<u32>,   // Issue queue ID
    pub dq_id: Option<u32>,   // Dispatch queue ID
    pub ready_cycle: Option<u32>,
    pub pc: u64,
    pub disasm: String,
    pub tooltip: String,
    pub stage_range: Range<u32>,  // Index range into stages vec
    pub retire_status: RetireStatus,
    pub first_cycle: u32,
    pub last_cycle: u32,
}

StageSpan

pub struct StageSpan {
    pub stage_name_idx: u16,  // Index into stage name table
    pub lane: u16,
    pub start_cycle: u32,
    pub end_cycle: u32,
}
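An instruction's `stage_range` selects its spans out of one shared stages vector. The sketch below shows that lookup plus a per-stage average latency derived from the spans. It is an illustration only: `spans_of` and `average_latency` are hypothetical helpers, and treating `end_cycle` as exclusive (so a span lasts `end_cycle - start_cycle` cycles) is an assumption.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

pub struct StageSpan {
    pub stage_name_idx: u16,
    pub lane: u16,
    pub start_cycle: u32,
    pub end_cycle: u32,
}

/// Spans belonging to one instruction, selected by its `stage_range`.
fn spans_of<'a>(stages: &'a [StageSpan], stage_range: &Range<u32>) -> &'a [StageSpan] {
    &stages[stage_range.start as usize..stage_range.end as usize]
}

/// Average span length per stage index, longest (likely bottleneck) first.
fn average_latency(stages: &[StageSpan]) -> Vec<(u16, f64)> {
    // stage index -> (total cycles, number of spans)
    let mut acc: BTreeMap<u16, (u64, u64)> = BTreeMap::new();
    for span in stages {
        let entry = acc.entry(span.stage_name_idx).or_insert((0, 0));
        entry.0 += u64::from(span.end_cycle.saturating_sub(span.start_cycle));
        entry.1 += 1;
    }
    let mut out: Vec<(u16, f64)> = acc
        .into_iter()
        .map(|(idx, (total, count))| (idx, total as f64 / count as f64))
        .collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
```

Storing every span in a single vector and giving each instruction only an index range keeps `InstructionData` small and avoids one allocation per instruction.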

BufferInfo

pub struct BufferInfo {
    pub name: String,
    pub storage_id: u16,
    pub capacity: u16,
    pub fields: Vec<(String, u8)>,
    pub properties: Vec<BufferPropertyDef>,
}

pub struct BufferPropertyDef {
    pub name: String,
    pub field_type: u8,
    pub role: u8,     // 0=plain, 1=HEAD_PTR, 2=TAIL_PTR
    pub pair_id: u8,  // Groups head/tail into pairs
}
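The `role` and `pair_id` fields let a viewer match head and tail pointers and derive occupancy. The following sketch groups property values by `pair_id` and computes a circular-buffer fill level. `Property`, `pointer_pairs`, and `occupancy` are hypothetical names, and treating `head == tail` as an empty buffer is an assumption (a real design may use an extra wrap bit to distinguish full from empty).

```rust
use std::collections::BTreeMap;

const ROLE_HEAD_PTR: u8 = 1;
const ROLE_TAIL_PTR: u8 = 2;

/// One storage-level property value at a given cycle (hypothetical shape).
struct Property {
    value: u64,
    role: u8,
    pair_id: u8,
}

/// Map each pair_id to its (head, tail) values; incomplete pairs are dropped.
fn pointer_pairs(props: &[Property]) -> BTreeMap<u8, (u64, u64)> {
    let mut heads = BTreeMap::new();
    let mut tails = BTreeMap::new();
    for p in props {
        match p.role {
            ROLE_HEAD_PTR => {
                heads.insert(p.pair_id, p.value);
            }
            ROLE_TAIL_PTR => {
                tails.insert(p.pair_id, p.value);
            }
            _ => {} // plain property, not part of a pointer pair
        }
    }
    heads
        .into_iter()
        .filter_map(|(id, head)| tails.get(&id).map(|&tail| (id, (head, tail))))
        .collect()
}

/// Entries between head and tail in a circular buffer of `capacity` slots.
fn occupancy(head: u64, tail: u64, capacity: u64) -> u64 {
    if capacity == 0 {
        return 0;
    }
    (tail % capacity + capacity - head % capacity) % capacity
}
```

For example, a head of 6 and a tail of 2 in an 8-slot buffer gives an occupancy of 4, because the live region wraps around the end of the array.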

CounterSeries

pub struct CounterSeries {
    pub name: String,
    pub samples: Vec<(u32, u64)>,  // (cycle, cumulative_value)
    pub default_mode: CounterDisplayMode,
}

SegmentIndex

pub struct SegmentIndex {
    pub segments: Vec<(u32, u32)>,  // (start_cycle, end_cycle)
}

impl SegmentIndex {
    pub fn segments_in_range(&self, start: u32, end: u32) -> Vec<usize>;
}
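One plausible body for `segments_in_range` is an interval-overlap filter. The sketch below assumes that both the stored segment bounds and the query are half-open ranges `[start, end)`; the crate may use inclusive bounds or a binary search instead.

```rust
pub struct SegmentIndex {
    pub segments: Vec<(u32, u32)>, // (start_cycle, end_cycle)
}

impl SegmentIndex {
    /// Indices of all segments that overlap the half-open range [start, end).
    pub fn segments_in_range(&self, start: u32, end: u32) -> Vec<usize> {
        self.segments
            .iter()
            .enumerate()
            .filter_map(|(i, &(seg_start, seg_end))| {
                // Two half-open ranges overlap when each starts before the other ends.
                if seg_start < end && start < seg_end {
                    Some(i)
                } else {
                    None
                }
            })
            .collect()
    }
}
```

The returned indices are what a lazy loader would pass on to a call such as `load_segments`.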

Feature Flags

Feature	Default	Description
`decode`	yes	RISC-V instruction decode via `instruction-decoder`

C DPI Library API Reference

Header: uscope_dpi.h Location: dpi/

1. Overview

The C DPI library is a standalone, write-only µScope trace library designed for integration with hardware simulators via DPI (Direct Programming Interface). It produces trace files that are binary-compatible with the Rust reader.

Design Principles

Single .c + .h (plus vendored LZ4) — easy to integrate
C99 — compiles with any standard C compiler
No dynamic allocation during per-cycle operations — pre-allocated buffers
Write-only — no reader (use the Rust crate for reading)
Zero Rust dependency — fully self-contained

Building

make -C dpi            # builds libuscope_dpi.a
make -C dpi test       # builds and runs the test program

Link with -luscope_dpi (or include uscope_dpi.c and lz4.c directly).

2. Schema Building

Before opening a writer, define the trace schema.

2.1 Create / Free

uscope_schema_def_t *schema = uscope_schema_new();
// ... add clocks, scopes, enums, storages, events ...
// Schema is consumed by uscope_writer_open() — do not free after open.
// If not opening a writer, free with:
uscope_schema_free(schema);

2.2 Clock Domains

uint8_t clk = uscope_schema_add_clock(schema, "core_clk", 1000); // 1 GHz

Parameter	Type	Description
`name`	`const char *`	Clock name
`period_ps`	`uint32_t`	Period in picoseconds
Returns	`uint8_t`	Clock domain ID

2.3 Scopes

uscope_schema_add_scope(schema, "root", 0xFFFF, NULL, 0xFF);
uint16_t scope = uscope_schema_add_scope(schema, "core0", 0, "cpu", clk);

Parameter	Type	Description
`name`	`const char *`	Scope name
`parent`	`uint16_t`	Parent scope ID (`0xFFFF` = no parent, i.e. this scope is the root)
`protocol`	`const char *`	Protocol name (`NULL` = none)
`clock_id`	`uint8_t`	Clock domain (`0xFF` = inherit)
Returns	`uint16_t`	Scope ID

2.4 Enums

const char *stages[] = {"fetch", "decode", "execute", "writeback"};
uint8_t stage_enum = uscope_schema_add_enum(schema, "pipeline_stage", stages, 4);

2.5 Storages

Fields are passed as parallel arrays of names, types, and enum IDs.

const char  *fields[]    = {"entity_id", "pc",          "inst_bits"};
uint8_t      types[]     = {USCOPE_FT_U32, USCOPE_FT_U64, USCOPE_FT_U32};
uint8_t      enum_ids[]  = {0,             0,              0};

uint16_t entities = uscope_schema_add_storage(
    schema, "entities", scope, /*num_slots=*/512, USCOPE_SF_SPARSE,
    /*num_fields=*/3, fields, types, enum_ids);

Parameter	Type	Description
`name`	`const char *`	Storage name
`scope_id`	`uint16_t`	Owning scope
`num_slots`	`uint16_t`	Number of slots
`flags`	`uint16_t`	`USCOPE_SF_SPARSE` or `0` (dense)
`num_fields`	`uint16_t`	Number of fields
`field_names`	`const char **`	Field name array
`field_types`	`const uint8_t *`	Field type array
`field_enum_ids`	`const uint8_t *`	Enum ID array (or `NULL`)
Returns	`uint16_t`	Storage ID

2.6 Events

const char  *st_fields[] = {"entity_id",    "stage"};
uint8_t      st_types[]  = {USCOPE_FT_U32,  USCOPE_FT_ENUM};
uint8_t      st_enums[]  = {0,              stage_enum};

uint16_t st_event = uscope_schema_add_event(
    schema, "stage_transition", scope,
    /*num_fields=*/2, st_fields, st_types, st_enums);

3. Field Type Constants

Constant	Value	Size	Description
`USCOPE_FT_U8`	`0x01`	1	Unsigned 8-bit
`USCOPE_FT_U16`	`0x02`	2	Unsigned 16-bit
`USCOPE_FT_U32`	`0x03`	4	Unsigned 32-bit
`USCOPE_FT_U64`	`0x04`	8	Unsigned 64-bit
`USCOPE_FT_I8`	`0x05`	1	Signed 8-bit
`USCOPE_FT_I16`	`0x06`	2	Signed 16-bit
`USCOPE_FT_I32`	`0x07`	4	Signed 32-bit
`USCOPE_FT_I64`	`0x08`	8	Signed 64-bit
`USCOPE_FT_BOOL`	`0x09`	1	Boolean
`USCOPE_FT_STRING_REF`	`0x0A`	4	String table index
`USCOPE_FT_ENUM`	`0x0B`	1	Enum value

4. Writer

4.1 Open / Close

uscope_dut_property_t props[] = {
    {"dut_name", "boom_core_0"},
    {"cpu.isa",  "RV64GC"},
};

uscope_writer_t *w = uscope_writer_open(
    "trace.uscope",
    props, /*num_props=*/2,
    schema,                    // consumed — do not free
    /*checkpoint_interval_ps=*/1000000);

// ... write cycles ...

uscope_writer_close(w);  // finalizes and frees

uscope_writer_open takes ownership of the schema. Do not call uscope_schema_free after opening.

uscope_writer_close writes the string table, segment table, section table, sets F_COMPLETE, and frees all resources.

4.2 Per-Cycle Operations

All mutations must occur between a uscope_begin_cycle call and the matching uscope_end_cycle call. Time must be monotonically non-decreasing from one cycle to the next.

uscope_begin_cycle(w, time_ps);

uscope_slot_set(w, storage_id, slot, field, value);
uscope_slot_clear(w, storage_id, slot);
uscope_slot_add(w, storage_id, slot, field, delta);
uscope_event(w, event_type_id, payload, payload_size);

uscope_end_cycle(w);

Function	Description
`uscope_begin_cycle(w, time_ps)`	Start a cycle at the given time
`uscope_slot_set(w, stor, slot, field, val)`	Set field value (marks slot valid)
`uscope_slot_clear(w, stor, slot)`	Mark slot invalid
`uscope_slot_add(w, stor, slot, field, val)`	Add to field value
`uscope_event(w, type_id, payload, size)`	Emit event with raw payload
`uscope_end_cycle(w)`	End cycle, flush segment if needed
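The two ordering rules (mutations only inside an open cycle, timestamps never going backwards) can be modelled as a small state machine. This is a sketch of the rules only, written in Rust for consistency with the other examples; `CycleGuard` and `CycleError` are hypothetical and do not describe how the C library detects or reports violations.

```rust
#[derive(Debug, PartialEq)]
enum CycleError {
    AlreadyOpen,
    NotOpen,
    TimeWentBackwards,
}

#[derive(Default)]
struct CycleGuard {
    open: bool,
    last_time_ps: Option<u64>,
}

impl CycleGuard {
    /// Open a cycle; the timestamp may repeat but must not decrease.
    fn begin_cycle(&mut self, time_ps: u64) -> Result<(), CycleError> {
        if self.open {
            return Err(CycleError::AlreadyOpen);
        }
        if let Some(last) = self.last_time_ps {
            if time_ps < last {
                return Err(CycleError::TimeWentBackwards);
            }
        }
        self.open = true;
        self.last_time_ps = Some(time_ps);
        Ok(())
    }

    /// Check that a slot or event mutation is allowed right now.
    fn mutate(&self) -> Result<(), CycleError> {
        if self.open { Ok(()) } else { Err(CycleError::NotOpen) }
    }

    /// Close the currently open cycle.
    fn end_cycle(&mut self) -> Result<(), CycleError> {
        if !self.open {
            return Err(CycleError::NotOpen);
        }
        self.open = false;
        Ok(())
    }
}
```

A testbench wrapper could run such a check in debug builds before forwarding each call to the writer.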

4.3 Event Payloads

Event payloads are the field values concatenated in schema-definition order, little-endian, with no padding. Build them manually:

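// stage_transition: entity_id (U32) + stage (ENUM/U8)
uint8_t payload[5];
uint32_t entity_id = 42;
// Store entity_id byte by byte so the payload is little-endian on any host.
payload[0] = (uint8_t)(entity_id);
payload[1] = (uint8_t)(entity_id >> 8);
payload[2] = (uint8_t)(entity_id >> 16);
payload[3] = (uint8_t)(entity_id >> 24);
payload[4] = 2;                  // stage index (2 = "execute")
uscope_event(w, st_event, payload, 5);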
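On the Rust side, the same 5-byte layout can be packed and unpacked with the standard little-endian conversions. This is a sketch for illustration: `encode_stage_transition` and `decode_stage_transition` are hypothetical helpers, not functions of the `uscope` crate.

```rust
/// Pack a `stage_transition` payload: entity_id (U32, little-endian)
/// followed by stage (ENUM, one byte), with no padding.
fn encode_stage_transition(entity_id: u32, stage: u8) -> [u8; 5] {
    let mut payload = [0u8; 5];
    payload[..4].copy_from_slice(&entity_id.to_le_bytes());
    payload[4] = stage;
    payload
}

/// Unpack the same layout; returns `None` unless the payload is exactly 5 bytes.
fn decode_stage_transition(payload: &[u8]) -> Option<(u32, u8)> {
    if payload.len() != 5 {
        return None;
    }
    let entity_id = u32::from_le_bytes([payload[0], payload[1], payload[2], payload[3]]);
    Some((entity_id, payload[4]))
}
```

Because fields are concatenated without padding, the payload size is the sum of the field sizes from the type table in section 3 (4 + 1 bytes here).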

4.4 String Table

For STRING_REF fields in event payloads:

uint32_t idx = uscope_string_insert(w, "addi x0, x0, 0");
// Use idx as the 4-byte value in a STRING_REF field

5. Limits

Resource	Maximum
String pool (schema)	64 KB
Clock domains	16
Scopes	256
Enum types	64
Enum values per type	256
Storages	256
Event types	256
Fields per storage/event	32
Ops per cycle	4096
Events per cycle	1024
Event payload size	256 bytes
Segments	65536
Delta buffer	4 MB (auto-grows)

6. Example: CPU Pipeline Trace

#include "uscope_dpi.h"
#include <string.h>

int main(void) {
    // Schema
    uscope_schema_def_t *s = uscope_schema_new();
    uint8_t clk = uscope_schema_add_clock(s, "clk", 1000);
    uscope_schema_add_scope(s, "root", 0xFFFF, NULL, 0xFF);
    uint16_t scope = uscope_schema_add_scope(s, "core0", 0, "cpu", clk);

    const char *stages[] = {"fetch", "decode", "execute", "writeback"};
    uint8_t se = uscope_schema_add_enum(s, "pipeline_stage", stages, 4);

    const char *ef[] = {"entity_id", "pc", "inst_bits"};
    uint8_t et[] = {USCOPE_FT_U32, USCOPE_FT_U64, USCOPE_FT_U32};
    uint16_t ent = uscope_schema_add_storage(s, "entities", scope,
                                              256, USCOPE_SF_SPARSE,
                                              3, ef, et, NULL);

    const char *sf[] = {"entity_id", "stage"};
    uint8_t st[] = {USCOPE_FT_U32, USCOPE_FT_ENUM};
    uint8_t sen[] = {0, se};
    uint16_t sev = uscope_schema_add_event(s, "stage_transition", scope,
                                            2, sf, st, sen);

    // DUT properties
    uscope_dut_property_t props[] = {
        {"dut_name", "core0"},
        {"cpu.isa", "RV64GC"},
        {"cpu.pipeline_stages", "fetch,decode,execute,writeback"},
    };

    // Open
    uscope_writer_t *w = uscope_writer_open("trace.uscope",
                                             props, 3, s, 100000);

    // Fetch instruction 0
    uscope_begin_cycle(w, 0);
    uscope_slot_set(w, ent, 0, 0, 0);          // entity_id
    uscope_slot_set(w, ent, 0, 1, 0x80000000); // pc
    uscope_slot_set(w, ent, 0, 2, 0x13);       // inst_bits
    uint8_t payload[5];
    uint32_t eid = 0;
    memcpy(payload, &eid, 4);
    payload[4] = 0; // fetch stage
    uscope_event(w, sev, payload, 5);
    uscope_end_cycle(w);

    // Decode
    uscope_begin_cycle(w, 1000);
    payload[4] = 1;
    uscope_event(w, sev, payload, 5);
    uscope_end_cycle(w);

    // Execute
    uscope_begin_cycle(w, 2000);
    payload[4] = 2;
    uscope_event(w, sev, payload, 5);
    uscope_end_cycle(w);

    // Writeback + retire
    uscope_begin_cycle(w, 3000);
    payload[4] = 3;
    uscope_event(w, sev, payload, 5);
    uscope_slot_clear(w, ent, 0);
    uscope_end_cycle(w);

    uscope_writer_close(w);
    return 0;
}

7. Integration with Simulators

SystemVerilog DPI

import "DPI-C" function chandle uscope_writer_open(
    input string path,
    /* ... */
);
import "DPI-C" function void uscope_begin_cycle(
    input chandle w, input longint unsigned time_ps
);
// ... etc

Verilator

Include uscope_dpi.c and lz4.c in the Verilator build:

verilator --cc top.sv --exe sim_main.cpp uscope_dpi.c lz4.c

Call the C API from sim_main.cpp, or from the SystemVerilog testbench through DPI-C imports like those shown above.

uscope-cli: Command-Line Trace Inspector

Binary: uscope-cli Location: crates/uscope-cli/

Overview

uscope-cli is a standalone command-line tool for inspecting µScope CPU pipeline traces. It provides quick access to trace metadata, buffer state, instruction timelines, and counter data without needing the Reflex GUI.

All commands support --json for structured JSON output, which makes the tool suitable for scripting and CI pipelines.

Installation

cargo install --path crates/uscope-cli
# or run directly:
cargo run --bin uscope-cli -- <command> <file>

Commands

`info` — File overview

uscope-cli info trace.uscope

Prints: file header (version, flags, segments, duration), metadata (DUT properties), pipeline stage names, counter names, buffer names, and full schema dump (storages, events, enums).

# JSON output for scripting
uscope-cli info trace.uscope --json | jq '.counters'

`state` — Buffer state at a cycle

uscope-cli state trace.uscope --cycle 50

Shows the state of all buffers at the given cycle: occupied slots with field values, entity fields (rbid, fpb_id, etc.), and storage properties (pointer positions).

# Check ROB state at cycle 100
uscope-cli state trace.uscope --cycle 100 --json | jq '.buffers[] | select(.name == "rob")'

`timeline` — Instruction lifecycle

uscope-cli timeline trace.uscope --entity 42

Shows the complete lifecycle of instruction entity 42: fetch cycle, all stage transitions with durations, annotations, and retire/flush status.

# Find when entity 42 was in the execute stage
uscope-cli timeline trace.uscope --entity 42 --json | jq '.stages[] | select(.name == "Ex")'

`counters` — Counter values

# Show final counter values
uscope-cli counters trace.uscope

# Per-cycle values over a range
uscope-cli counters trace.uscope --range 100:200

# Filter by counter name
uscope-cli counters trace.uscope --counter retired_insns --range 0:50

`buffers` — Buffer occupancy

uscope-cli buffers trace.uscope --cycle 50

Like state, but focused on buffer fill level, pointer-pair positions, and occupancy percentage. Use --buffer to filter by buffer name.

uscope-cli buffers trace.uscope --cycle 50 --buffer rob

Output Formats

Flag	Format	Use case
(default)	Human-readable aligned table	Interactive inspection
`--json`	Pretty-printed JSON	Scripting, piping to `jq`, CI

Examples

# Quick sanity check: does the trace have data?
uscope-cli info trace.uscope

# Debugging: what's in the ROB at cycle 50?
uscope-cli state trace.uscope --cycle 50

# Performance: final retired-instruction count (divide by the cycle count for IPC)
uscope-cli counters trace.uscope --counter retired_insns

# Entity debugging: what happened to instruction 17?
uscope-cli timeline trace.uscope --entity 17

# Scripting: extract all counter names
uscope-cli info trace.uscope --json | jq -r '.counters[]'

uscope-mcp: MCP Server for AI-Assisted Debugging

Binary: uscope-mcp Location: crates/uscope-mcp/

Overview

uscope-mcp is a Model Context Protocol (MCP) server that lets Claude inspect µScope CPU pipeline traces. It exposes the uscope-cpu query API as MCP tools, enabling natural-language performance debugging.

Quick Start

1. Start the server

cargo run --bin uscope-mcp -- --trace /path/to/trace.uscope

2. Configure Claude Code

Add to .claude/settings.json:

{
  "mcpServers": {
    "uscope": {
      "command": "cargo",
      "args": ["run", "--release", "--bin", "uscope-mcp", "--",
               "--trace", "/path/to/trace.uscope"],
      "cwd": "/path/to/uscope/repo"
    }
  }
}

Or with a pre-built binary:

{
  "mcpServers": {
    "uscope": {
      "command": "/path/to/uscope-mcp",
      "args": ["--trace", "/path/to/trace.uscope"]
    }
  }
}

3. Ask Claude

"What's the IPC between cycles 100 and 500?"

"Show me the pipeline stages for entity 42"

"Why is the ROB full at cycle 200?"

"What caused the pipeline stall at cycle 350?"

MCP Tools

`file_info`

Returns trace header, schema, segments, counters, buffers, and metadata.

Parameters: none

`state_at_cycle`

Returns buffer contents at a specific cycle — slot values, entity fields, and storage properties.

Parameters:

cycle (number, required): cycle number to query

`entity_timeline`

Returns the complete lifecycle of an instruction: stages with durations, disasm, annotations, retire/flush status.

Parameters:

entity_id (number, required): entity ID to trace

`counter_values`

Returns counter data over a cycle range with per-cycle values, deltas, and rates.

Parameters:

counter (string, required): counter name (e.g., "retired_insns")
start_cycle (number, required): range start
end_cycle (number, required): range end

`buffer_occupancy`

Returns buffer fill level at a cycle — occupied slots, pointer pair positions, fill percentage.

Parameters:

buffer (string, required): buffer name (e.g., "rob")
cycle (number, required): cycle to query

`analyze_performance`

Returns a structured performance summary over a cycle range:

Instruction counts (total, retired, flushed, in-flight)
IPC (instructions per cycle)
Flush rate
Per-counter totals and rates
Buffer occupancy snapshots at start/mid/end
Per-stage average latency, sorted by bottleneck

Parameters:

start_cycle (number, required): range start
end_cycle (number, required): range end

Protocol

The server implements the Model Context Protocol over stdio using JSON-RPC 2.0. It handles:

initialize — server capabilities and info
notifications/initialized — acknowledged silently
tools/list — returns tool definitions with JSON Schema
tools/call — dispatches to tool handlers

All tool responses are structured JSON, formatted for AI reasoning. Errors are returned as MCP tool errors (not JSON-RPC errors) so Claude can see error messages.

Logging goes to stderr (stdout is the MCP channel).
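A client invokes a tool by writing one JSON-RPC request per line to the server's stdin. The sketch below builds such a `tools/call` request as a string. The helper name `tools_call_request` is hypothetical; it performs no JSON escaping and assumes the tool name is a plain identifier and that `arguments_json` is already a valid JSON object.

```rust
/// Build one newline-terminated JSON-RPC 2.0 `tools/call` request.
fn tools_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    // Doubled braces are literal braces in a format string.
    let mut line = format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"{tool}","arguments":{arguments_json}}}}}"#
    );
    line.push('\n');
    line
}
```

Calling it with the tool name `counter_values` and the arguments object `{"counter":"retired_insns","start_cycle":100,"end_cycle":500}` yields a request for the counter data described above.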

konata2uscope

Binary: konata2uscope Location: crates/konata2uscope/

1. Overview

konata2uscope converts Konata (Kanata v0004) pipeline trace logs into µScope CPU protocol traces. This enables viewing Konata-format traces in µScope-compatible viewers with random-access seeking, mipmap summaries, and structured schema metadata.

2. Usage

konata2uscope <input.log[.gz]> -o <output.uscope> [options]

Option	Default	Description
`-o <path>`	`output.uscope`	Output file path
`--clock-period-ps <ps>`	`1000`	Clock period in picoseconds (1000 = 1 GHz)
`--dut-name <name>`	`core0`	DUT name for the trace

Gzip-compressed input (.log.gz) is detected automatically.

3. Two-Pass Architecture

Pass 1: Scan

Reads the entire Konata log to discover metadata:

All unique pipeline stage names (in first-occurrence order)
Maximum number of simultaneously in-flight instructions
Thread IDs
Total cycle count

This information is needed to construct the µScope schema before writing any trace data.
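The scan pass can be sketched as a single loop over the tab-separated log lines. This is an illustration under stated assumptions, not the converter's code: `ScanResult` and `scan` are hypothetical names, only lane-0 `S` commands contribute stage names, every `R` command (retire or flush) ends an in-flight instruction, and the cycle count is reported as the highest cycle number reached.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Default)]
struct ScanResult {
    stages: Vec<String>,    // first-occurrence order
    max_in_flight: usize,
    threads: BTreeSet<u64>,
    max_cycle: u64,
}

fn scan(log: &str) -> ScanResult {
    let mut result = ScanResult::default();
    let mut cycle = 0u64;
    let mut in_flight = 0usize;
    for line in log.lines() {
        let fields: Vec<&str> = line.split('\t').collect();
        match fields.as_slice() {
            // Absolute and relative cycle updates.
            ["C=", n] => cycle = n.parse().unwrap_or(cycle),
            ["C", n] => cycle += n.parse::<u64>().unwrap_or(0),
            // Instruction creation: one more in flight, remember its thread.
            ["I", _id, _gid, tid] => {
                in_flight += 1;
                result.max_in_flight = result.max_in_flight.max(in_flight);
                if let Ok(tid) = tid.parse::<u64>() {
                    result.threads.insert(tid);
                }
            }
            // Lane-0 stage start: record the name the first time it appears.
            ["S", _id, "0", stage] => {
                if !result.stages.iter().any(|s| s.as_str() == *stage) {
                    result.stages.push((*stage).to_string());
                }
            }
            // Retire or flush: the instruction leaves the pipeline.
            ["R", ..] => in_flight = in_flight.saturating_sub(1),
            _ => {}
        }
        result.max_cycle = result.max_cycle.max(cycle);
    }
    result
}
```

The stage list becomes the `pipeline_stage` enum and `max_in_flight` bounds the number of slots the entities storage needs, which is why this pass has to finish before the schema can be written.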

Pass 2: Emit

Re-reads the log and emits µScope data using the CPU protocol writer:

Entity allocation on instruction creation (I)
Stage transitions on stage start (S, lane 0)
Annotations on labels (L)
Retirement on retire commands (R, type 0)
Flushes on flush commands (R, type 1)
Dependencies on dependency arrows (W)

4. Konata Format Mapping

4.1 Commands

Konata	Description	µScope mapping
`C=\t<cycle>`	Set absolute cycle	Time base
`C\t<delta>`	Advance by delta cycles	Time base
`I\t<id>\t<gid>\t<tid>`	Create instruction	`DA_SLOT_SET` on entities
`L\t<id>\t0\t<text>`	Disassembly label	`annotate` event; PC extraction
`L\t<id>\t1\t<text>`	Detail label	`annotate` event
`S\t<id>\t0\t<stage>`	Start stage (lane 0)	`stage_transition` event
`S\t<id>\t1+\t<stage>`	Start stall overlay	`annotate` event
`E\t<id>\t<lane>\t<stage>`	End stage	(implicit in µScope)
`R\t<id>\t<rid>\t0`	Retire	`DA_SLOT_CLEAR` + counter
`R\t<id>\t<rid>\t1`	Flush	`flush` event + `DA_SLOT_CLEAR`
`W\t<cons>\t<prod>\t<type>`	Dependency	`dependency` event
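The mapping table can be read as a classifier from one Konata line to one converter action. The sketch below encodes the rows above in a hypothetical `Action` enum; the names are illustrative and the real converter's internal types are not shown in this document.

```rust
#[derive(Debug, PartialEq)]
enum Action<'a> {
    SetCycle(u64),
    AdvanceCycle(u64),
    SlotSet { id: u64 },                             // DA_SLOT_SET on entities
    Annotate { id: u64, text: &'a str },             // annotate event
    StageTransition { id: u64, stage: &'a str },     // stage_transition event
    Retire { id: u64 },                              // DA_SLOT_CLEAR + counter
    Flush { id: u64 },                               // flush event + DA_SLOT_CLEAR
    Dependency { consumer: u64, producer: u64 },     // dependency event
    Ignored,                                         // E lines, headers, malformed input
}

fn num(s: &str) -> Option<u64> {
    s.parse().ok()
}

fn map_line(line: &str) -> Action<'_> {
    let fields: Vec<&str> = line.split('\t').collect();
    let action = match fields.as_slice() {
        ["C=", n] => num(n).map(Action::SetCycle),
        ["C", n] => num(n).map(Action::AdvanceCycle),
        ["I", id, _gid, _tid] => num(id).map(|id| Action::SlotSet { id }),
        ["L", id, _kind, text] => num(id).map(|id| Action::Annotate { id, text: *text }),
        // Lane 0 drives the pipeline stage; other lanes are stall overlays.
        ["S", id, "0", stage] => num(id).map(|id| Action::StageTransition { id, stage: *stage }),
        ["S", id, _lane, stage] => num(id).map(|id| Action::Annotate { id, text: *stage }),
        ["R", id, _rid, "0"] => num(id).map(|id| Action::Retire { id }),
        ["R", id, _rid, "1"] => num(id).map(|id| Action::Flush { id }),
        ["W", consumer, producer, _kind] => num(consumer)
            .zip(num(producer))
            .map(|(consumer, producer)| Action::Dependency { consumer, producer }),
        _ => None,
    };
    action.unwrap_or(Action::Ignored)
}
```

`E` lines fall through to `Ignored` because a stage ends implicitly when the next `stage_transition` for the same entity begins.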

4.2 PC Extraction

If a disassembly label (L type 0) starts with a hex address, it is extracted as the instruction PC. Supported formats:

80000000 addi x0, x0, 0 → PC = 0x80000000
0x80000000 addi x0, x0, 0 → PC = 0x80000000
00001000: jal zero, 0x10 → PC = 0x00001000

If no hex address is found, PC defaults to 0.
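A minimal sketch of this extraction follows. `extract_pc` is a hypothetical helper, and the acceptance rule is an assumption: the first whitespace-delimited token counts as an address only if it has a `0x` prefix, a trailing colon, or at least 8 hex digits, so that a mnemonic such as `add` (which happens to consist of hex digits) is not mistaken for a PC.

```rust
/// Extract the PC from a disassembly label, or return 0 if none is found.
fn extract_pc(label: &str) -> u64 {
    let token = label.split_whitespace().next().unwrap_or("");
    // "00001000:" style: drop the trailing colon.
    let (token, has_colon) = match token.strip_suffix(':') {
        Some(rest) => (rest, true),
        None => (token, false),
    };
    // "0x80000000" style: drop the prefix.
    let (digits, has_prefix) = match token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) {
        Some(rest) => (rest, true),
        None => (token, false),
    };
    let all_hex = !digits.is_empty() && digits.chars().all(|c| c.is_ascii_hexdigit());
    let looks_like_address = has_prefix || has_colon || digits.len() >= 8;
    if all_hex && looks_like_address {
        // Values wider than 64 bits fail to parse and fall back to 0.
        u64::from_str_radix(digits, 16).unwrap_or(0)
    } else {
        0
    }
}
```

All three supported formats listed above resolve to the expected PC, while a label that starts directly with a mnemonic yields 0.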

Example input (trace.log)

Kanata	0004
C=	0
I	0	0	0
L	0	0	80000000 addi x0, x0, 0
S	0	0	Fetch
C	1
E	0	0	Fetch
S	0	0	Decode
C	1
E	0	0	Decode
S	0	0	Execute
C	1
E	0	0	Execute
S	0	0	Writeback
R	0	0	0

Conversion

$ konata2uscope trace.log -o trace.uscope --clock-period-ps 200
Pass 1: scanning trace.log...
  4 stages: [Fetch, Decode, Execute, Writeback]
  max in-flight: 1
  threads: 1
  total cycles: 3
Pass 2: emitting trace.uscope...
Done.

Resulting Schema

Clock: core_clk @ 200 ps (5 GHz)
Enum: pipeline_stage = {Fetch, Decode, Execute, Writeback}
Storage: entities (1 slot, sparse)
Events: stage_transition, annotate, dependency, flush, stall
DUT: cpu.pipeline_stages = "Fetch,Decode,Execute,Writeback"

The output file is a standard µScope trace readable by the Rust Reader.