µScope Trace Format Specification

Version: 0.3-draft
Magic: uSCP (0x75 0x53 0x43 0x50)
Byte order: Little-endian (all multi-byte integers throughout the file, including field values in event payloads, checkpoint slot data, and summary entries)
Alignment: All section offsets are 8-byte aligned


1. Overview

µScope is a binary trace format for cycle-accurate hardware introspection.

1.1 Layered Architecture

µScope is structured as two distinct layers:

flowchart TD
  proto["<b>Protocol Layer</b><br />Defines semantic meaning for a specific DUT type.<br />Contains reconstruction logic, decoders, and visualization rules.<br /><i>NOT part of this specification.</i>"]
  transport["<b>Transport Layer</b><br />The file format (this document). Knows about:<br />Storages · Events · Checkpoints + deltas · Summaries<br />Knows NOTHING about CPUs, pipelines, caches,<br />entities, counters, annotations, or any specific hardware."]
  proto --- transport

The protocol layer defines semantics, decoders, and visualization. The transport layer defines the binary format and read/write APIs.

1.2 Core Primitives

Primitive  What it models
─────────  ─────────────────────────────────────────────
Storage    A named array of typed slots
Event      A timestamped occurrence with a typed payload

All primitives are schema-defined. The transport layer imposes no assumptions about their fields, types, or semantics.

Everything else — entities, counters, annotations, dependencies, markers — is modeled using these two primitives and interpreted by the protocol layer.

1.3 String Representation

All human-readable strings in the format (field names, enum labels, DUT properties, etc.) are stored in a single string pool. Structures reference strings by uint16_t offset into the pool (max 64 KB, sufficient for any realistic schema).

The string pool consists of null-terminated UTF-8 strings packed sequentially, and is stored at the end of the schema chunk payload. Both the DUT descriptor and the schema definitions reference it.
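
A reader resolving a string reference has to bounds-check the offset before treating it as a C string. The following is a minimal sketch of that lookup. The helper name pool_str is hypothetical (it is not part of this specification), and the sketch assumes the caller has already located the pool and knows its length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: resolve a uint16_t string-pool offset to a C string.
 * Returns NULL if the offset is out of range or the string is not
 * null-terminated inside the pool. */
const char *pool_str(const uint8_t *pool, size_t pool_len, uint16_t off)
{
    assert(pool != NULL);
    if (off >= pool_len)
        return NULL;
    /* A valid entry must have its terminator before the end of the pool. */
    if (memchr(pool + off, 0, pool_len - off) == NULL)
        return NULL;
    return (const char *)(pool + off);
}
```

Rejecting unterminated entries keeps a truncated or corrupt pool from causing reads past the end of the schema chunk.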

The optional string table section (§7) stores runtime strings referenced by FT_STRING_REF fields in delta data. An FT_STRING_REF value is a 0-based index into the string table's entries array.

1.4 File Layout

The file has two regions: a fixed preamble written at trace creation, and an append region that grows during simulation.

File Header (48 B)
Preamble Chunks
    CHUNK_DUT_DESC
    CHUNK_SCHEMA
    CHUNK_TRACE_CONFIG
    CHUNK_END
Segment 0
    Segment Header (56 B)
    Checkpoint
    Deltas
Segment 1
    Segment Header (56 B)
    Checkpoint
    Deltas
...
String Table
Summary Section
Segment Table
Section Table

During simulation, only segments are appended. At close, finalization data is written and the file header is rewritten with final values.

1.5 Access Patterns

µScope supports three access patterns:

Pattern          When                           Mechanism
───────────────  ─────────────────────────────  ───────────────────────────────────
Streaming write  During simulation              Append segments, update tail_offset
Live read        While writer is still running  Follow tail_offset → prev chain
Random access    After finalization             Binary search segment table

See §8 for details.

1.6 Time Model

All timestamps in µScope are in picoseconds (ps). This provides a universal time axis that accommodates multiple clock domains: many common clock periods are an exact integer number of picoseconds (e.g., 5 GHz → 200 ps, 800 MHz → 1250 ps) and convert without loss. A period that is not (e.g., 3 GHz ≈ 333.3 ps) has to be rounded to a whole number of picoseconds, because period_ps (§4.9) is an integer.

Cycle-frame deltas, segment boundaries, and summary buckets all use picosecond timestamps. The schema defines clock domains (§4.9), each with a name and period. Scopes are assigned to clock domains so the viewer can display domain-local cycle numbers (by dividing timestamps by the clock period).

Writers emit cycle-frame deltas equal to the clock period of the active domain (e.g., 200 for a 5 GHz clock). LEB128 encoding keeps this compact (1–2 bytes), and segment-level compression handles the repeating patterns efficiently.


2. File Header

Offset 0. Fixed size: 48 bytes.

typedef struct {
    uint8_t  magic[4];              // "uSCP" = {0x75, 0x53, 0x43, 0x50}
    uint16_t version_major;         // 0
    uint16_t version_minor;         // 3 (format version 0.3)
    uint64_t flags;                 // §2.1
    uint64_t total_time_ps;         // total trace duration in picoseconds (0 until finalized)
    uint32_t num_segments;          // updated after each segment flush
    uint32_t preamble_end;          // file offset where segments begin
    uint64_t section_table_offset;  // 0 until finalized
    uint64_t tail_offset;           // file offset of last segment header (0 = none)
} file_header_t;                    // 48 bytes

The preamble (§2.3) immediately follows the header at offset 48 and extends to preamble_end. Readers scan preamble chunks to locate the DUT descriptor, schema, and trace configuration.
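
Because every integer is little-endian regardless of host, a portable reader may prefer to decode the header byte by byte rather than casting the buffer to file_header_t. The sketch below does that using the field offsets implied by the struct above. parsed_header_t, load_le and parse_file_header are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decoded copy of file_header_t (hypothetical in-memory type). */
typedef struct {
    uint16_t version_major, version_minor;
    uint64_t flags, total_time_ps;
    uint32_t num_segments, preamble_end;
    uint64_t section_table_offset, tail_offset;
} parsed_header_t;

/* Load an n-byte little-endian unsigned integer (n <= 8). */
uint64_t load_le(const uint8_t *p, unsigned n)
{
    uint64_t v = 0;
    assert(n <= 8);
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Decode the 48-byte header. Returns 0 on success, -1 on bad magic. */
int parse_file_header(const uint8_t *buf, parsed_header_t *h)
{
    static const uint8_t magic[4] = { 0x75, 0x53, 0x43, 0x50 }; /* "uSCP" */
    if (memcmp(buf, magic, sizeof magic) != 0)
        return -1;
    h->version_major        = (uint16_t)load_le(buf + 4, 2);
    h->version_minor        = (uint16_t)load_le(buf + 6, 2);
    h->flags                = load_le(buf + 8, 8);
    h->total_time_ps        = load_le(buf + 16, 8);
    h->num_segments         = (uint32_t)load_le(buf + 24, 4);
    h->preamble_end         = (uint32_t)load_le(buf + 28, 4);
    h->section_table_offset = load_le(buf + 32, 8);
    h->tail_offset          = load_le(buf + 40, 8);
    return 0;
}
```

Note that tail_offset sits at byte offset 40 and num_segments at 24, so both are naturally aligned within the file.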

2.1 Flags

Bit   Name                  Description
────  ────────────────────  ──────────────────────────────────────────────
0     F_COMPLETE            Trace was cleanly finalized
1     F_COMPRESSED          Delta segments use compression
2     F_HAS_STRINGS         String table section present
3-5   F_COMP_METHOD         Compression method (0=LZ4, 1=ZSTD; 2–7 reserved, must not be used). LZ4 support is mandatory for all readers; ZSTD is optional.
6     F_COMPACT_DELTAS      Delta blobs may contain compact ops (§8.6.3). Ignored when F_INTERLEAVED_DELTAS is set.
7     F_INTERLEAVED_DELTAS  v0.2 interleaved frame format (§8.6.6). Ops and events use self-describing tags.
8-63  Reserved              Must be zero
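
The flag layout above can be expressed as masks, with F_COMP_METHOD extracted as a 3-bit field. This is a sketch only: the macro spellings with _SHIFT and _MASK suffixes, and the helper names, are hypothetical. Rejecting non-zero reserved bits in flags_valid is an assumed reader policy; the table only states that writers must leave them zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F_COMPLETE           (UINT64_C(1) << 0)
#define F_COMPRESSED         (UINT64_C(1) << 1)
#define F_HAS_STRINGS        (UINT64_C(1) << 2)
#define F_COMP_METHOD_SHIFT  3
#define F_COMP_METHOD_MASK   (UINT64_C(0x7) << F_COMP_METHOD_SHIFT)
#define F_COMPACT_DELTAS     (UINT64_C(1) << 6)
#define F_INTERLEAVED_DELTAS (UINT64_C(1) << 7)
#define F_KNOWN_MASK         UINT64_C(0xFF)   /* bits 0-7 are defined */

static_assert((F_COMP_METHOD_MASK >> F_COMP_METHOD_SHIFT) == 0x7,
              "F_COMP_METHOD is a 3-bit field");

enum { COMP_LZ4 = 0, COMP_ZSTD = 1 };

/* Extract the compression method from bits 3-5. */
unsigned comp_method(uint64_t flags)
{
    return (unsigned)((flags & F_COMP_METHOD_MASK) >> F_COMP_METHOD_SHIFT);
}

/* Assumed policy: reserved bits must be zero, and a compressed file must
 * name a known compression method. */
bool flags_valid(uint64_t flags)
{
    if (flags & ~F_KNOWN_MASK)
        return false;
    if ((flags & F_COMPRESSED) && comp_method(flags) > COMP_ZSTD)
        return false;
    return true;
}

/* F_COMPACT_DELTAS only matters when the interleaved format is not in use. */
bool compact_flag_applies(uint64_t flags)
{
    return (flags & F_COMPACT_DELTAS) && !(flags & F_INTERLEAVED_DELTAS);
}
```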

2.2 Header Lifecycle

Field                 At open           After each segment     At close
────────────────────  ────────────────  ─────────────────────  ──────────────────────
magic                 uSCP              -                      -
flags                 F_COMPRESSED etc  -                      F_COMPLETE set
total_time_ps         0                 -                      final value
num_segments          0                 incremented            final value
preamble_end          final value       -                      -
section_table_offset  0                 -                      final offset
tail_offset           0                 offset of new segment  offset of last segment

After fully writing a segment, the writer commits it in this order:

  1. Write segment data (header + checkpoint + deltas) at EOF
  2. Memory barrier / fsync
  3. Write tail_offset (single naturally-aligned 8-byte write — the commit point)
  4. Write num_segments (single naturally-aligned 4-byte write — advisory)

A live reader uses tail_offset as the sole authoritative indicator of new data. num_segments may lag by one during live reads.

2.3 Preamble Chunks

The preamble immediately follows the file header (offset 48) and consists of a sequence of typed chunks. Each chunk has an 8-byte header:

typedef struct {
    uint16_t type;                  // chunk type
    uint16_t flags;                 // must be 0 (reserved for future use)
    uint32_t size;                  // payload size in bytes
    // uint8_t payload[size];
    // padding to 8-byte alignment (0-filled)
} preamble_chunk_t;                 // 8 bytes + payload + padding

Chunk payloads are padded to 8-byte alignment. The next chunk starts at offset 8 + align8(size) from the current chunk header.

enum preamble_chunk_type : uint16_t {
    CHUNK_END          = 0x0000,    // terminates the preamble
    CHUNK_DUT_DESC     = 0x0001,    // DUT descriptor (§3)
    CHUNK_SCHEMA       = 0x0002,    // schema definition (§4)
    CHUNK_TRACE_CONFIG = 0x0003,    // trace session parameters (§2.4)
    // future: CHUNK_ELF, CHUNK_SOURCE_MAP, ...
};

Mandatory chunks: A valid file must contain exactly one each of CHUNK_DUT_DESC, CHUNK_SCHEMA, and CHUNK_TRACE_CONFIG. Readers must reject files missing any of these.

Unknown chunks: Readers must skip chunk types they do not recognize (advance by 8 + align8(size) bytes). This allows older readers to open files written by newer writers that add new chunk types.

Ordering: Writers should emit chunks in the order DUT → Schema → Trace Config, but readers must not depend on ordering.
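
The chunk-walking rules above (advance by 8 + align8(size), skip unknown types, require exactly one of each mandatory chunk, do not depend on order) can be sketched as a single scan over an in-memory copy of the preamble. The names preamble_index_t, rd32 and scan_preamble are hypothetical. The sketch assumes buf starts at file offset 48 and covers the bytes up to preamble_end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CHUNK_END = 0, CHUNK_DUT_DESC = 1, CHUNK_SCHEMA = 2, CHUNK_TRACE_CONFIG = 3 };

/* Where each known chunk's payload lives, indexed by chunk type (hypothetical). */
typedef struct {
    size_t   payload_off[4];
    uint32_t payload_size[4];
    unsigned count[4];
} preamble_index_t;

uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 on a truncated preamble or when a mandatory
 * chunk is missing or duplicated. */
int scan_preamble(const uint8_t *buf, size_t len, preamble_index_t *out)
{
    size_t pos = 0;
    assert(buf != NULL && out != NULL);
    *out = (preamble_index_t){0};
    for (;;) {
        if (len - pos < 8)
            return -1;                       /* no room for a chunk header */
        uint16_t type = (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
        uint32_t size = rd32(buf + pos + 4);
        if (type == CHUNK_END)
            break;
        uint64_t padded = ((uint64_t)size + 7) & ~UINT64_C(7);  /* align8 */
        if (padded > len - pos - 8)
            return -1;                       /* payload runs past the buffer */
        if (type <= CHUNK_TRACE_CONFIG) {    /* known type: record it */
            out->payload_off[type]  = pos + 8;
            out->payload_size[type] = size;
            out->count[type]++;
        }                                    /* unknown type: just skip */
        pos += 8 + (size_t)padded;
    }
    for (int t = CHUNK_DUT_DESC; t <= CHUNK_TRACE_CONFIG; t++)
        if (out->count[t] != 1)
            return -1;
    return 0;
}
```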

2.4 Trace Configuration Chunk

Session-level parameters that govern how the trace was captured.

typedef struct {
    uint64_t checkpoint_interval_ps; // picoseconds between checkpoints
} trace_config_t;                    // 8 bytes (CHUNK_TRACE_CONFIG payload)

3. DUT Descriptor

CHUNK_DUT_DESC payload. Identifies what is being traced.

typedef struct {
    uint16_t num_properties;
    uint16_t reserved;              // must be 0
    // dut_property_t properties[num_properties];
} dut_desc_t;                       // 4 bytes + properties

3.1 DUT Properties

typedef struct {
    uint16_t key;                   // offset into string pool
    uint16_t value;                 // offset into string pool
} dut_property_t;                   // 4 bytes

Properties are opaque key-value pairs. The transport layer does not interpret them — only the protocol layer does. Protocol version, vendor, DUT name, and any domain-specific metadata are all properties.

Example properties for an OoO CPU (protocol-specific keys use a prefix to avoid collisions in multi-protocol traces):

Key                   Value
────────────────────  ───────────────
dut_name              boom_core_0
cpu.vendor            acme
cpu.protocol_version  0.1
cpu.isa               RV64IMAFDCV
cpu.pipeline_depth    12
cpu.elf_path          /path/to/fw.elf

4. Schema

CHUNK_SCHEMA payload. The schema defines the structure of all data in the trace. Written once at trace creation, immutable thereafter. Fully self-describing — a viewer can parse and display data without a protocol plugin.

4.1 Schema Header

typedef struct {
    uint8_t  num_enums;             // max 255 enum types
    uint8_t  num_clock_domains;     // max 255 clock domains
    uint16_t num_scopes;
    uint16_t num_storages;
    uint16_t num_event_types;
    uint16_t num_summary_fields;
    uint16_t string_pool_offset;    // offset from schema start to string pool
    // Followed by, in order:
    //   clock_domain_def_t      clocks[num_clock_domains]
    //   scope_def_t             scopes[num_scopes]
    //   enum_def_t              enums[num_enums]              (variable-size)
    //   storage_def_t           storages[num_storages]        (variable-size)
    //   event_def_t             event_types[num_event_types]  (variable-size)
    //   summary_field_def_t     summary_fields[num_summary_fields]
    //   <string pool>
} schema_header_t;                  // 12 bytes

4.2 Field Types

enum field_type : uint8_t {
    FT_U8           = 0x01,
    FT_U16          = 0x02,
    FT_U32          = 0x03,
    FT_U64          = 0x04,
    FT_I8           = 0x05,
    FT_I16          = 0x06,
    FT_I32          = 0x07,
    FT_I64          = 0x08,
    FT_BOOL         = 0x09,         // 1 byte
    FT_STRING_REF   = 0x0A,         // uint32_t index into string table entries[]
    FT_ENUM         = 0x0B,         // uint8_t index into a named enum
};

4.3 Field Definition

typedef struct {
    uint16_t name;                  // offset into string pool
    uint8_t  type;                  // field_type (size derived from type)
    uint8_t  enum_id;               // if type==FT_ENUM, else 0
    uint8_t  reserved[4];
} field_def_t;                      // 8 bytes

Field size is derived from the type:

Type                            Size (bytes)
──────────────────────────────  ────────────
FT_U8, FT_I8, FT_BOOL, FT_ENUM  1
FT_U16, FT_I16                  2
FT_U32, FT_I32, FT_STRING_REF   4
FT_U64, FT_I64                  8
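
Since field_def_t carries no explicit size, a reader derives it from the type code and sums the sizes to get a slot or payload size. A minimal sketch of both steps follows; field_type_size and packed_size are hypothetical helper names, and returning 0 for an unknown type code is an illustrative choice rather than behaviour mandated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum field_type {
    FT_U8 = 0x01, FT_U16 = 0x02, FT_U32 = 0x03, FT_U64 = 0x04,
    FT_I8 = 0x05, FT_I16 = 0x06, FT_I32 = 0x07, FT_I64 = 0x08,
    FT_BOOL = 0x09, FT_STRING_REF = 0x0A, FT_ENUM = 0x0B
};

/* Size in bytes of one field value, or 0 for an unknown type code. */
uint8_t field_type_size(uint8_t type)
{
    switch (type) {
    case FT_U8:  case FT_I8:  case FT_BOOL: case FT_ENUM: return 1;
    case FT_U16: case FT_I16:                             return 2;
    case FT_U32: case FT_I32: case FT_STRING_REF:         return 4;
    case FT_U64: case FT_I64:                             return 8;
    default:                                              return 0;
    }
}

/* Sum of the field sizes for a list of type codes (fields are packed with
 * no padding). Returns 0 if the list is empty or any type is unknown. */
uint32_t packed_size(const uint8_t *types, size_t n)
{
    uint32_t total = 0;
    assert(types != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint8_t sz = field_type_size(types[i]);
        if (sz == 0)
            return 0;
        total += sz;
    }
    return total;
}
```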

4.4 Scope Definition

Scopes define a hierarchical tree for organizing storages and events. The schema must contain at least one scope: scope 0 is the root scope (conventionally named /).

typedef struct {
    uint16_t name;              // offset into string pool
    uint16_t scope_id;          // 0-based; scope 0 = root
    uint16_t parent_id;         // parent scope_id, 0xFFFF = root (only valid for scope 0)
    uint16_t protocol;          // offset into string pool, 0xFFFF = no protocol
    uint8_t  clock_id;          // clock domain index (§4.9), 0xFF = inherit from parent
    uint8_t  reserved[3];
} scope_def_t;                  // 12 bytes

Each scope optionally declares a protocol — a string identifying which protocol layer applies (e.g., "cpu", "dma", "noc"). The viewer uses the protocol to select the appropriate plugin for that subtree. Scopes with protocol = 0xFFFF have no protocol and are rendered generically.

There is no protocol inheritance. Each scope that needs a protocol must declare it explicitly. The root scope typically has no protocol.

Protocol identifiers: Vendor-specific protocols use a dotted prefix, for example axelera.loom_core. The protocol name generic (or no protocol) means the viewer renders raw schema data without interpretation.

4.5 Enum Definition

typedef struct {
    uint16_t name;                  // offset into string pool
    uint8_t  num_values;
    uint8_t  reserved;
    // enum_value_t values[num_values];
} enum_def_t;                       // 4 bytes + values

typedef struct {
    uint8_t  value;                 // numeric value
    uint8_t  reserved;
    uint16_t name;                  // offset into string pool
} enum_value_t;                     // 4 bytes

4.6 Storage Definition

typedef struct {
    uint16_t name;                  // offset into string pool
    uint16_t storage_id;            // 0-based
    uint16_t num_slots;
    uint16_t num_fields;
    uint16_t flags;                 // §4.6.1
    uint16_t scope_id;              // owning scope, 0xFFFF = root-level
    uint16_t num_properties;        // v0.3: number of storage-level properties
    uint16_t reserved;              // v0.3: must be 0
    // field_def_t fields[num_fields];
    // field_def_t properties[num_properties];   // v0.3
} storage_def_t;                    // 16 bytes + fields + properties

4.6.1 Storage Flags

Bit   Name       Description
────  ─────────  ──────────────────────────────────────────────
0     SF_SPARSE  Checkpoints store only valid entries + bitmask
1     SF_BUFFER  Buffer storage: sparse storage used as a named buffer (e.g., ROB, issue queue). The protocol layer uses this flag to detect buffer storages for dedicated visualization.
2-15  Reserved

For SF_SPARSE storages, slot validity is tracked by the transport:

  • DA_SLOT_SET on any field of an invalid slot implicitly marks it valid.
  • DA_SLOT_CLEAR marks a slot invalid.

Non-sparse storages have all slots always valid.

4.6.2 Storage Properties (v0.3)

Storage properties are named, typed scalar values attached to a storage (not per-slot). They are checkpointed and updated via DA_PROP_SET deltas. Use cases include buffer pointers (retire_ptr, allocate_ptr) and other storage-level metadata that changes each cycle.

Properties are defined in the schema as field_def_t entries appended after the slot field definitions. Each property has a name, type, and optional enum_id, following the same rules as slot fields.

4.7 Event Definition

typedef struct {
    uint16_t name;                  // offset into string pool
    uint16_t event_type_id;         // 0-based
    uint16_t num_fields;
    uint16_t scope_id;              // owning scope, 0xFFFF = root-level
    // field_def_t fields[num_fields];
} event_def_t;                      // 8 bytes + fields

4.8 Summary Field Definition

typedef struct {
    uint16_t name;                  // offset into string pool
    uint8_t  type;                  // field_type (size derived from type, see §4.3)
    uint8_t  reserved;
    uint16_t scope_id;              // owning scope (same field as in storage/event defs)
    uint16_t reserved2;
} summary_field_def_t;              // 8 bytes

Summary fields are scoped: in a multi-scope trace, each scope has its own set of summary fields. Fields with the same name in different scopes are independent (e.g., core0/committed vs. core1/committed).

Summary fields are opaque to the transport. The writer computes and writes values; the transport stores and retrieves them. What each field means (counter rate, storage occupancy, event frequency) is the protocol layer's concern.

4.9 Clock Domain Definition

typedef struct {
    uint16_t name;                  // offset into string pool (e.g., "core_clk")
    uint16_t clock_id;              // 0-based
    uint32_t period_ps;             // clock period in picoseconds (0 = unknown)
} clock_domain_def_t;               // 8 bytes

Each clock domain defines a named clock with a period in picoseconds. Scopes reference clock domains via clock_id in scope_def_t (§4.4).

The viewer uses period_ps to convert picosecond timestamps to domain-local cycle numbers for display (cycle = timestamp / period_ps).
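
A viewer that wants a domain-local cycle number first has to resolve which clock a scope uses (following the 0xFF "inherit from parent" rule of §4.4) and then divide. The sketch below illustrates both steps. scope_t is a hypothetical two-field subset of scope_def_t, and the function names are not part of this specification. Treating period_ps = 0 as "no cycle number available" is an assumption based on the "0 = unknown" comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of scope_def_t, indexed by scope_id. */
typedef struct {
    uint16_t parent_id;   /* 0xFFFF for the root scope */
    uint8_t  clock_id;    /* 0xFF = inherit from parent */
} scope_t;

/* Walk parent links until a scope names a clock domain.
 * Returns 0xFF if no ancestor declares one. */
uint8_t resolve_clock(const scope_t *scopes, uint16_t num_scopes, uint16_t scope_id)
{
    /* The hop limit guards against malformed parent cycles. */
    for (uint32_t hops = 0; hops <= num_scopes; hops++) {
        if (scope_id >= num_scopes)
            return 0xFF;                  /* walked past the root */
        if (scopes[scope_id].clock_id != 0xFF)
            return scopes[scope_id].clock_id;
        scope_id = scopes[scope_id].parent_id;
    }
    return 0xFF;
}

/* cycle = timestamp / period_ps; fails when the period is unknown (0). */
bool ps_to_cycle(uint64_t time_ps, uint32_t period_ps, uint64_t *cycle)
{
    assert(cycle != NULL);
    if (period_ps == 0)
        return false;
    *cycle = time_ps / period_ps;
    return true;
}
```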

A trace must define at least one clock domain. If the DUT has a single clock, one domain suffices. Multi-clock SoCs define one domain per distinct clock frequency.

Example          period_ps  Frequency
───────────────  ─────────  ─────────
core_clk         200        5.0 GHz
bus_clk          1000       1.0 GHz
mem_clk          1250       800 MHz
slow_periph_clk  30000      33.3 MHz

5. Section Table

Written at finalization only (when F_COMPLETE is set).

enum section_type : uint16_t {
    SECTION_END              = 0x0000,
    SECTION_SUMMARY          = 0x0001,
    SECTION_STRINGS          = 0x0002,
    SECTION_SEGMENTS         = 0x0003,
    SECTION_COUNTER_SUMMARY  = 0x0010,  // trace summary (counter mipmaps + instruction density)
};

typedef struct {
    uint16_t type;
    uint16_t flags;
    uint32_t reserved;
    uint64_t offset;
    uint64_t size;
} section_entry_t;                  // 24 bytes

The table is terminated by a SECTION_END entry.

For incomplete files (F_COMPLETE not set), section_table_offset is 0 and the section table does not exist. Readers must use the segment chain (§8.2) to discover segments.


6. Trace Summary Section (TSUM)

Written at finalization into a SECTION_COUNTER_SUMMARY section. Contains instruction density mipmaps and per-counter mipmaps in a self-contained blob.

6.1 TSUM Wire Format

Offset  Size   Field
──────  ─────  ──────────────────────────────────
0       4      magic: b"TSUM" (0x54 0x53 0x55 0x4D)
4       4      base_interval_cycles (u32 LE)
8       4      fan_out (u32 LE)
12      8      total_instructions (u64 LE)
                                                    ─── 20 bytes fixed header ───

20      4      num_density_levels (u32 LE)
        ...    For each density level:
                 4 bytes: num_entries (u32 LE)
                 num_entries × 4 bytes: instruction counts (u32 LE each)

        4      num_counters (u32 LE)
        ...    For each counter:
                 4 bytes: name_len (u32 LE)
                 name_len bytes: name (UTF-8, not null-terminated)
                 2 bytes: storage_id (u16 LE)
                 4 bytes: num_levels (u32 LE)
                 For each level:
                   4 bytes: num_entries (u32 LE)
                   num_entries × 24 bytes: mipmap entries

6.2 Mipmap Entry

Each mipmap entry is 24 bytes:

Offset  Size  Field      Description
──────  ────  ─────────  ─────────────────────────────────────
0       8     min_delta  Minimum per-cycle delta in this bucket
8       8     max_delta  Maximum per-cycle delta in this bucket
16      8     sum        Total delta accumulated in this bucket

6.3 Backward Compatibility (CSUM)

Readers must also accept the legacy CSUM magic (b"CSUM", 0x43 0x53 0x55 0x4D) which predates instruction density support. The CSUM layout is:

0       4      magic: b"CSUM"
4       4      base_interval_cycles (u32 LE)
8       4      fan_out (u32 LE)
12      4      num_counters (u32 LE)
        ...    Counter mipmaps (same format as TSUM)

When reading CSUM, set total_instructions = 0 and instruction_density = [].
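
A reader can handle both magics with one entry point that decodes the fixed fields and reports where the variable-length body starts. The following is a sketch under that reading of the two layouts. summary_header_t and parse_summary_header are hypothetical names, and body_off is an illustrative field that points at num_density_levels for TSUM and at num_counters for CSUM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t base_interval_cycles;
    uint32_t fan_out;
    uint64_t total_instructions;  /* 0 for legacy CSUM */
    bool     has_density;         /* false for legacy CSUM */
    size_t   body_off;            /* offset of the first variable-length field */
} summary_header_t;

static uint64_t rd_le(const uint8_t *p, unsigned n)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns 0 on success, -1 on unknown magic or a truncated header. */
int parse_summary_header(const uint8_t *p, size_t len, summary_header_t *h)
{
    assert(p != NULL && h != NULL);
    if (len < 12)
        return -1;
    bool tsum = memcmp(p, "TSUM", 4) == 0;
    if (!tsum && memcmp(p, "CSUM", 4) != 0)
        return -1;
    h->base_interval_cycles = (uint32_t)rd_le(p + 4, 4);
    h->fan_out              = (uint32_t)rd_le(p + 8, 4);
    h->has_density          = tsum;
    if (tsum) {
        if (len < 20)
            return -1;
        h->total_instructions = rd_le(p + 12, 8);
        h->body_off = 20;         /* num_density_levels follows */
    } else {
        h->total_instructions = 0;
        h->body_off = 12;         /* num_counters follows */
    }
    return 0;
}
```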


7. String Table (Optional)

For runtime strings referenced by FT_STRING_REF fields in storage slots or event payloads. Written at finalization.

typedef struct {
    uint32_t num_entries;
    uint32_t reserved;
    // string_index_t entries[num_entries];
    // followed by packed null-terminated string data
} string_table_header_t;

typedef struct {
    uint32_t offset;                // byte offset into string data (relative to end of entries array)
    uint32_t length;                // string length in bytes (excluding null terminator)
} string_index_t;                   // 8 bytes

An FT_STRING_REF field value is a 0-based index into the entries[] array. The reader looks up entries[value] to get the offset and length of the string data. Writers assign sequential indices starting from 0.
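
The lookup described above can be sketched as follows, operating on an in-memory copy of the string table section. string_table_get is a hypothetical helper name. The sketch assumes the section starts with string_table_header_t and that the string data begins immediately after the entries array, as the offset comment in string_index_t states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Resolve an FT_STRING_REF value. On success returns a pointer to the
 * null-terminated string and stores its length; returns NULL if the index
 * or the entry is out of bounds. */
const char *string_table_get(const uint8_t *sec, size_t sec_len,
                             uint32_t index, uint32_t *out_len)
{
    assert(sec != NULL && out_len != NULL);
    if (sec_len < 8)
        return NULL;
    uint32_t n = rd32(sec);                        /* num_entries */
    if (index >= n)
        return NULL;
    uint64_t data_start = 8 + (uint64_t)n * 8;     /* end of entries[] */
    if (data_start > sec_len)
        return NULL;
    const uint8_t *e = sec + 8 + (size_t)index * 8;
    uint32_t off = rd32(e), slen = rd32(e + 4);
    /* The string and its terminator must fit inside the data area. */
    if ((uint64_t)off + slen + 1 > sec_len - data_start)
        return NULL;
    const uint8_t *s = sec + data_start + off;
    if (s[slen] != 0)
        return NULL;
    *out_len = slen;
    return (const char *)s;
}
```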


8. Segments

A segment is one checkpoint-interval's worth of data: a full state snapshot (checkpoint) followed by compressed cycle-by-cycle deltas.

8.1 Segment Header

Each segment is self-describing and linked to the previous segment, forming a backward chain.

typedef struct {
    uint32_t segment_magic;         // "uSEG" = {0x75, 0x53, 0x45, 0x47}
    uint32_t flags;
    uint64_t time_start_ps;         // segment start time in picoseconds
    uint64_t time_end_ps;           // exclusive
    uint64_t prev_segment_offset;   // file offset of previous segment (0 = first)
    uint32_t checkpoint_size;
    uint32_t deltas_compressed_size;
    uint32_t deltas_raw_size;
    uint32_t num_frames;            // number of cycle_frame records in decompressed delta blob
    uint32_t num_frames_active;     // frames with at least one op or event
    uint32_t reserved;
    // checkpoint data (checkpoint_size bytes)
    // compressed delta data (deltas_compressed_size bytes)
} segment_header_t;                 // 56 bytes

The segment_magic field allows validation when walking the chain and recovery of incomplete files.

8.2 Segment Chain

Segments form a singly-linked list via prev_segment_offset, traversable from tail_offset in the file header backward to the first segment (prev_segment_offset == 0).

flowchart LR
  S2["Segment 2<br />[400ns,600ns)<br />prev→S1"] -->|prev| S1["Segment 1<br />[200ns,400ns)<br />prev→S0"]
  S1 -->|prev| S0["Segment 0<br />[0,200ns)<br />prev=0"]
  tail(["tail_offset"]) -.->|points to| S2
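
Walking the chain can be sketched over an in-memory image of the file. The code below starts at tail_offset, validates segment_magic at each hop, and follows prev_segment_offset (byte offset 24 in segment_header_t) until it reaches 0. walk_segment_chain is a hypothetical name. The check that each prev offset is strictly smaller than the current one is an assumption that follows from segments being appended in order; it also guards against loops in a corrupt file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SEG_HDR_SIZE = 56, SEG_PREV_OFF = 24 };

static uint64_t rd64(const uint8_t *p)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Collect segment offsets newest-first into out[0..max_out).
 * Returns the number of segments in the chain, or -1 if it is corrupt. */
long walk_segment_chain(const uint8_t *file, size_t len, uint64_t tail,
                        uint64_t *out, size_t max_out)
{
    static const uint8_t magic[4] = { 0x75, 0x53, 0x45, 0x47 }; /* "uSEG" */
    long n = 0;
    uint64_t off = tail;
    assert(file != NULL);
    while (off != 0) {
        if (off > len || len - off < SEG_HDR_SIZE)
            return -1;                 /* header would run past EOF */
        if (memcmp(file + off, magic, 4) != 0)
            return -1;                 /* not a segment header */
        if ((size_t)n < max_out)
            out[n] = off;
        n++;
        uint64_t prev = rd64(file + off + SEG_PREV_OFF);
        if (prev >= off)
            return -1;                 /* chain must move backward */
        off = prev;
    }
    return n;
}
```

A live reader would typically reverse the collected offsets to build its in-memory segment index in time order.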

8.3 Segment Table (Finalization Only)

At close, the writer builds a flat segment table for fast random access. This table is referenced by SECTION_SEGMENTS in the section table.

typedef struct {
    uint64_t offset;                // file offset of segment_header_t
    uint64_t time_start_ps;
    uint64_t time_end_ps;           // exclusive
} segment_index_entry_t;            // 24 bytes

Binary search on time_start_ps gives O(log n) seek to any timestamp.
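
As an illustration, the search can be written as an upper-bound binary search on time_start_ps followed by a check against the exclusive time_end_ps. find_segment is a hypothetical name, and the sketch assumes the table is sorted by time_start_ps (which follows from segments being appended in time order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t offset;          /* file offset of segment_header_t */
    uint64_t time_start_ps;
    uint64_t time_end_ps;     /* exclusive */
} segment_index_entry_t;

/* Index of the segment whose [time_start_ps, time_end_ps) contains t,
 * or -1 if no segment covers t. */
long find_segment(const segment_index_entry_t *tab, size_t n, uint64_t t)
{
    size_t lo = 0, hi = n;
    assert(tab != NULL || n == 0);
    /* Find the first entry whose start is greater than t. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].time_start_ps <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;                        /* t precedes the first segment */
    return t < tab[lo - 1].time_end_ps ? (long)(lo - 1) : -1;
}
```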

8.4 Reading Strategies

Finalized file (F_COMPLETE set):

  1. Read file header → preamble_end, section_table_offset
  2. Scan preamble chunks → extract DUT, schema, trace config
  3. Read section table → find SECTION_SEGMENTS
  4. Binary search segment table for target timestamp → get segment offset
  5. Read segment header + checkpoint + deltas at that offset

Live file (F_COMPLETE not set):

  1. Read file header → preamble_end, tail_offset
  2. Scan preamble chunks → extract DUT, schema, trace config
  3. Read segment at tail_offset → follow prev_segment_offset chain
  4. Build in-memory segment index (done once, O(n) in segments)
  5. To check for new data: re-read tail_offset from file header

Streaming write (writer perspective):

  1. Write file header (with tail_offset=0) + preamble chunks + CHUNK_END
  2. Set preamble_end in file header
  3. For each checkpoint interval:
       a. Write segment_header_t + checkpoint + compressed deltas at EOF
       b. Rewrite tail_offset and num_segments in file header
  4. At close: write string table, summary, segment table, section table; set F_COMPLETE; rewrite file header with final values

8.5 Checkpoint Format

A checkpoint is a sequence of storage blocks, one per storage.

typedef struct {
    uint16_t storage_id;
    uint16_t reserved;
    uint32_t size;                  // payload size in bytes
    // payload
} checkpoint_block_t;               // 8 bytes

8.5.1 Sparse Storage Block

checkpoint_block_t { storage_id, size }
uint8_t  valid_mask[ceil(num_slots/8)];
// For each set bit: slot_data[slot_size]
// v0.3: property_data[property_data_size] (if num_properties > 0)

8.5.2 Dense Storage Block

checkpoint_block_t { storage_id, size }
// slot_data[slot_size] × num_slots
// v0.3: property_data[property_data_size] (if num_properties > 0)

For storages with num_properties > 0, property values are appended after slot data as tightly-packed field values (same packing rules as slot data). The size field in the checkpoint block covers the total payload including property data.

8.6 Delta Format

8.6.1 Cycle Frame

Wire format (variable-length — not representable as a C struct):

cycle_frame:
  [LEB128]   time_delta_ps   1–10 bytes, unsigned delta in ps from previous frame
  [uint8]    op_format       0 = wide (16B delta_op_t), 1 = compact (8B delta_op_compact_t)
  [uint8]    reserved        must be 0
  [uint16]   num_ops
  [uint16]   num_events
  [repeated] ops             × num_ops  (size per op depends on op_format)
  [repeated] events          × num_events (event_record_t, variable-size)

The op_format field is only meaningful when F_COMPACT_DELTAS is set in the file header. If the flag is not set, op_format must be 0 (wide) and readers may skip checking it.

The time delta uses unsigned LEB128 encoding (same as DWARF / protobuf). Values are in picoseconds. For a 5 GHz clock (200 ps period), consecutive cycles produce a repeating delta of 200:

Delta value  Encoded bytes  Typical scenario
───────────  ─────────────  ───────────────────────────────────
0            1 (0x00)       Multiple frames at same timestamp
1–127        1              Sub-ns deltas (rare)
128–16383    2              Most clock periods (e.g., 200–1250)
16384+       3+             Large idle gaps

The first frame in each segment uses segment_header_t.time_start_ps as the base, so each segment is independently decodable without prior context.
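
For reference, unsigned LEB128 stores seven value bits per byte, least-significant group first, with the high bit of each byte set when more bytes follow. The sketch below shows one possible encoder and decoder for time_delta_ps. The function names are hypothetical, and rejecting encodings longer than 10 bytes is an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v into out (at most 10 bytes). Returns the number of bytes written. */
size_t leb128_encode_u64(uint64_t v, uint8_t out[10])
{
    size_t n = 0;
    assert(out != NULL);
    do {
        uint8_t b = (uint8_t)(v & 0x7F);
        v >>= 7;
        out[n++] = (uint8_t)(b | (v ? 0x80 : 0));  /* set continuation bit */
    } while (v);
    return n;
}

/* Decode from p[0..len). Returns bytes consumed (1-10), or 0 if the input
 * is truncated or does not fit in 64 bits. */
size_t leb128_decode_u64(const uint8_t *p, size_t len, uint64_t *out)
{
    uint64_t v = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        uint8_t b = p[i];
        if (i == 9 && (b & 0xFE))
            return 0;              /* 10th byte may only carry bit 63 */
        v |= (uint64_t)(b & 0x7F) << (7 * i);
        if (!(b & 0x80)) {
            *out = v;
            return i + 1;
        }
    }
    return 0;
}
```

With this encoding a 200 ps delta becomes the two bytes 0xC8 0x01, matching the 2-byte row of the table above.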

8.6.2 Delta Operations

enum delta_action : uint8_t {
    DA_SLOT_SET     = 0x01,         // set a field value
    DA_SLOT_CLEAR   = 0x02,         // mark slot invalid (sparse only)
    DA_SLOT_ADD     = 0x03,         // add value to field (for counters etc.)
    DA_PROP_SET     = 0x04,         // v0.3: set a storage-level property
};

typedef struct {
    uint8_t  action;
    uint8_t  reserved;
    uint16_t storage_id;
    uint16_t slot_index;
    uint16_t field_index;           // ignored for DA_SLOT_CLEAR; prop_index for DA_PROP_SET
    uint64_t value;                 // ignored for DA_SLOT_CLEAR
} delta_op_t;                       // 16 bytes

8.6.3 Compact Delta Variant

When the file header flag F_COMPACT_DELTAS is set, delta blobs may contain compact 8-byte ops. The op_format field of each cycle frame (§8.6.1) determines which layout all ops in that frame use.

typedef struct {
    uint8_t  action;
    uint8_t  storage_id_lo;         // low 8 bits of storage_id
    uint16_t slot_index;
    uint16_t field_index;
    uint16_t value16;
} delta_op_compact_t;               // 8 bytes

Compact ops have the following limitations. If any op in a frame violates these, the writer must use wide format for the entire frame:

  • storage_id must be 0–255
  • value16 is zero-extended to 64 bits; values > 65535 cannot be represented
  • DA_SLOT_CLEAR ignores field_index and value16 (same as wide format)
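
A writer's per-frame decision can be sketched as a scan over the frame's ops, followed by packing each op into the 8-byte layout when the whole frame qualifies. frame_fits_compact and pack_compact are hypothetical helper names. The sketch applies the value limit to every op, including DA_SLOT_CLEAR, which is a conservative reading of the rules above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  action;
    uint8_t  reserved;
    uint16_t storage_id;
    uint16_t slot_index;
    uint16_t field_index;
    uint64_t value;
} delta_op_t;

/* True if every op in the frame can be represented as delta_op_compact_t. */
bool frame_fits_compact(const delta_op_t *ops, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ops[i].storage_id > 255 || ops[i].value > 0xFFFF)
            return false;
    return true;
}

/* Serialize one op in the compact layout (8 bytes, little-endian). */
void pack_compact(const delta_op_t *op, uint8_t out[8])
{
    assert(op->storage_id <= 255 && op->value <= 0xFFFF);
    out[0] = op->action;
    out[1] = (uint8_t)op->storage_id;            /* storage_id_lo */
    out[2] = (uint8_t)(op->slot_index & 0xFF);
    out[3] = (uint8_t)(op->slot_index >> 8);
    out[4] = (uint8_t)(op->field_index & 0xFF);
    out[5] = (uint8_t)(op->field_index >> 8);
    out[6] = (uint8_t)(op->value & 0xFF);        /* value16 */
    out[7] = (uint8_t)((op->value >> 8) & 0xFF);
}
```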

8.6.4 Event Records

typedef struct {
    uint16_t event_type_id;
    uint16_t reserved;              // must be 0
    uint32_t payload_size;
    // uint8_t payload[payload_size];
} event_record_t;

The payload_size must equal the sum of field sizes for this event type as defined in the schema. Writers must not emit a different size. Readers should validate this but may use payload_size to skip events with unrecognized event_type_id without consulting the schema.

8.6.5 Payload Wire Format

Event payloads and checkpoint slot data use the same packing rule: fields are concatenated in schema-definition order with no padding and no alignment. Multi-byte fields use little-endian byte order (as with all integers in the file). The total payload size equals the sum of all field sizes as derived from their types (see §4.3).

Checkpoint blocks (§8.5) and cycle frames within the delta blob are also tightly packed with no inter-block or intra-block padding.

8.6.6 Interleaved Frame Format (v0.2)

When the file header flag F_INTERLEAVED_DELTAS is set, cycle frames use a self-describing tagged item stream instead of separate op/event arrays. This preserves the exact call order of ops and events within a cycle, which the v0.1 format cannot represent.

cycle_frame_v2:
  [LEB128]   time_delta_ps   1–10 bytes, unsigned delta in ps
  [uint16]   num_items       total number of tagged items
  [repeated] items           × num_items (self-describing via tag byte)

Each item starts with a tag byte that determines its type and size:

Tag   Type        Total size  Layout
────  ──────────  ──────────  ────────────────────────────────────────────────────────────────
0x01  Wide op     16 bytes    tag:u8 action:u8 storage_id:u16 slot:u16 field:u16 value:u64
0x02  Compact op  8 bytes     tag:u8 action:u8 storage_id_lo:u8 slot:u16 field:u16 value16:u16
0x03  Event       8+N bytes   tag:u8 reserved:u8 event_type_id:u16 payload_size:u32 payload[N]

The tag byte is size-neutral: it replaces the reserved byte in wide ops and one byte of the reserved:u16 in events. The frame header shrinks by 4 bytes compared to v0.1 (no op_format or reserved byte, and a single num_items count instead of separate num_ops and num_events).

Compact decision is per-frame, same logic as v0.1: if all ops in the frame satisfy storage_id ≤ 255 and value ≤ 65535, all ops use tag 0x02; otherwise all ops use tag 0x01. Events always use 0x03.

When F_INTERLEAVED_DELTAS is set, F_COMPACT_DELTAS is ignored.

Readers must support both v0.1 (§8.6.1) and v0.2 frame formats by checking the F_INTERLEAVED_DELTAS flag.

8.6.7 Compression

Per-segment, single LZ4 or ZSTD block. Method indicated in file header flags. Readers must reject files with unknown F_COMP_METHOD values.


9. Writer API

// ── Lifecycle ──
uscope_writer_t* uscope_writer_open(const char* path,
                                     const dut_desc_t* dut,
                                     const schema_t* schema,
                                     uint32_t checkpoint_interval);
void             uscope_writer_close(uscope_writer_t* w);

// ── Per-cycle ──
void uscope_begin_cycle(uscope_writer_t* w, uint64_t time_ps);

void uscope_slot_set(uscope_writer_t* w, uint16_t storage_id,
                      uint16_t slot, uint16_t field, uint64_t value);
void uscope_slot_clear(uscope_writer_t* w, uint16_t storage_id,
                        uint16_t slot);
void uscope_slot_add(uscope_writer_t* w, uint16_t storage_id,
                      uint16_t slot, uint16_t field, uint64_t value);

void uscope_event(uscope_writer_t* w, uint16_t event_type_id,
                   const void* payload);

void uscope_end_cycle(uscope_writer_t* w);

// ── Checkpoints ──
typedef void (*uscope_checkpoint_fn)(uscope_writer_t* w, void* user_data);
void uscope_set_checkpoint_callback(uscope_writer_t* w,
                                     uscope_checkpoint_fn fn, void* ud);
void uscope_checkpoint_storage(uscope_writer_t* w, uint16_t storage_id,
                                const uint8_t* valid_mask,
                                const void* slot_data,
                                uint32_t num_valid_slots);

9.1 DPI Bridge

The transport-level DPI is generic. Protocol-specific convenience wrappers are defined by each protocol, not by this spec.

import "DPI-C" function chandle uscope_open(string path);
import "DPI-C" function void    uscope_close(chandle w);

import "DPI-C" function void    uscope_begin_cycle(chandle w, longint unsigned time_ps);
import "DPI-C" function void    uscope_end_cycle(chandle w);

import "DPI-C" function void    uscope_slot_set(
    chandle w, shortint unsigned storage_id, shortint unsigned slot,
    shortint unsigned field, longint unsigned value
);
import "DPI-C" function void    uscope_slot_clear(
    chandle w, shortint unsigned storage_id, shortint unsigned slot
);
import "DPI-C" function void    uscope_slot_add(
    chandle w, shortint unsigned storage_id, shortint unsigned slot,
    shortint unsigned field, longint unsigned value
);

import "DPI-C" function void    uscope_event_raw(
    chandle w, shortint unsigned event_type_id,
    input byte unsigned payload[]
);

10. Reader API

// ── Lifecycle ──
uscope_reader_t* uscope_reader_open(const char* path);
void             uscope_reader_close(uscope_reader_t* r);

// ── Metadata ──
const file_header_t*  uscope_header(const uscope_reader_t* r);
const dut_desc_t*     uscope_dut_desc(const uscope_reader_t* r);
const schema_t*       uscope_schema(const uscope_reader_t* r);
const char*           uscope_scope_protocol(const uscope_reader_t* r,
                                             uint16_t scope_id);
const char*           uscope_dut_property(const uscope_reader_t* r,
                                           const char* key);
bool                  uscope_is_complete(const uscope_reader_t* r);

// ── Summary (finalized files only) ──
uint32_t    uscope_summary_levels(const uscope_reader_t* r);
const void* uscope_summary_data(const uscope_reader_t* r, uint32_t level,
                                 uint32_t* out_count);

// ── State reconstruction ──
uscope_state_t* uscope_state_at(uscope_reader_t* r, uint64_t time_ps);
void            uscope_state_free(uscope_state_t* s);

bool     uscope_slot_valid(const uscope_state_t* s, uint16_t storage_id,
                            uint16_t slot);
uint64_t uscope_slot_field(const uscope_state_t* s, uint16_t storage_id,
                            uint16_t slot, uint16_t field);
uint32_t uscope_storage_occupancy(const uscope_state_t* s,
                                   uint16_t storage_id);

// ── Events ──
uscope_event_iter_t* uscope_events_in_range(uscope_reader_t* r,
                                             uint64_t time_start_ps,
                                             uint64_t time_end_ps);
bool uscope_event_next(uscope_event_iter_t* it, uint64_t* time_ps,
                        uint16_t* event_type_id, const void** payload);
void uscope_event_iter_free(uscope_event_iter_t* it);

// ── Live tailing ──
bool uscope_poll_new_segments(uscope_reader_t* r);

11. Konata Trace Reconstruction

This section demonstrates that µScope's two primitives (storages + events) carry all information needed to reconstruct a Konata-format pipeline visualization.

| Konata cmd | µScope equivalent |
|---|---|
| I (create) | DA_SLOT_SET on entity catalog slot |
| L (label) | Entity catalog fields (pc, inst_bits) decoded by protocol plugin |
| S (stage start) | stage_transition event |
| E (stage end) | Next stage_transition or entity cleared/flushed |
| R (retire) | DA_SLOT_CLEAR on entity catalog slot |
| W (flush) | flush event with entity ID |
| C (cycle) | Absolute timestamp (segment base + cumulative LEB128 deltas) |
| Dependency arrows | dependency events linking entity IDs |

See the cpu protocol specification for the full reconstruction algorithm (§9 of that document).
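
The C (cycle) row relies on rebuilding absolute timestamps from a segment base plus LEB128-encoded deltas. The following is a minimal sketch of that arithmetic, not the reference decoder: it assumes the deltas are unsigned LEB128 values packed back to back in one buffer, whereas the real cycle-frame layout interleaves them with other data. The function names `uleb128_decode` and `rebuild_timestamps` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value from buf[*pos..len).
 * Returns 1 on success (advancing *pos), 0 on truncated or over-long input. */
int uleb128_decode(const uint8_t *buf, size_t len, size_t *pos, uint64_t *out)
{
    uint64_t value = 0;
    unsigned shift = 0;
    while (*pos < len && shift < 64) {
        uint8_t byte = buf[(*pos)++];
        value |= (uint64_t)(byte & 0x7F) << shift;  /* low 7 bits carry data */
        if ((byte & 0x80) == 0) {                   /* high bit clear: last byte */
            *out = value;
            return 1;
        }
        shift += 7;
    }
    return 0;
}

/* Rebuild absolute timestamps as base + running sum of decoded deltas.
 * Assumption: `deltas` holds only packed LEB128 values (illustrative layout).
 * Returns the number of timestamps written to out (at most max_out). */
size_t rebuild_timestamps(uint64_t segment_base_ps,
                          const uint8_t *deltas, size_t len,
                          uint64_t *out, size_t max_out)
{
    size_t pos = 0, n = 0;
    uint64_t t = segment_base_ps, d;
    while (n < max_out && uleb128_decode(deltas, len, &pos, &d)) {
        t += d;
        out[n++] = t;
    }
    return n;
}
```

A delta below 128 occupies one byte, and a delta of 1000 occupies two (0xE8 0x07), so the encoded size depends on the unit in which deltas are expressed.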


12. Design Rationale

  1. Two primitives: Storages and events are sufficient to model any time-series structured data. Entities, counters, annotations, and dependencies are protocol-level patterns built on top.

  2. Two-layer architecture: The format does not change when a new DUT type is added. Only a new protocol spec is written.

  3. Schema-driven, self-describing: A viewer can render traces from unknown protocols generically, using only the schema.

  4. String pool: Arbitrary-length names, no wasted padding, smaller structs. One pool shared by DUT descriptor and schema.

  5. No styling in transport: Colors, line styles, layout rules, display hints (hex, hidden, key) belong in the protocol layer or viewer configuration. The transport layer is pure data.

  6. Append-only segments with backward chain: Segments are appended during simulation with no pre-allocated tables. A tail_offset in the file header lets readers discover new segments. At finalization, a flat segment table is built for fast random access. This supports streaming write, live read, and fast seek — all from a single file.

  7. Checkpoint + delta: O(1) seek to segment, O(n) replay within segment. Cycle timestamps are LEB128 delta-encoded — 1 byte per frame for consecutive cycles instead of 8.

  8. Mipmap summaries: O(screen_pixels) overview rendering. Summary semantics are opaque to the transport — the protocol layer defines what each field means.

  9. Single file, section-based: Portable, self-locating sections. No sidecar files.

  10. Chunked preamble: DUT descriptor, schema, and trace config are typed chunks with length headers. Older readers skip unknown chunk types, so new metadata can be added (embedded ELF, source maps, protocol config) without bumping the format version.

  11. Scoped hierarchy with per-scope protocols: Storages and events are organized into a tree of scopes rooted at / (matching hardware hierarchy: SoC → tile → core). Each scope can declare its own protocol, enabling mixed-protocol traces (CPU + DMA + NoC in one file). No inheritance — each scope is explicit.


13. Comparison with FST

13.1 Core Difference: Signals vs. Structures

| Aspect | FST | µScope |
|---|---|---|
| Data model | Flat signals (bit-vectors) | Typed structures + protocol semantics |
| Semantics | None | Schema (transport) + protocol (domain) |
| Aggregation | External post-processing | Built-in mipmaps |
| Extensibility | New signals only | New protocols, same format |

13.2 When FST is Better

  • Signal-level debugging (exact wire values)
  • RTL verification (waveform comparison)
  • Tool ecosystem (GTKWave, Surfer, DVT)
  • Zero instrumentation cost ($dumpvars)

13.3 When µScope is Better

  • Microarchitectural introspection
  • Large structures (1024-entry ROB = one sparse storage)
  • Performance analysis with built-in summaries
  • Billion-cycle interactive exploration
  • Non-RTL environments (architectural simulators)
  • Multiple DUT types with one format

13.4 Complementary Use

| Phase | Tool | Why |
|---|---|---|
| RTL signal debug | FST + GTKWave | Bit-accurate, zero setup |
| Microarch exploration | µScope viewer | Structured, schema-aware |
| Performance analysis | µScope summaries | Multi-resolution aggregation |
| Bug root-cause | µScope → FST | Find cycle in µScope, drill into FST |

14. Version History

| Version | Date | Changes |
|---|---|---|
| 1.0 | 2025-xx-xx | Initial draft (CPU-specific) |
| 2.0 | 2025-xx-xx | Architecture-agnostic, schema-driven |
| 3.0 | 2025-xx-xx | Transport/protocol layer separation |
| 3.1 | 2025-xx-xx | String pools, styling removed, Konata proof |
| 4.0 | 2025-xx-xx | Aggressive simplification: |
| | | - Entities removed (modeled as storages) |
| | | - Annotations removed (modeled as events) |
| | | - Counters removed (modeled as 1-slot storages) |
| | | - SF_CIRCULAR, SF_CAM removed (head/tail = fields) |
| | | - DA_HEAD, DA_TAIL, DA_COUNTER_* removed |
| | | - Field/event/counter display flags removed |
| | | - Summary source/aggregation semantics removed |
| | | - Two string pools merged into one |
| | | - DUT descriptor simplified (vendor/version = properties) |
| | | - Delta actions: 7 → 3 (SET, CLEAR, ADD); 4 in v0.3 |
| | | - Section types: 7 → 4 |
| | | - LEB128 delta-encoded cycle in cycle frames |
| | | - Append-only segment chain with tail_offset |
| | | - Live read support (no finalization required) |
| | | - Segment table moved to finalization-only |
| 4.1 | 2026-xx-xx | Chunked preamble: |
| | | - File header: 64 → 48 bytes |
| | | - DUT descriptor, schema, trace config become chunks |
| | | - dut_desc_offset, schema_offset removed from header |
| | | - checkpoint_interval moved to CHUNK_TRACE_CONFIG |
| | | - dut_desc_t.size, schema_header_t.size removed |
| | | - Unknown chunk types skipped (forward compatibility) |
| | | - storage_id widened to uint16 throughout |
| | | - num_enums narrowed to uint8 |
| | | - field_def_t.size removed (derived from type) |
| | | - Compact deltas: per-frame op_format + file flag |
| | | - num_deltas → num_cycle_frames |
| | | - Payload wire format specified (tight packing, LE) |
| | | - Live-read commit ordering specified |
| | | - Scopes: hierarchical grouping of storages/events |
| | | - Per-scope protocol assignment (multi-protocol traces) |
| 4.2 | 2026-xx-xx | Picosecond time model: |
| | | - All timestamps in picoseconds (universal time axis) |
| | | - Clock domain definitions in schema (name + period) |
| | | - Scopes assigned to clock domains |
| | | - total_cycles → total_time_ps |
| | | - cycle_start/cycle_end → time_start_ps/time_end_ps |
| | | - checkpoint_interval → checkpoint_interval_ps |
| 4.3 | 2026-03-21 | Interleaved frame format (v0.2): |
| | | - F_INTERLEAVED_DELTAS flag (bit 7) |
| | | - Tagged item stream replaces separate op/event arrays |
| | | - Preserves call order of ops and events within a cycle |
| | | - version_minor bumped to 2 |
| | | - F_COMPACT_DELTAS ignored when interleaved is set |
| 4.4 | 2026-03-29 | Trace summary + buffer flag: |
| | | - SF_BUFFER storage flag (bit 1) |
| | | - SECTION_COUNTER_SUMMARY section type (0x0010) |
| | | - TraceSummary (TSUM) replaces abstract summary §6 |
| | | - Instruction density mipmap + counter mipmaps |
| | | - Backward-compatible CSUM reader for legacy files |
| 4.5 | 2026-04-01 | Storage properties (v0.3): |
| | | - StorageDef header: 12 → 16 bytes |
| | | - num_properties + reserved fields added |
| | | - FieldDef[num_properties] appended after slot fields |
| | | - DA_PROP_SET delta action (0x04) for property updates |
| | | - Checkpoint blocks include property data after slot data |
| | | - version_minor bumped to 3 |
| | | - SchemaHeader size corrected: 14 → 12 bytes |

15. Glossary

| Term | Definition |
|---|---|
| Checkpoint | Full snapshot of all storage state at a segment boundary. Enables random access without replaying from the start. |
| Chunk | A typed, length-prefixed block in the preamble. Unknown chunk types are skipped for forward compatibility. |
| Clock domain | A named clock with a period in picoseconds. Scopes are assigned to clock domains for cycle-number display. |
| Cycle frame | One timestamp's worth of delta operations and events. v0.1: separate op/event arrays (§8.6.1). v0.2: interleaved tagged items preserving call order (§8.6.6). |
| Delta | A single state change within a cycle frame (DA_SLOT_SET, DA_SLOT_CLEAR, DA_SLOT_ADD, or DA_PROP_SET). |
| DUT | Device Under Test. The hardware being traced. |
| Event | A timestamped occurrence with a schema-defined typed payload. Fire-and-forget (no persistent state). |
| Finalization | The process of writing summary, string table, segment table, and section table at trace close. Sets F_COMPLETE. |
| LEB128 | Little-Endian Base 128. Variable-length unsigned integer encoding. Used for time deltas. |
| Mipmap | Multi-resolution summary pyramid. Each level aggregates the level below by a fan-out factor. |
| Preamble | The chunk stream between the file header and the first segment. Contains DUT, schema, and trace config. |
| Protocol layer | Defines semantic meaning for a specific DUT type (e.g. cpu). Assigned per-scope. Not part of this spec. |
| Schema | Immutable definition of all scopes, storages, events, enums, and summary fields. Written once at creation. |
| Scope | A named node in a hierarchical tree rooted at /. Groups storages and events. Optionally declares a protocol. |
| Segment | One checkpoint-interval's worth of data: a checkpoint followed by compressed deltas. |
| Slot | One entry in a storage array. Contains one value per field defined in the storage schema. |
| Storage | A named, fixed-size array of typed slots. State is mutated by deltas and snapshotted in checkpoints. |
| String pool | Packed, null-terminated UTF-8 strings referenced by uint16_t offsets. Shared by DUT descriptor and schema. |
| String table | Optional section for runtime strings (e.g. disassembled instructions) referenced by FT_STRING_REF fields. |
| Tail offset | File header field pointing to the last completed segment. Updated after each segment flush. Enables live reading. |
| Transport layer | The binary file format defined by this spec. Knows about storages, events, and segments, and nothing else. |

µScope cpu Protocol Specification

Version: 0.1-draft Protocol identifier: cpu Transport version: µScope 0.x


1. Overview

The cpu protocol defines conventions for tracing any pipelined CPU — in-order, out-of-order, VLIW, or multi-threaded — using the µScope transport layer. It does not prescribe a fixed schema. Instead, it defines semantic conventions that a DUT writer follows and a viewer relies on to render pipeline visualizations, occupancy charts, and performance summaries without prior knowledge of the specific microarchitecture.

1.1 Design Principles

  1. Generic over specific. The protocol works for a 5-stage in-order core and a 20-stage OoO core alike. The DUT declares its structures; the viewer renders whatever it finds.

  2. Convention over configuration. Semantics are conveyed through field names, storage shapes, and DUT properties — not through protocol-specific binary metadata.

  3. Viewer decodes, trace stores data. The trace carries raw values (PC, instruction bits). The viewer decodes disassembly, register names, etc. using the ELF file and its knowledge of the ISA.

  4. Entity-centric. Every in-flight instruction has a unique ID. All structures reference entities by ID. The viewer joins on this ID to build per-instruction timelines.


2. Concepts

2.1 Entities

An entity is an in-flight instruction (or micro-op). Each entity occupies a slot in the entity catalog storage and is referenced by its slot index throughout the pipeline.

  • Entity ID = slot index in the entity catalog (U32).
  • When an instruction is fetched, the writer allocates a slot (DA_SLOT_SET on its fields). When it retires or is flushed, the writer clears the slot (DA_SLOT_CLEAR). The slot can then be reused.
  • The entity catalog must be sparse.

2.2 Buffers

A buffer is any storage whose slots hold entity references — a hardware structure that entities pass through or reside in. Examples: ROB, issue queues, load/store queues, scoreboards, reservation stations.

A storage is recognized as a buffer if it contains a field named entity_id (§3.2). The viewer automatically tracks entity membership in every buffer.

2.3 Stages

The viewer renders a per-entity Gantt chart showing which pipeline stage each instruction is in over time. Since an entity can occupy multiple buffers simultaneously (e.g., ROB + issue queue + executing), stage progression is tracked explicitly via stage_transition events (§5.1), not inferred from buffer membership.

Buffers and stages are orthogonal:

  • Buffers model where an entity physically resides (ROB slot 42, LQ slot 7). An entity can be in multiple buffers at once.
  • Stages model logical pipeline progress (fetch → decode → ... → retire). An entity is in exactly one stage at any time.

The DUT declares the stage ordering via the cpu.pipeline_stages property (§4.1) and emits a stage_transition event each time an entity advances. The viewer maintains a current_stage per entity and draws Gantt bars from stage entry/exit times.

2.4 Counters

A counter is a 1-slot, non-sparse storage with numeric fields, mutated via DA_SLOT_ADD. The viewer infers counters from this shape and renders them as line graphs or sparklines. No protocol markup is needed.

2.5 Events

Events model instantaneous occurrences attached to entities or to the timeline. The protocol defines standard event names (§5). The viewer renders recognized events with specific visualizations and unknown events generically.


3. Entity Catalog

3.1 Storage Convention

The entity catalog is a storage named entities.

| Property | Value |
|---|---|
| Name | entities |
| Sparse | yes (SF_SPARSE) |
| Num slots | max concurrent in-flight entities (DUT-specific) |

3.2 Required Fields

| Field name | Type | Description |
|---|---|---|
| entity_id | U32 | Unique entity ID (equals the slot index) |
| pc | U64 | Program counter |
| inst_bits | U32 | Raw instruction bits |

3.3 Optional Fields

The DUT may add any additional fields. Common examples:

| Field name | Type | Description |
|---|---|---|
| thread_id | U16 | Hardware thread / hart ID |
| is_compressed | BOOL | Compressed instruction (RVC, Thumb, ...) |
| priv_level | ENUM | Privilege level at fetch |

3.4 Entity Lifecycle

Fetch:   DA_SLOT_SET  entities[id].entity_id = id
         DA_SLOT_SET  entities[id].pc = ...
         DA_SLOT_SET  entities[id].inst_bits = ...

Retire:  DA_SLOT_CLEAR entities[id]

Flush:   DA_SLOT_CLEAR entities[id]
         (plus a flush event, §5.4)

The entity_id field is always equal to the slot index. It is stored explicitly so that buffer storages and events can reference it using a uniform U32 field, independent of the transport's slot indexing.

Slot reuse: After DA_SLOT_CLEAR, the slot may be reused for a new instruction. The new occupant is a logically distinct entity — the viewer treats each clear/set cycle as a new entity lifetime. The viewer must not carry state (stage, annotations, dependencies) across a clear boundary.
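
One way a viewer might keep successive occupants of a slot apart is a per-slot generation counter that is bumped on every clear. The sketch below is illustrative only: `lifetime_tracker_t`, `NUM_SLOTS`, and the function names are hypothetical and not part of the protocol.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 512   /* hypothetical catalog size, for illustration */

typedef struct {
    bool     valid[NUM_SLOTS];
    uint32_t generation[NUM_SLOTS];  /* bumped on every clear */
} lifetime_tracker_t;

/* DA_SLOT_SET on any field of a slot: the slot becomes (or stays) occupied. */
void on_slot_set(lifetime_tracker_t *t, uint32_t slot)
{
    t->valid[slot] = true;
}

/* DA_SLOT_CLEAR: ends the current lifetime; the next set starts a new one. */
void on_slot_clear(lifetime_tracker_t *t, uint32_t slot)
{
    if (t->valid[slot]) {
        t->valid[slot] = false;
        t->generation[slot]++;
    }
}

/* Key that distinguishes successive occupants of the same slot. */
uint64_t lifetime_key(const lifetime_tracker_t *t, uint32_t slot)
{
    return ((uint64_t)t->generation[slot] << 32) | slot;
}
```

Per-entity viewer state (stage, annotations, dependencies) can then be keyed by `lifetime_key` rather than by the raw slot index, so nothing leaks across a clear boundary.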


4. Buffers and Stages

4.1 Stage Ordering via DUT Properties

The DUT declares pipeline stages using a DUT property:

cpu.pipeline_stages = "fetch,decode,rename,dispatch,issue,execute,complete,retire"

The value is a comma-separated list in pipeline order (earliest first). The viewer uses this ordering for Gantt chart column layout and coloring. Stage names must match the values used in stage_transition events (§5.1).
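
A viewer might split the property value into an ordered stage list as sketched below. This is an illustration under assumptions: the limits `MAX_STAGES` and `MAX_STAGE_NAME`, the type `stage_list_t`, and both function names are hypothetical, and the sketch treats whitespace as part of a name because the examples in this document use none.

```c
#include <stddef.h>
#include <string.h>

#define MAX_STAGES     32   /* hypothetical limits for this sketch */
#define MAX_STAGE_NAME 32

typedef struct {
    char   names[MAX_STAGES][MAX_STAGE_NAME];
    size_t count;
} stage_list_t;

/* Split a comma-separated property value into ordered stage names.
 * Returns 0 on success, -1 if a name is empty or too long,
 * or if there are more than MAX_STAGES names. */
int parse_pipeline_stages(const char *value, stage_list_t *out)
{
    const char *p = value;
    out->count = 0;
    for (;;) {
        size_t n = strcspn(p, ",");           /* length of the next name */
        if (n == 0 || n >= MAX_STAGE_NAME || out->count == MAX_STAGES)
            return -1;
        memcpy(out->names[out->count], p, n);
        out->names[out->count][n] = '\0';
        out->count++;
        if (p[n] == '\0')
            return 0;
        p += n + 1;                           /* skip the comma */
    }
}

/* Look up a stage name; the index doubles as the Gantt column.
 * Returns -1 if the name is not in the list. */
int stage_index(const stage_list_t *list, const char *name)
{
    for (size_t i = 0; i < list->count; i++)
        if (strcmp(list->names[i], name) == 0)
            return (int)i;
    return -1;
}
```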

4.2 Buffer Storage Convention

Any storage with a field named entity_id of type U32 is a buffer.

| Property | Value |
|---|---|
| Sparse | yes (SF_SPARSE) |
| Num slots | hardware structure capacity |

4.3 Required Buffer Fields

| Field name | Type | Description |
|---|---|---|
| entity_id | U32 | References entity catalog slot |

4.4 Optional Buffer Fields

The DUT may add structure-specific fields:

| Field name | Type | Description |
|---|---|---|
| completed | BOOL | Execution completed (ROB) |
| addr | U64 | Memory address (LQ/SQ) |
| ready | BOOL | Operands ready (IQ/scoreboard) |
| fu_type | ENUM | Functional unit assigned |

4.5 Buffer Operations

Insert:  DA_SLOT_SET  rob[slot].entity_id = id
Remove:  DA_SLOT_CLEAR rob[slot]
Update:  DA_SLOT_SET  rob[slot].completed = 1

5. Standard Events

The protocol defines the following event names. stage_transition is required for Gantt chart rendering; all others are optional. The viewer renders recognized events with specific visualizations and unknown events generically (name + fields in a tooltip).

5.1 stage_transition

Explicit pipeline stage change for an entity. The DUT emits this event each time an instruction advances to a new pipeline stage. Superscalar cores emit multiple stage_transition events in the same cycle frame (e.g., a 4-wide machine retiring 4 instructions produces 4 events).

| Field name | Type | Description |
|---|---|---|
| entity_id | U32 | Entity that advanced |
| stage | ENUM(pipeline_stage) | Stage the entity entered |

The enum must be named pipeline_stage in the schema. Its values must match the names declared in the cpu.pipeline_stages DUT property (§4.1). For example:

| Value | Name |
|---|---|
| 0 | fetch |
| 1 | decode |
| 2 | rename |
| 3 | dispatch |
| 4 | issue |
| 5 | execute |
| 6 | complete |
| 7 | retire |

The enum is DUT-defined — an in-order core might have just fetch, decode, execute, memory, writeback.

The viewer maintains a current_stage per entity. A Gantt bar for a stage spans from the time the entity entered it until the time it entered the next stage (or was cleared/flushed). Multi-cycle stages (e.g., a long-latency divide in execute) require no special handling — the entity simply stays in its current stage until the next stage_transition event.
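
The bar rule above can be sketched as follows. This is a simplified illustration, not the viewer's algorithm: it assumes the transitions are already decoded into a time-ordered array that covers a single lifetime of the entity, and the types `transition_t` and `bar_t` and the function `build_bars` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t time_ps; uint32_t entity_id; uint8_t stage; } transition_t;
typedef struct { uint32_t entity_id; uint8_t stage; uint64_t start_ps, end_ps; } bar_t;

/* Build Gantt bars for one entity from time-ordered stage transitions.
 * death_ps is the time the entity's catalog slot was cleared (retire or flush).
 * Returns the number of bars written (at most max_bars). */
size_t build_bars(const transition_t *ev, size_t n, uint32_t entity_id,
                  uint64_t death_ps, bar_t *bars, size_t max_bars)
{
    size_t nb = 0;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].entity_id != entity_id)
            continue;
        if (nb > 0)
            bars[nb - 1].end_ps = ev[i].time_ps;  /* next transition closes the bar */
        if (nb == max_bars)
            break;
        bars[nb].entity_id = entity_id;
        bars[nb].stage     = ev[i].stage;
        bars[nb].start_ps  = ev[i].time_ps;
        bars[nb].end_ps    = death_ps;            /* stays open until death otherwise */
        nb++;
    }
    return nb;
}
```

A multi-cycle stage needs no extra handling here: its bar simply stays open until the next transition for the same entity arrives.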

5.2 annotate

Free-text annotation attached to an entity.

| Field name | Type | Description |
|---|---|---|
| entity_id | U32 | Target entity |
| text | STRING_REF | Annotation text |

Viewer: shows as a label on the entity's Gantt bar.

5.3 dependency

Data or structural dependency between two entities.

| Field name | Type | Description |
|---|---|---|
| src_id | U32 | Producer entity |
| dst_id | U32 | Consumer entity |
| dep_type | ENUM(dep_type) | Dependency kind |

Standard dep_type enum values:

| Value | Name |
|---|---|
| 0 | raw |
| 1 | war |
| 2 | waw |
| 3 | structural |

Viewer: draws an arrow from producer to consumer in the Gantt chart.

5.4 flush

Entity was squashed before retirement.

| Field name | Type | Description |
|---|---|---|
| entity_id | U32 | Flushed entity |
| reason | ENUM(flush_reason) | Cause |

Standard flush_reason enum values:

| Value | Name |
|---|---|
| 0 | mispredict |
| 1 | exception |
| 2 | interrupt |
| 3 | pipeline_clear |

Viewer: marks the entity's Gantt bar with a squash indicator.

5.5 stall

Pipeline stall (not tied to a specific entity).

| Field name | Type | Description |
|---|---|---|
| reason | ENUM(stall_reason) | Stall cause |

Standard stall_reason enum values are DUT-defined. Common examples: rob_full, iq_full, lq_full, sq_full, fetch_miss, dcache_miss, frontend_stall.

Viewer: renders a colored band on the timeline.


6. Counters

No special protocol convention beyond shape detection. A 1-slot, non-sparse storage is a counter. The storage name is the counter label.

Common counters:

| Storage name | Fields | Meaning |
|---|---|---|
| committed_insns | count: U64 | Retired instructions |
| bp_misses | count: U64 | Branch mispredictions |
| dcache_misses | count: U64 | D-cache misses |
| icache_misses | count: U64 | I-cache misses |

Writer updates via DA_SLOT_ADD:

uscope_slot_add(w, STOR_COMMITTED_INSNS, 0, FIELD_COUNT, 4);  // retired 4 this cycle

7. Summary Fields

The protocol defines standard summary field names for mipmap rendering. The viewer recognizes these and aggregates them appropriately.

| Field name | Type | Meaning |
|---|---|---|
| committed | U32 | Instructions committed in bucket |
| cycles_active | U32 | Non-idle cycles in bucket |
| flushes | U16 | Flush events in bucket |
| bp_misses | U16 | Branch mispredictions in bucket |

Per-buffer occupancy summaries use the naming pattern <storage_name>_occ (e.g., rob_occ). The value is the sum of occupancy samples in the bucket; divide by cycles_active for average.
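
The averaging rule can be written as a small helper. This is a sketch: the function name `average_occupancy` and the 64-bit width chosen for the occupancy sum are assumptions, and returning 0.0 for a fully idle bucket is one possible policy rather than something the protocol mandates.

```c
#include <stdint.h>

/* Average occupancy for one summary bucket: sum of samples / active cycles.
 * Returns 0.0 for a bucket with no active cycles to avoid dividing by zero. */
double average_occupancy(uint64_t occ_sum, uint32_t cycles_active)
{
    if (cycles_active == 0)
        return 0.0;
    return (double)occ_sum / (double)cycles_active;
}
```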

DUT-specific summary fields are rendered as generic bar charts.


8. DUT Properties

Properties use the cpu. key prefix so they coexist with other protocols in multi-protocol traces.

8.1 Required Properties

| Key | Description | Example |
|---|---|---|
| dut_name | DUT instance name | boom_core_0 |
| cpu.protocol_version | Version of the cpu protocol | 0.1 |
| cpu.isa | Instruction set architecture | RV64GC |
| cpu.pipeline_stages | Comma-separated stage names, in order | fetch,...,retire |

8.2 Optional Properties

| Key | Description | Example |
|---|---|---|
| cpu.fetch_width | Instructions fetched per cycle | 4 |
| cpu.commit_width | Instructions retired per cycle | 4 |
| cpu.elf_path | Path to ELF for disassembly | /path/to/fw.elf |
| cpu.vendor | DUT vendor | sifive |

9. Viewer Reconstruction

9.1 Opening a Trace

  1. Read preamble → parse schema and DUT properties
  2. Walk scope tree from root / → find all scopes with protocol = "cpu"; each is a core
  3. Per core scope: identify entities storage (entity catalog), find all buffers (storages with entity_id field), identify counters (1-slot non-sparse storages)
  4. Read cpu.pipeline_stages property → build ordered stage list
  5. If cpu.elf_path property exists, load ELF for disassembly
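
Step 3 amounts to a shape-based classification of each storage in a core scope. The sketch below shows one way to express it, under stated assumptions: the structs are a hypothetical in-memory view of the parsed schema, not the transport's on-disk definitions, and the enum and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of a parsed storage definition. */
typedef enum { FIELD_U32, FIELD_U64, FIELD_BOOL, FIELD_OTHER } field_type_t;
typedef struct { const char *name; field_type_t type; } field_info_t;
typedef struct {
    const char         *name;
    bool                sparse;
    unsigned            num_slots;
    const field_info_t *fields;
    size_t              num_fields;
} storage_info_t;

typedef enum { ROLE_CATALOG, ROLE_BUFFER, ROLE_COUNTER, ROLE_GENERIC } storage_role_t;

/* Classify a storage by the cpu protocol's naming and shape conventions. */
storage_role_t classify_storage(const storage_info_t *s)
{
    /* The catalog also has an entity_id field, so test the name first. */
    if (strcmp(s->name, "entities") == 0)
        return ROLE_CATALOG;
    for (size_t i = 0; i < s->num_fields; i++)
        if (s->fields[i].type == FIELD_U32 &&
            strcmp(s->fields[i].name, "entity_id") == 0)
            return ROLE_BUFFER;
    if (s->num_slots == 1 && !s->sparse)
        return ROLE_COUNTER;
    return ROLE_GENERIC;
}
```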

9.2 Gantt Chart Rendering

For a time range [T0, T1) in picoseconds:

  1. Seek to segment covering T0 (binary search or chain walk)
  2. Load checkpoint → initial state of all storages
  3. Replay deltas and events T0..T1, tracking per-entity:
    • Birth: entity slot becomes valid in entities
    • Stage transitions: stage_transition event → record (entity_id, stage, timestamp)
    • Death: entity slot cleared in entities (retire or flush)
  4. For each entity, emit Gantt bars: each stage spans from its stage_transition timestamp until the next transition (or death)
  5. Entity labels: read pc and inst_bits from entity catalog, decode via ISA disassembler
  6. Dependency arrows: dependency events in the range
  7. Flush markers: flush events in the range
  8. Convert timestamps to domain-local cycle numbers for display using the scope's clock domain period
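
Step 8 reduces to an integer division by the clock period. The helper below is a minimal sketch: it assumes the clock domain starts at time 0 with no phase offset, and the function name and the 32-bit period type are hypothetical.

```c
#include <stdint.h>

/* Convert an absolute picosecond timestamp to a domain-local cycle number.
 * Assumes the domain's first edge is at time 0 (no phase offset).
 * Returns 0 if period_ps is 0, which would be a malformed clock domain. */
uint64_t time_to_cycle(uint64_t time_ps, uint32_t period_ps)
{
    return period_ps ? time_ps / period_ps : 0;
}
```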

9.3 Occupancy View

For each buffer, count valid slots per cycle. The mipmap summary (<name>_occ fields) gives this at coarse granularity; delta replay gives exact per-cycle values when zoomed in.

9.4 Counter Graphs

Read counter storages at each cycle frame (via DA_SLOT_ADD deltas). Compute rates (delta / cycles) for display. Mipmap summaries provide pre-aggregated values for zoomed-out views.


10. Example: BOOM-like OoO Core

10.1 DUT Properties

dut_name              = "boom_tile0_core0"
cpu.protocol_version  = "0.1"
cpu.isa               = "RV64GC"
cpu.fetch_width       = "4"
cpu.commit_width      = "4"
cpu.elf_path          = "/workspace/fw.elf"
cpu.pipeline_stages   = "fetch,decode,rename,dispatch,issue,execute,complete,retire"

10.2 Schema

Scopes:
  /       (id=0, root,      protocol=none)
  core0   (id=1, parent=0,  protocol="cpu")

Enums:
  pipeline_stage: fetch(0), decode(1), rename(2), dispatch(3),
                  issue(4), execute(5), complete(6), retire(7)
  dep_type:       raw(0), war(1), waw(2), structural(3)
  flush_reason:   mispredict(0), exception(1), interrupt(2)
  stall_reason:   rob_full(0), iq_full(1), lq_full(2), sq_full(3),
                  fetch_miss(4), dcache_miss(5)

Storages (all scope=core0):
  entities    (sparse, 512 slots):  entity_id:U32, pc:U64, inst_bits:U32
  rob         (sparse, 256 slots):  entity_id:U32, completed:BOOL
  iq_int      (sparse, 48 slots):   entity_id:U32
  iq_fp       (sparse, 32 slots):   entity_id:U32
  iq_mem      (sparse, 48 slots):   entity_id:U32
  lq          (sparse, 32 slots):   entity_id:U32, addr:U64
  sq          (sparse, 32 slots):   entity_id:U32, addr:U64
  committed   (dense, 1 slot):      count:U64
  bp_misses   (dense, 1 slot):      count:U64

Events (all scope=core0):
  stage_transition: entity_id:U32, stage:ENUM(pipeline_stage)
  annotate:         entity_id:U32, text:STRING_REF
  dependency:       src_id:U32, dst_id:U32, dep_type:ENUM(dep_type)
  flush:            entity_id:U32, reason:ENUM(flush_reason)
  stall:            reason:ENUM(stall_reason)

Note: transient stages (fetch, decode, execute, etc.) are modeled purely via stage_transition events — no storages needed. Only physical structures that hold entities (ROB, IQ, LQ, SQ) are storages.

10.3 Example: 5-Stage In-Order Core

Same protocol, minimal schema:

DUT properties:
  cpu.pipeline_stages  = "fetch,decode,execute,memory,writeback"

Scopes:
  /       (id=0, root,      protocol=none)
  core0   (id=1, parent=0,  protocol="cpu")

Enums:
  pipeline_stage: fetch(0), decode(1), execute(2), memory(3), writeback(4)

Storages (all scope=core0):
  entities    (sparse, 8 slots):    entity_id:U32, pc:U64, inst_bits:U32
  committed   (dense, 1 slot):      count:U64

Events (all scope=core0):
  stage_transition: entity_id:U32, stage:ENUM(pipeline_stage)

An in-order core may have no buffers at all — just the entity catalog and stage transitions. The viewer renders a Gantt chart purely from events.

10.4 Example: Dual-Core SoC

Multi-core uses transport-level scopes (§4.4 of the transport spec). Each core is a scope with protocol = "cpu". Storages and event types are defined per-scope, so entity IDs are per-core and no core_id field is needed in event payloads.

DUT properties:
  dut_name              = "my_soc"
  cpu.pipeline_stages   = "fetch,decode,rename,dispatch,issue,execute,complete,retire"
  cpu.isa               = "RV64GC"
  cpu.elf_path          = "/workspace/fw.elf"

Scopes:
  /            (id=0, root,         protocol=none)
  cpu_cluster  (id=1, parent=0,     protocol=none)
  core0        (id=2, parent=1,     protocol="cpu")
  core1        (id=3, parent=1,     protocol="cpu")

Enums (shared):
  pipeline_stage: fetch(0), decode(1), rename(2), dispatch(3),
                  issue(4), execute(5), complete(6), retire(7)

Storages:
  entities  (scope=core0, sparse, 512):  entity_id:U32, pc:U64, inst_bits:U32
  rob       (scope=core0, sparse, 256):  entity_id:U32
  committed (scope=core0, dense, 1):     count:U64

  entities  (scope=core1, sparse, 512):  entity_id:U32, pc:U64, inst_bits:U32
  rob       (scope=core1, sparse, 256):  entity_id:U32
  committed (scope=core1, dense, 1):     count:U64

Events:
  stage_transition (scope=core0): entity_id:U32, stage:ENUM(pipeline_stage)
  stage_transition (scope=core1): entity_id:U32, stage:ENUM(pipeline_stage)
  flush            (scope=core0): entity_id:U32, reason:ENUM(flush_reason)
  flush            (scope=core1): entity_id:U32, reason:ENUM(flush_reason)

The viewer finds all scopes with protocol = "cpu", renders a per-core pipeline view for each, and can show them side-by-side.

Storage names (entities, rob) repeat across scopes — the storage_id is globally unique, but the name + scope combination gives the viewer the display path (core0/entities, core1/rob).

Cross-core events (cache coherence, IPIs) can be defined at the cpu_cluster scope with fields referencing the relevant scope IDs.


11. Version History

| Version | Date | Changes |
|---|---|---|
| 0.1 | 2026-xx-xx | Initial draft |

µScope noc Protocol Specification

Version: 0.1-draft Protocol identifier: noc Transport version: µScope 0.x


1. Overview

The noc protocol defines conventions for tracing any on-chip interconnect — crossbar, mesh, ring, tree, or point-to-point — using the µScope transport layer. It works with any bus protocol: AXI4, CHI, ACE, TileLink, UCIe, or proprietary fabrics.

Like the cpu protocol, it does not prescribe a fixed schema. Instead, it defines semantic conventions that a DUT writer follows and a viewer relies on to render transaction Gantt charts, topology maps, latency histograms, and traffic heatmaps without prior knowledge of the specific interconnect microarchitecture.

1.1 Design Principles

  1. Generic over specific. The protocol works for a single-port AXI crossbar and a 64-node CHI mesh alike. The DUT declares its structures; the viewer renders whatever it finds.

  2. Convention over configuration. Semantics are conveyed through field names, storage shapes, and scope properties — not through protocol-specific binary metadata.

  3. Entity-centric. Every in-flight transaction has a unique ID in a transaction catalog. All buffers, events, and stages reference transactions by this ID. The viewer joins on it to build per-transaction timelines.

  4. Topology-agnostic. The protocol does not encode topology in the data model. Topology is declared via scope properties; the viewer uses it for visualization only.


2. Concepts

2.1 Transactions (Entities)

A transaction is an in-flight bus operation (read, write, snoop, etc.). Each transaction occupies a slot in the transaction catalog storage and is referenced by its slot index throughout the interconnect.

  • Transaction ID = slot index in the transaction catalog (U32).
  • When a transaction is issued, the writer allocates a slot (DA_SLOT_SET on its fields). When it completes, the writer clears the slot (DA_SLOT_CLEAR). The slot can then be reused.
  • The transaction catalog must be sparse.

Transactions in the noc protocol are the direct analogue of entities in the cpu protocol (cpu spec §2.1).

2.2 Buffers

A buffer is any storage whose slots hold transaction references — a hardware structure that transactions pass through or reside in. Examples: virtual channel (VC) buffers, reorder buffers, outstanding request tables, credit pools.

A storage is recognized as a buffer if it contains a field named txn_id (§3.2). The viewer automatically tracks transaction membership in every buffer.

2.3 Stages

The viewer renders a per-transaction Gantt chart showing which pipeline stage each transaction is in over time. Since a transaction can occupy multiple buffers simultaneously (e.g., outstanding request table + VC buffer + arbitrating), stage progression is tracked explicitly via stage_transition events (§5.1), not inferred from buffer membership.

Buffers and stages are orthogonal:

  • Buffers model where a transaction physically resides (VC slot 3, ROB entry 7). A transaction can be in multiple buffers at once.
  • Stages model logical progression through the interconnect (issue → route → arbitrate → traverse → deliver → respond). A transaction is in exactly one stage at any time.

The DUT declares the stage ordering via noc.pipeline_stages (§4.1) and emits a stage_transition event each time a transaction advances. The viewer maintains a current_stage per transaction and draws Gantt bars from stage entry/exit times.

2.4 Counters

A counter is a 1-slot, non-sparse storage with numeric fields, mutated via DA_SLOT_ADD. The viewer infers counters from this shape and renders them as line graphs or sparklines. No protocol markup is needed.

2.5 Events

Events model instantaneous occurrences attached to transactions or to the timeline. The protocol defines standard event names (§5). The viewer renders recognized events with specific visualizations and unknown events generically.

2.6 Router Sub-Scopes

For multi-router interconnects, each router can be a child scope with protocol="noc.router". This enables per-router buffers, counters, and events while keeping the transaction catalog on the nearest ancestor noc scope.

/                     (protocol=none)
  noc0                (protocol="noc")        ← transaction catalog here
    router_0_0        (protocol="noc.router") ← per-router buffers/counters
    router_0_1        (protocol="noc.router")
    router_1_0        (protocol="noc.router")
    router_1_1        (protocol="noc.router")

A noc.router scope does not have its own transaction catalog. It references transactions from the parent noc scope's catalog via the txn_id field. The viewer resolves txn_id by walking up the scope tree to the nearest noc scope.
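
The upward walk can be sketched as below. The scope representation is an assumption made for illustration: `scope_info_t`, the `SCOPE_NONE` marker, and the convention that the root scope is its own parent are all hypothetical, and the transport spec's actual scope encoding may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCOPE_NONE 0xFFFF   /* hypothetical "no such scope" marker */

typedef struct {
    uint16_t    parent;     /* index of the parent scope; the root points to itself */
    const char *protocol;   /* NULL when the scope declares no protocol */
} scope_info_t;

/* Walk up from `scope` to the nearest scope (inclusive) whose protocol is "noc".
 * Returns SCOPE_NONE if the root is reached without finding one. */
uint16_t nearest_noc_scope(const scope_info_t *scopes, uint16_t scope)
{
    for (;;) {
        const char *p = scopes[scope].protocol;
        if (p != NULL && strcmp(p, "noc") == 0)
            return scope;
        if (scopes[scope].parent == scope)   /* reached the root */
            return SCOPE_NONE;
        scope = scopes[scope].parent;
    }
}
```

A txn_id found in a router-scope buffer would then be looked up in the transactions storage of the scope this function returns.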

2.7 Cross-Scope Transaction Handoff

When a transaction crosses a scope boundary — e.g., a chiplet-to-chiplet transfer via a D2D link, or a protocol bridge (AXI→CHI) — it receives a new txn_id in the destination scope. The txn_handoff event (§5.7) stitches the two identities together, enabling end-to-end latency tracking across scope boundaries.

The txn_handoff event is emitted at a common ancestor scope of the source and destination scopes. The viewer joins on these events to build cross-scope transaction timelines.


3. Transaction Catalog

3.1 Storage Convention

The transaction catalog is a storage named transactions.

| Property | Value |
|---|---|
| Name | transactions |
| Sparse | yes (SF_SPARSE) |
| Num slots | max concurrent in-flight transactions (DUT-specific) |

3.2 Required Fields

| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Unique transaction ID (equals the slot index) |
| opcode | ENUM | Transaction type (read, write, snoop, etc.) |
| addr | U64 | Target address |
| len | U16 | Burst length (number of beats) |
| size | U8 | Beat size (log2 bytes, e.g., 3 = 8 bytes) |
| src_port | U16 | Source port / initiator ID |
| dst_port | U16 | Destination port / target ID |
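
Because size is stored as a log2 value, the payload of a burst is len shifted left by size. A minimal sketch of that arithmetic, with `burst_bytes` as a hypothetical helper name:

```rust
/// Total payload bytes of a burst, given the catalog's `len` (number of beats)
/// and `size` (log2 of the bytes per beat). Returns None if `size` is too
/// large to shift or the product does not fit in a u64.
fn burst_bytes(len: u16, size: u8) -> Option<u64> {
    let beat_bytes = 1u64.checked_shl(u32::from(size))?;
    u64::from(len).checked_mul(beat_bytes)
}
```

For example, a 4-beat burst with size = 3 carries 4 × 8 = 32 bytes.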

3.3 Optional Fields

The DUT may add any additional fields. Common examples:

| Field name | Type | Description |
|---|---|---|
| qos | U8 | Quality-of-service priority |
| txn_class | ENUM | Transaction class (posted, non-posted, etc.) |
| prot | U8 | Protection bits (privileged, secure, etc.) |
| cache | U8 | Cache allocation hints |
| snoop | U8 | Snoop attribute bits |
| domain | ENUM | Shareability domain |
| excl | BOOL | Exclusive access flag |
| tag | U16 | Transaction tag (for reorder tracking) |

3.4 Transaction Lifecycle

Issue:      DA_SLOT_SET  transactions[id].txn_id = id
            DA_SLOT_SET  transactions[id].opcode = ...
            DA_SLOT_SET  transactions[id].addr = ...
            DA_SLOT_SET  transactions[id].len = ...
            DA_SLOT_SET  transactions[id].size = ...
            DA_SLOT_SET  transactions[id].src_port = ...
            DA_SLOT_SET  transactions[id].dst_port = ...

Complete:   DA_SLOT_CLEAR transactions[id]

The txn_id field is always equal to the slot index. It is stored explicitly so that buffer storages and events can reference it using a uniform U32 field, independent of the transport's slot indexing.


4. Buffers and Stages

4.1 Stage Ordering via Scope Properties

Each noc scope declares pipeline stages using a scope property:

noc.pipeline_stages = "issue,route,arbitrate,traverse,deliver,respond"

The value is a comma-separated list in pipeline order (earliest first). The viewer uses this ordering for Gantt chart column layout and coloring. Stage names must match the values used in stage_transition events (§5.1). Each noc scope declares its own stages, enabling heterogeneous interconnects in the same trace.
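
A viewer might turn the property value into an ordered stage list along these lines. The function names are hypothetical, and trimming whitespace or skipping empty entries is a leniency assumed by this sketch rather than something the text above requires.

```rust
/// Split a `noc.pipeline_stages` value into stage names, in pipeline order.
/// Whitespace trimming and empty-entry skipping are assumptions of this sketch.
fn parse_stages(value: &str) -> Vec<&str> {
    value
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect()
}

/// Column index of a stage in the Gantt layout, or None for an undeclared stage.
fn stage_column(stages: &[&str], name: &str) -> Option<usize> {
    stages.iter().position(|s| *s == name)
}
```

Returning None for an undeclared stage lets the caller decide how to render a stage_transition whose stage name is missing from the property.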

4.2 Buffer Storage Convention

Any storage other than the transaction catalog (transactions, §3.1) that has a field named txn_id of type U32 is a buffer.

| Property | Value |
|---|---|
| Sparse | yes (SF_SPARSE) |
| Num slots | hardware structure capacity |

4.3 Required Buffer Fields

| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | References transaction catalog slot |

4.4 Optional Buffer Fields

The DUT may add structure-specific fields:

| Field name | Type | Description |
|---|---|---|
| vc | U8 | Virtual channel assignment |
| priority | U8 | Arbitration priority |
| flit_type | ENUM | Flit type (header, data, tail) |
| credits | U8 | Available credits |

4.5 Buffer Operations

Insert:  DA_SLOT_SET  vc_buf[slot].txn_id = id
Remove:  DA_SLOT_CLEAR vc_buf[slot]
Update:  DA_SLOT_SET  vc_buf[slot].credits = 3

4.6 Common Buffers

| Buffer name | Models |
|---|---|
| vc_buf_<port> | Per-port virtual channel buffer |
| rob | Reorder buffer for out-of-order completion |
| ort | Outstanding request table / tracker |
| snoop_filter | Snoop filter entries |
| retry_buf | Transactions awaiting retry |

4.7 Example Stage Sets

AXI4 crossbar:

noc.pipeline_stages = "ar_issue,route,arbitrate,transport,target_accept,r_data,r_last"

CHI mesh:

noc.pipeline_stages = "req_issue,req_accept,snoop_send,snoop_resp,dat_transfer,comp_ack"

TileLink ring:

noc.pipeline_stages = "acquire,route,grant,grant_ack"

5. Standard Events

The protocol defines the following event names. stage_transition is required for Gantt chart rendering; all others are optional. The viewer renders recognized events with specific visualizations and unknown events generically (name + fields in a tooltip).

5.1 stage_transition

Explicit stage change for a transaction. The DUT emits this event each time a transaction advances to a new pipeline stage.

| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Transaction that advanced |
| stage | ENUM(pipeline_stage) | Stage the transaction entered |

The pipeline_stage enum values must match the names declared in the noc.pipeline_stages scope property (§4.1). For example (AXI4):

| Value | Name |
|---|---|
| 0 | ar_issue |
| 1 | route |
| 2 | arbitrate |
| 3 | transport |
| 4 | target_accept |
| 5 | r_data |
| 6 | r_last |

The enum is DUT-defined — a simple crossbar might have just issue, arbitrate, transfer, complete.

The viewer maintains a current_stage per transaction. A Gantt bar for a stage spans from the cycle the transaction entered it until the cycle it entered the next stage (or was cleared).

5.2 beat

Individual data beat in a burst transfer.

| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Parent transaction |
| beat_num | U16 | Beat number within burst (0-based) |
| data_bytes | U16 | Bytes transferred in this beat |

Viewer: shows beat markers on the transaction's Gantt bar during the data transfer stage. Useful for identifying partial transfers and stalls between beats.

5.3 retry

Transaction retry — the target or interconnect rejected the transaction and it must be re-attempted.

| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Retried transaction |
| reason | ENUM(retry_reason) | Cause of retry |

Standard retry_reason enum values:

| Value | Name |
|---|---|
| 0 | target_busy |
| 1 | no_credits |
| 2 | vc_full |
| 3 | arb_lost |
| 4 | protocol_retry |

Viewer: marks a retry indicator on the transaction's Gantt bar.

5.4 timeout

Watchdog timeout — a transaction exceeded the expected completion time.

| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Timed-out transaction |
| threshold_cycles | U32 | Watchdog threshold that was exceeded |

Viewer: marks a timeout indicator on the transaction's Gantt bar and highlights it in the topology view.

5.5 link_credit

Credit flow control update on a link.

| Field name | Type | Description |
|---|---|---|
| port | U16 | Port ID |
| direction | ENUM(credit_direction) | Credit grant or consume |
| credits | U8 | Number of credits |

Standard credit_direction enum values:

| Value | Name |
|---|---|
| 0 | grant |
| 1 | consume |

Viewer: renders credit level as a per-port sparkline.
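
One way a viewer could derive the sparkline samples is to fold the credit events into a running level per port. The sketch below assumes that grant raises the level, consume lowers it, and every port starts at zero; none of that is stated above, and `LinkCredit` and `credit_series` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum CreditDirection {
    Grant,
    Consume,
}

/// Decoded credit event (hypothetical struct mirroring the field table).
struct LinkCredit {
    cycle: u64,
    port: u16,
    direction: CreditDirection,
    credits: u8,
}

/// Fold events (assumed sorted by cycle) into per-port (cycle, level) samples.
/// Assumption: grant adds to the level, consume subtracts, levels start at 0.
fn credit_series(events: &[LinkCredit]) -> HashMap<u16, Vec<(u64, i64)>> {
    let mut level: HashMap<u16, i64> = HashMap::new();
    let mut series: HashMap<u16, Vec<(u64, i64)>> = HashMap::new();
    for ev in events {
        let l = level.entry(ev.port).or_insert(0);
        match ev.direction {
            CreditDirection::Grant => *l += i64::from(ev.credits),
            CreditDirection::Consume => *l -= i64::from(ev.credits),
        }
        series.entry(ev.port).or_default().push((ev.cycle, *l));
    }
    series
}
```

A signed level is used so that a trace starting mid-simulation, where consumes may precede the first recorded grant, does not underflow.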

5.6 arb_decision

Arbitration outcome — records which transaction won arbitration at a port.

| Field name | Type | Description |
|---|---|---|
| winner_txn | U32 | Transaction that won arbitration |
| port | U16 | Port where arbitration occurred |
| num_contenders | U8 | Number of competing transactions |

Viewer: shows arbitration events in the timeline. High num_contenders values indicate congestion hotspots.

5.7 txn_handoff

Cross-scope transaction stitching — links a transaction in one scope to its continuation in another scope.

| Field name | Type | Description |
|---|---|---|
| src_scope | U16 | Scope ID of the source transaction |
| src_txn_id | U32 | Transaction ID in the source scope |
| dst_scope | U16 | Scope ID of the destination transaction |
| dst_txn_id | U32 | Transaction ID in the destination scope |

This event is emitted at a common ancestor scope of src_scope and dst_scope. It enables end-to-end latency tracking across chiplet boundaries, protocol bridges, or any other scope boundary where a transaction receives a new identity.

Viewer: draws a handoff arrow between the two transaction timelines and computes end-to-end latency by joining the linked transactions.

5.8 annotate

Free-text annotation attached to a transaction.

| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Target transaction |
| text | STRING_REF | Annotation text |

Viewer: shows as a label on the transaction's Gantt bar.


6. Counters

No special protocol convention beyond shape detection. A 1-slot, non-sparse storage is a counter. The storage name is the counter label.

Common counters:

| Storage name | Fields | Meaning |
|---|---|---|
| bytes_tx | count: U64 | Bytes transmitted |
| bytes_rx | count: U64 | Bytes received |
| arb_conflicts | count: U64 | Arbitration conflicts (>1 contender) |
| retries | count: U64 | Transaction retries |
| txn_completed | count: U64 | Transactions completed |

Writer updates via DA_SLOT_ADD:

uscope_slot_add(w, STOR_BYTES_TX, 0, FIELD_COUNT, 64);  // 64 bytes this cycle

For per-router counters, place the counter storage on the router's sub-scope (§2.6).


7. Summary Fields

The protocol defines standard summary field names for mipmap rendering. Each summary field is scoped to its noc scope (via scope_id in summary_field_def_t), so multi-interconnect traces have independent summaries without name collisions.

| Field name | Type | Meaning |
|---|---|---|
| txn_completed | U32 | Transactions completed in bucket |
| bytes_transferred | U64 | Total bytes transferred in bucket |
| avg_latency_ticks | U32 | Average transaction latency in bucket |
| retries | U16 | Retry events in bucket |

Per-buffer occupancy summaries use the naming pattern <storage_name>_occ (e.g., vc_buf_0_occ). The value is the sum of occupancy samples in the bucket; divide by active cycles for average.
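
The division mentioned above is small enough to show directly. In this sketch `average_occupancy` is a hypothetical helper, and how the viewer obtains the active-cycle count for a bucket is left open.

```rust
/// Average occupancy for one summary bucket. `occ_sum` is the value of the
/// `<storage_name>_occ` field; `active_cycles` is the number of cycles that
/// contributed samples. Returns None for a bucket with no active cycles.
fn average_occupancy(occ_sum: u64, active_cycles: u64) -> Option<f64> {
    if active_cycles == 0 {
        None
    } else {
        Some(occ_sum as f64 / active_cycles as f64)
    }
}
```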

DUT-specific summary fields are rendered as generic bar charts.


8. Scope Properties

Properties are stored on each scope (transport spec §3.4.1). The noc protocol uses the noc. key prefix. Each noc scope carries its own properties, enabling heterogeneous interconnects in the same trace.

Properties that describe the overall trace (e.g., dut_name) belong on the root scope.

8.1 Required Properties (on each noc scope)

| Key | Description | Example |
|---|---|---|
| noc.protocol_version | Version of the noc protocol | 0.1 |
| noc.bus_protocol | Underlying bus protocol | AXI4, CHI, TileLink, UCIe |
| noc.topology | Interconnect topology | crossbar, mesh, ring, tree, p2p |
| noc.pipeline_stages | Comma-separated stage names, in order | issue,route,arbitrate,traverse,deliver,respond |
| clock.period_ps | Clock period in picoseconds | 1000 (1 GHz) |

8.2 Optional Properties (on each noc scope)

| Key | Description | Example |
|---|---|---|
| noc.dim_x | Mesh X dimension | 4 |
| noc.dim_y | Mesh Y dimension | 4 |
| noc.num_vcs | Number of virtual channels per port | 4 |
| noc.data_width | Data bus width in bits | 128 |
| noc.addr_width | Address bus width in bits | 48 |
| noc.num_ports | Total number of ports | 16 |
| noc.routing | Routing algorithm | xy, adaptive |

8.3 Root Scope Properties

| Key | Description | Example |
|---|---|---|
| dut_name | DUT instance name | my_soc |
| vendor | DUT vendor (top-level) | acme |

9. Viewer Reconstruction

9.1 Opening a Trace

  1. Read preamble → parse schema (including scope properties)
  2. Walk scope tree from root / → find all scopes with protocol = "noc"; each is an interconnect instance
  3. Per noc scope:
     a. Read scope properties → noc.pipeline_stages, noc.bus_protocol, noc.topology, etc.
     b. Identify transactions storage (transaction catalog)
     c. Find all buffers (storages with txn_id field)
     d. Identify counters (1-slot non-sparse storages)
     e. Find child scopes with protocol = "noc.router" for per-router detail
  4. Per noc scope: build ordered stage list from noc.pipeline_stages
  5. If noc.topology = "mesh", read noc.dim_x and noc.dim_y for topology rendering

9.2 Transaction Gantt Chart

For a cycle range [C0, C1):

  1. Seek to segment covering C0 (binary search or chain walk)
  2. Load checkpoint → initial state of all storages
  3. Replay deltas and events C0..C1, tracking per-transaction:
    • Birth: transaction slot becomes valid in transactions
    • Stage transitions: stage_transition event → record (txn_id, stage, cycle)
    • Death: transaction slot cleared in transactions (completion)
  4. For each transaction, emit Gantt bars: each stage spans from its stage_transition cycle until the next transition (or death)
  5. Transaction labels: read opcode, addr, src_port, dst_port from the transaction catalog
  6. Retry markers: retry events in the range
  7. Beat markers: beat events in the range
  8. Timeout markers: timeout events in the range
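
Steps 3 and 4 above amount to closing the currently open bar of a transaction whenever it transitions or dies. The following is a minimal sketch of that bookkeeping over already-decoded records; `Record`, `Bar`, and `gantt_bars` are hypothetical names, and clipping in-flight transactions to the end of the range is a choice made here, not something the steps prescribe.

```rust
use std::collections::HashMap;

/// Replay inputs, already decoded (hypothetical types for this sketch).
enum Record {
    Stage { cycle: u64, txn_id: u32, stage: u8 }, // stage_transition event
    Death { cycle: u64, txn_id: u32 },            // slot cleared in transactions
}

#[derive(Debug, PartialEq)]
struct Bar {
    txn_id: u32,
    stage: u8,
    start: u64,
    end: u64,
}

/// Turn a cycle-ordered record stream into Gantt bars. A bar is closed by the
/// next transition of the same transaction or by its death; transactions that
/// are still in flight get a bar clipped to `range_end`.
fn gantt_bars(records: &[Record], range_end: u64) -> Vec<Bar> {
    let mut open: HashMap<u32, (u8, u64)> = HashMap::new(); // txn -> (stage, start)
    let mut bars = Vec::new();
    for rec in records {
        match *rec {
            Record::Stage { cycle, txn_id, stage } => {
                if let Some((prev, start)) = open.insert(txn_id, (stage, cycle)) {
                    bars.push(Bar { txn_id, stage: prev, start, end: cycle });
                }
            }
            Record::Death { cycle, txn_id } => {
                if let Some((prev, start)) = open.remove(&txn_id) {
                    bars.push(Bar { txn_id, stage: prev, start, end: cycle });
                }
            }
        }
    }
    let mut tail: Vec<Bar> = open
        .into_iter()
        .map(|(txn_id, (stage, start))| Bar { txn_id, stage, start, end: range_end })
        .collect();
    tail.sort_by_key(|b| b.txn_id); // HashMap order is unspecified; sort for stable output
    bars.extend(tail);
    bars
}
```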

9.3 Topology View

Using the noc.topology scope property and src_port/dst_port fields from the transaction catalog:

  1. Render the interconnect topology (mesh grid, ring, tree, etc.)
  2. Animate transaction flow by mapping stage_transition events to router positions
  3. Color links by utilization (bytes per cycle divided by the link width in bytes, i.e. noc.data_width / 8)
  4. Highlight congestion hotspots using arb_decision contention data

For mesh topologies, map port IDs to (x, y) coordinates using noc.dim_x and noc.dim_y.
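
The text above does not fix how port IDs are numbered across the mesh. The sketch below assumes row-major numbering (port = y × dim_x + x) purely for illustration; a DUT that numbers ports differently would need a different mapping. `port_to_xy` is a hypothetical helper.

```rust
/// Map a port ID to (x, y) on a `dim_x` by `dim_y` mesh.
/// Row-major numbering (port = y * dim_x + x) is an assumption of this sketch.
/// Returns None for a degenerate mesh or a port outside the grid.
fn port_to_xy(port: u16, dim_x: u16, dim_y: u16) -> Option<(u16, u16)> {
    if dim_x == 0 || dim_y == 0 {
        return None;
    }
    let (x, y) = (port % dim_x, port / dim_x);
    if y < dim_y {
        Some((x, y))
    } else {
        None
    }
}
```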

9.4 Latency Histogram

Compute per-transaction latency from birth-to-death ticks in the transactions catalog. Group by opcode, src_port, dst_port, or address range for drill-down analysis.
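
As an illustration of the grouping step, the sketch below bins birth-to-death latencies per opcode using fixed-width bins. The `Completed` struct, the `latency_histogram` function, and the choice of fixed-width bins are assumptions of this example.

```rust
use std::collections::BTreeMap;

/// One completed transaction: catalog birth and death ticks plus its opcode label.
struct Completed {
    opcode: &'static str,
    birth: u64,
    death: u64,
}

/// Group latencies by opcode into fixed-width bins:
/// opcode -> (bin start -> transaction count). `bin_width` must be non-zero.
fn latency_histogram(
    txns: &[Completed],
    bin_width: u64,
) -> BTreeMap<&'static str, BTreeMap<u64, u32>> {
    assert!(bin_width > 0, "bin_width must be non-zero");
    let mut hist: BTreeMap<&'static str, BTreeMap<u64, u32>> = BTreeMap::new();
    for t in txns {
        let latency = t.death.saturating_sub(t.birth);
        let bin = latency / bin_width * bin_width; // round down to the bin start
        *hist.entry(t.opcode).or_default().entry(bin).or_insert(0) += 1;
    }
    hist
}
```

Swapping the outer key for src_port, dst_port, or an address range gives the other drill-down views with the same structure.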

9.5 Cross-Scope Stitching

  1. Find txn_handoff events across all noc scopes
  2. Join (src_scope, src_txn_id) to (dst_scope, dst_txn_id)
  3. Build end-to-end transaction timelines spanning multiple scopes
  4. Compute end-to-end latency by summing per-scope stage durations

9.6 Occupancy View

For each buffer, count valid slots per cycle. The mipmap summary (<name>_occ fields) gives this at coarse granularity; delta replay gives exact per-cycle values when zoomed in.

9.7 Counter Graphs

Read counter storages at each cycle frame (via DA_SLOT_ADD deltas). Compute rates (delta / cycles) for display. Mipmap summaries provide pre-aggregated values for zoomed-out views.
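
The rate computation is a difference of two cumulative readings divided by the window length. A minimal sketch, with `counter_rate` as a hypothetical helper:

```rust
/// Rate of a cumulative counter over a window: (value at c1 - value at c0)
/// divided by (c1 - c0). Returns None for an empty or inverted window.
fn counter_rate(value_at_c0: u64, value_at_c1: u64, c0: u64, c1: u64) -> Option<f64> {
    if c1 <= c0 {
        return None;
    }
    let delta = value_at_c1.saturating_sub(value_at_c0);
    Some(delta as f64 / (c1 - c0) as f64)
}
```

For a bytes_tx counter that grows from 1000 to 1640 between cycles 100 and 110, this gives 64 bytes per cycle.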


10. Examples

10.1 AXI4 Crossbar

A simple single-scope NoC tracing an AXI4 crossbar with 4 initiator ports and 2 target ports.

Scopes:
  /           (id=0, root,      protocol=none)
    properties: dut_name="axi_xbar"
  noc0        (id=1, parent=0,  protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="AXI4",
                noc.topology="crossbar", noc.data_width="64",
                noc.num_ports="6", clock.period_ps="1000",
                noc.pipeline_stages="ar_issue,route,arbitrate,transport,target_accept,r_data,r_last"

Enums:
  opcode:         read(0), write(1), read_linked(2), write_cond(3)
  pipeline_stage: ar_issue(0), route(1), arbitrate(2), transport(3),
                  target_accept(4), r_data(5), r_last(6)
  retry_reason:   target_busy(0), no_credits(1), vc_full(2), arb_lost(3)
  credit_direction: grant(0), consume(1)

Storages (all scope=noc0):
  transactions  (sparse, 64 slots):   txn_id:U32, opcode:ENUM(opcode), addr:U64,
                                      len:U16, size:U8, src_port:U16, dst_port:U16,
                                      qos:U8
  ort           (sparse, 32 slots):   txn_id:U32
  bytes_tx      (dense, 1 slot):      count:U64
  bytes_rx      (dense, 1 slot):      count:U64
  arb_conflicts (dense, 1 slot):      count:U64
  txn_completed (dense, 1 slot):      count:U64

Events (all scope=noc0):
  stage_transition: txn_id:U32, stage:ENUM(pipeline_stage)
  beat:             txn_id:U32, beat_num:U16, data_bytes:U16
  retry:            txn_id:U32, reason:ENUM(retry_reason)
  arb_decision:     winner_txn:U32, port:U16, num_contenders:U8
  link_credit:      port:U16, direction:ENUM(credit_direction), credits:U8
  annotate:         txn_id:U32, text:STRING_REF

10.2 CHI Mesh NoC

A 4x4 CHI mesh with per-router sub-scopes. The transaction catalog lives on the parent noc scope; router sub-scopes hold local buffers and counters.

Scopes:
  /                 (id=0,  root,       protocol=none)
    properties: dut_name="chi_mesh_soc"
  noc0              (id=1,  parent=0,   protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="CHI",
                noc.topology="mesh", noc.dim_x="4", noc.dim_y="4",
                noc.num_vcs="4", noc.data_width="256",
                clock.period_ps="500",
                noc.pipeline_stages="req_issue,req_accept,snoop_send,snoop_resp,dat_transfer,comp_ack"
  router_0_0        (id=2,  parent=1,   protocol="noc.router")
  router_0_1        (id=3,  parent=1,   protocol="noc.router")
  ...
  router_3_3        (id=17, parent=1,   protocol="noc.router")

Enums:
  opcode:         read_no_snp(0), read_once(1), read_shared(2), read_unique(3),
                  write_no_snp(4), write_unique(5), snoop_shared(6),
                  snoop_unique(7), comp_data(8), comp_ack(9)
  pipeline_stage: req_issue(0), req_accept(1), snoop_send(2),
                  snoop_resp(3), dat_transfer(4), comp_ack(5)
  retry_reason:   target_busy(0), no_credits(1), vc_full(2),
                  arb_lost(3), protocol_retry(4)
  txn_class:      req(0), snp(1), dat(2), rsp(3)
  credit_direction: grant(0), consume(1)

Storages (scope=noc0):
  transactions  (sparse, 256 slots):  txn_id:U32, opcode:ENUM(opcode), addr:U64,
                                      len:U16, size:U8, src_port:U16, dst_port:U16,
                                      qos:U8, txn_class:ENUM(txn_class)

Storages (scope=router_0_0, one set per router):
  vc_buf_n      (sparse, 4 slots):    txn_id:U32, vc:U8
  vc_buf_s      (sparse, 4 slots):    txn_id:U32, vc:U8
  vc_buf_e      (sparse, 4 slots):    txn_id:U32, vc:U8
  vc_buf_w      (sparse, 4 slots):    txn_id:U32, vc:U8
  vc_buf_local  (sparse, 4 slots):    txn_id:U32, vc:U8
  bytes_fwd     (dense, 1 slot):      count:U64
  arb_conflicts (dense, 1 slot):      count:U64

Events (scope=noc0):
  stage_transition: txn_id:U32, stage:ENUM(pipeline_stage)
  retry:            txn_id:U32, reason:ENUM(retry_reason)
  annotate:         txn_id:U32, text:STRING_REF

Events (scope=router_0_0, one set per router):
  arb_decision:     winner_txn:U32, port:U16, num_contenders:U8
  link_credit:      port:U16, direction:ENUM(credit_direction), credits:U8

The viewer discovers all 16 routers as noc.router children of noc0, maps them to a 4x4 grid via noc.dim_x/noc.dim_y, and renders per-router buffer occupancy alongside the global transaction Gantt chart.

10.3 Multi-Chiplet with D2D

Two chiplets connected via a UCIe D2D link. Each chiplet has its own noc scope with an independent transaction catalog. The txn_handoff event on the SoC-level scope stitches transactions across the link.

Scopes:
  /                       (id=0, root,       protocol=none)
    properties: dut_name="multi_chiplet_soc"
  chiplet0                (id=1, parent=0,   protocol=none)
  chiplet0_noc            (id=2, parent=1,   protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="CHI",
                noc.topology="mesh", noc.dim_x="4", noc.dim_y="4",
                noc.pipeline_stages="req_issue,req_accept,dat_transfer,comp_ack",
                clock.period_ps="500"
  chiplet1                (id=3, parent=0,   protocol=none)
  chiplet1_noc            (id=4, parent=3,   protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="CHI",
                noc.topology="mesh", noc.dim_x="2", noc.dim_y="2",
                noc.pipeline_stages="req_issue,req_accept,dat_transfer,comp_ack",
                clock.period_ps="500"
  d2d_link                (id=5, parent=0,   protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="UCIe",
                noc.topology="p2p",
                noc.pipeline_stages="d2d_issue,phy_encode,link_traverse,phy_decode,d2d_deliver",
                clock.period_ps="500"

Storages:
  transactions (scope=chiplet0_noc, sparse, 256): txn_id:U32, opcode:ENUM, addr:U64,
                                                   len:U16, size:U8, src_port:U16, dst_port:U16
  transactions (scope=chiplet1_noc, sparse, 128): txn_id:U32, opcode:ENUM, addr:U64,
                                                   len:U16, size:U8, src_port:U16, dst_port:U16
  transactions (scope=d2d_link, sparse, 32):      txn_id:U32, opcode:ENUM, addr:U64,
                                                   len:U16, size:U8, src_port:U16, dst_port:U16

Events (scope=root):
  txn_handoff:  src_scope:U16, src_txn_id:U32, dst_scope:U16, dst_txn_id:U32

Handoff sequence for a cross-chiplet read:

  1. Chiplet 0 issues a read → transactions[42] in chiplet0_noc
  2. The read reaches the D2D egress port → DA_SLOT_CLEAR on chiplet0_noc.transactions[42]
  3. D2D link picks it up → transactions[7] in d2d_link
  4. Root scope emits txn_handoff(src_scope=2, src_txn_id=42, dst_scope=5, dst_txn_id=7)
  5. D2D link delivers to chiplet 1 → DA_SLOT_CLEAR on d2d_link.transactions[7]
  6. Chiplet 1 ingests the read → transactions[19] in chiplet1_noc
  7. Root scope emits txn_handoff(src_scope=5, src_txn_id=7, dst_scope=4, dst_txn_id=19)
  8. The viewer chains: chiplet0_noc:42 → d2d_link:7 → chiplet1_noc:19 and computes end-to-end latency
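
The chaining in step 8 can be sketched as a lookup from each (scope, txn_id) pair to its successor. This illustration uses hypothetical names (`TxnRef`, `chain_from`) and deliberately ignores that slot-based transaction IDs are reused over time; a real viewer would also need to match on event time.

```rust
use std::collections::HashMap;

type TxnRef = (u16, u32); // (scope_id, txn_id)

/// Follow handoff links starting from `start` and return the full chain.
/// `handoffs` holds (source, destination) pairs decoded from txn_handoff events.
/// The length check stops the walk if a malformed trace contains a cycle.
fn chain_from(handoffs: &[(TxnRef, TxnRef)], start: TxnRef) -> Vec<TxnRef> {
    let next: HashMap<TxnRef, TxnRef> = handoffs.iter().copied().collect();
    let mut chain = vec![start];
    let mut cur = start;
    while let Some(&dst) = next.get(&cur) {
        if chain.len() > handoffs.len() {
            break;
        }
        chain.push(dst);
        cur = dst;
    }
    chain
}
```

With the two handoff events from the sequence above, starting at (2, 42) yields (2, 42) → (5, 7) → (4, 19).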

11. Version History

| Version | Date | Changes |
|---|---|---|
| 0.1 | 2026-xx-xx | Initial draft |

Rust Crate API Reference

Crate: uscope
Location: crates/uscope/


1. Overview

The uscope Rust crate provides a complete reader and writer for the µScope trace format. It implements the transport layer (file header, preamble, schema, segments, checkpoints, deltas, string table, section table) and the CPU protocol layer (entity catalog, pipeline stages, typed events).

Dependencies

| Crate | Purpose |
|---|---|
| byteorder | Little-endian integer read/write |
| lz4_flex | Pure-Rust LZ4 compression |

No other runtime dependencies.


2. Schema Building

Use SchemaBuilder and DutDescBuilder to define the trace structure before writing.

2.1 SchemaBuilder

#![allow(unused)]
fn main() {
use uscope::schema::{SchemaBuilder, FieldSpec};
use uscope::types::SF_SPARSE;

let mut sb = SchemaBuilder::new();

// Clock domain: 5 GHz (200 ps period)
let clk = sb.clock_domain("core_clk", 200);

// Scope hierarchy
sb.scope("root", None, None, None);
let scope = sb.scope("core0", Some(0), Some("cpu"), Some(clk));

// Enum type
let stage_enum = sb.enum_type(
    "pipeline_stage",
    &["fetch", "decode", "execute", "writeback"],
);

// Storage (entity catalog)
let entities = sb.storage(
    "entities", scope, 512, SF_SPARSE,
    &[
        ("entity_id", FieldSpec::U32),
        ("pc",        FieldSpec::U64),
        ("inst_bits", FieldSpec::U32),
    ],
);

// Event type
let stage_ev = sb.event(
    "stage_transition", scope,
    &[
        ("entity_id", FieldSpec::U32),
        ("stage",     FieldSpec::Enum(stage_enum)),
    ],
);

let schema = sb.build();
}

Methods:

| Method | Returns | Description |
|---|---|---|
| clock_domain(name, period_ps) | u8 | Add a clock domain |
| scope(name, parent, protocol, clock_id) | u16 | Add a scope |
| enum_type(name, values) | u8 | Add an enum type |
| storage(name, scope, slots, flags, fields) | u16 | Add a storage definition |
| event(name, scope, fields) | u16 | Add an event type |
| summary_field(name, type, scope) | | Add a summary field |
| strings_mut() | &mut StringPoolBuilder | Access the string pool |
| build() | Schema | Consume builder, produce schema |

2.2 DutDescBuilder

#![allow(unused)]
fn main() {
use uscope::schema::DutDescBuilder;

let mut dut = DutDescBuilder::new();
dut.property("dut_name", "boom_core_0")
   .property("cpu.isa", "RV64GC")
   .property("cpu.pipeline_stages", "fetch,decode,execute,writeback");

// Build using the schema's shared string pool.
// Do this before calling sb.build(), which consumes the SchemaBuilder.
let dut_desc = dut.build(sb.strings_mut());
}

2.3 FieldSpec

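| Variant | Wire type | Size (bytes) |
|---|---|---|
| FieldSpec::U8 | FT_U8 | 1 |
| FieldSpec::U16 | FT_U16 | 2 |
| FieldSpec::U32 | FT_U32 | 4 |
| FieldSpec::U64 | FT_U64 | 8 |
| FieldSpec::I8 | FT_I8 | 1 |
| FieldSpec::I16 | FT_I16 | 2 |
| FieldSpec::I32 | FT_I32 | 4 |
| FieldSpec::I64 | FT_I64 | 8 |
| FieldSpec::Bool | FT_BOOL | 1 |
| FieldSpec::StringRef | FT_STRING_REF | 4 |
| FieldSpec::Enum(id) | FT_ENUM | 1 |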
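
Given the sizes in the table, a record layout can be derived by accumulating field sizes. The sketch below uses a stand-in `Field` enum rather than the crate's FieldSpec, and it assumes fields are packed back to back in declaration order with no padding, which this reference does not state for storage slots.

```rust
/// Stand-in for the crate's FieldSpec, reduced to what the size table needs.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Field {
    U8, U16, U32, U64, I8, I16, I32, I64, Bool, StringRef, Enum,
}

/// Wire size in bytes, taken from the table above.
fn size_of_field(f: Field) -> usize {
    match f {
        Field::U8 | Field::I8 | Field::Bool | Field::Enum => 1,
        Field::U16 | Field::I16 => 2,
        Field::U32 | Field::I32 | Field::StringRef => 4,
        Field::U64 | Field::I64 => 8,
    }
}

/// Byte offset of each field plus the total record size.
/// Assumption: fields are packed in declaration order with no padding.
fn packed_offsets(fields: &[Field]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut pos = 0;
    for &f in fields {
        offsets.push(pos);
        pos += size_of_field(f);
    }
    (offsets, pos)
}
```

Under that assumption, the entities storage from §2.1 (entity_id: U32, pc: U64, inst_bits: U32) would have offsets 0, 4, and 12 and a 16-byte record.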

3. Writer

Writer<W> writes µScope trace files in streaming, append-only fashion.

3.1 Creating a Writer

#![allow(unused)]
fn main() {
use uscope::writer::Writer;
use std::fs::File;

let file = File::create("trace.uscope")?;
let mut w = Writer::create(file, &dut_desc, &schema, checkpoint_interval_ps)?;
}

The checkpoint_interval_ps parameter controls how often a full checkpoint is written. Smaller intervals allow faster random-access seeks at the cost of larger files.

3.2 Writing Cycles

All storage mutations and events must occur within a begin_cycle / end_cycle pair. Time must be monotonically non-decreasing.

#![allow(unused)]
fn main() {
w.begin_cycle(time_ps);

// Mutate storage slots
w.slot_set(storage_id, slot, field, value);
w.slot_add(storage_id, slot, field, delta);
w.slot_clear(storage_id, slot);

// Emit events (payload is pre-serialized, fields concatenated LE)
w.event(event_type_id, &payload_bytes);

w.end_cycle()?;
}

| Method | Description |
|---|---|
| begin_cycle(time_ps) | Start a cycle frame at the given time |
| slot_set(storage, slot, field, value) | Set a field value (marks slot valid) |
| slot_clear(storage, slot) | Mark slot invalid (sparse only) |
| slot_add(storage, slot, field, delta) | Add to a field value |
| event(type_id, payload) | Emit an event with raw payload |
| end_cycle() | Finish the cycle frame |
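
Since event() takes a pre-serialized payload, the caller is responsible for concatenating the fields in little-endian order. As a sketch, the stage_transition event from §2.1 (entity_id: U32, stage: Enum) would serialize to 5 bytes; the helper names below are hypothetical and not part of the crate.

```rust
/// Serialize a stage_transition payload: entity_id as a little-endian U32
/// followed by the stage as a 1-byte enum value, with no padding.
fn encode_stage_transition(entity_id: u32, stage: u8) -> Vec<u8> {
    let mut buf = Vec::with_capacity(5);
    buf.extend_from_slice(&entity_id.to_le_bytes());
    buf.push(stage);
    buf
}

/// Inverse of the encoder; returns None if the payload is not exactly 5 bytes.
fn decode_stage_transition(payload: &[u8]) -> Option<(u32, u8)> {
    if payload.len() != 5 {
        return None;
    }
    let id = u32::from_le_bytes([payload[0], payload[1], payload[2], payload[3]]);
    Some((id, payload[4]))
}
```

The encoded bytes would be passed as the payload argument of event(); the decoder shows the matching interpretation of ev.payload on the reader side.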

3.3 String Table

For STRING_REF fields, insert strings into the writer's string table:

#![allow(unused)]
fn main() {
let text_idx = w.string_table.insert("addi x0, x0, 0");
// Use text_idx as the u32 value for a STRING_REF field in event payloads
}

3.4 Finalization

#![allow(unused)]
fn main() {
let file = w.close()?;  // Writes string table, segment table, section table
}

Calling close() writes the string table, segment table, and section table, sets F_COMPLETE, and returns the underlying writer (here, the File). The file is then readable by Reader.


4. Reader

Reader opens µScope trace files for random-access reading.

4.1 Opening a File

#![allow(unused)]
fn main() {
use uscope::reader::Reader;

let mut r = Reader::open("trace.uscope")?;
}

Reader::open handles both finalized (F_COMPLETE) and in-progress files. For finalized files, the section table is used for fast segment lookup. For in-progress files, the segment chain is walked from tail_offset.

4.2 Metadata Access

#![allow(unused)]
fn main() {
let header = r.header();           // FileHeader
let schema = r.schema();           // Schema (clock domains, scopes, storages, events)
let dut = r.dut_desc();            // DutDesc (key-value properties)
let config = r.trace_config();     // TraceConfig (checkpoint_interval_ps)
let offsets = r.field_offsets();    // Precomputed field offsets per storage

// Look up a DUT property by key
let isa = r.dut_property("cpu.isa");  // Some("RV64GC")

// String table (for STRING_REF field values)
if let Some(st) = r.string_table() {
    let text = st.get(0);  // Some("addi x0, x0, 0")
}
}

4.3 State Reconstruction

Reconstruct the full storage state at any point in time. The reader finds the appropriate segment, loads its checkpoint, and replays deltas up to the target time.

#![allow(unused)]
fn main() {
let state = r.state_at(time_ps)?;

// Query storage state
let valid = state.slot_valid(storage_id, slot);
let value = state.slot_field(storage_id, slot, field_index, &offsets[storage_id]);
}
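
To make the checkpoint-plus-replay idea concrete, here is a toy model that is independent of the crate. All types are hypothetical, each slot holds a single value, and deltas at exactly the target time are applied, which matches the behavior the write-read example in §6 relies on.

```rust
use std::collections::BTreeMap;

type Slots = BTreeMap<u32, u64>; // valid slot index -> single field value

enum Op {
    Set(u32, u64),
    Add(u32, u64),
    Clear(u32),
}

/// Toy segment: a checkpoint taken at `start_ps` plus time-ordered deltas.
struct Segment {
    start_ps: u64,
    checkpoint: Slots,
    deltas: Vec<(u64, Op)>,
}

/// Reconstruct state at `time_ps`: pick the last segment starting at or before
/// the target, clone its checkpoint, then apply deltas with time <= target.
fn state_at(segments: &[Segment], time_ps: u64) -> Option<Slots> {
    // `segments` is assumed sorted by start_ps, so this is a binary search.
    let idx = segments.partition_point(|s| s.start_ps <= time_ps);
    let seg = segments.get(idx.checked_sub(1)?)?;
    let mut state = seg.checkpoint.clone();
    for (t, op) in &seg.deltas {
        if *t > time_ps {
            break;
        }
        match *op {
            Op::Set(slot, v) => { state.insert(slot, v); }
            Op::Add(slot, d) => {
                let e = state.entry(slot).or_insert(0);
                *e = e.wrapping_add(d);
            }
            Op::Clear(slot) => { state.remove(&slot); }
        }
    }
    Some(state)
}
```

The cost of a query is bounded by the number of deltas in one segment, which is why a smaller checkpoint interval makes seeks faster.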

4.4 Event Queries

#![allow(unused)]
fn main() {
let events = r.events_in_range(t0_ps, t1_ps)?;
for ev in &events {
    println!("t={} type={} payload={:?}", ev.time_ps, ev.event_type_id, ev.payload);
}
}

4.5 Segment-Level Access

#![allow(unused)]
fn main() {
let n = r.segment_count();
let (storages, events, ops) = r.segment_replay(seg_idx)?;
}

segment_replay returns the storage state obtained by replaying all of the segment's deltas on top of its checkpoint, plus all events and storage operations (TimedOp) in the segment.

4.6 Live Tailing

For traces being written concurrently:

#![allow(unused)]
fn main() {
loop {
    if r.poll_new_segments()? {
        // New segments available — re-query events or state
    }
    std::thread::sleep(std::time::Duration::from_millis(100));
}
}

5. CPU Protocol Helpers

The protocols::cpu module provides higher-level APIs that implement the CPU protocol conventions on top of the transport-layer primitives.

5.1 CpuSchemaBuilder

Constructs a complete CPU-protocol schema with all standard enums, storages, and events.

#![allow(unused)]
fn main() {
use uscope::protocols::cpu::CpuSchemaBuilder;
use uscope::schema::FieldSpec;

let (dut_builder, mut schema_builder, ids) = CpuSchemaBuilder::new("core0")
    .isa("RV64GC")
    .pipeline_stages(&["fetch", "decode", "rename", "dispatch",
                        "issue", "execute", "complete", "retire"])
    .fetch_width(4)
    .commit_width(4)
    .entity_slots(512)
    .buffer("rob", 256, &[("completed", FieldSpec::Bool)])
    .buffer("iq_int", 48, &[])
    .counter("committed_insns")
    .counter("bp_misses")
    .build();

let dut = dut_builder.build(schema_builder.strings_mut());
let schema = schema_builder.build();
}

Builder methods:

| Method | Description |
|---|---|
| isa(name) | Set ISA (e.g. "RV64GC") |
| pipeline_stages(names) | Define pipeline stage enum |
| fetch_width(n) | Set fetch width DUT property |
| commit_width(n) | Set commit width DUT property |
| entity_slots(n) | Max in-flight entities (default: 512) |
| elf_path(path) | Set ELF path for disassembly |
| vendor(name) | Set vendor DUT property |
| buffer(name, slots, fields) | Add a hardware buffer storage |
| counter(name) | Add a counter (1-slot dense storage) |
| stall_reasons(names) | Override default stall reason enum |

CpuIds — returned by build(), contains all assigned IDs:

| Field | Type | Description |
|---|---|---|
| scope_id | u16 | CPU scope ID |
| entities_storage_id | u16 | Entity catalog storage ID |
| stage_transition_event_id | u16 | Stage transition event type |
| annotate_event_id | u16 | Annotation event type |
| dependency_event_id | u16 | Dependency event type |
| flush_event_id | u16 | Flush event type |
| stall_event_id | u16 | Stall event type |
| field_entity_id | u16 | Field index: entity_id |
| field_pc | u16 | Field index: pc |
| field_inst_bits | u16 | Field index: inst_bits |
| buffers | Vec<(String, u16)> | Buffer (name, storage_id) pairs |
| counters | Vec<(String, u16, u16)> | Counter (name, storage_id, field) triples |

5.2 CpuWriter

Typed helpers that emit the correct transport-layer operations for CPU protocol semantics.

#![allow(unused)]
fn main() {
use uscope::protocols::cpu::CpuWriter;

let cpu = CpuWriter::new(ids);

w.begin_cycle(time_ps);

// Fetch: allocate entity in catalog
cpu.fetch(&mut w, entity_id, pc, inst_bits);

// Stage transition
cpu.stage_transition(&mut w, entity_id, stage_index);

// Retire: clear entity from catalog
cpu.retire(&mut w, entity_id);

// Flush: emit flush event + clear entity
cpu.flush(&mut w, entity_id, reason);

// Annotation: insert text into string table + emit event
cpu.annotate(&mut w, entity_id, "decoded: addi x1, x0, 1");

// Dependency: record data/structural dependency
cpu.dependency(&mut w, src_entity, dst_entity, dep_type);

// Stall
cpu.stall(&mut w, reason);

// Counter increment
cpu.counter_add(&mut w, "committed_insns", 1);

w.end_cycle()?;
}

| Method | Transport ops | Description |
|---|---|---|
| fetch(w, id, pc, bits) | 3 × slot_set | Allocate entity |
| stage_transition(w, id, stage) | 1 × event | Pipeline stage change |
| retire(w, id) | 1 × slot_clear | Normal retirement |
| flush(w, id, reason) | 1 × event + 1 × slot_clear | Squash |
| annotate(w, id, text) | 1 × string_insert + 1 × event | Text annotation |
| dependency(w, src, dst, type) | 1 × event | Data or structural dependency |
| stall(w, reason) | 1 × event | Pipeline stall |
| counter_add(w, name, delta) | 1 × slot_add | Increment counter |

6. Example: Full Write-Read Cycle

#![allow(unused)]
fn main() {
use uscope::protocols::cpu::{CpuSchemaBuilder, CpuWriter};
use uscope::writer::Writer;
use uscope::reader::Reader;
use std::fs::File;

// Build schema
let (dut_builder, mut sb, ids) = CpuSchemaBuilder::new("core0")
    .isa("RV64GC")
    .pipeline_stages(&["fetch", "decode", "execute", "writeback"])
    .entity_slots(16)
    .build();

let dut = dut_builder.build(sb.strings_mut());
let schema = sb.build();

// Write
let file = File::create("trace.uscope").unwrap();
let mut w = Writer::create(file, &dut, &schema, 10_000).unwrap();
let cpu = CpuWriter::new(ids.clone());

w.begin_cycle(0);
cpu.fetch(&mut w, 0, 0x8000_0000, 0x13);
cpu.stage_transition(&mut w, 0, 0);
w.end_cycle().unwrap();

w.begin_cycle(1000);
cpu.stage_transition(&mut w, 0, 1);
w.end_cycle().unwrap();

w.begin_cycle(2000);
cpu.stage_transition(&mut w, 0, 2);
w.end_cycle().unwrap();

w.begin_cycle(3000);
cpu.stage_transition(&mut w, 0, 3);
cpu.retire(&mut w, 0);
w.end_cycle().unwrap();

w.close().unwrap();

// Read
let mut r = Reader::open("trace.uscope").unwrap();
assert_eq!(r.header().total_time_ps, 3000);

let state = r.state_at(1500).unwrap();
assert!(state.slot_valid(ids.entities_storage_id, 0)); // still in-flight

let state = r.state_at(3000).unwrap();
assert!(!state.slot_valid(ids.entities_storage_id, 0)); // retired

let events = r.events_in_range(0, 3000).unwrap();
assert_eq!(events.len(), 4); // 4 stage transitions
}

uscope-cpu: CPU Protocol Library

Crate: uscope-cpu
Location: crates/uscope-cpu/


Overview

The uscope-cpu crate provides the CPU protocol interpretation layer on top of the uscope transport crate. It understands instruction lifecycles, pipeline stages, performance counters, and hardware buffers — concepts that the transport layer treats as opaque storages and events.

Architecture

uscope-cpu (this crate)          uscope (transport)
┌──────────────────────┐         ┌─────────────────┐
│ CpuTrace             │────────▶│ Reader           │
│  - instructions      │         │  - state_at()    │
│  - stages            │         │  - segment_replay│
│  - counters          │         │  - schema()      │
│  - buffers           │         └─────────────────┘
│  - lazy loading      │
│  - performance stats │
└──────────────────────┘

Dependencies

| Crate | Purpose |
|---|---|
| uscope | Transport layer (Reader, Schema, state reconstruction) |
| instruction-decoder | RISC-V ISA decode (optional, behind decode feature) |

CpuTrace

The main entry point. Opens a trace file, resolves the CPU protocol schema, and provides query methods.

Opening a trace

use uscope_cpu::CpuTrace;

let mut trace = CpuTrace::open("trace.uscope")?;

// File overview
let info = trace.file_info();
println!("Version: {}.{}", info.version_major, info.version_minor);
println!("Segments: {}", info.num_segments);
println!("Max cycle: {}", trace.max_cycle());
println!("Period: {} ps", trace.period_ps());

// Schema access
for (name, _) in trace.counter_names() {
    println!("Counter: {}", name);
}
for buf in trace.buffer_infos() {
    println!("Buffer: {} ({} slots)", buf.name, buf.capacity);
}

Counter queries

// Cumulative value at a cycle
let val = trace.counter_value_at(0, 100);

// Rate over a window (instructions per cycle)
let ipc = trace.counter_rate_at(0, 100, 64);

// Single-cycle delta
let delta = trace.counter_delta_at(0, 100);

// Downsample for sparkline rendering (min/max envelope)
let data = trace.counter_downsample(0, 0, 1000, 200);
for (min_rate, max_rate) in &data {
    // render bar from min to max
}
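
The relationship between the cumulative value, the single-cycle delta, the windowed rate, and the min/max envelope can be sketched on a plain sample list. This is an illustrative model only, not the crate's implementation: the function names, the "last sample at or before the cycle" lookup, and the bucket layout are assumptions.

```rust
/// Cumulative counter value at `cycle`: the last sample at or before it, or 0 if none.
/// `samples` is assumed to be sorted by cycle, like `CounterSeries::samples`.
fn value_at(samples: &[(u32, u64)], cycle: u32) -> u64 {
    // Number of samples whose cycle is <= the requested cycle.
    let n = samples.partition_point(|&(c, _)| c <= cycle);
    if n == 0 { 0 } else { samples[n - 1].1 }
}

/// Increase of the counter during exactly one cycle.
fn delta_at(samples: &[(u32, u64)], cycle: u32) -> u64 {
    let prev = if cycle == 0 { 0 } else { value_at(samples, cycle - 1) };
    value_at(samples, cycle).saturating_sub(prev)
}

/// Average increase per cycle over the `window` cycles ending at `cycle`.
fn rate_at(samples: &[(u32, u64)], cycle: u32, window: u32) -> f64 {
    let start = cycle.saturating_sub(window);
    let span = cycle - start;
    if span == 0 {
        return 0.0;
    }
    let delta = value_at(samples, cycle).saturating_sub(value_at(samples, start));
    delta as f64 / span as f64
}

/// Min/max envelope of the per-cycle delta over [start, end), split into
/// at most `buckets` buckets of equal width (the last one may be shorter).
fn downsample(samples: &[(u32, u64)], start: u32, end: u32, buckets: u32) -> Vec<(f64, f64)> {
    if buckets == 0 || end <= start {
        return Vec::new();
    }
    // Cycles per bucket, rounded up.
    let width = (end - start - 1) / buckets + 1;
    let mut out = Vec::new();
    let mut lo = start;
    while lo < end {
        let hi = lo.saturating_add(width).min(end);
        let (mut min, mut max) = (f64::INFINITY, f64::NEG_INFINITY);
        for c in lo..hi {
            let d = delta_at(samples, c) as f64;
            min = min.min(d);
            max = max.max(d);
        }
        out.push((min, max));
        lo = hi;
    }
    out
}
```

With samples `[(0, 1), (1, 2), (2, 2), (3, 5)]`, the rate over the two cycles ending at cycle 3 is 1.5, and downsampling cycles 0 to 4 into two buckets gives the envelopes (1, 1) and (0, 3).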

Buffer state

let state = trace.buffer_state_at(0, 50)?;
println!("Capacity: {}", state.capacity);

// Occupied slots
for slot in &state.slots {
    println!("Slot 0x{:02x}: entity_id={}", slot.0, slot.1[0]);
}

// Storage-level properties (pointer pairs)
for prop in &state.properties {
    println!("{}: {} (role={}, pair_id={})",
        prop.name, prop.value, prop.role, prop.pair_id);
}

Lazy segment loading

// Load specific segments (instruction/stage data)
let result = trace.load_segments(&[0, 1, 2])?;
println!("Loaded {} instructions", result.instructions.len());

// Or load segments covering a cycle range
let loaded = trace.ensure_loaded(100, 200);

Metadata

for (key, value) in trace.metadata() {
    println!("{}: {}", key, value);
}

Types

InstructionData

use std::ops::Range;

pub struct InstructionData {
    pub id: u32,              // Entity ID
    pub sim_id: u64,          // Simulator-assigned ID
    pub thread_id: u16,
    pub rbid: Option<u32>,    // Retire buffer slot
    pub iq_id: Option<u32>,   // Issue queue ID
    pub dq_id: Option<u32>,   // Dispatch queue ID
    pub ready_cycle: Option<u32>,
    pub pc: u64,
    pub disasm: String,
    pub tooltip: String,
    pub stage_range: Range<u32>,  // Index range into stages vec
    pub retire_status: RetireStatus,
    pub first_cycle: u32,
    pub last_cycle: u32,
}

StageSpan

pub struct StageSpan {
    pub stage_name_idx: u16,  // Index into stage name table
    pub lane: u16,
    pub start_cycle: u32,
    pub end_cycle: u32,
}
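
As a rough illustration of how `InstructionData::stage_range` and `StageSpan` fit together, the sketch below slices a shared span vector by an instruction's range and resolves stage names. The helper names are hypothetical, and the duration formula assumes `end_cycle` is exclusive, which the text above does not state.

```rust
use std::ops::Range;

// Mirrors the StageSpan struct documented above.
pub struct StageSpan {
    pub stage_name_idx: u16,
    pub lane: u16,
    pub start_cycle: u32,
    pub end_cycle: u32,
}

/// Stage spans belonging to one instruction, selected by its `stage_range`.
/// Returns an empty slice if the range does not fit the vector.
fn spans_for<'a>(stages: &'a [StageSpan], range: &Range<u32>) -> &'a [StageSpan] {
    stages
        .get(range.start as usize..range.end as usize)
        .unwrap_or(&[])
}

/// (stage name, duration in cycles) for each span.
/// ASSUMPTION: `end_cycle` is exclusive, so duration = end_cycle - start_cycle.
fn stage_durations(spans: &[StageSpan], stage_names: &[&str]) -> Vec<(String, u32)> {
    spans
        .iter()
        .map(|s| {
            let name = stage_names
                .get(s.stage_name_idx as usize)
                .copied()
                .unwrap_or("?"); // unknown index: placeholder name
            (name.to_string(), s.end_cycle.saturating_sub(s.start_cycle))
        })
        .collect()
}
```

Keeping all spans in one vector and storing only a range per instruction avoids one allocation per instruction, which is presumably why the struct holds an index range and not a `Vec`.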

BufferInfo

pub struct BufferInfo {
    pub name: String,
    pub storage_id: u16,
    pub capacity: u16,
    pub fields: Vec<(String, u8)>,
    pub properties: Vec<BufferPropertyDef>,
}

pub struct BufferPropertyDef {
    pub name: String,
    pub field_type: u8,
    pub role: u8,     // 0=plain, 1=HEAD_PTR, 2=TAIL_PTR
    pub pair_id: u8,  // Groups head/tail into pairs
}
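
The `role` and `pair_id` fields suggest how a viewer might reassemble head/tail pointer pairs from a flat property list. The sketch below is one way to do it under stated assumptions: `PropertyValue`, `PointerPair`, and both functions are hypothetical, property values are modeled as `u64`, and the occupancy formula assumes a circular buffer whose tail points at the next free slot.

```rust
use std::collections::BTreeMap;

const ROLE_HEAD_PTR: u8 = 1;
const ROLE_TAIL_PTR: u8 = 2;

// Hypothetical runtime view of one storage property (name, value, role, pair_id).
#[derive(Debug, Clone, PartialEq)]
struct PropertyValue {
    name: String,
    value: u64,
    role: u8,
    pair_id: u8,
}

#[derive(Debug, Default, PartialEq)]
struct PointerPair {
    head: Option<u64>,
    tail: Option<u64>,
}

/// Group HEAD_PTR / TAIL_PTR properties by `pair_id`.
fn pointer_pairs(props: &[PropertyValue]) -> BTreeMap<u8, PointerPair> {
    let mut pairs: BTreeMap<u8, PointerPair> = BTreeMap::new();
    for p in props {
        match p.role {
            ROLE_HEAD_PTR => pairs.entry(p.pair_id).or_default().head = Some(p.value),
            ROLE_TAIL_PTR => pairs.entry(p.pair_id).or_default().tail = Some(p.value),
            _ => {} // role 0: plain property, not part of a pair
        }
    }
    pairs
}

/// Entries between head and tail in a circular buffer.
/// ASSUMPTION: tail is the next free slot; head == tail is reported as empty.
fn ring_occupancy(head: u64, tail: u64, capacity: u64) -> u64 {
    if capacity == 0 {
        return 0;
    }
    (tail % capacity + capacity - head % capacity) % capacity
}
```

Because `head == tail` is ambiguous between empty and full, counting the valid slots in the buffer state is the more reliable occupancy measure; the pointer arithmetic is mainly useful for drawing the pointers.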

CounterSeries

pub struct CounterSeries {
    pub name: String,
    pub samples: Vec<(u32, u64)>,  // (cycle, cumulative_value)
    pub default_mode: CounterDisplayMode,
}

SegmentIndex

pub struct SegmentIndex {
    pub segments: Vec<(u32, u32)>,  // (start_cycle, end_cycle)
}

impl SegmentIndex {
    // Indices of the segments that overlap the given cycle range
    // (signature only; body omitted).
    pub fn segments_in_range(&self, start: u32, end: u32) -> Vec<usize>;
}
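
A possible body for `segments_in_range` is an interval-overlap filter over the segment list. The sketch below treats both the segment bounds and the query bounds as inclusive, which is an assumption; the real crate may use half-open ranges or a binary search.

```rust
pub struct SegmentIndex {
    pub segments: Vec<(u32, u32)>, // (start_cycle, end_cycle)
}

impl SegmentIndex {
    /// Indices of all segments that overlap [start, end].
    /// ASSUMPTION: both bounds are inclusive, for segments and for the query.
    pub fn segments_in_range(&self, start: u32, end: u32) -> Vec<usize> {
        self.segments
            .iter()
            .enumerate()
            // Two closed intervals overlap when each starts before the other ends.
            .filter(|(_, seg)| seg.0 <= end && seg.1 >= start)
            .map(|(i, _)| i)
            .collect()
    }
}
```

A lazy loader can pass the returned indices straight to a segment-loading call and skip the ones already resident.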

Feature Flags

| Feature | Default | Description |
|---|---|---|
| decode | yes | RISC-V instruction decode via instruction-decoder |

C DPI Library API Reference

Header: uscope_dpi.h Location: dpi/


1. Overview

The C DPI library is a standalone, write-only µScope trace library designed for integration with hardware simulators via DPI (Direct Programming Interface). It produces trace files that are binary-compatible with the Rust reader.

Design Principles

  • Single .c + .h (plus vendored LZ4) — easy to integrate
  • C99 — compiles with any standard C compiler
  • No dynamic allocation during per-cycle operations — pre-allocated buffers
  • Write-only — no reader (use the Rust crate for reading)
  • Zero Rust dependency — fully self-contained

Building

make -C dpi            # builds libuscope_dpi.a
make -C dpi test       # builds and runs the test program

Link with -luscope_dpi (or include uscope_dpi.c and lz4.c directly).


2. Schema Building

Before opening a writer, define the trace schema.

2.1 Create / Free

uscope_schema_def_t *schema = uscope_schema_new();
// ... add clocks, scopes, enums, storages, events ...
// Schema is consumed by uscope_writer_open() — do not free after open.
// If not opening a writer, free with:
uscope_schema_free(schema);

2.2 Clock Domains

uint8_t clk = uscope_schema_add_clock(schema, "core_clk", 1000); // 1 GHz

| Parameter | Type | Description |
|---|---|---|
| name | const char * | Clock name |
| period_ps | uint32_t | Period in picoseconds |
| Returns | uint8_t | Clock domain ID |

2.3 Scopes

uscope_schema_add_scope(schema, "root", 0xFFFF, NULL, 0xFF);
uint16_t scope = uscope_schema_add_scope(schema, "core0", 0, "cpu", clk);

| Parameter | Type | Description |
|---|---|---|
| name | const char * | Scope name |
| parent | uint16_t | Parent scope ID (0xFFFF = no parent, i.e. this is the root scope) |
| protocol | const char * | Protocol name (NULL = none) |
| clock_id | uint8_t | Clock domain (0xFF = inherit) |
| Returns | uint16_t | Scope ID |

2.4 Enums

const char *stages[] = {"fetch", "decode", "execute", "writeback"};
uint8_t stage_enum = uscope_schema_add_enum(schema, "pipeline_stage", stages, 4);

2.5 Storages

Fields are passed as parallel arrays of names, types, and enum IDs.

const char  *fields[]    = {"entity_id", "pc",          "inst_bits"};
uint8_t      types[]     = {USCOPE_FT_U32, USCOPE_FT_U64, USCOPE_FT_U32};
uint8_t      enum_ids[]  = {0,             0,              0};

uint16_t entities = uscope_schema_add_storage(
    schema, "entities", scope, /*num_slots=*/512, USCOPE_SF_SPARSE,
    /*num_fields=*/3, fields, types, enum_ids);

| Parameter | Type | Description |
|---|---|---|
| name | const char * | Storage name |
| scope_id | uint16_t | Owning scope |
| num_slots | uint16_t | Number of slots |
| flags | uint16_t | USCOPE_SF_SPARSE or 0 (dense) |
| num_fields | uint16_t | Number of fields |
| field_names | const char ** | Field name array |
| field_types | const uint8_t * | Field type array |
| field_enum_ids | const uint8_t * | Enum ID array (or NULL) |
| Returns | uint16_t | Storage ID |

2.6 Events

const char  *st_fields[] = {"entity_id",    "stage"};
uint8_t      st_types[]  = {USCOPE_FT_U32,  USCOPE_FT_ENUM};
uint8_t      st_enums[]  = {0,              stage_enum};

uint16_t st_event = uscope_schema_add_event(
    schema, "stage_transition", scope,
    /*num_fields=*/2, st_fields, st_types, st_enums);

3. Field Type Constants

| Constant | Value | Size (bytes) | Description |
|---|---|---|---|
| USCOPE_FT_U8 | 0x01 | 1 | Unsigned 8-bit |
| USCOPE_FT_U16 | 0x02 | 2 | Unsigned 16-bit |
| USCOPE_FT_U32 | 0x03 | 4 | Unsigned 32-bit |
| USCOPE_FT_U64 | 0x04 | 8 | Unsigned 64-bit |
| USCOPE_FT_I8 | 0x05 | 1 | Signed 8-bit |
| USCOPE_FT_I16 | 0x06 | 2 | Signed 16-bit |
| USCOPE_FT_I32 | 0x07 | 4 | Signed 32-bit |
| USCOPE_FT_I64 | 0x08 | 8 | Signed 64-bit |
| USCOPE_FT_BOOL | 0x09 | 1 | Boolean |
| USCOPE_FT_STRING_REF | 0x0A | 4 | String table index |
| USCOPE_FT_ENUM | 0x0B | 1 | Enum value |
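
The sizes in this table are enough to compute the byte length of an unpadded payload from a list of field type codes. The sketch below, written in Rust as a reader-side check, uses hypothetical helper names; only the code-to-size mapping comes from the table.

```rust
/// Size in bytes of a field with the given type code, per the table above.
/// Returns None for codes the table does not list.
fn field_size(code: u8) -> Option<usize> {
    match code {
        0x01 | 0x05 | 0x09 | 0x0B => Some(1), // U8, I8, BOOL, ENUM
        0x02 | 0x06 => Some(2),               // U16, I16
        0x03 | 0x07 | 0x0A => Some(4),        // U32, I32, STRING_REF
        0x04 | 0x08 => Some(8),               // U64, I64
        _ => None,
    }
}

/// Total size of a payload whose fields are concatenated without padding.
/// Returns None if any type code is unknown.
fn payload_size(types: &[u8]) -> Option<usize> {
    types.iter().map(|&t| field_size(t)).sum()
}
```

For the `stage_transition` event (U32 followed by ENUM) this gives 5 bytes, and for the `entities` storage fields (U32, U64, U32) it gives 16.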

4. Writer

4.1 Open / Close

uscope_dut_property_t props[] = {
    {"dut_name", "boom_core_0"},
    {"cpu.isa",  "RV64GC"},
};

uscope_writer_t *w = uscope_writer_open(
    "trace.uscope",
    props, /*num_props=*/2,
    schema,                    // consumed — do not free
    /*checkpoint_interval_ps=*/1000000);

// ... write cycles ...

uscope_writer_close(w);  // finalizes and frees

uscope_writer_open takes ownership of the schema. Do not call uscope_schema_free after opening.

uscope_writer_close writes the string table, segment table, section table, sets F_COMPLETE, and frees all resources.

4.2 Per-Cycle Operations

All mutations must occur within a begin_cycle / end_cycle pair. Time must be monotonically non-decreasing.

uscope_begin_cycle(w, time_ps);

uscope_slot_set(w, storage_id, slot, field, value);
uscope_slot_clear(w, storage_id, slot);
uscope_slot_add(w, storage_id, slot, field, delta);
uscope_event(w, event_type_id, payload, payload_size);

uscope_end_cycle(w);

| Function | Description |
|---|---|
| uscope_begin_cycle(w, time_ps) | Start a cycle at the given time |
| uscope_slot_set(w, stor, slot, field, val) | Set field value (marks slot valid) |
| uscope_slot_clear(w, stor, slot) | Mark slot invalid |
| uscope_slot_add(w, stor, slot, field, val) | Add to field value |
| uscope_event(w, type_id, payload, size) | Emit event with raw payload |
| uscope_end_cycle(w) | End cycle, flush segment if needed |
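
The two rules above (mutations only inside a cycle, time never decreasing) and the slot semantics in the table can be captured in a small in-memory model, sketched here in Rust. `ModelWriter` and `CycleError` are hypothetical; the C library's actual behavior on misuse is not described in this document, and treating `slot_add` on an invalid slot as "start from 0 and mark valid" is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum CycleError {
    NotInCycle,
    AlreadyInCycle,
    TimeWentBackwards,
}

#[derive(Default)]
struct ModelWriter {
    in_cycle: bool,
    last_time_ps: u64,
    // Valid slots only: (storage_id, slot) -> field index -> value.
    slots: HashMap<(u16, u16), HashMap<u16, u64>>,
}

impl ModelWriter {
    fn begin_cycle(&mut self, time_ps: u64) -> Result<(), CycleError> {
        if self.in_cycle {
            return Err(CycleError::AlreadyInCycle);
        }
        // Equal timestamps are allowed: time must be non-decreasing.
        if time_ps < self.last_time_ps {
            return Err(CycleError::TimeWentBackwards);
        }
        self.in_cycle = true;
        self.last_time_ps = time_ps;
        Ok(())
    }

    fn end_cycle(&mut self) -> Result<(), CycleError> {
        self.require_cycle()?;
        self.in_cycle = false;
        Ok(())
    }

    fn require_cycle(&self) -> Result<(), CycleError> {
        if self.in_cycle { Ok(()) } else { Err(CycleError::NotInCycle) }
    }

    /// Set a field; the slot becomes valid.
    fn slot_set(&mut self, storage: u16, slot: u16, field: u16, value: u64) -> Result<(), CycleError> {
        self.require_cycle()?;
        self.slots.entry((storage, slot)).or_default().insert(field, value);
        Ok(())
    }

    /// Add to a field. ASSUMPTION: a missing field starts at 0.
    fn slot_add(&mut self, storage: u16, slot: u16, field: u16, delta: u64) -> Result<(), CycleError> {
        self.require_cycle()?;
        let v = self.slots.entry((storage, slot)).or_default().entry(field).or_insert(0);
        *v = v.wrapping_add(delta);
        Ok(())
    }

    /// Mark the slot invalid by dropping its fields.
    fn slot_clear(&mut self, storage: u16, slot: u16) -> Result<(), CycleError> {
        self.require_cycle()?;
        self.slots.remove(&(storage, slot));
        Ok(())
    }

    fn slot_valid(&self, storage: u16, slot: u16) -> bool {
        self.slots.contains_key(&(storage, slot))
    }

    fn field(&self, storage: u16, slot: u16, field: u16) -> Option<u64> {
        self.slots.get(&(storage, slot))?.get(&field).copied()
    }
}
```

A model like this can serve as a test oracle: replay the same call sequence against the model and compare its final state with what a reader reconstructs from the written trace.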

4.3 Event Payloads

Event payloads are the field values concatenated in schema-definition order, little-endian, with no padding. Build them manually:

// stage_transition: entity_id (U32) + stage (ENUM/U8)
uint8_t payload[5];
uint32_t entity_id = 42;
memcpy(payload, &entity_id, 4);  // copies host byte order; correct only on little-endian hosts
payload[4] = 2;                  // stage index
uscope_event(w, st_event, payload, 5);
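
To make the byte layout explicit, and independent of host endianness, here is the same 5-byte `stage_transition` payload encoded and decoded in Rust. The function names are hypothetical and the sketch is not taken from the Rust crate; it only illustrates the "schema order, little-endian, no padding" rule.

```rust
/// Encode entity_id (U32, little-endian) followed by stage (ENUM, 1 byte).
fn encode_stage_transition(entity_id: u32, stage: u8) -> [u8; 5] {
    let mut payload = [0u8; 5];
    // to_le_bytes yields little-endian bytes regardless of host byte order.
    payload[..4].copy_from_slice(&entity_id.to_le_bytes());
    payload[4] = stage;
    payload
}

/// Decode the payload back into (entity_id, stage); None if the size is wrong.
fn decode_stage_transition(payload: &[u8]) -> Option<(u32, u8)> {
    if payload.len() != 5 {
        return None;
    }
    let entity_id = u32::from_le_bytes([payload[0], payload[1], payload[2], payload[3]]);
    Some((entity_id, payload[4]))
}
```

For entity 42 in stage 2, the encoded bytes are `2A 00 00 00 02`. A C writer on a big-endian host would need to store the bytes one at a time (or byte-swap) to produce the same sequence.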

4.4 String Table

For STRING_REF fields in event payloads:

uint32_t idx = uscope_string_insert(w, "addi x0, x0, 0");
// Use idx as the 4-byte value in a STRING_REF field

5. Limits

| Resource | Maximum |
|---|---|
| String pool (schema) | 64 KB |
| Clock domains | 16 |
| Scopes | 256 |
| Enum types | 64 |
| Enum values per type | 256 |
| Storages | 256 |
| Event types | 256 |
| Fields per storage/event | 32 |
| Ops per cycle | 4096 |
| Events per cycle | 1024 |
| Event payload size | 256 bytes |
| Segments | 65536 |
| Delta buffer | 4 MB (auto-grows) |

6. Example: CPU Pipeline Trace

#include "uscope_dpi.h"
#include <string.h>

int main(void) {
    // Schema
    uscope_schema_def_t *s = uscope_schema_new();
    uint8_t clk = uscope_schema_add_clock(s, "clk", 1000);
    uscope_schema_add_scope(s, "root", 0xFFFF, NULL, 0xFF);
    uint16_t scope = uscope_schema_add_scope(s, "core0", 0, "cpu", clk);

    const char *stages[] = {"fetch", "decode", "execute", "writeback"};
    uint8_t se = uscope_schema_add_enum(s, "pipeline_stage", stages, 4);

    const char *ef[] = {"entity_id", "pc", "inst_bits"};
    uint8_t et[] = {USCOPE_FT_U32, USCOPE_FT_U64, USCOPE_FT_U32};
    uint16_t ent = uscope_schema_add_storage(s, "entities", scope,
                                              256, USCOPE_SF_SPARSE,
                                              3, ef, et, NULL);

    const char *sf[] = {"entity_id", "stage"};
    uint8_t st[] = {USCOPE_FT_U32, USCOPE_FT_ENUM};
    uint8_t sen[] = {0, se};
    uint16_t sev = uscope_schema_add_event(s, "stage_transition", scope,
                                            2, sf, st, sen);

    // DUT properties
    uscope_dut_property_t props[] = {
        {"dut_name", "core0"},
        {"cpu.isa", "RV64GC"},
        {"cpu.pipeline_stages", "fetch,decode,execute,writeback"},
    };

    // Open
    uscope_writer_t *w = uscope_writer_open("trace.uscope",
                                             props, 3, s, 100000);

    // Fetch instruction 0
    uscope_begin_cycle(w, 0);
    uscope_slot_set(w, ent, 0, 0, 0);          // entity_id
    uscope_slot_set(w, ent, 0, 1, 0x80000000); // pc
    uscope_slot_set(w, ent, 0, 2, 0x13);       // inst_bits
    uint8_t payload[5];
    uint32_t eid = 0;
    memcpy(payload, &eid, 4);
    payload[4] = 0; // fetch stage
    uscope_event(w, sev, payload, 5);
    uscope_end_cycle(w);

    // Decode
    uscope_begin_cycle(w, 1000);
    payload[4] = 1;
    uscope_event(w, sev, payload, 5);
    uscope_end_cycle(w);

    // Execute
    uscope_begin_cycle(w, 2000);
    payload[4] = 2;
    uscope_event(w, sev, payload, 5);
    uscope_end_cycle(w);

    // Writeback + retire
    uscope_begin_cycle(w, 3000);
    payload[4] = 3;
    uscope_event(w, sev, payload, 5);
    uscope_slot_clear(w, ent, 0);
    uscope_end_cycle(w);

    uscope_writer_close(w);
    return 0;
}

7. Integration with Simulators

SystemVerilog DPI

import "DPI-C" function chandle uscope_writer_open(
    input string path,
    /* ... */
);
import "DPI-C" function void uscope_begin_cycle(
    input chandle w, input longint unsigned time_ps
);
// ... etc

Verilator

Include uscope_dpi.c and lz4.c in the Verilator build:

verilator --cc top.sv --exe sim_main.cpp uscope_dpi.c lz4.c

Call the C API from sim_main.cpp, or from the SystemVerilog testbench through DPI-C imports like those declared in the previous section.

uscope-cli: Command-Line Trace Inspector

Binary: uscope-cli Location: crates/uscope-cli/


Overview

uscope-cli is a standalone command-line tool for inspecting µScope CPU pipeline traces. It provides quick access to trace metadata, buffer state, instruction timelines, and counter data without needing the Reflex GUI.

All commands support --json for structured JSON output, which makes the tool suitable for scripting and CI pipelines.


Installation

cargo install --path crates/uscope-cli
# or run directly:
cargo run --bin uscope-cli -- <command> <file>

Commands

info — File overview

uscope-cli info trace.uscope

Prints: file header (version, flags, segments, duration), metadata (DUT properties), pipeline stage names, counter names, buffer names, and full schema dump (storages, events, enums).

# JSON output for scripting
uscope-cli info trace.uscope --json | jq '.counters'

state — Buffer state at a cycle

uscope-cli state trace.uscope --cycle 50

Shows the state of all buffers at the given cycle: occupied slots with field values, entity fields (rbid, fpb_id, etc.), and storage properties (pointer positions).

# Check ROB state at cycle 100
uscope-cli state trace.uscope --cycle 100 --json | jq '.buffers[] | select(.name == "rob")'

timeline — Instruction lifecycle

uscope-cli timeline trace.uscope --entity 42

Shows the complete lifecycle of instruction entity 42: fetch cycle, all stage transitions with durations, annotations, and retire/flush status.

# Find when entity 42 was in the execute stage
uscope-cli timeline trace.uscope --entity 42 --json | jq '.stages[] | select(.name == "Ex")'

counters — Counter values

# Show final counter values
uscope-cli counters trace.uscope

# Per-cycle values over a range
uscope-cli counters trace.uscope --range 100:200

# Filter by counter name
uscope-cli counters trace.uscope --counter retired_insns --range 0:50

buffers — Buffer occupancy

uscope-cli buffers trace.uscope --cycle 50

Like state but focused on buffer fill level, pointer pair positions, and occupancy percentage. Filter by buffer name with --buffer.

uscope-cli buffers trace.uscope --cycle 50 --buffer rob

Output Formats

| Flag | Format | Use case |
|---|---|---|
| (default) | Human-readable aligned table | Interactive inspection |
| --json | Pretty-printed JSON | Scripting, piping to jq, CI |

Examples

# Quick sanity check: does the trace have data?
uscope-cli info trace.uscope

# Debugging: what's in the ROB at cycle 50?
uscope-cli state trace.uscope --cycle 50

# Performance: what's the IPC?
uscope-cli counters trace.uscope --counter retired_insns

# Entity debugging: what happened to instruction 17?
uscope-cli timeline trace.uscope --entity 17

# Scripting: extract all counter names
uscope-cli info trace.uscope --json | jq -r '.counters[]'

uscope-mcp: MCP Server for AI-Assisted Debugging

Binary: uscope-mcp Location: crates/uscope-mcp/


Overview

uscope-mcp is a Model Context Protocol (MCP) server that lets Claude inspect µScope CPU pipeline traces. It exposes the uscope-cpu query API as MCP tools, enabling natural-language performance debugging.


Quick Start

1. Start the server

cargo run --bin uscope-mcp -- --trace /path/to/trace.uscope

2. Configure Claude Code

Add to .claude/settings.json:

{
  "mcpServers": {
    "uscope": {
      "command": "cargo",
      "args": ["run", "--release", "--bin", "uscope-mcp", "--",
               "--trace", "/path/to/trace.uscope"],
      "cwd": "/path/to/uscope/repo"
    }
  }
}

Or with a pre-built binary:

{
  "mcpServers": {
    "uscope": {
      "command": "/path/to/uscope-mcp",
      "args": ["--trace", "/path/to/trace.uscope"]
    }
  }
}

3. Ask Claude

"What's the IPC between cycles 100 and 500?"

"Show me the pipeline stages for entity 42"

"Why is the ROB full at cycle 200?"

"What caused the pipeline stall at cycle 350?"


MCP Tools

file_info

Returns trace header, schema, segments, counters, buffers, and metadata.

Parameters: none

state_at_cycle

Returns buffer contents at a specific cycle — slot values, entity fields, and storage properties.

Parameters:

  • cycle (number, required): cycle number to query

entity_timeline

Returns the complete lifecycle of an instruction: stages with durations, disasm, annotations, retire/flush status.

Parameters:

  • entity_id (number, required): entity ID to trace

counter_values

Returns counter data over a cycle range with per-cycle values, deltas, and rates.

Parameters:

  • counter (string, required): counter name (e.g., "retired_insns")
  • start_cycle (number, required): range start
  • end_cycle (number, required): range end

buffer_occupancy

Returns buffer fill level at a cycle — occupied slots, pointer pair positions, fill percentage.

Parameters:

  • buffer (string, required): buffer name (e.g., "rob")
  • cycle (number, required): cycle to query

analyze_performance

Returns a structured performance summary over a cycle range:

  • Instruction counts (total, retired, flushed, in-flight)
  • IPC (instructions per cycle)
  • Flush rate
  • Per-counter totals and rates
  • Buffer occupancy snapshots at start/mid/end
  • Per-stage average latency, sorted by bottleneck

Parameters:

  • start_cycle (number, required): range start
  • end_cycle (number, required): range end
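
Two of the summary figures, IPC and the per-stage latency ranking, reduce to simple arithmetic. The sketch below shows one plausible way to compute them; the function names, the input shape (stage name plus duration in cycles), and the exact definitions are assumptions, not the tool's documented formulas.

```rust
use std::collections::BTreeMap;

/// Instructions per cycle over [start_cycle, end_cycle).
/// Returns 0.0 for an empty range.
fn ipc(retired: u64, start_cycle: u32, end_cycle: u32) -> f64 {
    let cycles = end_cycle.saturating_sub(start_cycle);
    if cycles == 0 { 0.0 } else { retired as f64 / cycles as f64 }
}

/// Average latency per stage, sorted with the slowest stage first.
/// `spans` holds one (stage name, duration in cycles) entry per stage visit.
fn stage_latencies(spans: &[(&str, u32)]) -> Vec<(String, f64)> {
    // stage name -> (total cycles, number of visits)
    let mut acc: BTreeMap<&str, (u64, u64)> = BTreeMap::new();
    for &(name, cycles) in spans {
        let entry = acc.entry(name).or_insert((0, 0));
        entry.0 += u64::from(cycles);
        entry.1 += 1;
    }
    let mut out: Vec<(String, f64)> = acc
        .into_iter()
        .map(|(name, (total, n))| (name.to_string(), total as f64 / n as f64))
        .collect();
    // Descending by average; the stable sort keeps name order for ties.
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
```

For example, 300 retired instructions between cycles 100 and 500 give an IPC of 0.75.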

Protocol

The server implements the Model Context Protocol over stdio using JSON-RPC 2.0. It handles:

  • initialize — server capabilities and info
  • notifications/initialized — acknowledged silently
  • tools/list — returns tool definitions with JSON Schema
  • tools/call — dispatches to tool handlers

All tool responses are structured JSON, formatted for AI reasoning. Errors are returned as MCP tool errors (not JSON-RPC errors) so Claude can see error messages.

Logging goes to stderr (stdout is the MCP channel).
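
For orientation, a `tools/call` request for the `counter_values` tool would look roughly like the JSON produced below. The envelope follows the general MCP and JSON-RPC 2.0 shape; the helper function is hypothetical, and it does no string escaping, so it assumes the counter name contains no quotes or backslashes.

```rust
/// Build a JSON-RPC 2.0 `tools/call` request for the `counter_values` tool.
/// ASSUMPTION: `counter` needs no JSON escaping (plain identifier-like name).
fn counter_values_request(id: u64, counter: &str, start_cycle: u64, end_cycle: u64) -> String {
    // Doubled braces are literal braces in a format string.
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"counter_values","arguments":{{"counter":"{counter}","start_cycle":{start_cycle},"end_cycle":{end_cycle}}}}}}}"#
    )
}
```

Writing one such line to the server's stdin (after the `initialize` exchange) and reading one line from its stdout is enough for a quick manual test; a real client would use a JSON library for serialization.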

konata2uscope

Binary: konata2uscope Location: crates/konata2uscope/


1. Overview

konata2uscope converts Konata (Kanata v0004) pipeline trace logs into µScope CPU protocol traces. This enables viewing Konata-format traces in µScope-compatible viewers with random-access seeking, mipmap summaries, and structured schema metadata.


2. Usage

konata2uscope <input.log[.gz]> -o <output.uscope> [options]

| Option | Default | Description |
|---|---|---|
| -o <path> | output.uscope | Output file path |
| --clock-period-ps <ps> | 1000 | Clock period in picoseconds (1000 = 1 GHz) |
| --dut-name <name> | core0 | DUT name for the trace |

Gzip-compressed input (.log.gz) is detected automatically.


3. Two-Pass Architecture

Pass 1: Scan

Reads the entire Konata log to discover metadata:

  • All unique pipeline stage names (in first-occurrence order)
  • Maximum number of simultaneously in-flight instructions
  • Thread IDs
  • Total cycle count

This information is needed to construct the µScope schema before writing any trace data.
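
A scan pass of this kind can be sketched as a single loop over tab-separated lines. The code below is a simplified illustration, not the converter's source: `ScanResult` and `scan` are hypothetical names, only lane 0 stage starts are counted as pipeline stages, and "total cycles" is taken to be the highest cycle number reached.

```rust
use std::collections::HashSet;

#[derive(Debug, Default, PartialEq)]
struct ScanResult {
    stages: Vec<String>, // unique stage names, first-occurrence order
    max_in_flight: usize,
    threads: Vec<u32>,
    total_cycles: u64,
}

fn scan(log: &str) -> ScanResult {
    let mut result = ScanResult::default();
    let mut in_flight: HashSet<&str> = HashSet::new();
    let mut cycle: u64 = 0;
    for line in log.lines() {
        // At most 4 fields, so label text containing tabs stays in one piece.
        let fields: Vec<&str> = line.splitn(4, '\t').collect();
        match fields.as_slice() {
            ["C=", c] => cycle = c.parse().unwrap_or(cycle),
            ["C", d] => cycle += d.parse::<u64>().unwrap_or(0),
            ["I", id, _gid, tid] => {
                in_flight.insert(*id);
                result.max_in_flight = result.max_in_flight.max(in_flight.len());
                if let Ok(t) = tid.parse::<u32>() {
                    if !result.threads.contains(&t) {
                        result.threads.push(t);
                    }
                }
            }
            // ASSUMPTION: only lane 0 contributes pipeline stage names.
            ["S", _id, "0", stage] => {
                if !result.stages.iter().any(|s| s.as_str() == *stage) {
                    result.stages.push(stage.to_string());
                }
            }
            // Retire and flush both end the instruction's lifetime.
            ["R", id, ..] => {
                in_flight.remove(id);
            }
            _ => {} // header line, labels, stage ends, dependencies
        }
        result.total_cycles = result.total_cycles.max(cycle);
    }
    result
}
```

The maximum in-flight count is what sizes the `entities` storage, which is why the schema cannot be written before the whole log has been read once.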

Pass 2: Emit

Re-reads the log and emits µScope data using the CPU protocol writer:

  • Entity allocation on instruction creation (I)
  • Stage transitions on stage start (S, lane 0)
  • Annotations on labels (L)
  • Retirement on retire commands (R, type 0)
  • Flushes on flush commands (R, type 1)
  • Dependencies on dependency arrows (W)

4. Konata Format Mapping

4.1 Commands

| Konata | Description | µScope mapping |
|---|---|---|
| C=\t<cycle> | Set absolute cycle | Time base |
| C\t<delta> | Advance by delta cycles | Time base |
| I\t<id>\t<gid>\t<tid> | Create instruction | DA_SLOT_SET on entities |
| L\t<id>\t0\t<text> | Disassembly label | annotate event; PC extraction |
| L\t<id>\t1\t<text> | Detail label | annotate event |
| S\t<id>\t0\t<stage> | Start stage (lane 0) | stage_transition event |
| S\t<id>\t1+\t<stage> | Start stall overlay | annotate event |
| E\t<id>\t<lane>\t<stage> | End stage | (implicit in µScope) |
| R\t<id>\t<rid>\t0 | Retire | DA_SLOT_CLEAR + counter |
| R\t<id>\t<rid>\t1 | Flush | flush event + DA_SLOT_CLEAR |
| W\t<cons>\t<prod>\t<type> | Dependency | dependency event |
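
The table translates naturally into a typed parser plus a mapping function. The sketch below is illustrative only: `Cmd`, `parse_line`, and `uscope_mapping` are hypothetical names, the numeric widths are assumptions, and the mapping strings simply repeat the table's third column.

```rust
#[derive(Debug, PartialEq)]
enum Cmd<'a> {
    SetCycle(u64),
    AdvanceCycle(u64),
    Create { id: u64, gid: u64, tid: u32 },
    Label { id: u64, kind: u32, text: &'a str },
    StageStart { id: u64, lane: u32, stage: &'a str },
    StageEnd { id: u64, lane: u32, stage: &'a str },
    Retire { id: u64, rid: u64, flush: bool },
    Dependency { consumer: u64, producer: u64, kind: u32 },
}

/// Parse one tab-separated Konata line; None for headers or malformed lines.
fn parse_line(line: &str) -> Option<Cmd<'_>> {
    let f: Vec<&str> = line.splitn(4, '\t').collect();
    Some(match f.as_slice() {
        ["C=", c] => Cmd::SetCycle(c.parse().ok()?),
        ["C", d] => Cmd::AdvanceCycle(d.parse().ok()?),
        ["I", id, gid, tid] => Cmd::Create {
            id: id.parse().ok()?, gid: gid.parse().ok()?, tid: tid.parse().ok()?,
        },
        ["L", id, kind, text] => Cmd::Label {
            id: id.parse().ok()?, kind: kind.parse().ok()?, text: *text,
        },
        ["S", id, lane, stage] => Cmd::StageStart {
            id: id.parse().ok()?, lane: lane.parse().ok()?, stage: *stage,
        },
        ["E", id, lane, stage] => Cmd::StageEnd {
            id: id.parse().ok()?, lane: lane.parse().ok()?, stage: *stage,
        },
        ["R", id, rid, kind] => Cmd::Retire {
            id: id.parse().ok()?,
            rid: rid.parse().ok()?,
            flush: match *kind { "0" => false, "1" => true, _ => return None },
        },
        ["W", c, p, t] => Cmd::Dependency {
            consumer: c.parse().ok()?, producer: p.parse().ok()?, kind: t.parse().ok()?,
        },
        _ => return None,
    })
}

/// µScope mapping for a command, as listed in the table above.
fn uscope_mapping(cmd: &Cmd<'_>) -> &'static str {
    match cmd {
        Cmd::SetCycle(_) | Cmd::AdvanceCycle(_) => "time base",
        Cmd::Create { .. } => "DA_SLOT_SET on entities",
        Cmd::Label { kind: 0, .. } => "annotate event; PC extraction",
        Cmd::Label { .. } => "annotate event",
        Cmd::StageStart { lane: 0, .. } => "stage_transition event",
        Cmd::StageStart { .. } => "annotate event",
        Cmd::StageEnd { .. } => "(implicit in µScope)",
        Cmd::Retire { flush: false, .. } => "DA_SLOT_CLEAR + counter",
        Cmd::Retire { flush: true, .. } => "flush event + DA_SLOT_CLEAR",
        Cmd::Dependency { .. } => "dependency event",
    }
}
```

Splitting into at most four fields keeps a label's free text intact even if it contains tab characters.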

4.2 PC Extraction

If a disassembly label (L type 0) starts with a hex address, it is extracted as the instruction PC. Supported formats:

  • 80000000 addi x0, x0, 0 → PC = 0x80000000
  • 0x80000000 addi x0, x0, 0 → PC = 0x80000000
  • 00001000: jal zero, 0x10 → PC = 0x00001000

If no hex address is found, PC defaults to 0.
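
A minimal extraction routine covering the three formats might look like this. It is a sketch, not the converter's code: `extract_pc` is a hypothetical name, and the rule that a bare token needs at least 8 hex digits (so that mnemonics such as `add` are not misread as addresses) is an assumption introduced here.

```rust
/// Extract a leading hex address from a disassembly label, or return 0.
fn extract_pc(label: &str) -> u64 {
    let first = label.split_whitespace().next().unwrap_or("");
    // Optional trailing colon, as in "00001000: jal zero, 0x10".
    let (token, has_colon) = match first.strip_suffix(':') {
        Some(t) => (t, true),
        None => (first, false),
    };
    // Optional 0x / 0X prefix.
    let (digits, has_prefix) = match token
        .strip_prefix("0x")
        .or_else(|| token.strip_prefix("0X"))
    {
        Some(d) => (d, true),
        None => (token, false),
    };
    // ASSUMPTION: a bare token counts as an address only if it has >= 8 digits.
    let looks_like_address = has_colon || has_prefix || digits.len() >= 8;
    if !looks_like_address
        || digits.is_empty()
        || !digits.bytes().all(|b| b.is_ascii_hexdigit())
    {
        return 0;
    }
    // Values that overflow 64 bits fall back to the default of 0.
    u64::from_str_radix(digits, 16).unwrap_or(0)
}
```

Because a missing address and a genuine PC of 0 both yield 0, a caller that needs to tell them apart would want an `Option<u64>` return type.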

4.3 Stage Names

Konata stage names are arbitrary strings. Pass 1 collects them in first-occurrence order, which is taken to be pipeline order. They become the pipeline_stage enum values in the µScope schema and the cpu.pipeline_stages DUT property.

4.4 Time Model

Konata cycles are converted to picoseconds: time_ps = cycle * clock_period_ps. The default clock period of 1000 ps corresponds to 1 GHz.
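
The conversion is a single multiplication, shown here with an overflow check and the matching period-to-frequency relation. Both function names are hypothetical, and the 64-bit timestamp width is an assumption.

```rust
/// time_ps = cycle * clock_period_ps, or None if the product overflows u64.
fn cycle_to_ps(cycle: u64, clock_period_ps: u32) -> Option<u64> {
    cycle.checked_mul(u64::from(clock_period_ps))
}

/// Clock frequency in GHz for a period in picoseconds (1000 ps is 1 GHz).
fn period_ps_to_ghz(clock_period_ps: u32) -> f64 {
    1000.0 / f64::from(clock_period_ps)
}
```

With `--clock-period-ps 200`, cycle 3 lands at 600 ps and the clock runs at 5 GHz.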

4.5 Lane Handling

Only lane 0 stage starts map to stage_transition events. Lane 1+ (stall overlays in Konata) are emitted as annotate events with the text stall:<stage_name>.


5. Example

Input: trace.log

Kanata	0004
C=	0
I	0	0	0
L	0	0	80000000 addi x0, x0, 0
S	0	0	Fetch
C	1
E	0	0	Fetch
S	0	0	Decode
C	1
E	0	0	Decode
S	0	0	Execute
C	1
E	0	0	Execute
S	0	0	Writeback
R	0	0	0

Conversion

$ konata2uscope trace.log -o trace.uscope --clock-period-ps 200
Pass 1: scanning trace.log...
  4 stages: [Fetch, Decode, Execute, Writeback]
  max in-flight: 1
  threads: 1
  total cycles: 3
Pass 2: emitting trace.uscope...
Done.

Resulting Schema

  • Clock: core_clk @ 200 ps (5 GHz)
  • Enum: pipeline_stage = {Fetch, Decode, Execute, Writeback}
  • Storage: entities (1 slot, sparse)
  • Events: stage_transition, annotate, dependency, flush, stall
  • DUT: cpu.pipeline_stages = "Fetch,Decode,Execute,Writeback"

The output file is a standard µScope trace readable by the Rust Reader.