µScope Trace Format Specification
Version: 0.3-draft
Magic: uSCP (0x75 0x53 0x43 0x50)
Byte order: Little-endian (all multi-byte integers throughout the file,
including field values in event payloads, checkpoint slot data, and summary
entries)
Alignment: All section offsets are 8-byte aligned
1. Overview
µScope is a binary trace format for cycle-accurate hardware introspection.
1.1 Layered Architecture
µScope is structured as two distinct layers:
flowchart TD
    proto["<b>Protocol Layer</b><br />Defines semantic meaning for a specific DUT type.<br />Contains reconstruction logic, decoders, and visualization rules.<br /><i>NOT part of this specification.</i>"]
    transport["<b>Transport Layer</b><br />The file format (this document). Knows about:<br />Storages · Events · Checkpoints + deltas · Summaries<br />Knows NOTHING about CPUs, pipelines, caches,<br />entities, counters, annotations, or any specific hardware."]
    proto --- transport
The protocol layer defines semantics, decoders, and visualization. The transport layer defines the binary format and read/write APIs.
1.2 Core Primitives
| Primitive | What it models |
|---|---|
| Storage | A named array of typed slots |
| Event | A timestamped occurrence with a typed payload |
All primitives are schema-defined. The transport layer imposes no assumptions about their fields, types, or semantics.
Everything else — entities, counters, annotations, dependencies, markers — is modeled using these two primitives and interpreted by the protocol layer.
1.3 String Representation
All human-readable strings in the format (field names, enum labels, DUT
properties, etc.) are stored in a single string pool. Structures
reference strings by uint16_t offset into the pool (max 64 KB, sufficient
for any realistic schema).
The string pool consists of null-terminated UTF-8 strings packed sequentially and is stored at the end of the schema chunk payload. Both the DUT descriptor and the schema definitions reference it.
The optional string table section (§7) stores runtime strings referenced by
FT_STRING_REF fields in delta data. An FT_STRING_REF value is a 0-based
index into the string table's entries array.
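A pool lookup can be sketched as follows. This is a minimal illustration, not part of the format: the function name `pool_str` is hypothetical, and the bounds and terminator checks are a defensive choice the spec does not mandate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Resolve a uint16_t string-pool offset to a C string.
 * Returns NULL if the offset is out of range or the string is not
 * null-terminated inside the pool (for example, a truncated schema chunk). */
const char *pool_str(const uint8_t *pool, size_t pool_size, uint16_t offset)
{
    assert(pool != NULL);
    if (offset >= pool_size)
        return NULL;
    /* memchr confirms that a terminator exists before the end of the pool. */
    if (memchr(pool + offset, '\0', pool_size - offset) == NULL)
        return NULL;
    return (const char *)(pool + offset);
}
```

With a pool holding `core_clk`, `rob`, and `head` back to back, offset 9 resolves to `rob` and offset 13 to `head`.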
1.4 File Layout
The file has two regions: a fixed preamble written at trace creation, and an append region that grows during simulation.
During simulation, only segments are appended. At close, finalization data is written and the file header is rewritten with final values.
1.5 Access Patterns
µScope supports three access patterns:
| Pattern | When | Mechanism |
|---|---|---|
| Streaming write | During simulation | Append segments, update tail_offset |
| Live read | While writer is still running | Follow tail_offset → prev chain |
| Random access | After finalization | Binary search segment table |
See §8 for details.
1.6 Time Model
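All timestamps in µScope are in picoseconds (ps). This provides a universal time axis that accommodates multiple clock domains without conversion loss whenever the clock period is a whole number of picoseconds, which holds for most practical hardware clocks (e.g., 5 GHz → 200 ps, 800 MHz → 1250 ps). A period that is not a whole number of picoseconds, such as 3 GHz (≈ 333.33 ps), cannot be expressed exactly in period_ps.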
Cycle-frame deltas, segment boundaries, and summary buckets all use picosecond timestamps. The schema defines clock domains (§4.9), each with a name and period. Scopes are assigned to clock domains so the viewer can display domain-local cycle numbers (by dividing timestamps by the clock period).
Writers emit cycle-frame deltas equal to the clock period of the active domain (e.g., 200 for a 5 GHz clock). LEB128 encoding keeps this compact (1–2 bytes), and segment-level compression handles the repeating patterns efficiently.
2. File Header
Offset 0. Fixed size: 48 bytes.
typedef struct {
uint8_t magic[4]; // "uSCP" = {0x75, 0x53, 0x43, 0x50}
uint16_t version_major; // 0
uint16_t version_minor; // 3
uint64_t flags; // §2.1
uint64_t total_time_ps; // total trace duration in picoseconds (0 until finalized)
uint32_t num_segments; // updated after each segment flush
uint32_t preamble_end; // file offset where segments begin
uint64_t section_table_offset; // 0 until finalized
uint64_t tail_offset; // file offset of last segment header (0 = none)
} file_header_t; // 48 bytes
The preamble (§2.3) immediately follows the header at offset 48 and
extends to preamble_end. Readers scan preamble chunks to locate the
DUT descriptor, schema, and trace configuration.
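The header can be decoded byte by byte, which avoids depending on host endianness or struct padding. The sketch below follows the field offsets implied by file_header_t; the names `uscope_header_view_t`, `rd_le`, and `parse_file_header` are hypothetical and not part of the reader API in §10.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative in-memory view of file_header_t (hypothetical type name). */
typedef struct {
    uint16_t version_major, version_minor;
    uint64_t flags, total_time_ps;
    uint32_t num_segments, preamble_end;
    uint64_t section_table_offset, tail_offset;
} uscope_header_view_t;

/* Read an n-byte little-endian unsigned integer (n <= 8). */
uint64_t rd_le(const uint8_t *p, unsigned n)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Decode the 48-byte header. Returns false if the magic does not match. */
bool parse_file_header(const uint8_t buf[48], uscope_header_view_t *out)
{
    assert(buf != NULL && out != NULL);
    if (memcmp(buf, "uSCP", 4) != 0)
        return false;
    out->version_major        = (uint16_t)rd_le(buf + 4, 2);
    out->version_minor        = (uint16_t)rd_le(buf + 6, 2);
    out->flags                = rd_le(buf + 8, 8);
    out->total_time_ps        = rd_le(buf + 16, 8);
    out->num_segments         = (uint32_t)rd_le(buf + 24, 4);
    out->preamble_end         = (uint32_t)rd_le(buf + 28, 4);
    out->section_table_offset = rd_le(buf + 32, 8);
    out->tail_offset          = rd_le(buf + 40, 8);
    return true;
}
```

A live reader would call this again after re-reading the first 48 bytes to pick up a new tail_offset.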
2.1 Flags
| Bit | Name | Description |
|---|---|---|
| 0 | F_COMPLETE | Trace was cleanly finalized |
| 1 | F_COMPRESSED | Delta segments use compression |
| 2 | F_HAS_STRINGS | String table section present |
| 3-5 | F_COMP_METHOD | Compression method (0=LZ4, 1=ZSTD; 2–7 reserved, must not be used). LZ4 support is mandatory for all readers; ZSTD is optional. |
| 6 | F_COMPACT_DELTAS | Delta blobs may contain compact ops (§8.6.3). Ignored when F_INTERLEAVED_DELTAS is set. |
| 7 | F_INTERLEAVED_DELTAS | v0.2 interleaved frame format (§8.6.6). Ops and events use self-describing tags. |
| 8-63 | Reserved | Must be zero |
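The flag table translates into a handful of masks. The sketch below is one possible reader-side check: the macro values follow the bit positions above, while `comp_method`, `compact_ops_possible`, and `reader_accepts` are hypothetical helpers. Rejecting a file whose reserved bits are set is an assumption; the table only says those bits must be zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F_COMPLETE           (UINT64_C(1) << 0)
#define F_COMPRESSED         (UINT64_C(1) << 1)
#define F_HAS_STRINGS        (UINT64_C(1) << 2)
#define F_COMP_METHOD_SHIFT  3
#define F_COMP_METHOD_MASK   UINT64_C(0x7)
#define F_COMPACT_DELTAS     (UINT64_C(1) << 6)
#define F_INTERLEAVED_DELTAS (UINT64_C(1) << 7)
#define F_RESERVED_MASK      (~UINT64_C(0xFF))

static_assert((F_COMP_METHOD_MASK << F_COMP_METHOD_SHIFT) == 0x38,
              "F_COMP_METHOD occupies bits 3-5");

enum { COMP_LZ4 = 0, COMP_ZSTD = 1 };

/* Extract the 3-bit compression method from bits 3-5. */
unsigned comp_method(uint64_t flags)
{
    return (unsigned)((flags >> F_COMP_METHOD_SHIFT) & F_COMP_METHOD_MASK);
}

/* F_COMPACT_DELTAS is ignored when F_INTERLEAVED_DELTAS is set. */
bool compact_ops_possible(uint64_t flags)
{
    return (flags & F_COMPACT_DELTAS) && !(flags & F_INTERLEAVED_DELTAS);
}

/* Decide whether a reader can open the file. have_zstd says whether the
 * reader was built with the optional ZSTD support. */
bool reader_accepts(uint64_t flags, bool have_zstd)
{
    if (flags & F_RESERVED_MASK)
        return false;                 /* assumption: treat as unsupported */
    if (comp_method(flags) > COMP_ZSTD)
        return false;                 /* methods 2-7 are reserved */
    if ((flags & F_COMPRESSED) && comp_method(flags) == COMP_ZSTD)
        return have_zstd;
    return true;
}
```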
2.2 Header Lifecycle
| Field | At open | After each segment | At close |
|---|---|---|---|
| magic | uSCP | — | — |
| flags | F_COMPRESSED etc | — | F_COMPLETE set |
| total_time_ps | 0 | — | final value |
| num_segments | 0 | incremented | final value |
| preamble_end | final value | — | — |
| section_table_offset | 0 | — | final offset |
| tail_offset | 0 | offset of new segment | offset of last segment |
After fully writing a segment, the writer commits it in this order:
- Write segment data (header + checkpoint + deltas) at EOF
- Memory barrier / fsync
- Write tail_offset (single naturally-aligned 8-byte write — the commit point)
- Write num_segments (single naturally-aligned 4-byte write — advisory)
A live reader uses tail_offset as the sole authoritative indicator of
new data. num_segments may lag by one during live reads.
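The ordering can be modelled in memory with C11 atomics. This is a sketch under strong assumptions: `shared_trace_t` is a hypothetical stand-in for a memory-mapped trace file (the header fields are kept apart from the data bytes for simplicity), and a writer working through file I/O would use fsync in place of the fence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a memory-mapped trace file. */
typedef struct {
    uint8_t          data[4096];    /* file bytes; offsets index this array */
    _Atomic uint64_t tail_offset;   /* header field: the commit point */
    _Atomic uint32_t num_segments;  /* header field: advisory */
} shared_trace_t;

/* Writer side: publish one fully built segment at offset off. */
void commit_segment(shared_trace_t *t, uint64_t off,
                    const uint8_t *seg, size_t len)
{
    assert(off + len <= sizeof t->data);
    memcpy(t->data + off, seg, len);              /* 1. segment data */
    atomic_thread_fence(memory_order_release);    /* 2. barrier */
    atomic_store_explicit(&t->tail_offset, off,   /* 3. commit point */
                          memory_order_release);
    atomic_fetch_add_explicit(&t->num_segments, 1,/* 4. advisory count */
                              memory_order_relaxed);
}

/* Reader side: tail_offset is the only value trusted for new data. */
uint64_t poll_tail(shared_trace_t *t)
{
    return atomic_load_explicit(&t->tail_offset, memory_order_acquire);
}
```

A reader that observes a new tail_offset through `poll_tail` is guaranteed by the release/acquire pair to see the segment bytes written before it, which is why num_segments can safely lag.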
2.3 Preamble Chunks
The preamble immediately follows the file header (offset 48) and consists of a sequence of typed chunks. Each chunk has an 8-byte header:
typedef struct {
uint16_t type; // chunk type
uint16_t flags; // must be 0 (reserved for future use)
uint32_t size; // payload size in bytes
// uint8_t payload[size];
// padding to 8-byte alignment (0-filled)
} preamble_chunk_t; // 8 bytes + payload + padding
Chunk payloads are padded to 8-byte alignment. The next chunk starts at
offset 8 + align8(size) from the current chunk header.
enum preamble_chunk_type : uint16_t {
CHUNK_END = 0x0000, // terminates the preamble
CHUNK_DUT_DESC = 0x0001, // DUT descriptor (§3)
CHUNK_SCHEMA = 0x0002, // schema definition (§4)
CHUNK_TRACE_CONFIG = 0x0003, // trace session parameters (§2.4)
// future: CHUNK_ELF, CHUNK_SOURCE_MAP, ...
};
Mandatory chunks: A valid file must contain exactly one each of
CHUNK_DUT_DESC, CHUNK_SCHEMA, and CHUNK_TRACE_CONFIG. Readers
must reject files missing any of these.
Unknown chunks: Readers must skip chunk types they do not recognize
(advance by 8 + align8(size) bytes). This allows older readers to
open files written by newer writers that add new chunk types.
Ordering: Writers should emit chunks in the order DUT → Schema → Trace Config, but readers must not depend on ordering.
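The chunk rules above (8-byte header, padded payload, skip unknown types, exactly one of each mandatory chunk) can be sketched as a single scan. `preamble_index_t` and `scan_preamble` are hypothetical names, and `buf` is assumed to start at file offset 48.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CHUNK_END = 0, CHUNK_DUT_DESC = 1, CHUNK_SCHEMA = 2,
       CHUNK_TRACE_CONFIG = 3 };

/* Where each known chunk's payload lives, indexed by chunk type. */
typedef struct {
    size_t   payload_off[4];
    uint32_t payload_size[4];
    unsigned count[4];
} preamble_index_t;

uint64_t align8(uint64_t n) { return (n + 7) & ~UINT64_C(7); }

/* Walk chunks in buf[0..len). Returns true only if CHUNK_END is reached
 * and each mandatory chunk was seen exactly once. */
bool scan_preamble(const uint8_t *buf, size_t len, preamble_index_t *idx)
{
    size_t pos = 0;
    assert(buf != NULL && idx != NULL);
    memset(idx, 0, sizeof *idx);
    while (len - pos >= 8) {
        uint16_t type = (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
        uint32_t size = (uint32_t)buf[pos + 4] | (uint32_t)buf[pos + 5] << 8 |
                        (uint32_t)buf[pos + 6] << 16 |
                        (uint32_t)buf[pos + 7] << 24;
        if (type == CHUNK_END)
            return idx->count[CHUNK_DUT_DESC] == 1 &&
                   idx->count[CHUNK_SCHEMA] == 1 &&
                   idx->count[CHUNK_TRACE_CONFIG] == 1;
        uint64_t step = 8 + align8(size);
        if (step > len - pos)
            return false;                      /* truncated chunk */
        if (type <= CHUNK_TRACE_CONFIG) {      /* known type: record it */
            idx->payload_off[type] = pos + 8;
            idx->payload_size[type] = size;
            idx->count[type]++;
        }                                      /* unknown type: just skip */
        pos += (size_t)step;
    }
    return false;                              /* no CHUNK_END found */
}
```

Because chunks are indexed by type rather than position, the scan does not depend on the writer's chunk ordering.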
2.4 Trace Configuration Chunk
Session-level parameters that govern how the trace was captured.
typedef struct {
uint64_t checkpoint_interval_ps; // picoseconds between checkpoints
} trace_config_t; // 8 bytes (CHUNK_TRACE_CONFIG payload)
3. DUT Descriptor
CHUNK_DUT_DESC payload. Identifies what is being traced.
typedef struct {
uint16_t num_properties;
uint16_t reserved; // must be 0
// dut_property_t properties[num_properties];
} dut_desc_t; // 4 bytes + properties
3.1 DUT Properties
typedef struct {
uint16_t key; // offset into string pool
uint16_t value; // offset into string pool
} dut_property_t; // 4 bytes
Properties are opaque key-value pairs. The transport layer does not interpret them — only the protocol layer does. Protocol version, vendor, DUT name, and any domain-specific metadata are all properties.
Example properties for an OoO CPU (protocol-specific keys use a prefix to avoid collisions in multi-protocol traces):
| Key | Value |
|---|---|
| dut_name | boom_core_0 |
| cpu.vendor | acme |
| cpu.protocol_version | 0.1 |
| cpu.isa | RV64IMAFDCV |
| cpu.pipeline_depth | 12 |
| cpu.elf_path | /path/to/fw.elf |
4. Schema
CHUNK_SCHEMA payload. The schema defines the structure of all data in
the trace. Written once at trace creation, immutable thereafter. Fully
self-describing — a viewer can parse and display data without a protocol
plugin.
4.1 Schema Header
typedef struct {
uint8_t num_enums; // max 255 enum types
uint8_t num_clock_domains; // max 255 clock domains
uint16_t num_scopes;
uint16_t num_storages;
uint16_t num_event_types;
uint16_t num_summary_fields;
uint16_t string_pool_offset; // offset from schema start to string pool
// Followed by, in order:
// clock_domain_def_t clocks[num_clock_domains]
// scope_def_t scopes[num_scopes]
// enum_def_t enums[num_enums] (variable-size)
// storage_def_t storages[num_storages] (variable-size)
// event_def_t event_types[num_event_types] (variable-size)
// summary_field_def_t summary_fields[num_summary_fields]
// <string pool>
} schema_header_t; // 12 bytes
4.2 Field Types
enum field_type : uint8_t {
FT_U8 = 0x01,
FT_U16 = 0x02,
FT_U32 = 0x03,
FT_U64 = 0x04,
FT_I8 = 0x05,
FT_I16 = 0x06,
FT_I32 = 0x07,
FT_I64 = 0x08,
FT_BOOL = 0x09, // 1 byte
FT_STRING_REF = 0x0A, // uint32_t index into string table entries[]
FT_ENUM = 0x0B, // uint8_t index into a named enum
};
4.3 Field Definition
typedef struct {
uint16_t name; // offset into string pool
uint8_t type; // field_type (size derived from type)
uint8_t enum_id; // if type==FT_ENUM, else 0
uint8_t reserved[4];
} field_def_t; // 8 bytes
Field size is derived from the type:
| Type | Size (bytes) |
|---|---|
| FT_U8, FT_I8, FT_BOOL, FT_ENUM | 1 |
| FT_U16, FT_I16 | 2 |
| FT_U32, FT_I32, FT_STRING_REF | 4 |
| FT_U64, FT_I64 | 8 |
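The size table maps directly onto a switch. The sketch below also sums field sizes into a packed record size, following the no-padding rule of §8.6.5; `field_size` and `record_size` are hypothetical helper names, and returning 0 for an unknown type code is a convention chosen here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum field_type {
    FT_U8 = 0x01, FT_U16 = 0x02, FT_U32 = 0x03, FT_U64 = 0x04,
    FT_I8 = 0x05, FT_I16 = 0x06, FT_I32 = 0x07, FT_I64 = 0x08,
    FT_BOOL = 0x09, FT_STRING_REF = 0x0A, FT_ENUM = 0x0B
};

/* Size in bytes of one field, or 0 for an unknown type code. */
size_t field_size(uint8_t type)
{
    switch (type) {
    case FT_U8: case FT_I8: case FT_BOOL: case FT_ENUM:  return 1;
    case FT_U16: case FT_I16:                            return 2;
    case FT_U32: case FT_I32: case FT_STRING_REF:        return 4;
    case FT_U64: case FT_I64:                            return 8;
    default:                                             return 0;
    }
}

/* Packed size of a record: the sum of its field sizes, with no padding.
 * Returns 0 if any field has an unknown type. */
size_t record_size(const uint8_t *types, size_t num_fields)
{
    size_t total = 0;
    assert(types != NULL || num_fields == 0);
    for (size_t i = 0; i < num_fields; i++) {
        size_t s = field_size(types[i]);
        if (s == 0)
            return 0;
        total += s;
    }
    return total;
}
```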
4.4 Scope Definition
Scopes define a hierarchical tree for organizing storages and events.
The schema must contain at least one scope: scope 0 is the root
scope (conventionally named /).
typedef struct {
uint16_t name; // offset into string pool
uint16_t scope_id; // 0-based; scope 0 = root
uint16_t parent_id; // parent scope_id, 0xFFFF = root (only valid for scope 0)
uint16_t protocol; // offset into string pool, 0xFFFF = no protocol
uint8_t clock_id; // clock domain index (§4.9), 0xFF = inherit from parent
uint8_t reserved[3];
} scope_def_t; // 12 bytes
Each scope optionally declares a protocol — a string identifying
which protocol layer applies (e.g., "cpu", "dma", "noc"). The
viewer uses the protocol to select the appropriate plugin for that
subtree. Scopes with protocol = 0xFFFF have no protocol and are
rendered generically.
There is no protocol inheritance. Each scope that needs a protocol must declare it explicitly. The root scope typically has no protocol.
Protocol identifiers: Vendor-specific protocols use a dotted
prefix: axelera.loom_core. The protocol generic (or no protocol)
means the viewer renders raw schema data without interpretation.
4.5 Enum Definition
typedef struct {
uint16_t name; // offset into string pool
uint8_t num_values;
uint8_t reserved;
// enum_value_t values[num_values];
} enum_def_t; // 4 bytes + values
typedef struct {
uint8_t value; // numeric value
uint8_t reserved;
uint16_t name; // offset into string pool
} enum_value_t; // 4 bytes
4.6 Storage Definition
typedef struct {
uint16_t name; // offset into string pool
uint16_t storage_id; // 0-based
uint16_t num_slots;
uint16_t num_fields;
uint16_t flags; // §4.6.1
uint16_t scope_id; // owning scope, 0xFFFF = root-level
uint16_t num_properties; // v0.3: number of storage-level properties
uint16_t reserved; // v0.3: must be 0
// field_def_t fields[num_fields];
// field_def_t properties[num_properties]; // v0.3
} storage_def_t; // 16 bytes + fields + properties
4.6.1 Storage Flags
| Bit | Name | Description |
|---|---|---|
| 0 | SF_SPARSE | Checkpoints store only valid entries + bitmask |
| 1 | SF_BUFFER | Buffer storage — sparse storage used as a named buffer (e.g., ROB, issue queue). The protocol layer uses this flag to detect buffer storages for dedicated visualization. |
| 2-15 | Reserved | |
For SF_SPARSE storages, slot validity is tracked by the transport:
- DA_SLOT_SET on any field of an invalid slot implicitly marks it valid.
- DA_SLOT_CLEAR marks a slot invalid.
Non-sparse storages have all slots always valid.
4.6.2 Storage Properties (v0.3)
Storage properties are named, typed scalar values attached to a storage
(not per-slot). They are checkpointed and updated via DA_PROP_SET deltas.
Use cases include buffer pointers (retire_ptr, allocate_ptr) and other
storage-level metadata that changes each cycle.
Properties are defined in the schema as field_def_t entries appended after
the slot field definitions. Each property has a name, type, and optional
enum_id, following the same rules as slot fields.
4.7 Event Definition
typedef struct {
uint16_t name; // offset into string pool
uint16_t event_type_id; // 0-based
uint16_t num_fields;
uint16_t scope_id; // owning scope, 0xFFFF = root-level
// field_def_t fields[num_fields];
} event_def_t; // 8 bytes + fields
4.8 Summary Field Definition
typedef struct {
uint16_t name; // offset into string pool
uint8_t type; // field_type (size derived from type, see §4.3)
uint8_t reserved;
uint16_t scope_id; // owning scope (same field as in storage/event defs)
uint16_t reserved2;
} summary_field_def_t; // 8 bytes
Summary fields are scoped: in a multi-scope trace, each scope has its own
set of summary fields. Fields with the same name in different scopes are
independent (e.g., core0/committed vs. core1/committed).
Summary fields are opaque to the transport. The writer computes and writes values; the transport stores and retrieves them. What each field means (counter rate, storage occupancy, event frequency) is the protocol layer's concern.
4.9 Clock Domain Definition
typedef struct {
uint16_t name; // offset into string pool (e.g., "core_clk")
uint16_t clock_id; // 0-based
uint32_t period_ps; // clock period in picoseconds (0 = unknown)
} clock_domain_def_t; // 8 bytes
Each clock domain defines a named clock with a period in picoseconds.
Scopes reference clock domains via clock_id in scope_def_t (§4.4).
The viewer uses period_ps to convert picosecond timestamps to
domain-local cycle numbers for display (cycle = timestamp / period_ps).
A trace must define at least one clock domain. If the DUT has a single clock, one domain suffices. Multi-clock SoCs define one domain per distinct clock frequency.
| Example | period_ps | Frequency |
|---|---|---|
| core_clk | 200 | 5.0 GHz |
| bus_clk | 1000 | 1.0 GHz |
| mem_clk | 1250 | 800 MHz |
| slow_periph_clk | 30000 | 33.3 MHz |
5. Section Table
Written at finalization only (when F_COMPLETE is set).
enum section_type : uint16_t {
SECTION_END = 0x0000,
SECTION_SUMMARY = 0x0001,
SECTION_STRINGS = 0x0002,
SECTION_SEGMENTS = 0x0003,
SECTION_COUNTER_SUMMARY = 0x0010, // trace summary (counter mipmaps + instruction density)
};
typedef struct {
uint16_t type;
uint16_t flags;
uint32_t reserved;
uint64_t offset;
uint64_t size;
} section_entry_t; // 24 bytes
The table is terminated by a SECTION_END entry.
For incomplete files (F_COMPLETE not set), section_table_offset is 0
and the section table does not exist. Readers must use the segment chain
(§8.2) to discover segments.
6. Trace Summary Section (TSUM)
Written at finalization into a SECTION_COUNTER_SUMMARY section. Contains
instruction density mipmaps and per-counter mipmaps in a self-contained blob.
6.1 TSUM Wire Format
Offset  Size   Field
──────  ─────  ──────────────────────────────────
0       4      magic: b"TSUM" (0x54 0x53 0x55 0x4D)
4       4      base_interval_cycles (u32 LE)
8       4      fan_out (u32 LE)
12      8      total_instructions (u64 LE)
─── 20 bytes fixed header ───
20      4      num_density_levels (u32 LE)
...            For each density level:
                 4 bytes: num_entries (u32 LE)
                 num_entries × 4 bytes: instruction counts (u32 LE each)
        4      num_counters (u32 LE)
...            For each counter:
                 4 bytes: name_len (u32 LE)
                 name_len bytes: name (UTF-8, not null-terminated)
                 2 bytes: storage_id (u16 LE)
                 4 bytes: num_levels (u32 LE)
                 For each level:
                   4 bytes: num_entries (u32 LE)
                   num_entries × 24 bytes: mipmap entries
6.2 Mipmap Entry
Each mipmap entry is 24 bytes:
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 8 | min_delta | Minimum per-cycle delta in this bucket |
| 8 | 8 | max_delta | Maximum per-cycle delta in this bucket |
| 16 | 8 | sum | Total delta accumulated in this bucket |
6.3 Backward Compatibility (CSUM)
Readers must also accept the legacy CSUM magic (b"CSUM", 0x43 0x53 0x55
0x4D) which predates instruction density support. The CSUM layout is:
0 4 magic: b"CSUM"
4 4 base_interval_cycles (u32 LE)
8 4 fan_out (u32 LE)
12 4 num_counters (u32 LE)
... Counter mipmaps (same format as TSUM)
When reading CSUM, set total_instructions = 0 and instruction_density = [].
7. String Table (Optional)
For runtime strings referenced by FT_STRING_REF fields in storage slots
or event payloads. Written at finalization.
typedef struct {
uint32_t num_entries;
uint32_t reserved;
// string_index_t entries[num_entries];
// followed by packed null-terminated string data
} string_table_header_t;
typedef struct {
uint32_t offset; // byte offset into string data (relative to end of entries array)
uint32_t length; // string length in bytes (excluding null terminator)
} string_index_t; // 8 bytes
An FT_STRING_REF field value is a 0-based index into the entries[]
array. The reader looks up entries[value] to get the offset and length
of the string data. Writers assign sequential indices starting from 0.
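The lookup can be sketched as follows, with `table` pointing at the start of string_table_header_t. The function name `string_table_get` is hypothetical, and the bounds and terminator checks are defensive additions rather than requirements of the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value. */
uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Resolve an FT_STRING_REF value. Returns a pointer to the string and
 * stores its length, or returns NULL if the index or entry is invalid. */
const char *string_table_get(const uint8_t *table, size_t table_len,
                             uint32_t index, uint32_t *out_len)
{
    assert(table != NULL && out_len != NULL);
    if (table_len < 8)
        return NULL;
    uint32_t n = rd32(table);                   /* num_entries */
    uint64_t data_start = 8 + (uint64_t)n * 8;  /* end of entries[] */
    if (index >= n || data_start > table_len)
        return NULL;
    const uint8_t *e = table + 8 + (size_t)index * 8;
    uint32_t off = rd32(e), slen = rd32(e + 4);
    /* The string plus its null terminator must fit inside the section. */
    uint64_t end = data_start + off + (uint64_t)slen + 1;
    if (end > table_len || table[end - 1] != 0)
        return NULL;
    *out_len = slen;
    return (const char *)(table + data_start + off);
}
```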
8. Segments
A segment is one checkpoint-interval's worth of data: a full state snapshot (checkpoint) followed by compressed cycle-by-cycle deltas.
8.1 Segment Header
Each segment is self-describing and linked to the previous segment, forming a backward chain.
typedef struct {
uint32_t segment_magic; // "uSEG" = {0x75, 0x53, 0x45, 0x47}
uint32_t flags;
uint64_t time_start_ps; // segment start time in picoseconds
uint64_t time_end_ps; // exclusive
uint64_t prev_segment_offset; // file offset of previous segment (0 = first)
uint32_t checkpoint_size;
uint32_t deltas_compressed_size;
uint32_t deltas_raw_size;
uint32_t num_frames; // number of cycle_frame records in decompressed delta blob
uint32_t num_frames_active; // frames with at least one op or event
uint32_t reserved;
// checkpoint data (checkpoint_size bytes)
// compressed delta data (deltas_compressed_size bytes)
} segment_header_t; // 56 bytes
The segment_magic field allows validation when walking the chain and
recovery of incomplete files.
8.2 Segment Chain
Segments form a singly-linked list via prev_segment_offset, traversable
from tail_offset in the file header backward to the first segment
(prev_segment_offset == 0).
flowchart LR
    S2["Segment 2<br />[400ns,600ns)<br />prev→S1"] -->|prev| S1["Segment 1<br />[200ns,400ns)<br />prev→S0"]
    S1 -->|prev| S0["Segment 0<br />[0,200ns)<br />prev=0"]
    tail(["tail_offset"]) -.->|points to| S2
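Walking the chain over an in-memory copy of the file might look like the sketch below. `walk_chain` is a hypothetical name, and the check that each prev_segment_offset is strictly smaller than the current offset is a sanity check inferred from the append-only layout, not a rule stated by the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SEGMENT_HEADER_SIZE = 56, OFF_PREV_SEGMENT = 24 };

/* Read a 64-bit little-endian value. */
uint64_t rd64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Follow prev_segment_offset from tail_offset back to the first segment.
 * Returns the number of segments, or -1 if the chain is corrupt. */
long walk_chain(const uint8_t *file, size_t file_len, uint64_t tail_offset)
{
    long count = 0;
    uint64_t off = tail_offset;
    assert(file != NULL);
    while (off != 0) {
        if (off > file_len || file_len - off < SEGMENT_HEADER_SIZE)
            return -1;                       /* header would run past EOF */
        if (memcmp(file + off, "uSEG", 4) != 0)
            return -1;                       /* bad segment_magic */
        uint64_t prev = rd64(file + off + OFF_PREV_SEGMENT);
        if (prev >= off)
            return -1;                       /* chain must move backward */
        count++;
        off = prev;
    }
    return count;
}
```

A live reader would record each visited offset and start time during this walk to build its in-memory segment index.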
8.3 Segment Table (Finalization Only)
At close, the writer builds a flat segment table for fast random access.
This table is referenced by SECTION_SEGMENTS in the section table.
typedef struct {
uint64_t offset; // file offset of segment_header_t
uint64_t time_start_ps;
uint64_t time_end_ps; // exclusive
} segment_index_entry_t; // 24 bytes
Binary search on time_start_ps gives O(log n) seek to any timestamp.
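A sketch of that search, assuming the table is sorted by time_start_ps (which follows from segments being appended in time order). The function name `find_segment` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t offset;         /* file offset of segment_header_t */
    uint64_t time_start_ps;
    uint64_t time_end_ps;    /* exclusive */
} segment_index_entry_t;

/* Return the index of the segment whose [time_start_ps, time_end_ps)
 * interval contains t, or -1 if no segment covers t. */
long find_segment(const segment_index_entry_t *tab, size_t n, uint64_t t)
{
    size_t lo = 0, hi = n;   /* search window is [lo, hi) */
    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].time_start_ps <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the number of entries that start at or before t. */
    if (lo == 0 || t >= tab[lo - 1].time_end_ps)
        return -1;
    return (long)(lo - 1);
}
```

The end-time check matters for timestamps past the last segment and for any gap between segments.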
8.4 Reading Strategies
Finalized file (F_COMPLETE set):
- Read file header → preamble_end, section_table_offset
- Scan preamble chunks → extract DUT, schema, trace config
- Read section table → find SECTION_SEGMENTS
- Binary search segment table for target timestamp → get segment offset
- Read segment header + checkpoint + deltas at that offset
Live file (F_COMPLETE not set):
- Read file header → preamble_end, tail_offset
- Scan preamble chunks → extract DUT, schema, trace config
- Read segment at tail_offset → follow prev_segment_offset chain
- Build in-memory segment index (done once, O(n) in segments)
- To check for new data: re-read tail_offset from file header
Streaming write (writer perspective):
- Write file header (with tail_offset=0) + preamble chunks + CHUNK_END
- Set preamble_end in file header
- For each checkpoint interval:
  a. Write segment_header_t + checkpoint + compressed deltas at EOF
  b. Rewrite tail_offset and num_segments in file header
- At close: write string table, summary, segment table, section table; set F_COMPLETE; rewrite file header with final values
8.5 Checkpoint Format
A checkpoint is a sequence of storage blocks, one per storage.
typedef struct {
uint16_t storage_id;
uint16_t reserved;
uint32_t size; // payload size in bytes
// payload
} checkpoint_block_t; // 8 bytes
8.5.1 Sparse Storage Block
checkpoint_block_t { storage_id, size }
uint8_t valid_mask[ceil(num_slots/8)];
// For each set bit: slot_data[slot_size]
// v0.3: property_data[property_data_size] (if num_properties > 0)
8.5.2 Dense Storage Block
checkpoint_block_t { storage_id, size }
// slot_data[slot_size] × num_slots
// v0.3: property_data[property_data_size] (if num_properties > 0)
For storages with num_properties > 0, property values are appended after
slot data as tightly-packed field values (same packing rules as slot data).
The size field in the checkpoint block covers the total payload including
property data.
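Locating one slot inside a sparse block payload can be sketched as below. The spec does not state the bit order within valid_mask, so this code assumes slot i maps to bit (i % 8) of byte (i / 8), least significant bit first; that ordering and the name `sparse_slot_offset` are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Test one bit of the validity mask (assumed LSB-first bit order). */
int mask_bit(const uint8_t *mask, uint16_t i)
{
    return (mask[i / 8] >> (i % 8)) & 1;
}

/* Byte offset of a slot's data within a sparse block payload, where the
 * payload starts with valid_mask. Returns -1 if the slot is out of range
 * or marked invalid (and therefore not stored). */
long sparse_slot_offset(const uint8_t *payload, uint16_t num_slots,
                        uint16_t slot, size_t slot_size)
{
    size_t mask_bytes = ((size_t)num_slots + 7) / 8;
    size_t valid_before = 0;
    assert(payload != NULL);
    if (slot >= num_slots || !mask_bit(payload, slot))
        return -1;
    /* Only valid slots are stored, so count the set bits below this one. */
    for (uint16_t i = 0; i < slot; i++)
        valid_before += (size_t)mask_bit(payload, i);
    return (long)(mask_bytes + valid_before * slot_size);
}
```

Property data, when present, starts after the last stored slot, at mask_bytes plus the total number of set bits times slot_size.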
8.6 Delta Format
8.6.1 Cycle Frame
Wire format (variable-length — not representable as a C struct):
cycle_frame:
[LEB128] time_delta_ps 1–10 bytes, unsigned delta in ps from previous frame
[uint8] op_format 0 = wide (16B delta_op_t), 1 = compact (8B delta_op_compact_t)
[uint8] reserved must be 0
[uint16] num_ops
[uint16] num_events
[repeated] ops × num_ops (size per op depends on op_format)
[repeated] events × num_events (event_record_t, variable-size)
The op_format field is only meaningful when F_COMPACT_DELTAS is set
in the file header. If the flag is not set, op_format must be 0 (wide)
and readers may skip checking it.
The time delta uses unsigned LEB128 encoding (same as DWARF / protobuf). Values are in picoseconds. For a 5 GHz clock (200 ps period), consecutive cycles produce a repeating delta of 200:
| Delta value | Encoded bytes | Typical scenario |
|---|---|---|
| 0 | 1 (0x00) | Multiple frames at same timestamp |
| 1–127 | 1 | Sub-ns deltas (rare) |
| 128–16383 | 2 | Most clock periods (e.g., 200–1250) |
| 16384+ | 3+ | Large idle gaps |
The first frame in each segment uses segment_header_t.time_start_ps as
the base, so each segment is independently decodable without prior context.
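For reference, unsigned LEB128 stores 7 value bits per byte, low bits first, with the high bit of each byte marking continuation. The sketch below uses hypothetical function names; the 10-byte cap matches the 1–10 byte range given for time_delta_ps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v as unsigned LEB128. out must have room for 10 bytes.
 * Returns the number of bytes written (1 to 10). */
size_t uleb128_encode(uint64_t v, uint8_t *out)
{
    size_t n = 0;
    assert(out != NULL);
    do {
        uint8_t byte = (uint8_t)(v & 0x7F);
        v >>= 7;
        if (v != 0)
            byte |= 0x80;        /* more bytes follow */
        out[n++] = byte;
    } while (v != 0);
    return n;
}

/* Decode an unsigned LEB128 value from in[0..len). Returns the number of
 * bytes consumed, or 0 if the input is truncated or longer than 10 bytes. */
size_t uleb128_decode(const uint8_t *in, size_t len, uint64_t *value)
{
    uint64_t v = 0;
    assert(in != NULL && value != NULL);
    for (size_t i = 0; i < len && i < 10; i++) {
        v |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *value = v;
            return i + 1;
        }
    }
    return 0;
}
```

A delta of 200 ps encodes as the two bytes 0xC8 0x01. To recover absolute time, a reader starts from time_start_ps and adds each decoded delta in frame order.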
8.6.2 Delta Operations
enum delta_action : uint8_t {
DA_SLOT_SET = 0x01, // set a field value
DA_SLOT_CLEAR = 0x02, // mark slot invalid (sparse only)
DA_SLOT_ADD = 0x03, // add value to field (for counters etc.)
DA_PROP_SET = 0x04, // v0.3: set a storage-level property
};
typedef struct {
uint8_t action;
uint8_t reserved;
uint16_t storage_id;
uint16_t slot_index;
uint16_t field_index; // ignored for DA_SLOT_CLEAR; prop_index for DA_PROP_SET
uint64_t value; // ignored for DA_SLOT_CLEAR
} delta_op_t; // 16 bytes
8.6.3 Compact Delta Variant
When the file header flag F_COMPACT_DELTAS is set, delta blobs may
contain compact 8-byte ops. The op_format field of each cycle frame
(§8.6.1) determines which layout all ops in that frame use.
typedef struct {
uint8_t action;
uint8_t storage_id_lo; // low 8 bits of storage_id
uint16_t slot_index;
uint16_t field_index;
uint16_t value16;
} delta_op_compact_t; // 8 bytes
Compact ops have the following limitations. If any op in a frame violates these, the writer must use wide format for the entire frame:
- storage_id must be 0–255
- value16 is zero-extended to 64 bits; values > 65535 cannot be represented
- DA_SLOT_CLEAR ignores field_index and value16 (same as wide format)
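The per-frame decision and the reader-side expansion can be sketched as follows. The struct layouts mirror delta_op_t and delta_op_compact_t; `frame_fits_compact` and `expand_compact` are hypothetical helpers. The check is applied strictly to every op, including DA_SLOT_CLEAR ops whose value is ignored anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  action;
    uint8_t  reserved;
    uint16_t storage_id;
    uint16_t slot_index;
    uint16_t field_index;
    uint64_t value;
} delta_op_t;             /* wide, 16 bytes */

typedef struct {
    uint8_t  action;
    uint8_t  storage_id_lo;
    uint16_t slot_index;
    uint16_t field_index;
    uint16_t value16;
} delta_op_compact_t;     /* compact, 8 bytes */

/* Writer side: a frame may use compact ops only if every op fits. */
bool frame_fits_compact(const delta_op_t *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ops[i].storage_id > 255 || ops[i].value > 0xFFFF)
            return false;
    return true;
}

/* Reader side: widen a compact op; value16 is zero-extended. */
delta_op_t expand_compact(const delta_op_compact_t *c)
{
    delta_op_t w;
    assert(c != NULL);
    w.action      = c->action;
    w.reserved    = 0;
    w.storage_id  = c->storage_id_lo;
    w.slot_index  = c->slot_index;
    w.field_index = c->field_index;
    w.value       = c->value16;
    return w;
}
```

Expanding on read lets the rest of a reader handle only the wide form.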
8.6.4 Event Records
typedef struct {
uint16_t event_type_id;
uint16_t reserved; // must be 0
uint32_t payload_size;
// uint8_t payload[payload_size];
} event_record_t;
The payload_size must equal the sum of field sizes for this event type
as defined in the schema. Writers must not emit a different size. Readers
should validate this but may use payload_size to skip events with
unrecognized event_type_id without consulting the schema.
8.6.5 Payload Wire Format
Event payloads and checkpoint slot data use the same packing rule: fields are concatenated in schema-definition order with no padding and no alignment. Multi-byte fields use little-endian byte order (as with all integers in the file). The total payload size equals the sum of all field sizes as derived from their types (see §4.3).
Checkpoint blocks (§8.5) and cycle frames within the delta blob are also tightly packed with no inter-block or intra-block padding.
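Reading one field from a packed payload means summing the sizes of the fields before it. The sketch below takes the list of field type codes from the schema; `payload_field` and the `FIELD_SIZE` table are hypothetical names. Values are returned zero-extended, so sign extension of FT_I* fields is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field sizes indexed by field_type code (§4.3); index 0 is unused. */
static const uint8_t FIELD_SIZE[12] = {0, 1, 2, 4, 8, 1, 2, 4, 8, 1, 4, 1};

/* Read field number index from a tightly packed payload.
 * Returns 0 on success, -1 on an unknown type or a short payload. */
int payload_field(const uint8_t *payload, size_t payload_len,
                  const uint8_t *types, size_t num_fields,
                  size_t index, uint64_t *out)
{
    size_t off = 0;
    assert(payload != NULL && types != NULL && out != NULL);
    if (index >= num_fields)
        return -1;
    for (size_t i = 0; i <= index; i++) {
        if (types[i] == 0 || types[i] >= 12)
            return -1;
        size_t sz = FIELD_SIZE[types[i]];
        if (off + sz > payload_len)
            return -1;
        if (i == index) {
            uint64_t v = 0;                  /* little-endian decode */
            for (size_t b = 0; b < sz; b++)
                v |= (uint64_t)payload[off + b] << (8 * b);
            *out = v;
            return 0;
        }
        off += sz;                           /* no padding between fields */
    }
    return -1;
}
```

A reader that decodes many records of the same type would normally precompute the field offsets once per schema entry instead of summing on every access.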
8.6.6 Interleaved Frame Format (v0.2)
When the file header flag F_INTERLEAVED_DELTAS is set, cycle frames
use a self-describing tagged item stream instead of separate op/event
arrays. This preserves the exact call order of ops and events within a
cycle, which the v0.1 format cannot represent.
cycle_frame_v2:
[LEB128] time_delta_ps 1–10 bytes, unsigned delta in ps
[uint16] num_items total number of tagged items
[repeated] items × num_items (self-describing via tag byte)
Each item starts with a tag byte that determines its type and size:
| Tag | Type | Total size | Layout |
|---|---|---|---|
| 0x01 | Wide op | 16 bytes | tag:u8 action:u8 storage_id:u16 slot:u16 field:u16 value:u64 |
| 0x02 | Compact op | 8 bytes | tag:u8 action:u8 storage_id_lo:u8 slot:u16 field:u16 value16:u16 |
| 0x03 | Event | 8+N bytes | tag:u8 reserved:u8 event_type_id:u16 payload_size:u32 payload[N] |
The tag byte is size-neutral: it replaces the reserved byte in
wide ops and one byte of the reserved:u16 in events. The frame header
shrinks by 4 bytes compared to v0.1: op_format (1 byte), reserved (1 byte),
and the two uint16 counts (4 bytes) are replaced by a single uint16
num_items (2 bytes).
Compact decision is per-frame, same logic as v0.1: if all ops in
the frame satisfy storage_id ≤ 255 and value ≤ 65535, all ops use
tag 0x02; otherwise all ops use tag 0x01. Events always use 0x03.
When F_INTERLEAVED_DELTAS is set, F_COMPACT_DELTAS is ignored.
Readers must support both v0.1 (§8.6.1) and v0.2 frame formats by
checking the F_INTERLEAVED_DELTAS flag.
8.6.7 Compression
Per-segment, single LZ4 or ZSTD block. Method indicated in file header
flags. Readers must reject files with unknown F_COMP_METHOD values.
9. Writer API
// ── Lifecycle ──
uscope_writer_t* uscope_writer_open(const char* path,
const dut_desc_t* dut,
const schema_t* schema,
uint32_t checkpoint_interval);
void uscope_writer_close(uscope_writer_t* w);
// ── Per-cycle ──
void uscope_begin_cycle(uscope_writer_t* w, uint64_t time_ps);
void uscope_slot_set(uscope_writer_t* w, uint16_t storage_id,
uint16_t slot, uint16_t field, uint64_t value);
void uscope_slot_clear(uscope_writer_t* w, uint16_t storage_id,
uint16_t slot);
void uscope_slot_add(uscope_writer_t* w, uint16_t storage_id,
uint16_t slot, uint16_t field, uint64_t value);
void uscope_event(uscope_writer_t* w, uint16_t event_type_id,
const void* payload);
void uscope_end_cycle(uscope_writer_t* w);
// ── Checkpoints ──
typedef void (*uscope_checkpoint_fn)(uscope_writer_t* w, void* user_data);
void uscope_set_checkpoint_callback(uscope_writer_t* w,
uscope_checkpoint_fn fn, void* ud);
void uscope_checkpoint_storage(uscope_writer_t* w, uint16_t storage_id,
const uint8_t* valid_mask,
const void* slot_data,
uint32_t num_valid_slots);
9.1 DPI Bridge
The transport-level DPI is generic. Protocol-specific convenience wrappers are defined by each protocol, not by this spec.
import "DPI-C" function chandle uscope_open(string path);
import "DPI-C" function void uscope_close(chandle w);
import "DPI-C" function void uscope_begin_cycle(chandle w, longint unsigned time_ps);
import "DPI-C" function void uscope_end_cycle(chandle w);
import "DPI-C" function void uscope_slot_set(
chandle w, shortint unsigned storage_id, shortint unsigned slot,
shortint unsigned field, longint unsigned value
);
import "DPI-C" function void uscope_slot_clear(
chandle w, shortint unsigned storage_id, shortint unsigned slot
);
import "DPI-C" function void uscope_slot_add(
chandle w, shortint unsigned storage_id, shortint unsigned slot,
shortint unsigned field, longint unsigned value
);
import "DPI-C" function void uscope_event_raw(
chandle w, shortint unsigned event_type_id,
input byte unsigned payload[]
);
10. Reader API
// ── Lifecycle ──
uscope_reader_t* uscope_reader_open(const char* path);
void uscope_reader_close(uscope_reader_t* r);
// ── Metadata ──
const file_header_t* uscope_header(const uscope_reader_t* r);
const dut_desc_t* uscope_dut_desc(const uscope_reader_t* r);
const schema_t* uscope_schema(const uscope_reader_t* r);
const char* uscope_scope_protocol(const uscope_reader_t* r,
uint16_t scope_id);
const char* uscope_dut_property(const uscope_reader_t* r,
const char* key);
bool uscope_is_complete(const uscope_reader_t* r);
// ── Summary (finalized files only) ──
uint32_t uscope_summary_levels(const uscope_reader_t* r);
const void* uscope_summary_data(const uscope_reader_t* r, uint32_t level,
uint32_t* out_count);
// ── State reconstruction ──
uscope_state_t* uscope_state_at(uscope_reader_t* r, uint64_t time_ps);
void uscope_state_free(uscope_state_t* s);
bool uscope_slot_valid(const uscope_state_t* s, uint16_t storage_id,
uint16_t slot);
uint64_t uscope_slot_field(const uscope_state_t* s, uint16_t storage_id,
uint16_t slot, uint16_t field);
uint32_t uscope_storage_occupancy(const uscope_state_t* s,
uint16_t storage_id);
// ── Events ──
uscope_event_iter_t* uscope_events_in_range(uscope_reader_t* r,
uint64_t time_start_ps,
uint64_t time_end_ps);
bool uscope_event_next(uscope_event_iter_t* it, uint64_t* time_ps,
uint16_t* event_type_id, const void** payload);
void uscope_event_iter_free(uscope_event_iter_t* it);
// ── Live tailing ──
bool uscope_poll_new_segments(uscope_reader_t* r);
11. Konata Trace Reconstruction
This section demonstrates that µScope's two primitives (storages + events) carry all information needed to reconstruct a Konata-format pipeline visualization.
| Konata cmd | µScope equivalent |
|---|---|
| I (create) | DA_SLOT_SET on entity catalog slot |
| L (label) | Entity catalog fields (pc, inst_bits) decoded by protocol plugin |
| S (stage start) | stage_transition event |
| E (stage end) | Next stage_transition or entity cleared/flushed |
| R (retire) | DA_SLOT_CLEAR on entity catalog slot |
| W (flush) | flush event with entity ID |
| C (cycle) | Absolute timestamp (segment base + cumulative LEB128 deltas) |
| Dependency arrows | dependency events linking entity IDs |
See the cpu protocol specification for the full reconstruction
algorithm (§9 of that document).
12. Design Rationale
- Two primitives: Storages and events are sufficient to model any time-series structured data. Entities, counters, annotations, and dependencies are protocol-level patterns built on top.
- Two-layer architecture: The format does not change when DUT types are added. Only new protocol specs are written.
- Schema-driven, self-describing: Unknown protocols render generically.
- String pool: Arbitrary-length names, no wasted padding, smaller structs. One pool shared by DUT descriptor and schema.
- No styling in transport: Colors, line styles, layout rules, display hints (hex, hidden, key) belong in the protocol layer or viewer configuration. The transport layer is pure data.
- Append-only segments with backward chain: Segments are appended during simulation with no pre-allocated tables. A tail_offset in the file header lets readers discover new segments. At finalization, a flat segment table is built for fast random access. This supports streaming write, live read, and fast seek, all from a single file.
- Checkpoint + delta: O(log n) seek to a segment via the segment table (§8.3), O(n) replay within the segment. Cycle timestamps are LEB128 delta-encoded: 1–2 bytes per frame for consecutive cycles instead of 8.
- Mipmap summaries: O(screen_pixels) overview rendering. Summary semantics are opaque to the transport; the protocol layer defines what each field means.
- Single file, section-based: Portable, self-locating sections. No sidecar files.
- Chunked preamble: DUT descriptor, schema, and trace config are typed chunks with length headers. Older readers skip unknown chunk types, so new metadata can be added (embedded ELF, source maps, protocol config) without bumping the format version.
- Scoped hierarchy with per-scope protocols: Storages and events are organized into a tree of scopes rooted at / (matching hardware hierarchy: SoC → tile → core). Each scope can declare its own protocol, enabling mixed-protocol traces (CPU + DMA + NoC in one file). There is no inheritance; each scope is explicit.
13. Comparison with FST
13.1 Core Difference: Signals vs. Structures
| Aspect | FST | µScope |
|---|---|---|
| Data model | Flat signals (bit-vectors) | Typed structures + protocol semantics |
| Semantics | None | Schema (transport) + protocol (domain) |
| Aggregation | External post-processing | Built-in mipmaps |
| Extensibility | New signals only | New protocols, same format |
13.2 When FST is Better
- Signal-level debugging (exact wire values)
- RTL verification (waveform comparison)
- Tool ecosystem (GTKWave, Surfer, DVT)
- Zero instrumentation cost ($dumpvars)
13.3 When µScope is Better
- Microarchitectural introspection
- Large structures (1024-entry ROB = one sparse storage)
- Performance analysis with built-in summaries
- Billion-cycle interactive exploration
- Non-RTL environments (architectural simulators)
- Multiple DUT types with one format
13.4 Complementary Use
| Phase | Tool | Why |
|---|---|---|
| RTL signal debug | FST + GTKWave | Bit-accurate, zero setup |
| Microarch exploration | µScope viewer | Structured, schema-aware |
| Performance analysis | µScope summaries | Multi-resolution aggregation |
| Bug root-cause | µScope → FST | Find cycle in µScope, drill into FST |
14. Version History
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2025-xx-xx | Initial draft (CPU-specific) |
| 2.0 | 2025-xx-xx | Architecture-agnostic, schema-driven |
| 3.0 | 2025-xx-xx | Transport/protocol layer separation |
| 3.1 | 2025-xx-xx | String pools, styling removed, Konata proof |
| 4.0 | 2025-xx-xx | Aggressive simplification: |
| | | - Entities removed (modeled as storages) |
| | | - Annotations removed (modeled as events) |
| | | - Counters removed (modeled as 1-slot storages) |
| | | - SF_CIRCULAR, SF_CAM removed (head/tail = fields) |
| | | - DA_HEAD, DA_TAIL, DA_COUNTER_* removed |
| | | - Field/event/counter display flags removed |
| | | - Summary source/aggregation semantics removed |
| | | - Two string pools merged into one |
| | | - DUT descriptor simplified (vendor/version = properties) |
| | | - Delta actions: 7 → 3 (SET, CLEAR, ADD); 4 in v0.3 |
| | | - Section types: 7 → 4 |
| | | - LEB128 delta-encoded cycle in cycle frames |
| | | - Append-only segment chain with tail_offset |
| | | - Live read support (no finalization required) |
| | | - Segment table moved to finalization-only |
| 4.1 | 2026-xx-xx | Chunked preamble: |
| | | - File header: 64 → 48 bytes |
| | | - DUT descriptor, schema, trace config become chunks |
| | | - dut_desc_offset, schema_offset removed from header |
| | | - checkpoint_interval moved to CHUNK_TRACE_CONFIG |
| | | - dut_desc_t.size, schema_header_t.size removed |
| | | - Unknown chunk types skipped (forward compatibility) |
| | | - storage_id widened to uint16 throughout |
| | | - num_enums narrowed to uint8 |
| | | - field_def_t.size removed (derived from type) |
| | | - Compact deltas: per-frame op_format + file flag |
| | | - num_deltas → num_cycle_frames |
| | | - Payload wire format specified (tight packing, LE) |
| | | - Live-read commit ordering specified |
| | | - Scopes: hierarchical grouping of storages/events |
| | | - Per-scope protocol assignment (multi-protocol traces) |
| 4.2 | 2026-xx-xx | Picosecond time model: |
| | | - All timestamps in picoseconds (universal time axis) |
| | | - Clock domain definitions in schema (name + period) |
| | | - Scopes assigned to clock domains |
| | | - total_cycles → total_time_ps |
| | | - cycle_start/cycle_end → time_start_ps/time_end_ps |
| | | - checkpoint_interval → checkpoint_interval_ps |
| 4.3 | 2026-03-21 | Interleaved frame format (v0.2): |
| | | - F_INTERLEAVED_DELTAS flag (bit 7) |
| | | - Tagged item stream replaces separate op/event arrays |
| | | - Preserves call order of ops and events within a cycle |
| | | - version_minor bumped to 2 |
| | | - F_COMPACT_DELTAS ignored when interleaved is set |
| 4.4 | 2026-03-29 | Trace summary + buffer flag: |
| | | - SF_BUFFER storage flag (bit 1) |
| | | - SECTION_COUNTER_SUMMARY section type (0x0010) |
| | | - TraceSummary (TSUM) replaces abstract summary §6 |
| | | - Instruction density mipmap + counter mipmaps |
| | | - Backward-compatible CSUM reader for legacy files |
| 4.5 | 2026-04-01 | Storage properties (v0.3): |
| | | - StorageDef header: 12 → 16 bytes |
| | | - num_properties + reserved fields added |
| | | - FieldDef[num_properties] appended after slot fields |
| | | - DA_PROP_SET delta action (0x04) for property updates |
| | | - Checkpoint blocks include property data after slot data |
| | | - version_minor bumped to 3 |
| | | - SchemaHeader size corrected: 14 → 12 bytes |
15. Glossary
| Term | Definition |
|---|---|
| Checkpoint | Full snapshot of all storage state at a segment boundary. Enables random access without replaying from the start. |
| Chunk | A typed, length-prefixed block in the preamble. Unknown chunk types are skipped for forward compatibility. |
| Clock domain | A named clock with a period in picoseconds. Scopes are assigned to clock domains for cycle-number display. |
| Cycle frame | One timestamp's worth of delta operations and events. v0.1: separate op/event arrays (§8.6.1). v0.2: interleaved tagged items preserving call order (§8.6.6). |
| Delta | A single state change within a cycle frame (DA_SLOT_SET, DA_SLOT_CLEAR, DA_SLOT_ADD, or DA_PROP_SET). |
| DUT | Device Under Test. The hardware being traced. |
| Event | A timestamped occurrence with a schema-defined typed payload. Fire-and-forget (no persistent state). |
| Finalization | The process of writing summary, string table, segment table, and section table at trace close. Sets F_COMPLETE. |
| LEB128 | Little-Endian Base 128. Variable-length unsigned integer encoding. Used for time deltas. |
| Mipmap | Multi-resolution summary pyramid. Each level aggregates the level below by a fan-out factor. |
| Preamble | The chunk stream between the file header and the first segment. Contains DUT, schema, and trace config. |
| Protocol layer | Defines semantic meaning for a specific DUT type (e.g. cpu). Assigned per-scope. Not part of this spec. |
| Schema | Immutable definition of all scopes, storages, events, enums, and summary fields. Written once at creation. |
| Scope | A named node in a hierarchical tree rooted at /. Groups storages and events. Optionally declares a protocol. |
| Segment | One checkpoint-interval's worth of data: a checkpoint followed by compressed deltas. |
| Slot | One entry in a storage array. Contains one value per field defined in the storage schema. |
| Storage | A named, fixed-size array of typed slots. State is mutated by deltas and snapshotted in checkpoints. |
| String pool | Packed, null-terminated UTF-8 strings referenced by uint16_t offsets. Shared by DUT descriptor and schema. |
| String table | Optional section for runtime strings (e.g. disassembled instructions) referenced by FT_STRING_REF fields. |
| Tail offset | File header field pointing to the last completed segment. Updated after each segment flush. Enables live reading. |
| Transport layer | The binary file format defined by this spec. Knows about storages, events, and segments — nothing else. |
µScope cpu Protocol Specification
Version: 0.1-draft
Protocol identifier: cpu
Transport version: µScope 0.x
1. Overview
The cpu protocol defines conventions for tracing any pipelined CPU —
in-order, out-of-order, VLIW, or multi-threaded — using the µScope
transport layer. It does not prescribe a fixed schema. Instead, it
defines semantic conventions that a DUT writer follows and a viewer
relies on to render pipeline visualizations, occupancy charts, and
performance summaries without prior knowledge of the specific
microarchitecture.
1.1 Design Principles
- Generic over specific. The protocol works for a 5-stage in-order core and a 20-stage OoO core alike. The DUT declares its structures; the viewer renders whatever it finds.
- Convention over configuration. Semantics are conveyed through field names, storage shapes, and DUT properties, not through protocol-specific binary metadata.
- Viewer decodes, trace stores data. The trace carries raw values (PC, instruction bits). The viewer decodes disassembly, register names, etc. using the ELF and ISA knowledge.
- Entity-centric. Every in-flight instruction has a unique ID. All structures reference entities by ID. The viewer joins on this ID to build per-instruction timelines.
2. Concepts
2.1 Entities
An entity is an in-flight instruction (or micro-op). Each entity occupies a slot in the entity catalog storage and is referenced by its slot index throughout the pipeline.
- Entity ID = slot index in the entity catalog (U32).
- When an instruction is fetched, the writer allocates a slot (DA_SLOT_SET on its fields). When it retires or is flushed, the writer clears the slot (DA_SLOT_CLEAR). The slot can then be reused.
- The entity catalog must be sparse.
2.2 Buffers
A buffer is any storage whose slots hold entity references — a hardware structure that entities pass through or reside in. Examples: ROB, issue queues, load/store queues, scoreboards, reservation stations.
A storage is recognized as a buffer if it contains a field named
entity_id (§3.2). The viewer automatically tracks entity
membership in every buffer.
2.3 Stages
The viewer renders a per-entity Gantt chart showing which pipeline
stage each instruction is in over time. Since an entity can occupy
multiple buffers simultaneously (e.g., ROB + issue queue + executing),
stage progression is tracked explicitly via stage_transition
events (§5.1), not inferred from buffer membership.
Buffers and stages are orthogonal:
- Buffers model where an entity physically resides (ROB slot 42, LQ slot 7). An entity can be in multiple buffers at once.
- Stages model logical pipeline progress (fetch → decode → ... → retire). An entity is in exactly one stage at any time.
The DUT declares the stage ordering via cpu.pipeline_stages (§4.1) and
emits a stage_transition event each time an entity advances. The
viewer maintains a current_stage per entity and draws Gantt bars
from stage entry/exit times.
2.4 Counters
A counter is a 1-slot, non-sparse storage with numeric fields,
mutated via DA_SLOT_ADD. The viewer infers counters from this shape
and renders them as line graphs or sparklines. No protocol markup is
needed.
2.5 Events
Events model instantaneous occurrences attached to entities or to the timeline. The protocol defines standard event names (§5). The viewer renders recognized events with specific visualizations and unknown events generically.
3. Entity Catalog
3.1 Storage Convention
The entity catalog is a storage named entities.
| Property | Value |
|---|---|
| Name | entities |
| Sparse | yes (SF_SPARSE) |
| Num slots | max concurrent in-flight entities (DUT-specific) |
3.2 Required Fields
| Field name | Type | Description |
|---|---|---|
| entity_id | U32 | Unique entity ID (equals the slot index) |
| pc | U64 | Program counter |
| inst_bits | U32 | Raw instruction bits |
3.3 Optional Fields
The DUT may add any additional fields. Common examples:
| Field name | Type | Description |
|---|---|---|
| thread_id | U16 | Hardware thread / hart ID |
| is_compressed | BOOL | Compressed instruction (RVC, Thumb, ...) |
| priv_level | ENUM | Privilege level at fetch |
3.4 Entity Lifecycle
Fetch:  DA_SLOT_SET   entities[id].entity_id = id
        DA_SLOT_SET   entities[id].pc = ...
        DA_SLOT_SET   entities[id].inst_bits = ...
Retire: DA_SLOT_CLEAR entities[id]
Flush:  DA_SLOT_CLEAR entities[id]
        (plus a flush event, §5.4)
The entity_id field is always equal to the slot index. It is stored
explicitly so that buffer storages and events can reference it using a
uniform U32 field, independent of the transport's slot indexing.
Slot reuse: After DA_SLOT_CLEAR, the slot may be reused for a new
instruction. The new occupant is a logically distinct entity — the viewer
treats each clear/set cycle as a new entity lifetime. The viewer must not
carry state (stage, annotations, dependencies) across a clear boundary.
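The slot-reuse rule can be sketched as a small tracker that opens a new lifetime on the first set of an empty slot and closes it on clear, so nothing leaks across the boundary. This is a minimal illustration, assuming the viewer receives decoded callbacks per delta and event; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Lifetime:
    slot: int
    born: int
    died: Optional[int] = None
    stages: List[Tuple[str, int]] = field(default_factory=list)


class LifetimeTracker:
    """Tracks one Lifetime per occupied slot; a clear ends it for good."""

    def __init__(self) -> None:
        self.live: Dict[int, Lifetime] = {}
        self.finished: List[Lifetime] = []

    def on_slot_set(self, slot: int, time: int) -> None:
        # The first DA_SLOT_SET on an empty slot starts a new lifetime;
        # later sets on a live slot only update fields.
        if slot not in self.live:
            self.live[slot] = Lifetime(slot, time)

    def on_stage(self, slot: int, stage: str, time: int) -> None:
        life = self.live.get(slot)
        if life is not None:
            life.stages.append((stage, time))

    def on_slot_clear(self, slot: int, time: int) -> None:
        # DA_SLOT_CLEAR closes the lifetime; its state is archived and
        # never visible to the slot's next occupant.
        life = self.live.pop(slot, None)
        if life is not None:
            life.died = time
            self.finished.append(life)
```

Keying live state by slot and archiving on clear means a reused slot always starts with an empty stage history.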
4. Buffers and Stages
4.1 Stage Ordering via DUT Properties
The DUT declares pipeline stages using a DUT property:
cpu.pipeline_stages = "fetch,decode,rename,dispatch,issue,execute,complete,retire"
The value is a comma-separated list in pipeline order (earliest first).
The viewer uses this ordering for Gantt chart column layout and
coloring. Stage names must match the values used in stage_transition
events (§5.1).
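A viewer might check the stage list against the schema enum when opening a trace. The sketch below parses the comma-separated property value and reports mismatches; the helper names and the dictionary representation of the enum are assumptions for illustration only.

```python
from typing import Dict, List


def parse_stages(prop: str) -> List[str]:
    """Split the comma-separated stage list, rejecting empty or duplicate names."""
    names = [part.strip() for part in prop.split(",")]
    if any(name == "" for name in names):
        raise ValueError("empty stage name in stage list")
    if len(set(names)) != len(names):
        raise ValueError("duplicate stage name in stage list")
    return names


def check_stage_enum(stages: List[str], enum_values: Dict[int, str]) -> List[str]:
    """Return human-readable problems; an empty list means the two agree."""
    enum_names = set(enum_values.values())
    problems = []
    for name in stages:
        if name not in enum_names:
            problems.append(f"stage '{name}' is not in the pipeline_stage enum")
    for name in sorted(enum_names - set(stages)):
        problems.append(f"enum value '{name}' is not a declared stage")
    return problems
```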
4.2 Buffer Storage Convention
Any storage other than the entity catalog with a field named entity_id of type U32 is a buffer.
| Property | Value |
|---|---|
| Sparse | yes (SF_SPARSE) |
| Num slots | hardware structure capacity |
4.3 Required Buffer Fields
| Field name | Type | Description |
|---|---|---|
| entity_id | U32 | References entity catalog slot |
4.4 Optional Buffer Fields
The DUT may add structure-specific fields:
| Field name | Type | Description |
|---|---|---|
| completed | BOOL | Execution completed (ROB) |
| addr | U64 | Memory address (LQ/SQ) |
| ready | BOOL | Operands ready (IQ/scoreboard) |
| fu_type | ENUM | Functional unit assigned |
4.5 Buffer Operations
Insert: DA_SLOT_SET rob[slot].entity_id = id
Remove: DA_SLOT_CLEAR rob[slot]
Update: DA_SLOT_SET rob[slot].completed = 1
5. Standard Events
The protocol defines the following event names. stage_transition is
required for Gantt chart rendering; all others are optional. The viewer
renders recognized events with specific visualizations and unknown
events generically (name + fields in a tooltip).
5.1 stage_transition
Explicit pipeline stage change for an entity. The DUT emits this event
each time an instruction advances to a new pipeline stage. Superscalar
cores emit multiple stage_transition events in the same cycle frame
(e.g., a 4-wide machine retiring 4 instructions produces 4 events).
| Field name | Type | Description |
|---|---|---|
| entity_id | U32 | Entity that advanced |
| stage | ENUM(pipeline_stage) | Stage the entity entered |
The enum must be named pipeline_stage in the schema. Its values must
match the names declared in the cpu.pipeline_stages DUT property (§4.1).
For example:
| Value | Name |
|---|---|
| 0 | fetch |
| 1 | decode |
| 2 | rename |
| 3 | dispatch |
| 4 | issue |
| 5 | execute |
| 6 | complete |
| 7 | retire |
The enum is DUT-defined — an in-order core might have just
fetch, decode, execute, memory, writeback.
The viewer maintains a current_stage per entity. A Gantt bar for a
stage spans from the time the entity entered it until the time it
entered the next stage (or was cleared/flushed). Multi-cycle stages
(e.g., a long-latency divide in execute) require no special handling —
the entity simply stays in its current stage until the next
stage_transition event.
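The bar rule above can be sketched for a single entity lifetime as follows. The input is assumed to be that entity's stage_transition records already sorted by time, plus the time of its clear or flush (None if it is still in flight); the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def gantt_bars(
    transitions: List[Tuple[int, str]], end_time: Optional[int]
) -> List[Tuple[str, int, Optional[int]]]:
    """Return (stage, start, stop) bars for one entity lifetime.

    Each bar runs until the next transition; the last bar runs until the
    entity is cleared or flushed (stop is None while it is still live).
    """
    bars = []
    for i, (start, stage) in enumerate(transitions):
        if i + 1 < len(transitions):
            stop = transitions[i + 1][0]
        else:
            stop = end_time
        bars.append((stage, start, stop))
    return bars
```

A long-latency execute simply shows up as a wider bar, since no event arrives until the entity moves on.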
5.2 annotate
Free-text annotation attached to an entity.
| Field name | Type | Description |
|---|---|---|
| entity_id | U32 | Target entity |
| text | STRING_REF | Annotation text |
Viewer: shows as a label on the entity's Gantt bar.
5.3 dependency
Data or structural dependency between two entities.
| Field name | Type | Description |
|---|---|---|
| src_id | U32 | Producer entity |
| dst_id | U32 | Consumer entity |
| dep_type | ENUM(dep_type) | Dependency kind |
Standard dep_type enum values:
| Value | Name |
|---|---|
| 0 | raw |
| 1 | war |
| 2 | waw |
| 3 | structural |
Viewer: draws an arrow from producer to consumer in the Gantt chart.
5.4 flush
Entity was squashed before retirement.
| Field name | Type | Description |
|---|---|---|
| entity_id | U32 | Flushed entity |
| reason | ENUM(flush_reason) | Cause |
Standard flush_reason enum values:
| Value | Name |
|---|---|
| 0 | mispredict |
| 1 | exception |
| 2 | interrupt |
| 3 | pipeline_clear |
Viewer: marks the entity's Gantt bar with a squash indicator.
5.5 stall
Pipeline stall (not tied to a specific entity).
| Field name | Type | Description |
|---|---|---|
| reason | ENUM(stall_reason) | Stall cause |
The stall_reason enum values are DUT-defined. Common examples:
rob_full, iq_full, lq_full, sq_full, fetch_miss,
dcache_miss, frontend_stall.
Viewer: renders a colored band on the timeline.
6. Counters
No special protocol convention beyond shape detection. A 1-slot, non-sparse storage is a counter. The storage name is the counter label.
Common counters:
| Storage name | Fields | Meaning |
|---|---|---|
| committed_insns | count: U64 | Retired instructions |
| bp_misses | count: U64 | Branch mispredictions |
| dcache_misses | count: U64 | D-cache misses |
| icache_misses | count: U64 | I-cache misses |
Writer updates via DA_SLOT_ADD:
uscope_slot_add(w, STOR_COMMITTED_INSNS, 0, FIELD_COUNT, 4); // retired 4 this cycle
7. Summary Fields
The protocol defines standard summary field names for mipmap rendering. The viewer recognizes these and aggregates them appropriately.
| Field name | Type | Meaning |
|---|---|---|
| committed | U32 | Instructions committed in bucket |
| cycles_active | U32 | Non-idle cycles in bucket |
| flushes | U16 | Flush events in bucket |
| bp_misses | U16 | Branch mispredictions in bucket |
Per-buffer occupancy summaries use the naming pattern
<storage_name>_occ (e.g., rob_occ). The value is the sum of
occupancy samples in the bucket; divide by cycles_active for average.
DUT-specific summary fields are rendered as generic bar charts.
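The occupancy arithmetic can be sketched as below. Averaging follows the rule stated above; building a coarser mipmap level by summing adjacent buckets is an assumption that fits sum-type fields such as rob_occ and cycles_active, and is not something the transport layer prescribes. Bucket dictionaries and function names are illustrative.

```python
from typing import Dict, List


def average_occupancy(occ_sum: int, cycles_active: int) -> float:
    """Average occupancy for one bucket; an all-idle bucket reports 0.0."""
    if cycles_active == 0:
        return 0.0
    return occ_sum / cycles_active


def merge_buckets(buckets: List[Dict[str, int]], fan_out: int) -> List[Dict[str, int]]:
    """Build the next coarser level by summing groups of fan_out buckets."""
    merged = []
    for i in range(0, len(buckets), fan_out):
        group = buckets[i:i + fan_out]
        merged.append({key: sum(b[key] for b in group) for key in group[0]})
    return merged
```

Because both the numerator and the denominator are sums, averages computed at a coarse level stay consistent with the fine-grained data.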
8. DUT Properties
Properties use the cpu. key prefix so they coexist with other
protocols in multi-protocol traces.
8.1 Required Properties
| Key | Description | Example |
|---|---|---|
| dut_name | DUT instance name | boom_core_0 |
| cpu.protocol_version | Version of the cpu protocol | 0.1 |
| cpu.isa | Instruction set architecture | RV64GC |
| cpu.pipeline_stages | Comma-separated stage names, in order | fetch,...,retire |
8.2 Optional Properties
| Key | Description | Example |
|---|---|---|
| cpu.fetch_width | Instructions fetched per cycle | 4 |
| cpu.commit_width | Instructions retired per cycle | 4 |
| cpu.elf_path | Path to ELF for disassembly | /path/to/fw.elf |
| cpu.vendor | DUT vendor | sifive |
9. Viewer Reconstruction
9.1 Opening a Trace
- Read preamble → parse schema and DUT properties
- Walk scope tree from root / → find all scopes with protocol = "cpu"; each is a core
- Per core scope: identify entities storage (entity catalog), find all buffers (storages with entity_id field), identify counters (1-slot non-sparse storages)
- Read cpu.pipeline_stages property → build ordered stage list
- If cpu.elf_path property exists, load ELF for disassembly
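The shape-based classification in the steps above might look like the following sketch. The Storage record is a hypothetical in-memory view of a parsed schema entry, and treating the storage named entities as the catalog rather than as a buffer is an assumption made here to keep the roles disjoint.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Storage:
    name: str
    sparse: bool
    num_slots: int
    fields: Dict[str, str]  # field name -> type name


def classify(storages: List[Storage]) -> Dict[str, List[str]]:
    """Sort one core scope's storages into catalog, buffers, counters, other."""
    roles: Dict[str, List[str]] = {"catalog": [], "buffers": [], "counters": [], "other": []}
    for s in storages:
        if s.name == "entities":
            roles["catalog"].append(s.name)
        elif s.fields.get("entity_id") == "U32":
            roles["buffers"].append(s.name)
        elif s.num_slots == 1 and not s.sparse:
            roles["counters"].append(s.name)
        else:
            roles["other"].append(s.name)
    return roles
```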
9.2 Gantt Chart Rendering
For a time range [T0, T1) in picoseconds:
- Seek to segment covering T0 (binary search or chain walk)
- Load checkpoint → initial state of all storages
- Replay deltas and events T0..T1, tracking per-entity:
  - Birth: entity slot becomes valid in entities
  - Stage transitions: stage_transition event → record (entity_id, stage, timestamp)
  - Death: entity slot cleared in entities (retire or flush)
- For each entity, emit Gantt bars: each stage spans from its stage_transition timestamp until the next transition (or death)
- Entity labels: read pc and inst_bits from entity catalog, decode via ISA disassembler
- Dependency arrows: dependency events in the range
- Flush markers: flush events in the range
- Convert timestamps to domain-local cycle numbers for display using the scope's clock domain period
9.3 Occupancy View
For each buffer, count valid slots per cycle. The mipmap summary
(<name>_occ fields) gives this at coarse granularity; delta replay
gives exact per-cycle values when zoomed in.
9.4 Counter Graphs
Read counter storages at each cycle frame (via DA_SLOT_ADD deltas).
Compute rates (delta / cycles) for display. Mipmap summaries provide
pre-aggregated values for zoomed-out views.
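A rate such as IPC can be sketched as the sum of DA_SLOT_ADD amounts in a window divided by the number of cycles in that window. The input here is assumed to be a list of (time_ps, amount) pairs collected during delta replay; the function name and argument layout are illustrative only.

```python
from typing import List, Tuple


def counter_rate(
    adds: List[Tuple[int, int]], t0_ps: int, t1_ps: int, period_ps: int
) -> float:
    """Average counter increment per cycle over the window [t0_ps, t1_ps)."""
    cycles = (t1_ps - t0_ps) // period_ps
    if cycles <= 0:
        raise ValueError("window must span at least one cycle")
    total = sum(amount for time_ps, amount in adds if t0_ps <= time_ps < t1_ps)
    return total / cycles
```

Applied to a committed-instructions counter, the result is instructions per cycle for the window.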
10. Example: BOOM-like OoO Core
10.1 DUT Properties
dut_name = "boom_tile0_core0"
cpu.protocol_version = "0.1"
cpu.isa = "RV64GC"
cpu.fetch_width = "4"
cpu.commit_width = "4"
cpu.elf_path = "/workspace/fw.elf"
cpu.pipeline_stages = "fetch,decode,rename,dispatch,issue,execute,complete,retire"
10.2 Schema
Scopes:
/ (id=0, root, protocol=none)
core0 (id=1, parent=0, protocol="cpu")
Enums:
pipeline_stage: fetch(0), decode(1), rename(2), dispatch(3),
issue(4), execute(5), complete(6), retire(7)
dep_type: raw(0), war(1), waw(2), structural(3)
flush_reason: mispredict(0), exception(1), interrupt(2)
stall_reason: rob_full(0), iq_full(1), lq_full(2), sq_full(3),
fetch_miss(4), dcache_miss(5)
Storages (all scope=core0):
entities (sparse, 512 slots): entity_id:U32, pc:U64, inst_bits:U32
rob (sparse, 256 slots): entity_id:U32, completed:BOOL
iq_int (sparse, 48 slots): entity_id:U32
iq_fp (sparse, 32 slots): entity_id:U32
iq_mem (sparse, 48 slots): entity_id:U32
lq (sparse, 32 slots): entity_id:U32, addr:U64
sq (sparse, 32 slots): entity_id:U32, addr:U64
committed (dense, 1 slot): count:U64
bp_misses (dense, 1 slot): count:U64
Events (all scope=core0):
stage_transition: entity_id:U32, stage:ENUM(pipeline_stage)
annotate: entity_id:U32, text:STRING_REF
dependency: src_id:U32, dst_id:U32, dep_type:ENUM(dep_type)
flush: entity_id:U32, reason:ENUM(flush_reason)
stall: reason:ENUM(stall_reason)
Note: transient stages (fetch, decode, execute, etc.) are modeled
purely via stage_transition events — no storages needed. Only
physical structures that hold entities (ROB, IQ, LQ, SQ) are storages.
10.3 Example: 5-Stage In-Order Core
Same protocol, minimal schema:
DUT properties:
cpu.pipeline_stages = "fetch,decode,execute,memory,writeback"
Scopes:
/ (id=0, root, protocol=none)
core0 (id=1, parent=0, protocol="cpu")
Enums:
pipeline_stage: fetch(0), decode(1), execute(2), memory(3), writeback(4)
Storages (all scope=core0):
entities (sparse, 8 slots): entity_id:U32, pc:U64, inst_bits:U32
committed (dense, 1 slot): count:U64
Events (all scope=core0):
stage_transition: entity_id:U32, stage:ENUM(pipeline_stage)
An in-order core may have no buffers at all — just the entity catalog and stage transitions. The viewer renders a Gantt chart purely from events.
10.4 Example: Dual-Core SoC
Multi-core uses transport-level scopes (§4.4 of the transport spec).
Each core is a scope with protocol = "cpu". Storages and event types
are defined per-scope, so entity IDs are per-core and no core_id
field is needed in event payloads.
DUT properties:
dut_name = "my_soc"
cpu.pipeline_stages = "fetch,decode,rename,dispatch,issue,execute,complete,retire"
cpu.isa = "RV64GC"
cpu.elf_path = "/workspace/fw.elf"
Scopes:
/ (id=0, root, protocol=none)
cpu_cluster (id=1, parent=0, protocol=none)
core0 (id=2, parent=1, protocol="cpu")
core1 (id=3, parent=1, protocol="cpu")
Enums (shared):
pipeline_stage: fetch(0), decode(1), rename(2), dispatch(3),
issue(4), execute(5), complete(6), retire(7)
flush_reason: mispredict(0), exception(1), interrupt(2)
Storages:
entities (scope=core0, sparse, 512): entity_id:U32, pc:U64, inst_bits:U32
rob (scope=core0, sparse, 256): entity_id:U32
committed (scope=core0, dense, 1): count:U64
entities (scope=core1, sparse, 512): entity_id:U32, pc:U64, inst_bits:U32
rob (scope=core1, sparse, 256): entity_id:U32
committed (scope=core1, dense, 1): count:U64
Events:
stage_transition (scope=core0): entity_id:U32, stage:ENUM(pipeline_stage)
stage_transition (scope=core1): entity_id:U32, stage:ENUM(pipeline_stage)
flush (scope=core0): entity_id:U32, reason:ENUM(flush_reason)
flush (scope=core1): entity_id:U32, reason:ENUM(flush_reason)
The viewer finds all scopes with protocol = "cpu", renders a
per-core pipeline view for each, and can show them side-by-side.
Storage names (entities, rob) repeat across scopes — the
storage_id is globally unique, but the name + scope combination
gives the viewer the display path (core0/entities, core1/rob).
Cross-core events (cache coherence, IPIs) can be defined at the
cpu_cluster scope with fields referencing the relevant scope IDs.
11. Version History
| Version | Date | Changes |
|---|---|---|
| 0.1 | 2026-xx-xx | Initial draft |
µScope noc Protocol Specification
Version: 0.1-draft
Protocol identifier: noc
Transport version: µScope 0.x
1. Overview
The noc protocol defines conventions for tracing any on-chip
interconnect — crossbar, mesh, ring, tree, or point-to-point — using
the µScope transport layer. It works with any bus protocol: AXI4, CHI,
ACE, TileLink, UCIe, or proprietary fabrics.
Like the cpu protocol, it does not prescribe a fixed schema. Instead,
it defines semantic conventions that a DUT writer follows and a
viewer relies on to render transaction Gantt charts, topology maps,
latency histograms, and traffic heatmaps without prior knowledge of the
specific interconnect microarchitecture.
1.1 Design Principles
- Generic over specific. The protocol works for a single-port AXI crossbar and a 64-node CHI mesh alike. The DUT declares its structures; the viewer renders whatever it finds.
- Convention over configuration. Semantics are conveyed through field names, storage shapes, and scope properties, not through protocol-specific binary metadata.
- Entity-centric. Every in-flight transaction has a unique ID in a transaction catalog. All buffers, events, and stages reference transactions by this ID. The viewer joins on it to build per-transaction timelines.
- Topology-agnostic. The protocol does not encode topology in the data model. Topology is declared via scope properties; the viewer uses it for visualization only.
2. Concepts
2.1 Transactions (Entities)
A transaction is an in-flight bus operation (read, write, snoop, etc.). Each transaction occupies a slot in the transaction catalog storage and is referenced by its slot index throughout the interconnect.
- Transaction ID = slot index in the transaction catalog (U32).
- When a transaction is issued, the writer allocates a slot (DA_SLOT_SET on its fields). When it completes, the writer clears the slot (DA_SLOT_CLEAR). The slot can then be reused.
- The transaction catalog must be sparse.
Transactions in the noc protocol are the direct analogue of entities
in the cpu protocol (cpu spec §2.1).
2.2 Buffers
A buffer is any storage whose slots hold transaction references — a hardware structure that transactions pass through or reside in. Examples: virtual channel (VC) buffers, reorder buffers, outstanding request tables, credit pools.
A storage is recognized as a buffer if it contains a field named
txn_id (§3.2). The viewer automatically tracks transaction membership
in every buffer.
2.3 Stages
The viewer renders a per-transaction Gantt chart showing which pipeline stage each transaction is in over time. Since a transaction can occupy multiple buffers simultaneously (e.g., outstanding request table + VC buffer + arbitrating), stage progression is tracked explicitly via stage_transition events (§5.1), not inferred from buffer membership.
Buffers and stages are orthogonal:
- Buffers model where a transaction physically resides (VC slot 3, ROB entry 7). A transaction can be in multiple buffers at once.
- Stages model logical progression through the interconnect (issue → route → arbitrate → traverse → deliver → respond). A transaction is in exactly one stage at any time.
The DUT declares the stage ordering via noc.pipeline_stages (§4.1)
and emits a stage_transition event each time a transaction advances.
The viewer maintains a current_stage per transaction and draws Gantt
bars from stage entry/exit times.
2.4 Counters
A counter is a 1-slot, non-sparse storage with numeric fields,
mutated via DA_SLOT_ADD. The viewer infers counters from this shape
and renders them as line graphs or sparklines. No protocol markup is
needed.
2.5 Events
Events model instantaneous occurrences attached to transactions or to the timeline. The protocol defines standard event names (§5). The viewer renders recognized events with specific visualizations and unknown events generically.
2.6 Router Sub-Scopes
For multi-router interconnects, each router can be a child scope
with protocol="noc.router". This enables per-router buffers, counters,
and events while keeping the transaction catalog on the nearest ancestor
noc scope.
/ (protocol=none)
  noc0 (protocol="noc") ← transaction catalog here
    router_0_0 (protocol="noc.router") ← per-router buffers/counters
    router_0_1 (protocol="noc.router")
    router_1_0 (protocol="noc.router")
    router_1_1 (protocol="noc.router")
A noc.router scope does not have its own transaction catalog. It
references transactions from the parent noc scope's catalog via the
txn_id field. The viewer resolves txn_id by walking up the scope
tree to the nearest noc scope.
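The upward walk can be sketched with two lookup tables keyed by scope ID. The table layout (a parent map with None for the root, and a protocol map) is an assumed in-memory form of the parsed scope tree, and the function name is hypothetical.

```python
from typing import Dict, Optional


def catalog_scope(
    scope_id: int,
    parents: Dict[int, Optional[int]],
    protocols: Dict[int, Optional[str]],
) -> int:
    """Walk up from scope_id to the nearest scope whose protocol is 'noc'."""
    current: Optional[int] = scope_id
    while current is not None:
        if protocols.get(current) == "noc":
            return current
        current = parents.get(current)
    raise LookupError(f"no enclosing noc scope for scope {scope_id}")
```

A txn_id seen in a router scope is then looked up in the transactions storage of the returned scope.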
2.7 Cross-Scope Transaction Handoff
When a transaction crosses a scope boundary — e.g., a chiplet-to-chiplet
transfer via a D2D link, or a protocol bridge (AXI→CHI) — it receives
a new txn_id in the destination scope. The txn_handoff event (§5.7)
stitches the two identities together, enabling end-to-end latency
tracking across scope boundaries.
The txn_handoff event is emitted at a common ancestor scope of the
source and destination scopes. The viewer joins on these events to build
cross-scope transaction timelines.
3. Transaction Catalog
3.1 Storage Convention
The transaction catalog is a storage named transactions.
| Property | Value |
|---|---|
| Name | transactions |
| Sparse | yes (SF_SPARSE) |
| Num slots | max concurrent in-flight transactions (DUT-specific) |
3.2 Required Fields
| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Unique transaction ID (equals the slot index) |
| opcode | ENUM | Transaction type (read, write, snoop, etc.) |
| addr | U64 | Target address |
| len | U16 | Burst length (number of beats) |
| size | U8 | Beat size (log2 bytes, e.g., 3 = 8 bytes) |
| src_port | U16 | Source port / initiator ID |
| dst_port | U16 | Destination port / target ID |
3.3 Optional Fields
The DUT may add any additional fields. Common examples:
| Field name | Type | Description |
|---|---|---|
| qos | U8 | Quality-of-service priority |
| txn_class | ENUM | Transaction class (posted, non-posted, etc.) |
| prot | U8 | Protection bits (privileged, secure, etc.) |
| cache | U8 | Cache allocation hints |
| snoop | U8 | Snoop attribute bits |
| domain | ENUM | Shareability domain |
| excl | BOOL | Exclusive access flag |
| tag | U16 | Transaction tag (for reorder tracking) |
3.4 Transaction Lifecycle
Issue:    DA_SLOT_SET   transactions[id].txn_id = id
          DA_SLOT_SET   transactions[id].opcode = ...
          DA_SLOT_SET   transactions[id].addr = ...
          DA_SLOT_SET   transactions[id].len = ...
          DA_SLOT_SET   transactions[id].size = ...
          DA_SLOT_SET   transactions[id].src_port = ...
          DA_SLOT_SET   transactions[id].dst_port = ...
Complete: DA_SLOT_CLEAR transactions[id]
The txn_id field is always equal to the slot index. It is stored
explicitly so that buffer storages and events can reference it using a
uniform U32 field, independent of the transport's slot indexing.
4. Buffers and Stages
4.1 Stage Ordering via Scope Properties
Each noc scope declares pipeline stages using a scope property:
noc.pipeline_stages = "issue,route,arbitrate,traverse,deliver,respond"
The value is a comma-separated list in pipeline order (earliest first).
The viewer uses this ordering for Gantt chart column layout and
coloring. Stage names must match the values used in stage_transition
events (§5.1). Each noc scope declares its own stages, enabling
heterogeneous interconnects in the same trace.
4.2 Buffer Storage Convention
Any storage with a field named txn_id of type U32 is a buffer.
| Property | Value |
|---|---|
| Sparse | yes (SF_SPARSE) |
| Num slots | hardware structure capacity |
4.3 Required Buffer Fields
| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | References transaction catalog slot |
4.4 Optional Buffer Fields
The DUT may add structure-specific fields:
| Field name | Type | Description |
|---|---|---|
| vc | U8 | Virtual channel assignment |
| priority | U8 | Arbitration priority |
| flit_type | ENUM | Flit type (header, data, tail) |
| credits | U8 | Available credits |
4.5 Buffer Operations
Insert: DA_SLOT_SET vc_buf[slot].txn_id = id
Remove: DA_SLOT_CLEAR vc_buf[slot]
Update: DA_SLOT_SET vc_buf[slot].credits = 3
4.6 Common Buffers
| Buffer name | Models |
|---|---|
| vc_buf_<port> | Per-port virtual channel buffer |
| rob | Reorder buffer for out-of-order completion |
| ort | Outstanding request table / tracker |
| snoop_filter | Snoop filter entries |
| retry_buf | Transactions awaiting retry |
4.7 Example Stage Sets
AXI4 crossbar:
noc.pipeline_stages = "ar_issue,route,arbitrate,transport,target_accept,r_data,r_last"
CHI mesh:
noc.pipeline_stages = "req_issue,req_accept,snoop_send,snoop_resp,dat_transfer,comp_ack"
TileLink ring:
noc.pipeline_stages = "acquire,route,grant,grant_ack"
5. Standard Events
The protocol defines the following event names. stage_transition is
required for Gantt chart rendering; all others are optional. The viewer
renders recognized events with specific visualizations and unknown
events generically (name + fields in a tooltip).
5.1 stage_transition
Explicit stage change for a transaction. The DUT emits this event each time a transaction advances to a new pipeline stage.
| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Transaction that advanced |
| stage | ENUM(pipeline_stage) | Stage the transaction entered |
The pipeline_stage enum values must match the names declared in the
noc.pipeline_stages scope property (§4.1). For example (AXI4):
| Value | Name |
|---|---|
| 0 | ar_issue |
| 1 | route |
| 2 | arbitrate |
| 3 | transport |
| 4 | target_accept |
| 5 | r_data |
| 6 | r_last |
The enum is DUT-defined — a simple crossbar might have just
issue, arbitrate, transfer, complete.
The viewer maintains a current_stage per transaction. A Gantt bar for
a stage spans from the cycle the transaction entered it until the cycle
it entered the next stage (or was cleared).
5.2 beat
Individual data beat in a burst transfer.
| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Parent transaction |
| beat_num | U16 | Beat number within burst (0-based) |
| data_bytes | U16 | Bytes transferred in this beat |
Viewer: shows beat markers on the transaction's Gantt bar during the data transfer stage. Useful for identifying partial transfers and stalls between beats.
5.3 retry
Transaction retry — the target or interconnect rejected the transaction and it must be re-attempted.
| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Retried transaction |
| reason | ENUM(retry_reason) | Cause of retry |
Standard retry_reason enum values:
| Value | Name |
|---|---|
| 0 | target_busy |
| 1 | no_credits |
| 2 | vc_full |
| 3 | arb_lost |
| 4 | protocol_retry |
Viewer: marks a retry indicator on the transaction's Gantt bar.
5.4 timeout
Watchdog timeout — a transaction exceeded the expected completion time.
| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Timed-out transaction |
| threshold_cycles | U32 | Watchdog threshold that was exceeded |
Viewer: marks a timeout indicator on the transaction's Gantt bar and highlights it in the topology view.
5.5 link_credit
Credit flow control update on a link.
| Field name | Type | Description |
|---|---|---|
| port | U16 | Port ID |
| direction | ENUM(credit_direction) | Credit grant or consume |
| credits | U8 | Number of credits |
Standard credit_direction enum values:
| Value | Name |
|---|---|
| 0 | grant |
| 1 | consume |
Viewer: renders credit level as a per-port sparkline.
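The sparkline needs a running credit level per port. A minimal sketch of that accumulation follows, assuming events have already been decoded into tuples and that every port starts at zero credits (the spec does not state an initial level); `credit_levels` and `LinkCredit` are hypothetical names.

```rust
use std::collections::HashMap;

/// One decoded `link_credit` event: (port, direction, credits).
/// Direction uses the standard enum values: 0 = grant, 1 = consume.
type LinkCredit = (u16, u8, u8);

/// Replays events in order and returns the final credit level per port.
/// Assumption: ports start at zero; events with unknown directions are skipped.
fn credit_levels(events: &[LinkCredit]) -> HashMap<u16, i64> {
    let mut levels: HashMap<u16, i64> = HashMap::new();
    for &(port, direction, credits) in events {
        let delta = match direction {
            0 => i64::from(credits),
            1 => -i64::from(credits),
            _ => continue,
        };
        *levels.entry(port).or_insert(0) += delta;
    }
    levels
}
```

Recording the level after each event, rather than only the final value, would give the per-cycle series a sparkline actually plots.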
5.6 arb_decision
Arbitration outcome — records which transaction won arbitration at a port.
| Field name | Type | Description |
|---|---|---|
| winner_txn | U32 | Transaction that won arbitration |
| port | U16 | Port where arbitration occurred |
| num_contenders | U8 | Number of competing transactions |
Viewer: shows arbitration events in the timeline. High num_contenders
values indicate congestion hotspots.
5.7 txn_handoff
Cross-scope transaction stitching — links a transaction in one scope to its continuation in another scope.
| Field name | Type | Description |
|---|---|---|
| src_scope | U16 | Scope ID of the source transaction |
| src_txn_id | U32 | Transaction ID in the source scope |
| dst_scope | U16 | Scope ID of the destination transaction |
| dst_txn_id | U32 | Transaction ID in the destination scope |
This event is emitted at a common ancestor scope of src_scope and
dst_scope. It enables end-to-end latency tracking across chiplet
boundaries, protocol bridges, or any other scope boundary where a
transaction receives a new identity.
Viewer: draws a handoff arrow between the two transaction timelines and computes end-to-end latency by joining the linked transactions.
5.8 annotate
Free-text annotation attached to a transaction.
| Field name | Type | Description |
|---|---|---|
| txn_id | U32 | Target transaction |
| text | STRING_REF | Annotation text |
Viewer: shows as a label on the transaction's Gantt bar.
6. Counters
No special protocol convention beyond shape detection. A 1-slot, non-sparse storage is a counter. The storage name is the counter label.
Common counters:
| Storage name | Fields | Meaning |
|---|---|---|
| bytes_tx | count: U64 | Bytes transmitted |
| bytes_rx | count: U64 | Bytes received |
| arb_conflicts | count: U64 | Arbitration conflicts (>1 contender) |
| retries | count: U64 | Transaction retries |
| txn_completed | count: U64 | Transactions completed |
Writer updates via DA_SLOT_ADD:
uscope_slot_add(w, STOR_BYTES_TX, 0, FIELD_COUNT, 64); // 64 bytes this cycle
For per-router counters, place the counter storage on the router's sub-scope (§2.6).
7. Summary Fields
The protocol defines standard summary field names for mipmap rendering.
Each summary field is scoped to its noc scope (via scope_id in
summary_field_def_t), so multi-interconnect traces have independent
summaries without name collisions.
| Field name | Type | Meaning |
|---|---|---|
| txn_completed | U32 | Transactions completed in bucket |
| bytes_transferred | U64 | Total bytes transferred in bucket |
| avg_latency_ticks | U32 | Average transaction latency in bucket |
| retries | U16 | Retry events in bucket |
Per-buffer occupancy summaries use the naming pattern
<storage_name>_occ (e.g., vc_buf_0_occ). The value is the sum of
occupancy samples in the bucket; divide by active cycles for average.
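The division described above can be sketched as follows. The function name is hypothetical, and returning `None` for a bucket with no active cycles is an assumption about how a viewer would want to treat empty buckets.

```rust
/// Average occupancy for one summary bucket.
/// `occ_sum` is the `<storage_name>_occ` value; `active_cycles` is the number
/// of cycles that contributed samples. Returns `None` for an empty bucket.
fn average_occupancy(occ_sum: u64, active_cycles: u64) -> Option<f64> {
    if active_cycles == 0 {
        None
    } else {
        Some(occ_sum as f64 / active_cycles as f64)
    }
}
```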
DUT-specific summary fields are rendered as generic bar charts.
8. Scope Properties
Properties are stored on each scope (transport spec §3.4.1). The noc
protocol uses the noc. key prefix. Each noc scope carries its own
properties, enabling heterogeneous interconnects in the same trace.
Properties that describe the overall trace (e.g., dut_name) belong
on the root scope.
8.1 Required Properties (on each noc scope)
| Key | Description | Example |
|---|---|---|
| noc.protocol_version | Version of the noc protocol | 0.1 |
| noc.bus_protocol | Underlying bus protocol | AXI4, CHI, TileLink, UCIe |
| noc.topology | Interconnect topology | crossbar, mesh, ring, tree, p2p |
| noc.pipeline_stages | Comma-separated stage names, in order | issue,route,arbitrate,traverse,deliver,respond |
| clock.period_ps | Clock period in picoseconds | 1000 (1 GHz) |
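A reader could check these keys when it opens a noc scope. The sketch below assumes the scope's properties have already been loaded into a string map; `missing_required` and `REQUIRED_KEYS` are hypothetical names, not part of any µScope API.

```rust
use std::collections::HashMap;

/// Keys listed as required on every noc scope in the table above.
const REQUIRED_KEYS: [&str; 5] = [
    "noc.protocol_version",
    "noc.bus_protocol",
    "noc.topology",
    "noc.pipeline_stages",
    "clock.period_ps",
];

/// Returns the required keys that are absent from `props`, in table order.
fn missing_required(props: &HashMap<String, String>) -> Vec<&'static str> {
    REQUIRED_KEYS
        .iter()
        .copied()
        .filter(|key| !props.contains_key(*key))
        .collect()
}
```

An empty result means the scope carries every required property; the check does not validate the values themselves.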
8.2 Optional Properties (on each noc scope)
| Key | Description | Example |
|---|---|---|
| noc.dim_x | Mesh X dimension | 4 |
| noc.dim_y | Mesh Y dimension | 4 |
| noc.num_vcs | Number of virtual channels per port | 4 |
| noc.data_width | Data bus width in bits | 128 |
| noc.addr_width | Address bus width in bits | 48 |
| noc.num_ports | Total number of ports | 16 |
| noc.routing | Routing algorithm | xy, adaptive |
8.3 Root Scope Properties
| Key | Description | Example |
|---|---|---|
| dut_name | DUT instance name | my_soc |
| vendor | DUT vendor (top-level) | acme |
9. Viewer Reconstruction
9.1 Opening a Trace
- Read preamble → parse schema (including scope properties)
- Walk scope tree from root / → find all scopes with protocol = "noc"; each is an interconnect instance
- Per noc scope:
  a. Read scope properties → noc.pipeline_stages, noc.bus_protocol, noc.topology, etc.
  b. Identify transactions storage (transaction catalog)
  c. Find all buffers (storages with txn_id field)
  d. Identify counters (1-slot non-sparse storages)
  e. Find child scopes with protocol = "noc.router" for per-router detail
- Per noc scope: build ordered stage list from noc.pipeline_stages
- If noc.topology = "mesh", read noc.dim_x and noc.dim_y for topology rendering
9.2 Transaction Gantt Chart
For a cycle range [C0, C1):
- Seek to segment covering C0 (binary search or chain walk)
- Load checkpoint → initial state of all storages
- Replay deltas and events C0..C1, tracking per-transaction:
  - Birth: transaction slot becomes valid in transactions
  - Stage transitions: stage_transition event → record (txn_id, stage, cycle)
  - Death: transaction slot cleared in transactions (completion)
- For each transaction, emit Gantt bars: each stage spans from its stage_transition cycle until the next transition (or death)
- Transaction labels: read opcode, addr, src_port, dst_port from the transaction catalog
- Retry markers: retry events in the range
- Beat markers: beat events in the range
- Timeout markers: timeout events in the range
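The bar-emission step can be sketched as below. This is an illustration under stated assumptions, not the viewer's actual code: the replay stream is assumed to be already decoded and time-ordered, and the `Replay`, `Bar`, and `gantt_bars` names are hypothetical.

```rust
use std::collections::HashMap;

/// A replayed occurrence relevant to Gantt reconstruction.
enum Replay {
    /// `stage_transition` event: (cycle, txn_id, stage enum value).
    Stage(u64, u32, u8),
    /// `DA_SLOT_CLEAR` on `transactions[txn_id]` at the given cycle.
    Clear(u64, u32),
}

/// One Gantt bar: the transaction occupied `stage` during [start, end).
#[derive(Debug, PartialEq)]
struct Bar {
    txn_id: u32,
    stage: u8,
    start: u64,
    end: u64,
}

/// Turns an ordered replay stream into Gantt bars.
/// A bar is closed by the next transition or by the clear; stages still open
/// when the stream ends are not emitted here.
fn gantt_bars(stream: &[Replay]) -> Vec<Bar> {
    // txn_id -> (current stage, cycle it was entered)
    let mut open: HashMap<u32, (u8, u64)> = HashMap::new();
    let mut bars = Vec::new();
    for item in stream {
        match *item {
            Replay::Stage(cycle, txn_id, stage) => {
                if let Some((prev, start)) = open.insert(txn_id, (stage, cycle)) {
                    bars.push(Bar { txn_id, stage: prev, start, end: cycle });
                }
            }
            Replay::Clear(cycle, txn_id) => {
                if let Some((prev, start)) = open.remove(&txn_id) {
                    bars.push(Bar { txn_id, stage: prev, start, end: cycle });
                }
            }
        }
    }
    bars
}
```

A real viewer would also seed `open` from the checkpoint state and clip any bar still open at C1 to the visible range.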
9.3 Topology View
Using the noc.topology scope property and src_port/dst_port fields
from the transaction catalog:
- Render the interconnect topology (mesh grid, ring, tree, etc.)
- Animate transaction flow by mapping stage_transition events to router positions
- Color links by utilization (bytes per cycle divided by the link width in bytes, i.e. noc.data_width / 8)
- Highlight congestion hotspots using arb_decision contention data
For mesh topologies, map port IDs to (x, y) coordinates using
noc.dim_x and noc.dim_y.
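The spec does not fix a port numbering scheme, so the mapping below is only one possibility. It assumes one port per router, numbered row-major (port = y * dim_x + x); a DUT with a different numbering would need a different function. `port_to_xy` is a hypothetical name.

```rust
/// Maps a port ID to mesh coordinates (x, y).
/// Assumption: ports are numbered row-major, one per router.
/// Returns `None` when the port falls outside the dim_x by dim_y grid.
fn port_to_xy(port: u16, dim_x: u16, dim_y: u16) -> Option<(u16, u16)> {
    if dim_x == 0 || u32::from(port) >= u32::from(dim_x) * u32::from(dim_y) {
        return None;
    }
    Some((port % dim_x, port / dim_x))
}
```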
9.4 Latency Histogram
Compute per-transaction latency from birth-to-death ticks in the
transactions catalog. Group by opcode, src_port, dst_port, or
address range for drill-down analysis.
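As an illustration of the grouping step, the sketch below buckets latencies by opcode and reports a count and mean per group. It assumes birth and death ticks have already been collected per transaction during replay; `latency_by_opcode` and `Completed` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// One completed transaction: (opcode enum value, birth tick, death tick).
type Completed = (u8, u64, u64);

/// Groups birth-to-death latencies by opcode and returns (count, mean) per group.
/// Records whose death precedes their birth are skipped.
fn latency_by_opcode(txns: &[Completed]) -> BTreeMap<u8, (u64, f64)> {
    // opcode -> (count, total latency)
    let mut sums: BTreeMap<u8, (u64, u64)> = BTreeMap::new();
    for &(opcode, birth, death) in txns {
        if let Some(latency) = death.checked_sub(birth) {
            let entry = sums.entry(opcode).or_insert((0, 0));
            entry.0 += 1;
            entry.1 += latency;
        }
    }
    sums.into_iter()
        .map(|(op, (count, total))| (op, (count, total as f64 / count as f64)))
        .collect()
}
```

Grouping by src_port, dst_port, or address range follows the same pattern with a different key.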
9.5 Cross-Scope Stitching
- Find txn_handoff events on every scope (they are emitted at a common ancestor of the linked noc scopes, §5.7)
- Join (src_scope, src_txn_id) to (dst_scope, dst_txn_id)
- Build end-to-end transaction timelines spanning multiple scopes
- Compute end-to-end latency by summing per-scope stage durations
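The join step can be sketched as a walk over handoff links. This is a simplified illustration: it assumes each source transaction has at most one continuation and ignores time, whereas a real viewer would also have to account for slot indices being reused over the life of a trace. `stitch_chain` and `TxnKey` are hypothetical names.

```rust
use std::collections::HashMap;

/// A transaction identity: (scope ID, transaction ID within that scope).
type TxnKey = (u16, u32);

/// Follows `txn_handoff` links starting at `start` and returns the full chain.
/// Each handoff is (src_scope, src_txn_id, dst_scope, dst_txn_id).
/// The walk stops if a key repeats, so a malformed cyclic trace cannot loop forever.
fn stitch_chain(handoffs: &[(u16, u32, u16, u32)], start: TxnKey) -> Vec<TxnKey> {
    let next: HashMap<TxnKey, TxnKey> = handoffs
        .iter()
        .map(|&(ss, st, ds, dt)| ((ss, st), (ds, dt)))
        .collect();
    let mut chain = vec![start];
    let mut current = start;
    while let Some(&dst) = next.get(&current) {
        if chain.contains(&dst) {
            break;
        }
        chain.push(dst);
        current = dst;
    }
    chain
}
```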
9.6 Occupancy View
For each buffer, count valid slots per cycle. The mipmap summary
(<name>_occ fields) gives this at coarse granularity; delta replay
gives exact per-cycle values when zoomed in.
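A minimal sketch of the exact, delta-replay path follows. It assumes slot operations are already decoded and time-ordered, and that all slots start invalid (a real reader would start from the checkpoint state instead); `occupancy_trace` and `SlotOp` are hypothetical names.

```rust
/// A slot operation on one buffer: (cycle, slot, is_set).
/// `is_set = true` models DA_SLOT_SET, `false` models DA_SLOT_CLEAR.
type SlotOp = (u64, usize, bool);

/// Replays ordered slot operations and returns one (cycle, occupancy) entry per
/// cycle that changed a slot's validity, holding the count at the end of that cycle.
fn occupancy_trace(num_slots: usize, ops: &[SlotOp]) -> Vec<(u64, usize)> {
    let mut valid = vec![false; num_slots];
    let mut count = 0usize;
    let mut out: Vec<(u64, usize)> = Vec::new();
    for &(cycle, slot, is_set) in ops {
        if slot >= num_slots || valid[slot] == is_set {
            continue; // out of range, or validity unchanged
        }
        valid[slot] = is_set;
        if is_set {
            count += 1;
        } else {
            count -= 1;
        }
        // Merge multiple changes within the same cycle into one entry.
        if let Some(last) = out.last_mut() {
            if last.0 == cycle {
                last.1 = count;
                continue;
            }
        }
        out.push((cycle, count));
    }
    out
}
```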
9.7 Counter Graphs
Read counter storages at each cycle frame (via DA_SLOT_ADD deltas).
Compute rates (delta / cycles) for display. Mipmap summaries provide
pre-aggregated values for zoomed-out views.
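The rate computation can be illustrated with two cumulative samples. The function name is hypothetical, and treating a zero-length window or a decreasing counter as "no rate" is an assumption.

```rust
/// Rate of a cumulative counter between two samples, in units per cycle.
/// Each sample is (cycle, cumulative_value). Returns `None` if the window is
/// empty or the counter appears to have gone backwards.
fn counter_rate(start: (u64, u64), end: (u64, u64)) -> Option<f64> {
    let cycles = end.0.checked_sub(start.0).filter(|&c| c > 0)?;
    let delta = end.1.checked_sub(start.1)?;
    Some(delta as f64 / cycles as f64)
}
```

For a bytes_tx counter, the result is bytes per cycle over the window.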
10. Examples
10.1 AXI4 Crossbar
A simple single-scope NoC tracing an AXI4 crossbar with 4 initiator ports and 2 target ports.
Scopes:
/ (id=0, root, protocol=none)
  properties: dut_name="axi_xbar"
  noc0 (id=1, parent=0, protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="AXI4",
                noc.topology="crossbar", noc.data_width="64",
                noc.num_ports="6", clock.period_ps="1000",
                noc.pipeline_stages="ar_issue,route,arbitrate,transport,target_accept,r_data,r_last"
Enums:
  opcode: read(0), write(1), read_linked(2), write_cond(3)
  pipeline_stage: ar_issue(0), route(1), arbitrate(2), transport(3),
                  target_accept(4), r_data(5), r_last(6)
  retry_reason: target_busy(0), no_credits(1), vc_full(2), arb_lost(3)
  credit_direction: grant(0), consume(1)
Storages (all scope=noc0):
  transactions (sparse, 64 slots): txn_id:U32, opcode:ENUM(opcode), addr:U64,
                                   len:U16, size:U8, src_port:U16, dst_port:U16,
                                   qos:U8
  ort (sparse, 32 slots): txn_id:U32
  bytes_tx (dense, 1 slot): count:U64
  bytes_rx (dense, 1 slot): count:U64
  arb_conflicts (dense, 1 slot): count:U64
  txn_completed (dense, 1 slot): count:U64
Events (all scope=noc0):
  stage_transition: txn_id:U32, stage:ENUM(pipeline_stage)
  beat: txn_id:U32, beat_num:U16, data_bytes:U16
  retry: txn_id:U32, reason:ENUM(retry_reason)
  arb_decision: winner_txn:U32, port:U16, num_contenders:U8
  link_credit: port:U16, direction:ENUM(credit_direction), credits:U8
  annotate: txn_id:U32, text:STRING_REF
10.2 CHI Mesh NoC
A 4x4 CHI mesh with per-router sub-scopes. The transaction catalog
lives on the parent noc scope; router sub-scopes hold local buffers
and counters.
Scopes:
/ (id=0, root, protocol=none)
  properties: dut_name="chi_mesh_soc"
  noc0 (id=1, parent=0, protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="CHI",
                noc.topology="mesh", noc.dim_x="4", noc.dim_y="4",
                noc.num_vcs="4", noc.data_width="256",
                clock.period_ps="500",
                noc.pipeline_stages="req_issue,req_accept,snoop_send,snoop_resp,dat_transfer,comp_ack"
    router_0_0 (id=2, parent=1, protocol="noc.router")
    router_0_1 (id=3, parent=1, protocol="noc.router")
    ...
    router_3_3 (id=17, parent=1, protocol="noc.router")
Enums:
  opcode: read_no_snp(0), read_once(1), read_shared(2), read_unique(3),
          write_no_snp(4), write_unique(5), snoop_shared(6),
          snoop_unique(7), comp_data(8), comp_ack(9)
  pipeline_stage: req_issue(0), req_accept(1), snoop_send(2),
                  snoop_resp(3), dat_transfer(4), comp_ack(5)
  retry_reason: target_busy(0), no_credits(1), vc_full(2),
                arb_lost(3), protocol_retry(4)
  credit_direction: grant(0), consume(1)
  txn_class: req(0), snp(1), dat(2), rsp(3)
Storages (scope=noc0):
  transactions (sparse, 256 slots): txn_id:U32, opcode:ENUM(opcode), addr:U64,
                                    len:U16, size:U8, src_port:U16, dst_port:U16,
                                    qos:U8, txn_class:ENUM(txn_class)
Storages (scope=router_0_0, one set per router):
  vc_buf_n (sparse, 4 slots): txn_id:U32, vc:U8
  vc_buf_s (sparse, 4 slots): txn_id:U32, vc:U8
  vc_buf_e (sparse, 4 slots): txn_id:U32, vc:U8
  vc_buf_w (sparse, 4 slots): txn_id:U32, vc:U8
  vc_buf_local (sparse, 4 slots): txn_id:U32, vc:U8
  bytes_fwd (dense, 1 slot): count:U64
  arb_conflicts (dense, 1 slot): count:U64
Events (scope=noc0):
  stage_transition: txn_id:U32, stage:ENUM(pipeline_stage)
  retry: txn_id:U32, reason:ENUM(retry_reason)
  annotate: txn_id:U32, text:STRING_REF
Events (scope=router_0_0, one set per router):
  arb_decision: winner_txn:U32, port:U16, num_contenders:U8
  link_credit: port:U16, direction:ENUM(credit_direction), credits:U8
The viewer discovers all 16 routers as noc.router children of noc0,
maps them to a 4x4 grid via noc.dim_x/noc.dim_y, and renders
per-router buffer occupancy alongside the global transaction Gantt chart.
10.3 Multi-Chiplet with D2D
Two chiplets connected via a UCIe D2D link. Each chiplet has its own
noc scope with an independent transaction catalog. The txn_handoff
event on the SoC-level scope stitches transactions across the link.
Scopes:
/ (id=0, root, protocol=none)
  properties: dut_name="multi_chiplet_soc"
  chiplet0 (id=1, parent=0, protocol=none)
    chiplet0_noc (id=2, parent=1, protocol="noc")
      properties: noc.protocol_version="0.1", noc.bus_protocol="CHI",
                  noc.topology="mesh", noc.dim_x="4", noc.dim_y="4",
                  noc.pipeline_stages="req_issue,req_accept,dat_transfer,comp_ack",
                  clock.period_ps="500"
  chiplet1 (id=3, parent=0, protocol=none)
    chiplet1_noc (id=4, parent=3, protocol="noc")
      properties: noc.protocol_version="0.1", noc.bus_protocol="CHI",
                  noc.topology="mesh", noc.dim_x="2", noc.dim_y="2",
                  noc.pipeline_stages="req_issue,req_accept,dat_transfer,comp_ack",
                  clock.period_ps="500"
  d2d_link (id=5, parent=0, protocol="noc")
    properties: noc.protocol_version="0.1", noc.bus_protocol="UCIe",
                noc.topology="p2p",
                noc.pipeline_stages="d2d_issue,phy_encode,link_traverse,phy_decode,d2d_deliver",
                clock.period_ps="500"
Storages:
  transactions (scope=chiplet0_noc, sparse, 256): txn_id:U32, opcode:ENUM, addr:U64,
                                                  len:U16, size:U8, src_port:U16, dst_port:U16
  transactions (scope=chiplet1_noc, sparse, 128): txn_id:U32, opcode:ENUM, addr:U64,
                                                  len:U16, size:U8, src_port:U16, dst_port:U16
  transactions (scope=d2d_link, sparse, 32): txn_id:U32, opcode:ENUM, addr:U64,
                                             len:U16, size:U8, src_port:U16, dst_port:U16
Events (scope=root):
  txn_handoff: src_scope:U16, src_txn_id:U32, dst_scope:U16, dst_txn_id:U32
Handoff sequence for a cross-chiplet read:
- Chiplet 0 issues a read → transactions[42] in chiplet0_noc
- The read reaches the D2D egress port → DA_SLOT_CLEAR on chiplet0_noc.transactions[42]
- D2D link picks it up → transactions[7] in d2d_link
- Root scope emits txn_handoff(src_scope=2, src_txn_id=42, dst_scope=5, dst_txn_id=7)
- D2D link delivers to chiplet 1 → DA_SLOT_CLEAR on d2d_link.transactions[7]
- Chiplet 1 ingests the read → transactions[19] in chiplet1_noc
- Root scope emits txn_handoff(src_scope=5, src_txn_id=7, dst_scope=4, dst_txn_id=19)
- The viewer chains: chiplet0_noc:42 → d2d_link:7 → chiplet1_noc:19 and computes end-to-end latency
11. Version History
| Version | Date | Changes |
|---|---|---|
| 0.1 | 2026-xx-xx | Initial draft |
Rust Crate API Reference
Crate: uscope
Location: crates/uscope/
1. Overview
The uscope Rust crate provides a complete reader and writer for the µScope
trace format. It implements the transport layer (file header, preamble, schema,
segments, checkpoints, deltas, string table, section table) and the CPU
protocol layer (entity catalog, pipeline stages, typed events).
Dependencies
| Crate | Purpose |
|---|---|
| byteorder | Little-endian integer read/write |
| lz4_flex | Pure-Rust LZ4 compression |
No other runtime dependencies.
2. Schema Building
Use SchemaBuilder and DutDescBuilder to define the trace structure before
writing.
2.1 SchemaBuilder
use uscope::schema::{SchemaBuilder, FieldSpec};
use uscope::types::SF_SPARSE;

let mut sb = SchemaBuilder::new();

// Clock domain: 5 GHz (200 ps period)
let clk = sb.clock_domain("core_clk", 200);

// Scope hierarchy
sb.scope("root", None, None, None);
let scope = sb.scope("core0", Some(0), Some("cpu"), Some(clk));

// Enum type
let stage_enum = sb.enum_type(
    "pipeline_stage",
    &["fetch", "decode", "execute", "writeback"],
);

// Storage (entity catalog)
let entities = sb.storage(
    "entities",
    scope,
    512,
    SF_SPARSE,
    &[
        ("entity_id", FieldSpec::U32),
        ("pc", FieldSpec::U64),
        ("inst_bits", FieldSpec::U32),
    ],
);

// Event type
let stage_ev = sb.event(
    "stage_transition",
    scope,
    &[
        ("entity_id", FieldSpec::U32),
        ("stage", FieldSpec::Enum(stage_enum)),
    ],
);

let schema = sb.build();
Methods:
| Method | Returns | Description |
|---|---|---|
| clock_domain(name, period_ps) | u8 | Add a clock domain |
| scope(name, parent, protocol, clock_id) | u16 | Add a scope |
| enum_type(name, values) | u8 | Add an enum type |
| storage(name, scope, slots, flags, fields) | u16 | Add a storage definition |
| event(name, scope, fields) | u16 | Add an event type |
| summary_field(name, type, scope) | — | Add a summary field |
| strings_mut() | &mut StringPoolBuilder | Access the string pool |
| build() | Schema | Consume builder, produce schema |
2.2 DutDescBuilder
use uscope::schema::DutDescBuilder;

let mut dut = DutDescBuilder::new();
dut.property("dut_name", "boom_core_0")
    .property("cpu.isa", "RV64GC")
    .property("cpu.pipeline_stages", "fetch,decode,execute,writeback");

// Build using the schema's shared string pool.
// Do this before calling sb.build(), which consumes the SchemaBuilder.
let dut_desc = dut.build(sb.strings_mut());
2.3 FieldSpec
| Variant | Wire type | Size |
|---|---|---|
| FieldSpec::U8 | FT_U8 | 1 |
| FieldSpec::U16 | FT_U16 | 2 |
| FieldSpec::U32 | FT_U32 | 4 |
| FieldSpec::U64 | FT_U64 | 8 |
| FieldSpec::I8 | FT_I8 | 1 |
| FieldSpec::I16 | FT_I16 | 2 |
| FieldSpec::I32 | FT_I32 | 4 |
| FieldSpec::I64 | FT_I64 | 8 |
| FieldSpec::Bool | FT_BOOL | 1 |
| FieldSpec::StringRef | FT_STRING_REF | 4 |
| FieldSpec::Enum(id) | FT_ENUM | 1 |
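The sizes in this table determine where each field sits within a slot or payload. The sketch below computes offsets under the assumption that fields are packed back to back with no padding; the crate's real layout logic is not shown in this reference and may differ. The local `Field` enum stands in for `FieldSpec` (signed variants omitted) and all names are hypothetical.

```rust
/// Local stand-in for a subset of FieldSpec; not the crate's type.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Field {
    U8,
    U16,
    U32,
    U64,
    Bool,
    StringRef,
    Enum,
}

/// Byte size of each field type, taken from the table above.
fn field_size(f: Field) -> usize {
    match f {
        Field::U8 | Field::Bool | Field::Enum => 1,
        Field::U16 => 2,
        Field::U32 | Field::StringRef => 4,
        Field::U64 => 8,
    }
}

/// Returns (offset of each field, total record size).
/// Assumption: fields are packed in declaration order with no padding.
fn packed_layout(fields: &[Field]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut cursor = 0;
    for &f in fields {
        offsets.push(cursor);
        cursor += field_size(f);
    }
    (offsets, cursor)
}
```

For the entities storage defined in §2.1 (entity_id: U32, pc: U64, inst_bits: U32), this yields offsets 0, 4, and 12 with a 16-byte record.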
3. Writer
Writer<W> writes µScope trace files in streaming, append-only fashion.
3.1 Creating a Writer
use uscope::writer::Writer;
use std::fs::File;

let file = File::create("trace.uscope")?;
let mut w = Writer::create(file, &dut_desc, &schema, checkpoint_interval_ps)?;
The checkpoint_interval_ps parameter controls how often a full checkpoint is
written. Smaller intervals allow faster random-access seeks at the cost of
larger files.
3.2 Writing Cycles
All storage mutations and events must occur within a begin_cycle /
end_cycle pair. Time must be monotonically non-decreasing.
w.begin_cycle(time_ps);

// Mutate storage slots
w.slot_set(storage_id, slot, field, value);
w.slot_add(storage_id, slot, field, delta);
w.slot_clear(storage_id, slot);

// Emit events (payload is pre-serialized, fields concatenated LE)
w.event(event_type_id, &payload_bytes);

w.end_cycle()?;
| Method | Description |
|---|---|
| begin_cycle(time_ps) | Start a cycle frame at the given time |
| slot_set(storage, slot, field, value) | Set a field value (marks slot valid) |
| slot_clear(storage, slot) | Mark slot invalid (sparse only) |
| slot_add(storage, slot, field, delta) | Add to a field value |
| event(type_id, payload) | Emit an event with raw payload |
| end_cycle() | Finish the cycle frame |
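Because event() takes a raw payload, the caller serializes fields itself. The sketch below shows one way to build the payload for the stage_transition event from §2.1 (entity_id: U32, stage: ENUM), assuming fields are concatenated in schema order with no padding; the helper names are hypothetical and are not part of the uscope crate.

```rust
/// Serializes a `stage_transition` payload: entity_id (U32, little-endian)
/// followed by stage (ENUM, 1 byte).
/// Assumption: fields are concatenated in schema order with no padding.
fn encode_stage_transition(entity_id: u32, stage: u8) -> Vec<u8> {
    let mut payload = Vec::with_capacity(5);
    payload.extend_from_slice(&entity_id.to_le_bytes());
    payload.push(stage);
    payload
}

/// Inverse of the encoder; returns `None` if the payload is not exactly 5 bytes.
fn decode_stage_transition(payload: &[u8]) -> Option<(u32, u8)> {
    if payload.len() != 5 {
        return None;
    }
    let id = u32::from_le_bytes([payload[0], payload[1], payload[2], payload[3]]);
    Some((id, payload[4]))
}
```

The resulting bytes would be passed as the payload argument of event(); the typed helpers in §5.2 exist so that callers rarely need to do this by hand.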
3.3 String Table
For STRING_REF fields, insert strings into the writer's string table:
let text_idx = w.string_table.insert("addi x0, x0, 0");
// Use text_idx as the u32 value for a STRING_REF field in event payloads
3.4 Finalization
let file = w.close()?; // Writes string table, segment table, section table
Calling close() sets F_COMPLETE, writes the section table, and returns the
underlying writer. The file is then readable by Reader.
4. Reader
Reader opens µScope trace files for random-access reading.
4.1 Opening a File
use uscope::reader::Reader;

let mut r = Reader::open("trace.uscope")?;
Handles both finalized (F_COMPLETE) and in-progress files. For finalized
files, the section table is used for fast segment lookup. For in-progress
files, the segment chain is walked from tail_offset.
4.2 Metadata Access
let header = r.header();         // FileHeader
let schema = r.schema();         // Schema (clock domains, scopes, storages, events)
let dut = r.dut_desc();          // DutDesc (key-value properties)
let config = r.trace_config();   // TraceConfig (checkpoint_interval_ps)
let offsets = r.field_offsets(); // Precomputed field offsets per storage

// Look up a DUT property by key
let isa = r.dut_property("cpu.isa"); // Some("RV64GC")

// String table (for STRING_REF field values)
if let Some(st) = r.string_table() {
    let text = st.get(0); // Some("addi x0, x0, 0")
}
4.3 State Reconstruction
Reconstruct the full storage state at any point in time. The reader finds the appropriate segment, loads its checkpoint, and replays deltas up to the target time.
let state = r.state_at(time_ps)?;

// Query storage state
let valid = state.slot_valid(storage_id, slot);
let value = state.slot_field(storage_id, slot, field_index, &offsets[storage_id]);
4.4 Event Queries
let events = r.events_in_range(t0_ps, t1_ps)?;
for ev in &events {
    println!("t={} type={} payload={:?}", ev.time_ps, ev.event_type_id, ev.payload);
}
4.5 Segment-Level Access
let n = r.segment_count();
let (storages, events, ops) = r.segment_replay(seg_idx)?;
segment_replay returns the checkpoint state after full delta replay, plus
all events and storage operations (TimedOp) in the segment.
4.6 Live Tailing
For traces being written concurrently:
loop {
    if r.poll_new_segments()? {
        // New segments available — re-query events or state
    }
    std::thread::sleep(std::time::Duration::from_millis(100));
}
5. CPU Protocol Helpers
The protocols::cpu module provides higher-level APIs that implement the CPU
protocol conventions on top of the transport-layer primitives.
5.1 CpuSchemaBuilder
Constructs a complete CPU-protocol schema with all standard enums, storages, and events.
use uscope::protocols::cpu::CpuSchemaBuilder;
use uscope::schema::FieldSpec;

let (dut_builder, mut schema_builder, ids) = CpuSchemaBuilder::new("core0")
    .isa("RV64GC")
    .pipeline_stages(&["fetch", "decode", "rename", "dispatch",
                       "issue", "execute", "complete", "retire"])
    .fetch_width(4)
    .commit_width(4)
    .entity_slots(512)
    .buffer("rob", 256, &[("completed", FieldSpec::Bool)])
    .buffer("iq_int", 48, &[])
    .counter("committed_insns")
    .counter("bp_misses")
    .build();

let dut = dut_builder.build(schema_builder.strings_mut());
let schema = schema_builder.build();
Builder methods:
| Method | Description |
|---|---|
| isa(name) | Set ISA (e.g. "RV64GC") |
| pipeline_stages(names) | Define pipeline stage enum |
| fetch_width(n) | Set fetch width DUT property |
| commit_width(n) | Set commit width DUT property |
| entity_slots(n) | Max in-flight entities (default: 512) |
| elf_path(path) | Set ELF path for disassembly |
| vendor(name) | Set vendor DUT property |
| buffer(name, slots, fields) | Add a hardware buffer storage |
| counter(name) | Add a counter (1-slot dense storage) |
| stall_reasons(names) | Override default stall reason enum |
CpuIds — returned by build(), contains all assigned IDs:
| Field | Type | Description |
|---|---|---|
| scope_id | u16 | CPU scope ID |
| entities_storage_id | u16 | Entity catalog storage ID |
| stage_transition_event_id | u16 | Stage transition event type |
| annotate_event_id | u16 | Annotation event type |
| dependency_event_id | u16 | Dependency event type |
| flush_event_id | u16 | Flush event type |
| stall_event_id | u16 | Stall event type |
| field_entity_id | u16 | Field index: entity_id |
| field_pc | u16 | Field index: pc |
| field_inst_bits | u16 | Field index: inst_bits |
| buffers | Vec<(String, u16)> | Buffer (name, storage_id) pairs |
| counters | Vec<(String, u16, u16)> | Counter (name, storage_id, field) triples |
5.2 CpuWriter
Typed helpers that emit the correct transport-layer operations for CPU protocol semantics.
use uscope::protocols::cpu::CpuWriter;

let cpu = CpuWriter::new(ids);

w.begin_cycle(time_ps);

// Fetch: allocate entity in catalog
cpu.fetch(&mut w, entity_id, pc, inst_bits);

// Stage transition
cpu.stage_transition(&mut w, entity_id, stage_index);

// Retire: clear entity from catalog
cpu.retire(&mut w, entity_id);

// Flush: emit flush event + clear entity
cpu.flush(&mut w, entity_id, reason);

// Annotation: insert text into string table + emit event
cpu.annotate(&mut w, entity_id, "decoded: addi x1, x0, 1");

// Dependency: record data/structural dependency
cpu.dependency(&mut w, src_entity, dst_entity, dep_type);

// Stall
cpu.stall(&mut w, reason);

// Counter increment
cpu.counter_add(&mut w, "committed_insns", 1);

w.end_cycle()?;
| Method | Transport ops | Description |
|---|---|---|
| fetch(w, id, pc, bits) | 3 × slot_set | Allocate entity |
| stage_transition(w, id, stage) | 1 × event | Pipeline stage change |
| retire(w, id) | 1 × slot_clear | Normal retirement |
| flush(w, id, reason) | 1 × event + 1 × slot_clear | Squash |
| annotate(w, id, text) | 1 × string_insert + 1 × event | Text annotation |
| dependency(w, src, dst, type) | 1 × event | Data dependency |
| stall(w, reason) | 1 × event | Pipeline stall |
| counter_add(w, name, delta) | 1 × slot_add | Increment counter |
6. Example: Full Write-Read Cycle
use uscope::protocols::cpu::{CpuSchemaBuilder, CpuWriter};
use uscope::writer::Writer;
use uscope::reader::Reader;
use std::fs::File;

// Build schema
let (dut_builder, mut sb, ids) = CpuSchemaBuilder::new("core0")
    .isa("RV64GC")
    .pipeline_stages(&["fetch", "decode", "execute", "writeback"])
    .entity_slots(16)
    .build();
let dut = dut_builder.build(sb.strings_mut());
let schema = sb.build();

// Write
let file = File::create("trace.uscope").unwrap();
let mut w = Writer::create(file, &dut, &schema, 10_000).unwrap();
let cpu = CpuWriter::new(ids.clone());

w.begin_cycle(0);
cpu.fetch(&mut w, 0, 0x8000_0000, 0x13);
cpu.stage_transition(&mut w, 0, 0);
w.end_cycle().unwrap();

w.begin_cycle(1000);
cpu.stage_transition(&mut w, 0, 1);
w.end_cycle().unwrap();

w.begin_cycle(2000);
cpu.stage_transition(&mut w, 0, 2);
w.end_cycle().unwrap();

w.begin_cycle(3000);
cpu.stage_transition(&mut w, 0, 3);
cpu.retire(&mut w, 0);
w.end_cycle().unwrap();

w.close().unwrap();

// Read
let mut r = Reader::open("trace.uscope").unwrap();
assert_eq!(r.header().total_time_ps, 3000);

let state = r.state_at(1500).unwrap();
assert!(state.slot_valid(ids.entities_storage_id, 0)); // still in-flight

let state = r.state_at(3000).unwrap();
assert!(!state.slot_valid(ids.entities_storage_id, 0)); // retired

let events = r.events_in_range(0, 3000).unwrap();
assert_eq!(events.len(), 4); // 4 stage transitions
uscope-cpu: CPU Protocol Library
Crate: uscope-cpu
Location: crates/uscope-cpu/
Overview
The uscope-cpu crate provides the CPU protocol interpretation layer on top of the uscope transport crate. It understands instruction lifecycles, pipeline stages, performance counters, and hardware buffers — concepts that the transport layer treats as opaque storages and events.
Architecture
uscope-cpu (this crate) uscope (transport)
┌──────────────────────┐ ┌─────────────────┐
│ CpuTrace │────────▶│ Reader │
│ - instructions │ │ - state_at() │
│ - stages │ │ - segment_replay│
│ - counters │ │ - schema() │
│ - buffers │ └─────────────────┘
│ - lazy loading │
│ - performance stats │
└──────────────────────┘
Dependencies
| Crate | Purpose |
|---|---|
| uscope | Transport layer (Reader, Schema, state reconstruction) |
| instruction-decoder | RISC-V ISA decode (optional, behind decode feature) |
CpuTrace
The main entry point. Opens a trace file, resolves the CPU protocol schema, and provides query methods.
Opening a trace
use uscope_cpu::CpuTrace;

let mut trace = CpuTrace::open("trace.uscope")?;

// File overview
let info = trace.file_info();
println!("Version: {}.{}", info.version_major, info.version_minor);
println!("Segments: {}", info.num_segments);
println!("Max cycle: {}", trace.max_cycle());
println!("Period: {} ps", trace.period_ps());

// Schema access
for (name, _) in trace.counter_names() {
    println!("Counter: {}", name);
}
for buf in trace.buffer_infos() {
    println!("Buffer: {} ({} slots)", buf.name, buf.capacity);
}
Counter queries
// Cumulative value at a cycle
let val = trace.counter_value_at(0, 100);

// Rate over a window (instructions per cycle)
let ipc = trace.counter_rate_at(0, 100, 64);

// Single-cycle delta
let delta = trace.counter_delta_at(0, 100);

// Downsample for sparkline rendering (min/max envelope)
let data = trace.counter_downsample(0, 0, 1000, 200);
for (min_rate, max_rate) in &data {
    // render bar from min to max
}
Buffer state
let state = trace.buffer_state_at(0, 50)?;
println!("Capacity: {}", state.capacity);

// Occupied slots
for slot in &state.slots {
    println!("Slot 0x{:02x}: entity_id={}", slot.0, slot.1[0]);
}

// Storage-level properties (pointer pairs)
for prop in &state.properties {
    println!("{}: {} (role={}, pair_id={})",
        prop.name, prop.value, prop.role, prop.pair_id);
}
Lazy segment loading
// Load specific segments (instruction/stage data)
let result = trace.load_segments(&[0, 1, 2])?;
println!("Loaded {} instructions", result.instructions.len());

// Or load segments covering a cycle range
let loaded = trace.ensure_loaded(100, 200);
Metadata
for (key, value) in trace.metadata() {
    println!("{}: {}", key, value);
}
Types
InstructionData
pub struct InstructionData {
    pub id: u32,                  // Entity ID
    pub sim_id: u64,              // Simulator-assigned ID
    pub thread_id: u16,
    pub rbid: Option<u32>,        // Retire buffer slot
    pub iq_id: Option<u32>,       // Issue queue ID
    pub dq_id: Option<u32>,       // Dispatch queue ID
    pub ready_cycle: Option<u32>,
    pub pc: u64,
    pub disasm: String,
    pub tooltip: String,
    pub stage_range: Range<u32>,  // Index range into stages vec
    pub retire_status: RetireStatus,
    pub first_cycle: u32,
    pub last_cycle: u32,
}
StageSpan
pub struct StageSpan {
    pub stage_name_idx: u16, // Index into stage name table
    pub lane: u16,
    pub start_cycle: u32,
    pub end_cycle: u32,
}
BufferInfo
pub struct BufferInfo {
    pub name: String,
    pub storage_id: u16,
    pub capacity: u16,
    pub fields: Vec<(String, u8)>,
    pub properties: Vec<BufferPropertyDef>,
}

pub struct BufferPropertyDef {
    pub name: String,
    pub field_type: u8,
    pub role: u8,    // 0=plain, 1=HEAD_PTR, 2=TAIL_PTR
    pub pair_id: u8, // Groups head/tail into pairs
}
CounterSeries
pub struct CounterSeries {
    pub name: String,
    pub samples: Vec<(u32, u64)>, // (cycle, cumulative_value)
    pub default_mode: CounterDisplayMode,
}
SegmentIndex
pub struct SegmentIndex {
    pub segments: Vec<(u32, u32)>, // (start_cycle, end_cycle)
}

impl SegmentIndex {
    pub fn segments_in_range(&self, start: u32, end: u32) -> Vec<usize>;
}
Feature Flags
| Feature | Default | Description |
|---|---|---|
| decode | yes | RISC-V instruction decode via instruction-decoder |
C DPI Library API Reference
Header: uscope_dpi.h
Location: dpi/
1. Overview
The C DPI library is a standalone, write-only µScope trace library designed for integration with hardware simulators via DPI (Direct Programming Interface). It produces trace files that are binary-compatible with the Rust reader.
Design Principles
- Single .c + .h (plus vendored LZ4) — easy to integrate
- C99 — compiles with any standard C compiler
- No dynamic allocation during per-cycle operations — pre-allocated buffers
- Write-only — no reader (use the Rust crate for reading)
- Zero Rust dependency — fully self-contained
Building
make -C dpi # builds libuscope_dpi.a
make -C dpi test # builds and runs the test program
Link with -luscope_dpi, or compile uscope_dpi.c and lz4.c directly into your build.
2. Schema Building
Before opening a writer, define the trace schema.
2.1 Create / Free
uscope_schema_def_t *schema = uscope_schema_new();
// ... add clocks, scopes, enums, storages, events ...
// Schema is consumed by uscope_writer_open() — do not free after open.
// If not opening a writer, free with:
uscope_schema_free(schema);
2.2 Clock Domains
uint8_t clk = uscope_schema_add_clock(schema, "core_clk", 1000); // 1 GHz
| Parameter | Type | Description |
|---|---|---|
| name | const char * | Clock name |
| period_ps | uint32_t | Period in picoseconds |
| Returns | uint8_t | Clock domain ID |
2.3 Scopes
uscope_schema_add_scope(schema, "root", 0xFFFF, NULL, 0xFF);
uint16_t scope = uscope_schema_add_scope(schema, "core0", 0, "cpu", clk);
| Parameter | Type | Description |
|---|---|---|
| name | const char * | Scope name |
| parent | uint16_t | Parent scope ID (0xFFFF = root) |
| protocol | const char * | Protocol name (NULL = none) |
| clock_id | uint8_t | Clock domain (0xFF = inherit) |
| Returns | uint16_t | Scope ID |
2.4 Enums
const char *stages[] = {"fetch", "decode", "execute", "writeback"};
uint8_t stage_enum = uscope_schema_add_enum(schema, "pipeline_stage", stages, 4);
2.5 Storages
Fields are passed as parallel arrays of names, types, and enum IDs.
const char *fields[] = {"entity_id", "pc", "inst_bits"};
uint8_t types[] = {USCOPE_FT_U32, USCOPE_FT_U64, USCOPE_FT_U32};
uint8_t enum_ids[] = {0, 0, 0};
uint16_t entities = uscope_schema_add_storage(
    schema, "entities", scope, /*num_slots=*/512, USCOPE_SF_SPARSE,
    /*num_fields=*/3, fields, types, enum_ids);
| Parameter | Type | Description |
|---|---|---|
| name | const char * | Storage name |
| scope_id | uint16_t | Owning scope |
| num_slots | uint16_t | Number of slots |
| flags | uint16_t | USCOPE_SF_SPARSE or 0 (dense) |
| num_fields | uint16_t | Number of fields |
| field_names | const char ** | Field name array |
| field_types | const uint8_t * | Field type array |
| field_enum_ids | const uint8_t * | Enum ID array (or NULL) |
| Returns | uint16_t | Storage ID |
2.6 Events
const char *st_fields[] = {"entity_id", "stage"};
uint8_t st_types[] = {USCOPE_FT_U32, USCOPE_FT_ENUM};
uint8_t st_enums[] = {0, stage_enum};
uint16_t st_event = uscope_schema_add_event(
    schema, "stage_transition", scope,
    /*num_fields=*/2, st_fields, st_types, st_enums);
3. Field Type Constants
| Constant | Value | Size | Description |
|---|---|---|---|
| USCOPE_FT_U8 | 0x01 | 1 | Unsigned 8-bit |
| USCOPE_FT_U16 | 0x02 | 2 | Unsigned 16-bit |
| USCOPE_FT_U32 | 0x03 | 4 | Unsigned 32-bit |
| USCOPE_FT_U64 | 0x04 | 8 | Unsigned 64-bit |
| USCOPE_FT_I8 | 0x05 | 1 | Signed 8-bit |
| USCOPE_FT_I16 | 0x06 | 2 | Signed 16-bit |
| USCOPE_FT_I32 | 0x07 | 4 | Signed 32-bit |
| USCOPE_FT_I64 | 0x08 | 8 | Signed 64-bit |
| USCOPE_FT_BOOL | 0x09 | 1 | Boolean |
| USCOPE_FT_STRING_REF | 0x0A | 4 | String table index |
| USCOPE_FT_ENUM | 0x0B | 1 | Enum value |
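The Size column determines how many bytes each field contributes to a packed event payload. The following is a minimal sketch of that lookup, assuming the constant values listed in the table. The `FT_*` names are local stand-ins for the real `USCOPE_FT_*` constants, and `ft_size` and `payload_size` are hypothetical helpers that are not part of uscope_dpi.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the table above; the real constants
   come from uscope_dpi.h. */
enum {
    FT_U8 = 0x01, FT_U16 = 0x02, FT_U32 = 0x03, FT_U64 = 0x04,
    FT_I8 = 0x05, FT_I16 = 0x06, FT_I32 = 0x07, FT_I64 = 0x08,
    FT_BOOL = 0x09, FT_STRING_REF = 0x0A, FT_ENUM = 0x0B
};

/* Hypothetical helper: byte size of one field type, 0 if unknown. */
static size_t ft_size(uint8_t type) {
    switch (type) {
    case FT_U8: case FT_I8: case FT_BOOL: case FT_ENUM:
        return 1;
    case FT_U16: case FT_I16:
        return 2;
    case FT_U32: case FT_I32: case FT_STRING_REF:
        return 4;
    case FT_U64: case FT_I64:
        return 8;
    default:
        return 0;
    }
}

/* Hypothetical helper: total size of a field list packed without
   padding. Returns 0 if any type is unknown. */
static size_t payload_size(const uint8_t *types, size_t n) {
    assert(types != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        size_t s = ft_size(types[i]);
        if (s == 0) return 0;
        total += s;
    }
    return total;
}
```

For the stage_transition event defined in section 2.6 (one U32 plus one ENUM), this sketch yields a payload size of 5 bytes.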
4. Writer
4.1 Open / Close
uscope_dut_property_t props[] = {
    {"dut_name", "boom_core_0"},
    {"cpu.isa", "RV64GC"},
};
uscope_writer_t *w = uscope_writer_open(
    "trace.uscope",
    props, /*num_props=*/2,
    schema, // consumed — do not free
    /*checkpoint_interval_ps=*/1000000);
// ... write cycles ...
uscope_writer_close(w); // finalizes and frees
uscope_writer_open takes ownership of the schema. Do not call
uscope_schema_free after opening.
uscope_writer_close writes the string table, segment table, section table,
sets F_COMPLETE, and frees all resources.
4.2 Per-Cycle Operations
All mutations must occur within a begin_cycle / end_cycle pair. Time must
be monotonically non-decreasing.
uscope_begin_cycle(w, time_ps);
uscope_slot_set(w, storage_id, slot, field, value);
uscope_slot_clear(w, storage_id, slot);
uscope_slot_add(w, storage_id, slot, field, delta);
uscope_event(w, event_type_id, payload, payload_size);
uscope_end_cycle(w);
| Function | Description |
|---|---|
| uscope_begin_cycle(w, time_ps) | Start a cycle at the given time |
| uscope_slot_set(w, stor, slot, field, val) | Set field value (marks slot valid) |
| uscope_slot_clear(w, stor, slot) | Mark slot invalid |
| uscope_slot_add(w, stor, slot, field, val) | Add to field value |
| uscope_event(w, type_id, payload, size) | Emit event with raw payload |
| uscope_end_cycle(w) | End cycle, flush segment if needed |
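The two rules for per-cycle operations (mutations only inside a begin/end pair, and non-decreasing time) can be checked on the caller side before touching the writer. The sketch below is one way to do that. `cycle_guard_t` and the `guard_*` functions are hypothetical and not part of uscope_dpi.h, and the 64-bit type for `time_ps` is an assumption based on the `longint unsigned` argument shown in the SystemVerilog import in section 7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical caller-side state for validating cycle bracketing. */
typedef struct {
    uint64_t last_time_ps; /* time of the most recent begin */
    bool started;          /* at least one cycle has begun */
    bool in_cycle;         /* currently between begin and end */
} cycle_guard_t;

/* Returns true if a cycle may begin at time_ps. Rejects nested
   begins and time values that move backwards. */
static bool guard_begin(cycle_guard_t *g, uint64_t time_ps) {
    assert(g != NULL);
    if (g->in_cycle) return false;
    if (g->started && time_ps < g->last_time_ps) return false;
    g->last_time_ps = time_ps;
    g->started = true;
    g->in_cycle = true;
    return true;
}

/* Slot and event calls are only legal while a cycle is open. */
static bool guard_can_mutate(const cycle_guard_t *g) {
    assert(g != NULL);
    return g->in_cycle;
}

/* Returns true if there was an open cycle to end. */
static bool guard_end(cycle_guard_t *g) {
    assert(g != NULL);
    if (!g->in_cycle) return false;
    g->in_cycle = false;
    return true;
}
```

A testbench could call `guard_begin` immediately before `uscope_begin_cycle` and skip or report the cycle when it returns false. Equal timestamps are accepted because the requirement is non-decreasing, not strictly increasing.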
4.3 Event Payloads
Event payloads are the field values concatenated in schema-definition order, little-endian, with no padding. Build them manually:
// stage_transition: entity_id (U32) + stage (ENUM/U8)
uint8_t payload[5];
uint32_t entity_id = 42;
memcpy(payload, &entity_id, 4); // host byte order: little-endian only on LE hosts
payload[4] = 2; // stage index
uscope_event(w, st_event, payload, 5);
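Because memcpy copies bytes in host order, the snippet above produces a valid payload only on little-endian hosts. A byte-wise store avoids that dependency. The following sketch shows one portable way to build the same 5-byte payload. `put_le32` and `pack_stage_transition` are hypothetical helpers, not functions provided by the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value least-significant byte
   first, independent of host byte order. */
static void put_le32(uint8_t *dst, uint32_t v) {
    dst[0] = (uint8_t)(v & 0xFF);
    dst[1] = (uint8_t)((v >> 8) & 0xFF);
    dst[2] = (uint8_t)((v >> 16) & 0xFF);
    dst[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Hypothetical helper: build the stage_transition payload
   (entity_id: U32, stage: ENUM) and return its size in bytes. */
static size_t pack_stage_transition(uint8_t out[5], uint32_t entity_id,
                                    uint8_t stage) {
    assert(out != NULL);
    put_le32(out, entity_id); /* bytes 0..3 */
    out[4] = stage;           /* byte 4, no padding */
    return 5;
}
```

The returned size and buffer can then be passed on, for example as `uscope_event(w, st_event, buf, n)`.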
4.4 String Table
For STRING_REF fields in event payloads:
uint32_t idx = uscope_string_insert(w, "addi x0, x0, 0");
// Use idx as the 4-byte value in a STRING_REF field
5. Limits
| Resource | Maximum |
|---|---|
| String pool (schema) | 64 KB |
| Clock domains | 16 |
| Scopes | 256 |
| Enum types | 64 |
| Enum values per type | 256 |
| Storages | 256 |
| Event types | 256 |
| Fields per storage/event | 32 |
| Ops per cycle | 4096 |
| Events per cycle | 1024 |
| Event payload size | 256 bytes |
| Segments | 65536 |
| Delta buffer | 4 MB (auto-grows) |
6. Example: CPU Pipeline Trace
#include "uscope_dpi.h"
#include <string.h>
int main(void) {
    // Schema
    uscope_schema_def_t *s = uscope_schema_new();
    uint8_t clk = uscope_schema_add_clock(s, "clk", 1000);
    uscope_schema_add_scope(s, "root", 0xFFFF, NULL, 0xFF);
    uint16_t scope = uscope_schema_add_scope(s, "core0", 0, "cpu", clk);

    const char *stages[] = {"fetch", "decode", "execute", "writeback"};
    uint8_t se = uscope_schema_add_enum(s, "pipeline_stage", stages, 4);

    const char *ef[] = {"entity_id", "pc", "inst_bits"};
    uint8_t et[] = {USCOPE_FT_U32, USCOPE_FT_U64, USCOPE_FT_U32};
    uint16_t ent = uscope_schema_add_storage(s, "entities", scope,
                                             256, USCOPE_SF_SPARSE,
                                             3, ef, et, NULL);

    const char *sf[] = {"entity_id", "stage"};
    uint8_t st[] = {USCOPE_FT_U32, USCOPE_FT_ENUM};
    uint8_t sen[] = {0, se};
    uint16_t sev = uscope_schema_add_event(s, "stage_transition", scope,
                                           2, sf, st, sen);

    // DUT properties
    uscope_dut_property_t props[] = {
        {"dut_name", "core0"},
        {"cpu.isa", "RV64GC"},
        {"cpu.pipeline_stages", "fetch,decode,execute,writeback"},
    };

    // Open
    uscope_writer_t *w = uscope_writer_open("trace.uscope",
                                            props, 3, s, 100000);

    // Fetch instruction 0
    uscope_begin_cycle(w, 0);
    uscope_slot_set(w, ent, 0, 0, 0);          // entity_id
    uscope_slot_set(w, ent, 0, 1, 0x80000000); // pc
    uscope_slot_set(w, ent, 0, 2, 0x13);       // inst_bits
    uint8_t payload[5];
    uint32_t eid = 0;
    memcpy(payload, &eid, 4);
    payload[4] = 0; // fetch stage
    uscope_event(w, sev, payload, 5);
    uscope_end_cycle(w);

    // Decode
    uscope_begin_cycle(w, 1000);
    payload[4] = 1;
    uscope_event(w, sev, payload, 5);
    uscope_end_cycle(w);

    // Execute
    uscope_begin_cycle(w, 2000);
    payload[4] = 2;
    uscope_event(w, sev, payload, 5);
    uscope_end_cycle(w);

    // Writeback + retire
    uscope_begin_cycle(w, 3000);
    payload[4] = 3;
    uscope_event(w, sev, payload, 5);
    uscope_slot_clear(w, ent, 0);
    uscope_end_cycle(w);

    uscope_writer_close(w);
    return 0;
}
7. Integration with Simulators
SystemVerilog DPI
import "DPI-C" function chandle uscope_writer_open(
    input string path,
    /* ... */
);
import "DPI-C" function void uscope_begin_cycle(
    input chandle w, input longint unsigned time_ps
);
// ... etc
Verilator
Include uscope_dpi.c and lz4.c in the Verilator build:
verilator --cc top.sv --exe sim_main.cpp uscope_dpi.c lz4.c
Call the C API from sim_main.cpp or from DPI-exported functions in the
SystemVerilog testbench.
uscope-cli: Command-Line Trace Inspector
Binary: uscope-cli
Location: crates/uscope-cli/
Overview
uscope-cli is a standalone command-line tool for inspecting µScope CPU pipeline traces. It provides quick access to trace metadata, buffer state, instruction timelines, and counter data without needing the Reflex GUI.
All commands support --json for structured JSON output, making it suitable for scripting and CI pipelines.
Installation
cargo install --path crates/uscope-cli
# or run directly:
cargo run --bin uscope-cli -- <command> <file>
Commands
info — File overview
uscope-cli info trace.uscope
Prints: file header (version, flags, segments, duration), metadata (DUT properties), pipeline stage names, counter names, buffer names, and full schema dump (storages, events, enums).
# JSON output for scripting
uscope-cli info trace.uscope --json | jq '.counters'
state — Buffer state at a cycle
uscope-cli state trace.uscope --cycle 50
Shows the state of all buffers at the given cycle: occupied slots with field values, entity fields (rbid, fpb_id, etc.), and storage properties (pointer positions).
# Check ROB state at cycle 100
uscope-cli state trace.uscope --cycle 100 --json | jq '.buffers[] | select(.name == "rob")'
timeline — Instruction lifecycle
uscope-cli timeline trace.uscope --entity 42
Shows the complete lifecycle of instruction entity 42: fetch cycle, all stage transitions with durations, annotations, and retire/flush status.
# Find when entity 42 was in the execute stage
uscope-cli timeline trace.uscope --entity 42 --json | jq '.stages[] | select(.name == "Ex")'
counters — Counter values
# Show final counter values
uscope-cli counters trace.uscope
# Per-cycle values over a range
uscope-cli counters trace.uscope --range 100:200
# Filter by counter name
uscope-cli counters trace.uscope --counter retired_insns --range 0:50
buffers — Buffer occupancy
uscope-cli buffers trace.uscope --cycle 50
Like state but focused on buffer fill level, pointer pair positions, and occupancy percentage. Filter by buffer name with --buffer.
uscope-cli buffers trace.uscope --cycle 50 --buffer rob
Output Formats
| Flag | Format | Use case |
|---|---|---|
| (default) | Human-readable aligned table | Interactive inspection |
| --json | Pretty-printed JSON | Scripting, piping to jq, CI |
Examples
# Quick sanity check: does the trace have data?
uscope-cli info trace.uscope
# Debugging: what's in the ROB at cycle 50?
uscope-cli state trace.uscope --cycle 50
# Performance: what's the IPC?
uscope-cli counters trace.uscope --counter retired_insns
# Entity debugging: what happened to instruction 17?
uscope-cli timeline trace.uscope --entity 17
# Scripting: extract all counter names
uscope-cli info trace.uscope --json | jq -r '.counters[]'
uscope-mcp: MCP Server for AI-Assisted Debugging
Binary: uscope-mcp
Location: crates/uscope-mcp/
Overview
uscope-mcp is a Model Context Protocol (MCP) server that lets Claude inspect µScope CPU pipeline traces. It exposes the uscope-cpu query API as MCP tools, enabling natural-language performance debugging.
Quick Start
1. Start the server
cargo run --bin uscope-mcp -- --trace /path/to/trace.uscope
2. Configure Claude Code
Add to .claude/settings.json:
{
  "mcpServers": {
    "uscope": {
      "command": "cargo",
      "args": ["run", "--release", "--bin", "uscope-mcp", "--",
               "--trace", "/path/to/trace.uscope"],
      "cwd": "/path/to/uscope/repo"
    }
  }
}
Or with a pre-built binary:
{
  "mcpServers": {
    "uscope": {
      "command": "/path/to/uscope-mcp",
      "args": ["--trace", "/path/to/trace.uscope"]
    }
  }
}
3. Ask Claude
"What's the IPC between cycles 100 and 500?"
"Show me the pipeline stages for entity 42"
"Why is the ROB full at cycle 200?"
"What caused the pipeline stall at cycle 350?"
MCP Tools
file_info
Returns trace header, schema, segments, counters, buffers, and metadata.
Parameters: none
state_at_cycle
Returns buffer contents at a specific cycle — slot values, entity fields, and storage properties.
Parameters:
- cycle (number, required): cycle number to query
entity_timeline
Returns the complete lifecycle of an instruction: stages with durations, disasm, annotations, retire/flush status.
Parameters:
- entity_id (number, required): entity ID to trace
counter_values
Returns counter data over a cycle range with per-cycle values, deltas, and rates.
Parameters:
- counter (string, required): counter name (e.g., "retired_insns")
- start_cycle (number, required): range start
- end_cycle (number, required): range end
buffer_occupancy
Returns buffer fill level at a cycle — occupied slots, pointer pair positions, fill percentage.
Parameters:
- buffer (string, required): buffer name (e.g., "rob")
- cycle (number, required): cycle to query
analyze_performance
Returns a structured performance summary over a cycle range:
- Instruction counts (total, retired, flushed, in-flight)
- IPC (instructions per cycle)
- Flush rate
- Per-counter totals and rates
- Buffer occupancy snapshots at start/mid/end
- Per-stage average latency, sorted by bottleneck
Parameters:
- start_cycle (number, required): range start
- end_cycle (number, required): range end
Protocol
The server implements the Model Context Protocol over stdio using JSON-RPC 2.0. It handles:
- initialize — server capabilities and info
- notifications/initialized — acknowledged silently
- tools/list — returns tool definitions with JSON Schema
- tools/call — dispatches to tool handlers
All tool responses are structured JSON, formatted for AI reasoning. Errors are returned as MCP tool errors (not JSON-RPC errors) so Claude can see error messages.
Logging goes to stderr (stdout is the MCP channel).
konata2uscope
Binary: konata2uscope
Location: crates/konata2uscope/
1. Overview
konata2uscope converts Konata
(Kanata v0004) pipeline trace logs into µScope CPU protocol traces. This
enables viewing Konata-format traces in µScope-compatible viewers with
random-access seeking, mipmap summaries, and structured schema metadata.
2. Usage
konata2uscope <input.log[.gz]> -o <output.uscope> [options]
| Option | Default | Description |
|---|---|---|
| -o <path> | output.uscope | Output file path |
| --clock-period-ps <ps> | 1000 | Clock period in picoseconds (1000 = 1 GHz) |
| --dut-name <name> | core0 | DUT name for the trace |
Gzip-compressed input (.log.gz) is detected automatically.
3. Two-Pass Architecture
Pass 1: Scan
Reads the entire Konata log to discover metadata:
- All unique pipeline stage names (in first-occurrence order)
- Maximum number of simultaneously in-flight instructions
- Thread IDs
- Total cycle count
This information is needed to construct the µScope schema before writing any trace data.
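Two of the scan results, the stage list in first-occurrence order and the peak number of in-flight instructions, need only a small amount of state. The sketch below illustrates the idea and is not the converter's actual implementation. `scan_state_t`, the `scan_*` functions, and the name-length cap are hypothetical, and the 256-entry bound simply follows the enum-values-per-type limit listed for the C DPI library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STAGES 256 /* matches the enum-values-per-type limit */
#define MAX_NAME 64    /* illustrative cap on stage name length */

/* Hypothetical pass-1 state. */
typedef struct {
    char stages[MAX_STAGES][MAX_NAME];
    size_t num_stages;
    unsigned in_flight;
    unsigned max_in_flight;
} scan_state_t;

/* Returns the index of a stage name, appending it on first
   occurrence. Returns -1 if the table is full or the name is
   too long. */
static int scan_stage(scan_state_t *s, const char *name) {
    assert(s != NULL && name != NULL);
    for (size_t i = 0; i < s->num_stages; i++) {
        if (strcmp(s->stages[i], name) == 0) return (int)i;
    }
    if (s->num_stages == MAX_STAGES || strlen(name) >= MAX_NAME) return -1;
    strcpy(s->stages[s->num_stages], name);
    return (int)s->num_stages++;
}

/* Called for each instruction creation (I). */
static void scan_create(scan_state_t *s) {
    assert(s != NULL);
    s->in_flight++;
    if (s->in_flight > s->max_in_flight) s->max_in_flight = s->in_flight;
}

/* Called for each retire or flush (R). */
static void scan_retire(scan_state_t *s) {
    assert(s != NULL);
    if (s->in_flight > 0) s->in_flight--;
}
```

The peak in-flight count is what a converter would plausibly use to size the entities storage, and the stage table maps directly onto the pipeline_stage enum.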
Pass 2: Emit
Re-reads the log and emits µScope data using the CPU protocol writer:
- Entity allocation on instruction creation (I)
- Stage transitions on stage start (S, lane 0)
- Annotations on labels (L)
- Retirement on retire commands (R, type 0)
- Flushes on flush commands (R, type 1)
- Dependencies on dependency arrows (W)
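The list above amounts to a dispatch on the command letter plus one numeric field. The following sketch classifies a single Konata line accordingly. `emit_action_t` and `classify_line` are hypothetical names, the treatment of lane 1+ stage starts as annotations follows the command table in section 4.1, and the parsing is deliberately simplified (it assumes well-formed, whitespace-separated fields).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical action codes for pass 2. */
typedef enum {
    ACT_NONE,             /* C, C=, E, header: nothing emitted */
    ACT_ENTITY_ALLOC,     /* I */
    ACT_STAGE_TRANSITION, /* S, lane 0 */
    ACT_ANNOTATE,         /* L, or S with lane 1+ */
    ACT_RETIRE,           /* R, type 0 */
    ACT_FLUSH,            /* R, type 1 */
    ACT_DEPENDENCY        /* W */
} emit_action_t;

/* Classify one Konata log line by its command and lane/type field. */
static emit_action_t classify_line(const char *line) {
    int arg = 0;
    assert(line != NULL);
    switch (line[0]) {
    case 'I':
        return ACT_ENTITY_ALLOC;
    case 'L':
        return ACT_ANNOTATE;
    case 'W':
        return ACT_DEPENDENCY;
    case 'S': /* S <id> <lane> <stage> */
        if (sscanf(line, "S %*d %d", &arg) != 1 || arg < 0) return ACT_NONE;
        return arg == 0 ? ACT_STAGE_TRANSITION : ACT_ANNOTATE;
    case 'R': /* R <id> <rid> <type> */
        if (sscanf(line, "R %*d %*d %d", &arg) != 1) return ACT_NONE;
        if (arg == 0) return ACT_RETIRE;
        if (arg == 1) return ACT_FLUSH;
        return ACT_NONE;
    default:
        return ACT_NONE;
    }
}
```

Cycle commands (C and C=) are not classified here because they advance the time base and do not produce a record of their own.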
4. Konata Format Mapping
4.1 Commands
| Konata | Description | µScope mapping |
|---|---|---|
| C=\t<cycle> | Set absolute cycle | Time base |
| C\t<delta> | Advance by delta cycles | Time base |
| I\t<id>\t<gid>\t<tid> | Create instruction | DA_SLOT_SET on entities |
| L\t<id>\t0\t<text> | Disassembly label | annotate event; PC extraction |
| L\t<id>\t1\t<text> | Detail label | annotate event |
| S\t<id>\t0\t<stage> | Start stage (lane 0) | stage_transition event |
| S\t<id>\t1+\t<stage> | Start stall overlay | annotate event |
| E\t<id>\t<lane>\t<stage> | End stage | (implicit in µScope) |
| R\t<id>\t<rid>\t0 | Retire | DA_SLOT_CLEAR + counter |
| R\t<id>\t<rid>\t1 | Flush | flush event + DA_SLOT_CLEAR |
| W\t<cons>\t<prod>\t<type> | Dependency | dependency event |
4.2 PC Extraction
If a disassembly label (L type 0) starts with a hex address, it is extracted
as the instruction PC. Supported formats:
- 80000000 addi x0, x0, 0 → PC = 0x80000000
- 0x80000000 addi x0, x0, 0 → PC = 0x80000000
- 00001000: jal zero, 0x10 → PC = 0x00001000
If no hex address is found, PC defaults to 0.
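A leading-address parser for the three formats above might look like the sketch below. `extract_pc` and `hex_val` are hypothetical names. The rule that an unprefixed address needs at least 8 hex digits is an assumption added here so that mnemonics made of hex letters (for example `add`) are not misread as addresses. The converter's actual disambiguation rule is not stated in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: extract a leading hex address from a
   disassembly label, or return 0 if none is found. */
static uint64_t extract_pc(const char *label) {
    assert(label != NULL);
    const char *p = label;
    bool prefixed = false;
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        p += 2;
        prefixed = true;
    }
    uint64_t pc = 0;
    size_t digits = 0;
    int v;
    while ((v = hex_val(p[digits])) >= 0) {
        if (digits == 16) return 0; /* more than 64 bits */
        pc = (pc << 4) | (uint64_t)v;
        digits++;
    }
    /* The address must be followed by a separator or end of string. */
    char end = p[digits];
    bool terminated = (end == ' ' || end == '\t' || end == ':' || end == '\0');
    /* Assumption: bare addresses need 8+ digits to avoid mnemonics. */
    size_t min_digits = prefixed ? 1 : 8;
    if (!terminated || digits < min_digits) return 0;
    return pc;
}
```

Returning 0 for unparseable labels matches the documented default, at the cost of not distinguishing a missing address from a genuine PC of 0.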
4.3 Stage Names
Konata stage names are arbitrary strings. Pass 1 collects them in pipeline
order (first occurrence). They become the pipeline_stage enum values in the
µScope schema and the cpu.pipeline_stages DUT property.
4.4 Time Model
Konata cycles are converted to picoseconds: time_ps = cycle * clock_period_ps.
The default clock period of 1000 ps corresponds to 1 GHz.
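The conversion is a single multiplication, but long traces with large clock periods can exceed 64 bits, so a checked version is worth sketching. `cycle_to_ps` and `period_ps_to_ghz` are hypothetical helpers, and the 32-bit period type is taken from the `period_ps` parameter of `uscope_schema_add_clock`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: time_ps = cycle * clock_period_ps.
   Returns false instead of wrapping if the product overflows. */
static bool cycle_to_ps(uint64_t cycle, uint32_t clock_period_ps,
                        uint64_t *out_ps) {
    assert(out_ps != NULL);
    if (clock_period_ps != 0 && cycle > UINT64_MAX / clock_period_ps) {
        return false;
    }
    *out_ps = cycle * (uint64_t)clock_period_ps;
    return true;
}

/* Hypothetical helper: clock frequency in GHz for a period in ps
   (1000 ps -> 1 GHz). The period must be non-zero. */
static double period_ps_to_ghz(uint32_t clock_period_ps) {
    assert(clock_period_ps != 0);
    return 1000.0 / (double)clock_period_ps;
}
```

With `--clock-period-ps 200`, cycle 3 maps to 600 ps and the clock runs at 5 GHz.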
4.5 Lane Handling
Only lane 0 stage starts map to stage_transition events. Lane 1+ (stall
overlays in Konata) are emitted as annotate events with the text
stall:<stage_name>.
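The annotation text for a stall overlay can be produced with a bounded format call. The sketch below shows the lane check and the `stall:<stage_name>` formatting. `stall_annotation` and its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "stall:<stage>" into buf for lane 1+.
   Returns the text length, 0 if the lane is 0 (a stage_transition,
   not an annotation), or -1 if buf is too small. */
static int stall_annotation(char *buf, size_t cap, int lane,
                            const char *stage) {
    assert(buf != NULL && stage != NULL);
    if (lane <= 0) return 0;
    int n = snprintf(buf, cap, "stall:%s", stage);
    if (n < 0 || (size_t)n >= cap) return -1; /* error or truncated */
    return n;
}
```

A caller would emit a stage_transition event when the function returns 0 and an annotate event carrying the buffer contents when it returns a positive length.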
5. Example
Input: trace.log
Kanata 0004
C= 0
I 0 0 0
L 0 0 80000000 addi x0, x0, 0
S 0 0 Fetch
C 1
E 0 0 Fetch
S 0 0 Decode
C 1
E 0 0 Decode
S 0 0 Execute
C 1
E 0 0 Execute
S 0 0 Writeback
R 0 0 0
Conversion
$ konata2uscope trace.log -o trace.uscope --clock-period-ps 200
Pass 1: scanning trace.log...
4 stages: [Fetch, Decode, Execute, Writeback]
max in-flight: 1
threads: 1
total cycles: 3
Pass 2: emitting trace.uscope...
Done.
Resulting Schema
- Clock: core_clk @ 200 ps (5 GHz)
- Enum: pipeline_stage = {Fetch, Decode, Execute, Writeback}
- Storage: entities (1 slot, sparse)
- Events: stage_transition, annotate, dependency, flush, stall
- DUT: cpu.pipeline_stages = "Fetch,Decode,Execute,Writeback"
The output file is a standard µScope trace readable by the Rust reader.