
Conversation


@ananas-block ananas-block commented Dec 4, 2025

Overview

  • Summary of changes

Testing

  • Testing performed to validate the changes

Summary by CodeRabbit

  • New Features

    • Added getQueueInfo endpoint to list per-tree queue sizes.
    • getQueueElements now supports per-queue pagination (output/input/address) and richer per-queue/state/address payloads.
    • Added resumable Google Cloud Storage upload and access-token retrieval.
  • Breaking Changes

    • Removed getBatchAddressUpdateInfo endpoint.
    • getQueueElements request/response shapes changed (per-queue controls and new response structure).
  • Chores

    • Version bumped; .gitignore updated to ignore proto/**/*.bin.



coderabbitai bot commented Dec 4, 2025

Walkthrough

Replaces get_batch_address_update_info with get_queue_info; refactors get_queue_elements for per-queue pagination, merged proofs, and cache validation; splits indexed Merkle range-hash into v1/v2; generalizes queue hash cache over ConnectionTrait and defers caching until validation; adds GCS resumable upload and token logic; assorted logging, tests, and manifest updates.

Changes

Cohort / File(s) — Summary

  • API surface & RPC (src/api/api.rs, src/api/method/mod.rs, src/api/rpc_server.rs, src/api/method/get_queue_info.rs): Removed get_batch_address_update_info; added the get_queue_info endpoint, GetQueueInfoRequest/Response and QueueInfo types, module export, and RPC registration; updated API imports and impl.
  • Removed module (src/api/method/get_batch_address_update_info.rs): Deleted the module and all of its public types, including the get_batch_address_update_info function.
  • GetQueueElements refactor & OpenAPI (src/api/method/get_queue_elements.rs, src/openapi/mod.rs, src/openapi/specs/api.yaml): Reworked the request/response to per-queue pagination (output/input/address); added State/Output/Input/Address queue types, Node, and QueueRequest; implemented proof merging/dedup/hash-chain helpers; updated OpenAPI schemas.
  • New queue-info implementation (src/api/method/get_queue_info.rs): New module implementing get_queue_info, DB aggregation of per-tree queue sizes, and public request/response/QueueInfo models.
  • Queue elements helpers & tests (src/api/method/get_multiple_new_address_proofs.rs, tests/integration_tests/*, src/api/method/get_queue_elements.rs): Minor style change (or_default()); tests updated to use GetQueueElementsRequest with per-queue QueueRequest; adjusted assertions and public types.
  • Indexed Merkle tree hash versioning (src/ingester/persist/indexed_merkle_tree/helpers.rs, src/ingester/persist/indexed_merkle_tree/mod.rs, src/ingester/persist/persisted_indexed_merkle_tree.rs): Split compute_range_node_hash into v1/v2, added bigint_to_be_bytes_array, updated dispatch and re-exports; callers updated to use v2 for zeroeth paths.
  • Proof querying & batching (src/ingester/persist/indexed_merkle_tree/proof.rs): Batched DB queries for next-smallest elements (UNION ALL), added a by-address lookup helper and an early empty-return optimization, and updated proof construction to use the address map.
  • Persist changes & logging (src/ingester/persist/leaf_node.rs, src/ingester/persist/leaf_node_proof.rs, src/ingester/persist/persisted_state_tree.rs, src/ingester/persist/spend.rs, src/ingester/persist/mod.rs): Added debug logs around leaf persistence and proofs; get_subtrees returns a prefix of EMPTY_SUBTREES when empty; batched spends now set Spent = true; improved the cache-metadata error message.
  • Ingester/parser minor edits (src/ingester/parser/state_update.rs, src/ingester/parser/indexer_events.rs, src/ingester/parser/tx_event_parser_v2.rs, src/ingester/fetchers/grpc.rs): Import reorderings and formatting tweaks; include self.in_accounts in kept_account_hashes; grpc log formatting change.
  • Monitor: cache generics & visibility (src/monitor/mod.rs, src/monitor/queue_hash_cache.rs, src/monitor/queue_monitor.rs): Made queue_hash_cache public; generalized cache APIs over C: ConnectionTrait; added delete_hash_chains; changed the caching workflow to validate before storing and to delete invalid cached chains.
  • Snapshot GCS resumable upload (src/snapshot/mod.rs, src/snapshot/gcs_utils/mod.rs, src/snapshot/gcs_utils/resumable_upload.rs): Added the gcs_utils::resumable_upload module and integrated the resumable upload flow into snapshot logic; added token retrieval and chunked resumable upload with retry/resume semantics.
  • Tests & DB utils (tests/integration_tests/utils.rs, tests/integration_tests/batched_*_tests.rs): PG pool session isolation set to READ COMMITTED; tests updated to the new per-queue GetQueueElementsRequest with adjusted assertions.
  • Config / manifest / misc (.gitignore, Cargo.toml, src/main.rs): Added proto/**/*.bin to .gitignore; bumped the package version and added the jsonwebtoken = "9" dependency in Cargo.toml; minor doc formatting tweak in src/main.rs.
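The "Snapshot GCS resumable upload" row above mentions access-token retrieval, which pairs with the new jsonwebtoken = "9" dependency. As a hedged, hypothetical sketch of the standard Google service-account token exchange such code typically performs (not the PR's actual src/snapshot/gcs_utils implementation): sign an RS256 JWT over the service-account key, then exchange it at Google's token endpoint.

```rust
// Hypothetical sketch of Google service-account token retrieval, not the PR's
// actual code. Assumes jsonwebtoken = "9", reqwest, and serde.
use jsonwebtoken::{encode, Algorithm, EncodingKey, Header};
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct Claims<'a> {
    iss: &'a str,   // service-account email
    scope: &'a str, // e.g. the GCS read/write scope
    aud: &'a str,   // token endpoint
    iat: u64,
    exp: u64,       // at most one hour after iat
}

#[derive(Deserialize)]
struct TokenResponse {
    access_token: String,
}

async fn fetch_gcs_access_token(
    client_email: &str,
    private_key_pem: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    let now = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)?
        .as_secs();
    let claims = Claims {
        iss: client_email,
        scope: "https://www.googleapis.com/auth/devstorage.read_write",
        aud: "https://oauth2.googleapis.com/token",
        iat: now,
        exp: now + 3600,
    };
    // Sign the JWT with the service account's RSA private key.
    let jwt = encode(
        &Header::new(Algorithm::RS256),
        &claims,
        &EncodingKey::from_rsa_pem(private_key_pem.as_bytes())?,
    )?;
    // Exchange the signed JWT for a short-lived access token.
    let resp: TokenResponse = reqwest::Client::new()
        .post("https://oauth2.googleapis.com/token")
        .form(&[
            ("grant_type", "urn:ietf:params:oauth:grant-type:jwt-bearer"),
            ("assertion", jwt.as_str()),
        ])
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;
    Ok(resp.access_token)
}
```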
```mermaid
sequenceDiagram
  participant Client
  participant RPC as RPC_Server
  participant API as PhotonApi
  participant DB as Database
  participant Cache as HashChainCache
  participant ZKP as Hasher/ZKP

  Client->>RPC: send getQueueElements (per-queue request)
  RPC->>API: parse payload, call get_queue_elements
  API->>DB: begin txn, fetch tree metadata & queue elements
  API->>Cache: get_cached_hash_chains (per-batch)
  alt cache hit
    Cache-->>API: cached hash chains
  else cache miss / partial
    API->>ZKP: compute_state_queue_hash_chains
    ZKP-->>API: computed hash chains
  end
  API->>DB: fetch proofs, path nodes, subtrees
  DB-->>API: proofs, nodes, subtrees
  API->>API: merge_state_queue_proofs & deduplicate nodes
  API->>DB: validate on-chain vs computed chains
  opt validated chains
    API->>Cache: store validated hash chains
  end
  opt invalid cached indices
    API->>Cache: delete_hash_chains (invalid)
  end
  API->>DB: commit txn
  API-->>RPC: assemble response
  RPC-->>Client: return response
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus review on:
    • src/api/method/get_queue_elements.rs — proof merging, deduplication, batching, public types.
    • src/ingester/persist/indexed_merkle_tree/helpers.rs & proof.rs — v1/v2 hashing correctness, bigint-to-bytes handling, batched/by-address queries.
    • src/monitor/queue_hash_cache.rs & src/monitor/queue_monitor.rs — generic ConnectionTrait migration and caching/validation workflow.
    • src/snapshot/gcs_utils/resumable_upload.rs — HTTP retry/resume semantics and token exchange correctness (see the protocol sketch after this list).
    • src/openapi/specs/api.yaml & src/openapi/mod.rs — ensure OpenAPI schemas match Rust types and nullable semantics.
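For reviewers less familiar with GCS resumable uploads, the protocol the new module implements is roughly: initiate a session with uploadType=resumable, then PUT chunks with Content-Range headers, treating HTTP 308 as "chunk accepted, keep going." A minimal, hypothetical sketch of that chunk loop (not the PR's actual code, which also layers retry and session-resume on top):

```rust
// Hypothetical sketch of the GCS resumable-upload chunk loop; the PR's
// resumable_upload.rs adds retry/resume handling on top of this protocol.
use reqwest::{Client, StatusCode};

async fn upload_resumable(
    client: &Client,
    session_uri: &str, // from the initial POST with uploadType=resumable
    data: &[u8],
    chunk_size: usize, // GCS requires a multiple of 256 KiB per chunk
) -> Result<(), Box<dyn std::error::Error>> {
    let total = data.len();
    let mut offset = 0usize;
    while offset < total {
        let end = (offset + chunk_size).min(total);
        let chunk = data[offset..end].to_vec();
        let resp = client
            .put(session_uri)
            .header(
                "Content-Range",
                format!("bytes {}-{}/{}", offset, end - 1, total),
            )
            .body(chunk)
            .send()
            .await?;
        match resp.status() {
            // 308 Resume Incomplete: GCS stored the chunk, send the next one.
            StatusCode::PERMANENT_REDIRECT => offset = end,
            s if s.is_success() => return Ok(()), // final chunk accepted
            s => return Err(format!("upload failed with status {}", s).into()),
        }
    }
    Ok(())
}
```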

Poem

🐰 I hopped through queues both big and small,
I stitched proofs, split hashes, and chased each call,
I learned to resume uploads across the stream,
I validated caches and kept the logs agleam,
A rabbit cheers: new paths make systems gleam! 🌿

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Description check — ⚠️ Warning: The pull request description uses only placeholder template text ('Summary of changes' and 'Testing performed to validate the changes') without providing actual details about the implementation, the testing approach, or the rationale for the changes. Resolution: Replace the placeholder text with concrete details: describe the v2 queue elements API changes, explain multi-queue support (output/input/address), document the testing performed, and clarify the migration from the v1 API.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: The title 'get queue elements v2 rpc for code rabbit review' directly references the main change: introducing a v2 RPC endpoint for queue elements retrieval, which is evident from the extensive modifications to the GetQueueElementsRequest/Response structures and related API implementations.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch sergey/get_queue_elements_v2_rpc

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 690dd5a and 480aa73.

📒 Files selected for processing (1)
  • src/api/method/get_queue_elements.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/api/method/get_queue_elements.rs (9)
src/common/mod.rs (1)
  • format_bytes (157-164)
src/ingester/persist/leaf_node_proof.rs (2)
  • get_multiple_compressed_leaf_proofs_by_indices (14-90)
  • indices (42-42)
src/ingester/persist/persisted_state_tree.rs (2)
  • get_subtrees (132-209)
  • bytes (571-574)
src/ingester/persist/leaf_node.rs (4)
  • leaf_index_to_node_index (30-32)
  • from (35-42)
  • from (46-53)
  • from (57-64)
src/snapshot/mod.rs (1)
  • new (364-385)
src/ingester/parser/tree_info.rs (2)
  • height (67-73)
  • get (18-26)
src/ingester/parser/indexer_events.rs (1)
  • tree_pubkey (34-39)
src/monitor/queue_hash_cache.rs (1)
  • get_cached_hash_chains (69-102)
src/snapshot/gcs_utils/resumable_upload.rs (1)
  • end (302-302)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (2)
src/api/method/get_queue_elements.rs (2)

477-488: LGTM: Proper error handling for hash validation.

The code now properly propagates errors from Hash::new() instead of using unwrap(), with descriptive error messages including the index and leaf_index for debugging. This addresses the previous review concern.
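As a self-contained illustration of the pattern praised here (hypothetical Hash and error types, not the PR's exact code), the idiom replaces a panicking conversion with contextual error propagation:

```rust
// Illustration of the unwrap()-to-propagation idiom with hypothetical types.
#[derive(Debug)]
struct Hash([u8; 32]);

impl Hash {
    fn new(bytes: &[u8]) -> Result<Self, String> {
        let arr: [u8; 32] = bytes
            .try_into()
            .map_err(|_| format!("expected 32 bytes, got {}", bytes.len()))?;
        Ok(Hash(arr))
    }
}

fn convert_hashes(rows: &[(u64, Vec<u8>)]) -> Result<Vec<Hash>, String> {
    rows.iter()
        .enumerate()
        .map(|(index, (leaf_index, bytes))| {
            // Attach index and leaf_index so a bad row is easy to locate,
            // instead of panicking via unwrap().
            Hash::new(bytes).map_err(|e| {
                format!("invalid hash at index {index} (leaf_index={leaf_index}): {e}")
            })
        })
        .collect()
}
```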


792-813: LGTM: Excellent optimization and caching strategy.

The code demonstrates good engineering practices:

  1. Lines 792-813: The optimization to skip redundant hash computations using processed_leaf_indices is excellent for sparse trees where many addresses share the same low element. This can significantly reduce computation in common cases.

  2. Lines 869-945: The hash chain caching strategy with fallback to fresh computation is well-implemented, with clear logging to help diagnose cache misses. The code properly sorts cached entries and validates the cache has sufficient data before using it.

These patterns show thoughtful consideration of performance and observability.

Also applies to: 869-945
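A minimal sketch of the skip-redundant-work idea from point 1, assuming hypothetical names (the real processed_leaf_indices logic tracks low elements across address proofs):

```rust
use std::collections::HashSet;

// Sketch: many addresses in a sparse tree can share the same low element,
// so hash each low element's range node only once.
fn compute_range_hashes(low_element_leaf_indices: &[u64]) -> Vec<(u64, [u8; 32])> {
    let mut processed_leaf_indices: HashSet<u64> = HashSet::new();
    let mut hashes = Vec::new();
    for &leaf_index in low_element_leaf_indices {
        // Skip leaf indices whose range hash was already computed.
        if !processed_leaf_indices.insert(leaf_index) {
            continue;
        }
        hashes.push((leaf_index, expensive_range_hash(leaf_index)));
    }
    hashes
}

// Stand-in for the real Poseidon range-node hash.
fn expensive_range_hash(leaf_index: u64) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[..8].copy_from_slice(&leaf_index.to_be_bytes());
    out
}
```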




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/ingester/persist/spend.rs (1)

78-98: Add token_accounts handling to match non-batched spend behavior.

The addition of Spent = true at line 94 correctly aligns the batched updates with spend_input_accounts. However, the batched function omits token_accounts entirely—unlike the non-batched version which updates both accounts (lines 21–42) and token_accounts (lines 44–66) with the Spent flag. This inconsistency means token accounts in batched trees won't be marked as spent.

Add token_accounts updates to spend_input_accounts_batched to mirror the non-batched implementation:

  • Iterate through accounts and update token_accounts::Entity with Spent = true (and PrevSpent)
  • Use the same execute_account_update_query_and_update_balances helper for consistency

Also update the function comment (line 70) from:

/// Update the nullifier queue index and nullifier of the input accounts in batched trees.

to:

/// Mark accounts as spent and update the nullifier queue index and nullifier of the input accounts in batched trees.
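Separately, a rough sketch of what the suggested token_accounts half of the fix could look like with sea-orm's bulk update API. The entity fields below are illustrative stand-ins inferred from the comment above, and the real fix should reuse the existing execute_account_update_query_and_update_balances helper as noted:

```rust
// Hypothetical sketch only: mirrors the non-batched Spent/PrevSpent update
// for token_accounts using sea-orm's update_many.
use sea_orm::sea_query::Expr;
use sea_orm::{ColumnTrait, ConnectionTrait, DbErr, EntityTrait, QueryFilter};

mod token_accounts {
    use sea_orm::entity::prelude::*;

    // Stand-in entity; the real schema lives in the Photon codebase.
    #[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
    #[sea_orm(table_name = "token_accounts")]
    pub struct Model {
        #[sea_orm(primary_key, auto_increment = false)]
        pub hash: Vec<u8>,
        pub spent: bool,
        pub prev_spent: Option<bool>,
    }

    #[derive(Copy, Clone, Debug, EnumIter, DeriveRelation)]
    pub enum Relation {}

    impl ActiveModelBehavior for ActiveModel {}
}

/// Mark the given token accounts as spent, preserving the previous flag.
async fn mark_token_accounts_spent<C: ConnectionTrait>(
    db: &C,
    account_hashes: Vec<Vec<u8>>,
) -> Result<(), DbErr> {
    token_accounts::Entity::update_many()
        // SQL evaluates the right-hand side against the old row, so this
        // copies the pre-update Spent value into PrevSpent.
        .col_expr(
            token_accounts::Column::PrevSpent,
            Expr::col(token_accounts::Column::Spent).into(),
        )
        .col_expr(token_accounts::Column::Spent, Expr::value(true))
        .filter(token_accounts::Column::Hash.is_in(account_hashes))
        .exec(db)
        .await?;
    Ok(())
}
```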
src/api/api.rs (1)

390-574: Missing getQueueInfo entry in OpenAPI specifications.

The method_api_specs() function includes getQueueElements but does not include getQueueInfo. This means the new endpoint won't appear in the generated OpenAPI documentation.

If this is intentional (e.g., the endpoint is internal), please disregard. Otherwise, add the spec entry:

OpenApiSpec {
    name: "getQueueInfo".to_string(),
    request: Some(GetQueueInfoRequest::schema().1),
    response: GetQueueInfoResponse::schema().1,
},
🧹 Nitpick comments (9)
src/ingester/parser/state_update.rs (1)

188-191: Confirm widened account_transactions retention and add a targeted test

Seeding kept_account_hashes with self.in_accounts means account_transactions are now kept even when they only reference input/spent accounts that lack any associated tree metadata. This aligns with the comment about input accounts “not having tree info but should be kept,” but it does subtly widen what survives the known-tree filter.

To lock this behavior in and avoid regressions, consider adding a small test case where:

  • StateUpdate::in_accounts contains a hash with no corresponding out_accounts or tree info,
  • account_transactions contains a transaction for that hash,
  • filter_by_known_trees is called with an empty/irrelevant tree set,

and assert that the transaction is still present in result.state_update.account_transactions.

src/ingester/parser/tx_event_parser_v2.rs (1)

18-40: Pubkey conversion helper and LightPubkey mapping look correct; only minor style tweaks possible

The to_light_pubkey helper cleanly centralizes conversion and its use when building light_program_ids / light_accounts is logically sound; no correctness or ownership issues stand out. If you want to shave a bit of boilerplate, you could lean on function pointers when mapping:

-    let light_program_ids: Vec<LightPubkey> =
-        program_ids.iter().map(|p| to_light_pubkey(p)).collect();
+    let light_program_ids: Vec<LightPubkey> =
+        program_ids.iter().map(to_light_pubkey).collect();

-    let light_accounts: Vec<Vec<LightPubkey>> = accounts
-        .into_iter()
-        .map(|acc_vec| {
-            acc_vec
-                .into_iter()
-                .map(|acc| to_light_pubkey(&acc))
-                .collect()
-        })
-        .collect();
+    let light_accounts: Vec<Vec<LightPubkey>> = accounts
+        .into_iter()
+        .map(|acc_vec| acc_vec.iter().map(to_light_pubkey).collect())
+        .collect();

Purely optional, current implementation is fine as-is.

src/ingester/persist/indexed_merkle_tree/helpers.rs (3)

93-93: Consider propagating error instead of unwrap on Poseidon initialization.

Poseidon::new_circom(2).unwrap() could theoretically panic if the Poseidon initialization fails. While this is unlikely for a hardcoded arity of 2, consider using ? with error conversion for defensive robustness, matching the error handling pattern used for subsequent operations.

-    let mut poseidon = Poseidon::<Fr>::new_circom(2).unwrap();
+    let mut poseidon = Poseidon::<Fr>::new_circom(2)
+        .map_err(|e| IngesterError::ParserError(format!("Failed to init Poseidon: {}", e)))?;

Note: The same pattern exists in compute_range_node_hash_v1 at line 74.


117-130: Redundant import inside function scope.

The import use light_hasher::bigint::bigint_to_be_bytes_array; on line 118 is redundant since it's already imported at the module level on line 8.

 pub fn get_zeroeth_exclusion_range_v1(tree: Vec<u8>) -> indexed_trees::Model {
-    use light_hasher::bigint::bigint_to_be_bytes_array;
-
     indexed_trees::Model {

132-145: Redundant import inside function scope.

The import use light_hasher::bigint::bigint_to_be_bytes_array; on line 133 is also redundant since it's already imported at the module level on line 8.

 pub fn get_top_element(tree: Vec<u8>) -> indexed_trees::Model {
-    use light_hasher::bigint::bigint_to_be_bytes_array;
-
     indexed_trees::Model {
src/api/method/get_queue_info.rs (1)

142-145: Silent fallback for missing queue mapping may hide data inconsistency.

When tree_to_queue.get(&tree_bytes) returns None, the code falls back to vec![0u8; 32], which produces a base58 string of all zeros. This could hide a data integrity issue where tree metadata lacks a queue pubkey.

Consider logging a warning or returning an error if a tree is expected to have a queue mapping but doesn't.

             let queue_bytes = tree_to_queue
                 .get(&tree_bytes)
                 .cloned()
-                .unwrap_or_else(|| vec![0u8; 32]);
+                .unwrap_or_else(|| {
+                    tracing::warn!("Missing queue pubkey for tree: {}", bs58::encode(&tree_bytes).into_string());
+                    vec![0u8; 32]
+                });
src/api/method/get_queue_elements.rs (3)

54-74: Consider consistent serialization behavior for optional fields.

The skip_serializing_if attribute is only applied to *_zkp_batch_size fields but not to *_start_index or *_limit fields. This inconsistency could lead to unexpected JSON serialization behavior where some null values are included and others are omitted.

For consistency, either apply skip_serializing_if to all optional fields or remove it entirely:

 pub struct GetQueueElementsRequest {
     pub tree: Hash,
 
+    #[serde(skip_serializing_if = "Option::is_none")]
     pub output_queue_start_index: Option<u64>,
+    #[serde(skip_serializing_if = "Option::is_none")]
     pub output_queue_limit: Option<u16>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub output_queue_zkp_batch_size: Option<u16>,
 
+    #[serde(skip_serializing_if = "Option::is_none")]
     pub input_queue_start_index: Option<u64>,
+    #[serde(skip_serializing_if = "Option::is_none")]
     pub input_queue_limit: Option<u16>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub input_queue_zkp_batch_size: Option<u16>,
     // ... same for address_queue fields
 }

569-569: Redundant import statement.

create_hash_chain_from_slice is already imported at line 16 (use light_compressed_account::hash_chain::create_hash_chain_from_slice;), making this inner use statement unnecessary.

 fn compute_state_queue_hash_chains(
     queue_elements: &[QueueElement],
     queue_type: QueueType,
     zkp_batch_size: usize,
 ) -> Result<Vec<Hash>, PhotonApiError> {
-    use light_compressed_account::hash_chain::create_hash_chain_from_slice;
-
     if zkp_batch_size == 0 || queue_elements.is_empty() {
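As background for the function being discussed: compute_state_queue_hash_chains chunks the queue elements by zkp_batch_size and folds each full batch into a single hash chain. A rough self-contained sketch of that shape, assuming light_poseidon (the real code delegates to light_compressed_account's create_hash_chain_from_slice, whose exact semantics are not reproduced here):

```rust
use ark_bn254::Fr;
use light_poseidon::{Poseidon, PoseidonBytesHasher, PoseidonError};

// Illustrative fold: chain = H(chain, next) over each element of a batch.
fn hash_chain(elements: &[[u8; 32]]) -> Result<[u8; 32], PoseidonError> {
    let mut poseidon = Poseidon::<Fr>::new_circom(2)?;
    let mut chain = elements[0];
    for next in &elements[1..] {
        chain = poseidon.hash_bytes_be(&[chain.as_slice(), next.as_slice()])?;
    }
    Ok(chain)
}

fn hash_chains_per_batch(
    elements: &[[u8; 32]],
    zkp_batch_size: usize,
) -> Result<Vec<[u8; 32]>, PoseidonError> {
    assert!(zkp_batch_size > 0, "guarded by the zkp_batch_size == 0 check");
    // chunks_exact drops any trailing partial batch, matching the
    // full-batches-only behavior discussed elsewhere in this review.
    elements
        .chunks_exact(zkp_batch_size)
        .map(hash_chain)
        .collect()
}
```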

906-914: Silent failure when storing hash chains to cache.

Using let _ = to discard the result of store_hash_chains_batch means cache storage failures are silently ignored. While this may be intentional (cache is optional), it could hide persistent issues and make debugging difficult.

Consider logging cache storage failures:

             if !chains_to_cache.is_empty() {
-                let _ = queue_hash_cache::store_hash_chains_batch(
+                if let Err(e) = queue_hash_cache::store_hash_chains_batch(
                     tx,
                     tree_pubkey,
                     QueueType::AddressV2,
                     batch_start_index as u64,
                     chains_to_cache,
                 )
-                .await;
+                .await
+                {
+                    log::warn!(
+                        "Failed to cache address queue hash chains for tree {}: {}",
+                        tree_pubkey,
+                        e
+                    );
+                }
             }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de9080f and 6fc9d0a.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (31)
  • .gitignore (1 hunks)
  • Cargo.toml (0 hunks)
  • src/api/api.rs (2 hunks)
  • src/api/method/get_batch_address_update_info.rs (0 hunks)
  • src/api/method/get_multiple_new_address_proofs.rs (1 hunks)
  • src/api/method/get_queue_elements.rs (2 hunks)
  • src/api/method/get_queue_info.rs (1 hunks)
  • src/api/method/mod.rs (1 hunks)
  • src/api/rpc_server.rs (1 hunks)
  • src/ingester/fetchers/grpc.rs (1 hunks)
  • src/ingester/parser/indexer_events.rs (1 hunks)
  • src/ingester/parser/state_update.rs (2 hunks)
  • src/ingester/parser/tx_event_parser_v2.rs (3 hunks)
  • src/ingester/persist/indexed_merkle_tree/helpers.rs (4 hunks)
  • src/ingester/persist/indexed_merkle_tree/mod.rs (1 hunks)
  • src/ingester/persist/indexed_merkle_tree/proof.rs (0 hunks)
  • src/ingester/persist/leaf_node.rs (2 hunks)
  • src/ingester/persist/leaf_node_proof.rs (2 hunks)
  • src/ingester/persist/mod.rs (1 hunks)
  • src/ingester/persist/persisted_indexed_merkle_tree.rs (3 hunks)
  • src/ingester/persist/persisted_state_tree.rs (1 hunks)
  • src/ingester/persist/spend.rs (2 hunks)
  • src/main.rs (2 hunks)
  • src/monitor/mod.rs (1 hunks)
  • src/monitor/queue_hash_cache.rs (3 hunks)
  • src/monitor/queue_monitor.rs (2 hunks)
  • src/openapi/mod.rs (2 hunks)
  • src/openapi/specs/api.yaml (2 hunks)
  • tests/integration_tests/batched_address_tree_tests.rs (3 hunks)
  • tests/integration_tests/batched_state_tree_tests.rs (5 hunks)
  • tests/integration_tests/utils.rs (2 hunks)
💤 Files with no reviewable changes (3)
  • src/api/method/get_batch_address_update_info.rs
  • src/ingester/persist/indexed_merkle_tree/proof.rs
  • Cargo.toml
🧰 Additional context used
🧬 Code graph analysis (11)
src/api/method/mod.rs (2)
src/api/api.rs (1)
  • get_queue_info (276-281)
src/api/method/get_queue_info.rs (1)
  • get_queue_info (101-159)
src/ingester/persist/mod.rs (2)
src/ingester/parser/indexer_events.rs (1)
  • tree_pubkey (34-39)
src/ingester/parser/state_update.rs (1)
  • tree_pubkey (366-366)
src/ingester/persist/leaf_node.rs (1)
tests/integration_tests/mock_tests.rs (1)
  • map (1507-1514)
src/monitor/queue_hash_cache.rs (2)
src/ingester/parser/indexer_events.rs (1)
  • tree_pubkey (34-39)
src/ingester/parser/state_update.rs (1)
  • tree_pubkey (366-366)
src/openapi/mod.rs (2)
src/api/method/get_queue_elements.rs (1)
  • get_queue_elements (161-260)
src/api/api.rs (1)
  • get_queue_elements (269-274)
src/ingester/persist/indexed_merkle_tree/mod.rs (1)
src/ingester/persist/indexed_merkle_tree/helpers.rs (3)
  • compute_range_node_hash_v1 (73-86)
  • compute_range_node_hash_v2 (92-102)
  • get_top_element (132-145)
src/ingester/persist/persisted_indexed_merkle_tree.rs (1)
src/ingester/persist/indexed_merkle_tree/helpers.rs (3)
  • compute_hash_with_cache (52-69)
  • compute_range_node_hash_v1 (73-86)
  • compute_range_node_hash_v2 (92-102)
tests/integration_tests/batched_state_tree_tests.rs (1)
tests/integration_tests/utils.rs (1)
  • setup (230-239)
src/api/method/get_queue_info.rs (2)
src/api/api.rs (2)
  • new (107-117)
  • get_queue_info (276-281)
src/api/method/utils.rs (1)
  • bs58 (567-567)
tests/integration_tests/batched_address_tree_tests.rs (1)
src/api/method/get_queue_elements.rs (1)
  • get_queue_elements (161-260)
src/api/api.rs (1)
src/api/method/get_queue_info.rs (1)
  • get_queue_info (101-159)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (45)
src/ingester/persist/leaf_node_proof.rs (1)

32-37: LGTM! Useful observability addition.

The debug logging provides helpful context for troubleshooting proof fetching operations, capturing the number of indices, tree identifier, and current sequence number at an appropriate log level.

src/ingester/persist/persisted_state_tree.rs (1)

188-190: Guard against out-of-bounds slice on EMPTY_SUBTREES and confirm new length contract

This change looks semantically aligned with returning only tree_height subtrees, but it introduces a new runtime risk and an API shape change:

  1. Potential panic:
    EMPTY_SUBTREES[..tree_height] will panic if tree_height > EMPTY_SUBTREES.len() (40). Previously the empty-result path always returned the full EMPTY_SUBTREES and could not panic regardless of tree_height. Unless all call sites strictly guarantee tree_height <= 40, this is a new critical failure mode.

    Consider adding an explicit bound check (and failing fast with a structured error) or clamping:

```diff
     if results.is_empty() {
-        return Ok(EMPTY_SUBTREES[..tree_height].to_vec());
+        if tree_height > EMPTY_SUBTREES.len() {
+            return Err(PhotonApiError::UnexpectedError(format!(
+                "Requested tree_height {} exceeds EMPTY_SUBTREES capacity {}",
+                tree_height,
+                EMPTY_SUBTREES.len()
+            )));
+        }
+        return Ok(EMPTY_SUBTREES[..tree_height].to_vec());
     }
```
    
    Alternatively, if you know `tree_height` is always `<= MAX_HEIGHT`, a `debug_assert!(tree_height <= EMPTY_SUBTREES.len());` here (and/or at the call sites) would at least encode that invariant.
    
    
  2. Changed return length for empty results:
    Callers that previously assumed a fixed length 40 vector from get_subtrees on empty DB state will now receive a shorter vector sized to tree_height. That matches the new per-tree, height-bounded semantics described in the PR, but it's worth double-checking all consumers to ensure they no longer hard-code EMPTY_SUBTREES.len() and instead rely on tree_height / returned length.
src/monitor/queue_hash_cache.rs (2)

3-3: LGTM: Import change supports generic refactoring.

The import of ConnectionTrait correctly supports the generic parameter changes in the function signatures below.


14-23: No API compatibility issues identified — all call sites are already compatible with the new generic signatures.

The refactoring from db: &DatabaseConnection to db: &C where C: ConnectionTrait is backward compatible and an improvement, not a breaking change. Both DatabaseConnection and DatabaseTransaction implement ConnectionTrait. All call sites in the codebase already pass compatible types:

  • queue_monitor.rs passes &DatabaseConnection
  • get_queue_elements.rs passes &DatabaseTransaction

This generic approach enables broader flexibility (transactions, connection pools, testing scenarios) while maintaining compatibility with existing code.
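A minimal sketch of why both call sites compile against one helper, assuming a hypothetical table name (this is not the cache's real query):

```rust
use sea_orm::{ConnectionTrait, DbBackend, DbErr, Statement};

// Works with &DatabaseConnection and &DatabaseTransaction alike, since both
// implement ConnectionTrait.
async fn count_cached_chains<C: ConnectionTrait>(db: &C) -> Result<u64, DbErr> {
    let row = db
        .query_one(Statement::from_string(
            DbBackend::Postgres,
            // Table name is illustrative only.
            "SELECT COUNT(*) AS n FROM queue_hash_cache".to_owned(),
        ))
        .await?;
    match row {
        Some(row) => row.try_get::<i64>("", "n").map(|n| n as u64),
        None => Ok(0),
    }
}
```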

src/monitor/mod.rs (1)

1-1: LGTM: Module visibility change exposes queue hash cache functionality.

Making queue_hash_cache public aligns with the broader PR objective to expose queue/hash-cache functionality to higher-level components and enables the generic DB interface usage pattern.

src/monitor/queue_monitor.rs (1)

8-8: LGTM: Appropriate log-level adjustment reduces noise.

The change from warn! to debug! for incomplete batch messages is correct—incomplete batches are expected during normal operation. The added comments provide valuable context, and removing the unused warn import keeps the code clean.

Also applies to: 353-362

src/ingester/parser/indexer_events.rs (1)

5-5: Import reordering is fine

The reordered light_event::event import keeps semantics unchanged and matches the surrounding import style. No issues here.

src/ingester/parser/state_update.rs (1)

10-10: Import placement looks consistent

Bringing BatchNullifyContext and NewAddress in here matches their usage in StateUpdate and keeps imports grouped by domain. No behavior concerns.

src/ingester/persist/leaf_node.rs (1)

208-212: LGTM! Good observability practice.

The success logging provides useful feedback for debugging and monitoring persistence operations.

src/ingester/parser/tx_event_parser_v2.rs (1)

45-88: pubkey_array passthrough is consistent with other forwarded fields

Forwarding public_transaction_event.event.pubkey_array directly into PublicTransactionEvent matches the pattern used for other scalar/event fields and keeps the ingester’s view aligned with the light-event type; no additional transformation or cloning seems necessary here.

.gitignore (1)

16-17: LGTM!

Adding proto/**/*.bin to ignore generated proto binary files is a standard practice.

src/ingester/fetchers/grpc.rs (1)

303-306: LGTM!

The multi-line formatting improves readability without changing behavior.

src/ingester/persist/mod.rs (1)

182-186: LGTM!

The enhanced error message provides clearer guidance by including the tree pubkey and explaining that tree metadata must be synced before indexing.

tests/integration_tests/utils.rs (2)

34-34: LGTM!

The import alias properly avoids naming conflicts.


241-256: Clarify interaction with PHOTON_SKIP_ISOLATION_LEVEL flag.

The code sets the session isolation level to READ COMMITTED, but line 182 in the same file sets PHOTON_SKIP_ISOLATION_LEVEL=true to skip isolation level settings in tests. Please verify whether this isolation level configuration is intended to take effect or if it's overridden by the environment variable.

src/api/method/mod.rs (1)

24-24: LGTM!

The module replacement aligns with the PR's objective to migrate from get_batch_address_update_info to get_queue_info.

src/ingester/persist/persisted_indexed_merkle_tree.rs (3)

7-9: LGTM!

The import updates correctly bring in version-specific hash computation functions to support v1 and v2 tree types.


50-59: LGTM!

The switch to compute_range_node_hash_v2 for non-V1 tree types is consistent with the version-specific hash computation strategy (V1 trees use v1 hash at line 42, while V2+ trees use v2 hash here).


495-511: LGTM!

The test correctly uses compute_range_node_hash_v2 to match the updated production code for v2 tree types.

src/openapi/mod.rs (2)

23-25: LGTM!

The import statement correctly includes the new queue data types required for the updated API surface.


86-89: LGTM!

The OpenAPI component schemas correctly include the new queue data types, aligning with the API migration to the queue-based approach.

src/ingester/persist/indexed_merkle_tree/mod.rs (1)

8-12: LGTM - Public API updated to expose versioned hash functions.

The re-exports correctly expose both compute_range_node_hash_v1 and compute_range_node_hash_v2 to support both AddressV1 (3-field hash) and AddressV2 (2-field hash) indexed Merkle tree types.

src/api/rpc_server.rs (1)

197-201: LGTM - New RPC method follows established pattern.

The getQueueInfo method registration is consistent with other RPC methods in the file, properly parsing the payload and delegating to the API layer with appropriate error conversion.

tests/integration_tests/batched_address_tree_tests.rs (4)

9-9: LGTM - Import added for new request type.


166-177: LGTM - Test properly exercises new per-queue API.

The request correctly sets only address_queue_limit while leaving other queue-type parameters as None, which aligns with the API design that allows requesting specific queue types independently.


181-204: LGTM - Assertions updated for new response structure.

The test correctly extracts address_queue from the response and accesses addresses via element.0.to_bytes(), matching the new AddressQueueData structure.


225-247: LGTM - Post-update verification uses correct API pattern.

The verification correctly handles the optional address_queue with map().unwrap_or_default() for safe extraction.

src/ingester/persist/indexed_merkle_tree/helpers.rs (3)

14-32: LGTM - Tree type dispatch correctly routes to versioned hash functions.

The dispatch logic properly maps AddressV1 to compute_range_node_hash_v1 and AddressV2 to compute_range_node_hash_v2, with clear inline comments explaining the difference in hash construction.


71-86: Documentation clarifies v1 hash construction.

Good addition of the doc comment explaining that v1 uses the 3-field Poseidon hash including next_index.


88-102: Well-documented v2 hash function with commit reference.

The documentation correctly explains that next_index is stored but not included in the hash for v2, with a helpful commit reference for traceability.
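To make the v1/v2 split concrete, a hedged sketch assuming light_poseidon and the field order described in these comments (the crate's actual byte handling and field order should be verified against helpers.rs):

```rust
// Sketch of the v1 (3-field) vs v2 (2-field) range-node hashes described
// above; field order and byte widths are assumptions for illustration.
use ark_bn254::Fr;
use light_poseidon::{Poseidon, PoseidonBytesHasher, PoseidonError};

fn range_node_hash_v1(
    value: &[u8; 32],
    next_index: u64,
    next_value: &[u8; 32],
) -> Result<[u8; 32], PoseidonError> {
    // v1 includes next_index as a hash input (big-endian, padded to 32 bytes).
    let mut poseidon = Poseidon::<Fr>::new_circom(3)?;
    let mut next_index_be = [0u8; 32];
    next_index_be[24..].copy_from_slice(&next_index.to_be_bytes());
    poseidon.hash_bytes_be(&[
        value.as_slice(),
        next_index_be.as_slice(),
        next_value.as_slice(),
    ])
}

fn range_node_hash_v2(
    value: &[u8; 32],
    next_value: &[u8; 32],
) -> Result<[u8; 32], PoseidonError> {
    // v2 stores next_index but leaves it out of the hash.
    let mut poseidon = Poseidon::<Fr>::new_circom(2)?;
    poseidon.hash_bytes_be(&[value.as_slice(), next_value.as_slice()])
}
```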

src/api/method/get_queue_info.rs (3)

12-17: LGTM - Request structure with optional tree filter.

The deny_unknown_fields attribute provides good API contract enforcement. The optional trees parameter allows both filtered and unfiltered queries.


35-99: LGTM - Queue size fetching handles multiple tree types correctly.

The function correctly:

  • Filters by StateV2 and AddressV2 tree types
  • For StateV2: counts both input queue (nullifier) and output queue entries
  • For AddressV2: counts address queue entries
  • Uses composite key (tree_pubkey, queue_type) to handle multiple queues per tree

101-158: LGTM - Main function implementation is well-structured.

The function properly:

  • Validates and parses input pubkeys
  • Fetches queue sizes with optional filtering
  • Joins with tree metadata to get queue pubkeys
  • Returns the context slot for consistency
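For integrators, a hypothetical request/response shape assembled from the fields discussed above; the exact field names must be checked against GetQueueInfoRequest/Response:

```rust
use serde_json::json;

fn example_get_queue_info_payloads() {
    // Request: `trees` is optional; omit it to list all known queues.
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "getQueueInfo",
        "params": { "trees": ["<tree pubkey, base58>"] }
    });

    // Hypothetical response shape inferred from the review comments above:
    // one entry per (tree, queue type) with the queue pubkey and its size.
    let response = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "result": {
            "context": { "slot": 123456789u64 },
            "queues": [
                {
                    "tree": "<tree pubkey>",
                    "queue": "<queue pubkey>",
                    "queueType": "outputStateV2",
                    "size": 42
                }
            ]
        }
    });
    let _ = (request, response);
}
```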
src/api/api.rs (2)

80-82: LGTM - Import added for new queue info types.


276-281: LGTM - API method follows established pattern.

The get_queue_info method correctly delegates to the module function with proper connection reference passing.

tests/integration_tests/batched_state_tree_tests.rs (4)

242-267: LGTM - Separate queue fetches with appropriate per-queue limits.

The test correctly makes separate API calls for output queue (limit=100) and input queue (limit=100), matching the new per-queue pagination design.


314-363: LGTM - Nullify event handling updated for new response structure.

The code correctly:

  • Extracts queue lengths using the nested optional structure with map_or
  • Safely handles the case where pre_input_queue might be None
  • Uses pre_input_queue.leaves and pre_input_queue.leaf_indices for batch processing

367-418: LGTM - Append event handling correctly uses new queue structure.

The code properly extracts pre_output_queue and accesses leaves and leaf_indices for the batch append verification.


478-511: LGTM - Final assertions verify empty queues after processing.

The assertions correctly check that both output and input queues are empty after all batch events are processed, using the new nested response structure.

src/api/method/get_queue_elements.rs (6)

30-42: LGTM!

The node index encoding is well-documented and correctly implemented. The use of debug_assert is appropriate for internal validation during development.
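For readers unfamiliar with flat node indexing, one common heap-style numbering is sketched below. This is an illustration of the general idea only, not necessarily the exact encode_node_index/leaf_index_to_node_index convention used in this PR:

```rust
// A common flat binary-tree numbering, shown for illustration only: root = 1,
// children of node i are 2*i and 2*i + 1, leaves occupy [2^h, 2^(h+1)).
fn leaf_index_to_node_index(leaf_index: u64, tree_height: u32) -> u64 {
    (1u64 << tree_height) + leaf_index
}

fn parent(node_index: u64) -> u64 {
    node_index / 2
}

fn sibling(node_index: u64) -> u64 {
    // Flip the lowest bit: left child is even, right child is odd.
    node_index ^ 1
}

fn main() {
    let height = 3;
    let leaf = leaf_index_to_node_index(5, height); // 8 + 5 = 13
    assert_eq!(parent(leaf), 6);
    assert_eq!(sibling(leaf), 12);
    println!("leaf {leaf}, parent {}, sibling {}", parent(leaf), sibling(leaf));
}
```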


161-260: LGTM!

The main function is well-structured with:

  • Proper validation that at least one queue is requested
  • Correct transaction handling with isolation
  • Clean separation of output, input, and address queue fetching
  • Proper merging of state queue proofs

262-310: LGTM!

The merge logic correctly collects proofs from both queues and deduplicates nodes. Since both queues are fetched from the same tree in the same transaction, the roots should be consistent.


1000-1005: LGTM!

Clean helper function with proper error handling.


88-150: LGTM!

The data structures are well-designed with:

  • Appropriate use of Default derive for structures that may be empty
  • Consistent serde attributes (deny_unknown_fields, rename_all = "camelCase")
  • Conditional serialization with skip_serializing_if for optional/empty fields

955-958: is_multiple_of is now stable in Rust 1.87.0 and later.

The is_multiple_of method was stabilized in Rust 1.87.0 (May 15, 2025) and no longer requires nightly Rust. The code is compatible with stable Rust, so no changes are needed.

Comment on lines +3743 to +3764

```diff
 tree:
   $ref: '#/components/schemas/Hash'
 outputQueueStartIndex:
   type: integer
-  format: uint16
+  format: uint64
   nullable: true
   minimum: 0
-queueType:
+outputQueueLimit:
   type: integer
-  format: uint8
+  format: uint16
   nullable: true
   minimum: 0
-startOffset:
+inputQueueStartIndex:
   type: integer
   format: uint64
   nullable: true
   minimum: 0
-tree:
-  $ref: '#/components/schemas/Hash'
 inputQueueLimit:
   type: integer
   format: uint16
   nullable: true
   minimum: 0
```


⚠️ Potential issue | 🟠 Major

Request schema is missing several parameters from Rust implementation.

The Rust GetQueueElementsRequest includes additional parameters that are not documented in the OpenAPI spec:

  • address_queue_start_index, address_queue_limit, address_queue_zkp_batch_size
  • output_queue_zkp_batch_size, input_queue_zkp_batch_size

Add the missing parameters to align with the Rust implementation:

                    properties:
                      tree:
                        $ref: '#/components/schemas/Hash'
                      outputQueueStartIndex:
                        type: integer
                        format: uint64
                        nullable: true
                        minimum: 0
                      outputQueueLimit:
                        type: integer
                        format: uint16
                        nullable: true
                        minimum: 0
+                      outputQueueZkpBatchSize:
+                        type: integer
+                        format: uint16
+                        nullable: true
+                        minimum: 0
                      inputQueueStartIndex:
                        type: integer
                        format: uint64
                        nullable: true
                        minimum: 0
                      inputQueueLimit:
                        type: integer
                        format: uint16
                        nullable: true
                        minimum: 0
+                      inputQueueZkpBatchSize:
+                        type: integer
+                        format: uint16
+                        nullable: true
+                        minimum: 0
+                      addressQueueStartIndex:
+                        type: integer
+                        format: uint64
+                        nullable: true
+                        minimum: 0
+                      addressQueueLimit:
+                        type: integer
+                        format: uint16
+                        nullable: true
+                        minimum: 0
+                      addressQueueZkpBatchSize:
+                        type: integer
+                        format: uint16
+                        nullable: true
+                        minimum: 0
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (replaces the hunk quoted above):

```yaml
tree:
  $ref: '#/components/schemas/Hash'
outputQueueStartIndex:
  type: integer
  format: uint64
  nullable: true
  minimum: 0
outputQueueLimit:
  type: integer
  format: uint16
  nullable: true
  minimum: 0
outputQueueZkpBatchSize:
  type: integer
  format: uint16
  nullable: true
  minimum: 0
inputQueueStartIndex:
  type: integer
  format: uint64
  nullable: true
  minimum: 0
inputQueueLimit:
  type: integer
  format: uint16
  nullable: true
  minimum: 0
inputQueueZkpBatchSize:
  type: integer
  format: uint16
  nullable: true
  minimum: 0
addressQueueStartIndex:
  type: integer
  format: uint64
  nullable: true
  minimum: 0
addressQueueLimit:
  type: integer
  format: uint16
  nullable: true
  minimum: 0
addressQueueZkpBatchSize:
  type: integer
  format: uint16
  nullable: true
  minimum: 0
```
🤖 Prompt for AI Agents
In src/openapi/specs/api.yaml around lines 3743 to 3764 the
GetQueueElementsRequest schema is missing parameters present in the Rust
implementation; add the following nullable integer fields to the schema with
appropriate format and minimum values: address_queue_start_index (uint64,
minimum 0), address_queue_limit (uint16, minimum 0),
address_queue_zkp_batch_size (uint32 or suitable integer format, minimum 0),
output_queue_zkp_batch_size (uint32, minimum 0), and input_queue_zkp_batch_size
(uint32, minimum 0); ensure names match the Rust fields, mark them nullable if
the Rust type is optional, and place them alongside the existing
startIndex/limit fields so the OpenAPI spec aligns with the Rust
GetQueueElementsRequest.

Comment on lines +3802 to +3821

```diff
 outputQueueElements:
   type: array
   nullable: true
   items:
     $ref: '#/components/schemas/GetQueueElementsResponseValue'
 outputQueueIndex:
   type: integer
   format: uint64
   nullable: true
   minimum: 0
-value:
+inputQueueElements:
   type: array
   nullable: true
   items:
     $ref: '#/components/schemas/GetQueueElementsResponseValue'
 inputQueueIndex:
   type: integer
   format: uint64
   nullable: true
   minimum: 0
```


⚠️ Potential issue | 🟠 Major

Response schema doesn't match Rust implementation.

The OpenAPI response schema defines outputQueueElements, outputQueueIndex, inputQueueElements, and inputQueueIndex as top-level fields, but the Rust GetQueueElementsResponse structure uses state_queue: Option<StateQueueData> and address_queue: Option<AddressQueueData> instead. This will cause API documentation to be inconsistent with actual API behavior.

The response schema should reflect the actual Rust structure:

                  result:
                    type: object
                    required:
                    - context
                    properties:
                      context:
                        $ref: '#/components/schemas/Context'
-                      outputQueueElements:
-                        type: array
-                        nullable: true
-                        items:
-                          $ref: '#/components/schemas/GetQueueElementsResponseValue'
-                      outputQueueIndex:
-                        type: integer
-                        format: uint64
-                        nullable: true
-                        minimum: 0
-                      inputQueueElements:
-                        type: array
-                        nullable: true
-                        items:
-                          $ref: '#/components/schemas/GetQueueElementsResponseValue'
-                      inputQueueIndex:
-                        type: integer
-                        format: uint64
-                        nullable: true
-                        minimum: 0
+                      stateQueue:
+                        $ref: '#/components/schemas/StateQueueData'
+                        nullable: true
+                      addressQueue:
+                        $ref: '#/components/schemas/AddressQueueData'
+                        nullable: true

Additionally, you'll need to add StateQueueData, OutputQueueData, InputQueueData, and AddressQueueData schemas to the components section.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/openapi/specs/api.yaml around lines 3802 to 3821, the response schema
currently exposes outputQueueElements/outputQueueIndex and
inputQueueElements/inputQueueIndex as top-level fields but the Rust type returns
state_queue: Option<StateQueueData> and address_queue: Option<AddressQueueData>;
update the response object to match the Rust structure by replacing those four
top-level fields with two optional objects: state_queue and address_queue, where
state_queue contains output: OutputQueueData and input: InputQueueData (or
similarly named properties matching Rust field names) and address_queue contains
the address-specific queue data; also add component schema definitions for
StateQueueData, OutputQueueData, InputQueueData, and AddressQueueData under
components/schemas with fields/types that mirror the Rust structs (arrays of
GetQueueElementsResponseValue, uint64 indexes nullable with minimum 0, etc.).

…renamed get_queue_elements_v2 to get_queue_elements.

- Deleted the `get_queue_elements_v2.rs` file, which contained the implementation for the `get_queue_elements_v2` API method.
- Removed the module declaration for `get_queue_elements_v2` in `mod.rs`.
- Eliminated the registration of the `getQueueElementsV2` RPC method in `rpc_server.rs`.
- Cleaned up unused imports and references to `get_queue_elements_v2` in the OpenAPI module.
@SwenSchaeferjohann SwenSchaeferjohann force-pushed the sergey/get_queue_elements_v2_rpc branch from 6fc9d0a to 03d2c74 on December 5, 2025 at 08:23

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
src/openapi/specs/api.yaml (2)

3739-3765: GetQueueElementsRequest OpenAPI schema still missing fields present in Rust request.

The Rust GetQueueElementsRequest includes additional optional parameters that aren’t documented here:

  • output_queue_zkp_batch_size: Option<u16>
  • input_queue_zkp_batch_size: Option<u16>
  • address_queue_start_index: Option<u64>
  • address_queue_limit: Option<u16>
  • address_queue_zkp_batch_size: Option<u16>

To keep the spec aligned with the API, please add these properties (nullable integers with the appropriate formats and minimum: 0) next to the existing per-queue start/limit fields, e.g.:

                 properties:
                   tree:
                     $ref: '#/components/schemas/Hash'
                   outputQueueStartIndex:
                     type: integer
                     format: uint64
                     nullable: true
                     minimum: 0
                   outputQueueLimit:
                     type: integer
                     format: uint16
                     nullable: true
                     minimum: 0
+                  outputQueueZkpBatchSize:
+                    type: integer
+                    format: uint16
+                    nullable: true
+                    minimum: 0
                   inputQueueStartIndex:
                     type: integer
                     format: uint64
                     nullable: true
                     minimum: 0
                   inputQueueLimit:
                     type: integer
                     format: uint16
                     nullable: true
                     minimum: 0
+                  inputQueueZkpBatchSize:
+                    type: integer
+                    format: uint16
+                    nullable: true
+                    minimum: 0
+                  addressQueueStartIndex:
+                    type: integer
+                    format: uint64
+                    nullable: true
+                    minimum: 0
+                  addressQueueLimit:
+                    type: integer
+                    format: uint16
+                    nullable: true
+                    minimum: 0
+                  addressQueueZkpBatchSize:
+                    type: integer
+                    format: uint16
+                    nullable: true
+                    minimum: 0

3796-3821: GetQueueElementsResponse OpenAPI shape does not match Rust GetQueueElementsResponse.

The Rust response type now exposes:

  • context: Context
  • state_queue: Option<StateQueueData>
  • address_queue: Option<AddressQueueData>

but the OpenAPI response here still documents:

  • outputQueueElements / outputQueueIndex
  • inputQueueElements / inputQueueIndex
  • and references GetQueueElementsResponseValue in components.

This will confuse integrators and diverges from the new queue data model. Please update the response to match the Rust type and use the new component schemas you added (StateQueueData, OutputQueueData, InputQueueData, AddressQueueData), e.g.:

                   result:
                     type: object
                     required:
                       - context
                     properties:
                       context:
                         $ref: '#/components/schemas/Context'
-                      outputQueueElements:
-                        type: array
-                        nullable: true
-                        items:
-                          $ref: '#/components/schemas/GetQueueElementsResponseValue'
-                      outputQueueIndex:
-                        type: integer
-                        format: uint64
-                        nullable: true
-                        minimum: 0
-                      inputQueueElements:
-                        type: array
-                        nullable: true
-                        items:
-                          $ref: '#/components/schemas/GetQueueElementsResponseValue'
-                      inputQueueIndex:
-                        type: integer
-                        format: uint64
-                        nullable: true
-                        minimum: 0
+                      stateQueue:
+                        allOf:
+                          - $ref: '#/components/schemas/StateQueueData'
+                        nullable: true
+                      addressQueue:
+                        allOf:
+                          - $ref: '#/components/schemas/AddressQueueData'
+                        nullable: true

Once you do that, GetQueueElementsResponseValue should no longer be referenced; if it’s truly unused after the refactor, consider removing that schema from components to keep the spec clean.

Also applies to: 4761-4794

src/api/method/get_queue_elements.rs (1)

312-462: Avoid unwrap() on Hash::new and clarify ZKP batch truncation semantics.

Two points in fetch_queue:

  1. Potential panic on malformed hashes (repeat of earlier review):
let account_hashes: Vec<Hash> = queue_elements
    .iter()
    .map(|e| Hash::new(e.hash.as_slice()).unwrap())
    .collect();

If any row has a non-32-byte hash, this will panic. You already handle malformed tx_hash/nullifier with proper error propagation; doing the same here would make this path consistent and safer. For example:

-    let account_hashes: Vec<Hash> = queue_elements
-        .iter()
-        .map(|e| Hash::new(e.hash.as_slice()).unwrap())
-        .collect();
+    let account_hashes: Vec<Hash> = queue_elements
+        .iter()
+        .map(|e| {
+            Hash::new(e.hash.as_slice()).map_err(|err| {
+                PhotonApiError::UnexpectedError(format!(
+                    "Invalid account hash (leaf_index={}): {}",
+                    e.leaf_index, err
+                ))
+            })
+        })
+        .collect::<Result<Vec<_>, _>>()?;
  2. Behavior when there are fewer elements than a full ZKP batch:

When indices.len() < zkp_batch_size, full_batches == 0 and allowed == 0, so you immediately return an empty/default queue (no elements, no proofs). If that’s the intended contract (“only full batches are ever returned”), it’d be good to document it clearly; otherwise, you may want to either:

  • Return the partial batch without ZKP-related data, or
  • Tighten the API to make it explicit that callers must request limits that are multiples of the batch size.

Also, note that start_index: u64 is cast to i64 for DB filtering; if you ever expect queue indices approaching i64::MAX, some validation (or a checked cast) would avoid surprising wraparound.
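To illustrate the truncation semantics in point 2, a tiny self-contained example (names are stand-ins for the PR's internals):

```rust
// Illustrative arithmetic for the "only full batches" contract discussed
// above; names are stand-ins for the PR's internals.
fn allowed_elements(requested: usize, zkp_batch_size: usize) -> usize {
    if zkp_batch_size == 0 {
        return 0;
    }
    let full_batches = requested / zkp_batch_size;
    full_batches * zkp_batch_size
}

fn main() {
    // 7 queued elements with a batch size of 10: no full batch, nothing returned.
    assert_eq!(allowed_elements(7, 10), 0);
    // 25 elements with batch size 10: two full batches, 20 elements returned.
    assert_eq!(allowed_elements(25, 10), 20);
    println!("truncation examples hold");
}
```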

🧹 Nitpick comments (6)
src/ingester/persist/indexed_merkle_tree/helpers.rs (1)

8-8: Remove redundant local imports.

bigint_to_be_bytes_array is imported at module level (line 8), but is also imported locally in get_zeroeth_exclusion_range_v1 (line 118) and get_top_element (line 133). The local imports are redundant.

 pub fn get_zeroeth_exclusion_range_v1(tree: Vec<u8>) -> indexed_trees::Model {
-    use light_hasher::bigint::bigint_to_be_bytes_array;
-
     indexed_trees::Model {
 pub fn get_top_element(tree: Vec<u8>) -> indexed_trees::Model {
-    use light_hasher::bigint::bigint_to_be_bytes_array;
-
     indexed_trees::Model {

Also applies to: 118-118, 133-133

tests/integration_tests/utils.rs (1)

243-257: SET SESSION only affects one connection; consider configuring per-connection or adjust the comment

This pattern:

let pool = PgPoolOptions::new()
    .min_connections(1)
    .connect_with(options)
    .await
    .unwrap();

sqlx::query("SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED")
    .execute(&pool)
    .await
    .unwrap();

runs SET SESSION ... on a single connection checked out from the pool. If the pool ever opens additional connections (e.g., under concurrent load), those new connections will still use the server’s default isolation level, so the comment “for all connections in the pool” is not strictly true.

For tests where you likely stay within a single connection this is probably fine, but to make the behavior robust and future‑proof you could either:

  • Configure this per-connection using PgPoolOptions::after_connect, or
  • Keep the current code but relax the comment to avoid implying all present/future pool connections are guaranteed to be READ COMMITTED.

Conceptual example using after_connect:

 pub async fn setup_pg_pool(database_url: String) -> PgPool {
     let options: PgConnectOptions = database_url.parse().unwrap();
-    let pool = PgPoolOptions::new()
-        .min_connections(1)
-        .connect_with(options)
-        .await
-        .unwrap();
-
-    // Set default isolation level to READ COMMITTED for all connections in the pool
-    // This ensures each statement sees the latest committed data
-    sqlx::query("SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED")
-        .execute(&pool)
-        .await
-        .unwrap();
-
-    pool
+    PgPoolOptions::new()
+        .min_connections(1)
+        .after_connect(|conn, _meta| {
+            Box::pin(async move {
+                sqlx::query(
+                    "SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED",
+                )
+                .execute(conn)
+                .await?;
+                Ok(())
+            })
+        })
+        .connect_with(options)
+        .await
+        .unwrap()
 }

You may need to tweak the exact after_connect signature to match your sqlx version, but the idea is to ensure every connection created by the pool gets the same isolation‑level configuration.

src/ingester/parser/state_update.rs (1)

188-191: Including in_accounts in kept_account_hashes looks correct but deserves a regression test

Extending kept_account_hashes with self.in_accounts ensures account_transactions are now retained for spent/input accounts even when those accounts don’t have associated tree metadata in this update. That matches the comment and fixes the previous blind spot.

It would be valuable to add a focused test where:

  • an account hash appears only in in_accounts (no corresponding out_accounts), and
  • there’s an AccountTransaction for that hash,

and verify that filter_by_known_trees keeps the AccountTransaction after filtering.

src/ingester/parser/tx_event_parser_v2.rs (1)

87-87: Moving pubkey_array into PublicTransactionEvent is safe and slightly more efficient

pubkey_array: public_transaction_event.event.pubkey_array now moves the field instead of cloning/indirect access. Since public_transaction_event.event isn’t used after this and only other top‑level fields of public_transaction_event are read, this is borrow‑checker safe and avoids an extra allocation/copy.

src/api/method/get_queue_elements.rs (2)

235-251: Inconsistent handling of 0 ZKP batch size hints for address queues and cache semantics.

For state queues, you treat a 0 ZKP batch size hint as “use the default”:

let zkp_batch_size = zkp_batch_size_hint
    .filter(|v| *v > 0)
    .unwrap_or(DEFAULT_ZKP_BATCH_SIZE as u16) as usize;

For address queues, address_queue_zkp_batch_size.unwrap_or(DEFAULT_ADDRESS_ZKP_BATCH_SIZE as u16) propagates Some(0) as 0, and fetch_address_queue_v2 then fails with a validation error ("Address queue ZKP batch size must be greater than zero"). That asymmetry could surprise callers using a shared client config.

Consider either:

  • Treating 0 as “use default” for address queues as well, or
  • Validating address_queue_zkp_batch_size at the API boundary and rejecting 0 consistently for all queues.

Separately, in fetch_address_queue_v2 you’ll reuse any cached hash chains whenever cached.len() >= expected_batch_count, including the case where expected_batch_count == 0 but cached entries exist. If that situation is possible, you may want an explicit expected_batch_count > 0 guard before preferring cache, to avoid returning unrelated cached chains when the current request doesn’t actually form any full batches.

Also applies to: 640-917


937-998: Use % 2 == 0 for consistency with other code in this file; verify encoding convention aligns with documented levels.

In deduplicate_nodes_from_refs you use:

let sibling_pos = if pos.is_multiple_of(2) {
    pos + 1
} else {
    pos - 1
};

let parent_hash = if pos.is_multiple_of(2) {
    Poseidon::hashv(&[&current_hash.0, &sibling_hash.0])
} else {
    Poseidon::hashv(&[&sibling_hash.0, &current_hash.0])
};

While is_multiple_of is available on unsigned integers in recent stable Rust (stabilized in 1.87), elsewhere in this file (e.g., the address-queue proof loop) you use the more idiomatic pos % 2 == 0 pattern. For consistency, switch to:

-        for (level, sibling_hash) in proof_ctx.proof.iter().enumerate() {
-            let sibling_pos = if pos.is_multiple_of(2) {
+        for (level, sibling_hash) in proof_ctx.proof.iter().enumerate() {
+            let sibling_pos = if pos % 2 == 0 {
                 pos + 1
             } else {
                 pos - 1
             };
…
-            let parent_hash = if pos.is_multiple_of(2) {
+            let parent_hash = if pos % 2 == 0 {
                 Poseidon::hashv(&[&current_hash.0, &sibling_hash.0])
             } else {
                 Poseidon::hashv(&[&sibling_hash.0, &current_hash.0])
             };

Additionally, verify that the encoding convention is correct: the doc comment for encode_node_index states "Level tree_height-1 = root," but in this function you store parents at (level + 1), which would place the root at level tree_height. Confirm this matches the intended behavior or update the documentation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fc9d0a and 03d2c74.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (31)
  • .gitignore (1 hunks)
  • Cargo.toml (0 hunks)
  • src/api/api.rs (2 hunks)
  • src/api/method/get_batch_address_update_info.rs (0 hunks)
  • src/api/method/get_multiple_new_address_proofs.rs (1 hunks)
  • src/api/method/get_queue_elements.rs (2 hunks)
  • src/api/method/get_queue_info.rs (1 hunks)
  • src/api/method/mod.rs (1 hunks)
  • src/api/rpc_server.rs (1 hunks)
  • src/ingester/fetchers/grpc.rs (1 hunks)
  • src/ingester/parser/indexer_events.rs (1 hunks)
  • src/ingester/parser/state_update.rs (2 hunks)
  • src/ingester/parser/tx_event_parser_v2.rs (3 hunks)
  • src/ingester/persist/indexed_merkle_tree/helpers.rs (4 hunks)
  • src/ingester/persist/indexed_merkle_tree/mod.rs (1 hunks)
  • src/ingester/persist/indexed_merkle_tree/proof.rs (0 hunks)
  • src/ingester/persist/leaf_node.rs (2 hunks)
  • src/ingester/persist/leaf_node_proof.rs (2 hunks)
  • src/ingester/persist/mod.rs (1 hunks)
  • src/ingester/persist/persisted_indexed_merkle_tree.rs (3 hunks)
  • src/ingester/persist/persisted_state_tree.rs (1 hunks)
  • src/ingester/persist/spend.rs (2 hunks)
  • src/main.rs (2 hunks)
  • src/monitor/mod.rs (1 hunks)
  • src/monitor/queue_hash_cache.rs (3 hunks)
  • src/monitor/queue_monitor.rs (2 hunks)
  • src/openapi/mod.rs (2 hunks)
  • src/openapi/specs/api.yaml (2 hunks)
  • tests/integration_tests/batched_address_tree_tests.rs (3 hunks)
  • tests/integration_tests/batched_state_tree_tests.rs (5 hunks)
  • tests/integration_tests/utils.rs (2 hunks)
💤 Files with no reviewable changes (3)
  • src/ingester/persist/indexed_merkle_tree/proof.rs
  • src/api/method/get_batch_address_update_info.rs
  • Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (17)
  • src/api/method/get_multiple_new_address_proofs.rs
  • src/ingester/persist/persisted_state_tree.rs
  • src/monitor/queue_hash_cache.rs
  • src/ingester/fetchers/grpc.rs
  • src/main.rs
  • .gitignore
  • src/ingester/persist/leaf_node.rs
  • tests/integration_tests/batched_address_tree_tests.rs
  • src/api/rpc_server.rs
  • src/ingester/persist/mod.rs
  • src/monitor/queue_monitor.rs
  • src/ingester/persist/indexed_merkle_tree/mod.rs
  • src/api/api.rs
  • src/ingester/persist/leaf_node_proof.rs
  • src/ingester/parser/indexer_events.rs
  • src/api/method/mod.rs
  • src/ingester/persist/spend.rs
🧰 Additional context used
🧬 Code graph analysis (4)
src/ingester/persist/persisted_indexed_merkle_tree.rs (1)
src/ingester/persist/indexed_merkle_tree/helpers.rs (6)
  • compute_hash_with_cache (52-69)
  • compute_range_node_hash_v1 (73-86)
  • compute_range_node_hash_v2 (92-102)
  • get_top_element (132-145)
  • get_zeroeth_exclusion_range (104-115)
  • get_zeroeth_exclusion_range_v1 (117-130)
src/openapi/mod.rs (2)
src/api/method/get_queue_elements.rs (1)
  • get_queue_elements (161-260)
src/api/api.rs (1)
  • get_queue_elements (269-274)
src/api/method/get_queue_info.rs (2)
src/api/api.rs (1)
  • get_queue_info (276-281)
src/api/method/utils.rs (1)
  • bs58 (567-567)
tests/integration_tests/batched_state_tree_tests.rs (1)
tests/integration_tests/utils.rs (1)
  • setup (230-239)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (15)
src/monitor/mod.rs (1)

1-1: Visibility change of queue_hash_cache looks appropriate

Making queue_hash_cache public is consistent with the idea of reusing the queue cache outside queue_monitor and fits the PR summary about generalized queue handling. Just be aware this makes monitor::queue_hash_cache part of the crate’s public surface (if monitor is re-exported); if you want to keep the API narrower long term, you could instead re-export only specific types/functions from a higher-level module later.
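
For instance, a narrower surface could keep the module private and re-export only the two functions the API layer actually uses (a sketch; adjust paths to the real module layout):

// src/monitor/mod.rs
mod queue_hash_cache;

pub use queue_hash_cache::{get_cached_hash_chains, store_hash_chains_batch};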

src/ingester/persist/persisted_indexed_merkle_tree.rs (3)

6-10: LGTM!

The imports are correctly updated to include both v1 and v2 hash computation functions, matching the refactored helpers module.


39-60: LGTM!

The branching logic correctly applies v1 hashing for AddressV1 trees and v2 hashing for all other tree types. This is consistent with the helpers module where AddressV2 uses the 2-field hash.


495-511: LGTM!

The test correctly uses compute_range_node_hash_v2 with get_zeroeth_exclusion_range (the V2 variant), maintaining consistency with the production code refactoring.

src/ingester/persist/indexed_merkle_tree/helpers.rs (4)

14-33: LGTM!

The dispatch logic is clear with helpful documentation explaining the difference between V1 (3-field hash) and V2 (2-field hash). The commit reference in the comment provides good traceability for the design decision.


71-86: LGTM!

Clear documentation explaining the 3-field Poseidon hash structure for AddressV1. The implementation correctly pads next_index to 32 bytes (big-endian) before hashing.


88-102: LGTM!

Well-documented new function with clear explanation of the design rationale. The implementation correctly uses Poseidon::<Fr>::new_circom(2) for the 2-field hash (value, next_value).


104-145: LGTM!

The helper functions are consistently updated to use bigint_to_be_bytes_array for constructing next_value and value fields. The unwrap() calls are safe since HIGHEST_ADDRESS_PLUS_ONE is a known constant guaranteed to fit in 32 bytes.

tests/integration_tests/utils.rs (1)

34-34: Import alias for SolanaAccount is clear and avoids type-name collisions

Moving the solana_account::Account import up and aliasing it as SolanaAccount keeps usages unambiguous next to the existing Account/AccountV2 types and removes duplication further down the file. Looks good.

src/ingester/parser/state_update.rs (1)

10-10: Import reordering for NewAddress is safe

This just adjusts the light_event::event import ordering (and still matches the impl From<NewAddress> for AddressQueueUpdate> below), so there’s no behavioral change here.

src/ingester/parser/tx_event_parser_v2.rs (2)

14-14: Updated event_from_light_transaction import is consistent with usage

The new light_event::parse::event_from_light_transaction import matches the call at line 40 and doesn’t change behavior, assuming the function signature is unchanged in the dependency.


28-38: Refactor of Light pubkey conversion for programs/accounts is behavior‑preserving

The reshaped light_program_ids and light_accounts mappings are equivalent to the previous logic (iterating and converting each Pubkey through to_light_pubkey), just with clearer formatting. No correctness or allocation changes beyond style.

src/openapi/mod.rs (1)

23-25: New queue data schemas correctly wired into OpenAPI components.

The addition of AddressQueueData, InputQueueData, OutputQueueData, and StateQueueData to the OpenAPI components list matches the new Rust types in get_queue_elements and looks consistent.

Also applies to: 84-90

tests/integration_tests/batched_state_tree_tests.rs (1)

153-168: Updated integration test correctly targets new queue API shape.

The test’s use of GetQueueElementsRequest (with explicit per-queue limits) and the new state_queue.{output_queue,input_queue} accessors looks consistent with the refactored get_queue_elements response. The map_or(0, |v| v.leaves.len()) pattern also makes the length assertions robust when queues are empty.

Also applies to: 238-310, 311-419, 461-511

src/api/method/get_queue_elements.rs (1)

54-86: New request/response shapes and high-level flow look coherent.

The expanded GetQueueElementsRequest (per-queue start/limit/ZKP hints) and GetQueueElementsResponse (context + optional state_queue/address_queue) align with the refactored behavior and how the integration test consumes them. The transactional flow in get_queue_elements—fetching output/input queues, merging proofs, and optionally loading the address queue—looks logically sound.

Also applies to: 161-260

Comment on lines +119 to +154
let queue_sizes = fetch_queue_sizes(db, tree_filter).await?;

let tree_pubkeys: Vec<Vec<u8>> = queue_sizes
    .keys()
    .map(|(tree, _)| tree.clone())
    .collect::<std::collections::HashSet<_>>()
    .into_iter()
    .collect();

let tree_metadata_list = tree_metadata::Entity::find()
    .filter(tree_metadata::Column::TreePubkey.is_in(tree_pubkeys))
    .all(db)
    .await
    .map_err(|e| PhotonApiError::UnexpectedError(format!("DB error: {}", e)))?;

let tree_to_queue: HashMap<Vec<u8>, Vec<u8>> = tree_metadata_list
    .into_iter()
    .map(|t| (t.tree_pubkey, t.queue_pubkey))
    .collect();

let queues: Vec<QueueInfo> = queue_sizes
    .into_iter()
    .map(|((tree_bytes, queue_type), size)| {
        let queue_bytes = tree_to_queue
            .get(&tree_bytes)
            .cloned()
            .unwrap_or_else(|| vec![0u8; 32]);

        QueueInfo {
            tree: bs58::encode(&tree_bytes).into_string(),
            queue: bs58::encode(&queue_bytes).into_string(),
            queue_type,
            queue_size: size,
        }
    })
    .collect();

⚠️ Potential issue | 🟡 Minor

Handle empty queue_sizes early and avoid silent fallback queue pubkeys.

Two things worth tightening up here:

  1. If queue_sizes is empty, the follow-up tree_metadata::Entity::find().filter(TreePubkey.is_in(tree_pubkeys)) is unnecessary and may rely on how the ORM encodes an empty IN list. A simple early return (Ok(GetQueueInfoResponse { queues: vec![], slot })) after fetch_queue_sizes would be clearer and avoid a redundant query.

  2. When a tree pubkey is missing from tree_to_queue, you currently fall back to vec![0u8; 32] and return that as a base58 string. That silently turns a data inconsistency into an apparently valid queue pubkey (111...). Consider either:

    • Skipping such entries, or
    • Surfacing this as an error, or
    • At least logging a warning so missing metadata isn’t hidden.
🤖 Prompt for AI Agents
In src/api/method/get_queue_info.rs around lines 119 to 154, after calling
fetch_queue_sizes handle two issues: if queue_sizes.is_empty() return early with
Ok(GetQueueInfoResponse { queues: vec![], slot }) to avoid the unnecessary DB IN
query; and when mapping queue_sizes to QueueInfo, do not silently substitute
vec![0u8;32] for missing tree pubkeys — instead log a warning (e.g.
warn!("missing tree metadata for pubkey {}", bs58::encode(&tree_bytes))) and
skip that entry (do not include it in the queues vector) or return an error if
you prefer to fail fast; implement the chosen behavior consistently and remove
the zeroed fallback.
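
A minimal sketch of both fixes, reusing the names from the snippet above (warn! assumes the log macros already used in this crate):

let queue_sizes = fetch_queue_sizes(db, tree_filter).await?;
if queue_sizes.is_empty() {
    return Ok(GetQueueInfoResponse { queues: vec![], slot });
}

// ... fetch tree_metadata_list and build tree_to_queue as before ...

let queues: Vec<QueueInfo> = queue_sizes
    .into_iter()
    .filter_map(|((tree_bytes, queue_type), size)| match tree_to_queue.get(&tree_bytes) {
        Some(queue_bytes) => Some(QueueInfo {
            tree: bs58::encode(&tree_bytes).into_string(),
            queue: bs58::encode(queue_bytes).into_string(),
            queue_type,
            queue_size: size,
        }),
        None => {
            warn!(
                "missing tree metadata for pubkey {}",
                bs58::encode(&tree_bytes).into_string()
            );
            None
        }
    })
    .collect();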

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/api/method/get_multiple_new_address_proofs.rs (1)

96-128: Consider reporting all queued addresses in the error message.

The current logic fast-fails on the first queued address found but doesn't report if multiple requested addresses are already in the queue. While this prevents the operation correctly, reporting all queued addresses would provide better feedback to the caller.

You could collect all queued addresses and include them in the error message:

     if !queue_results.is_empty() {
-        let queued_address: Vec<u8> =
-            queue_results[0].try_get("", "address").map_err(|e| {
-                PhotonApiError::UnexpectedError(format!("Failed to get address: {}", e))
-            })?;
-        let queued_address = SerializablePubkey::try_from(queued_address)?;
-        return Err(PhotonApiError::ValidationError(format!(
-            "Address {} already exists",
-            queued_address
-        )));
+        let queued_addresses: Result<Vec<SerializablePubkey>, _> = queue_results
+            .iter()
+            .map(|row| {
+                let addr_bytes: Vec<u8> = row.try_get("", "address")?;
+                SerializablePubkey::try_from(addr_bytes)
+            })
+            .collect();
+        let queued_addresses = queued_addresses.map_err(|e| {
+            PhotonApiError::UnexpectedError(format!("Failed to get address: {}", e))
+        })?;
+        return Err(PhotonApiError::ValidationError(format!(
+            "Address(es) already exist: {:?}",
+            queued_addresses
+        )));
     }
src/api/method/get_queue_elements.rs (2)

189-226: Consider parallelizing independent queue fetches.

Output and input queue fetches are independent and could run concurrently using tokio::join! to reduce latency.

Apply this diff:

-    let (output_queue, output_proof_data) = if let Some(limit) = request.output_queue_limit {
+    let output_future = async {
+        if let Some(limit) = request.output_queue_limit {
+            let zkp_hint = request.output_queue_zkp_batch_size;
+            match fetch_queue(
+                &tx,
+                &request.tree,
+                QueueType::OutputStateV2,
+                request.output_queue_start_index,
+                limit,
+                zkp_hint,
+            )
+            .await?
+            {
+                QueueData::Output(data, proof_data) => Ok((Some(data), proof_data)),
+                QueueData::Input(_, _) => unreachable!("OutputState should return Output"),
+            }
+        } else {
+            Ok((None, None))
+        }
+    };
+
+    let input_future = async {
+        if let Some(limit) = request.input_queue_limit {
+            let zkp_hint = request.input_queue_zkp_batch_size;
+            match fetch_queue(
+                &tx,
+                &request.tree,
+                QueueType::InputStateV2,
+                request.input_queue_start_index,
+                limit,
+                zkp_hint,
+            )
+            .await?
+            {
+                QueueData::Input(data, proof_data) => Ok((Some(data), proof_data)),
+                QueueData::Output(_, _) => unreachable!("InputState should return Input"),
+            }
+        } else {
+            Ok((None, None))
+        }
+    };
+
+    let (output_result, input_result) = tokio::join!(output_future, input_future);
+    let (output_queue, output_proof_data) = output_result?;
+    let (input_queue, input_proof_data) = input_result?;
-        let zkp_hint = request.output_queue_zkp_batch_size;
-        match fetch_queue(
-            &tx,
-            &request.tree,
-            QueueType::OutputStateV2,
-            request.output_queue_start_index,
-            limit,
-            zkp_hint,
-        )
-        .await?
-        {
-            QueueData::Output(data, proof_data) => (Some(data), proof_data),
-            QueueData::Input(_, _) => unreachable!("OutputState should return Output"),
-        }
-    } else {
-        (None, None)
-    };
-
-    let (input_queue, input_proof_data) = if let Some(limit) = request.input_queue_limit {
-        let zkp_hint = request.input_queue_zkp_batch_size;
-        match fetch_queue(
-            &tx,
-            &request.tree,
-            QueueType::InputStateV2,
-            request.input_queue_start_index,
-            limit,
-            zkp_hint,
-        )
-        .await?
-        {
-            QueueData::Input(data, proof_data) => (Some(data), proof_data),
-            QueueData::Output(_, _) => unreachable!("InputState should return Input"),
-        }
-    } else {
-        (None, None)
-    };

779-828: Consider logging hash computation failures.

On lines 821-823, when Poseidon::hashv fails, the code silently breaks without logging or propagating the error. This could make debugging difficult if hash computation fails unexpectedly.

Apply this diff:

             match parent_hash {
                 Ok(hash) => {
                     current_hash = Hash::from(hash);
                     let parent_pos = pos / 2;
                     let parent_idx =
                         encode_node_index((level + 1) as u8, parent_pos, tree_info.height as u8);
                     nodes_map.insert(parent_idx, current_hash.clone());
                 }
                 Err(_) => {
+                    log::warn!(
+                        "Failed to compute parent hash at level {} for proof at index {}",
+                        level,
+                        proof.lowElementLeafIndex
+                    );
                     break;
                 }
             }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2b0ee4 and a9fa924.

📒 Files selected for processing (6)
  • src/api/api.rs (3 hunks)
  • src/api/method/get_multiple_new_address_proofs.rs (1 hunks)
  • src/api/method/get_queue_elements.rs (2 hunks)
  • src/ingester/persist/leaf_node.rs (2 hunks)
  • src/main.rs (1 hunks)
  • src/openapi/mod.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/main.rs
🧰 Additional context used
🧬 Code graph analysis (4)
src/openapi/mod.rs (1)
src/api/method/get_queue_elements.rs (1)
  • get_queue_elements (169-268)
src/ingester/persist/leaf_node.rs (1)
tests/integration_tests/mock_tests.rs (1)
  • map (1507-1514)
src/api/api.rs (1)
src/api/method/get_queue_info.rs (1)
  • get_queue_info (101-159)
src/api/method/get_queue_elements.rs (3)
src/api/method/get_multiple_new_address_proofs.rs (2)
  • get_multiple_new_address_proofs (179-195)
  • get_multiple_new_address_proofs_helper (52-174)
src/ingester/persist/persisted_state_tree.rs (1)
  • get_subtrees (132-209)
src/monitor/queue_hash_cache.rs (2)
  • get_cached_hash_chains (69-102)
  • store_hash_chains_batch (14-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (17)
src/api/method/get_multiple_new_address_proofs.rs (2)

81-81: LGTM! Idiomatic Rust improvement.

Using .or_default() is more idiomatic than .or_insert_with(Vec::new) and functionally equivalent for types implementing Default.


18-18: Previous review concern appears addressed.

The MAX_ADDRESSES constant is now 1000, down from the 30_000 value flagged in the previous review. This resolves the concerns about excessive memory usage, large IN clauses, and query planner stress mentioned in earlier feedback.

src/ingester/persist/leaf_node.rs (2)

172-190: LGTM! The implementation safely addresses the past review concern.

The new implementation correctly handles ActiveValue extraction using pattern matching instead of .unwrap(), eliminating the panic risk noted in the previous review. The approach is actually more defensive than the suggested fix, as it explicitly handles all ActiveValue variants.

The sequence extraction logic is sound:

  • Pattern matching on &m.seq with Set(opt) safely extracts the inner Option<i64>
  • filter_map automatically filters out None values, producing Vec<i64>
  • Min/max computation handles the empty case gracefully (both would be None)

The debug logging provides useful observability for tracking persistence operations, including update counts and sequence ranges.


210-215: LGTM! Post-persistence logging enhances observability.

The debug log after successful persistence complements the pre-persistence log, providing confirmation that the operation completed successfully and helping trace the full lifecycle of tree node persistence.

src/api/method/get_queue_elements.rs (11)

270-321: LGTM!

The proof merging logic correctly combines output and input queue proofs, deduplicates nodes, and handles the case where no proofs are available.


323-408: LGTM!

The validation logic is comprehensive:

  • Enforces maximum queue element limits
  • Properly filters by queue type
  • Handles empty results gracefully
  • Includes a pruning detection check to alert users when requested data has been pruned

471-482: Excellent fix for the previous unwrap() issue!

The code now properly handles Hash::new errors without panicking, including helpful error messages with context (index and leaf_index). This addresses the concern raised in previous reviews.


426-443: Verify intentional return of empty data when batch size requirements aren't met.

When zkp_batch_size > 0 and fewer than zkp_batch_size elements are available, the function returns empty data (lines 432-437). Ensure this is the intended behavior and that callers are aware that partial batches are discarded.

If this is expected, consider documenting this behavior in the function's doc comment.


492-517: LGTM! Good cache-with-fallback pattern.

The code gracefully degrades to local computation when cache data is unavailable, with appropriate logging. This ensures the API remains functional even if the monitor hasn't populated the cache yet.


532-570: LGTM! Excellent error handling.

The extraction of tx_hashes and nullifiers includes comprehensive error handling with detailed context (index, leaf_index) to aid debugging.


589-663: LGTM!

The hash chain computation logic is well-structured with:

  • Early validation for edge cases
  • Clear separation of output vs. input queue logic
  • Proper error handling with descriptive messages
  • Debug logging for observability

665-729: LGTM!

The validation, tree info retrieval, and initial data fetching are properly structured with comprehensive error handling.


843-942: LGTM!

The address queue hash chain caching follows the same robust cache-with-fallback pattern used for state queues, with appropriate validation and error handling.


965-1006: LGTM!

The node deduplication logic efficiently combines proof nodes with pre-fetched path nodes from the database, avoiding redundant hash computations.


1017-1072: LGTM!

The path node fetching logic efficiently builds a deduplicated set of node indices and retrieves them in a single database query.

src/openapi/mod.rs (1)

23-26: LGTM! OpenAPI schema properly reflects API changes.

The imports and component schema declarations correctly expose the new queue data structures (AddressQueueData, InputQueueData, OutputQueueData, StateQueueData) and QueueInfo, replacing the removed legacy types.

Also applies to: 87-91

src/api/api.rs (1)

80-82: LGTM! Queue info endpoint properly integrated.

The get_queue_info endpoint is correctly wired through the API layer with appropriate imports, method implementation, and OpenAPI registration, following the established patterns in this file.

Also applies to: 276-281, 397-401

Comment on lines +34 to +42
fn encode_node_index(level: u8, position: u64, tree_height: u8) -> u64 {
debug_assert!(
level <= tree_height,
"level {} > tree_height {}",
level,
tree_height
);
((level as u64) << 56) | position
}

⚠️ Potential issue | 🟠 Major

Consider replacing debug_assert! with runtime validation.

The debug_assert! is stripped in release builds, so invalid inputs (level > tree_height) could silently corrupt node indices in production.

Apply this diff to add runtime validation:

 #[inline]
 fn encode_node_index(level: u8, position: u64, tree_height: u8) -> u64 {
-    debug_assert!(
-        level <= tree_height,
-        "level {} > tree_height {}",
-        level,
-        tree_height
-    );
+    assert!(
+        level <= tree_height,
+        "level {} > tree_height {}",
+        level,
+        tree_height
+    );
     ((level as u64) << 56) | position
 }

Alternatively, return a Result and propagate the error:

fn encode_node_index(level: u8, position: u64, tree_height: u8) -> Result<u64, PhotonApiError> {
    if level > tree_height {
        return Err(PhotonApiError::UnexpectedError(
            format!("level {} > tree_height {}", level, tree_height)
        ));
    }
    Ok(((level as u64) << 56) | position)
}
🤖 Prompt for AI Agents
In src/api/method/get_queue_elements.rs around lines 34 to 42, replace the
debug-only assertion with real runtime validation by changing the function to
return Result<u64, PhotonApiError>; check if level > tree_height and return
Err(PhotonApiError::UnexpectedError(format!("level {} > tree_height {}", level,
tree_height))) when invalid, otherwise return Ok(((level as u64) << 56) |
position), and update all callers to propagate the Result (use ? or map_err as
appropriate).

Comment on lines +46 to +48
fn leaf_index_to_node_index(leaf_index: u32, tree_height: u32) -> i64 {
2_i64.pow(tree_height - 1) + leaf_index as i64
}

⚠️ Potential issue | 🟠 Major

Add overflow protection for tree height.

If tree_height >= 64, the expression 2_i64.pow(tree_height - 1) overflows i64 (a panic in debug mode, wrap-around in release mode), and tree_height == 0 underflows the tree_height - 1 exponent. The guard below bounds the height conservatively at 32, consistent with the u32 leaf indices used here.

Apply this diff to add validation:

 #[inline]
 fn leaf_index_to_node_index(leaf_index: u32, tree_height: u32) -> i64 {
+    assert!(
+        tree_height > 0 && tree_height < 32,
+        "tree_height {} out of supported range [1, 31]",
+        tree_height
+    );
     2_i64.pow(tree_height - 1) + leaf_index as i64
 }

Or return a Result to propagate the error gracefully.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/api/method/get_queue_elements.rs around lines 46-48, the function
computing node index uses 2_i64.pow(tree_height - 1) which can overflow for
large tree_height; change the function to validate tree_height (e.g., ensure
tree_height > 0 and below a safe limit such as 63 or the project-specific max —
or per reviewer suggestion, < 32 if that's the intended constraint) or make the
function return a Result and use checked arithmetic (checked_shl/checked_pow or
compute in i128 and verify it fits in i64) so you return an Err on overflow
instead of panicking/wrapping. Ensure the caller is updated to handle the Result
if you choose that approach.
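
A checked variant along the lines the prompt describes, assuming the function may return the crate's PhotonApiError:

fn leaf_index_to_node_index(leaf_index: u32, tree_height: u32) -> Result<i64, PhotonApiError> {
    let exponent = tree_height
        .checked_sub(1)
        .ok_or_else(|| PhotonApiError::UnexpectedError("tree_height must be > 0".to_string()))?;
    2_i64
        .checked_pow(exponent)
        .and_then(|base| base.checked_add(leaf_index as i64))
        .ok_or_else(|| {
            PhotonApiError::UnexpectedError(format!(
                "tree_height {} overflows the i64 node index",
                tree_height
            ))
        })
}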

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/api/method/get_queue_elements.rs (2)

34-42: Past review issue not addressed: Replace debug_assert! with runtime validation.

The debug_assert! is stripped in release builds, so invalid inputs (level > tree_height) could silently corrupt node indices in production.

Based on past review comments, this should use runtime validation. Consider returning a Result or using a regular assert!:

 #[inline]
 fn encode_node_index(level: u8, position: u64, tree_height: u8) -> u64 {
-    debug_assert!(
+    assert!(
         level <= tree_height,
         "level {} > tree_height {}",
         level,
         tree_height
     );
     ((level as u64) << 56) | position
 }

46-48: Past review issue not addressed: Add overflow protection for tree height.

The expression 2_i64.pow(tree_height - 1) overflows i64 once tree_height reaches 64 (a panic in debug mode, wrap-around in release mode) and underflows the exponent when tree_height == 0. Bounding the height as below keeps both failure modes out of production.

Based on past review comments, add validation:

 #[inline]
 fn leaf_index_to_node_index(leaf_index: u32, tree_height: u32) -> i64 {
+    assert!(
+        tree_height > 0 && tree_height < 32,
+        "tree_height {} would overflow i64",
+        tree_height
+    );
     2_i64.pow(tree_height - 1) + leaf_index as i64
 }
🧹 Nitpick comments (1)
src/api/method/get_queue_elements.rs (1)

670-967: Consider refactoring fetch_address_queue_v2 for maintainability.

This function is ~300 lines long and handles multiple concerns: DB queries, proof generation, node deduplication, hash chain computation, and caching. This complexity makes it difficult to test and maintain.

Consider extracting smaller focused functions:

  • query_address_queue_from_db - lines 694-728
  • generate_address_non_inclusion_proofs - lines 764-833
  • compute_and_cache_address_hash_chains - lines 851-950

This would improve testability and readability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9fa924 and a82c13d.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • src/api/method/get_queue_elements.rs (2 hunks)
  • src/openapi/mod.rs (2 hunks)
  • tests/integration_tests/batched_address_tree_tests.rs (3 hunks)
  • tests/integration_tests/batched_state_tree_tests.rs (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/integration_tests/batched_state_tree_tests.rs (1)
src/api/method/get_queue_elements.rs (1)
  • get_queue_elements (178-274)
tests/integration_tests/batched_address_tree_tests.rs (1)
src/api/method/get_queue_elements.rs (1)
  • get_queue_elements (178-274)
src/api/method/get_queue_elements.rs (6)
src/common/mod.rs (1)
  • format_bytes (157-164)
src/ingester/persist/leaf_node_proof.rs (2)
  • get_multiple_compressed_leaf_proofs_by_indices (14-90)
  • indices (42-42)
src/ingester/persist/persisted_state_tree.rs (2)
  • get_subtrees (132-209)
  • bytes (571-574)
src/ingester/persist/leaf_node.rs (4)
  • leaf_index_to_node_index (30-32)
  • from (35-42)
  • from (46-53)
  • from (57-64)
src/ingester/parser/tree_info.rs (2)
  • height (67-73)
  • get (18-26)
src/monitor/queue_hash_cache.rs (1)
  • get_cached_hash_chains (69-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (9)
src/openapi/mod.rs (1)

23-26: LGTM! OpenAPI schema updates align with new per-queue API.

The addition of new queue-related types (InputQueueData, OutputQueueData, AddressQueueData, StateQueueData, Node, QueueRequest, QueueInfo) to both imports and OpenAPI schema components is consistent with the broader API refactoring to support per-queue requests.

Also applies to: 87-93

tests/integration_tests/batched_address_tree_tests.rs (2)

166-177: LGTM! Test correctly migrated to new per-queue API.

The test now properly constructs GetQueueElementsRequest with per-queue fields (output_queue, input_queue, address_queue) instead of the previous single-queue approach. The use of QueueRequest for per-queue parameters is consistent with the new API design.

Also applies to: 223-234


179-202: LGTM! Proper handling of new address queue structure.

The test correctly accesses the address queue data from the response and uses the new structure (address_queue_before.addresses with element.0.to_bytes()) for iterating and comparing addresses.

src/api/method/get_queue_elements.rs (3)

476-487: LGTM! Proper error handling for Hash::new.

The previous unwrap() issue has been properly addressed. The code now uses map_err with descriptive error messages and collects into Result<Vec<Hash>, PhotonApiError>, which is then propagated with ?.


178-274: LGTM! Well-structured per-queue request handling.

The main function properly:

  • Validates that at least one queue is requested (lines 186-190)
  • Fetches output and input queues conditionally with proper error handling
  • Merges proof data for state queues
  • Handles address queue independently
  • Returns a well-structured response

328-592: Verify the zkp_batch_size truncation logic aligns with client expectations.

Lines 434-448 truncate the results to full batches when zkp_batch_size is provided. If the returned data is truncated (e.g., 15 elements requested with batch_size=10 returns only 10), clients may not realize they received fewer elements than requested.

Consider:

  1. Documenting this behavior clearly in the API specification
  2. Adding a field to the response indicating truncation occurred
  3. Or returning a validation error if the limit is not a multiple of zkp_batch_size

Does the client expect truncation or should this be an error? Please clarify the intended behavior.

tests/integration_tests/batched_state_tree_tests.rs (3)

240-263: LGTM! Test correctly uses new per-queue API structure.

The test properly constructs GetQueueElementsRequest with per-queue fields (output_queue, input_queue, address_queue) and sets unused queues to None. The use of QueueRequest with explicit limit, start_index, and zkp_batch_size parameters is consistent with the new API design.

Also applies to: 277-300


306-326: LGTM! Safe navigation of nested queue structures.

The test correctly accesses queue data using safe Option chaining (.as_ref(), .and_then(), .map_or()), properly handling the nested structure where queues are now wrapped in state_queue. The default value of 0 for missing queues is appropriate for length calculations.

Also applies to: 337-356, 359-379


394-410: LGTM! Proper handling of output queue leaf indices.

The test correctly extracts the output queue from state_queue, accesses leaf_indices and leaves, and iterates through them for Merkle tree updates. The slice length check (min(10)) prevents index out of bounds.

    low_element_proofs.push(proof.proof.clone());

    let leaf_idx =
        encode_node_index(0, proof.lowElementLeafIndex as u64, tree_info.height as u8);

⚠️ Potential issue | 🔴 Critical

Potential data loss from u8 cast of tree height.

Lines 799, 809, and 823 (and similarly at lines 982, 985, 993, 1002) cast tree_info.height to u8. If the tree height exceeds 255, this will silently truncate the value, leading to incorrect node index encoding.

Apply validation before calling encode_node_index, or change the function signature to accept u32 and validate internally:

+    if tree_info.height > u8::MAX as u32 {
+        return Err(PhotonApiError::ValidationError(format!(
+            "Tree height {} exceeds maximum supported height of {}",
+            tree_info.height, u8::MAX
+        )));
+    }
+    let tree_height_u8 = tree_info.height as u8;
     let leaf_idx =
-        encode_node_index(0, proof.lowElementLeafIndex as u64, tree_info.height as u8);
+        encode_node_index(0, proof.lowElementLeafIndex as u64, tree_height_u8);

Or refactor encode_node_index to accept u32 and validate internally.

Also applies to: 809-809, 823-823

🤖 Prompt for AI Agents
In src/api/method/get_queue_elements.rs around lines 799, 809, 823 (and
similarly 982, 985, 993, 1002), tree_info.height is being cast to u8 before
calling encode_node_index which can silently truncate values >255; either
validate tree_info.height fits in u8 before the cast and return/error if it
doesn’t, or preferably change encode_node_index to accept a wider integer (u32)
and perform an internal bounds check there (returning an error if the height is
out of allowed range) and then call it with the uncasted value; ensure all call
sites are updated to the new signature or include the pre-checks to prevent
silent truncation.

Comment on lines +813 to +829
                Poseidon::hashv(&[&current_hash.0, &sibling_hash.0])
            } else {
                Poseidon::hashv(&[&sibling_hash.0, &current_hash.0])
            };

            match parent_hash {
                Ok(hash) => {
                    current_hash = Hash::from(hash);
                    let parent_pos = pos / 2;
                    let parent_idx =
                        encode_node_index((level + 1) as u8, parent_pos, tree_info.height as u8);
                    nodes_map.insert(parent_idx, current_hash.clone());
                }
                Err(_) => {
                    break;
                }
            }

⚠️ Potential issue | 🟡 Minor

Silent error suppression in hash computation loop.

Line 827 uses break when Poseidon::hashv returns an error, silently stopping the proof path computation without propagating the error. This could lead to incomplete node maps and incorrect proofs.

Consider propagating the error instead:

             match parent_hash {
                 Ok(hash) => {
                     current_hash = Hash::from(hash);
                     let parent_pos = pos / 2;
                     let parent_idx =
                         encode_node_index((level + 1) as u8, parent_pos, tree_info.height as u8);
                     nodes_map.insert(parent_idx, current_hash.clone());
                 }
-                Err(_) => {
-                    break;
+                Err(e) => {
+                    return Err(PhotonApiError::UnexpectedError(format!(
+                        "Failed to compute parent hash at level {}: {}", level, e
+                    )));
                 }
             }
🤖 Prompt for AI Agents
In src/api/method/get_queue_elements.rs around lines 813 to 829, the code
silently stops the hash loop by using `break` when `Poseidon::hashv` returns an
error, which can produce incomplete node maps and bad proofs; replace the
`break` with proper error propagation (e.g., return Err(...) or use the `?`
operator) so the calling function receives the failure, and if needed add a
descriptive context message when converting the Poseidon error into your
function's error type; ensure the function signature returns a compatible Result
and update callers if necessary.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
src/api/method/get_queue_elements.rs (3)

48-52: Protect leaf_index_to_node_index from overflow/underflow and keep in sync with other implementation.

leaf_index_to_node_index uses 2_i64.pow(tree_height - 1) with no bounds checks. For large tree_height this will overflow i64, and for tree_height == 0 it will underflow the exponent (tree_height - 1), causing panics/wrap-around. The same logic exists in src/ingester/persist/leaf_node.rs, so divergence here could be confusing.

Consider adding an explicit range check to keep this safe and make the invariant obvious (and mirror it in the other file):

 #[inline]
-fn leaf_index_to_node_index(leaf_index: u32, tree_height: u32) -> i64 {
-    2_i64.pow(tree_height - 1) + leaf_index as i64
-}
+fn leaf_index_to_node_index(leaf_index: u32, tree_height: u32) -> i64 {
+    // Support realistic Merkle heights while avoiding i64 overflow and underflow.
+    assert!(
+        (1..32).contains(&tree_height),
+        "tree_height {} out of supported range [1, 31]",
+        tree_height
+    );
+    2_i64.pow(tree_height - 1) + leaf_index as i64
+}

You may also want to update src/ingester/persist/leaf_node.rs to use the same invariant or centralize this helper.


468-476: Avoid silent truncation when casting tree_info.height to u8.

tree_info.height is cast to u8 in several places (e.g., tree_info.height as u8 for StateQueueProofData.tree_height and encode_node_index calls). If a tree height ever exceeds 255, this will silently truncate, corrupting node indices and proofs. This was called out previously and is still present.

A small helper that validates the range once and reuses the checked u8 eliminates the risk:

@@
 const MAX_QUEUE_ELEMENTS_SQLITE: u16 = 15;
 
+#[inline]
+fn tree_height_to_u8(height: u32) -> Result<u8, PhotonApiError> {
+    if height > u8::MAX as u32 {
+        return Err(PhotonApiError::ValidationError(format!(
+            "Tree height {} exceeds maximum supported height of {}",
+            height,
+            u8::MAX
+        )));
+    }
+    Ok(height as u8)
+}
+
@@
-    let tree_height_u32 = tree_info.height as u32 + 1;
+    let raw_tree_height = tree_info.height as u32;
+    let tree_height_u32 = raw_tree_height + 1;
@@
     let path_nodes =
         fetch_path_nodes_from_db(tx, &serializable_tree, &indices, tree_height_u32).await?;
@@
-    let proof_data = Some(StateQueueProofData {
-        proofs: generated_proofs.clone(),
-        tree_height: tree_info.height as u8,
-        path_nodes,
-    });
+    let proof_data = Some(StateQueueProofData {
+        proofs: generated_proofs.clone(),
+        tree_height: tree_height_to_u8(raw_tree_height)?,
+        path_nodes,
+    });
@@
     let tree_info = TreeInfo::get(tx, &serializable_tree.to_string())
         .await?
         .ok_or_else(|| PhotonApiError::UnexpectedError("Failed to get tree info".to_string()))?;
@@
-        let leaf_idx =
-            encode_node_index(0, proof.lowElementLeafIndex as u64, tree_info.height as u8);
+        let tree_height_u8 = tree_height_to_u8(tree_info.height as u32)?;
+        let leaf_idx = encode_node_index(0, proof.lowElementLeafIndex as u64, tree_height_u8);
@@
-            let sibling_idx = encode_node_index(level as u8, sibling_pos, tree_info.height as u8);
+            let sibling_idx = encode_node_index(level as u8, sibling_pos, tree_height_u8);
@@
-                    let parent_idx =
-                        encode_node_index((level + 1) as u8, parent_pos, tree_info.height as u8);
+                    let parent_idx =
+                        encode_node_index((level + 1) as u8, parent_pos, tree_height_u8);

You might instead compute tree_height_u8 once right after reading tree_info in fetch_address_queue_v2 and reuse it throughout that function.

Also applies to: 810-836


818-844: Propagate Poseidon::hashv failures instead of silently breaking out of the loop.

On error from Poseidon::hashv, the current code just breaks the loop, which produces an incomplete nodes_map and silently degrades proof correctness without any error to the caller.

It’s safer to surface this as an API error:

-        for (level, sibling_hash) in proof.proof.iter().enumerate() {
+        for (level, sibling_hash) in proof.proof.iter().enumerate() {
@@
-            let parent_hash = if pos % 2 == 0 {
-                Poseidon::hashv(&[&current_hash.0, &sibling_hash.0])
-            } else {
-                Poseidon::hashv(&[&sibling_hash.0, &current_hash.0])
-            };
-
-            match parent_hash {
-                Ok(hash) => {
-                    current_hash = Hash::from(hash);
-                    let parent_pos = pos / 2;
-                    let parent_idx =
-                        encode_node_index((level + 1) as u8, parent_pos, tree_info.height as u8);
-                    nodes_map.insert(parent_idx, current_hash.clone());
-                }
-                Err(_) => {
-                    break;
-                }
-            }
+            let parent_hash = if pos % 2 == 0 {
+                Poseidon::hashv(&[&current_hash.0, &sibling_hash.0])
+            } else {
+                Poseidon::hashv(&[&sibling_hash.0, &current_hash.0])
+            }
+            .map_err(|e| {
+                PhotonApiError::UnexpectedError(format!(
+                    "Failed to compute parent hash at level {} for lowElementLeafIndex {}: {}",
+                    level, proof.lowElementLeafIndex, e
+                ))
+            })?;
+
+            current_hash = Hash::from(parent_hash);
+            let parent_pos = pos / 2;
+            let parent_idx =
+                encode_node_index((level + 1) as u8, parent_pos, tree_height_u8);
+            nodes_map.insert(parent_idx, current_hash.clone());
@@
-            pos /= 2;
+            pos /= 2;

(Assuming you introduce tree_height_u8 as suggested in the previous comment and compute it once before this loop.)

This way any hashing failure produces a clear PhotonApiError instead of an undetected partial proof.

🧹 Nitpick comments (5)
src/snapshot/mod.rs (1)

261-265: Unnecessary clone() on line 262.

The path variable is not used after this assignment, so .clone() is unnecessary when the prefix is empty.

         let full_path = if self.gcs_prefix.is_empty() {
-            path.clone()
+            path
         } else {
             format!("{}/{}", self.gcs_prefix, path)
         };
src/monitor/queue_hash_cache.rs (1)

88-98: Consider logging when hash chain length is invalid.

The current implementation silently skips entries where result.hash_chain.len() != 32. This could mask data corruption issues. A debug/warn log would help with troubleshooting.

     for result in results {
         if result.hash_chain.len() == 32 {
             let mut hash_array = [0u8; 32];
             hash_array.copy_from_slice(&result.hash_chain);

             chains.push(CachedHashChain {
                 zkp_batch_index: result.zkp_batch_index,
                 hash_chain: hash_array,
             });
+        } else {
+            debug!(
+                "Skipping cached hash chain with invalid length: {} (expected 32)",
+                result.hash_chain.len()
+            );
         }
     }
src/monitor/queue_monitor.rs (1)

260-261: Consider using HashSet for used_cached_indices.

The used_cached_indices vector is checked with contains() at line 313, which is O(n). For typical batch sizes this is acceptable, but a HashSet would provide O(1) lookup.
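
A sketch of the swap, assuming the indices are u64 (the insert-returns-bool idiom also collapses the separate contains/push pair):

use std::collections::HashSet;

let mut used_cached_indices: HashSet<u64> = HashSet::new();

// At the call site that previously used Vec::contains:
if used_cached_indices.insert(actual_zkp_idx) {
    // first use of this cached chain
}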

src/api/method/get_queue_elements.rs (2)

332-452: Consider SQLite parameter limits when combining MAX_QUEUE_ELEMENTS with fetch_path_nodes_from_db.

fetch_queue allows up to MAX_QUEUE_ELEMENTS (30,000) for all backends. For each leaf index, fetch_path_nodes_from_db builds the full path of node indices and feeds them into state_trees::Column::NodeIdx.is_in(all_path_indices). On SQLite this can easily exceed the 999-parameter limit even for modest tree heights, causing runtime SQL errors.

Given you already special-case SQLite for MAX_QUEUE_ELEMENTS_SQLITE in the address queue, you may want to either:

  • Use a lower per-request limit for state queues on SQLite, or
  • Chunk all_path_indices into multiple smaller IN queries and merge the results (sketched below).

No functional change is required immediately, but this is a potential source of backend-specific failures under larger loads.
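
A sketch of the chunking option with sea-orm, assuming all_path_indices is a Vec<i64> (the tree filter from the original query is elided here):

// Stay well under SQLite's ~999 bind-parameter limit per statement.
const MAX_IN_PARAMS: usize = 900;

let mut all_rows = Vec::new();
for chunk in all_path_indices.chunks(MAX_IN_PARAMS) {
    let rows = state_trees::Entity::find()
        .filter(state_trees::Column::NodeIdx.is_in(chunk.iter().copied()))
        .all(tx)
        .await?;
    all_rows.extend(rows);
}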


280-330: Optionally validate consistency between output and input state proofs in merge_state_queue_proofs.

merge_state_queue_proofs trusts that all proofs share the same tree height, root, and root sequence. If, due to misconfiguration or data corruption, output and input queues reference different roots/heights, you could merge incompatible paths without noticing.

If you want stronger guarantees, consider:

  • Checking that any second tree_height matches the first and returning an error if not (see the sketch below this comment).
  • Verifying that all proofs agree on root and root_seq (or at least asserting this invariant in debug).

This would make it easier to detect cross-tree mixups early.
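
A sketch of the height check, using the StateQueueProofData shape referenced earlier:

// Reject obviously mixed proof sets before merging their node paths.
if let (Some(out), Some(inp)) = (&output_proof_data, &input_proof_data) {
    if out.tree_height != inp.tree_height {
        return Err(PhotonApiError::UnexpectedError(format!(
            "cannot merge proofs with mismatched tree heights: {} vs {}",
            out.tree_height, inp.tree_height
        )));
    }
}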

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a82c13d and 950ccdd.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • Cargo.toml (2 hunks)
  • src/api/method/get_queue_elements.rs (2 hunks)
  • src/monitor/queue_hash_cache.rs (4 hunks)
  • src/monitor/queue_monitor.rs (6 hunks)
  • src/snapshot/gcs_utils/mod.rs (1 hunks)
  • src/snapshot/gcs_utils/resumable_upload.rs (1 hunks)
  • src/snapshot/mod.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Cargo.toml
🧰 Additional context used
🧬 Code graph analysis (4)
src/snapshot/gcs_utils/mod.rs (1)
src/snapshot/gcs_utils/resumable_upload.rs (1)
  • resumable_upload (17-38)
src/monitor/queue_monitor.rs (1)
src/monitor/queue_hash_cache.rs (2)
  • delete_hash_chains (104-130)
  • store_hash_chains_batch (14-66)
src/snapshot/mod.rs (1)
src/snapshot/gcs_utils/resumable_upload.rs (2)
  • resumable_upload (17-38)
  • get_access_token (321-335)
src/api/method/get_queue_elements.rs (6)
src/common/mod.rs (1)
  • format_bytes (157-164)
src/ingester/persist/leaf_node_proof.rs (2)
  • get_multiple_compressed_leaf_proofs_by_indices (14-90)
  • indices (42-42)
src/ingester/persist/persisted_state_tree.rs (2)
  • get_subtrees (132-209)
  • bytes (571-574)
src/ingester/persist/leaf_node.rs (4)
  • leaf_index_to_node_index (30-32)
  • from (35-42)
  • from (46-53)
  • from (57-64)
src/ingester/parser/tree_info.rs (2)
  • height (67-73)
  • get (18-26)
src/monitor/queue_hash_cache.rs (2)
  • get_cached_hash_chains (69-102)
  • store_hash_chains_batch (14-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (21)
src/snapshot/gcs_utils/mod.rs (1)

1-1: LGTM!

Simple module declaration correctly exposes the resumable_upload submodule.

src/snapshot/mod.rs (2)

32-32: LGTM!

The new gcs_utils module is correctly declared and exposed.


267-279: LGTM!

The resumable upload integration is well-structured with proper error context for both token acquisition and upload operations.

src/snapshot/gcs_utils/resumable_upload.rs (8)

1-14: LGTM!

Imports are appropriate, and constants are well-documented. The 8MB chunk size aligns with GCS recommendations for resumable uploads.


17-38: LGTM!

Clean orchestration of the resumable upload flow with appropriate logging.


41-109: LGTM!

Solid retry logic with exponential backoff for server errors and rate limits. The Location header extraction is correctly implemented.


160-275: LGTM with minor observation.

The chunked upload logic with retry and resume-from-offset is well-implemented. The continue 'outer pattern correctly handles resume scenarios by recalculating chunks from the new position.


280-318: LGTM!

The upload status query correctly parses the Range header to determine resume position, with appropriate fallback behavior when the header is missing or the upload is already complete.


320-335: LGTM!

Good fallback pattern: tries GCE metadata service first (for cloud environments), then falls back to service account credentials file for local development.


337-355: LGTM!

Correct implementation with appropriate timeout for non-GCP environments and proper Metadata-Flavor header.


357-424: LGTM!

Well-implemented JWT-based service account authentication flow. The unwrap() on duration_since(UNIX_EPOCH) (line 385) is safe in practice, since any sanely configured system clock reports a time after the Unix epoch.

src/monitor/queue_hash_cache.rs (2)

14-23: Good use of generics for connection abstraction.

Making the function generic over ConnectionTrait improves flexibility and testability, allowing callers to pass transactions or mock connections.


104-130: LGTM!

The deletion function correctly filters by all columns of the composite key and returns the affected row count for observability. The early return for empty indices is efficient.

src/monitor/queue_monitor.rs (4)

299-310: Verify zkp_batch_index value in divergence struct.

The zkp_batch_index field is set to zkp_batch_idx (the loop-relative index starting from 0), while actual_zkp_idx is used elsewhere for cache operations. When logged via log_divergence, this will show the relative offset rather than the actual batch index.

If the intent is to show the actual ZKP batch index for debugging, consider using actual_zkp_idx:

             divergences.push(HashChainDivergence {
                 queue_info: QueueHashChainInfo {
                     queue_type,
                     tree_pubkey,
                     batch_index: pending_batch_index,
                     current_index: start_offset + (zkp_batch_idx as u64 * zkp_batch_size),
                 },
                 expected_hash_chain: *computed,
                 actual_hash_chain: *on_chain,
-                zkp_batch_index: zkp_batch_idx,
+                zkp_batch_index: actual_zkp_idx,
             });

327-346: Good cache invalidation strategy.

The two-phase validation approach ensures only verified chains are cached, and invalid cached entries are cleaned up. Logging errors without propagating is appropriate since cache operations shouldn't block the main verification flow.
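
At the cache-write site that pattern might look like the following (the argument list is illustrative, not the actual store_hash_chains_batch signature):

// Cache writes are best-effort: log failures, never abort verification.
if let Err(e) = store_hash_chains_batch(db, &verified_chains).await {
    error!("failed to cache verified hash chains: {}", e);
}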


404-413: LGTM - appropriate log level adjustment.

Incomplete batches are expected during normal operation, so debug level is appropriate to reduce noise while maintaining observability when needed.


237-245: Good documentation of the index offset discrepancy.

The comment clearly explains the 1-based vs 0-based indexing difference between on-chain batch.start_index and address_queues.queue_index. Using saturating_sub(1) is defensive against underflow. The referenced explanation in address.rs lines 51-55 confirms this pattern: address queues are 0-based while batch updates use 1-based indices because address trees have a pre-initialized zeroth element.

src/api/method/get_queue_elements.rs (4)

479-492: Nice improvement on hash decoding and error messages.

The new Hash::new(...).map_err(...) logic for account_hashes, tx_hashes, and nullifiers removes the previous unwrap() panics and adds precise error context (index and leaf_index). This is a solid robustness improvement for corrupted DB rows.

No changes requested.

Also applies to: 541-579


598-672: compute_state_queue_hash_chains batch semantics and validation look sound.

The function cleanly bails out on empty input or incomplete batches, validates hash/nullifier length, and differentiates queue types correctly. Together with the earlier truncation logic in fetch_queue, this keeps hash-chain computation aligned to full ZK batches.

No further changes needed here.


674-979: Overall address-queue flow is solid; main concerns are already covered above.

The address queue logic (limit checks, raw SQL construction via format_bytes, proof count validation, node deduplication, and hash-chain caching/persistence) is coherent and defensive. Apart from the tree-height casting and Poseidon::hashv error handling already called out, the rest of the function looks good.

No additional changes requested.


182-278: Top-level get_queue_elements orchestration looks correct and composable.

Input validation (at least one queue required), transaction scoping, per-queue fetching, and the merged StateQueueData/AddressQueueData response structure are all consistent and easy to extend. Error propagation via ? is clean.

No issues from this layer beyond the lower-level concerns already noted.

Comment on lines +123 to +136
// First, we need to collect all data to know total size
// For very large files, we could use unknown size (*) but that's more complex
let mut all_data = Vec::new();
while let Some(chunk_result) = byte_stream.next().await {
    let chunk = chunk_result?;
    all_data.extend_from_slice(&chunk);
}

let total_size = all_data.len() as u64;
info!(
    "Total upload size: {} bytes ({:.2} MB)",
    total_size,
    total_size as f64 / 1024.0 / 1024.0
);

⚠️ Potential issue | 🟠 Major

Collecting entire stream into memory defeats streaming benefits for large files.

The resumable upload is designed for large files, but collecting all data into memory first (line 125-128) could cause OOM for multi-GB uploads. Consider implementing true streaming with unknown total size (* in Content-Range) or chunked streaming with size tracking.

For true streaming without knowing total size upfront, GCS supports:

Content-Range: bytes 0-CHUNK_END/*

for intermediate chunks, and:

Content-Range: bytes OFFSET-LAST/*

for the final chunk (where the response confirms completion).

Alternatively, if the total size can be determined without buffering (e.g., from file metadata), pass it as a parameter.
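
A minimal sketch of the streaming alternative, assuming a `reqwest::Client` and an already-created `upload_uri`; GCS additionally requires intermediate chunks to be multiples of 256 KiB, and rechunking, retries, and resume handling are omitted here:

```rust
use anyhow::{anyhow, Context, Result};
use bytes::Bytes;
use futures_util::{Stream, StreamExt};
use reqwest::header::{CONTENT_LENGTH, CONTENT_RANGE};

/// Sketch only: streams non-empty chunks with `Content-Range: bytes OFFSET-END/*`
/// so the total size never has to be buffered up front.
async fn upload_streaming(
    client: &reqwest::Client,
    upload_uri: &str,
    mut byte_stream: impl Stream<Item = Result<Bytes>> + Unpin,
) -> Result<()> {
    let mut offset: u64 = 0;
    let mut pending: Option<Bytes> = None;
    loop {
        // One-chunk lookahead so the final chunk can be marked with the total.
        let next = byte_stream.next().await.transpose()?;
        let (chunk, is_last) = match (pending.take(), next) {
            (Some(cur), Some(n)) => {
                pending = Some(n);
                (cur, false)
            }
            (Some(cur), None) => (cur, true),
            (None, Some(n)) => {
                pending = Some(n);
                continue;
            }
            (None, None) => break,
        };
        let end = offset + chunk.len() as u64 - 1;
        // Intermediate chunks advertise an unknown total ("*"); the final chunk
        // sends the now-known total so GCS can finalize the object.
        let total = if is_last {
            (end + 1).to_string()
        } else {
            "*".to_string()
        };
        let resp = client
            .put(upload_uri)
            .header(CONTENT_LENGTH, chunk.len().to_string())
            .header(CONTENT_RANGE, format!("bytes {offset}-{end}/{total}"))
            .body(chunk)
            .send()
            .await
            .context("chunk upload failed")?;
        // GCS answers 308 ("resume incomplete") for accepted intermediate
        // chunks and 200/201 once the final chunk lands.
        if !(resp.status().is_success() || resp.status().as_u16() == 308) {
            return Err(anyhow!("unexpected upload status: {}", resp.status()));
        }
        offset = end + 1;
    }
    Ok(())
}
```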

🤖 Prompt for AI Agents
In src/snapshot/gcs_utils/resumable_upload.rs around lines 123-136, the current
code buffers the entire byte stream into a Vec (lines 125-128) which can OOM for
multi-GB uploads; instead implement true streaming: do not collect all_data —
read chunks from byte_stream and upload them incrementally using Resumable
Upload semantics (send intermediate PUTs with Content-Range: bytes
OFFSET-CHUNK_END/* and for the final chunk send Content-Range with the final
total or confirm completion based on the GCS response), or if total size is
available from metadata pass it and use full Content-Range; ensure you track the
current offset, mark the final chunk appropriately, handle partial-resume
responses and retries, and never hold the whole payload in memory.

Comment on lines +138 to +158
if total_size == 0 {
    // Handle empty file case
    let response = client
        .put(upload_uri)
        .header(CONTENT_LENGTH, "0")
        .header(CONTENT_RANGE, "bytes */*")
        .send()
        .await
        .context("Failed to upload empty file")?;

    if !response.status().is_success() {
        let status = response.status();
        let body = response.text().await.unwrap_or_default();
        return Err(anyhow!(
            "Failed to upload empty file: {} - {}",
            status,
            body
        ));
    }
    return Ok(());
}

⚠️ Potential issue | 🟡 Minor

Content-Range header for empty files should specify size.

For empty files where the total size is known to be 0, the Content-Range header should be bytes */0 rather than bytes */*. The */* format indicates unknown total size.

         let response = client
             .put(upload_uri)
             .header(CONTENT_LENGTH, "0")
-            .header(CONTENT_RANGE, "bytes */*")
+            .header(CONTENT_RANGE, "bytes */0")
             .send()
             .await
             .context("Failed to upload empty file")?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 if total_size == 0 {
     // Handle empty file case
     let response = client
         .put(upload_uri)
         .header(CONTENT_LENGTH, "0")
-        .header(CONTENT_RANGE, "bytes */*")
+        .header(CONTENT_RANGE, "bytes */0")
         .send()
         .await
         .context("Failed to upload empty file")?;
     if !response.status().is_success() {
         let status = response.status();
         let body = response.text().await.unwrap_or_default();
         return Err(anyhow!(
             "Failed to upload empty file: {} - {}",
             status,
             body
         ));
     }
     return Ok(());
 }
🤖 Prompt for AI Agents
In src/snapshot/gcs_utils/resumable_upload.rs around lines 138 to 158, the code
handling uploads for empty files sets the Content-Range header to "bytes */*"
which denotes unknown total size; change it to use the known total size "bytes
*/0" (or format as bytes */{total_size}) so the header explicitly specifies the
zero length, i.e. construct the Content-Range value from total_size (0) and send
"bytes */0" instead of "bytes */*".

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/snapshot/gcs_utils/resumable_upload.rs (2)

123-136: Collecting entire stream into memory defeats streaming benefits for large files.

This issue was previously flagged and remains unaddressed. The resumable upload is designed for large files, but collecting all data into memory first (lines 125-128) could cause OOM for multi-GB uploads.


138-158: Content-Range header for empty files should specify size.

This issue was previously flagged and remains unaddressed. For empty files where the total size is known to be 0, the Content-Range header should be bytes */0 rather than bytes */*.

🧹 Nitpick comments (3)
src/snapshot/gcs_utils/resumable_upload.rs (3)

177-183: Avoid unnecessary allocation.

The .to_vec() call creates an unnecessary copy of the chunk data. Since reqwest::Body can be created from a slice, you can avoid this allocation.

Apply this diff:

             let response = client
                 .put(upload_uri)
                 .header(CONTENT_LENGTH, chunk_data.len().to_string())
                 .header(CONTENT_RANGE, &content_range)
                 .header(CONTENT_TYPE, "application/octet-stream")
-                .body(chunk_data.to_vec())
+                .body(Bytes::copy_from_slice(chunk_data))
                 .send()
                 .await;

266-274: Progress logging might skip the final chunk.

The condition offset % (100 * 1024 * 1024) < CHUNK_SIZE as u64 might not log progress for the final chunk if it doesn't align with the 100MB boundary. Consider adding a final progress log after the loop completes.

Apply this diff after line 275:

         }
     }
 
+    // Log final progress
+    if total_size >= 100 * 1024 * 1024 {
+        info!(
+            "Upload progress: 100.0% ({:.2} MB / {:.2} MB)",
+            total_size as f64 / 1024.0 / 1024.0,
+            total_size as f64 / 1024.0 / 1024.0
+        );
+    }
+
     Ok(())
 }

359-426: LGTM! Correct OAuth2 service account flow implementation.

The JWT creation and token exchange follow GCP's service account authentication flow correctly. The RS256 algorithm and claim structure are appropriate.

Optional: Consider making the OAuth scope configurable.

The scope is currently hardcoded to devstorage.read_write (line 392). If this module might be used for operations requiring different scopes in the future, consider accepting it as a parameter or constant.
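
A sketch of the parameterized-scope variant of this flow (error handling and clock-skew margins simplified; the endpoint and grant type are GCP's standard service-account token exchange):

```rust
use anyhow::{Context, Result};
use jsonwebtoken::{encode, Algorithm, EncodingKey, Header};
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct Claims<'a> {
    iss: &'a str,   // service account email
    scope: &'a str, // passed in instead of hardcoding devstorage.read_write
    aud: &'a str,   // token endpoint
    exp: u64,
    iat: u64,
}

#[derive(Deserialize)]
struct TokenResponse {
    access_token: String,
}

async fn fetch_token(client_email: &str, private_key_pem: &str, scope: &str) -> Result<String> {
    let now = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)?
        .as_secs();
    let claims = Claims {
        iss: client_email,
        scope,
        aud: "https://oauth2.googleapis.com/token",
        exp: now + 3600,
        iat: now,
    };
    // RS256-signed JWT, exchanged for an access token via the jwt-bearer grant.
    let jwt = encode(
        &Header::new(Algorithm::RS256),
        &claims,
        &EncodingKey::from_rsa_pem(private_key_pem.as_bytes())?,
    )?;
    let resp: TokenResponse = reqwest::Client::new()
        .post("https://oauth2.googleapis.com/token")
        .form(&[
            ("grant_type", "urn:ietf:params:oauth:grant-type:jwt-bearer"),
            ("assertion", jwt.as_str()),
        ])
        .send()
        .await?
        .error_for_status()?
        .json()
        .await
        .context("token exchange failed")?;
    Ok(resp.access_token)
}
```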

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 950ccdd and 99b21e7.

📒 Files selected for processing (1)
  • src/snapshot/gcs_utils/resumable_upload.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (6)
src/snapshot/gcs_utils/resumable_upload.rs (6)

1-13: LGTM! Well-chosen constants for GCS resumable uploads.

The 8MB chunk size aligns with GCS recommendations (multiples of 256KB), and the retry configuration is appropriate.


15-38: LGTM! Clean orchestration of the resumable upload workflow.

The function properly delegates to specialized helpers and provides clear logging at each stage.


40-109: LGTM! Robust retry logic with exponential backoff.

The function correctly handles transient failures (5xx, 429) and extracts the upload URI from the Location header.


280-318: LGTM! Correct implementation of upload status query.

The function properly queries GCS for the current upload position and correctly parses the Range header to determine the resume offset.
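
For reference, a simplified sketch of that Range parsing; per GCS semantics, `Range: bytes=0-N` on a status response means N + 1 bytes are persisted, so the upload resumes at offset N + 1:

```rust
/// Returns 0 when no Range header is present (nothing persisted yet).
/// Header parsing is deliberately simplified for illustration.
fn resume_offset(range_header: Option<&str>) -> u64 {
    range_header
        .and_then(|v| v.strip_prefix("bytes=0-"))
        .and_then(|end| end.parse::<u64>().ok())
        .map(|end| end + 1)
        .unwrap_or(0)
}

fn main() {
    assert_eq!(resume_offset(Some("bytes=0-8388607")), 8_388_608); // 8 MiB stored
    assert_eq!(resume_offset(None), 0);
}
```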


320-337: LGTM! Appropriate fallback strategy for authentication.

The function correctly prioritizes service account credentials over the metadata service, providing flexibility for different deployment environments.


339-357: LGTM! Standard GCE metadata service token retrieval.

The implementation correctly queries the metadata service with appropriate headers and timeout.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (5)
src/api/method/get_queue_elements.rs (5)

35-43: [Duplicate] Debug assertion insufficient for production.

This debug_assert was previously flagged and remains unaddressed. In release builds, invalid level > tree_height will silently corrupt node indices.


47-49: [Duplicate] Overflow protection still missing.

Previously flagged: if tree_height >= 32, the expression 2_i64.pow(tree_height - 1) will overflow, causing a panic or wrap-around.


807-842: [Duplicate] Silent error suppression remains.

This was previously flagged: the break at line 836 silently stops proof path computation when Poseidon::hashv fails, potentially creating incomplete node maps and incorrect proofs. Error should be propagated instead.


808-808: [Duplicate] Tree height truncation risk persists.

Previously flagged: casting tree_info.height to u8 at lines 808, 818, and 832 can silently truncate values exceeding 255, leading to incorrect node index encoding.

Also applies to: 818-818, 832-832


994-994: [Duplicate] Tree height u8 casts continue.

Previously flagged: lines 994, 1002, and 1011 cast tree_height (u8) when calling encode_node_index. While tree_height is already u8 in this function's parameters, the upstream issue remains if callers pass truncated values.

Also applies to: 1002-1002, 1011-1011

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99b21e7 and 2230141.

📒 Files selected for processing (1)
  • src/api/method/get_queue_elements.rs (2 hunks)
🔇 Additional comments (10)
src/api/method/get_queue_elements.rs (10)

179-275: Well-structured orchestration.

The function clearly orchestrates per-queue fetching, merging, and transaction handling. The validation ensuring at least one queue is requested is appropriate.


277-327: LGTM.

The proof merging logic correctly collects and deduplicates nodes from both output and input queues.


329-593: Well-implemented queue fetching with proper error handling.

The function correctly handles different queue types, validates limits, and properly propagates errors. The Hash::new error handling at lines 477-488 now correctly collects Results instead of using unwrap(), addressing the previous review concern.


595-669: LGTM.

The hash chain computation logic correctly handles both OutputStateV2 and InputStateV2 queue types with proper error handling and fallback behavior.


671-806: Address queue logic is well-structured.

The function properly validates limits based on database backend, fetches queue elements, and retrieves non-inclusion proofs. The raw SQL queries use format_bytes for safe parameter handling.


843-976: Complex but correct hash chain caching.

The caching strategy with fallback computation is appropriate. The logic correctly handles batch boundaries and stores computed chains for future requests.


981-1024: Node deduplication logic is sound.

The function correctly walks proof paths, collects sibling and parent hashes from both proofs and the database, deduplicates by index, and returns a sorted result.


1026-1031: LGTM.

The indexed leaf hash computation is straightforward and properly propagates errors.


1069-1090: Database query and result mapping look correct.

The SeaORM query properly filters by tree and node indices, and the result mapping handles hash conversion errors appropriately.


1-30: Well-organized imports and constants.

The module structure is clear with appropriate external dependencies. The differentiated limits for SQLite vs. other databases (lines 28-29) reflect platform-specific constraints.

Comment on lines +1049 to +1063
let all_path_indices: Vec<i64> = leaf_indices
    .iter()
    .flat_map(|&leaf_idx| {
        let node_idx = leaf_index_to_node_index(leaf_idx as u32, tree_height);
        let mut path = vec![node_idx];
        let mut current = node_idx;
        while current > 1 {
            current >>= 1;
            path.push(current);
        }
        path
    })
    .sorted()
    .dedup()
    .collect();

⚠️ Potential issue | 🔴 Critical

Critical: u64 to u32 truncation in leaf index.

Line 1052 casts leaf_idx (u64) to u32 when calling leaf_index_to_node_index. This silently truncates leaf indices exceeding u32::MAX (~4.3 billion), corrupting the node index calculation and path fetching.

Apply this diff to validate the cast:

 let all_path_indices: Vec<i64> = leaf_indices
     .iter()
     .flat_map(|&leaf_idx| {
+        if leaf_idx > u32::MAX as u64 {
+            log::error!("Leaf index {} exceeds u32::MAX", leaf_idx);
+            return vec![];
+        }
         let node_idx = leaf_index_to_node_index(leaf_idx as u32, tree_height);

Or better, refactor leaf_index_to_node_index to accept u64 and validate internally, then update all call sites.

🤖 Prompt for AI Agents
In src/api/method/get_queue_elements.rs around lines 1049–1063 the code unsafely
casts leaf_idx (u64) to u32 before calling leaf_index_to_node_index which
truncates values > u32::MAX; change to avoid the cast by refactoring
leaf_index_to_node_index to accept u64 (or otherwise operate on u64) and perform
internal validation (or explicit try_into with an error return) so oversized
indices are handled rather than silently truncated, then update all call sites
to pass the u64 and handle the validation error where appropriate.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
src/api/method/get_queue_elements.rs (4)

31-49: Harden node index helpers against invalid heights and overflow

encode_node_index relies on debug_assert!, which is stripped in release builds, and leaf_index_to_node_index uses 2_i64.pow(tree_height - 1) with no bounds check. If tree_height is 0 or large (≥64), this can underflow/overflow and either panic (debug) or silently wrap in release. Because these helpers feed DB indices and proof construction, this is worth tightening.

Consider at minimum enforcing a safe range at runtime:

 #[inline]
 fn encode_node_index(level: u8, position: u64, tree_height: u8) -> u64 {
-    debug_assert!(
-        level <= tree_height,
-        "level {} > tree_height {}",
-        level,
-        tree_height
-    );
+    assert!(
+        level <= tree_height,
+        "level {} > tree_height {}",
+        level,
+        tree_height
+    );
     ((level as u64) << 56) | position
 }

 /// Convert leaf_index to node_index in binary tree format (root=1, children of N are 2N and 2N+1)
 #[inline]
 fn leaf_index_to_node_index(leaf_index: u32, tree_height: u32) -> i64 {
-    2_i64.pow(tree_height - 1) + leaf_index as i64
+    assert!(
+        tree_height > 0 && tree_height <= 63,
+        "tree_height {} out of supported range (1..=63)",
+        tree_height
+    );
+    2_i64
+        .pow(tree_height - 1)
+        .checked_add(leaf_index as i64)
+        .expect("leaf_index_to_node_index overflow")
 }

Or, if you prefer not to panic, make leaf_index_to_node_index return a Result<i64, PhotonApiError> and propagate errors from callers.
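
For reference, a tiny sketch of the packing scheme these helpers implement (the decode helper is hypothetical, not part of the PR, and only makes the layout explicit):

```rust
/// The packing: level in the top 8 bits, position in the low 56 bits.
fn encode_node_index(level: u8, position: u64) -> u64 {
    assert!(position < (1u64 << 56), "position overflows 56 bits");
    ((level as u64) << 56) | position
}

fn decode_node_index(index: u64) -> (u8, u64) {
    ((index >> 56) as u8, index & ((1u64 << 56) - 1))
}

fn main() {
    let idx = encode_node_index(3, 42);
    assert_eq!(decode_node_index(idx), (3, 42));
}
```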


479-596: Good handling of proofs and hash‑chain caching; watch tree height cast to u8

The proof handling and hash‑chain cache fallback are well done:

  • You validate generated_proofs.len() == indices.len().
  • StateQueueProofData holds both proofs and DB path nodes, enabling deduplicate_nodes_from_refs.
  • Hash‑chain caching uses expected_batch_count and falls back to local computation, truncating cached chains to the expected count.

One thing to re‑check: tree_height is stored as tree_info.height as u8 in StateQueueProofData and later passed into encode_node_index. If tree_info.height can ever exceed u8::MAX, this will silently truncate and corrupt indices. You may want to either:

  • Enforce a max height at ingest / TreeInfo creation, or
  • Validate here (returning a PhotonApiError::ValidationError if tree_info.height > u8::MAX as u32).

674-990: Address queue v2: fix zkp_batch_size == 0 and hash‑chain cache alignment

This block is generally well‑structured (per‑backend max_allowed, raw SQL with format_bytes, subtrees prefetch, and reuse of the hash‑chain cache), but there are a few correctness edge cases:

  1. zkp_batch_size == 0 semantics are inconsistent

    • In get_queue_elements, zkp_batch_size for the address queue is taken as‑is from the request or DEFAULT_ADDRESS_ZKP_BATCH_SIZE if None.
    • Unlike state queues, a value of 0 is not normalized and only errors in one code path (if zkp_batch_size == 0 inside the recompute branch).
    • If there is any cached data, or if addresses.is_empty(), a zkp_batch_size of 0 will not error, but will still influence expected_batch_count (which becomes 0), leading to odd behavior.

    It would be clearer to either reject zkp_batch_size == 0 up front or normalize it to the default, e.g.:

 let zkp_batch_size = req
-    .zkp_batch_size
-    .unwrap_or(DEFAULT_ADDRESS_ZKP_BATCH_SIZE as u16);
+    .zkp_batch_size
+    .filter(|v| *v > 0)
+    .unwrap_or(DEFAULT_ADDRESS_ZKP_BATCH_SIZE as u16);

and keep the `zkp_batch_size == 0` error path as a hard guard inside `fetch_address_queue_v2`.

  2. Cached hash‑chains may not align with the current request

In the cached branch:

```rust
if !cached.is_empty() && cached.len() >= expected_batch_count {
    let mut sorted = cached;
    sorted.sort_by_key(|c| c.zkp_batch_index);
    for entry in sorted {
        leaves_hash_chains.push(Hash::from(entry.hash_chain));
    }
}
```
  • expected_batch_count can be 0 (e.g., addresses.len() < zkp_batch_size), yet the condition still passes and you return all cached chains.
  • Even when expected_batch_count > 0, you return every cached chain rather than just the first expected_batch_count, whereas the state queue path uses .take(expected_batch_count).

This can cause leaves_hash_chains.len() to exceed the number of full batches implied by addresses.len() for this response.

Making this symmetric with the state queue logic would avoid over‑returning:

-    if !cached.is_empty() && cached.len() >= expected_batch_count {
+    if expected_batch_count > 0 && !cached.is_empty() && cached.len() >= expected_batch_count {
      log::debug!( ... );
      let mut sorted = cached;
      sorted.sort_by_key(|c| c.zkp_batch_index);
-        for entry in sorted {
-            leaves_hash_chains.push(Hash::from(entry.hash_chain));
-        }
+        for entry in sorted.into_iter().take(expected_batch_count) {
+            leaves_hash_chains.push(Hash::from(entry.hash_chain));
+        }
  } else if !addresses.is_empty() {
  3. Merkle index helpers and Poseidon errors

    • As in the state path, tree_info.height is cast to u8 in calls to encode_node_index (e.g., lines 821–822, 832, 846). If heights can exceed 255, this silently truncates. Same remediation as noted for tree_height above: validate or bound heights before casting.
    • In the Poseidon loop, Err(_) => break; silently suppresses parent‑hash failures and yields partial nodes_map. Propagating an error instead would prevent subtle proof corruption:

     match parent_hash {
         Ok(hash) => { ... }
-        Err(_) => {
-            break;
-        }
+        Err(e) => {
+            return Err(PhotonApiError::UnexpectedError(format!(
+                "Failed to compute parent hash at level {}: {}",
+                level, e
+            )));
+        }
     }

    

1040-1104: Fix potential u64→u32 truncation when building path indices

In fetch_path_nodes_from_db, the closure:

.flat_map(|&leaf_idx| {
    let node_idx = leaf_index_to_node_index(leaf_idx as u32, tree_height);
    ...
})

blindly casts leaf_idx: u64 to u32. If any leaf index exceeds u32::MAX, this truncates and you’ll compute and query the wrong path indices. Prior review already flagged this as critical; it’s still present.

A safer pattern is to validate and avoid the flat_map/iterator trick, so you can return a PhotonApiError on overflow:

-    let all_path_indices: Vec<i64> = leaf_indices
-        .iter()
-        .flat_map(|&leaf_idx| {
-            let node_idx = leaf_index_to_node_index(leaf_idx as u32, tree_height);
-            let mut path = vec![node_idx];
-            let mut current = node_idx;
-            while current > 1 {
-                current >>= 1;
-                path.push(current);
-            }
-            path
-        })
-        .sorted()
-        .dedup()
-        .collect();
+    let mut all_path_indices: Vec<i64> = Vec::new();
+    for &leaf_idx in leaf_indices {
+        let leaf_u32: u32 = leaf_idx.try_into().map_err(|_| {
+            PhotonApiError::ValidationError(format!(
+                "Leaf index {} exceeds u32::MAX and cannot be converted",
+                leaf_idx
+            ))
+        })?;
+        let mut node_idx = leaf_index_to_node_index(leaf_u32, tree_height);
+        all_path_indices.push(node_idx);
+        while node_idx > 1 {
+            node_idx >>= 1;
+            all_path_indices.push(node_idx);
+        }
+    }
+    all_path_indices.sort();
+    all_path_indices.dedup();

This removes the silent truncation and preserves the existing semantics.

🧹 Nitpick comments (5)
src/ingester/persist/indexed_merkle_tree/proof.rs (1)

259-359: Document duplicate address handling and explain BATCH_SIZE.

Two observations:

  1. Duplicate addresses: If the input values contains duplicates, the HashMap will only retain one entry per unique address (the last one processed). This behavior should be documented in the function's doc comment.

  2. BATCH_SIZE constant: The same unexplained magic number of 100 appears here. Consider documenting why this value was chosen or making it a module-level constant with a comment explaining the choice.

Add documentation:

 /// Optimized version for API use: Query the next smallest element for each input address.
 /// Returns a HashMap mapping INPUT ADDRESS -> range node model.
 /// This is O(1) lookup per address instead of O(n) scan in the caller.
+/// Note: If the input contains duplicate addresses, only one entry per unique address will be returned.
 pub async fn query_next_smallest_elements_by_address<T>(

Consider extracting the constant:

/// Batch size for chunked queries to avoid query plan explosion while maintaining efficiency
const QUERY_BATCH_SIZE: usize = 100;
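
A toy demonstration of the last-one-wins behavior the suggested doc comment describes:

```rust
use std::collections::HashMap;

fn main() {
    // With duplicate keys, the later insert overwrites the earlier one, so the
    // map keeps exactly one entry per unique address.
    let rows = [([1u8; 32], "first"), ([1u8; 32], "second")];
    let map: HashMap<[u8; 32], &str> = rows.into_iter().collect();
    assert_eq!(map.len(), 1);
    assert_eq!(map[&[1u8; 32]], "second");
}
```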
src/api/method/get_queue_elements.rs (4)

63-99: Request/response schemas look good; clarify Default semantics

The new QueueRequest, GetQueueElementsRequest, and GetQueueElementsResponse shapes look coherent and future‑proof (per‑queue options, deny_unknown_fields, camelCase).

One nuance: GetQueueElementsRequest derives Default, which yields an all‑zero tree and no queues, and will always hit the "At least one queue must be requested" validation. If Default is only for serde this is harmless, but if it might be used programmatically, consider either:

  • Dropping Default for GetQueueElementsRequest, or
  • Providing a manual Default that is clearly invalid only in tests (e.g., tree: Hash::default() but documented as such).

101-171: Data structures are well‑factored; consider documenting invariants

Node, StateQueueData, OutputQueueData, InputQueueData, and AddressQueueData neatly separate shared nodes from per‑queue payloads, and the serde configuration matches the API shape.

Given how tightly leaf_indices, account_hashes, leaves, and leaves_hash_chains must align by position for verification, it would help future maintainers/clients if you document the index invariants (e.g., “all vectors here are aligned by index”) in struct‑level doc comments.
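
A sketch of what such struct-level docs could look like; the field types here are simplified to byte arrays and may not match the real response types:

```rust
/// Invariant: `leaf_indices`, `account_hashes`, and `leaves` are aligned by
/// position (element i of each describes the same queue entry), and
/// `leaves_hash_chains[k]` covers the k-th full ZKP batch of those entries.
pub struct OutputQueueData {
    pub leaf_indices: Vec<u64>,
    pub account_hashes: Vec<[u8; 32]>,
    pub leaves: Vec<[u8; 32]>,
    pub leaves_hash_chains: Vec<[u8; 32]>,
}
```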


332-477: fetch_queue logic is solid; nice defensive hash handling

The per‑queue query construction and validation look good:

  • Limits are enforced (limit > MAX_QUEUE_ELEMENTS).
  • Input queue filters (NullifierQueueIndex.is_not_null, NullifiedInTree = false, Spent = true) are clear.
  • Start index + pruning check (first_queue_index > start_index) is explicit.

Also, the rework around Hash::new for account_hashes is much safer than the previous unwrap, with precise error messages including idx and leaf_index.

No issues here; just one minor thought: zkp_batch_size is always >0 given the .filter(|v| *v > 0).unwrap_or(DEFAULT_ZKP_BATCH_SIZE) logic, so the if zkp_batch_size > 0 guard is redundant and could be removed for clarity, but that’s purely cosmetic.


598-672: compute_state_queue_hash_chains behavior is correct; small cleanups possible

The batch computation logic is consistent with how you truncate queues:

  • Early‑returns on empty input or when queue_elements.len() / zkp_batch_size == 0.
  • Enforces 32‑byte hash/nullifier lengths with clear error messages.
  • Uses create_hash_chain_from_slice and wraps any failure into PhotonApiError.

Two minor nits:

  • You re‑use create_hash_chain_from_slice inside the function even though it’s already imported at the top of the file.
  • Since this helper assumes zkp_batch_size > 0 and queue_elements.len() is a multiple of that (as enforced by fetch_queue), adding an internal debug_assert!(zkp_batch_size > 0) would make those assumptions explicit.
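
A stripped-down sketch of that batch layout, assuming inputs already truncated to a multiple of zkp_batch_size, with a generic hash_chain closure standing in for create_hash_chain_from_slice:

```rust
fn compute_hash_chains<F>(
    elements: &[[u8; 32]],
    zkp_batch_size: usize,
    hash_chain: F,
) -> Result<Vec<[u8; 32]>, String>
where
    F: Fn(&[[u8; 32]]) -> Result<[u8; 32], String>,
{
    // Makes the assumed invariant explicit, as suggested above.
    debug_assert!(zkp_batch_size > 0);
    elements
        .chunks_exact(zkp_batch_size) // only full batches yield a chain
        .map(hash_chain)
        .collect()
}
```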
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2230141 and 711c47b.

📒 Files selected for processing (2)
  • src/api/method/get_queue_elements.rs (2 hunks)
  • src/ingester/persist/indexed_merkle_tree/proof.rs (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/ingester/persist/indexed_merkle_tree/proof.rs (2)
src/common/mod.rs (1)
  • format_bytes (157-164)
tests/integration_tests/utils.rs (1)
  • format (269-270)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (4)
src/ingester/persist/indexed_merkle_tree/proof.rs (2)

182-184: LGTM! Good optimization.

Early return for empty input avoids unnecessary database operations.


374-390: LGTM! Efficient optimization.

The refactor correctly uses the new query_next_smallest_elements_by_address function to achieve O(1) address lookup instead of scanning through results. The comments clearly explain the performance benefit.

src/api/method/get_queue_elements.rs (2)

182-278: get_queue_elements orchestration is clear; transaction handling looks sound

The top‑level orchestration reads clean: you validate that at least one queue is requested, open a transaction, fetch per‑queue data, merge state proofs, then commit. Error propagation (especially around DB and hash‑chain cache access) is consistent, and skip_serializing_if on optional queues keeps the payloads lean.

No functional concerns here from my side.


1040-1045: compute_indexed_leaf_hash is clean and well‑wrapped

The indexed‑leaf hashing helper is nicely encapsulated:

  • Uses Poseidon::hashv directly on the inner [u8; 32] data.
  • Wraps any Poseidon error into PhotonApiError::UnexpectedError with context.
  • Returns a Hash newtype, keeping the rest of the code clean.

No changes needed here.

Comment on lines +280 to +330
fn merge_state_queue_proofs(
    output_proof_data: &Option<StateQueueProofData>,
    input_proof_data: &Option<StateQueueProofData>,
) -> Result<(Vec<Node>, Hash, u64), PhotonApiError> {
    let mut all_proofs: Vec<&crate::ingester::persist::MerkleProofWithContext> = Vec::new();
    let mut all_path_nodes: HashMap<i64, Hash> = HashMap::new();
    let mut tree_height: Option<u8> = None;
    let mut initial_root: Option<Hash> = None;
    let mut root_seq: Option<u64> = None;

    // Collect proofs and path nodes from output queue
    if let Some(ref proof_data) = output_proof_data {
        tree_height = Some(proof_data.tree_height);
        all_path_nodes.extend(proof_data.path_nodes.clone());
        for proof in &proof_data.proofs {
            if initial_root.is_none() {
                initial_root = Some(proof.root.clone());
                root_seq = Some(proof.root_seq);
            }
            all_proofs.push(proof);
        }
    }

    // Collect proofs and path nodes from input queue
    if let Some(ref proof_data) = input_proof_data {
        if tree_height.is_none() {
            tree_height = Some(proof_data.tree_height);
        }
        all_path_nodes.extend(proof_data.path_nodes.clone());
        for proof in &proof_data.proofs {
            if initial_root.is_none() {
                initial_root = Some(proof.root.clone());
                root_seq = Some(proof.root_seq);
            }
            all_proofs.push(proof);
        }
    }

    if all_proofs.is_empty() || tree_height.is_none() {
        return Ok((Vec::new(), Hash::default(), 0));
    }

    let height = tree_height.unwrap();
    let nodes = deduplicate_nodes_from_refs(&all_proofs, height, &all_path_nodes);

    Ok((
        nodes,
        initial_root.unwrap_or_default(),
        root_seq.unwrap_or_default(),
    ))
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check the file size and read the function in context
wc -l src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 105


🏁 Script executed:

# Read the relevant section of the file
cat -n src/api/method/get_queue_elements.rs | sed -n '270,340p'

Repository: Lightprotocol/photon

Length of output: 2793


🏁 Script executed:

# Search for StateQueueProofData definition
rg -A 10 'struct StateQueueProofData|pub struct StateQueueProofData' --type rs

Repository: Lightprotocol/photon

Length of output: 90


🏁 Script executed:

# Find callers of merge_state_queue_proofs
rg 'merge_state_queue_proofs' --type rs -B 2 -A 2

Repository: Lightprotocol/photon

Length of output: 90


🏁 Script executed:

# Check the PhotonApiError type definition
rg -A 20 'enum PhotonApiError|pub enum PhotonApiError' --type rs

Repository: Lightprotocol/photon

Length of output: 90


🏁 Script executed:

# Search for StateQueueProofData definition without type filter
rg 'struct StateQueueProofData|pub struct StateQueueProofData' -A 10

Repository: Lightprotocol/photon

Length of output: 824


🏁 Script executed:

# Find callers of merge_state_queue_proofs
rg 'merge_state_queue_proofs'

Repository: Lightprotocol/photon

Length of output: 244


🏁 Script executed:

# Check PhotonApiError definition
rg 'enum PhotonApiError|pub enum PhotonApiError' -A 30

Repository: Lightprotocol/photon

Length of output: 1656


🏁 Script executed:

# Search for how this function is used in context
rg -B 5 -A 5 'merge_state_queue_proofs\('

Repository: Lightprotocol/photon

Length of output: 1598


🏁 Script executed:

# Search for MerkleProofWithContext structure and its root field
rg 'struct MerkleProofWithContext|pub struct MerkleProofWithContext' -A 15

Repository: Lightprotocol/photon

Length of output: 1320


🏁 Script executed:

# Search for deduplicate_nodes_from_refs to see its assumptions
rg 'fn deduplicate_nodes_from_refs|pub fn deduplicate_nodes_from_refs' -A 20

Repository: Lightprotocol/photon

Length of output: 1742


🏁 Script executed:

# Look for any documentation or comments about tree_height invariants
rg 'tree_height|root_seq' -B 2 -A 2 | head -100

Repository: Lightprotocol/photon

Length of output: 6489


🏁 Script executed:

# Check if there are any comments or documentation about output/input queue expectations
rg 'output.*input|input.*output' -B 3 -A 3 src/api/method/get_queue_elements.rs | head -80

Repository: Lightprotocol/photon

Length of output: 1545


🏁 Script executed:

# Look at how output_proof_data and input_proof_data are created to understand their source
rg 'output_proof_data|input_proof_data' -B 5 src/api/method/get_queue_elements.rs | head -100

Repository: Lightprotocol/photon

Length of output: 1257


Add explicit consistency checks for tree_height, root, and root_seq in merge_state_queue_proofs

The function's state queue data documentation explicitly states that initial_root is "shared by output and input queues," yet the current implementation silently uses first-seen values when merging output and input proof data. If these independently-fetched queues represent different state versions (mismatched tree_height, root, or root_seq), the result would be an incoherent StateQueueData.

Since deduplicate_nodes_from_refs uses a single tree_height for all proofs, mismatched heights could produce incorrect node calculations. Add explicit validation to fail fast:

    if let Some(ref proof_data) = input_proof_data {
-        if tree_height.is_none() {
-            tree_height = Some(proof_data.tree_height);
-        }
+        if let Some(h) = tree_height {
+            if h != proof_data.tree_height {
+                return Err(PhotonApiError::UnexpectedError(format!(
+                    "Mismatched tree heights between output and input proofs: {} vs {}",
+                    h, proof_data.tree_height
+                )));
+            }
+        } else {
+            tree_height = Some(proof_data.tree_height);
+        }
        ...
        for proof in &proof_data.proofs {
-            if initial_root.is_none() {
-                initial_root = Some(proof.root.clone());
-                root_seq = Some(proof.root_seq);
-            }
+            if let Some(ref root) = initial_root {
+                if *root != proof.root {
+                    return Err(PhotonApiError::UnexpectedError(
+                        "Mismatched initial_root across state queue proofs".to_string(),
+                    ));
+                }
+            } else {
+                initial_root = Some(proof.root.clone());
+                root_seq = Some(proof.root_seq);
+            }
            all_proofs.push(proof);
        }
    }
🤖 Prompt for AI Agents
In src/api/method/get_queue_elements.rs around lines 280 to 330, the
merge_state_queue_proofs function currently picks the first-seen tree_height,
initial_root and root_seq from output/input without validating they match; add
explicit consistency checks so that when both output_proof_data and
input_proof_data are Some you compare their tree_height values and their
roots/root_seq (or compare each incoming proof's root/root_seq against the
chosen initial_root) and return an Err(PhotonApiError::... /* choose an
appropriate error variant with a clear message */) if any mismatch is found;
ensure the function only proceeds to call deduplicate_nodes_from_refs when a
single consistent tree_height is established and both queues agree on
initial_root and root_seq, otherwise bail fast with the error.

Comment on lines +992 to +1038
/// Deduplicate nodes across all merkle proofs using pre-fetched path nodes from DB.
/// Returns a Vec<Node> sorted by index.
/// Uses path_nodes (DB node_idx -> hash) for parent hashes instead of computing them.
fn deduplicate_nodes_from_refs(
    proofs: &[&crate::ingester::persist::MerkleProofWithContext],
    tree_height: u8,
    path_nodes: &HashMap<i64, Hash>,
) -> Vec<Node> {
    let mut nodes_map: HashMap<u64, Hash> = HashMap::new();
    let tree_height_u32 = tree_height as u32 + 1;

    for proof_ctx in proofs {
        let mut pos = proof_ctx.leaf_index as u64;
        let mut db_node_idx = leaf_index_to_node_index(proof_ctx.leaf_index, tree_height_u32);

        // Store the leaf itself
        let leaf_idx = encode_node_index(0, pos, tree_height);
        nodes_map.insert(leaf_idx, proof_ctx.hash.clone());

        // Walk up the proof path, storing sibling hashes and path node hashes from DB
        for (level, sibling_hash) in proof_ctx.proof.iter().enumerate() {
            let sibling_pos = if pos % 2 == 0 { pos + 1 } else { pos - 1 };

            // Store the sibling (from proof)
            let sibling_idx = encode_node_index(level as u8, sibling_pos, tree_height);
            nodes_map.insert(sibling_idx, sibling_hash.clone());

            // Move to parent
            db_node_idx >>= 1;
            pos /= 2;

            // Store the parent hash from DB (if available)
            if let Some(parent_hash) = path_nodes.get(&db_node_idx) {
                let parent_idx = encode_node_index((level + 1) as u8, pos, tree_height);
                nodes_map.insert(parent_idx, parent_hash.clone());
            }
        }
    }

    let mut sorted_nodes: Vec<(u64, Hash)> = nodes_map.into_iter().collect();
    sorted_nodes.sort_by_key(|(idx, _)| *idx);

    sorted_nodes
        .into_iter()
        .map(|(index, hash)| Node { index, hash })
        .collect()
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "get_queue_elements.rs" -type f

Repository: Lightprotocol/photon

Length of output: 102


🏁 Script executed:

git ls-files | grep -i "get_queue_elements"

Repository: Lightprotocol/photon

Length of output: 100


🏁 Script executed:

git ls-files | grep "src/api" | head -20

Repository: Lightprotocol/photon

Length of output: 1070


🏁 Script executed:

git ls-files | head -30

Repository: Lightprotocol/photon

Length of output: 1146


🏁 Script executed:

find . -name "*.rs" -path "*/api/*" -type f 2>/dev/null | head -20

Repository: Lightprotocol/photon

Length of output: 1150


🏁 Script executed:

find . -name "*.rs" -type f 2>/dev/null | grep -E "(api|queue)" | head -20

Repository: Lightprotocol/photon

Length of output: 1150


🏁 Script executed:

ls -la . && find . -maxdepth 2 -type d

Repository: Lightprotocol/photon

Length of output: 1254


🏁 Script executed:

rg "deduplicate_nodes_from_refs" 2>/dev/null

Repository: Lightprotocol/photon

Length of output: 252


🏁 Script executed:

rg "leaf_index_to_node_index" 2>/dev/null

Repository: Lightprotocol/photon

Length of output: 1726


🏁 Script executed:

find . -type f -name "*.rs" 2>/dev/null | head -10

Repository: Lightprotocol/photon

Length of output: 576


🏁 Script executed:

test -f "src/api/method/get_queue_elements.rs" && echo "File exists" || echo "File does not exist"

Repository: Lightprotocol/photon

Length of output: 75


🏁 Script executed:

pwd && ls -la && git status

Repository: Lightprotocol/photon

Length of output: 1076


🏁 Script executed:

git rev-parse --show-toplevel

Repository: Lightprotocol/photon

Length of output: 82


🏁 Script executed:

wc -l src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 105


🏁 Script executed:

sed -n '992,1038p' src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 1969


🏁 Script executed:

sed -n '980,1050p' src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 2757


🏁 Script executed:

grep -n "deduplicate_nodes_from_refs" src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 186


🏁 Script executed:

sed -n '900,950p' src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 1923


🏁 Script executed:

grep -B 50 "deduplicate_nodes_from_refs" src/api/method/get_queue_elements.rs | head -80

Repository: Lightprotocol/photon

Length of output: 2987


🏁 Script executed:

grep -n "fn leaf_index_to_node_index" src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 138


🏁 Script executed:

sed -n '1039,1080p' src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 1396


🏁 Script executed:

grep -n "fn encode_node_index" src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 139


🏁 Script executed:

sed -n '1041,1055p' src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 643


🏁 Script executed:

tail -150 src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 5016


🏁 Script executed:

grep -n "merge_state_queue_proofs" src/api/method/get_queue_elements.rs

🏁 Script executed:

python3 << 'EOF'
# Let's understand if leaf_index_to_node_index can fail with overflow
# leaf_index: u32, tree_height: u32
# Operation: (leaf_index + 2^tree_height) as i64
# This could overflow if tree_height is large enough

# Max u32 = 4,294,967,295
# Max tree_height that fits: log2(i64::MAX - u32::MAX) ≈ 32

leaf_index_max = 2**32 - 1  # 4,294,967,295
tree_height_example = 32
calculation = leaf_index_max + 2**tree_height_example
i64_max = 2**63 - 1

print(f"leaf_index max: {leaf_index_max}")
print(f"2^32: {2**32}")
print(f"Calculation with tree_height=32: {calculation}")
print(f"i64::MAX: {i64_max}")
print(f"Would overflow: {calculation > i64_max}")

# With tree_height = 31
tree_height_31 = 31
calc_31 = leaf_index_max + 2**tree_height_31
print(f"\nWith tree_height=31: {calc_31}")
print(f"Would overflow: {calc_31 > i64_max}")
EOF

Repository: Lightprotocol/photon

Length of output: 150


🏁 Script executed:

sed -n '900,990p' src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 3149


🏁 Script executed:

grep -B 5 "fn merge_state_queue_proofs" src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 146


🏁 Script executed:

sed -n '820,850p' src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 1393


🏁 Script executed:

sed -n '920,990p' src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 2458


🏁 Script executed:

sed -n '1041,1055p' src/api/method/get_queue_elements.rs

Repository: Lightprotocol/photon

Length of output: 643


Make deduplicate_nodes_from_refs fallible to propagate index calculation errors

deduplicate_nodes_from_refs correctly deduplicates nodes across proofs using DB-fetched path nodes, but currently ignores potential overflow in leaf_index_to_node_index and encode_node_index. The former shifts 1i64 by tree_height bits (unbounded u32), and the latter shifts by level bits; both can overflow with invalid inputs.

Change the signature to return Result<Vec<Node>, PhotonApiError> and propagate errors from leaf_index_to_node_index with ? operators. Since merge_state_queue_proofs already returns Result, error propagation integrates naturally.
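
A minimal sketch of the fallible variant, with String standing in for PhotonApiError and a bound chosen so the shift cannot overflow i64; callers propagate with `?` instead of risking a panic:

```rust
fn leaf_index_to_node_index(leaf_index: u32, tree_height: u32) -> Result<i64, String> {
    // Reject heights whose shift would overflow or reach the sign bit.
    if tree_height == 0 || tree_height > 62 {
        return Err(format!("tree_height {tree_height} out of range (1..=62)"));
    }
    (1i64 << (tree_height - 1))
        .checked_add(leaf_index as i64)
        .ok_or_else(|| "node index overflow".to_string())
}

fn main() {
    assert_eq!(leaf_index_to_node_index(5, 3), Ok(9)); // 2^(3-1) + 5
    assert!(leaf_index_to_node_index(0, 0).is_err());
}
```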

🤖 Prompt for AI Agents
In src/api/method/get_queue_elements.rs around lines 992 to 1038, make
deduplicate_nodes_from_refs return Result<Vec<Node>, PhotonApiError> and
propagate index-calculation errors instead of ignoring potential overflows:
update the function signature to return Result<_, PhotonApiError>, replace
direct calls to leaf_index_to_node_index and encode_node_index with their
fallible variants (or call them and use ?), and propagate any returned errors
with the ? operator; ensure all internal uses that can fail
(leaf_index_to_node_index, encode_node_index) are handled via ? and update
callers (e.g. merge_state_queue_proofs) to accept the Result return so errors
bubble up.

Comment on lines 186 to 216
 let response = match txn_or_conn.get_database_backend() {
-    // HACK: I am executing SQL queries one by one in a loop because I am getting a weird syntax
-    // error when I am using parentheses.
     DatabaseBackend::Postgres => {
-        let sql_statements = values.iter().map(|value| {
-            format!(
-                "( SELECT * FROM indexed_trees WHERE tree = {} AND value < {} ORDER BY value DESC LIMIT 1 )",
-                format_bytes(tree.clone(), txn_or_conn.get_database_backend()),
-                format_bytes(value.clone(), txn_or_conn.get_database_backend())
-            )
-        });
-        let full_query = sql_statements.collect::<Vec<String>>().join(" UNION ALL ");
-        txn_or_conn
-            .query_all(Statement::from_string(
-                txn_or_conn.get_database_backend(),
-                full_query,
-            ))
-            .await
-            .map_err(|e| {
-                IngesterError::DatabaseError(format!("Failed to execute indexed query: {e}"))
-            })?
+        // Batch queries in chunks to avoid query plan explosion
+        // Each chunk uses UNION ALL which PostgreSQL optimizes well with index scans
+        const BATCH_SIZE: usize = 100;
+        let tree_bytes = format_bytes(tree.clone(), txn_or_conn.get_database_backend());
+        let mut all_results = vec![];
+
+        for chunk in values.chunks(BATCH_SIZE) {
+            let sql_statements = chunk.iter().map(|value| {
+                format!(
+                    "(SELECT * FROM indexed_trees WHERE tree = {} AND value < {} ORDER BY value DESC LIMIT 1)",
+                    tree_bytes,
+                    format_bytes(value.clone(), txn_or_conn.get_database_backend())
+                )
+            });
+            let full_query = sql_statements.collect::<Vec<String>>().join(" UNION ALL ");
+
+            let chunk_results = txn_or_conn
+                .query_all(Statement::from_string(
+                    txn_or_conn.get_database_backend(),
+                    full_query,
+                ))
+                .await
+                .map_err(|e| {
+                    IngesterError::DatabaseError(format!("Failed to execute indexed query: {e}"))
+                })?;
+            all_results.extend(chunk_results);
+        }
+        all_results
     }

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for similar batching patterns, UNION ALL usage, and BATCH_SIZE references
rg -n "UNION ALL|BATCH_SIZE|= ANY" --type rust -B 3 -A 3

Repository: Lightprotocol/photon

Length of output: 8481


🏁 Script executed:

# Check for comments or documentation about performance decisions in the file
rg -n "batch|performance|query plan|PostgreSQL" src/ingester/persist/indexed_merkle_tree/proof.rs -i -B 2 -A 2

Repository: Lightprotocol/photon

Length of output: 1566


🏁 Script executed:

# Look for git history or comments explaining why BATCH_SIZE=100 was chosen
rg -n "100" src/ingester/persist/indexed_merkle_tree/proof.rs -B 5 -A 5

Repository: Lightprotocol/photon

Length of output: 1352


🏁 Script executed:

# Search for similar query construction patterns in the codebase
rg -n "query_all|Statement::from_string" --type rust -B 2 -A 2 | head -50

Repository: Lightprotocol/photon

Length of output: 3714


Document the BATCH_SIZE=100 rationale and verify UNION ALL is optimal.

The comment about avoiding "query plan explosion" alludes to a real PostgreSQL concern (query planning complexity grows with UNION size), but the specific BATCH_SIZE=100 choice lacks justification. Add a comment explaining:

  • Why 100 is the threshold (query plan size limits, parameter constraints, or performance testing results)
  • Why UNION ALL is preferred over alternatives like = ANY(ARRAY[...]) or VALUES clauses with JOINs

This pattern appears in at least two functions in this file (lines 190 and 282), so documenting the rationale once would clarify the design decision across the codebase.
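
For comparison, a hedged sketch (assuming PostgreSQL and the indexed_trees schema shown above) of one LATERAL alternative; `= ANY(ARRAY[...])` alone only gives a membership test and cannot express a per-value ORDER BY ... LIMIT 1:

```rust
/// Sketch only: "greatest row below each input value" in a single statement.
/// $1 = tree bytes, $2 = array of input values.
const NEXT_SMALLEST_LATERAL_SQL: &str = r#"
SELECT v.value AS input_value, t.*
FROM unnest($2::bytea[]) AS v(value)
CROSS JOIN LATERAL (
    SELECT *
    FROM indexed_trees
    WHERE indexed_trees.tree = $1 AND indexed_trees.value < v.value
    ORDER BY indexed_trees.value DESC
    LIMIT 1
) t
"#;

fn main() {
    println!("{NEXT_SMALLEST_LATERAL_SQL}");
}
```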

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (8)
src/api/method/get_queue_elements.rs (7)

35-43: Critical: debug_assert! stripped in release builds, allowing silent data corruption.

This issue was previously flagged but remains unaddressed. The debug_assert! at lines 36-41 is removed in release builds, so invalid inputs (level > tree_height) can silently corrupt node indices in production.

Apply this diff to add runtime validation:

 #[inline]
 fn encode_node_index(level: u8, position: u64, tree_height: u8) -> u64 {
-    debug_assert!(
+    assert!(
         level <= tree_height,
         "level {} > tree_height {}",
         level,
         tree_height
     );
     ((level as u64) << 56) | position
 }

47-49: Critical: Integer overflow risk for tree_height >= 32.

This issue was previously flagged but remains unaddressed. If tree_height >= 32, the expression 2_i64.pow(tree_height - 1) will overflow i64, causing a panic in debug mode or wrap-around in release mode.

Apply this diff to add validation:

 #[inline]
 fn leaf_index_to_node_index(leaf_index: u32, tree_height: u32) -> i64 {
+    assert!(
+        tree_height > 0 && tree_height < 32,
+        "tree_height {} would overflow i64 or is invalid",
+        tree_height
+    );
     2_i64.pow(tree_height - 1) + leaf_index as i64
 }

277-327: Major: Missing consistency checks when merging output and input queue proofs.

This issue was previously flagged but remains unaddressed. The function silently uses first-seen values for tree_height, initial_root, and root_seq without validating that output and input queues agree. If independently-fetched queues represent different state versions, the result is an incoherent StateQueueData.

Since deduplicate_nodes_from_refs uses a single tree_height for all proofs, mismatched heights produce incorrect node calculations.

Apply this diff to add explicit validation:

     if let Some(ref proof_data) = input_proof_data {
-        if tree_height.is_none() {
-            tree_height = Some(proof_data.tree_height);
-        }
+        if let Some(h) = tree_height {
+            if h != proof_data.tree_height {
+                return Err(PhotonApiError::UnexpectedError(format!(
+                    "Mismatched tree heights between output and input proofs: {} vs {}",
+                    h, proof_data.tree_height
+                )));
+            }
+        } else {
+            tree_height = Some(proof_data.tree_height);
+        }
         all_path_nodes.extend(proof_data.path_nodes.clone());
         for proof in &proof_data.proofs {
-            if initial_root.is_none() {
-                initial_root = Some(proof.root.clone());
-                root_seq = Some(proof.root_seq);
-            }
+            if let Some(ref root) = initial_root {
+                if *root != proof.root {
+                    return Err(PhotonApiError::UnexpectedError(
+                        "Mismatched initial_root across state queue proofs".to_string(),
+                    ));
+                }
+            } else {
+                initial_root = Some(proof.root.clone());
+                root_seq = Some(proof.root_seq);
+            }
             all_proofs.push(proof);
         }
     }

815-816: Critical: Potential data loss from u8 cast of tree height.

This issue was previously flagged but remains unaddressed. Lines 816, 826, and 840 cast tree_info.height to u8. If the tree height exceeds 255, this silently truncates the value, leading to incorrect node index encoding.

Apply validation before calling encode_node_index:

+    if tree_info.height > u8::MAX as u32 {
+        return Err(PhotonApiError::ValidationError(format!(
+            "Tree height {} exceeds maximum supported height of {}",
+            tree_info.height, u8::MAX
+        )));
+    }
+    let tree_height_u8 = tree_info.height as u8;
+
     // ... later in the code:
     let leaf_idx =
-        encode_node_index(0, proof.lowElementLeafIndex as u64, tree_info.height as u8);
+        encode_node_index(0, proof.lowElementLeafIndex as u64, tree_height_u8);
     
     // ... and update all other call sites similarly

Also applies to: 826-826, 839-840
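
An equivalent guard via try_from makes the narrowing fallible at the cast site rather than relying on a separate comparison; a self-contained sketch, where the String error stands in for PhotonApiError::ValidationError:

// Sketch: try_from makes the u32 -> u8 narrowing fallible instead of
// silently truncating with `as u8`.
fn narrow_tree_height(height: u32) -> Result<u8, String> {
    u8::try_from(height).map_err(|_| {
        format!(
            "Tree height {} exceeds maximum supported height of {}",
            height,
            u8::MAX
        )
    })
}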


843-846: Minor: Silent error suppression in hash computation loop.

This issue was previously flagged but remains unaddressed. Line 843 uses break when Poseidon::hashv returns an error, silently stopping the proof path computation without propagating the error. This could lead to incomplete node maps and incorrect proofs.

Consider propagating the error instead:

             match parent_hash {
                 Ok(hash) => {
                     current_hash = Hash::from(hash);
                     let parent_pos = pos / 2;
                     let parent_idx =
                         encode_node_index((level + 1) as u8, parent_pos, tree_info.height as u8);
                     nodes_map.insert(parent_idx, current_hash.clone());
                 }
-                Err(_) => {
-                    break;
+                Err(e) => {
+                    return Err(PhotonApiError::UnexpectedError(format!(
+                        "Failed to compute parent hash at level {}: {}", level, e
+                    )));
                 }
             }

988-1031: Major: deduplicate_nodes_from_refs should be fallible to propagate index calculation errors.

This issue was previously flagged but remains unaddressed. The function calls leaf_index_to_node_index and encode_node_index, both of which can overflow with invalid inputs. The current signature returns Vec<Node> directly, preventing error propagation.

Change the signature to return Result<Vec<Node>, PhotonApiError> and propagate errors from index calculations.

Apply this diff:

-fn deduplicate_nodes_from_refs(
+fn deduplicate_nodes_from_refs(
     proofs: &[&crate::ingester::persist::MerkleProofWithContext],
     tree_height: u8,
     path_nodes: &HashMap<i64, Hash>,
-) -> Vec<Node> {
+) -> Result<Vec<Node>, PhotonApiError> {
     let mut nodes_map: HashMap<u64, Hash> = HashMap::new();
     let tree_height_u32 = tree_height as u32 + 1;
 
     for proof_ctx in proofs {
         let mut pos = proof_ctx.leaf_index as u64;
         let mut db_node_idx = leaf_index_to_node_index(proof_ctx.leaf_index, tree_height_u32);
         // ... rest of function
     }
 
-    sorted_nodes
+    Ok(sorted_nodes
         .into_iter()
         .map(|(index, hash)| Node { index, hash })
-        .collect()
+        .collect())
 }

Then update the caller at line 320:

-    let nodes = deduplicate_nodes_from_refs(&all_proofs, height, &all_path_nodes);
+    let nodes = deduplicate_nodes_from_refs(&all_proofs, height, &all_path_nodes)?;

1056-1070: Critical: u64 to u32 truncation in leaf index.

This issue was previously flagged but remains unaddressed. Line 1059 casts leaf_idx (u64) to u32 when calling leaf_index_to_node_index. This silently truncates leaf indices exceeding u32::MAX (~4.3 billion), corrupting the node index calculation and path fetching.

Apply this diff to validate the cast:

 let all_path_indices: Vec<i64> = leaf_indices
     .iter()
     .flat_map(|&leaf_idx| {
+        if leaf_idx > u32::MAX as u64 {
+            log::error!("Leaf index {} exceeds u32::MAX", leaf_idx);
+            return vec![];
+        }
         let node_idx = leaf_index_to_node_index(leaf_idx as u32, tree_height);

Or better, refactor leaf_index_to_node_index to accept u64 and validate internally, then update all call sites.
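
A sketch of that refactor — accept u64 and validate internally, so every call site shares one guard; the error type and messages are illustrative only:

fn leaf_index_to_node_index(leaf_index: u64, tree_height: u32) -> Result<i64, String> {
    let exp = tree_height
        .checked_sub(1)
        .ok_or_else(|| "tree_height must be > 0".to_string())?;
    let base = 2_i64
        .checked_pow(exp)
        .ok_or_else(|| format!("tree_height {} overflows i64", tree_height))?;
    let leaf = i64::try_from(leaf_index)
        .map_err(|_| format!("leaf_index {} exceeds i64::MAX", leaf_index))?;
    base.checked_add(leaf)
        .ok_or_else(|| format!("node index overflows i64 for leaf_index {}", leaf_index))
}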

src/ingester/persist/indexed_merkle_tree/proof.rs (1)

182-217: Early-return + batched Postgres path look good; BATCH_SIZE rationale still undocumented.

The empty-input guard (Lines 182–184) is a nice win: it avoids building SQL when there’s nothing to query and keeps behavior consistent with the previous “returns empty map” semantics.

The Postgres batching logic (Lines 188–217) looks correct: it reuses a preformatted tree literal, chunks inputs, and UNION‑ALLs per‑value SELECT … ORDER BY value DESC LIMIT 1, then materializes models from the combined rows. Error handling is also consistent with the SQLite branch.

The remaining gap from the earlier review is that const BATCH_SIZE: usize = 100; is still a magic number, and the UNION‑ALL approach isn’t documented anywhere central. Consider:

  • Adding a brief comment somewhere (or a module‑level doc) explaining why 100 was chosen and why UNION ALL is preferred here.
  • Optionally hoisting BATCH_SIZE to a shared const so both this function and the by‑address variant can be tuned together.
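
On the second suggestion, the shared constant could also carry the rationale the earlier comment asks for — a sketch, where the constant name is illustrative and the justification text is a placeholder for the author to confirm:

/// Chunk size for the UNION ALL batches used by both
/// query_next_smallest_elements and the by-address variant.
/// Kept small to bound per-statement planning cost; the concrete
/// value (100) should be backed by measurement.
const NEXT_SMALLEST_BATCH_SIZE: usize = 100;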
🧹 Nitpick comments (2)
src/ingester/persist/indexed_merkle_tree/proof.rs (2)

261-364: New query_next_smallest_elements_by_address helper is functionally sound; consider minor robustness tweaks.

The helper correctly:

  • Short‑circuits on empty input.
  • Uses backend‑aware hex formatting for tree/value and tags each row with an input_address column so results can be keyed by the original address.
  • Mirrors behavior across Postgres ({val}::bytea as input_address) and SQLite (CAST({val} AS BLOB) as input_address), and populates a HashMap<Vec<u8>, indexed_trees::Model> from the result rows.

Two small follow‑ups you might consider:

  • If the caller ever passes duplicate addresses, later entries will silently overwrite earlier ones in the HashMap. If that’s undesirable, either dedupe values up front or document that behavior in the doc comment (a dedupe sketch follows after this list).
  • BATCH_SIZE here duplicates the constant in query_next_smallest_elements; if you keep both, a shared const at module scope would reduce the chance of them drifting.

Overall, the implementation is consistent and readable.
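
On the first follow-up, a minimal dedupe sketch — purely illustrative; the real code may prefer to document the overwrite behavior instead:

use std::collections::HashSet;

// Keep only the first occurrence of each address before building the
// batched query, so duplicates cannot silently overwrite map entries.
fn dedupe_addresses(values: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let mut seen = HashSet::new();
    values
        .iter()
        .filter(|v| seen.insert((*v).clone()))
        .cloned()
        .collect()
}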


379-388: Batched exclusion‑range v2 flow is correct; proof lookup could be made more direct.

Using query_next_smallest_elements_by_address to build address_to_range (Lines 379–388) and then iterating the original addresses slice (Lines 394–397) preserves stable address ordering and correctly handles:

  • Addresses with a concrete range node (hashed into leaf_nodes_with_indices and address_to_model).
  • Addresses without a range node (later picked up by the empty‑proof path).

The final join between address_to_model and leaf_proofs is done by searching leaf_proofs.iter().find(|p| p.leaf_index == model.leaf_index as u32) for each address. That’s perfectly fine at current scales (DB work dominates), but if this ever runs on large batches, you might want to prebuild a HashMap<u32, &MerkleProofWithContext> keyed by leaf_index to avoid repeated linear scans.

Functionally, the new flow looks consistent and maintains correct empty‑tree handling via proof_for_empty_tree_with_seq.

Also applies to: 394-397
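
A sketch of the prebuilt lookup table suggested above; the type and field names mirror the review text and may not match the code exactly:

use std::collections::HashMap;

// Build the index once, then each address resolves its proof in O(1)
// instead of a linear scan over leaf_proofs per address.
fn index_proofs_by_leaf(
    leaf_proofs: &[MerkleProofWithContext],
) -> HashMap<u32, &MerkleProofWithContext> {
    leaf_proofs.iter().map(|p| (p.leaf_index, p)).collect()
}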

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 711c47b and 690dd5a.

📒 Files selected for processing (2)
  • src/api/method/get_queue_elements.rs (2 hunks)
  • src/ingester/persist/indexed_merkle_tree/proof.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/ingester/persist/indexed_merkle_tree/proof.rs (1)
src/common/mod.rs (1)
  • format_bytes (157-164)
src/api/method/get_queue_elements.rs (4)
src/common/mod.rs (1)
  • format_bytes (157-164)
src/ingester/persist/leaf_node_proof.rs (2)
  • get_multiple_compressed_leaf_proofs_by_indices (14-90)
  • indices (42-42)
src/ingester/persist/persisted_state_tree.rs (2)
  • get_subtrees (132-209)
  • bytes (571-574)
src/ingester/persist/leaf_node.rs (4)
  • leaf_index_to_node_index (30-32)
  • from (35-42)
  • from (46-53)
  • from (57-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (3)
src/api/method/get_queue_elements.rs (3)

179-275: LGTM: Well-structured orchestration with proper validation.

The main entry point correctly validates that at least one queue is requested, orchestrates per-queue fetching, merges proofs for state queues, and handles address queues separately. Error propagation is appropriate.


595-669: LGTM: Robust hash chain computation with proper error handling.

The function correctly handles empty inputs, performs batch processing, extracts per-queue-type values with descriptive error messages, and propagates hash chain creation errors appropriately.
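
For readers unfamiliar with the pattern, the general shape of such a hash chain is chain_{i+1} = H(chain_i, value_i); a generic sketch where hash_pair stands in for the actual Poseidon call and is not the project's exact code:

fn hash_chain(
    values: &[[u8; 32]],
    hash_pair: impl Fn(&[u8; 32], &[u8; 32]) -> [u8; 32],
) -> Option<[u8; 32]> {
    let (first, rest) = values.split_first()?; // empty input -> None
    Some(rest.iter().fold(*first, |chain, v| hash_pair(&chain, v)))
}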


329-593: LGTM: Comprehensive fetch logic with excellent error handling.

The function properly validates inputs, detects pruning scenarios, handles batch size truncation, and includes robust error handling for hash conversions. Notably, the previous unwrap() issue at lines 477-488 has been correctly addressed with proper map_err and collect::<Result<...>>()? patterns.

The hash chain caching with fallback to local computation is well-designed.
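
Abstracted, the fallback has this shape (a generic sketch, not the project's signature):

// Prefer the validated cached value; recompute locally on a miss.
fn hash_chain_with_fallback<T, E>(
    cached: Option<T>,
    compute: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    match cached {
        Some(chain) => Ok(chain), // validated cache hit
        None => compute(),        // fall back to local computation
    }
}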
