
Conversation


@SwenSchaeferjohann commented Nov 23, 2025

Overview

  • Summary of changes

Testing

  • Testing performed to validate the changes

Summary by CodeRabbit

  • Patch Release

    • Version updated to 0.51.1
  • Improvements

    • Better token recognition: handles multiple token formats.
    • Large-integer safety: key numeric fields (balances, amounts, lamports, discriminators) now serialize as strings to avoid precision loss.
    • SQLite precision migration: database migration added to preserve integer precision on SQLite backends.
  • Chores

    • Updated .gitignore to ignore cursor state and photon.log files.



coderabbitai bot commented Nov 23, 2025

Caution

Review failed

The head commit changed during the review from 62fbb6a to 7f9e0ab.

Walkthrough

Added string-based serialization for several u64 fields, broadened token parsing to accept multiple discriminator forms, introduced SQLite-specific precision storage and SQL changes, added a SQLite-only migration to convert numeric columns to TEXT, bumped the package version, and updated .gitignore.
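
For reference, the serialization pattern described above can be sketched as follows. This is a minimal, self-contained sketch: the AccountSketch struct and the exact shape of the crate's UnsignedInteger newtype are assumptions, and it only mirrors the helper added in src/common/typedefs/unsigned_integer.rs (whose exact signature may differ):

use serde::{Serialize, Serializer};

// Stand-in for the crate's u64 newtype (assumed shape).
#[derive(Debug, Clone, Copy)]
pub struct UnsignedInteger(pub u64);

// Emit the inner u64 as a decimal string so JSON clients (e.g. JavaScript,
// which is only exact up to 2^53) never lose precision.
pub fn serialize_u64_as_string<S>(
    value: &UnsignedInteger,
    serializer: S,
) -> Result<S::Ok, S::Error>
where
    S: Serializer,
{
    serializer.serialize_str(&value.0.to_string())
}

// Field-level opt-in, as applied to lamports/discriminator/amount in this PR.
#[derive(Serialize)]
struct AccountSketch {
    #[serde(serialize_with = "serialize_u64_as_string")]
    lamports: UnsignedInteger,
}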

Changes

Cohort (File(s)): Summary

  • Config & Version (.gitignore, Cargo.toml): Added ignore rules for .cursor and **/photon.log. Bumped crate version 0.51.0 → 0.51.1.
  • UnsignedInteger serializer (src/common/typedefs/unsigned_integer.rs): Added pub fn serialize_u64_as_string to serialize UnsignedInteger as a string for serde.
  • Serde string serialization attributes (src/common/typedefs/account/v1.rs, src/common/typedefs/account/v2.rs, src/common/typedefs/token_data.rs, src/api/method/utils.rs): Annotated u64-backed fields (lamports, discriminator, amount, AccountBalanceResponse.value) with #[serde(serialize_with = "serialize_u64_as_string")] and imported the new serializer where used.
  • Token parsing logic (src/common/typedefs/account/v1.rs, src/common/typedefs/account/v2.rs): Broadened parse_token_data to compute three discriminator-based flags (v1, v2, sha_flat) from data.discriminator and parse token data when owner == COMPRESSED_TOKEN_PROGRAM AND any flag is true; preserves existing parse-error handling.
  • TokenData imports (src/common/typedefs/token_data.rs): Adjusted imports to include serialize_u64_as_string and applied the serializer to TokenData.amount.
  • API utils (src/api/method/utils.rs): Imported serialize_u64_as_string and added the serde serializer attribute to AccountBalanceResponse.value.
  • Ingester SQLite precision handling (src/ingester/persist/mod.rs): Backend-specific handling: SQLite reads balances as String, parses them to u64/Decimal, emits quoted TEXT values for INSERTs, and generates SQLite-specific ON CONFLICT arithmetic using CAST; the Postgres path is unchanged (see the SQL sketch after this list).
  • SQLite precision migration (src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs, src/migration/migrations/standard/mod.rs): Added a new Migration implementing MigrationTrait that, on SQLite only, converts numeric REAL columns to TEXT across several tables (accounts, token_accounts, owner_balances, token_owner_balances) with up/down methods; registered the migration in the standard migrations list.
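
The SQLite ON CONFLICT arithmetic mentioned for src/ingester/persist/mod.rs has roughly this shape. This is a sketch only: table and column names are taken from the tables the migration touches, the conflict target assumes owner is the table's key, and the statement the ingester actually generates may differ:

-- Balances live in TEXT columns on SQLite, so the upsert casts to INTEGER
-- for the addition and back to TEXT for storage; values arrive pre-quoted.
INSERT INTO owner_balances (owner, lamports)
VALUES (x'00', '1000')
ON CONFLICT (owner) DO UPDATE SET
    lamports = CAST(
        CAST(lamports AS INTEGER) + CAST(excluded.lamports AS INTEGER)
        AS TEXT
    );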

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Reviewers should focus on:
    • Migration SQL correctness and idempotence, and proper index recreation in m20251127_000001_sqlite_precision_fix.rs.
    • SQLite-specific read/insert/ON CONFLICT arithmetic changes in src/ingester/persist/mod.rs for correctness and SQL injection risk.
    • Consistency and test coverage for the new serialize_u64_as_string usage across types and any serde/deserialization implications.
    • Token discriminator logic in both v1.rs and v2.rs to ensure flags are derived identically and edge cases handled.

Poem

🐰 I nibble bytes and spin a tune,
Lamports as strings beneath the moon,
Parsers stretched to greet three kinds,
Migrations hop through careful lines,
Version bumped — let's celebrate soon! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check: ⚠️ Warning
    Explanation: The pull request description follows the required template structure, with both 'Overview' and 'Testing' sections present. However, both sections contain only placeholder text ('Summary of changes' and 'Testing performed to validate the changes') with no actual details filled in, leaving the description empty of meaningful content.
    Resolution: Fill in the 'Overview' section with specific details about the changes (serialization fixes, SQLite precision handling, token parsing logic, migration). Describe the actual testing performed to validate these changes.
  • Docstring Coverage: ⚠️ Warning
    Explanation: Docstring coverage is 11.11%, which is below the required threshold of 80.00%.
    Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title check: ✅ Passed
    Explanation: The title 'chore: parse ctoken accounts explicitly' is partially related to the changeset. While the PR does add logic to explicitly parse compressed token (ctoken) accounts in parse_token_data functions, the changes are much broader, including serialization adjustments, version bumps, SQLite precision fixes, and database schema migrations, making the title incomplete but not misleading.


@SwenSchaeferjohann changed the title from "parse ctoken accounts explicitly" to "chore: parse ctoken accounts explicitly" on Nov 23, 2025

coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/common/typedefs/account/v2.rs (1)

44-57: Consider extracting discriminator constants and shared parsing logic.

The magic byte arrays for token discriminators are duplicated here and in v1.rs. Extracting these as named constants would improve readability and maintainability:

// In a shared module (e.g., token_data.rs or constants.rs)
pub const TOKEN_V1_DISCRIMINATOR: [u8; 8] = [2, 0, 0, 0, 0, 0, 0, 0];
pub const TOKEN_V2_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 3];
pub const TOKEN_SHA_FLAT_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 4];

Additionally, you could consider extracting the parsing logic into a shared helper function since Account::parse_token_data and AccountV2::parse_token_data are nearly identical, reducing duplication.
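
A possible shape for that shared helper, as a sketch only (the constant and function names are hypothetical; the existing parse-error handling in both methods would be unchanged):

// Hypothetical shared helper that both Account::parse_token_data and
// AccountV2::parse_token_data could call instead of duplicating the byte checks.
pub const TOKEN_V1_DISCRIMINATOR: [u8; 8] = [2, 0, 0, 0, 0, 0, 0, 0];
pub const TOKEN_V2_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 3];
pub const TOKEN_SHA_FLAT_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 4];

/// True if the discriminator matches any compressed-token layout the indexer
/// knows how to parse (v1, v2, or sha_flat).
pub fn is_known_token_discriminator(discriminator: u64) -> bool {
    let bytes = discriminator.to_le_bytes();
    bytes == TOKEN_V1_DISCRIMINATOR
        || bytes == TOKEN_V2_DISCRIMINATOR
        || bytes == TOKEN_SHA_FLAT_DISCRIMINATOR
}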

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de9080f and 21c40cb.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • Cargo.toml (1 hunks)
  • src/common/typedefs/account/v1.rs (1 hunks)
  • src/common/typedefs/account/v2.rs (1 hunks)
🔇 Additional comments (4)
Cargo.toml (1)

10-10: LGTM!

Patch version bump is appropriate for the token parsing changes introduced in this PR.

.gitignore (1)

16-18: LGTM!

Appropriate additions to ignore Cursor IDE configuration and log files across all directories.

src/common/typedefs/account/v1.rs (1)

36-49: Implementation is consistent with v2.rs.

The parsing logic correctly gates token data parsing on both owner verification and discriminator matching. This explicit approach is clearer about which token types are supported.

Note: As mentioned in the v2.rs review, consider extracting shared constants and parsing logic to reduce duplication.

src/common/typedefs/account/v2.rs (1)

51-54: I'll help you verify the TokenData compatibility concern across all token types. Let me search the codebase for relevant information.

#!/bin/bash
# Search for TokenData struct definition
rg -n "struct TokenData" --type rust -A 10
# Also search for any enum or variant definitions related to tokens
rg -n "enum.*Token" --type rust -C 2

Now let me examine the file under review to understand the context:

#!/bin/bash
# Read the file to understand the context around lines 51-54
cat -n src/common/typedefs/account/v2.rs | head -80

Let me also search for the discriminator values and token type handling:

#!/bin/bash
# Search for discriminator definitions and v1, v2, sha_flat references
rg -n "discriminator|v1|v2|sha_flat" --type rust src/common/typedefs/account/ -B 2 -A 2

#!/bin/bash
# Search for token data parsing and deserialization patterns
rg -n "try_from_slice|TokenData::try_from" --type rust -C 5


coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
src/common/typedefs/account/v1.rs (1)

38-44: Consider extracting discriminator constants for clarity.

The magic byte arrays are used inline without documentation. Named constants would improve readability and make protocol versioning easier to track.

+const V1_TOKEN_DISCRIMINATOR: [u8; 8] = [2, 0, 0, 0, 0, 0, 0, 0];
+const V2_TOKEN_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 3];
+const SHA_FLAT_TOKEN_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 4];
+
 pub fn parse_token_data(&self) -> Result<Option<TokenData>, IngesterError> {
     match self.data.as_ref() {
         Some(data) => {
-            let is_v1_token = data.discriminator.0.to_le_bytes() == [2, 0, 0, 0, 0, 0, 0, 0];
-            let is_v2_token = data.discriminator.0.to_le_bytes() == [0, 0, 0, 0, 0, 0, 0, 3];
-            let is_sha_flat_token =
-                data.discriminator.0.to_le_bytes() == [0, 0, 0, 0, 0, 0, 0, 4];
+            let bytes = data.discriminator.0.to_le_bytes();
+            let is_v1_token = bytes == V1_TOKEN_DISCRIMINATOR;
+            let is_v2_token = bytes == V2_TOKEN_DISCRIMINATOR;
+            let is_sha_flat_token = bytes == SHA_FLAT_TOKEN_DISCRIMINATOR;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21c40cb and 7f9e0ab.

📒 Files selected for processing (8)
  • src/api/method/utils.rs (2 hunks)
  • src/common/typedefs/account/v1.rs (4 hunks)
  • src/common/typedefs/account/v2.rs (3 hunks)
  • src/common/typedefs/token_data.rs (2 hunks)
  • src/common/typedefs/unsigned_integer.rs (1 hunks)
  • src/ingester/persist/mod.rs (2 hunks)
  • src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs (1 hunks)
  • src/migration/migrations/standard/mod.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/common/typedefs/account/v2.rs
🧰 Additional context used
🧬 Code graph analysis (5)
src/api/method/utils.rs (1)
src/common/typedefs/unsigned_integer.rs (1)
  • serialize_u64_as_string (95-100)
src/ingester/persist/mod.rs (2)
src/api/method/get_batch_address_update_info.rs (1)
  • row (89-89)
src/common/typedefs/unsigned_integer.rs (1)
  • value (55-56)
src/common/typedefs/unsigned_integer.rs (1)
src/api/method/utils.rs (1)
  • value (32-34)
src/common/typedefs/token_data.rs (1)
src/common/typedefs/unsigned_integer.rs (1)
  • serialize_u64_as_string (95-100)
src/common/typedefs/account/v1.rs (1)
src/common/typedefs/unsigned_integer.rs (1)
  • serialize_u64_as_string (95-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (13)
src/ingester/persist/mod.rs (2)

347-362: LGTM! Backend-specific balance parsing is well-structured.

The SQLite branch correctly reads the TEXT column as a String and parses to u64 before converting to Decimal, with appropriate error handling.
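
That read path can be pictured with a small sketch (the function name and error type are hypothetical; the real code pulls the value from a SeaORM row before this step):

use rust_decimal::Decimal;

// SQLite stores the balance as TEXT: read it as a String, parse it to u64,
// then widen to Decimal for the arithmetic that follows.
fn balance_from_sqlite_text(raw: &str) -> Result<Decimal, String> {
    let value: u64 = raw
        .parse()
        .map_err(|e| format!("invalid balance '{raw}': {e}"))?;
    Ok(Decimal::from(value))
}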


388-399: Consistent handling of value formatting across backends.

The SQLite branch correctly quotes values as strings for TEXT columns while Postgres uses numeric literals. This aligns with the migration that converts columns to TEXT for SQLite.

src/migration/migrations/standard/mod.rs (1)

17-17: LGTM! Migration properly registered.

The new SQLite precision fix migration follows the established naming convention and is correctly appended to the migrations list.

Also applies to: 35-35

src/common/typedefs/token_data.rs (1)

6-10: LGTM! String serialization for token amounts prevents JavaScript precision loss.

The amount field correctly uses the custom serializer to emit string values, which is consistent with the pattern applied across other large integer fields in the PR.

Also applies to: 44-45

src/common/typedefs/unsigned_integer.rs (1)

93-100: LGTM! Well-documented serialization helper.

The function correctly converts the inner u64 to a string representation, preventing precision loss when serializing to JSON for clients that may have limited integer precision (e.g., JavaScript's 53-bit safe integers).

src/api/method/utils.rs (1)

7-7: LGTM! API response now serializes balance as string.

The AccountBalanceResponse.value field correctly uses the custom serializer, ensuring consistent string-based serialization for large integer values across all API responses.

Also applies to: 615-616

src/common/typedefs/account/v1.rs (2)

7-7: LGTM on serialization changes.

The serialize_u64_as_string annotations on lamports and discriminator correctly address potential precision loss when serializing large u64 values to JSON. This aligns with the companion SQLite migration that stores these values as TEXT.

Also applies to: 22-23, 63-64


43-53: LGTM on token parsing logic.

The conditional correctly gates parsing on both owner == COMPRESSED_TOKEN_PROGRAM and a matching discriminator, preventing unnecessary parse attempts for non-token accounts.

src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs (5)

9-18: LGTM on helper function.

The execute_sql helper cleanly wraps raw SQL execution with proper backend selection.


63-67: Pre-existing precision loss cannot be recovered.

CAST(CAST(lamports AS INTEGER) AS TEXT) converts the current REAL values, but if precision was already lost when the data was originally stored (values > 2^53), this migration cannot recover those bits. Consider:

  1. Documenting this limitation in release notes
  2. For critical deployments, verifying data integrity post-migration

This is acceptable as a forward-fix to prevent future precision loss.
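
One way to run that post-migration spot check, as a sketch against the SQLite file (9007199254740992 is 2^53, the largest integer a REAL column could hold exactly):

-- Flag migrated rows in the range where the old REAL column may already have
-- rounded the value; these should be cross-checked rather than trusted blindly.
SELECT hash, lamports
FROM accounts
WHERE CAST(lamports AS INTEGER) > 9007199254740992;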


84-159: Consistent migration pattern for remaining tables.

The same REAL-to-TEXT conversion pattern is correctly applied to token_accounts.amount, owner_balances.lamports, and token_owner_balances.amount. The previously noted concerns about pre-existing precision loss and transaction atomicity apply here as well.


164-170: LGTM on down migration structure.

The rollback correctly converts TEXT back to REAL with an appropriate warning that precision will be lost again for large values. The symmetric pattern ensures clean reversibility.


75-79: All recreated indexes match the original schema definitions.

Verification confirms that all five indexes are correctly recreated:

  • accounts_address_spent_idx on (address, seq) - original from m20220101
  • accounts_owner_hash_idx on (spent, owner, hash) UNIQUE - original from m20220101
  • accounts_queue_idx on (tree, in_output_queue, leaf_index) WHERE in_output_queue = 1 - original from m20250206
  • idx_accounts_tree_nullified_spent on (tree, nullified_in_tree, spent) - added in m20250822
  • idx_accounts_tree_nullifier_queue on (tree, nullifier_queue_index) - added in m20250822

Both up() and down() methods preserve all indexes identically, preventing performance regressions or constraint violations.

Comment on lines 33 to 82
execute_sql(
manager,
r#"
CREATE TABLE accounts_precision_fix (
hash BLOB NOT NULL PRIMARY KEY,
data BLOB,
data_hash BLOB,
address BLOB,
owner BLOB NOT NULL,
tree BLOB NOT NULL,
queue BLOB NULL,
leaf_index BIGINT NOT NULL,
seq BIGINT,
slot_created BIGINT NOT NULL,
spent BOOLEAN NOT NULL,
prev_spent BOOLEAN,
lamports TEXT NOT NULL,
discriminator TEXT,
in_output_queue BOOLEAN NOT NULL DEFAULT TRUE,
nullifier BLOB,
tx_hash BLOB,
nullifier_queue_index BIGINT NULL,
nullified_in_tree BOOLEAN NOT NULL DEFAULT FALSE,
tree_type INTEGER NULL
);
INSERT INTO accounts_precision_fix
SELECT
hash, data, data_hash, address, owner, tree, queue, leaf_index, seq,
slot_created, spent, prev_spent,
CAST(CAST(lamports AS INTEGER) AS TEXT) as lamports,
CASE WHEN discriminator IS NOT NULL
THEN CAST(CAST(discriminator AS INTEGER) AS TEXT)
ELSE NULL
END as discriminator,
in_output_queue, nullifier, tx_hash, nullifier_queue_index,
nullified_in_tree, tree_type
FROM accounts;
DROP TABLE accounts;
ALTER TABLE accounts_precision_fix RENAME TO accounts;
CREATE INDEX accounts_address_spent_idx ON accounts (address, seq);
CREATE UNIQUE INDEX accounts_owner_hash_idx ON accounts (spent, owner, hash);
CREATE INDEX accounts_queue_idx ON accounts (tree, in_output_queue, leaf_index) WHERE in_output_queue = 1;
CREATE INDEX idx_accounts_tree_nullified_spent ON accounts (tree, nullified_in_tree, spent);
CREATE INDEX idx_accounts_tree_nullifier_queue ON accounts (tree, nullifier_queue_index);
"#,
)
.await?;

⚠️ Potential issue | 🟡 Minor

Transaction atomicity concern for multi-statement execution.

SQLite implicitly commits after each DDL statement. If the migration fails between DROP TABLE accounts (line 72) and ALTER TABLE ... RENAME (line 73), the original table is lost with only the temporary table remaining.

Consider splitting the raw SQL into individual statements and wrapping them in explicit transaction control, or document that database backups are required before running this migration.

🤖 Prompt for AI Agents
In src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs
around lines 33 to 82, the migration issues DDL across multiple statements
without an explicit transaction which risks losing the original table if
execution fails between DROP and RENAME; wrap the entire multi-statement SQL
sequence in an explicit transaction (BEGIN TRANSACTION; ... COMMIT;) (or BEGIN
IMMEDIATE if you need to lock), so all statements succeed or none do, and ensure
the code invokes rollback on error; alternatively split into individually
executed statements and run them inside a managed transaction so failures
trigger a rollback rather than leaving the DB in a partial state.
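
A sketch of that remediation on one of the smaller tables, assuming the migration is not already wrapped in an enclosing transaction by the migration runner (the owner_balances column set is assumed; adapt to the real schema):

-- Make the rebuild atomic so a failure between DROP and RENAME rolls back
-- instead of leaving only the temporary table behind.
BEGIN IMMEDIATE;

CREATE TABLE owner_balances_precision_fix (
    owner BLOB NOT NULL PRIMARY KEY,
    lamports TEXT NOT NULL
);

INSERT INTO owner_balances_precision_fix
SELECT owner, CAST(CAST(lamports AS INTEGER) AS TEXT) AS lamports
FROM owner_balances;

DROP TABLE owner_balances;
ALTER TABLE owner_balances_precision_fix RENAME TO owner_balances;

COMMIT;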

@SwenSchaeferjohann force-pushed the swen/parse-token branch 2 times, most recently from 7f9e0ab to 21c40cb on December 2, 2025 18:58