
Conversation


@SwenSchaeferjohann commented Nov 23, 2025

Overview

  • Summary of changes

Testing

  • Testing performed to validate the changes

Summary by CodeRabbit

  • Patch Release

    • Version updated to 0.51.1
  • Improvements

    • Better token recognition: handles multiple token formats.
    • Large-integer safety: key numeric fields (balances, amounts, lamports, discriminators) now serialize as strings to avoid precision loss.
    • SQLite precision migration: database migration added to preserve integer precision on SQLite backends.
  • Chores

    • Updated .gitignore to ignore cursor state and photon.log files.



coderabbitai bot commented Nov 23, 2025

Caution

Review failed

The head commit changed during the review from 62fbb6a to 7f9e0ab.

Walkthrough

Added string-based serialization for several u64 fields, broadened token parsing to accept multiple discriminator forms, introduced SQLite-specific precision storage and SQL changes, added a SQLite-only migration to convert numeric columns to TEXT, bumped the package version, and updated .gitignore.
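
For reference, the serialization pattern described above can be sketched as follows. This is a minimal, self-contained sketch: the AccountSketch struct and the exact shape of the crate's UnsignedInteger newtype are assumptions, and it only mirrors the helper added in src/common/typedefs/unsigned_integer.rs (whose exact signature may differ):

use serde::{Serialize, Serializer};

// Stand-in for the crate's u64 newtype (assumed shape).
#[derive(Debug, Clone, Copy)]
pub struct UnsignedInteger(pub u64);

// Emit the inner u64 as a decimal string so JSON clients (e.g. JavaScript,
// which is only exact up to 2^53) never lose precision.
pub fn serialize_u64_as_string<S>(
    value: &UnsignedInteger,
    serializer: S,
) -> Result<S::Ok, S::Error>
where
    S: Serializer,
{
    serializer.serialize_str(&value.0.to_string())
}

// Field-level opt-in, as applied to lamports/discriminator/amount in this PR.
#[derive(Serialize)]
struct AccountSketch {
    #[serde(serialize_with = "serialize_u64_as_string")]
    lamports: UnsignedInteger,
}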

Changes

Cohort (File(s)): Summary

  • Config & Version (.gitignore, Cargo.toml): Added ignore rules for .cursor and **/photon.log. Bumped crate version 0.51.0 → 0.51.1.
  • UnsignedInteger serializer (src/common/typedefs/unsigned_integer.rs): Added pub fn serialize_u64_as_string to serialize UnsignedInteger as a string for serde.
  • Serde string serialization attributes (src/common/typedefs/account/v1.rs, src/common/typedefs/account/v2.rs, src/common/typedefs/token_data.rs, src/api/method/utils.rs): Annotated u64-backed fields (lamports, discriminator, amount, AccountBalanceResponse.value) with #[serde(serialize_with = "serialize_u64_as_string")] and imported the new serializer where used.
  • Token parsing logic (src/common/typedefs/account/v1.rs, src/common/typedefs/account/v2.rs): Broadened parse_token_data to compute three discriminator-based flags (v1, v2, sha_flat) from data.discriminator and parse token data when owner == COMPRESSED_TOKEN_PROGRAM AND any flag is true; preserves existing parse-error handling.
  • TokenData imports (src/common/typedefs/token_data.rs): Adjusted imports to include serialize_u64_as_string and applied the serializer to TokenData.amount.
  • API utils (src/api/method/utils.rs): Imported serialize_u64_as_string and added the serde serializer attribute to AccountBalanceResponse.value.
  • Ingester SQLite precision handling (src/ingester/persist/mod.rs): Backend-specific handling: SQLite reads balances as String, parses them to u64/Decimal, emits quoted TEXT values for INSERTs, and generates SQLite-specific ON CONFLICT arithmetic using CAST; the Postgres path is unchanged (see the SQL sketch after this list).
  • SQLite precision migration (src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs, src/migration/migrations/standard/mod.rs): Added a new Migration implementing MigrationTrait that, on SQLite only, converts numeric REAL columns to TEXT across several tables (accounts, token_accounts, owner_balances, token_owner_balances) with up/down methods; registered the migration in the standard migrations list.
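
The SQLite ON CONFLICT arithmetic mentioned for src/ingester/persist/mod.rs has roughly this shape. This is a sketch only: table and column names are taken from the tables the migration touches, the conflict target assumes owner is the table's key, and the statement the ingester actually generates may differ:

-- Balances live in TEXT columns on SQLite, so the upsert casts to INTEGER
-- for the addition and back to TEXT for storage; values arrive pre-quoted.
INSERT INTO owner_balances (owner, lamports)
VALUES (x'00', '1000')
ON CONFLICT (owner) DO UPDATE SET
    lamports = CAST(
        CAST(lamports AS INTEGER) + CAST(excluded.lamports AS INTEGER)
        AS TEXT
    );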

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Reviewers should focus on:
    • Migration SQL correctness and idempotence, and proper index recreation in m20251127_000001_sqlite_precision_fix.rs.
    • SQLite-specific read/insert/ON CONFLICT arithmetic changes in src/ingester/persist/mod.rs for correctness and SQL injection risk.
    • Consistency and test coverage for the new serialize_u64_as_string usage across types and any serde/deserialization implications.
    • Token discriminator logic in both v1.rs and v2.rs to ensure flags are derived identically and edge cases handled.

Poem

🐰 I nibble bytes and spin a tune,
Lamports as strings beneath the moon,
Parsers stretched to greet three kinds,
Migrations hop through careful lines,
Version bumped — let's celebrate soon! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check: ⚠️ Warning
    Explanation: The pull request description follows the required template structure, with both 'Overview' and 'Testing' sections present. However, both sections contain only placeholder text ('Summary of changes' and 'Testing performed to validate the changes') with no actual details filled in, leaving the description empty of meaningful content.
    Resolution: Fill in the 'Overview' section with specific details about the changes (serialization fixes, SQLite precision handling, token parsing logic, migration). Describe the actual testing performed to validate these changes.
  • Docstring Coverage: ⚠️ Warning
    Explanation: Docstring coverage is 11.11%, which is below the required threshold of 80.00%.
    Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title check: ✅ Passed
    Explanation: The title 'chore: parse ctoken accounts explicitly' is partially related to the changeset. While the PR does add logic to explicitly parse compressed token (ctoken) accounts in parse_token_data functions, the changes are much broader, including serialization adjustments, version bumps, SQLite precision fixes, and database schema migrations, making the title incomplete but not misleading.


@SwenSchaeferjohann changed the title from "parse ctoken accounts explicitly" to "chore: parse ctoken accounts explicitly" on Nov 23, 2025

coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/common/typedefs/account/v2.rs (1)

44-57: Consider extracting discriminator constants and shared parsing logic.

The magic byte arrays for token discriminators are duplicated here and in v1.rs. Extracting these as named constants would improve readability and maintainability:

// In a shared module (e.g., token_data.rs or constants.rs)
pub const TOKEN_V1_DISCRIMINATOR: [u8; 8] = [2, 0, 0, 0, 0, 0, 0, 0];
pub const TOKEN_V2_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 3];
pub const TOKEN_SHA_FLAT_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 4];

Additionally, you could consider extracting the parsing logic into a shared helper function since Account::parse_token_data and AccountV2::parse_token_data are nearly identical, reducing duplication.
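
A possible shape for that shared helper, as a sketch only (the constant and function names are hypothetical; the existing parse-error handling in both methods would be unchanged):

// Hypothetical shared helper that both Account::parse_token_data and
// AccountV2::parse_token_data could call instead of duplicating the byte checks.
pub const TOKEN_V1_DISCRIMINATOR: [u8; 8] = [2, 0, 0, 0, 0, 0, 0, 0];
pub const TOKEN_V2_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 3];
pub const TOKEN_SHA_FLAT_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 4];

/// True if the discriminator matches any compressed-token layout the indexer
/// knows how to parse (v1, v2, or sha_flat).
pub fn is_known_token_discriminator(discriminator: u64) -> bool {
    let bytes = discriminator.to_le_bytes();
    bytes == TOKEN_V1_DISCRIMINATOR
        || bytes == TOKEN_V2_DISCRIMINATOR
        || bytes == TOKEN_SHA_FLAT_DISCRIMINATOR
}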

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de9080f and 21c40cb.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • Cargo.toml (1 hunks)
  • src/common/typedefs/account/v1.rs (1 hunks)
  • src/common/typedefs/account/v2.rs (1 hunks)
🔇 Additional comments (4)
Cargo.toml (1)

10-10: LGTM!

Patch version bump is appropriate for the token parsing changes introduced in this PR.

.gitignore (1)

16-18: LGTM!

Appropriate additions to ignore Cursor IDE configuration and log files across all directories.

src/common/typedefs/account/v1.rs (1)

36-49: Implementation is consistent with v2.rs.

The parsing logic correctly gates token data parsing on both owner verification and discriminator matching. This explicit approach is clearer about which token types are supported.

Note: As mentioned in the v2.rs review, consider extracting shared constants and parsing logic to reduce duplication.

src/common/typedefs/account/v2.rs (1)

51-54: I'll help you verify the TokenData compatibility concern across all token types. Let me search the codebase for relevant information.

#!/bin/bash
# Search for TokenData struct definition
rg -n "struct TokenData" --type rust -A 10
# Also search for any enum or variant definitions related to tokens
rg -n "enum.*Token" --type rust -C 2

Now let me examine the file under review to understand the context:

#!/bin/bash
# Read the file to understand the context around lines 51-54
cat -n src/common/typedefs/account/v2.rs | head -80

Let me also search for the discriminator values and token type handling:

#!/bin/bash
# Search for discriminator definitions and v1, v2, sha_flat references
rg -n "discriminator|v1|v2|sha_flat" --type rust src/common/typedefs/account/ -B 2 -A 2

#!/bin/bash
# Search for token data parsing and deserialization patterns
rg -n "try_from_slice|TokenData::try_from" --type rust -C 5


coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
src/common/typedefs/account/v1.rs (1)

38-44: Consider extracting discriminator constants for clarity.

The magic byte arrays are used inline without documentation. Named constants would improve readability and make protocol versioning easier to track.

+const V1_TOKEN_DISCRIMINATOR: [u8; 8] = [2, 0, 0, 0, 0, 0, 0, 0];
+const V2_TOKEN_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 3];
+const SHA_FLAT_TOKEN_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 4];
+
 pub fn parse_token_data(&self) -> Result<Option<TokenData>, IngesterError> {
     match self.data.as_ref() {
         Some(data) => {
-            let is_v1_token = data.discriminator.0.to_le_bytes() == [2, 0, 0, 0, 0, 0, 0, 0];
-            let is_v2_token = data.discriminator.0.to_le_bytes() == [0, 0, 0, 0, 0, 0, 0, 3];
-            let is_sha_flat_token =
-                data.discriminator.0.to_le_bytes() == [0, 0, 0, 0, 0, 0, 0, 4];
+            let bytes = data.discriminator.0.to_le_bytes();
+            let is_v1_token = bytes == V1_TOKEN_DISCRIMINATOR;
+            let is_v2_token = bytes == V2_TOKEN_DISCRIMINATOR;
+            let is_sha_flat_token = bytes == SHA_FLAT_TOKEN_DISCRIMINATOR;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21c40cb and 7f9e0ab.

📒 Files selected for processing (8)
  • src/api/method/utils.rs (2 hunks)
  • src/common/typedefs/account/v1.rs (4 hunks)
  • src/common/typedefs/account/v2.rs (3 hunks)
  • src/common/typedefs/token_data.rs (2 hunks)
  • src/common/typedefs/unsigned_integer.rs (1 hunks)
  • src/ingester/persist/mod.rs (2 hunks)
  • src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs (1 hunks)
  • src/migration/migrations/standard/mod.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/common/typedefs/account/v2.rs
🧰 Additional context used
🧬 Code graph analysis (5)
src/api/method/utils.rs (1)
src/common/typedefs/unsigned_integer.rs (1)
  • serialize_u64_as_string (95-100)
src/ingester/persist/mod.rs (2)
src/api/method/get_batch_address_update_info.rs (1)
  • row (89-89)
src/common/typedefs/unsigned_integer.rs (1)
  • value (55-56)
src/common/typedefs/unsigned_integer.rs (1)
src/api/method/utils.rs (1)
  • value (32-34)
src/common/typedefs/token_data.rs (1)
src/common/typedefs/unsigned_integer.rs (1)
  • serialize_u64_as_string (95-100)
src/common/typedefs/account/v1.rs (1)
src/common/typedefs/unsigned_integer.rs (1)
  • serialize_u64_as_string (95-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (13)
src/ingester/persist/mod.rs (2)

347-362: LGTM! Backend-specific balance parsing is well-structured.

The SQLite branch correctly reads the TEXT column as a String and parses to u64 before converting to Decimal, with appropriate error handling.
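
That read path can be pictured with a small sketch (the function name and error type are hypothetical; the real code pulls the value from a SeaORM row before this step):

use rust_decimal::Decimal;

// SQLite stores the balance as TEXT: read it as a String, parse it to u64,
// then widen to Decimal for the arithmetic that follows.
fn balance_from_sqlite_text(raw: &str) -> Result<Decimal, String> {
    let value: u64 = raw
        .parse()
        .map_err(|e| format!("invalid balance '{raw}': {e}"))?;
    Ok(Decimal::from(value))
}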


388-399: Consistent handling of value formatting across backends.

The SQLite branch correctly quotes values as strings for TEXT columns while Postgres uses numeric literals. This aligns with the migration that converts columns to TEXT for SQLite.

src/migration/migrations/standard/mod.rs (1)

17-17: LGTM! Migration properly registered.

The new SQLite precision fix migration follows the established naming convention and is correctly appended to the migrations list.

Also applies to: 35-35

src/common/typedefs/token_data.rs (1)

6-10: LGTM! String serialization for token amounts prevents JavaScript precision loss.

The amount field correctly uses the custom serializer to emit string values, which is consistent with the pattern applied across other large integer fields in the PR.

Also applies to: 44-45

src/common/typedefs/unsigned_integer.rs (1)

93-100: LGTM! Well-documented serialization helper.

The function correctly converts the inner u64 to a string representation, preventing precision loss when serializing to JSON for clients that may have limited integer precision (e.g., JavaScript's 53-bit safe integers).

src/api/method/utils.rs (1)

7-7: LGTM! API response now serializes balance as string.

The AccountBalanceResponse.value field correctly uses the custom serializer, ensuring consistent string-based serialization for large integer values across all API responses.

Also applies to: 615-616

src/common/typedefs/account/v1.rs (2)

7-7: LGTM on serialization changes.

The serialize_u64_as_string annotations on lamports and discriminator correctly address potential precision loss when serializing large u64 values to JSON. This aligns with the companion SQLite migration that stores these values as TEXT.

Also applies to: 22-23, 63-64


43-53: LGTM on token parsing logic.

The conditional correctly gates parsing on both owner == COMPRESSED_TOKEN_PROGRAM and a matching discriminator, preventing unnecessary parse attempts for non-token accounts.

src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs (5)

9-18: LGTM on helper function.

The execute_sql helper cleanly wraps raw SQL execution with proper backend selection.


63-67: Pre-existing precision loss cannot be recovered.

CAST(CAST(lamports AS INTEGER) AS TEXT) converts the current REAL values, but if precision was already lost when the data was originally stored (values > 2^53), this migration cannot recover those bits. Consider:

  1. Documenting this limitation in release notes
  2. For critical deployments, verifying data integrity post-migration

This is acceptable as a forward-fix to prevent future precision loss.
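
One way to run that post-migration spot check, as a sketch against the SQLite file (9007199254740992 is 2^53, the largest integer a REAL column could hold exactly):

-- Flag migrated rows in the range where the old REAL column may already have
-- rounded the value; these should be cross-checked rather than trusted blindly.
SELECT hash, lamports
FROM accounts
WHERE CAST(lamports AS INTEGER) > 9007199254740992;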


84-159: Consistent migration pattern for remaining tables.

The same REAL-to-TEXT conversion pattern is correctly applied to token_accounts.amount, owner_balances.lamports, and token_owner_balances.amount. The previously noted concerns about pre-existing precision loss and transaction atomicity apply here as well.


164-170: LGTM on down migration structure.

The rollback correctly converts TEXT back to REAL with an appropriate warning that precision will be lost again for large values. The symmetric pattern ensures clean reversibility.


75-79: All recreated indexes match the original schema definitions.

Verification confirms that all five indexes are correctly recreated:

  • accounts_address_spent_idx on (address, seq) - original from m20220101
  • accounts_owner_hash_idx on (spent, owner, hash) UNIQUE - original from m20220101
  • accounts_queue_idx on (tree, in_output_queue, leaf_index) WHERE in_output_queue = 1 - original from m20250206
  • idx_accounts_tree_nullified_spent on (tree, nullified_in_tree, spent) - added in m20250822
  • idx_accounts_tree_nullifier_queue on (tree, nullifier_queue_index) - added in m20250822

Both up() and down() methods preserve all indexes identically, preventing performance regressions or constraint violations.

Comment on lines 33 to 82
execute_sql(
manager,
r#"
CREATE TABLE accounts_precision_fix (
hash BLOB NOT NULL PRIMARY KEY,
data BLOB,
data_hash BLOB,
address BLOB,
owner BLOB NOT NULL,
tree BLOB NOT NULL,
queue BLOB NULL,
leaf_index BIGINT NOT NULL,
seq BIGINT,
slot_created BIGINT NOT NULL,
spent BOOLEAN NOT NULL,
prev_spent BOOLEAN,
lamports TEXT NOT NULL,
discriminator TEXT,
in_output_queue BOOLEAN NOT NULL DEFAULT TRUE,
nullifier BLOB,
tx_hash BLOB,
nullifier_queue_index BIGINT NULL,
nullified_in_tree BOOLEAN NOT NULL DEFAULT FALSE,
tree_type INTEGER NULL
);
INSERT INTO accounts_precision_fix
SELECT
hash, data, data_hash, address, owner, tree, queue, leaf_index, seq,
slot_created, spent, prev_spent,
CAST(CAST(lamports AS INTEGER) AS TEXT) as lamports,
CASE WHEN discriminator IS NOT NULL
THEN CAST(CAST(discriminator AS INTEGER) AS TEXT)
ELSE NULL
END as discriminator,
in_output_queue, nullifier, tx_hash, nullifier_queue_index,
nullified_in_tree, tree_type
FROM accounts;
DROP TABLE accounts;
ALTER TABLE accounts_precision_fix RENAME TO accounts;
CREATE INDEX accounts_address_spent_idx ON accounts (address, seq);
CREATE UNIQUE INDEX accounts_owner_hash_idx ON accounts (spent, owner, hash);
CREATE INDEX accounts_queue_idx ON accounts (tree, in_output_queue, leaf_index) WHERE in_output_queue = 1;
CREATE INDEX idx_accounts_tree_nullified_spent ON accounts (tree, nullified_in_tree, spent);
CREATE INDEX idx_accounts_tree_nullifier_queue ON accounts (tree, nullifier_queue_index);
"#,
)
.await?;

⚠️ Potential issue | 🟡 Minor

Transaction atomicity concern for multi-statement execution.

SQLite implicitly commits after each DDL statement. If the migration fails between DROP TABLE accounts (line 72) and ALTER TABLE ... RENAME (line 73), the original table is lost with only the temporary table remaining.

Consider splitting the raw SQL into individual statements and wrapping them in explicit transaction control, or document that database backups are required before running this migration.

🤖 Prompt for AI Agents
In src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs
around lines 33 to 82, the migration issues DDL across multiple statements
without an explicit transaction which risks losing the original table if
execution fails between DROP and RENAME; wrap the entire multi-statement SQL
sequence in an explicit transaction (BEGIN TRANSACTION; ... COMMIT;) (or BEGIN
IMMEDIATE if you need to lock), so all statements succeed or none do, and ensure
the code invokes rollback on error; alternatively split into individually
executed statements and run them inside a managed transaction so failures
trigger a rollback rather than leaving the DB in a partial state.
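
A sketch of that remediation on one of the smaller tables, assuming the migration is not already wrapped in an enclosing transaction by the migration runner (the owner_balances column set is assumed; adapt to the real schema):

-- Make the rebuild atomic so a failure between DROP and RENAME rolls back
-- instead of leaving only the temporary table behind.
BEGIN IMMEDIATE;

CREATE TABLE owner_balances_precision_fix (
    owner BLOB NOT NULL PRIMARY KEY,
    lamports TEXT NOT NULL
);

INSERT INTO owner_balances_precision_fix
SELECT owner, CAST(CAST(lamports AS INTEGER) AS TEXT) AS lamports
FROM owner_balances;

DROP TABLE owner_balances;
ALTER TABLE owner_balances_precision_fix RENAME TO owner_balances;

COMMIT;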

@SwenSchaeferjohann force-pushed the swen/parse-token branch 2 times, most recently from 7f9e0ab to 21c40cb on December 2, 2025 18:58