chore: parse ctoken accounts explicitly #44
base: main
Conversation
Walkthrough
Added string-based serialization for several u64 fields, broadened token parsing to accept multiple discriminator forms, introduced SQLite-specific precision storage and SQL changes, added a SQLite-only migration to convert numeric columns to TEXT, bumped the package version, and updated .gitignore.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/common/typedefs/account/v2.rs (1)
44-57: Consider extracting discriminator constants and shared parsing logic.

The magic byte arrays for token discriminators are duplicated here and in v1.rs. Extracting these as named constants would improve readability and maintainability:

```rust
// In a shared module (e.g., token_data.rs or constants.rs)
pub const TOKEN_V1_DISCRIMINATOR: [u8; 8] = [2, 0, 0, 0, 0, 0, 0, 0];
pub const TOKEN_V2_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 3];
pub const TOKEN_SHA_FLAT_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 4];
```

Additionally, you could extract the parsing logic into a shared helper function, since Account::parse_token_data and AccountV2::parse_token_data are nearly identical, reducing duplication.
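A minimal sketch of what that shared check could look like; only the discriminator byte patterns come from the code under review, while the constant and function names are illustrative:

```rust
// Hypothetical shared helper: names and signature are illustrative, not the
// project's actual API.
const TOKEN_V1_DISCRIMINATOR: [u8; 8] = [2, 0, 0, 0, 0, 0, 0, 0];
const TOKEN_V2_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 3];
const TOKEN_SHA_FLAT_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 4];

/// Returns true when a little-endian discriminator matches any supported
/// compressed-token layout (v1, v2, or sha_flat).
fn is_known_token_discriminator(discriminator_le: [u8; 8]) -> bool {
    discriminator_le == TOKEN_V1_DISCRIMINATOR
        || discriminator_le == TOKEN_V2_DISCRIMINATOR
        || discriminator_le == TOKEN_SHA_FLAT_DISCRIMINATOR
}
```

Both parse_token_data implementations could then call such a helper before attempting deserialization, keeping the supported-version list in one place.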
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
- .gitignore (1 hunks)
- Cargo.toml (1 hunks)
- src/common/typedefs/account/v1.rs (1 hunks)
- src/common/typedefs/account/v2.rs (1 hunks)
🔇 Additional comments (4)
Cargo.toml (1)
10-10: LGTM! Patch version bump is appropriate for the token parsing changes introduced in this PR.
.gitignore (1)
16-18: LGTM! Appropriate additions to ignore Cursor IDE configuration and log files across all directories.
src/common/typedefs/account/v1.rs (1)
36-49: Implementation is consistent with v2.rs. The parsing logic correctly gates token data parsing on both owner verification and discriminator matching. This explicit approach is clearer about which token types are supported.
Note: As mentioned in the v2.rs review, consider extracting shared constants and parsing logic to reduce duplication.
src/common/typedefs/account/v2.rs (1)
51-54: I'll help you verify the TokenData compatibility concern across all token types. Let me search the codebase for relevant information.

```bash
#!/bin/bash
# Search for the TokenData struct definition
rg -n "struct TokenData" --type rust -A 10
# Also search for any enum or variant definitions related to tokens
rg -n "enum.*Token" --type rust -C 2
```

Now let me examine the file under review to understand the context:

```bash
#!/bin/bash
# Read the file to understand the context around lines 51-54
cat -n src/common/typedefs/account/v2.rs | head -80
```

Let me also search for the discriminator values and token type handling:

```bash
#!/bin/bash
# Search for discriminator definitions and v1, v2, sha_flat references
rg -n "discriminator|v1|v2|sha_flat" --type rust src/common/typedefs/account/ -B 2 -A 2
# Search for token data parsing and deserialization patterns
rg -n "try_from_slice|TokenData::try_from" --type rust -C 5
```
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/common/typedefs/account/v1.rs (1)
38-44: Consider extracting discriminator constants for clarity. The magic byte arrays are used inline without documentation. Named constants would improve readability and make protocol versioning easier to track.

```diff
+const V1_TOKEN_DISCRIMINATOR: [u8; 8] = [2, 0, 0, 0, 0, 0, 0, 0];
+const V2_TOKEN_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 3];
+const SHA_FLAT_TOKEN_DISCRIMINATOR: [u8; 8] = [0, 0, 0, 0, 0, 0, 0, 4];
+
 pub fn parse_token_data(&self) -> Result<Option<TokenData>, IngesterError> {
     match self.data.as_ref() {
         Some(data) => {
-            let is_v1_token = data.discriminator.0.to_le_bytes() == [2, 0, 0, 0, 0, 0, 0, 0];
-            let is_v2_token = data.discriminator.0.to_le_bytes() == [0, 0, 0, 0, 0, 0, 0, 3];
-            let is_sha_flat_token =
-                data.discriminator.0.to_le_bytes() == [0, 0, 0, 0, 0, 0, 0, 4];
+            let bytes = data.discriminator.0.to_le_bytes();
+            let is_v1_token = bytes == V1_TOKEN_DISCRIMINATOR;
+            let is_v2_token = bytes == V2_TOKEN_DISCRIMINATOR;
+            let is_sha_flat_token = bytes == SHA_FLAT_TOKEN_DISCRIMINATOR;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- src/api/method/utils.rs (2 hunks)
- src/common/typedefs/account/v1.rs (4 hunks)
- src/common/typedefs/account/v2.rs (3 hunks)
- src/common/typedefs/token_data.rs (2 hunks)
- src/common/typedefs/unsigned_integer.rs (1 hunks)
- src/ingester/persist/mod.rs (2 hunks)
- src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs (1 hunks)
- src/migration/migrations/standard/mod.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/common/typedefs/account/v2.rs
🧰 Additional context used
🧬 Code graph analysis (5)
src/api/method/utils.rs (1)
- src/common/typedefs/unsigned_integer.rs (1): serialize_u64_as_string (95-100)

src/ingester/persist/mod.rs (2)
- src/api/method/get_batch_address_update_info.rs (1): row (89-89)
- src/common/typedefs/unsigned_integer.rs (1): value (55-56)

src/common/typedefs/unsigned_integer.rs (1)
- src/api/method/utils.rs (1): value (32-34)

src/common/typedefs/token_data.rs (1)
- src/common/typedefs/unsigned_integer.rs (1): serialize_u64_as_string (95-100)

src/common/typedefs/account/v1.rs (1)
- src/common/typedefs/unsigned_integer.rs (1): serialize_u64_as_string (95-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
🔇 Additional comments (13)
src/ingester/persist/mod.rs (2)
347-362: LGTM! Backend-specific balance parsing is well-structured. The SQLite branch correctly reads the TEXT column as a String and parses to u64 before converting to Decimal, with appropriate error handling.
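A minimal sketch of that SQLite path, assuming the value has already been read from the TEXT column as a String; error handling and the actual row-access API in persist/mod.rs are not shown:

```rust
use rust_decimal::Decimal;

// Illustrative only: parse the TEXT-stored balance to u64 first, then widen to
// Decimal, so the full 64-bit range survives (unlike a round trip through f64).
fn balance_from_text(raw: &str) -> Result<Decimal, std::num::ParseIntError> {
    let lamports: u64 = raw.parse()?;
    Ok(Decimal::from(lamports))
}
```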
388-399: Consistent handling of value formatting across backends. The SQLite branch correctly quotes values as strings for TEXT columns while Postgres uses numeric literals. This aligns with the migration that converts columns to TEXT for SQLite.
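A hedged sketch of the formatting split described here; the function and flag are hypothetical stand-ins, not the actual code in persist/mod.rs:

```rust
// SQLite stores these balances in TEXT columns, so the literal must be quoted;
// Postgres keeps numeric columns and takes a bare numeric literal.
fn balance_sql_literal(value: u64, is_sqlite: bool) -> String {
    if is_sqlite {
        format!("'{value}'")
    } else {
        value.to_string()
    }
}
```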
src/migration/migrations/standard/mod.rs (1)
17-17: LGTM! Migration properly registered. The new SQLite precision fix migration follows the established naming convention and is correctly appended to the migrations list.
Also applies to: 35-35
src/common/typedefs/token_data.rs (1)
6-10: LGTM! String serialization for token amounts prevents JavaScript precision loss. The amount field correctly uses the custom serializer to emit string values, which is consistent with the pattern applied across other large integer fields in the PR.
Also applies to: 44-45
src/common/typedefs/unsigned_integer.rs (1)
93-100: LGTM! Well-documented serialization helper. The function correctly converts the inner u64 to a string representation, preventing precision loss when serializing to JSON for clients that may have limited integer precision (e.g., JavaScript's 53-bit safe integers).
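A minimal sketch of such a helper plus its field-level opt-in, assuming the serializer receives the inner u64; the struct below is a stand-in, not one of the project's actual types:

```rust
use serde::{Serialize, Serializer};

/// Emits a u64 as a JSON string (e.g. "18446744073709551615") so clients
/// limited to 53-bit integers cannot silently round it.
fn serialize_u64_as_string<S: Serializer>(value: &u64, serializer: S) -> Result<S::Ok, S::Error> {
    serializer.serialize_str(&value.to_string())
}

#[derive(Serialize)]
struct ExampleBalance {
    #[serde(serialize_with = "serialize_u64_as_string")]
    lamports: u64,
}
```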
src/api/method/utils.rs (1)
7-7: LGTM! API response now serializes balance as string. The AccountBalanceResponse.value field correctly uses the custom serializer, ensuring consistent string-based serialization for large integer values across all API responses.
Also applies to: 615-616
src/common/typedefs/account/v1.rs (2)
7-7: LGTM on serialization changes. The serialize_u64_as_string annotations on lamports and discriminator correctly address potential precision loss when serializing large u64 values to JSON. This aligns with the companion SQLite migration that stores these values as TEXT.
Also applies to: 22-23, 63-64
43-53: LGTM on token parsing logic. The conditional correctly gates parsing on both owner == COMPRESSED_TOKEN_PROGRAM and a matching discriminator, preventing unnecessary parse attempts for non-token accounts.
src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs (5)
9-18: LGTM on helper function. The execute_sql helper cleanly wraps raw SQL execution with proper backend selection.
63-67: Pre-existing precision loss cannot be recovered.
CAST(CAST(lamports AS INTEGER) AS TEXT) converts the current REAL values, but if precision was already lost when the data was originally stored (values > 2^53), this migration cannot recover those bits. Consider:
- Documenting this limitation in release notes
- For critical deployments, verifying data integrity post-migration
This is acceptable as a forward-fix to prevent future precision loss.
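For context, the 2^53 boundary mentioned above is easy to demonstrate in isolation; this snippet is illustrative and independent of the migration code:

```rust
// A u64 just above 2^53 cannot survive a round trip through f64 (SQLite REAL).
fn main() {
    let exact: u64 = (1u64 << 53) + 1; // 9_007_199_254_740_993
    let through_real = exact as f64 as u64;
    assert_eq!(through_real, 1u64 << 53); // the low bit is gone
    println!("before: {exact}, after REAL round-trip: {through_real}");
}
```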
84-159: Consistent migration pattern for remaining tables. The same REAL-to-TEXT conversion pattern is correctly applied to token_accounts.amount, owner_balances.lamports, and token_owner_balances.amount. The previously noted concerns about pre-existing precision loss and transaction atomicity apply here as well.
164-170: LGTM on down migration structure. The rollback correctly converts TEXT back to REAL with an appropriate warning that precision will be lost again for large values. The symmetric pattern ensures clean reversibility.
75-79: All recreated indexes match the original schema definitions. Verification confirms that all five indexes are correctly recreated:
- accounts_address_spent_idx on (address, seq) - original from m20220101
- accounts_owner_hash_idx on (spent, owner, hash) UNIQUE - original from m20220101
- accounts_queue_idx on (tree, in_output_queue, leaf_index) WHERE in_output_queue = 1 - original from m20250206
- idx_accounts_tree_nullified_spent on (tree, nullified_in_tree, spent) - added in m20250822
- idx_accounts_tree_nullifier_queue on (tree, nullifier_queue_index) - added in m20250822

Both up() and down() methods preserve all indexes identically, preventing performance regressions or constraint violations.
```rust
execute_sql(
    manager,
    r#"
    CREATE TABLE accounts_precision_fix (
        hash BLOB NOT NULL PRIMARY KEY,
        data BLOB,
        data_hash BLOB,
        address BLOB,
        owner BLOB NOT NULL,
        tree BLOB NOT NULL,
        queue BLOB NULL,
        leaf_index BIGINT NOT NULL,
        seq BIGINT,
        slot_created BIGINT NOT NULL,
        spent BOOLEAN NOT NULL,
        prev_spent BOOLEAN,
        lamports TEXT NOT NULL,
        discriminator TEXT,
        in_output_queue BOOLEAN NOT NULL DEFAULT TRUE,
        nullifier BLOB,
        tx_hash BLOB,
        nullifier_queue_index BIGINT NULL,
        nullified_in_tree BOOLEAN NOT NULL DEFAULT FALSE,
        tree_type INTEGER NULL
    );
    INSERT INTO accounts_precision_fix
    SELECT
        hash, data, data_hash, address, owner, tree, queue, leaf_index, seq,
        slot_created, spent, prev_spent,
        CAST(CAST(lamports AS INTEGER) AS TEXT) as lamports,
        CASE WHEN discriminator IS NOT NULL
            THEN CAST(CAST(discriminator AS INTEGER) AS TEXT)
            ELSE NULL
        END as discriminator,
        in_output_queue, nullifier, tx_hash, nullifier_queue_index,
        nullified_in_tree, tree_type
    FROM accounts;
    DROP TABLE accounts;
    ALTER TABLE accounts_precision_fix RENAME TO accounts;
    CREATE INDEX accounts_address_spent_idx ON accounts (address, seq);
    CREATE UNIQUE INDEX accounts_owner_hash_idx ON accounts (spent, owner, hash);
    CREATE INDEX accounts_queue_idx ON accounts (tree, in_output_queue, leaf_index) WHERE in_output_queue = 1;
    CREATE INDEX idx_accounts_tree_nullified_spent ON accounts (tree, nullified_in_tree, spent);
    CREATE INDEX idx_accounts_tree_nullifier_queue ON accounts (tree, nullifier_queue_index);
    "#,
)
.await?;
```
Transaction atomicity concern for multi-statement execution.
Without an explicit transaction, SQLite executes each of these statements in autocommit mode, so every statement commits on its own. If the migration fails between DROP TABLE accounts (line 72) and ALTER TABLE ... RENAME (line 73), the original table is lost with only the temporary table remaining.
Consider splitting the raw SQL into individual statements and wrapping them in explicit transaction control, or document that database backups are required before running this migration.
🤖 Prompt for AI Agents
In src/migration/migrations/standard/m20251127_000001_sqlite_precision_fix.rs
around lines 33 to 82, the migration issues DDL across multiple statements
without an explicit transaction which risks losing the original table if
execution fails between DROP and RENAME; wrap the entire multi-statement SQL
sequence in an explicit transaction (BEGIN TRANSACTION; ... COMMIT;) (or BEGIN
IMMEDIATE if you need to lock), so all statements succeed or none do, and ensure
the code invokes rollback on error; alternatively split into individually
executed statements and run them inside a managed transaction so failures
trigger a rollback rather than leaving the DB in a partial state.
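A standalone sketch of the all-or-nothing bracket this prompt asks for, shown with rusqlite purely for illustration; the real migration would go through its sea-orm-based execute_sql helper, and whether the migration runner already holds an open transaction should be checked before nesting a BEGIN:

```rust
use rusqlite::Connection;

// Rebuild the table inside one explicit transaction: if anything fails between
// DROP and RENAME, COMMIT never runs and the original table stays intact.
fn rebuild_table_atomically(conn: &Connection) -> rusqlite::Result<()> {
    conn.execute_batch(
        r#"
        BEGIN IMMEDIATE;
        CREATE TABLE accounts_precision_fix (hash BLOB PRIMARY KEY, lamports TEXT NOT NULL);
        INSERT INTO accounts_precision_fix
            SELECT hash, CAST(CAST(lamports AS INTEGER) AS TEXT) FROM accounts;
        DROP TABLE accounts;
        ALTER TABLE accounts_precision_fix RENAME TO accounts;
        COMMIT;
        "#,
    )
}
```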
7f9e0ab to 21c40cb
Overview
Testing
Summary by CodeRabbit
Patch Release
Improvements
Chores