
vdiez (Contributor) commented Dec 16, 2025

Summary

This PR introduces a new filesystem cache module (packages/shared/src/fs-cache/) that patches the Node.js fs module globally, so reads issued by dependencies such as TypeScript and ESLint also go through the cache. Only the read operations listed under Cached Operations are intercepted; other fs methods pass through unchanged. The cache keeps everything in a single unified tree and supports disk persistence via protobuf serialization.

Key Features

  • Global fs interception: Patches the Node.js fs module so that the supported read operations are cached process-wide
  • Unified tree structure: Single FsNode data structure represents all cached path information
  • Negative caching: Caches "file doesn't exist" results to avoid repeated failed lookups
  • Per-project isolation: Separate cache per project with independent lifecycles
  • Disk persistence: Protobuf serialization for saving/loading cache between runs
  • Memory management: Automatic flush to disk when memory threshold is reached
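
To make the first and third features concrete, here is a minimal sketch of patching a single method with negative caching. It is not the PR's fs-patch.ts: existsCache, patchExistsSync and unpatchExistsSync are hypothetical names, and a flat Map stands in for the real tree.

```typescript
import fs from 'node:fs';

// Hypothetical flat store: path -> cached answer (false is a negative entry).
const existsCache = new Map<string, boolean>();
const originalExistsSync = fs.existsSync;

// Module typings may mark members read-only; this is a writable view of the same object.
const mutableFs = fs as { existsSync: typeof fs.existsSync };

function patchExistsSync(): void {
  mutableFs.existsSync = (path: fs.PathLike): boolean => {
    const key = String(path);
    const cached = existsCache.get(key);
    if (cached !== undefined) {
      return cached; // hit, including "known not to exist"
    }
    const result = originalExistsSync(path);
    existsCache.set(key, result); // store positive and negative results alike
    return result;
  };
}

function unpatchExistsSync(): void {
  mutableFs.existsSync = originalExistsSync;
  existsCache.clear();
}
```

Because the patch mutates the shared fs object, any dependency that calls fs.existsSync through that object sees the cached version. Code that bound a named ESM import before the patch may keep the original function unless module.syncBuiltinESMExports() is called afterwards.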

Design Decisions Explored

1. Unified Tree vs Separate Maps

Initially considered separate Maps for each operation type (files, directories, stats, etc.). Chose unified FsNode structure because:

  • Better represents filesystem reality (one node = one path)
  • Avoids redundant storage (stat info stored once, not duplicated)
  • Easier to reason about cache state
  • Natural handling of partial knowledge (e.g., know directory exists but haven't listed children)
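
As an illustration of the "one node = one path" idea and of partial knowledge, the sketch below uses an assumed node shape. The field names (exists, isDirectory, listed, children) and the helpers are hypothetical; the real FsNode in cache-types.ts may look different.

```typescript
// Hypothetical node shape, for illustration only.
interface FsNodeSketch {
  exists: boolean; // false marks a negative cache entry
  isDirectory?: boolean; // undefined until something tells us
  content?: string;
  children: Map<string, FsNodeSketch>; // children we happen to know about
  listed: boolean; // true only once a full directory listing was cached
}

function makeNode(exists: boolean): FsNodeSketch {
  return { exists, children: new Map(), listed: false };
}

// Walk a POSIX-style path, creating nodes as needed; every ancestor is marked as a directory.
function ensureNode(root: FsNodeSketch, path: string): FsNodeSketch {
  let node = root;
  for (const segment of path.split('/').filter(Boolean)) {
    let child = node.children.get(segment);
    if (!child) {
      child = makeNode(true);
      node.children.set(segment, child);
    }
    node.isDirectory = true;
    node = child;
  }
  return node;
}

// Returns undefined when the cache knows nothing about the path.
function findNode(root: FsNodeSketch, path: string): FsNodeSketch | undefined {
  let node: FsNodeSketch | undefined = root;
  for (const segment of path.split('/').filter(Boolean)) {
    node = node.children.get(segment);
    if (!node) return undefined;
  }
  return node;
}

const root = makeNode(true);
const indexFile = ensureNode(root, '/src/index.ts');
indexFile.isDirectory = false;
indexFile.content = 'export {};';
// '/src' is now known to be a directory, but its listing is still incomplete.
const src = findNode(root, '/src');
```

Here caching one file read also records that /src exists and is a directory, while listed stays false, so a later readdir on /src would still have to go to disk.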

2. Cache Lookup Result Pattern

Implemented CacheLookupResult<T> to distinguish three states:

  • undefined: the path is not in the cache (cache miss)
  • { exists: false }: cached knowledge that path doesn't exist
  • { exists: true, value: T }: cached data for existing path
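
One way the three states could be typed and consumed is sketched below. The union definition is an assumption based on the description above; entries, lookupFile and readThroughCache are hypothetical helpers, not the PR's API.

```typescript
// Assumed definition, inferred from the three states described in the PR.
type CacheLookupResult<T> = { exists: false } | { exists: true; value: T };

// Hypothetical store for file contents.
const entries = new Map<string, CacheLookupResult<string>>();

// undefined means "the cache has no opinion about this path".
function lookupFile(path: string): CacheLookupResult<string> | undefined {
  return entries.get(path);
}

// Read-through helper: readFromDisk returns undefined when the file is missing.
function readThroughCache(
  path: string,
  readFromDisk: (p: string) => string | undefined,
): string | undefined {
  const cached = lookupFile(path);
  if (cached !== undefined) {
    // Hit: either real data or cached knowledge that the path is absent.
    return cached.exists ? cached.value : undefined;
  }
  const value = readFromDisk(path);
  entries.set(path, value === undefined ? { exists: false } : { exists: true, value });
  return value;
}
```

Keeping "not cached" (undefined) separate from "cached as missing" ({ exists: false }) is what lets a repeated lookup of a nonexistent file skip the disk entirely.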

3. opendir/opendirSync Caching

fs.opendir returns a Dir object (async iterator). Implemented CachedDir class that:

  • Reads all directory entries on first access
  • Implements full Dir interface including Symbol.asyncDispose/Symbol.dispose
  • Returns cached entries on iteration
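
The behaviour in the list above can be sketched with a simplified class. This is not the PR's CachedDir and does not implement the full fs.Dir interface (no callback overloads, no Symbol.dispose); DirEntrySketch and the loadEntries callback are hypothetical stand-ins for Dirent and the underlying readdir call.

```typescript
// Hypothetical stand-in for fs.Dirent.
interface DirEntrySketch {
  name: string;
  isDirectory: boolean;
}

class CachedDirSketch implements AsyncIterable<DirEntrySketch> {
  readonly path: string;
  private readonly loadEntries: () => DirEntrySketch[];
  private entries: DirEntrySketch[] | undefined;
  private position = 0;
  private closed = false;

  constructor(path: string, loadEntries: () => DirEntrySketch[]) {
    this.path = path;
    this.loadEntries = loadEntries;
  }

  // Like fs.Dir.readSync: returns the next entry, or null when exhausted.
  readSync(): DirEntrySketch | null {
    if (this.closed) throw new Error(`Directory handle is closed: ${this.path}`);
    if (this.entries === undefined) {
      this.entries = this.loadEntries(); // all entries are read on first access
    }
    const entry = this.entries[this.position];
    if (entry === undefined) return null;
    this.position++;
    return entry;
  }

  async read(): Promise<DirEntrySketch | null> {
    return this.readSync();
  }

  closeSync(): void {
    this.closed = true;
  }

  // Iteration yields cached entries and closes the handle when done, as fs.Dir does.
  async *[Symbol.asyncIterator](): AsyncGenerator<DirEntrySketch> {
    try {
      let entry: DirEntrySketch | null;
      while ((entry = this.readSync()) !== null) {
        yield entry;
      }
    } finally {
      this.closeSync();
    }
  }
}
```

Loading everything up front trades a little memory for a listing that can be stored on the tree node and replayed to every later opendir call on the same path.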

4. Type Reuse

Reuses @types/node types where possible via Pick<Stats, ...> instead of maintaining custom type definitions.
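
A small sketch of what this reuse can look like. Which Stats fields the PR's FsNodeStat actually picks is not stated, so the three chosen here are an assumption, as are the names FsNodeStatSketch and toCachedStat.

```typescript
import { statSync } from 'node:fs';
import type { Stats } from 'node:fs';

// Assumed field selection; the real FsNodeStat may pick different members.
type FsNodeStatSketch = Pick<Stats, 'size' | 'mtimeMs' | 'mode'>;

// Copy only the picked fields so the cached value stays small and serializable.
function toCachedStat(stats: Stats): FsNodeStatSketch {
  return { size: stats.size, mtimeMs: stats.mtimeMs, mode: stats.mode };
}

const cachedStat = toCachedStat(statSync('.'));
```

Because the type is derived from Stats, a rename or type change in @types/node surfaces as a compile error here instead of silently drifting from a hand-written copy.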

Files Created

File                  Purpose
cache-types.ts        Core type definitions (FsNode, FsNodeStat, CacheStats)
cache-utils.ts        Path normalization utilities
project-cache.ts      Per-project cache implementation with tree structure
cache-manager.ts      Singleton managing multiple project caches
fs-patch.ts           Monkey-patching for fs module + CachedDir class
index.ts              Public API exports
proto/fs-cache.proto  Protobuf schema for disk serialization
proto/fs-cache.js     Generated protobuf code
proto/fs-cache.d.ts   Generated TypeScript definitions

Cached Operations

  • readFileSync / readFile
  • readdirSync / readdir (with withFileTypes support)
  • statSync / stat
  • lstatSync / lstat
  • existsSync
  • realpathSync / realpath (including realpathSync.native)
  • accessSync / access
  • opendirSync / opendir

Next Steps for Integration

  1. Initialize at startup: Call initFsCache() when gRPC/HTTP server starts

    import { initFsCache } from '@sonar/shared/fs-cache';
    initFsCache({ memoryThreshold: 500 * 1024 * 1024 });

  2. Set active project before analysis: Call setActiveProject() with project info

    import { setActiveProject } from '@sonar/shared/fs-cache';
    setActiveProject(projectKey, baseDir, cacheDir);

  3. Load cache from previous run (optional):

    import { loadProjectCache } from '@sonar/shared/fs-cache';
    await loadProjectCache(projectKey, cachePath);

  4. Save cache after analysis:

    import { saveProjectCache } from '@sonar/shared/fs-cache';
    await saveProjectCache(projectKey);

  5. Add cache invalidation: Implement file watcher or use Sonar's file change detection to call invalidatePath() when files change

  6. Add tests: Unit tests for cache operations, integration tests for fs patching

  7. Performance metrics: Use getFsCacheStats() to measure cache effectiveness
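
For step 5, the sketch below shows one possible invalidation policy: drop the changed path and everything beneath it, then mark the parent's listing as stale. It is not the module's invalidatePath(); invalidatePathSketch, CachedEntrySketch and the flat pathCache Map are hypothetical, and paths are assumed to be POSIX-style and already normalized by the caller's conventions.

```typescript
import { posix } from 'node:path';

// Hypothetical flat cache keyed by normalized POSIX path.
interface CachedEntrySketch {
  exists: boolean;
  listed: boolean; // whether a full directory listing is cached for this path
}

const pathCache = new Map<string, CachedEntrySketch>();

// Returns the number of entries removed.
function invalidatePathSketch(changedPath: string): number {
  const target = posix.normalize(changedPath);
  let removed = 0;
  for (const key of Array.from(pathCache.keys())) {
    // Remove the path itself and any descendants (for a changed directory).
    if (key === target || key.startsWith(target + '/')) {
      pathCache.delete(key);
      removed++;
    }
  }
  // The parent's cached listing may no longer be accurate.
  const parent = pathCache.get(posix.dirname(target));
  if (parent) {
    parent.listed = false;
  }
  return removed;
}
```

A file watcher or Sonar's own change detection would call such a function once per changed path before the next analysis; negative entries need the same treatment, since a newly created file must stop being reported as missing.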


🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

Implement a global filesystem cache that intercepts all fs operations,
including those from dependencies like TypeScript and ESLint.

Key features:
- Unified tree structure (FsNode) instead of separate maps per operation
- Negative caching - caches "file doesn't exist" results
- Per-project cache isolation with disk persistence (protobuf)
- Full opendir/opendirSync support with cached Dir implementation
- Reuses @types/node types where possible

Cached operations: readFile, readdir, opendir, stat, lstat, exists,
realpath, realpathSync.native, access (both sync and async variants)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
hashicorp-vault-sonar-prod bot changed the title from "Add unified filesystem cache module" to "JS-1001 Add unified filesystem cache module" on Dec 16, 2025

hashicorp-vault-sonar-prod bot commented Dec 16, 2025

JS-1001

vdiez (Contributor, Author) commented Dec 16, 2025

Additional Next Steps

Complete fs Module Mirror

Currently we only intercept and cache specific fs methods. To get full visibility into filesystem usage, we should create a 1:1 mirror of the entire fs module:

  • Intercept ALL methods: Even methods we don't cache should be wrapped
  • Log non-cached calls: Track when uncached methods are called (e.g., writeFile, unlink, rename, chmod, etc.)
  • Measure frequency: Count calls to each method to identify optimization opportunities
  • Identify missing caches: If a read-only method is called frequently, we can add caching for it

Example metrics to track:

interface FsMethodStats {
  [methodName: string]: {
    calls: number;
    cached: boolean;
    totalTimeMs: number;
  };
}

This will help us:

  1. Understand which fs methods TypeScript/ESLint actually use
  2. Prioritize which methods to add caching for
  3. Detect any unexpected write operations during analysis
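
A rough sketch of how such a mirror could collect these metrics. instrumentMethods is a hypothetical helper that wraps every function on an fs-like object and fills the FsMethodStats shape shown above; it is not part of the PR.

```typescript
interface FsMethodStats {
  [methodName: string]: {
    calls: number;
    cached: boolean;
    totalTimeMs: number;
  };
}

// Returns a mirror of `target` whose function members record call counts and timings.
function instrumentMethods<T extends object>(
  target: T,
  cachedMethods: ReadonlySet<string>,
  stats: FsMethodStats,
): T {
  const mirror: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(target)) {
    if (typeof value !== 'function') {
      mirror[name] = value; // constants and other non-function members pass through
      continue;
    }
    const entry = { calls: 0, cached: cachedMethods.has(name), totalTimeMs: 0 };
    stats[name] = entry;
    mirror[name] = function (this: unknown, ...args: unknown[]): unknown {
      const start = performance.now();
      try {
        return value.apply(this, args);
      } finally {
        // Measures time until the call returns; async work is not included.
        entry.calls++;
        entry.totalTimeMs += performance.now() - start;
      }
    };
  }
  return mirror as T;
}
```

Sorting the resulting stats by calls for entries with cached: false would point at the read-only methods worth caching next, and any nonzero count on a write method would flag an unexpected write during analysis.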

Building Protobuf Sources

The protobuf files in packages/shared/src/fs-cache/proto/ are generated from the .proto schema. To regenerate them after modifying fs-cache.proto:

Prerequisites

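Ensure protobufjs is installed (already a dependency in the project). If the project is on protobufjs 7 or later, the pbjs and pbts commands ship in the separate protobufjs-cli package, which must also be available for the npx commands below to resolve.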

Generate JavaScript code

npx pbjs -t static-module -w es6 -o packages/shared/src/fs-cache/proto/fs-cache.js packages/shared/src/fs-cache/proto/fs-cache.proto

Generate TypeScript definitions

npx pbts -o packages/shared/src/fs-cache/proto/fs-cache.d.ts packages/shared/src/fs-cache/proto/fs-cache.js

Both commands in sequence

npx pbjs -t static-module -w es6 -o packages/shared/src/fs-cache/proto/fs-cache.js packages/shared/src/fs-cache/proto/fs-cache.proto && \
npx pbts -o packages/shared/src/fs-cache/proto/fs-cache.d.ts packages/shared/src/fs-cache/proto/fs-cache.js

Note: Order matters here, because pbts reads the generated .js file to produce the .d.ts file.

Consider adding an npm script in packages/shared/package.json for convenience:

{
  "scripts": {
    "generate:fs-cache-proto": "pbjs -t static-module -w es6 -o src/fs-cache/proto/fs-cache.js src/fs-cache/proto/fs-cache.proto && pbts -o src/fs-cache/proto/fs-cache.d.ts src/fs-cache/proto/fs-cache.js"
  }
}

zglicz changed the base branch from master to typescript-program-caching on December 17, 2025 09:43