Accelerating Unicode Processing with StringZilla: 50x Speedups over ICU #61092

### What is the problem this feature will solve?

Text processing in JavaScript is slow, especially when it comes to Unicode handling. Operations like case folding or case-insensitive substring search are orders of magnitude slower than they could be with modern SIMD kernels.
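To make the case-folding part concrete, here is a minimal sketch of why the usual `toLowerCase()` trick is not a case-insensitive search. The helper names are hypothetical, and `approxFold` is only a rough approximation of Unicode case folding built from standard string methods, not a compliant implementation and not how StringZilla or ICU do it.

```typescript
// The common approach: lowercase both sides, then search.
// It allocates two new strings per call and misses multi-character foldings.
function naiveIncludes(haystack: string, needle: string): boolean {
  return haystack.toLowerCase().includes(needle.toLowerCase());
}

// Hypothetical helper: upper-then-lower approximates case folding for
// cases like "ß" -> "SS" -> "ss". It is NOT full Unicode case folding.
function approxFold(text: string): string {
  return text.toUpperCase().toLowerCase();
}

function foldedIncludes(haystack: string, needle: string): boolean {
  return approxFold(haystack).includes(approxFold(needle));
}

const haystack = "HAUPTSTRASSE 5";
const needle = "stra\u00dfe"; // "straße"

const missed = naiveIncludes(haystack, needle); // false: "ß" never becomes "ss"
const found = foldedIncludes(haystack, needle); // true: both sides become "strasse"

console.log({ missed, found });
```

Even the second variant walks and reallocates both strings several times per query, which is the kind of overhead a kernel working directly on the original bytes avoids.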

### What is the feature you are proposing to solve the problem?

Following up on [this eXchange](https://x.com/matteocollina/status/2000970854939717658?s=20) with @mcollina, I'm curious whether the latest release of my [StringZilla](https://github.com/ashvardanian/StringZilla) library could be of help to Node.js and the broader JavaScript community.

> In short: I grouped all Unicode 17 case-folding rules and wrote ~3K lines of AVX-512 kernels around them to enable fully compliant case-insensitive substring search across the full 1M+ Unicode range, directly on the original UTF-8 bytes. It's not only often ~50× faster than ICU, but also "less wrong" than most search tools you'll reach for — from low-level Grep to products like Google Docs, Microsoft Excel, and VS Code.

I already have Node.js bindings available on npm, but because Node-API offers no zero-copy access to the internal string representation from C, my API is limited to accepting and returning `Buffer`s. That is questionable from an ergonomics perspective, and the library would be much more usable if it were integrated properly with Node's native strings.
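As a rough illustration of that ergonomics cost, the sketch below shows what a caller has to do when a search function only understands bytes. `findBytes` is a hypothetical pure-TypeScript stand-in for a native byte-level kernel, not StringZilla's actual binding API, and `indexOfViaBytes` is likewise an invented name.

```typescript
// Hypothetical stand-in for a native byte-level search (NOT StringZilla's API).
function findBytes(haystack: Uint8Array, needle: Uint8Array): number {
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i; // byte offset of the first match
  }
  return -1;
}

const encoder = new TextEncoder();
const decoder = new TextDecoder();

// What a string-in, index-out wrapper has to do around a bytes-only API.
function indexOfViaBytes(text: string, query: string): number {
  const haystack = encoder.encode(text); // copy 1: UTF-16 string -> UTF-8 bytes
  const needle = encoder.encode(query); // copy 2
  const byteOffset = findBytes(haystack, needle);
  if (byteOffset < 0) return -1;
  // Byte offsets are not string indices, so decode the prefix (copy 3)
  // to recover the UTF-16 position a JavaScript caller expects.
  return decoder.decode(haystack.subarray(0, byteOffset)).length;
}

const text = "na\u00efve caf\u00e9 search"; // "naïve café search"
const viaBytes = indexOfViaBytes(text, "caf\u00e9");
const viaString = text.indexOf("caf\u00e9");

console.log({ viaBytes, viaString });
```

The encode and decode steps around the search are the overhead that zero-copy access to native strings would remove.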

### What alternatives have you considered?

ICU4C and ICU4X are the only options for this functionality beyond StringZilla. They are the rock-solid reference implementations, but they are [often 5-150x slower than StringZilla](https://ashvardanian.com/posts/search-utf8/#stringzilla-against-icu-and-memchr).
