
Conversation

@RonShakutai
Collaborator

@RonShakutai RonShakutai commented Dec 10, 2025

Change Description

This PR introduces GPU optimizations for GLiNER, spaCy, Stanza, and Transformers.

Technical Implementation

  • Added DeviceDetector singleton for automatic GPU detection and CUDA initialization
  • Integrated GPU support into all NLP engines (SpacyNlpEngine, TransformersNlpEngine, StanzaNlpEngine, GLiNERRecognizer)
  • Dependency optimization: Using cupy-cuda12x
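
As a rough illustration of the `DeviceDetector` idea described above, here is a minimal sketch; the class name comes from the bullet list, but the method and attribute names are assumptions, not the PR's actual code:

```python
# Hypothetical sketch of a DeviceDetector singleton; only the class
# name comes from the PR description, the rest is illustrative.
import threading


class DeviceDetector:
    """Detect once whether a CUDA GPU is available and cache the result."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._device = cls._detect()
            return cls._instance

    @staticmethod
    def _detect() -> str:
        try:
            import torch  # optional dependency; absent on CPU-only installs
            if torch.cuda.is_available():
                return "cuda"
        except ImportError:
            pass
        return "cpu"

    @property
    def device(self) -> str:
        return self._device
```

A singleton keeps detection cost to a single check, so every NLP engine can ask for the device without re-probing CUDA.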

This PR introduces GPU handling improvements for GLiNER, spaCy, Transformers, and Stanza NLP engines, optimizing GPU detection and utilization.

Reproduce Results

On this branch:

```sh
cd presidio-analyzer
poetry run python ../benchmark_presidio.py --engines spacy/transformers/gliner/stanza --sizes 50,500 --json gpu_results.json
```

Then download the script, switch to the main branch, and re-run, making sure to change the JSON results file name in the command (e.g. to main_results.json).

Compare gpu_results.json vs main_results.json.
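
The comparison step could be sketched as follows; this assumes the benchmark JSON maps engine names to a `total_time` field, which is a guess about the script's output format:

```python
# Hypothetical comparison helper; the "total_time" key is an assumption
# about the benchmark script's JSON output format.
import json


def speedups(before: dict, after: dict) -> dict:
    """Return the per-engine speedup factor (before time / after time)."""
    return {
        engine: before[engine]["total_time"] / after[engine]["total_time"]
        for engine in before
    }


def compare_files(before_path: str, after_path: str) -> dict:
    with open(before_path) as f_before, open(after_path) as f_after:
        return speedups(json.load(f_before), json.load(f_after))
```

For example, GLiNER at 500 rows (211.21s down to 31.34s) works out to roughly a 6.7x speedup.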

Results!

GLiNER - Big Improvement

| Rows | Metric | Before Optimization | After Optimization | Improvement |
|------|--------|---------------------|--------------------|-------------|
| 50 | Total Time | 13.37s | 2.73s | 4.9x faster |
| 50 | Throughput | 3.74 texts/sec | 18.31 texts/sec | 4.9x faster |
| 50 | Entities Found | 241 | 245 | +1.7% |
| 500 | Total Time | 211.21s | 31.34s | 6.7x faster |
| 500 | Throughput | 2.37 texts/sec | 15.96 texts/sec | 6.7x faster |
| 500 | Entities Found | 2,435 | 2,469 | +1.4% |

Transformers - Big Improvement

Comparison of Transformers (StanfordAIMI/stanford-deidentifier-base)

| Rows | Metric | Before Optimization | After Optimization | Improvement |
|------|--------|---------------------|--------------------|-------------|
| 50 | Total Time | 3.64s | 0.77s | 4.7x faster |
| 50 | Throughput | 13.73 texts/sec | 64.85 texts/sec | 4.7x faster |
| 50 | Entities Found | 273 | 273 | 0% |
| 500 | Total Time | 76.78s | 7.97s | 9.6x faster |
| 500 | Throughput | 6.51 texts/sec | 62.73 texts/sec | 9.6x faster |
| 500 | Entities Found | 2,746 | 2,746 | 0% |

Stanza - Big Improvement

| Rows | Metric | Before Optimization | After Optimization | Change |
|------|--------|---------------------|--------------------|--------|
| 50 | Total Time | 9.53s | 7.57s | 1.3x faster |
| 50 | Throughput | 5.24 texts/sec | 6.61 texts/sec | 1.3x faster |
| 50 | Entities Found | 253 | 253 | 0% |
| 500 | Total Time | 141.98s | 33.77s | 4.2x faster |
| 500 | Throughput | 3.52 texts/sec | 14.81 texts/sec | 4.2x faster |
| 500 | Entities Found | 2,510 | 2,511 | +0.04% |

spaCy - No Improvement (Slight Slowdown)

Comparison of spaCy (en_core_web_lg) performance before and after GPU handling improvements.

| Rows | Metric | Before Optimization | After Optimization | Change |
|------|--------|---------------------|--------------------|--------|
| 50 | Total Time | 0.36s | 0.54s | 1.5x slower |
| 50 | Throughput | 138.58 texts/sec | 93.25 texts/sec | 1.5x slower |
| 50 | Entities Found | 235 | 235 | 0% |
| 500 | Total Time | 2.97s | 4.62s | 1.6x slower |
| 500 | Throughput | 168.60 texts/sec | 108.17 texts/sec | 1.6x slower |
| 500 | Entities Found | 2,377 | 2,377 | 0% |

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

@RonShakutai RonShakutai self-assigned this Dec 10, 2025
@github-actions

Coverage report (presidio-anonymizer)

This PR does not seem to contain any modification to coverable code.

@github-actions

Coverage report (presidio-structured)

This PR does not seem to contain any modification to coverable code.

@github-actions

Coverage report (presidio-cli)

This PR does not seem to contain any modification to coverable code.

@github-actions

Coverage report (presidio-image-redactor)

This PR does not seem to contain any modification to coverable code.

Collaborator

@omri374 omri374 left a comment


Thanks! Long-overdue proper support for GPU workloads. Left some comments to consider.

```diff
@@ -0,0 +1,606 @@
#!/usr/bin/env python3
```
Collaborator


Consider putting the benchmark result files in a dedicated folder under docs, or omit them from the repo. There's a chance for this to become stale very quickly

Collaborator Author


It's only for the GPU tests. I think we will need something more organized later on.


```python
logger.debug(f"Loading SpaCy and transformers models: {self.models}")

# Configure GPU if available
```
Collaborator


Already called in the super

Collaborator Author


fixed.

```diff
 "phonenumbers (>=8.12,<10.0.0)",
-"pydantic (>=2.0.0,<3.0.0)"
+"pydantic (>=2.0.0,<3.0.0)",
+"cupy-cuda12x>=13.4.1",
```
Collaborator


Would this install CUDA? Will this work for CPU-only machines?

Collaborator Author


Yes, but I believe it should go in a new GPU extras section to give users more flexibility regarding GPU dependencies, and avoid installing what is not in use.

@RonShakutai
Collaborator Author

> Thanks! Long-overdue proper support for GPU workloads. Left some comments to consider.

Hi @omri374,

This PR is not ready for full review yet. I mainly want someone with a different GPU to run it and check whether the GPU performance optimizations improve performance on their setup.

Before finalizing, I'll remove the benchmark script; I kept it only so others can measure GPU improvements on their machines.

I'm also considering adding an extra gpu section in pyproject.toml, so people can either install a common set of GPU dependencies or use their own GPU libraries.
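
A sketch of what that extras section could look like in pyproject.toml; the extra name `gpu` and any dependencies beyond `cupy-cuda12x` are hypothetical, not settled:

```toml
# Hypothetical optional-dependency group; names and pins are illustrative.
[project.optional-dependencies]
gpu = [
    "cupy-cuda12x>=13.4.1",
]
```

Users could then opt in with `pip install "presidio-analyzer[gpu]"`, while CPU-only installs stay unchanged.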

@RonShakutai RonShakutai marked this pull request as ready for review December 15, 2025 11:46
@RonShakutai RonShakutai requested a review from a team as a code owner December 15, 2025 11:46
@github-actions

github-actions bot commented Dec 15, 2025

Coverage report (presidio-analyzer)

Click to see where and how coverage changed

Files with coverage changes:

- presidio-analyzer/presidio_analyzer/nlp_engine/__init__.py
- presidio-analyzer/presidio_analyzer/nlp_engine/device_detector.py
- presidio-analyzer/presidio_analyzer/nlp_engine/spacy_nlp_engine.py
- presidio-analyzer/presidio_analyzer/nlp_engine/stanza_nlp_engine.py (292)
- presidio-analyzer/presidio_analyzer/nlp_engine/transformers_nlp_engine.py
- presidio-analyzer/presidio_analyzer/predefined_recognizers/ner/gliner_recognizer.py (116)

This report was generated by python-coverage-comment-action

@tamirkamara
Contributor

@RonShakutai Is this meant to be supported inside the Docker image without additional changes?

@RonShakutai RonShakutai requested a review from omri374 December 15, 2025 12:29
@RonShakutai
Collaborator Author

RonShakutai commented Dec 15, 2025

> @RonShakutai Is this meant to be supported inside the Docker image without additional changes?

Yes, the device detection is automatic and requires no code changes.

@tamirkamara
Contributor

> > @RonShakutai Is this meant to be supported inside the Docker image without additional changes?
>
> Yes, the device detection is automatic and requires no code changes.

Doesn't GPU require drivers at the OS level? Not sure we install those currently.
I also see you have instructions to do pip install, which makes me think this won't work unless we do something for it. But this can be a separate issue/PR.

@RonShakutai
Collaborator Author

> Doesn't GPU require drivers at the OS level? Not sure we install those currently. I also see you have instructions to do pip install, which makes me think this won't work unless we do something for it. But this can be a separate issue/PR.

GPU execution still requires OS-level drivers (CUDA, the NVIDIA runtime), which are outside the scope of this PR.

This PR focuses on two things:

1. Correct GPU usage in the code paths for Stanza, spaCy, GLiNER, and Transformers once a GPU is available.
2. Providing an optional gpu extra that installs commonly used CUDA-compatible Python dependencies, so users can "plug and go" in most setups.

CUDA versions and drivers are highly GPU-specific and must be installed by the GPU owner, who knows their hardware best. For that reason, we do not bundle or enforce CUDA drivers.

We only add a recommended GPU dependency set via an extra in pyproject.toml. As a result, this change does not affect the Docker image and requires no Docker-level changes.

@omri374
Collaborator

omri374 commented Dec 17, 2025

> We only add a recommended GPU dependency set via an extra in pyproject.toml. As a result, this change does not affect the Docker image and requires no Docker-level changes.

For this to work in Docker, we need two things: (1) code-level adjustments, as done in this PR, and (2) Dockerfile adjustments, since our current Dockerfile doesn't install CUDA/cuDNN packages (e.g. FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04).
I would suggest finalizing this PR to make sure that whoever wants to use GPU via the Python package can, and later continuing with GPU-compatible Dockerfiles/images. WDYT?
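
A GPU-enabled image along those lines could be sketched as follows; the base image tag comes from the comment above, while everything else (the installed packages and the `gpu` extra) is an assumption, not part of this PR:

```dockerfile
# Hypothetical GPU-enabled Dockerfile sketch; not part of this PR.
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Install presidio-analyzer with the proposed (hypothetical) GPU extra
RUN pip3 install "presidio-analyzer[gpu]"
```

The container would still need to be launched with the NVIDIA container runtime (e.g. `docker run --gpus all ...`) for the GPU to be visible inside it.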

@RonShakutai
Collaborator Author

> For this to work in Docker, we need two things: (1) code-level adjustments, as done in this PR, and (2) Dockerfile adjustments, since our current Dockerfile doesn't install CUDA/cuDNN packages. I would suggest finalizing this PR to make sure that whoever wants to use GPU via the Python package can, and later continuing with GPU-compatible Dockerfiles/images. WDYT?

Totally agree, @omri374.
I think this PR should stay focused on code-level GPU support (device detection and its correct application in the different components).
GPU drivers and CUDA versions are highly hardware-specific and should be installed by the environment owner.

The Docker question, rightly raised by @tamirkamara, is a separate concern. We can address it later by providing GPU-enabled Docker images, or a solution that allows choosing the appropriate base image at build/run time depending on the available hardware.
