# 🧠 TokenKit

TokenKit is a professional .NET 8.0 library and CLI for tokenization, validation, cost estimation, and model registry management across multiple LLM providers (OpenAI, Anthropic, Gemini, etc.).


## ✨ Features

| Category | Description |
| --- | --- |
| 🔢 Tokenization | Analyze text or files and count tokens using multiple encoder engines (simple, SharpToken, ML.Tokenizers) |
| 💰 Cost Estimation | Automatically calculate the estimated API cost based on model metadata |
| ✅ Prompt Validation | Validate prompt length against model context limits |
| 🧩 Model Registry | Manage model metadata (maxTokens, pricing, encodings, providers) via a JSON registry |
| ⚙️ CLI & SDK | Use TokenKit as a .NET library or a global CLI tool |
| 🧮 Multi-Encoder Support | Dynamically select tokenization engines via the `--engine` flag |
| 📦 Self-Contained Data | Local registry stored in `Registry/models.data.json`, auto-updatable |
| 🔍 Live Model Scraper | Optional OpenAI API key support to fetch real-time model data |
| 📊 Structured Logging | All CLI commands logged to `tokenkit.log` with rotation (1 MB max) |
| 🤫 Quiet & JSON Modes | Machine-readable (`--json`) and silent (`--quiet`) output modes for automation |
| 🎨 CLI Polish | Colorized output, ASCII banner, and improved user experience |

โš™๏ธ Installation

๐Ÿ“ฆ As a Library (NuGet)

dotnet add package TokenKit

๐Ÿ’ป As a Global CLI Tool

dotnet tool install -g TokenKit

## 🚀 Usage (All-in-One Guide)

### 🔹 Analyze Inline Text

```bash
tokenkit analyze "Hello from TokenKit!" --model gpt-4o
```

### 🔹 Analyze File Input

```bash
tokenkit analyze prompt.txt --model gpt-4o
```

### 🔹 Pipe Input (stdin)

```bash
echo "This is piped text input" | tokenkit analyze --model gpt-4o
```

Example output:

```json
{
  "Model": "gpt-4o",
  "Provider": "OpenAI",
  "TokenCount": 4,
  "EstimatedCost": 0.00002,
  "Valid": true
}
```
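The `EstimatedCost` above follows from the registry pricing: 4 tokens of gpt-4o input at $0.005 per 1K tokens. A minimal sketch of that arithmetic (the `EstimateInputCost` helper is hypothetical, not part of the TokenKit API):

```csharp
// Hypothetical helper illustrating the per-1K pricing arithmetic:
// cost = tokenCount / 1000 * price-per-1K-tokens.
static decimal EstimateInputCost(int tokenCount, decimal inputPricePer1K)
    => tokenCount / 1000m * inputPricePer1K;

// The registry prices gpt-4o input at $0.005 per 1K tokens,
// so the 4-token example above comes to $0.00002.
Console.WriteLine(EstimateInputCost(4, 0.005m));
```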

### 🔹 Validate Prompt Length

```bash
tokenkit validate "A very long prompt to validate" --model gpt-4o
```

```json
{
  "IsValid": true,
  "Message": "OK"
}
```

### 🔹 List Registered Models

```bash
tokenkit models list
```

Filter by provider:

```bash
tokenkit models list --provider openai
```

JSON output:

```bash
tokenkit models list --json
```

### 🔹 Update Model Data

Default update (offline fallback):

```bash
tokenkit update-models
```

Using an OpenAI API key:

```bash
tokenkit update-models --openai-key sk-xxxx
```

From JSON (stdin):

```bash
cat newmodels.json | tokenkit update-models
```

Example input:

```json
[
  {
    "Id": "gpt-4o-mini",
    "Provider": "OpenAI",
    "MaxTokens": 64000,
    "InputPricePer1K": 0.002,
    "OutputPricePer1K": 0.01,
    "Encoding": "cl100k_base"
  }
]
```

### 🔹 Scrape Latest Model Data (Preview)

```bash
tokenkit scrape-models --openai-key sk-xxxx
```

If no key is provided, TokenKit falls back to the local offline model registry.

Example output:

```text
🔍 Fetching latest OpenAI model data...
✅ Retrieved 3 models:
  - OpenAI: gpt-4o (128000 tokens)
  - OpenAI: gpt-4o-mini (64000 tokens)
  - OpenAI: gpt-3.5-turbo (4096 tokens)
```

### 🔹 CLI Output Modes

JSON mode:

```bash
tokenkit analyze "Hello" --model gpt-4o --json
```

Outputs pure JSON:

```json
{
  "Model": "gpt-4o",
  "Provider": "OpenAI",
  "TokenCount": 7,
  "EstimatedCost": 0.000105,
  "Engine": "simple",
  "Valid": true
}
```

Quiet mode:

```bash
tokenkit analyze "Silent test" --model gpt-4o --quiet
```

No console output; a log entry is saved to `tokenkit.log`.


## 🧩 Programmatic SDK Example

```csharp
using TokenKit.Registry;
using TokenKit.Services;

var model = ModelRegistry.Get("gpt-4o");
var tokenizer = new TokenizerService();

var result = tokenizer.Analyze("Hello from TokenKit!", model!.Id);
var cost = CostEstimator.Estimate(model, result.TokenCount);

Console.WriteLine($"Tokens: {result.TokenCount}, Cost: ${cost}");
```

## 📦 Model Registry

TokenKit stores all model metadata in `Registry/models.data.json`. Each entry includes:

```json
{
  "Id": "gpt-4o",
  "Provider": "OpenAI",
  "MaxTokens": 128000,
  "InputPricePer1K": 0.005,
  "OutputPricePer1K": 0.015,
  "Encoding": "cl100k_base"
}
```
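Because the registry is plain JSON, its entries can also be inspected with standard `System.Text.Json` calls. A minimal, standalone sketch (this is not TokenKit's own loading code, just the entry shape shown above):

```csharp
using System.Text.Json;

// One entry copied from the models.data.json shape shown above.
var json = """
{
  "Id": "gpt-4o",
  "Provider": "OpenAI",
  "MaxTokens": 128000,
  "InputPricePer1K": 0.005,
  "OutputPricePer1K": 0.015,
  "Encoding": "cl100k_base"
}
""";

// Parse the entry without committing to TokenKit's internal types.
using var doc = JsonDocument.Parse(json);
var root = doc.RootElement;
var id = root.GetProperty("Id").GetString();
var maxTokens = root.GetProperty("MaxTokens").GetInt32();
Console.WriteLine($"{id}: {maxTokens} tokens");
```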

## 🧪 Testing & Quality Assurance

TokenKit maintains 100% test coverage, with tests written in xUnit and coverage tracked via Codecov.

Run the tests locally:

```bash
dotnet test --collect:"XPlat Code Coverage"
```

## 🧭 Future Enhancements

| Feature | Description |
| --- | --- |
| 🌐 Extended Provider Support | Add Gemini, Claude, and Mistral integrations |
| 💾 Persistent Config Profiles | Store model defaults and pricing overrides per project |
| 🧮 Batch Analysis | Analyze multiple files or prompts in a single command |
| 📊 Report Generation | Export CSV/JSON summaries of token usage and estimated cost |
| 🧠 LLM-Aware Cost Planner | Simulate conversation cost across multi-turn dialogues |
| 🧩 IDE Integrations | VS Code and JetBrains plugins for inline token analysis |
| ⚙️ Custom Encoders | Support community-built encoders and language models |

## 💡 License

Licensed under the MIT License.
© 2025 Andrew Clements, Flow Labs / TokenKit
