ai-trainer
AI model training and validation for Kodachi OS command intelligence
Version: 9.0.1 | Size: 4.2MB | Author: Warith Al Maawali warith@digi77.com
License: LicenseRef-Kodachi-SAN-1.0 | Website: https://www.digi77.com
File Information
| Property | Value |
|---|---|
| Binary Name | ai-trainer |
| Version | 9.0.1 |
| Build Date | REDACTED-BUILD-TIME |
| Rust Version | 1.82.0 |
| File Size | 4.2MB |
| Author | Warith Al Maawali warith@digi77.com |
| License | LicenseRef-Kodachi-SAN-1.0 |
| Category | Kodachi Binary |
| Description | AI model training and validation for Kodachi OS command intelligence |
| Git Commit | unknown |
| Metadata Generated | 2026-06-08T16:22:03Z |
| Binary Timestamp | Unknown |
SHA256 Checksum
Features
| # | Feature |
|---|---|
| 1 | TF-IDF based command embeddings |
| 2 | Incremental model updates |
| 3 | Model validation and accuracy testing |
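As a rough illustration of what "TF-IDF based command embeddings" can mean, the sketch below builds sparse TF-IDF vectors from the command descriptions listed on this page and matches a query by cosine similarity. It is not the binary's implementation: the tokenizer, the smoothed IDF formula, and every function name here are assumptions chosen for the example.

```python
import math
from collections import Counter

# Command descriptions copied from the Commands section of this page.
COMMANDS = {
    "export": "Export model embeddings and metadata to JSON file",
    "snapshot": "Save current model as versioned snapshot",
    "list-snapshots": "List all saved model snapshots",
    "status": "Display current model status and statistics",
    "train": "Train AI model from command metadata (full retraining)",
    "validate": "Validate model accuracy against test dataset",
}

def tokenize(text):
    """Lowercase the text and split on anything that is not alphanumeric."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()

def build_tfidf(docs):
    """Return (vectors, idf) for a dict of {name: description}."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    doc_freq = Counter()
    for toks in tokens.values():
        doc_freq.update(set(toks))
    n = len(tokens)
    # Smoothed IDF (an assumption): common terms keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        vectors[name] = {t: (c / len(toks)) * idf[t] for t, c in tf.items()}
    return vectors, idf

def embed(query, idf):
    """Embed a query with the trained IDF table; unknown terms are ignored."""
    toks = tokenize(query)
    if not toks:
        return {}
    tf = Counter(t for t in toks if t in idf)
    return {t: (c / len(toks)) * idf[t] for t, c in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query, vectors, idf):
    """Return the command whose description is closest to the query."""
    q = embed(query, idf)
    return max(vectors, key=lambda name: cosine(q, vectors[name]))

vectors, idf = build_tfidf(COMMANDS)
print(best_match("validate accuracy", vectors, idf))
```

Sparse dictionaries keep the example dependency-free; a real embedding store would more likely use fixed-size numeric arrays.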
Security Features
| Feature | Description |
|---|---|
| Input Validation | Argument parsing via clap; per-command validation is the consumer's responsibility |
| Rate Limiting | Not provided by cli-core |
| Authentication | Not provided by cli-core (see online-auth) |
| Encryption | Not provided by cli-core |
System Requirements
| Requirement | Value |
|---|---|
| OS | Linux (Debian-based) |
| Privileges | root/sudo for system operations |
| Dependencies | OpenSSL, libcurl |
Global Options
| Flag | Description |
|---|---|
| -h, --help | Print help information |
| -v, --version | Print version information |
| -n, --info | Display detailed information |
| -e, --examples | Show usage examples |
| --json | Output in JSON format |
| -o, --output-format <FORMAT> | Force output format (text \| …) |
| --json-pretty | Pretty-print JSON output with indentation |
| --json-human | Enhanced JSON output with improved formatting (like jq) |
| --fields <FIELD_LIST> | Select specific fields to include in output (comma-separated) |
| --limit <NUMBER> | Limit number of results returned |
| --offset <NUMBER> | Skip first N results (for pagination) |
| -d, --work-dir <PATH> | Working directory (defaults to auto-detected base directory) |
| --port <PORT> | Set custom port number (1024-65535) |
| --log-level <LEVEL> | Set log level (error \| …) |
| --verbose | Enable verbose output |
| --quiet | Suppress non-essential output |
| --no-color | Disable colored output |
| --config <FILE> | Use custom configuration file |
| --timeout <SECS> | Set operation timeout in seconds (optional; no default applied) |
| --retry <COUNT> | Retry attempts (optional; no default applied) |
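The result-shaping flags (--fields, --limit, --offset) are easiest to reason about as operations on a list of records. The sketch below shows one plausible reading, assuming the offset is applied before the limit and that unknown field names are silently dropped. Neither detail is confirmed by this page, and the sample rows are hypothetical.

```python
def select_page(rows, fields=None, limit=None, offset=0):
    """Mimic --offset, --limit and --fields on a list of dict rows.

    Assumption: skip `offset` rows first, then keep at most `limit` rows,
    then project each row onto the comma-separated `fields` list.
    """
    page = rows[offset:]
    if limit is not None:
        page = page[:limit]
    if fields:
        wanted = [f.strip() for f in fields.split(",") if f.strip()]
        page = [{k: row[k] for k in wanted if k in row} for row in page]
    return page

# Hypothetical snapshot rows, used only to exercise the function.
rows = [
    {"name": "snap-a", "version": "1.0"},
    {"name": "snap-b", "version": "1.1"},
    {"name": "snap-c", "version": "1.2"},
]
print(select_page(rows, fields="name", limit=1, offset=1))
```

Paging through a long result set then amounts to repeating the call with offset increased by limit each time.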
Commands
Model Management
export
Export model embeddings and metadata to JSON file
Usage:
Examples:
snapshot
Save current model as versioned snapshot
Usage:
Examples:
list-snapshots
List all saved model snapshots
Usage:
Examples:
status
Display current model status and statistics
Usage:
Examples:
download-model
Download ONNX model, tokenizer, or GGUF model for AI engine tiers
Usage:
ai-trainer download-model [--llm [default|small|large|xlarge|xlarge-hq]] [--show-models] [--all] [--output-dir <DIR>] [--force] [--allow-unverified-model]
Examples:
Model Training
train
Train AI model from command metadata (full retraining)
Usage:
Examples:
incremental
Update model incrementally with new command data
Usage:
Examples:
Validation & Testing
validate
Validate model accuracy against test dataset
Usage:
Examples:
Operational Scenarios
Scenario-oriented workflows generated from the binary's built-in -e --json examples.
Scenario 1: Model Training
Full model training operations
Step 1: Train model with command data
Expected Output: Training statistics and embeddings metrics
Note: Creates new model from scratch
Step 2: Train with custom database
Expected Output: Training results with custom DB location
Note: Allows custom database path specification
Step 3: Train and output results as JSON
Expected Output: JSON-formatted training metrics
Note: Structured output for automation
Scenario 2: Incremental Training
Update existing models with new data
Step 1: Incrementally train with new data
Expected Output: New embeddings added to existing model
Note: Requires existing trained model
Step 2: Incremental training with custom DB and JSON output
Expected Output: JSON-formatted incremental training results
Note: Combines custom DB path with structured output
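One way an incremental update can avoid full retraining is to keep running document-frequency counts and fold new command descriptions into them. The sketch below illustrates that idea only. The class name, the whitespace tokenizer, and the smoothed IDF formula are assumptions, and nothing here reflects how ai-trainer stores its model.

```python
import math
from collections import Counter

class IncrementalIdf:
    """Running document-frequency table that accepts new documents over time."""

    def __init__(self):
        self.doc_count = 0
        self.doc_freq = Counter()

    def add(self, text):
        """Fold one new document into the counts without revisiting old ones."""
        self.doc_count += 1
        self.doc_freq.update(set(text.lower().split()))

    def idf(self, term):
        """Smoothed inverse document frequency for a term (assumed formula)."""
        return math.log((1 + self.doc_count) / (1 + self.doc_freq[term])) + 1.0

model = IncrementalIdf()
model.add("train model")          # initial training data
model.add("validate model")       # later, incremental additions
model.add("export embeddings")
print(model.doc_count, round(model.idf("model"), 3))
```

The trade-off is that vectors computed before an update were weighted with older IDF values, so a periodic full retrain is still useful.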
Scenario 3: Validation
Model accuracy testing and validation
Step 1: Validate model with test data
Expected Output: Validation results with accuracy metrics
Note: Tests model against known test cases
Step 2: Validate with custom accuracy threshold
Expected Output: Pass/fail validation with 90% threshold
Note: Default threshold is 0.85
Step 3: Validate with custom DB and JSON output
Expected Output: JSON-formatted validation metrics
Note: Structured validation results
Step 4: Validate with all parameters combined
Expected Output: JSON validation with custom test data, 90% threshold, and custom DB
Note: Full parameter example for CI/CD pipelines
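The pass/fail behaviour described above reduces to comparing an accuracy score with a threshold. The sketch below uses the documented default of 0.85 and assumes a score exactly equal to the threshold counts as a pass, which this page does not state. The function names and sample labels are hypothetical.

```python
DEFAULT_THRESHOLD = 0.85  # default threshold stated in the validation notes

def accuracy(predictions, expected):
    """Fraction of test cases where the prediction equals the expected label."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predictions, expected))
    return correct / len(expected)

def passes(score, threshold=DEFAULT_THRESHOLD):
    """Assumption: a score equal to the threshold is treated as a pass."""
    return score >= threshold

score = accuracy(["status", "train", "export", "train"],
                 ["status", "train", "export", "validate"])
print(score, passes(score), passes(score, threshold=0.70))
```

In a CI/CD pipeline the same comparison would typically decide whether the job fails, so a stricter threshold such as 0.90 makes regressions surface earlier.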
Scenario 4: Model Export
Export trained models and statistics
Step 1: Export trained model
Expected Output: Complete model export with embeddings
Note: Default format includes all embeddings
Step 2: Export in compact format
Expected Output: Compact model export without full embeddings
Note: Reduces export file size
Step 3: Export statistics as JSON
Expected Output: Model statistics without embeddings
Note: Lightweight statistics export
Step 4: Full export with JSON envelope output
Expected Output: Complete model export with JSON status envelope
Note: Combines full embeddings export with structured output
Scenario 5: Snapshots
Model versioning and snapshot management
Step 1: Create model snapshot with version
Expected Output: Versioned snapshot created successfully
Note: Preserves model state at specific version
Step 2: List all model snapshots
Expected Output: List of saved model versions
Note: Shows snapshot metadata and versions
Step 3: List snapshots as JSON
Expected Output: JSON-formatted snapshot listing
Note: Structured snapshot information
Step 4: Create snapshot with JSON output
Expected Output: JSON with snapshot name, version, and embedding count
Note: Structured output for automation
Scenario 6: Model Download
Download ONNX and GGUF model files for AI engine tiers
Step 1: Download ONNX embeddings model to default models/ directory
Expected Output: Model files downloaded successfully
Note: Downloads all-MiniLM-L6-v2 ONNX model and tokenizer
Step 2: Download default GGUF model (Qwen3-1.7B Q4_K_M, ~1.1GB)
Expected Output: GGUF model downloaded to models/ directory
Note: Best balance of quality, speed, and size for CPU inference
Step 3: Download small GGUF model (Qwen3-1.7B Q4_K_S, ~1.0GB)
Expected Output: Small GGUF model downloaded
Note: For systems with <4GB available RAM
Step 4: Download large GGUF model (Phi-3.5-mini, ~2.3GB)
Expected Output: Large GGUF model downloaded
Note: Better reasoning, 128K trained context
Step 5: Download 8B GGUF model tuned for SPEED (Qwen3-8B Q4_K_M, ~4.8GB)
Expected Output: Qwen3-8B Q4_K_M downloaded
Note: 8-billion-parameter Qwen3 at 4-bit quantization. Use on 8+ GB RAM systems for faster tokens-per-second. Lower quality than xlarge-hq, higher than default.
Step 6: Download 8B GGUF model tuned for QUALITY (Qwen3-8B Q5_K_M, ~5.6GB)
Expected Output: Qwen3-8B Q5_K_M downloaded
Note: 8-billion-parameter Qwen3 at 5-bit quantization. Recommended on 16+ GB RAM systems. Best local-LLM quality available in the catalog. About 15 percent slower than xlarge.
Step 7: Download both ONNX embeddings and default GGUF model
Expected Output: All model files downloaded
Note: Complete setup for all AI tiers
Step 8: List downloaded and available models
Expected Output: Model inventory with sizes and status
Note: Shows what's installed and what can be downloaded
Step 9: Model inventory as JSON
Expected Output: JSON with downloaded and available model details
Step 10: Force re-download of ONNX model
Expected Output: Model files re-downloaded
Note: Overwrites existing files
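The RAM guidance in the notes above can be turned into a small helper that picks an --llm value and assembles the command line. This is a sketch under stated assumptions: the tier names come from the download-model usage line, the mapping of the two 8B models to xlarge and xlarge-hq is inferred from the notes, and the cut-offs treat "available RAM" as a single number in gigabytes. The helper names are hypothetical.

```python
# Tier names as listed in the download-model usage line.
LLM_TIERS = ("default", "small", "large", "xlarge", "xlarge-hq")

def pick_llm_tier(ram_gb):
    """Map available RAM (GB) to a tier, following the notes in Scenario 6.

    Assumption: "large" (Phi-3.5-mini) is chosen for its reasoning and context
    length rather than for RAM, so it is never returned here.
    """
    if ram_gb < 4:
        return "small"
    if ram_gb >= 16:
        return "xlarge-hq"
    if ram_gb >= 8:
        return "xlarge"
    return "default"

def download_argv(tier, output_dir=None, force=False):
    """Build an argument list using only flags shown in the usage line."""
    if tier not in LLM_TIERS:
        raise ValueError(f"unknown tier: {tier}")
    argv = ["ai-trainer", "download-model", "--llm", tier]
    if output_dir:
        argv += ["--output-dir", output_dir]
    if force:
        argv.append("--force")
    return argv

print(download_argv(pick_llm_tier(6)))
```

Returning a list rather than a string keeps the arguments safe to pass to a process launcher without shell quoting.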
Scenario 7: Status
Model status and health checks
Step 1: Show training status
Expected Output: Current model status and statistics
Note: Displays model readiness and metrics
Step 2: Show training status as JSON
Expected Output: JSON-formatted status information
Note: Structured status output for automation
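For automation, the JSON status output can be consumed from a script. The wrapper below is a minimal sketch that appends the global --json flag to a subcommand and parses standard output. It assumes the binary is on PATH and writes only JSON to stdout in this mode, and it makes no assumption about the field names in the result. The runner parameter is a hypothetical hook that lets the function be exercised without the binary.

```python
import json
import subprocess

def run_json(args, runner=subprocess.run):
    """Run `ai-trainer <args> --json` and return the parsed JSON document.

    Assumptions: `ai-trainer` is on PATH and prints a single JSON document
    to stdout when --json is given.
    """
    argv = ["ai-trainer", *args, "--json"]
    result = runner(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{' '.join(argv)} exited with code {result.returncode}")
    return json.loads(result.stdout)

# Example call (requires the binary to be installed):
#   status = run_json(["status"])
```

Raising on a non-zero exit code keeps error handling in one place instead of leaving each caller to inspect the process result.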
Scenario 8: AI Tier Integration
Training operations related to the 6-tier AI engine (TF-IDF, ONNX, Mistral.rs, GenAI/Ollama, Legacy LLM, Claude)
Step 1: Validate model against all tier responses
Expected Output: Validation results covering all active AI tiers
Note: Tests model accuracy across available tiers
Step 2: Train model with feedback from all tiers
Expected Output: Training metrics including multi-tier feedback data
Note: Includes feedback from mistral.rs and GenAI tier executions
Scenario 9: ONNX Intent Classifier
Evaluate the ONNX intent classifier used for fast-path routing (12 categories, <5ms inference)
Step 1: Evaluate ONNX intent classifier accuracy
Expected Output: JSON with per-intent precision, recall, and F1-score
Note: Target: 95%+ accuracy on held-out test set
Step 2: Check if intent classifier model is downloaded
Expected Output: JSON showing classifier model status
Note: Model: kodachi-intent-classifier.onnx (~65MB)
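Per-intent precision, recall, and F1 are standard classification metrics, and the sketch below shows how they are computed from expected and predicted labels. It is a generic illustration, not the evaluator built into ai-trainer, and the intent names in the sample data are hypothetical.

```python
from collections import Counter

def per_intent_metrics(expected, predicted):
    """Compute precision, recall and F1 for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for want, got in zip(expected, predicted):
        if want == got:
            tp[want] += 1
        else:
            fp[got] += 1   # predicted this label, but it was wrong
            fn[want] += 1  # missed the label that was expected
    metrics = {}
    for label in sorted(set(expected) | set(predicted)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

# Hypothetical intent labels, used only to exercise the function.
expected = ["network", "network", "dns", "dns"]
predicted = ["network", "dns", "dns", "dns"]
for label, m in per_intent_metrics(expected, predicted).items():
    print(label, m)
```

Reporting the three values per intent, rather than one overall accuracy figure, shows which categories are being confused with each other.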
Environment Variables
| Variable | Description | Default | Values |
|---|---|---|---|
| RUST_LOG | Set logging level | info | error \| … |
| NO_COLOR | Disable all colored output when set | unset | 1 |
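When ai-trainer is launched from a script, these variables can be set on the child process only, leaving the parent environment untouched. The helper below is a minimal sketch. Its name and parameters are hypothetical, and it passes the log level through without checking it, because the full list of accepted values is not shown on this page.

```python
import os

def child_env(log_level=None, color=True, base=None):
    """Return a copy of the environment with RUST_LOG / NO_COLOR applied."""
    env = dict(os.environ if base is None else base)
    if log_level is not None:
        env["RUST_LOG"] = log_level  # e.g. "error"; the documented default is "info"
    if not color:
        env["NO_COLOR"] = "1"        # any colored output is disabled when set
    return env

env = child_env(log_level="error", color=False, base={"PATH": "/usr/bin"})
print(env)
```

The resulting dictionary can be handed to a process launcher through its environment argument.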
Exit Codes
| Code | Description |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | Permission denied |
| 4 | Network error |
| 5 | File not found |
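Scripts that wrap ai-trainer can map these codes to readable names instead of comparing bare integers. The enum below mirrors the table above. The member names and the describe helper are hypothetical conveniences, not part of the tool.

```python
from enum import IntEnum

class ExitCode(IntEnum):
    """Exit codes as listed in the table above."""
    SUCCESS = 0
    GENERAL_ERROR = 1
    INVALID_ARGUMENTS = 2
    PERMISSION_DENIED = 3
    NETWORK_ERROR = 4
    FILE_NOT_FOUND = 5

def describe(code):
    """Return a human-readable description for a process exit code."""
    try:
        return ExitCode(code).name.replace("_", " ").capitalize()
    except ValueError:
        return f"Unknown exit code {code}"

print(describe(3))
```

A wrapper could, for example, retry only when the code equals ExitCode.NETWORK_ERROR and fail fast on everything else.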