move oss_llm/ from learning_ai_2nd_brain

2026-02-28 00:03:37 -08:00 · 2026-02-28 00:03:37 -08:00 · 1a4b3c1fb3
commit 1a4b3c1fb3
parent a32978f9c3
13 changed files with 971 additions and 0 deletions
--- a/__LOCAL_LLMs/oss_llm/README.md
+++ b/__LOCAL_LLMs/oss_llm/README.md
@@ -0,0 +1,383 @@
+# Kimi “2.5” local deployment on macOS (what’s realistically possible)
+
+## Executive summary
+
+- **Running the real “Kimi 2.5 / Kimi K2-class” model fully locally on a Mac is not practical**: the official open-weight Kimi K2 deployment guidance targets **multi-node NVIDIA GPU clusters** (vLLM/SGLang/TensorRT-LLM) and assumes CUDA.
+- **What _is_ practical on a Mac:** use **Kimi Code CLI** (official) or the **Moonshot API** (official). That’s not “local inference” (weights on your laptop), but it is “local usage” (client runs on your laptop).
+- **VPN / proxy:** yes, clients can usually work through VPN/proxy, but you must be able to reach the provider’s API endpoints. If your network blocks access, you’ll need allowlisting or a different route.
+
+## What GitHub shows (official sources)
+
+### 1) Kimi K2 (open-weight model series)
+
+- Official repo: https://github.com/MoonshotAI/Kimi-K2
+- The repo includes a deployment guide at `docs/deploy_guidance.md`.
+- Key reality check from the guide: **the smallest FP8 deployment with 128k sequence length on mainstream H200/H20 platforms is described as a 16-GPU cluster** (vLLM/SGLang). This is fundamentally not macOS-laptop friendly.
+
+### 2) Kimi Code CLI (best option for macOS)
+
+- Official repo: https://github.com/MoonshotAI/kimi-cli
+- Docs: https://moonshotai.github.io/kimi-cli/en/guides/getting-started.html
+- Kimi Code CLI is an **agent client** (terminal tool) that talks to a remote provider.
+- First-run authentication options:
+  - `/login` (browser login; auto-configures models)
+  - `/setup` (API key flow)
+
+## Can we locally deploy “Kimi 2.5” on this Mac?
+
+### Practically: no (for K2 / K2-class)
+
+- **macOS has no CUDA**, so GPU-first inference stacks referenced for K2 deployment (vLLM / TensorRT-LLM) are not an option.
+- Even if you tried CPU inference, the K2-class model scale (MoE, 1T total params / 32B active) is far beyond what is reasonable on a laptop.
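+
+As a rough illustration, weight memory alone is parameters × bits per weight ÷ 8. The sketch below applies that to the ~1T total parameters quoted above. It assumes, as in a typical MoE setup, that all experts have to be resident in memory even though only ~32B parameters are active per token, and it ignores KV cache and runtime overhead, so real usage would be higher.
+
+```python
+# Rough weight-memory arithmetic: bytes = parameters * bits_per_weight / 8.
+# Ignores KV cache, activations, and runtime overhead, so real usage is higher.
+
+
+def weight_gb(params: float, bits_per_weight: float) -> float:
+    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
+    return params * bits_per_weight / 8 / 1e9
+
+
+TOTAL_PARAMS = 1e12  # K2-class MoE: ~1T total parameters (assumed all resident in memory)
+MAC_MEMORY_GB = 48   # unified memory of the Mac described in this README
+
+if __name__ == "__main__":
+    for label, bits in [("BF16", 16), ("FP8", 8), ("4-bit", 4)]:
+        gb = weight_gb(TOTAL_PARAMS, bits)
+        print(f"{label:>5}: ~{gb:,.0f} GB of weights vs {MAC_MEMORY_GB} GB "
+              f"(~{gb / MAC_MEMORY_GB:.0f}x more than available)")
+```
+
+Even at 4 bits the weights alone come to about 500 GB, roughly ten times the 48 GB of unified memory on the Mac described below.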
+
+### What’s feasible instead
+
+1. **Use Kimi Code CLI on macOS** (recommended)
+2. **Use Moonshot/Kimi APIs from your own scripts** (Python/Node/etc) once your network allows access
+3. If you truly need **local weights on Mac**, use a Mac-friendly model/runtime (e.g., MLX/Ollama) — but that would be **a different model**, not Kimi 2.5/K2.
+
+## Options: smaller, Mac-runnable open models (local inference)
+
+If your goal is “no network calls at runtime”, pick a **local runtime** + a **small-enough model + quantization**.
+
+## If you can only access GitHub
+
+If your enterprise network only allows `github.com`, that severely limits how you obtain model weights because most model hosting is **not** on GitHub.
+
+What still works:
+
+- **GitHub Releases assets**: some projects publish quantized model files (often `.gguf`) as release assets.
+- **Git LFS inside a repo**: occasionally weights are stored in-repo via LFS (still uncommon for large models).
+
+What usually won’t work:
+
+- Downloading weights from external model registries / hosting sites (blocked by policy).
+
+Reality check: GitHub blocks regular repository files larger than 100 MiB (bigger files need Git LFS) and limits each release asset to 2 GiB, so **very large models are rarely hosted there**. Your best bet is to use **small models (7B–14B) in quantized form**.
+
+### How to find downloadable models on GitHub
+
+- Use [oss_llm/find_models_on_github.py](find_models_on_github.py) to search GitHub repos and list any release assets that look like model files (add `--scan-tree` to also scan repository files, e.g. weights stored via Git LFS).
+- Prefer assets ending in `.gguf` if you plan to run with `llama.cpp`.
+
+Example:
+
+```sh
+python3 oss_llm/find_models_on_github.py --query "gguf qwen 2.5" --limit 20
+```
+
+### Recommended enterprise pattern
+
+If you need a specific model but can’t download it directly:
+
+1. Ask security for an **approved internal mirror** / artifact store.
+2. Mirror the model files there.
+3. Point your local runtime (Ollama/llama.cpp/MLX) at the internal location.
+
+### “No security review” learning path (best effort): train a tiny model yourself
+
+If you want to avoid downloading any third-party model weights at all, the cleanest option is to:
+
+- clone training code from GitHub
+- train a small model on a local/public-domain text file
+
+This won’t produce a state-of-the-art assistant, but it’s excellent for learning tokenization, training loops, sampling, and basic eval.
+
+One popular repo for this is **karpathy/nanoGPT**.
+
+High-level steps:
+
+```sh
+git clone https://github.com/karpathy/nanoGPT.git
+cd nanoGPT
+
+# create a python env (choose your preferred method)
+python3 -m venv .venv && source .venv/bin/activate
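+pip install torch numpy transformers datasets tiktoken wandb tqdm
+
+# run the built-in Shakespeare example (it downloads a small text file)
+python data/shakespeare_char/prepare.py
+# train (the default config is sized for a GPU; expect a slow run on CPU)
+python train.py config/train_shakespeare_char.py --device=cpu --compile=False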
+
+# sample
+python sample.py --out_dir=out-shakespeare-char --device=cpu --compile=False
+```
+
+If even small downloads are restricted, replace the dataset step with your own local text file and adjust the dataset script accordingly.
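+
+If you swap in your own text file, the core of a char-level dataset step is small. The sketch below mirrors the idea of nanoGPT's `data/shakespeare_char/prepare.py` (vocabulary = sorted unique characters, 90/10 train/val split); it is not that script, and it stops before the part where nanoGPT writes `train.bin`, `val.bin` (uint16 token ids) and `meta.pkl`. `my_corpus.txt` is a hypothetical file name.
+
+```python
+# Character-level tokenization in the spirit of nanoGPT's shakespeare_char/prepare.py.
+from pathlib import Path
+
+
+def build_vocab(text: str) -> tuple[dict[str, int], dict[int, str]]:
+    chars = sorted(set(text))                     # every distinct character is one token
+    stoi = {ch: i for i, ch in enumerate(chars)}  # character -> id
+    itos = {i: ch for i, ch in enumerate(chars)}  # id -> character
+    return stoi, itos
+
+
+def encode(text: str, stoi: dict[str, int]) -> list[int]:
+    return [stoi[ch] for ch in text]
+
+
+def decode(ids: list[int], itos: dict[int, str]) -> str:
+    return "".join(itos[i] for i in ids)
+
+
+def train_val_split(ids: list[int], train_frac: float = 0.9) -> tuple[list[int], list[int]]:
+    n = int(len(ids) * train_frac)
+    return ids[:n], ids[n:]
+
+
+if __name__ == "__main__":
+    text = Path("my_corpus.txt").read_text(encoding="utf-8")  # hypothetical local file
+    stoi, itos = build_vocab(text)
+    train_ids, val_ids = train_val_split(encode(text, stoi))
+    print(f"vocab size: {len(stoi)}, train: {len(train_ids)} tokens, val: {len(val_ids)} tokens")
+```
+
+For tiny Shakespeare this yields a 65-character vocabulary; on your own file the vocabulary is whatever characters it contains, so a model trained on it can only ever emit those characters.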
+
+### nanoGPT demo (this repo)
+
+This workspace includes a reproducible nanoGPT workflow (CPU and Apple Silicon MPS) plus sampling demo commands:
+
+- See [oss_llm/testNanoGPT/README.md](testNanoGPT/README.md)
+
+### If you do download GGUF weights from GitHub
+
+It can work (some repos commit a `.gguf` directly), but treat it like any third-party binary:
+
+- Prefer repos with **clear licensing** (repo license + explicit model license/provenance)
+- Prefer “original publisher” repos over re-uploads
+- Keep models small (e.g., ~100M–3B) for macOS learning
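+
+Before loading a downloaded `.gguf`, two cheap checks catch common problems: a web-filter blockpage saved under a `.gguf` name, and a file that differs from what the publisher released. The sketch below assumes the GGUF v2/v3 header layout (4-byte `GGUF` magic, little-endian uint32 version, then uint64 tensor and metadata counts) and assumes the publisher provides a SHA-256 digest to compare against.
+
+```python
+# Sanity-check a downloaded .gguf: (1) GGUF magic bytes, (2) SHA-256 digest.
+import hashlib
+import struct
+import sys
+from pathlib import Path
+
+GGUF_MAGIC = b"GGUF"
+
+
+def read_gguf_header(path: Path) -> tuple[int, int, int]:
+    """Return (version, tensor_count, metadata_kv_count); raise ValueError if not GGUF.
+
+    Assumes the GGUF v2/v3 layout: 4-byte magic, little-endian uint32 version,
+    then uint64 tensor count and uint64 metadata key/value count.
+    """
+    with path.open("rb") as f:
+        header = f.read(24)
+    if len(header) < 24 or header[:4] != GGUF_MAGIC:
+        raise ValueError(f"{path} does not look like a GGUF file (blockpage or truncated download?)")
+    version, n_tensors, n_kv = struct.unpack("<IQQ", header[4:24])
+    return version, n_tensors, n_kv
+
+
+def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
+    """Hash in 1 MiB chunks so multi-GB model files never need to fit in RAM."""
+    h = hashlib.sha256()
+    with path.open("rb") as f:
+        for chunk in iter(lambda: f.read(chunk_size), b""):
+            h.update(chunk)
+    return h.hexdigest()
+
+
+if __name__ == "__main__":
+    model = Path(sys.argv[1])
+    version, n_tensors, n_kv = read_gguf_header(model)
+    print(f"GGUF v{version}: {n_tensors} tensors, {n_kv} metadata entries")
+    print(f"sha256: {sha256_of(model)}  <- compare with the publisher's published digest")
+```
+
+Usage (hypothetical file name): `python check_gguf.py /path/to/model.gguf`.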
+
+### Recommended runtimes (macOS)
+
+- **Ollama**: easiest “download + chat + local HTTP API” experience.
+- **LM Studio**: easy GUI; also exposes a local API server.
+- **llama.cpp**: most portable; great for CPU/Metal, quantized GGUF models.
+- **MLX** (Apple): best when you want Python-native workflows on Apple Silicon.
+
+### Model size guidance (rule of thumb)
+
+Quantized models are what make laptops viable.
+
+- **8–10B @ 4-bit**: typically comfortable on 16GB unified memory.
+- **14B @ 4-bit**: better with 24–32GB unified memory.
+- **30B+**: usually needs 64GB+ and will still be slow.
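+
+The rule of thumb above can be turned into a back-of-envelope calculator: weights ≈ params × bits ÷ 8, plus a KV cache that grows with context length. The model shape below (~8B params, 32 layers, 8 KV heads, head dimension 128) is an assumption meant to resemble a Llama 3 8B-class model, and ~4.5 bits per weight is an assumed average for a 4-bit quantization including its scale factors; real files and runtimes differ.
+
+```python
+# Back-of-envelope memory estimate: quantized weights + KV cache (overhead not included).
+from dataclasses import dataclass
+
+
+@dataclass
+class ModelShape:
+    params: float    # total parameter count
+    n_layers: int
+    n_kv_heads: int  # with grouped-query attention this is smaller than the query-head count
+    head_dim: int
+
+
+def weights_gib(shape: ModelShape, bits_per_weight: float) -> float:
+    return shape.params * bits_per_weight / 8 / 2**30
+
+
+def kv_cache_gib(shape: ModelShape, context_tokens: int, bytes_per_elem: int = 2) -> float:
+    # K and V tensors, per layer, per KV head, per head dimension, per token (fp16 = 2 bytes)
+    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * bytes_per_elem
+    return per_token * context_tokens / 2**30
+
+
+# Assumed Llama-3-8B-like shape: ~8B params, 32 layers, 8 KV heads, head_dim 128.
+LLAMA3_8B_LIKE = ModelShape(params=8.0e9, n_layers=32, n_kv_heads=8, head_dim=128)
+
+if __name__ == "__main__":
+    w = weights_gib(LLAMA3_8B_LIKE, bits_per_weight=4.5)  # assumed 4-bit quant incl. scales
+    kv = kv_cache_gib(LLAMA3_8B_LIKE, context_tokens=8192)
+    print(f"weights ~{w:.1f} GiB + KV cache ~{kv:.1f} GiB = ~{w + kv:.1f} GiB before overhead")
+```
+
+For that shape this gives roughly 4.2 GiB of weights plus 1 GiB of fp16 KV cache at an 8K context, which fits comfortably in 16 GB before OS and runtime overhead.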
+
+### Good “starter” model families (pick one)
+
+These are widely supported by the runtimes above and have strong general utility:
+
+- **Llama 3.x (8B class)**: strong general chat + coding for the size.
+- **Qwen 2.5 (7B/14B class)**: strong multilingual + coding.
+- **Mistral 7B class**: fast and solid baseline.
+- **Gemma 2 (9B class)**: good general-purpose quality.
+- **Phi-3.x (mini/small class)**: very fast and lightweight.
+
+### Suggested picks by Mac memory
+
+- **16GB unified memory**: start with an 8–9B model at 4-bit.
+- **32GB unified memory**: 14B at 4-bit is a good sweet spot.
+- **64GB unified memory**: 27–34B at 4-bit becomes feasible (still slower).
+
+### Practical setup: Ollama (quickest)
+
+1. Install runtime (choose the install method your enterprise allows; Homebrew is common):
+
+```sh
+brew install ollama
+```
+
+2. Start the local service:
+
+```sh
+ollama serve
+```
+
+3. Pull/run a model (example placeholder):
+
+```sh
+ollama run <model-name>
+```
+
+4. Use the local API (optional):
+
+```sh
+curl http://127.0.0.1:11434/api/tags
+```
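+
+The same local API can be called from a script. This is a minimal standard-library sketch, assuming Ollama's default port and its `/api/generate` endpoint with `"stream": false` (one JSON reply whose text is in the `response` field); the model name is passed on the command line.
+
+```python
+# Call a local Ollama server with only the standard library.
+import json
+import sys
+import urllib.request
+
+OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # default `ollama serve` address
+
+
+def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
+    # "stream": False asks for one JSON object instead of a stream of JSON lines.
+    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
+    return urllib.request.Request(
+        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
+    )
+
+
+def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
+    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=timeout) as resp:
+        return json.loads(resp.read())["response"]  # generated text
+
+
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        sys.exit("usage: ollama_generate.py <model-name> [prompt]")
+    prompt = sys.argv[2] if len(sys.argv) > 2 else "Say hello in one short sentence."
+    print(generate(sys.argv[1], prompt))
+```
+
+Example: `python ollama_generate.py <model-name>` (hypothetical file name). If you exported proxy variables as in the proxy section below, keep `127.0.0.1` in `NO_PROXY` so this request is not routed to the proxy.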
+
+### Practical setup: llama.cpp (most controllable)
+
+If you can obtain a quantized GGUF model file via an approved internal mirror:
+
+```sh
+brew install llama.cpp
+llama-cli -m /path/to/model.gguf -p "Hello" -n 256
+```
+
+### Practical setup: MLX (Python-centric on Apple Silicon)
+
+If your environment allows Python packages and you have an MLX-converted model available internally:
+
+```sh
+python3 -m venv .venv && source .venv/bin/activate
+pip install mlx mlx-lm
+python -m mlx_lm.generate --model /path/to/mlx-model --prompt "Hello" --max-tokens 256
+```
+
+### Enterprise note (important)
+
+Because model hosting sites may be blocked by your network's web-filter categories, the usual enterprise pattern is:
+
+1. Security-approved model list
+2. Internal artifact store / mirror for model files
+3. Local runtime (Ollama/llama.cpp/MLX) pointing to those internally hosted artifacts
+
+## Steps (macOS): run Kimi Code CLI locally (client-side)
+
+Source: Kimi CLI docs.
+
+1. Install
+
+```sh
+# Install via uv (Python package manager)
+uv tool install --python 3.13 kimi-cli
+```
+
+2. Verify
+
+```sh
+kimi --version
+```
+
+3. Start in a project directory
+
+```sh
+cd /path/to/your/project
+kimi
+```
+
+4. Authenticate
+
+- Preferred:
+  - Run `/login` inside the CLI and complete the browser auth.
+- Alternative:
+  - Run `/setup` and choose an API platform + API key + model.
+
+5. If models don’t show up
+
+- Kimi CLI FAQ: verify network access to your configured provider’s API endpoints.
+
+## Steps (NOT macOS): deploy Kimi K2 weights (server-side)
+
+If you have access to Linux + NVIDIA GPUs, use the official K2 deployment guide:
+
+- vLLM (requires CUDA; the guide notes vLLM v0.10.0rc1+)
+- SGLang
+- TensorRT-LLM
+
+This is the realistic path if you truly need “local” (self-hosted) K2/K2-class inference: **run the model on a GPU box/cluster** and call it from your Mac.
+
+## VPN / proxy: are we able to access through it?
+
+### What you need
+
+You generally need outbound access (through your VPN/proxy) to at least:
+
+- your chosen provider’s API host (varies by provider)
+- and, if downloading open weights, the model hosting site you plan to use.
+
+### Quick connectivity checks
+
+```sh
+# DNS + HTTPS reachability
+curl -I https://<YOUR_PROVIDER_API_HOST>
+
+# If you plan to download weights later
+curl -I https://<YOUR_MODEL_HOSTING_SITE>
+```
+
+### Configure proxy in a shell (typical)
+
+```sh
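+# example: a local proxy client listening on 127.0.0.1:7890
+export HTTP_PROXY="http://127.0.0.1:7890"
+export HTTPS_PROXY="http://127.0.0.1:7890"
+export ALL_PROXY="socks5://127.0.0.1:7890"
+export NO_PROXY="localhost,127.0.0.1"
+# curl ignores uppercase HTTP_PROXY, so also export the lowercase names
+export http_proxy="$HTTP_PROXY" https_proxy="$HTTPS_PROXY" all_proxy="$ALL_PROXY" no_proxy="$NO_PROXY"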
+```
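+
+To see what Python's standard-library `urllib` would actually do with these variables, the sketch below resolves the proxy for a given URL using `urllib.request.getproxies_environment()` and `urllib.request.proxy_bypass_environment()`. Assumptions to keep in mind: it only looks at environment variables (on macOS, urllib can also fall back to the system proxy settings, which this ignores), and urllib picks the proxy by URL scheme (`http`/`https`) without consulting `ALL_PROXY`, whereas curl does use `ALL_PROXY`.
+
+```python
+# Resolve the proxy Python's urllib would use for a URL from *_PROXY / NO_PROXY.
+import sys
+import urllib.request
+from urllib.parse import urlsplit
+
+
+def effective_proxy(url: str, proxies: dict[str, str] | None = None) -> str | None:
+    """Return the proxy URL urllib would pick for `url`, or None if it connects directly."""
+    if proxies is None:
+        # Reads HTTP_PROXY/http_proxy etc.; keys become "http", "https", "all", "no".
+        proxies = urllib.request.getproxies_environment()
+    parts = urlsplit(url)
+    if urllib.request.proxy_bypass_environment(parts.netloc, proxies):
+        return None  # host matched an entry in NO_PROXY
+    return proxies.get(parts.scheme)  # chosen by scheme; "all" (ALL_PROXY) is not consulted
+
+
+if __name__ == "__main__":
+    for target in sys.argv[1:] or ["https://github.com", "http://127.0.0.1:11434"]:
+        print(f"{target} -> {effective_proxy(target) or 'direct'}")
+```
+
+Example: `python which_proxy.py https://github.com http://127.0.0.1:11434` (hypothetical file name) prints either the proxy URL or `direct` for each target.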
+
+### Configure proxy for Git
+
+```sh
+# http.proxy applies to both http:// and https:// remotes; git does not read an https.proxy key
+git config --global http.proxy "$HTTPS_PROXY"
+```
+
+### Configure proxy for Python/pip
+
+```sh
+pip config set global.proxy "$HTTPS_PROXY"
+```
+
+### Notes about your current network
+
+From within this VS Code environment, requests to some provider/model-hosting sites were redirected to a corporate/web-filter “blockpage” URL. If you see that on your Mac too, you’ll need one of:
+
+- VPN that routes around the filter
+- proxy that’s allowed
+- allowlist/exception for those domains
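+
+A blockpage often shows up as a redirect rather than an error, so a plain success check can be misleading. The sketch below sends a `HEAD` request with `urllib`, lets it follow redirects, and flags the case where the final host differs from the one you asked for. It is a heuristic: legitimate cross-host redirects (for example to a CDN) are flagged too, and some servers reject `HEAD`.
+
+```python
+# Flag HTTPS checks that "succeed" only because they were redirected to another host.
+import sys
+import urllib.request
+from urllib.parse import urlsplit
+
+
+def redirected_off_host(requested_url: str, final_url: str) -> bool:
+    """True if the host that finally answered differs from the host you asked for."""
+    return urlsplit(requested_url).hostname != urlsplit(final_url).hostname
+
+
+def check(url: str, timeout: float = 10.0) -> str:
+    req = urllib.request.Request(url, method="HEAD")
+    try:
+        # urlopen follows redirects; geturl() reports where we ended up.
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            final_url = resp.geturl()
+    except Exception as exc:  # DNS, TLS, timeout, or HTTP 4xx/5xx errors
+        return f"{url}: error: {exc}"
+    if redirected_off_host(url, final_url):
+        return f"{url}: redirected to {final_url} (possible blockpage)"
+    return f"{url}: OK"
+
+
+if __name__ == "__main__":
+    for u in sys.argv[1:]:
+        print(check(u))
+```
+
+Example: `python detect_blockpage.py https://<YOUR_PROVIDER_API_HOST>` (hypothetical file name).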
+
+## Recommendation
+
+- If your goal is to _use_ Kimi on this Mac: **install Kimi Code CLI** and make sure your VPN/proxy allows access to your configured provider.
+- If your goal is “true local inference”: **host Kimi K2 on a CUDA GPU server** (or use a smaller Mac-native model instead).
+
+## Your Mac (detected)
+
+- macOS: 15.7.3 (24G419)
+- CPU arch: arm64
+- Machine: MacBook Pro (Mac16,7)
+- Chip: Apple M4 Pro (14 cores)
+- Memory: 48 GB
+- Python: 3.13.10
+
+## Will nanoGPT work on this laptop?
+
+Yes for learning, with a couple of caveats.
+
+What will work well:
+
+- **Small CPU/MPS runs** (toy datasets like Shakespeare, short experiments, sampling).
+- With 48GB RAM and Apple Silicon, you have plenty of headroom for nanoGPT-style demos.
+
+What might block you:
+
+- Installing dependencies (notably PyTorch) typically requires access to package indexes that may be blocked in your network.
+  - In this workspace, PyTorch was successfully installed into the venv and `mps_available` is `True`.
+  - If you can’t reach package indexes in a different environment, use an internal Python package mirror, or install from a pre-approved wheelhouse.
+
+Practical suggestion:
+
+- Start with CPU (`--device=cpu`) to keep it simple.
+- If your PyTorch build supports Apple Metal (MPS), you can later try `--device=mps` for speed.
+
+Quick verification (after installing torch):
+
+```sh
+python3 -c "import torch; print(torch.__version__); print('mps', torch.backends.mps.is_available())"
+```
+
+## nanoGPT: validated end-to-end in this workspace
+
+This is a minimal, fast run that was verified on this machine.
+
+### 1) Install deps (workspace venv)
+
+```sh
+# from the workspace root
+.venv/bin/python -m pip install torch numpy transformers datasets tiktoken wandb tqdm
+```
+
+### 2) Clone nanoGPT
+
+```sh
+git clone https://github.com/karpathy/nanoGPT.git oss_llm/nanoGPT
+```
+
+### 3) Prepare dataset
+
+```sh
+cd oss_llm/nanoGPT
+../../.venv/bin/python data/shakespeare_char/prepare.py
+```
+
+### 4) Short CPU training (writes `out-shakespeare-char/ckpt.pt`)
+
+```sh
+../../.venv/bin/python train.py config/train_shakespeare_char.py \
+  --device=cpu --compile=False \
+  --eval_interval=10 --eval_iters=10 --log_interval=10 \
+  --block_size=64 --batch_size=12 \
+  --n_layer=4 --n_head=4 --n_embd=128 \
+  --max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
+  --always_save_checkpoint=True
+```
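+
+For a sense of scale, the model size and per-step token count of this quick config can be worked out by hand. The sketch below assumes nanoGPT's `model.py` layout: `bias=False` (the `train.py` default), input and output embeddings tied, position embeddings excluded from the reported count, a 65-character vocabulary for tiny Shakespeare, and `gradient_accumulation_steps=1`. If those assumptions hold, the result should line up with the parameter count `train.py` logs at startup.
+
+```python
+# Parameter count and tokens per iteration for the quick config above.
+
+
+def nanogpt_params(vocab_size: int, n_layer: int, n_embd: int, block_size: int,
+                   include_position_emb: bool = False) -> int:
+    d = n_embd
+    # per block: attn qkv (3d^2) + attn proj (d^2) + MLP (4d^2 + 4d^2) + two LayerNorm weights
+    per_block = 12 * d * d + 2 * d
+    total = vocab_size * d      # token embedding, shared with the output head (weight tying)
+    total += n_layer * per_block
+    total += d                  # final LayerNorm weight
+    if include_position_emb:
+        total += block_size * d  # position embedding, left out of the count train.py reports
+    return total
+
+
+def tokens_per_iter(batch_size: int, block_size: int, grad_accum: int = 1) -> int:
+    return grad_accum * batch_size * block_size
+
+
+if __name__ == "__main__":
+    n = nanogpt_params(vocab_size=65, n_layer=4, n_embd=128, block_size=64)
+    print(f"~{n / 1e6:.2f}M parameters (excluding position embeddings)")
+    print(f"{tokens_per_iter(batch_size=12, block_size=64)} tokens per iteration")
+```
+
+Under these assumptions the formula gives 795,904 parameters (about 0.80M) and 768 tokens per iteration (12 × 64).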
+
+### 5) Sample
+
+```sh
+../../.venv/bin/python sample.py \
+  --out_dir=out-shakespeare-char --device=cpu --max_new_tokens=200
+```
+
+Tip: for speed on Apple Silicon, try `--device=mps` once you’re comfortable.
--- a/__LOCAL_LLMs/oss_llm/find_models_on_github.py
+++ b/__LOCAL_LLMs/oss_llm/find_models_on_github.py
@@ -0,0 +1,251 @@
+#!/usr/bin/env python3
+"""Find model-like files hosted on GitHub.
+
+This script ONLY talks to GitHub (api.github.com) and is designed for
+restricted enterprise networks where only GitHub is reachable.
+
+It searches repositories by keyword, then inspects recent releases to find
+assets that look like model files (e.g., .gguf, .safetensors, .bin, .zip).
+
+Usage:
+    python oss_llm/find_models_on_github.py --query "qwen2.5 gguf" --limit 10
+
+Optional:
+  - Set GITHUB_TOKEN to increase API rate limits.
+
+Notes:
+  - GitHub rarely hosts very large model weights due to size constraints.
+  - Prefer smaller models and quantized artifacts.
+"""
+
+import argparse
+import os
+import sys
+import textwrap
+import urllib.parse
+import json
+import subprocess
+
+MODEL_EXTENSIONS = (
+    ".gguf",
+    ".safetensors",
+    ".bin",
+    ".pt",
+    ".pth",
+    ".onnx",
+    ".zip",
+    ".tar.gz",
+    ".tgz",
+)
+
+
+def _http_get_json(url: str) -> object:
+    token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
+
+    cmd = [
+        "curl",
+        "-L",
+        "--fail",
+        "--silent",
+        "--show-error",
+        "-H",
+        "Accept: application/vnd.github+json",
+    ]
+    if token:
+        cmd += ["-H", f"Authorization: Bearer {token}"]
+    cmd.append(url)
+
+    proc = subprocess.run(cmd, capture_output=True, text=True)
+    if proc.returncode != 0:
+        raise RuntimeError(proc.stderr.strip() or f"curl failed with exit code {proc.returncode}")
+    return json.loads(proc.stdout)
+
+
+def _looks_like_model_asset(name: str) -> bool:
+    lower = name.lower()
+    return any(lower.endswith(ext) for ext in MODEL_EXTENSIONS)
+
+
+def _safe_int(value: object) -> int | None:
+    try:
+        return int(value)  # type: ignore[arg-type]
+    except Exception:
+        return None
+
+
+def _scan_repo_tree_for_models(full_name: str, default_branch: str) -> list[str]:
+    """Return matching file paths from the repo tree.
+
+    This is useful when model files are stored in-repo (often via Git LFS),
+    and therefore don't show up as release assets.
+    """
+
+    # The Trees API can be large; callers should keep repo counts low.
+    url = f"https://api.github.com/repos/{full_name}/git/trees/{urllib.parse.quote(default_branch)}?recursive=1"
+    data = _http_get_json(url)
+    if not isinstance(data, dict):
+        return []
+    tree = data.get("tree")
+    if not isinstance(tree, list):
+        return []
+
+    hits: list[str] = []
+    for node in tree:
+        if not isinstance(node, dict):
+            continue
+        path = node.get("path")
+        ntype = node.get("type")
+        if ntype != "blob" or not isinstance(path, str):
+            continue
+        if _looks_like_model_asset(path):
+            hits.append(path)
+    return hits
+
+
+def _human_size(num_bytes: int) -> str:
+    n = float(num_bytes)
+    for unit in ("B", "KB", "MB", "GB", "TB"):
+        if n < 1024.0:
+            return f"{n:.1f}{unit}"
+        n /= 1024.0
+    return f"{n:.1f}PB"
+
+
+def main() -> int:
+    p = argparse.ArgumentParser(
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        description="Search GitHub releases for model-like assets.",
+        epilog=textwrap.dedent(
+            """
+            Examples:
+              python oss_llm/find_models_on_github.py --query "llama gguf" --limit 20
+              python oss_llm/find_models_on_github.py --query "qwen 2.5 gguf" --limit 10
+
+            Tips:
+              - Add qualifiers to narrow results, e.g. `language:python`, `in:name`, `topic:llama-cpp`.
+              - Set GITHUB_TOKEN to avoid low unauthenticated rate limits.
+            """
+        ),
+    )
+    p.add_argument("--query", required=True, help="GitHub search query")
+    p.add_argument("--limit", type=int, default=10, help="Max repositories to inspect")
+    p.add_argument(
+        "--per-page",
+        type=int,
+        default=10,
+        help="Repos per page from search API (max 100)",
+    )
+    p.add_argument(
+        "--max-releases",
+        type=int,
+        default=5,
+        help="Max recent releases per repo to inspect",
+    )
+    p.add_argument(
+        "--scan-tree",
+        action="store_true",
+        help="Also scan repository file trees for model-like files (slower, more API calls)",
+    )
+    args = p.parse_args()
+
+    query = args.query.strip()
+    if not query:
+        print("--query must be non-empty", file=sys.stderr)
+        return 2
+
+    limit = max(1, args.limit)
+    # Fetch at least `limit` repos in one page (API max is 100) so --limit is honored.
+    per_page = max(1, min(100, max(args.per_page, limit)))
+
+    search_q = urllib.parse.quote(query)
+    search_url = f"https://api.github.com/search/repositories?q={search_q}&per_page={per_page}"
+
+    try:
+        search = _http_get_json(search_url)
+    except Exception as e:
+        print(f"GitHub search failed: {e}", file=sys.stderr)
+        return 1
+
+    items = search.get("items") if isinstance(search, dict) else None
+    if not items:
+        print("No repositories found.")
+        return 0
+
+    inspected = 0
+    found_any = False
+
+    for repo in items:
+        if inspected >= limit:
+            break
+
+        full_name = repo.get("full_name")
+        html_url = repo.get("html_url")
+        default_branch = repo.get("default_branch")
+        if not full_name or not html_url:
+            continue
+
+        inspected += 1
+        print(f"\n== {full_name} ==")
+        print(html_url)
+
+        releases_url = f"https://api.github.com/repos/{full_name}/releases?per_page={args.max_releases}"
+        try:
+            releases = _http_get_json(releases_url)
+        except Exception as e:
+            print(f"  releases: error: {e}")
+            continue
+
+        if not isinstance(releases, list) or len(releases) == 0:
+            print("  releases: none")
+            releases = []
+
+        repo_hit = False
+        for rel in releases[: args.max_releases]:
+            tag = rel.get("tag_name") or "(no tag)"
+            name = rel.get("name") or "(no name)"
+            assets = rel.get("assets") or []
+            model_assets = [a for a in assets if _looks_like_model_asset(a.get("name", ""))]
+
+            if not model_assets:
+                continue
+
+            repo_hit = True
+            found_any = True
+            print(f"  release: {tag} — {name}")
+            for a in model_assets:
+                aname = a.get("name") or "(no name)"
+                size = a.get("size")
+                url = a.get("browser_download_url") or ""
+                size_str = _human_size(int(size)) if isinstance(size, int) else "?"
+                print(f"    - {aname} ({size_str})")
+                if url:
+                    print(f"      {url}")
+
+        if releases and not repo_hit:
+            print("  releases: present, but no model-like assets found")
+
+        if args.scan_tree and isinstance(default_branch, str) and default_branch:
+            try:
+                paths = _scan_repo_tree_for_models(full_name, default_branch)
+            except Exception as e:
+                print(f"  repo files: error: {e}")
+                continue
+
+            if paths:
+                found_any = True
+                print(f"  repo files: found {len(paths)} model-like paths")
+                for pth in paths[:30]:
+                    print(f"    - {pth}")
+                if len(paths) > 30:
+                    print("    - (more omitted)")
+            else:
+                print("  repo files: no model-like paths found")
+
+    if not found_any:
+        print("\nNo model-like release assets found in inspected repositories.")
+        print("Try refining your query (add 'gguf', model family name, or 'quant').")
+
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/__LOCAL_LLMs/oss_llm/quick_checks.sh
+++ b/__LOCAL_LLMs/oss_llm/quick_checks.sh
@@ -0,0 +1,25 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+echo "== Network reachability =="
+
+: "${CHECK_URLS:=}"
+
+if [[ -z "${CHECK_URLS}" ]]; then
+  cat <<'TXT'
+No URLs configured.
+
+Set CHECK_URLS to a space-separated list of HTTPS URLs you want to test, e.g.:
+  CHECK_URLS="https://github.com https://example.com" ./oss_llm/quick_checks.sh
+
+This script intentionally does not hardcode any provider endpoints.
+TXT
+else
+  for url in ${CHECK_URLS}; do
+    printf '\n--- %s\n' "$url"
+    curl -sS -I --max-time 10 "$url" | head -n 5 || true
+  done
+fi
+
+printf '\n== Proxy env vars (current) ==\n'
+env | grep -iE '^(http|https|all|no)_proxy=' || echo "(none set)"
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/00_setup_env.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/00_setup_env.sh
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
+require_venv
+
+# nanoGPT README lists these deps.
+"${VENV_PY}" -m pip install --upgrade pip
+"${VENV_PY}" -m pip install torch numpy transformers datasets tiktoken wandb tqdm
+
+# Quick sanity check
+"${VENV_PY}" -c "import torch; print('torch', torch.__version__); print('mps', torch.backends.mps.is_available())"
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/10_clone_nanogpt.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/10_clone_nanogpt.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
+require_git
+
+mkdir -p "${WORKSPACE_ROOT}/oss_llm"
+
+if [[ -d "${NANOGPT_DIR}/.git" ]]; then
+  echo "nanoGPT already cloned; updating…"
+  cd "${NANOGPT_DIR}"
+  git pull --ff-only
+else
+  echo "Cloning nanoGPT into ${NANOGPT_DIR}…"
+  rm -rf "${NANOGPT_DIR}"
+  git clone https://github.com/karpathy/nanoGPT.git "${NANOGPT_DIR}"
+fi
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/20_prepare_shakespeare.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/20_prepare_shakespeare.sh
@@ -0,0 +1,20 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
+require_venv
+require_curl
+cd_nanogpt
+
+# Download via a github.com URL. Note that github.com/.../raw/... redirects to
+# raw.githubusercontent.com, so that host must also be reachable (curl -L follows it).
+# This writes the input file where nanoGPT's prepare script expects it, so
+# prepare.py skips its own download.
+INPUT_TXT="data/shakespeare_char/input.txt"
+if [[ ! -f "${INPUT_TXT}" ]]; then
+  mkdir -p "$(dirname "${INPUT_TXT}")"
+  echo "Downloading tiny Shakespeare to ${INPUT_TXT}" >&2
+  curl -fL --retry 3 --retry-delay 1 \
+    -o "${INPUT_TXT}" \
+    "https://github.com/karpathy/char-rnn/raw/master/data/tinyshakespeare/input.txt"
+fi
+
+"${VENV_PY}" data/shakespeare_char/prepare.py
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/30_train_cpu_quick.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/30_train_cpu_quick.sh
@@ -0,0 +1,16 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
+require_venv
+cd_nanogpt
+
+"${VENV_PY}" train.py config/train_shakespeare_char.py \
+  --device=cpu --compile=False --dtype=float32 \
+  --eval_interval=10 --eval_iters=10 --log_interval=10 \
+  --block_size=64 --batch_size=12 \
+  --n_layer=4 --n_head=4 --n_embd=128 \
+  --max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
+  --always_save_checkpoint=True
+
+echo "Done. Checkpoint should exist at: ${NANOGPT_DIR}/out-shakespeare-char/ckpt.pt"
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/31_train_mps_quick.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/31_train_mps_quick.sh
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
+require_venv
+cd_nanogpt
+
+# Requires torch with MPS support (Apple Silicon). If MPS isn't available,
+# fall back to CPU.
+DEVICE="mps"
+if ! "${VENV_PY}" -c "import torch; import sys; sys.exit(0 if torch.backends.mps.is_available() else 1)"; then
+  echo "MPS not available; falling back to CPU" >&2
+  DEVICE="cpu"
+fi
+
+"${VENV_PY}" train.py config/train_shakespeare_char.py \
+  --device="${DEVICE}" --compile=False --dtype=float32 \
+  --eval_interval=10 --eval_iters=10 --log_interval=10 \
+  --block_size=64 --batch_size=12 \
+  --n_layer=4 --n_head=4 --n_embd=128 \
+  --max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
+  --always_save_checkpoint=True
+
+echo "Done. Checkpoint should exist at: ${NANOGPT_DIR}/out-shakespeare-char/ckpt.pt"
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/40_sample_cpu.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/40_sample_cpu.sh
@@ -0,0 +1,14 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
+require_venv
+cd_nanogpt
+
+if [[ ! -f "out-shakespeare-char/ckpt.pt" ]]; then
+  echo "ERROR: ckpt.pt not found. Run training first:" >&2
+  echo "  bash oss_llm/testNanoGPT/30_train_cpu_quick.sh" >&2
+  exit 1
+fi
+
+"${VENV_PY}" sample.py --out_dir=out-shakespeare-char --device=cpu --max_new_tokens=200
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/98_smoke_test_mps.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/98_smoke_test_mps.sh
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# One-shot validation that prefers Apple Silicon MPS.
+
+bash oss_llm/testNanoGPT/10_clone_nanogpt.sh
+bash oss_llm/testNanoGPT/00_setup_env.sh
+bash oss_llm/testNanoGPT/20_prepare_shakespeare.sh
+
+# Train with MPS if available (script falls back to CPU)
+bash oss_llm/testNanoGPT/31_train_mps_quick.sh
+
+# Sample with MPS if available, else CPU
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+# shellcheck source=./_common.sh
+source "${SCRIPT_DIR}/_common.sh"
+require_venv
+cd_nanogpt
+
+DEVICE="mps"
+if ! "${VENV_PY}" -c "import torch; import sys; sys.exit(0 if torch.backends.mps.is_available() else 1)"; then
+  DEVICE="cpu"
+fi
+
+if [[ ! -f "out-shakespeare-char/ckpt.pt" ]]; then
+  echo "ERROR: ckpt.pt not found after training" >&2
+  exit 1
+fi
+
+"${VENV_PY}" sample.py --out_dir=out-shakespeare-char --device="${DEVICE}" --dtype=float32 --max_new_tokens=200
+
+printf '\nOK: nanoGPT MPS smoke test completed (device=%s).\n' "${DEVICE}"
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/99_smoke_test_all.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/99_smoke_test_all.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# One-shot end-to-end validation.
+bash oss_llm/testNanoGPT/10_clone_nanogpt.sh
+bash oss_llm/testNanoGPT/00_setup_env.sh
+bash oss_llm/testNanoGPT/20_prepare_shakespeare.sh
+bash oss_llm/testNanoGPT/30_train_cpu_quick.sh
+bash oss_llm/testNanoGPT/40_sample_cpu.sh
+
+printf '\nOK: nanoGPT smoke test completed.\n'
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/README.md
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/README.md
@@ -0,0 +1,105 @@
+# nanoGPT: use & test (local)
+
+This folder contains runnable scripts to validate **nanoGPT** end-to-end in this workspace.
+
+## Prereqs
+
+- `git`, `python3`, and `curl` are available.
+- GitHub is reachable (for cloning nanoGPT and downloading the tiny Shakespeare text).
+
+## What the scripts do
+
+- `00_setup_env.sh`: installs Python deps into the workspace venv.
+- `10_clone_nanogpt.sh`: clones `karpathy/nanoGPT` into `oss_llm/nanoGPT` (or updates it).
+- `20_prepare_shakespeare.sh`: downloads/prepares the tiny Shakespeare dataset.
+- `30_train_cpu_quick.sh`: quick CPU training run (writes `out-shakespeare-char/ckpt.pt`).
+- `31_train_mps_quick.sh`: quick MPS (Metal) training run (faster on Apple Silicon).
+- `40_sample_cpu.sh`: samples from the trained checkpoint.
+- `98_smoke_test_mps.sh`: runs clone → deps → prepare → train (MPS) → sample (MPS).
+- `99_smoke_test_all.sh`: runs clone → deps → prepare → train (CPU) → sample (CPU).
+
+## What nanoGPT demonstrates (current capabilities)
+
+With the default tiny Shakespeare **character-level** example, the checkpoint you train here supports:
+
+- **Unconditional generation**: start from a newline and generate Shakespeare-ish character patterns.
+- **Prompted continuation**: provide a short prompt (e.g. a phrase) and generate a continuation.
+- **Sampling controls**:
+  - `--temperature` controls randomness (lower is more deterministic).
+  - `--top_k` clamps sampling to the top-K next-token candidates (lower is more conservative).
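+
+The two sampling controls can be illustrated in a few lines. This is a pure-Python sketch of the same idea (scale logits by `1/temperature`, keep only the `top_k` largest, softmax, draw one token), not nanoGPT's actual PyTorch code; the logits in the demo are made up.
+
+```python
+# Temperature + top-k sampling for one step, given raw scores (logits) per token id.
+import math
+import random
+
+
+def sample_next(logits: list[float], temperature: float = 1.0,
+                top_k: int | None = None, rng: random.Random | None = None) -> int:
+    rng = rng or random.Random()
+    scaled = [x / temperature for x in logits]  # temperature < 1 sharpens, > 1 flattens
+    if top_k is not None and 0 < top_k < len(scaled):
+        cutoff = sorted(scaled, reverse=True)[top_k - 1]
+        scaled = [x if x >= cutoff else -math.inf for x in scaled]  # keep only the k best
+    m = max(scaled)
+    weights = [math.exp(x - m) for x in scaled]  # softmax numerators (shifted for stability)
+    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
+
+
+if __name__ == "__main__":
+    logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
+    rng = random.Random(0)
+    for temperature, top_k in [(0.4, 2), (1.0, None), (1.5, None)]:
+        picks = [sample_next(logits, temperature, top_k, rng) for _ in range(1000)]
+        counts = [picks.count(i) for i in range(len(logits))]
+        print(f"temperature={temperature}, top_k={top_k}: counts per token {counts}")
+```
+
+Lower temperature concentrates the counts on the highest-scoring token, and a small `top_k` makes the remaining tokens impossible to draw at all.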
+
+Reality check: with the default quick config (small model, short training), output will often look like _Shakespeare-shaped gibberish_. That’s expected; the goal is validating the end-to-end training + sampling workflow.
+
+### Reproduce the demo generations
+
+From the workspace root:
+
+1. Ensure you have a checkpoint (CPU):
+
+```sh
+bash oss_llm/testNanoGPT/99_smoke_test_all.sh
+```
+
+2. Unconditional samples (2 short samples):
+
+```sh
+cd oss_llm/nanoGPT
+../../.venv/bin/python sample.py \
+	--out_dir=out-shakespeare-char --device=cpu --dtype=float32 \
+	--num_samples=2 --max_new_tokens=220 --temperature=0.8 --top_k=200
+```
+
+3. Prompted continuation (more conservative sampling):
+
+```sh
+cd oss_llm/nanoGPT
+../../.venv/bin/python sample.py \
+	--out_dir=out-shakespeare-char --device=cpu --dtype=float32 \
+	--num_samples=1 --max_new_tokens=220 --temperature=0.4 --top_k=50 \
+	--start="To be, or not to be"
+```
+
+## One-command smoke test
+
+From the workspace root:
+
+```sh
+bash oss_llm/testNanoGPT/99_smoke_test_all.sh
+```
+
+## One-command MPS smoke test
+
+From the workspace root:
+
+```sh
+bash oss_llm/testNanoGPT/98_smoke_test_mps.sh
+```
+
+## Common commands
+
+- Install deps only:
+
+```sh
+bash oss_llm/testNanoGPT/00_setup_env.sh
+```
+
+- Quick CPU train + sample:
+
+```sh
+bash oss_llm/testNanoGPT/30_train_cpu_quick.sh
+bash oss_llm/testNanoGPT/40_sample_cpu.sh
+```
+
+- Quick MPS train + sample:
+
+```sh
+bash oss_llm/testNanoGPT/31_train_mps_quick.sh
+bash oss_llm/testNanoGPT/40_sample_cpu.sh  # sampling can still run on CPU
+```
+
+## Notes
+
+- The CPU/MPS training scripts intentionally use a tiny model + small iteration count to finish quickly.
+- If `ckpt.pt` is missing, training most likely never reached an eval step (nanoGPT only writes checkpoints at eval time); these scripts set a low `--eval_interval` and `--always_save_checkpoint=True` so a checkpoint is written.
+- Scripts will auto-create `./.venv` on first run if it does not exist.
+- The dataset is downloaded from a `github.com/.../raw/...` URL; GitHub redirects these to `raw.githubusercontent.com`, so that host must also be reachable.
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/_common.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/_common.sh
@@ -0,0 +1,61 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Resolve workspace root (this repo) regardless of where the script is called from.
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+WORKSPACE_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+# Some managed macOS environments set TMPDIR to slow/unusual locations.
+# Force a local temp directory for predictable behavior.
+export TMPDIR="/tmp"
+
+VENV_PY="${WORKSPACE_ROOT}/.venv/bin/python"
+NANOGPT_DIR="${WORKSPACE_ROOT}/oss_llm/nanoGPT"
+
+require_python3() {
+  if ! command -v python3 >/dev/null 2>&1; then
+    echo "ERROR: python3 not found in PATH" >&2
+    exit 1
+  fi
+}
+
+require_curl() {
+  if ! command -v curl >/dev/null 2>&1; then
+    echo "ERROR: curl not found in PATH" >&2
+    exit 1
+  fi
+}
+
+ensure_venv() {
+  if [[ -x "${VENV_PY}" ]]; then
+    return 0
+  fi
+
+  require_python3
+  echo "Creating workspace venv at: ${WORKSPACE_ROOT}/.venv" >&2
+  python3 -m venv "${WORKSPACE_ROOT}/.venv"
+}
+
+require_venv() {
+  ensure_venv
+  if [[ ! -x "${VENV_PY}" ]]; then
+    echo "ERROR: Failed to create venv python at: ${VENV_PY}" >&2
+    exit 1
+  fi
+}
+
+require_git() {
+  if ! command -v git >/dev/null 2>&1; then
+    echo "ERROR: git not found in PATH" >&2
+    exit 1
+  fi
+}
+
+cd_nanogpt() {
+  if [[ ! -d "${NANOGPT_DIR}" ]]; then
+    echo "ERROR: nanoGPT repo not found at ${NANOGPT_DIR}" >&2
+    echo "Run: bash oss_llm/testNanoGPT/10_clone_nanogpt.sh" >&2
+    exit 1
+  fi
+  cd "${NANOGPT_DIR}"
+}