move oss_llm/ from learning_ai_2nd_brain

2026-02-28 00:03:37 -08:00 · 2026-02-28 00:03:37 -08:00 · 1a4b3c1fb3
commit 1a4b3c1fb3
parent a32978f9c3
13 changed files with 971 additions and 0 deletions
--- a/__LOCAL_LLMs/oss_llm/README.md
+++ b/__LOCAL_LLMs/oss_llm/README.md
@ -0,0 +1,383 @@
 # Kimi “2.5” local deployment on macOS (what’s actually possible)
 ## Executive summary
 - **Running the real “Kimi 2.5 / Kimi K2-class” model fully locally on a Mac is not practical**: the official open-weight Kimi K2 deployment guidance targets **multi-node NVIDIA GPU clusters** (vLLM/SGLang/TensorRT-LLM) and assumes CUDA.
 - **What _is_ practical on a Mac:** use **Kimi Code CLI** (official) or the **Moonshot API** (official). That’s not “local inference” (weights on your laptop), but it is “local usage” (client runs on your laptop).
 - **VPN / proxy:** yes, clients can usually work through VPN/proxy, but you must be able to reach the provider’s API endpoints. If your network blocks access, you’ll need allowlisting or a different route.
 ## What GitHub shows (official sources)
 ### 1) Kimi K2 (open-weight model series)
 - Official repo: https://github.com/MoonshotAI/Kimi-K2
 - The repo includes a deployment guide at `docs/deploy_guidance.md`.
 - Key reality check from the guide: **the smallest FP8 128k-seqlen deployment is described as ~16 GPUs (H200/H20-class)** for mainstream setups (vLLM/SGLang). This is fundamentally not macOS-laptop friendly.
 ### 2) Kimi Code CLI (best option for macOS)
 - Official repo: https://github.com/MoonshotAI/kimi-cli
 - Docs: https://moonshotai.github.io/kimi-cli/en/guides/getting-started.html
 - Kimi Code CLI is an **agent client** (terminal tool) that talks to a remote provider.
 - First-run authentication options:
  - `/login` (browser login; auto-configures models)
  - `/setup` (API key flow)
 ## Can we locally deploy “Kimi 2.5” on this Mac?
 ### Practically: no (for K2 / K2-class)
 - **macOS has no CUDA**, so GPU-first inference stacks referenced for K2 deployment (vLLM / TensorRT-LLM) are not an option.
 - Even if you tried CPU inference, the K2-class model scale (MoE, 1T total params / 32B active) is far beyond what is reasonable on a laptop.
 ### What’s feasible instead
 1. **Use Kimi Code CLI on macOS** (recommended)
 2. **Use Moonshot/Kimi APIs from your own scripts** (Python/Node/etc) once your network allows access
 3. If you truly need **local weights on Mac**, use a Mac-friendly model/runtime (e.g., MLX/Ollama) — but that would be **a different model**, not Kimi 2.5/K2.
 ## Options: smaller, Mac-runnable open models (local inference)
 If your goal is “no network calls at runtime”, pick a **local runtime** + a **small-enough model + quantization**.
 ## If you can only access GitHub
 If your enterprise network only allows `github.com`, that severely limits how you obtain model weights because most model hosting is **not** on GitHub.
 What still works:
 - **GitHub Releases assets**: some projects publish quantized model files (often `.gguf`) as release assets.
 - **Git LFS inside a repo**: occasionally weights are stored in-repo via LFS (still uncommon for large models).
 What usually won’t work:
 - Downloading weights from external model registries / hosting sites (blocked by policy).
 Reality check: each GitHub release asset must be under 2 GiB, so **very large models are rarely hosted there**, and even a 7B model at 4-bit (roughly 4 GB) has to be split across several assets. Your best bet is to use **small models (7B–14B) in quantized form**.
 ### How to find downloadable models on GitHub
 - Use [oss_llm/find_models_on_github.py](find_models_on_github.py) to search GitHub repos and list any release assets that look like model files; add `--scan-tree` to also list model-like files committed to the repo itself (e.g. via Git LFS).
 - Prefer assets ending in `.gguf` if you plan to run with `llama.cpp`.
 Example:
 ```sh
 python3 oss_llm/find_models_on_github.py --query "gguf qwen 2.5" --limit 20
 ```
 ### Recommended enterprise pattern
 If you need a specific model but can’t download it directly:
 1. Ask security for an **approved internal mirror** / artifact store.
 2. Mirror the model files there.
 3. Point your local runtime (Ollama/llama.cpp/MLX) at the internal location.
 ### “No security review” learning path (best effort): train a tiny model yourself
 If you want to avoid downloading any third-party model weights at all, the cleanest option is to:
 - clone training code from GitHub
 - train a small model on a local/public-domain text file
 This won’t produce a state-of-the-art assistant, but it’s excellent for learning tokenization, training loops, sampling, and basic eval.
 One popular repo for this is **karpathy/nanoGPT**.
 High-level steps:
 ```sh
 git clone https://github.com/karpathy/nanoGPT.git
 cd nanoGPT
 # create a python env (choose your preferred method)
 python3 -m venv .venv && source .venv/bin/activate
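 # deps listed in nanoGPT's README
 pip install torch numpy transformers datasets tiktoken wandb tqdm
 # run the built-in Shakespeare example (prepare.py downloads a small text file)
 python data/shakespeare_char/prepare.py
 # full config is slow on CPU; the validated run later in this README uses a much smaller model
 python train.py config/train_shakespeare_char.py --device=cpu --compile=False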
 # sample
 python sample.py --out_dir=out-shakespeare-char --device=cpu --compile=False
 ```
 If even small downloads are restricted, put your own local text file at `data/shakespeare_char/input.txt` before running `prepare.py`; the script only downloads the Shakespeare text when that file is missing.
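For a custom text file, the prepare step boils down to character-level encoding. The following is a minimal sketch of that idea in plain Python, not nanoGPT's own script; it assumes the output layout nanoGPT's `shakespeare_char/prepare.py` uses (`train.bin` / `val.bin` as raw unsigned 16-bit token ids, plus `meta.pkl` holding `vocab_size`, `stoi`, and `itos`) and a 90/10 train/validation split.

```python
import os
import pickle
from array import array


def prepare_char_dataset(text: str, out_dir: str, train_frac: float = 0.9) -> dict:
    """Encode `text` character by character and write train.bin, val.bin, meta.pkl."""
    chars = sorted(set(text))                      # vocabulary: every distinct character
    if len(chars) > 65536:
        raise ValueError("vocabulary too large for 16-bit ids")
    stoi = {ch: i for i, ch in enumerate(chars)}   # char -> integer id
    itos = {i: ch for ch, i in stoi.items()}       # integer id -> char
    ids = array("H", (stoi[c] for c in text))      # unsigned 16-bit ids, native byte order
    split = int(len(ids) * train_frac)             # first part is train, remainder is validation
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "train.bin"), "wb") as f:
        ids[:split].tofile(f)
    with open(os.path.join(out_dir, "val.bin"), "wb") as f:
        ids[split:].tofile(f)
    meta = {"vocab_size": len(chars), "itos": itos, "stoi": stoi}
    with open(os.path.join(out_dir, "meta.pkl"), "wb") as f:
        pickle.dump(meta, f)
    return meta


def decode(ids, itos: dict) -> str:
    """Map integer ids back to text using the saved itos table."""
    return "".join(itos[i] for i in ids)
```

To use it, read your file with `open(path, encoding="utf-8").read()` and write into a new folder under `data/`; training on that folder would then mean passing `--dataset=<folder name>` to `train.py` (how you wire this up is an assumption, adjust to your layout).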
 ### nanoGPT demo (this repo)
 This workspace includes a reproducible nanoGPT workflow (CPU and Apple Silicon MPS) plus sampling demo commands:
 - See [oss_llm/testNanoGPT/README.md](testNanoGPT/README.md)
 ### If you do download GGUF weights from GitHub
 It can work (some repos commit a `.gguf` directly), but treat it like any third-party binary:
 - Prefer repos with **clear licensing** (repo license + explicit model license/provenance)
 - Prefer “original publisher” repos over re-uploads
 - Keep models small (e.g., ~100M–3B) for macOS learning
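Treating a GGUF like a third-party binary can start with two cheap checks. GGUF files begin with the 4-byte ASCII magic `GGUF`, so a file that fails that check is not a GGUF model (for example, an HTML blockpage saved under a `.gguf` name). The checksum comparison below is a minimal sketch that assumes the publisher lists a SHA-256 value somewhere you trust; `expected_sha256` is whatever value you obtained, and many repos publish none.

```python
import hashlib

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def looks_like_gguf(path: str) -> bool:
    """Cheap format check: does the file start with the GGUF magic bytes?"""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-GB models never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path: str, expected_sha256: str) -> bool:
    """True only if the file looks like GGUF *and* matches the expected checksum."""
    return looks_like_gguf(path) and sha256_file(path) == expected_sha256.strip().lower()
```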
 ### Recommended runtimes (macOS)
 - **Ollama**: easiest “download + chat + local HTTP API” experience.
 - **LM Studio**: easy GUI; also exposes a local API server.
 - **llama.cpp**: most portable; great for CPU/Metal, quantized GGUF models.
 - **MLX** (Apple): best when you want Python-native workflows on Apple Silicon.
 ### Model size guidance (rule of thumb)
 Quantized models are what make laptops viable.
 - **8–10B @ 4-bit**: typically comfortable on 16GB unified memory.
 - **14B @ 4-bit**: better with 24–32GB unified memory.
 - **30B+**: usually needs 64GB+ and will still be slow.
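The rule of thumb follows from simple arithmetic: weight memory is roughly parameters × bits per weight ÷ 8. This is a minimal sketch of that estimate only. Real 4-bit formats also store per-block scales, so effective bits per weight sit a little above 4 (llama.cpp's `Q4_0`, for example, works out to 4.5), and the KV cache, the runtime, and macOS itself need memory on top of the weights, which is why the guidance above leaves generous headroom.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    # params * bits / 8 = bytes; with params given in billions the result is already in GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for params in (8, 14, 32):
        print(f"{params}B @ 4.5 bits/weight ≈ {weight_memory_gb(params, 4.5):.1f} GB of weights")
```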
 ### Good “starter” model families (pick one)
 These are widely supported by the runtimes above and have strong general utility:
 - **Llama 3.x (8B class)**: strong general chat + coding for the size.
 - **Qwen 2.5 (7B/14B class)**: strong multilingual + coding.
 - **Mistral 7B class**: fast and solid baseline.
 - **Gemma 2 (9B class)**: good general-purpose quality.
 - **Phi-3.x (mini/small class)**: very fast and lightweight.
 ### Suggested picks by Mac memory
 - **16GB unified memory**: start with an 8–9B model at 4-bit.
 - **32GB unified memory**: 14B at 4-bit is a good sweet spot.
 - **64GB unified memory**: 27–34B at 4-bit becomes feasible (still slower).
 ### Practical setup: Ollama (quickest)
 1. Install runtime (choose the install method your enterprise allows; Homebrew is common):
 ```sh
 brew install ollama
 ```
 2. Start the local service:
 ```sh
 ollama serve
 ```
 3. Pull/run a model (example placeholder):
 ```sh
 ollama run <model-name>
 ```
 4. Use the local API (optional):
 ```sh
 curl http://127.0.0.1:11434/api/tags
 ```
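Step 4 can also be done from a script. A minimal sketch using only the Python standard library against Ollama's `/api/generate` endpoint: with `"stream": false`, Ollama returns a single JSON object whose `response` field holds the generated text. The model name you pass is whatever `ollama run` pulled; nothing here is specific to one model.

```python
import json
import urllib.request

OLLAMA_GENERATE_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Call `generate("<model-name>", "Hello")` while `ollama serve` is running. If you exported proxy variables (see the VPN/proxy section), keep `127.0.0.1` in `NO_PROXY`, because `urllib` honors those variables and would otherwise send local requests to the proxy.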
 ### Practical setup: llama.cpp (most controllable)
 If you can obtain a quantized GGUF model file via an approved internal mirror:
 ```sh
 brew install llama.cpp
 llama-cli -m /path/to/model.gguf -p "Hello" -n 256
 ```
 ### Practical setup: MLX (Python-centric on Apple Silicon)
 If your environment allows Python packages and you have an MLX-converted model available internally:
 ```sh
 python3 -m venv .venv && source .venv/bin/activate
 pip install mlx mlx-lm
 python -m mlx_lm.generate --model /path/to/mlx-model --prompt "Hello" --max-tokens 256
 ```
 ### Enterprise note (important)
 Because model hosting sites may be blocked in your network category, the usual pattern in enterprise is:
 1. Security-approved model list
 2. Internal artifact store / mirror for model files
 3. Local runtime (Ollama/llama.cpp/MLX) pointing to those internally hosted artifacts
 ## Steps (macOS): run Kimi Code CLI locally (client-side)
 Source: Kimi CLI docs.
 1. Install
 ```sh
 # Install via uv (Python package manager)
 uv tool install --python 3.13 kimi-cli
 ```
 2. Verify
 ```sh
 kimi --version
 ```
 3. Start in a project directory
 ```sh
 cd /Users/sd9235/code/mygh/learning_ai_2nd_brain
 kimi
 ```
 4. Authenticate
 - Preferred:
  - Run `/login` inside the CLI and complete the browser auth.
 - Alternative:
  - Run `/setup` and choose an API platform + API key + model.
 5. If models don’t show up
 - Kimi CLI FAQ: verify network access to your configured provider’s API endpoints.
 ## Steps (NOT macOS): deploy Kimi K2 weights (server-side)
 If you have access to Linux + NVIDIA GPUs, use the official K2 deployment guide:
 - vLLM (requires CUDA; the guide notes vLLM v0.10.0rc1+)
 - SGLang
 - TensorRT-LLM
 This is the realistic path if you truly need “local” (self-hosted) K2/K2-class inference: **run the model on a GPU box/cluster** and call it from your Mac.
 ## VPN / proxy: are we able to access through it?
 ### What you need
 You generally need outbound access (through your VPN/proxy) to at least:
 - your chosen provider’s API host (varies by provider)
 - and, if downloading open weights, the model hosting site you plan to use.
 ### Quick connectivity checks
 ```sh
 # DNS + HTTPS reachability
 curl -I https://<YOUR_PROVIDER_API_HOST>
 # If you plan to download weights later
 curl -I https://<YOUR_MODEL_HOSTING_SITE>
 ```
 ### Configure proxy in a shell (typical)
 ```sh
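 # Example values: replace host/port with your proxy
 export HTTP_PROXY="http://127.0.0.1:7890"
 export HTTPS_PROXY="http://127.0.0.1:7890"
 export ALL_PROXY="socks5://127.0.0.1:7890"
 export NO_PROXY="localhost,127.0.0.1"
 # curl reads http_proxy only in lowercase; many Unix tools prefer lowercase names
 export http_proxy="$HTTP_PROXY" https_proxy="$HTTPS_PROXY" all_proxy="$ALL_PROXY" no_proxy="$NO_PROXY"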
 ```
 ### Configure proxy for Git
 ```sh
 # http.proxy also applies to https:// remotes; Git has no separate https.proxy key
 git config --global http.proxy "$HTTPS_PROXY"
 ```
 ### Configure proxy for Python/pip
 ```sh
 pip config set global.proxy "$HTTPS_PROXY"
 ```
 ### Notes about your current network
 From within this VS Code environment, requests to some provider/model-hosting sites were redirected to a corporate/web-filter “blockpage” URL. If you see that on your Mac too, you’ll need one of:
 - VPN that routes around the filter
 - proxy that’s allowed
 - allowlist/exception for those domains
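One way to spot the blockpage from a script is to follow redirects and compare the final host with the one you asked for. This is a minimal heuristic sketch using the standard library only. Legitimate redirects also change hosts (for example, `github.com/.../raw/...` URLs redirect to `raw.githubusercontent.com`), so treat a host change as a prompt to look, not proof of filtering.

```python
import urllib.parse
import urllib.request


def redirected_off_host(requested_url: str, final_url: str) -> bool:
    """True when the final URL after redirects lives on a different host."""
    return urllib.parse.urlsplit(requested_url).hostname != urllib.parse.urlsplit(final_url).hostname


def probe(url: str, timeout: float = 10.0) -> tuple[int, str, bool]:
    """HEAD the URL, following redirects; return (status, final URL, host changed?)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        final_url = resp.geturl()
        return resp.status, final_url, redirected_off_host(url, final_url)
```

Note that `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, and some servers reject `HEAD` outright, so wrap `probe` in a `try` when scanning several hosts.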
 ## Recommendation
 - If your goal is to _use_ Kimi on this Mac: **install Kimi Code CLI** and make sure your VPN/proxy allows access to your configured provider.
 - If your goal is “true local inference”: **host Kimi K2 on a CUDA GPU server** (or use a smaller Mac-native model instead).
 ## Your Mac (detected)
 - macOS: 15.7.3 (24G419)
 - CPU arch: arm64
 - Machine: MacBook Pro (Mac16,7)
 - Chip: Apple M4 Pro (14 cores)
 - Memory: 48 GB
 - Python: 3.13.10
 ## Will nanoGPT work on this laptop?
 Yes for learning, with a couple of caveats.
 What will work well:
 - **Small CPU/MPS runs** (toy datasets like Shakespeare, short experiments, sampling).
 - With 48GB RAM and Apple Silicon, you have plenty of headroom for nanoGPT-style demos.
 What might block you:
 - Installing dependencies (notably PyTorch) typically requires access to package indexes that may be blocked in your network.
  - In this workspace, PyTorch was successfully installed into the venv and `mps_available` is `True`.
  - If you can’t reach package indexes in a different environment, use an internal Python package mirror, or install from a pre-approved wheelhouse.
 Practical suggestion:
 - Start with CPU (`--device=cpu`) to keep it simple.
 - If your PyTorch build supports Apple Metal (MPS), you can later try `--device=mps` for speed.
 Quick verification (after installing torch):
 ```sh
 python3 -c "import torch; print(torch.__version__); print('mps', torch.backends.mps.is_available())"
 ```
 ## nanoGPT: validated end-to-end in this workspace
 This is a minimal, fast run that was verified on this machine.
 ### 1) Install deps (workspace venv)
 ```sh
 # from the workspace root
 /Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python -m pip install torch numpy transformers datasets tiktoken wandb tqdm
 ```
 ### 2) Clone nanoGPT
 ```sh
 git clone https://github.com/karpathy/nanoGPT.git oss_llm/nanoGPT
 ```
 ### 3) Prepare dataset
 ```sh
 cd oss_llm/nanoGPT
 /Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python data/shakespeare_char/prepare.py
 ```
 ### 4) Short CPU training (writes `out-shakespeare-char/ckpt.pt`)
 ```sh
 /Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python train.py config/train_shakespeare_char.py \
  --device=cpu --compile=False \
  --eval_interval=10 --eval_iters=10 --log_interval=10 \
  --block_size=64 --batch_size=12 \
  --n_layer=4 --n_head=4 --n_embd=128 \
  --max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
  --always_save_checkpoint=True
 ```
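To see why this run is quick (and why samples stay rough), here is a back-of-the-envelope count of what it trains. It is an estimate, not nanoGPT's own counter, and assumes nanoGPT's GPT layout with `bias=False` (train.py's default), a 65-character vocabulary (tiny Shakespeare's character set), token embeddings tied to the output head, position embeddings excluded, and one gradient-accumulation step per iteration.

```python
def gpt_param_count(n_layer: int, n_embd: int, vocab_size: int) -> int:
    """Approximate nanoGPT-style parameter count (no biases, tied embeddings, wpe excluded)."""
    per_block = 12 * n_embd * n_embd + 2 * n_embd  # attn (4*d^2) + MLP (8*d^2) + two LayerNorm weights
    final_ln = n_embd
    token_emb = vocab_size * n_embd  # shared with the output head
    return n_layer * per_block + final_ln + token_emb


def tokens_seen(max_iters: int, batch_size: int, block_size: int, grad_accum: int = 1) -> int:
    """Characters (tokens) the model is trained on across the whole run."""
    return max_iters * grad_accum * batch_size * block_size


if __name__ == "__main__":
    print("params:", gpt_param_count(n_layer=4, n_embd=128, vocab_size=65))
    print("tokens seen:", tokens_seen(max_iters=60, batch_size=12, block_size=64))
```

With the flags above that is roughly 0.8M parameters trained on about 46k characters, a small fraction of the roughly 1M-character training split.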
 ### 5) Sample
 ```sh
 /Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python sample.py \
  --out_dir=out-shakespeare-char --device=cpu --max_new_tokens=200
 ```
 Tip: for speed on Apple Silicon, try `--device=mps` once you’re comfortable.
--- a/__LOCAL_LLMs/oss_llm/find_models_on_github.py
+++ b/__LOCAL_LLMs/oss_llm/find_models_on_github.py
@ -0,0 +1,251 @@
 #!/usr/bin/env python3
 """Find model-like files hosted on GitHub.
 This script ONLY talks to GitHub (api.github.com) and is designed for
 restricted enterprise networks where only GitHub is reachable.
 It searches repositories by keyword, then inspects recent releases to find
 assets that look like model files (e.g., .gguf, .safetensors, .bin, .zip).
 Usage:
    python oss_llm/find_models_on_github.py --query "qwen2.5 gguf" --limit 10
 Optional:
  - Set GITHUB_TOKEN to increase API rate limits.
 Notes:
  - GitHub rarely hosts very large model weights due to size constraints.
  - Prefer smaller models and quantized artifacts.
 """
 import argparse
 import os
 import sys
 import textwrap
 import urllib.parse
 import json
 import subprocess
 MODEL_EXTENSIONS = (
    ".gguf",
    ".safetensors",
    ".bin",
    ".pt",
    ".pth",
    ".onnx",
    ".zip",
    ".tar.gz",
    ".tgz",
 )
 def _http_get_json(url: str) -> object:
    token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
    cmd = [
        "curl",
        "-L",
        "--fail",
        "--silent",
        "--show-error",
        "-H",
        "Accept: application/vnd.github+json",
    ]
    if token:
        cmd += ["-H", f"Authorization: Bearer {token}"]
    cmd.append(url)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or f"curl failed with exit code {proc.returncode}")
    return json.loads(proc.stdout)
 def _looks_like_model_asset(name: str) -> bool:
    lower = name.lower()
    return any(lower.endswith(ext) for ext in MODEL_EXTENSIONS)
 def _safe_int(value: object) -> int | None:
    try:
        return int(value)  # type: ignore[arg-type]
    except Exception:
        return None
 def _scan_repo_tree_for_models(full_name: str, default_branch: str) -> list[str]:
    """Return matching file paths from the repo tree.
    This is useful when model files are stored in-repo (often via Git LFS),
    and therefore don't show up as release assets.
    """
    # The Trees API can be large; callers should keep repo counts low.
    url = f"https://api.github.com/repos/{full_name}/git/trees/{urllib.parse.quote(default_branch)}?recursive=1"
    data = _http_get_json(url)
    if not isinstance(data, dict):
        return []
    tree = data.get("tree")
    if not isinstance(tree, list):
        return []
    hits: list[str] = []
    for node in tree:
        if not isinstance(node, dict):
            continue
        path = node.get("path")
        ntype = node.get("type")
        if ntype != "blob" or not isinstance(path, str):
            continue
        if _looks_like_model_asset(path):
            hits.append(path)
    return hits
 def _human_size(num_bytes: int) -> str:
    n = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1024.0:
            return f"{n:.1f}{unit}"
        n /= 1024.0
    return f"{n:.1f}PB"
 def main() -> int:
    p = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description="Search GitHub releases for model-like assets.",
        epilog=textwrap.dedent(
            """
            Examples:
              python oss_llm/find_models_on_github.py --query "llama gguf" --limit 20
              python oss_llm/find_models_on_github.py --query "qwen 2.5 gguf" --limit 10
            Tips:
              - Add qualifiers to narrow results, e.g. `language:python`, `in:name`, `topic:llama-cpp`.
              - Set GITHUB_TOKEN to avoid low unauthenticated rate limits.
            """
        ),
    )
    p.add_argument("--query", required=True, help="GitHub search query")
    p.add_argument("--limit", type=int, default=10, help="Max repositories to inspect")
    p.add_argument(
        "--per-page",
        type=int,
        default=10,
        help="Repos per page from search API (max 100)",
    )
    p.add_argument(
        "--max-releases",
        type=int,
        default=5,
        help="Max recent releases per repo to inspect",
    )
    p.add_argument(
        "--scan-tree",
        action="store_true",
        help="Also scan repository file trees for model-like files (slower, more API calls)",
    )
    args = p.parse_args()
    query = args.query.strip()
    if not query:
        print("--query must be non-empty", file=sys.stderr)
        return 2
    limit = max(1, args.limit)
    # Only one search page is fetched, so request at least `limit` repos (API max: 100).
    per_page = max(1, min(100, max(args.per_page, limit)))
    search_q = urllib.parse.quote(query)
    search_url = f"https://api.github.com/search/repositories?q={search_q}&per_page={per_page}"
    try:
        search = _http_get_json(search_url)
    except Exception as e:
        print(f"GitHub search failed: {e}", file=sys.stderr)
        return 1
    items = search.get("items") if isinstance(search, dict) else None
    if not items:
        print("No repositories found.")
        return 0
    inspected = 0
    found_any = False
    for repo in items:
        if inspected >= limit:
            break
        full_name = repo.get("full_name")
        html_url = repo.get("html_url")
        default_branch = repo.get("default_branch")
        if not full_name or not html_url:
            continue
        inspected += 1
        print(f"\n== {full_name} ==")
        print(html_url)
        releases_url = f"https://api.github.com/repos/{full_name}/releases?per_page={args.max_releases}"
        try:
            releases = _http_get_json(releases_url)
        except Exception as e:
            print(f"  releases: error: {e}")
            continue
        if not isinstance(releases, list) or len(releases) == 0:
            print("  releases: none")
            releases = []
        repo_hit = False
        for rel in releases[: args.max_releases]:
            tag = rel.get("tag_name") or "(no tag)"
            name = rel.get("name") or "(no name)"
            assets = rel.get("assets") or []
            model_assets = [a for a in assets if _looks_like_model_asset(a.get("name", ""))]
            if not model_assets:
                continue
            repo_hit = True
            found_any = True
            print(f"  release: {tag} — {name}")
            for a in model_assets:
                aname = a.get("name") or "(no name)"
                size = a.get("size")
                url = a.get("browser_download_url") or ""
                size_str = _human_size(int(size)) if isinstance(size, int) else "?"
                print(f"    - {aname} ({size_str})")
                if url:
                    print(f"      {url}")
        if releases and not repo_hit:
            print("  releases: present, but no model-like assets found")
        if args.scan_tree and isinstance(default_branch, str) and default_branch:
            try:
                paths = _scan_repo_tree_for_models(full_name, default_branch)
            except Exception as e:
                print(f"  repo files: error: {e}")
                continue
            if paths:
                found_any = True
                print(f"  repo files: found {len(paths)} model-like paths")
                for pth in paths[:30]:
                    print(f"    - {pth}")
                if len(paths) > 30:
                    print("    - (more omitted)")
            else:
                print("  repo files: no model-like paths found")
    if not found_any:
        print("\nNo model-like release assets found in inspected repositories.")
        print("Try refining your query (add 'gguf', model family name, or 'quant').")
    return 0
 if __name__ == "__main__":
    raise SystemExit(main())
--- a/__LOCAL_LLMs/oss_llm/quick_checks.sh
+++ b/__LOCAL_LLMs/oss_llm/quick_checks.sh
@ -0,0 +1,25 @@
 #!/usr/bin/env bash
 set -euo pipefail
 echo "== Network reachability =="
 : "${CHECK_URLS:=}"
 if [[ -z "${CHECK_URLS}" ]]; then
  cat <<'TXT'
 No URLs configured.
 Set CHECK_URLS to a space-separated list of HTTPS URLs you want to test, e.g.:
  CHECK_URLS="https://github.com https://example.com" ./oss_llm/quick_checks.sh
 This script intentionally does not hardcode any provider endpoints.
 TXT
 else
  for url in ${CHECK_URLS}; do
    printf '\n--- %s\n' "$url"
    curl -I --max-time 10 "$url" | head -n 5 || true
  done
 fi
 printf '\n== Proxy env vars (current) ==\n'
 env | grep -iE '^(http|https|all|no)_proxy=' || echo "(none set)"
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/00_setup_env.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/00_setup_env.sh
@ -0,0 +1,12 @@
 #!/usr/bin/env bash
 set -euo pipefail
 source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
 require_venv
 # nanoGPT README lists these deps.
 "${VENV_PY}" -m pip install --upgrade pip
 "${VENV_PY}" -m pip install torch numpy transformers datasets tiktoken wandb tqdm
 # Quick sanity check
 "${VENV_PY}" -c "import torch; print('torch', torch.__version__); print('mps', torch.backends.mps.is_available())"
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/10_clone_nanogpt.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/10_clone_nanogpt.sh
@ -0,0 +1,17 @@
 #!/usr/bin/env bash
 set -euo pipefail
 source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
 require_git
 mkdir -p "${WORKSPACE_ROOT}/oss_llm"
 if [[ -d "${NANOGPT_DIR}/.git" ]]; then
  echo "nanoGPT already cloned; updating…"
  cd "${NANOGPT_DIR}"
  git pull --ff-only
 else
  echo "Cloning nanoGPT into ${NANOGPT_DIR}…"
  rm -rf "${NANOGPT_DIR}"
  git clone https://github.com/karpathy/nanoGPT.git "${NANOGPT_DIR}"
 fi
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/20_prepare_shakespeare.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/20_prepare_shakespeare.sh
@ -0,0 +1,20 @@
 #!/usr/bin/env bash
 set -euo pipefail
 source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
 require_venv
 require_curl
 cd_nanogpt
 # Start from a github.com URL; GitHub redirects it to raw.githubusercontent.com,
 # so that host must be reachable too (curl -L follows the redirect).
 # This writes the input file where nanoGPT's prepare script expects it.
 INPUT_TXT="data/shakespeare_char/input.txt"
 if [[ ! -f "${INPUT_TXT}" ]]; then
 	mkdir -p "$(dirname "${INPUT_TXT}")"
 	echo "Downloading tiny Shakespeare to ${INPUT_TXT}" >&2
 	curl -fL --retry 3 --retry-delay 1 \
 		-o "${INPUT_TXT}" \
 		"https://github.com/karpathy/char-rnn/raw/master/data/tinyshakespeare/input.txt"
 fi
 "${VENV_PY}" data/shakespeare_char/prepare.py
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/30_train_cpu_quick.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/30_train_cpu_quick.sh
@ -0,0 +1,16 @@
 #!/usr/bin/env bash
 set -euo pipefail
 source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
 require_venv
 cd_nanogpt
 "${VENV_PY}" train.py config/train_shakespeare_char.py \
  --device=cpu --compile=False --dtype=float32 \
  --eval_interval=10 --eval_iters=10 --log_interval=10 \
  --block_size=64 --batch_size=12 \
  --n_layer=4 --n_head=4 --n_embd=128 \
  --max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
  --always_save_checkpoint=True
 echo "Done. Checkpoint should exist at: ${NANOGPT_DIR}/out-shakespeare-char/ckpt.pt"
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/31_train_mps_quick.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/31_train_mps_quick.sh
@ -0,0 +1,24 @@
 #!/usr/bin/env bash
 set -euo pipefail
 source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
 require_venv
 cd_nanogpt
 # Requires torch with MPS support (Apple Silicon). If MPS isn't available,
 # fall back to CPU.
 DEVICE="mps"
 if ! "${VENV_PY}" -c "import torch; import sys; sys.exit(0 if torch.backends.mps.is_available() else 1)"; then
  echo "MPS not available; falling back to CPU" >&2
  DEVICE="cpu"
 fi
 "${VENV_PY}" train.py config/train_shakespeare_char.py \
  --device="${DEVICE}" --compile=False --dtype=float32 \
  --eval_interval=10 --eval_iters=10 --log_interval=10 \
  --block_size=64 --batch_size=12 \
  --n_layer=4 --n_head=4 --n_embd=128 \
  --max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
  --always_save_checkpoint=True
 echo "Done. Checkpoint should exist at: ${NANOGPT_DIR}/out-shakespeare-char/ckpt.pt"
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/40_sample_cpu.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/40_sample_cpu.sh
@ -0,0 +1,14 @@
 #!/usr/bin/env bash
 set -euo pipefail
 source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
 require_venv
 cd_nanogpt
 if [[ ! -f "out-shakespeare-char/ckpt.pt" ]]; then
  echo "ERROR: ckpt.pt not found. Run training first:" >&2
  echo "  bash oss_llm/testNanoGPT/30_train_cpu_quick.sh" >&2
  exit 1
 fi
 "${VENV_PY}" sample.py --out_dir=out-shakespeare-char --device=cpu --max_new_tokens=200
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/98_smoke_test_mps.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/98_smoke_test_mps.sh
@ -0,0 +1,32 @@
 #!/usr/bin/env bash
 set -euo pipefail
 # One-shot validation that prefers Apple Silicon MPS.
 bash oss_llm/testNanoGPT/10_clone_nanogpt.sh
 bash oss_llm/testNanoGPT/00_setup_env.sh
 bash oss_llm/testNanoGPT/20_prepare_shakespeare.sh
 # Train with MPS if available (script falls back to CPU)
 bash oss_llm/testNanoGPT/31_train_mps_quick.sh
 # Sample with MPS if available, else CPU
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 # shellcheck source=./_common.sh
 source "${SCRIPT_DIR}/_common.sh"
 require_venv
 cd_nanogpt
 DEVICE="mps"
 if ! "${VENV_PY}" -c "import torch; import sys; sys.exit(0 if torch.backends.mps.is_available() else 1)"; then
  DEVICE="cpu"
 fi
 if [[ ! -f "out-shakespeare-char/ckpt.pt" ]]; then
  echo "ERROR: ckpt.pt not found after training" >&2
  exit 1
 fi
 "${VENV_PY}" sample.py --out_dir=out-shakespeare-char --device="${DEVICE}" --dtype=float32 --max_new_tokens=200
 printf '\nOK: nanoGPT MPS smoke test completed (device=%s).\n' "${DEVICE}"
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/99_smoke_test_all.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/99_smoke_test_all.sh
@ -0,0 +1,11 @@
 #!/usr/bin/env bash
 set -euo pipefail
 # One-shot end-to-end validation.
 bash oss_llm/testNanoGPT/10_clone_nanogpt.sh
 bash oss_llm/testNanoGPT/00_setup_env.sh
 bash oss_llm/testNanoGPT/20_prepare_shakespeare.sh
 bash oss_llm/testNanoGPT/30_train_cpu_quick.sh
 bash oss_llm/testNanoGPT/40_sample_cpu.sh
 printf '\nOK: nanoGPT smoke test completed.\n'
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/README.md
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/README.md
@ -0,0 +1,105 @@
 # nanoGPT: use & test (local)
 This folder contains runnable scripts to validate **nanoGPT** end-to-end in this workspace.
 ## Prereqs
 - `git`, `python3`, and `curl` are available.
 - GitHub is reachable (for cloning nanoGPT and downloading the tiny Shakespeare text).
 ## What the scripts do
 - `00_setup_env.sh`: installs Python deps into the workspace venv.
 - `10_clone_nanogpt.sh`: clones `karpathy/nanoGPT` into `oss_llm/nanoGPT` (or updates it).
 - `20_prepare_shakespeare.sh`: downloads/prepares the tiny Shakespeare dataset.
 - `30_train_cpu_quick.sh`: quick CPU training run (writes `out-shakespeare-char/ckpt.pt`).
 - `31_train_mps_quick.sh`: quick MPS (Metal) training run (faster on Apple Silicon).
 - `40_sample_cpu.sh`: samples from the trained checkpoint.
 - `98_smoke_test_mps.sh`: runs clone → deps → prepare → train (MPS) → sample (MPS).
 - `99_smoke_test_all.sh`: runs clone → deps → prepare → train → sample.
 ## What nanoGPT demonstrates (current capabilities)
 With the default tiny Shakespeare **character-level** example, the checkpoint you train here supports:
 - **Unconditional generation**: start from a newline and generate Shakespeare-ish character patterns.
 - **Prompted continuation**: provide a short prompt (e.g. a phrase) and generate a continuation.
 - **Sampling controls**:
  - `--temperature` controls randomness (lower is more deterministic).
  - `--top_k` clamps sampling to the top-K next-token candidates (lower is more conservative).
 Reality check: with the default quick config (small model, short training), output will often look like _Shakespeare-shaped gibberish_. That’s expected; the goal is validating the end-to-end training + sampling workflow.
 ### Reproduce the demo generations
 From the workspace root:
 1. Ensure you have a checkpoint (CPU):
 ```sh
 bash oss_llm/testNanoGPT/99_smoke_test_all.sh
 ```
 2. Unconditional samples (2 short samples):
 ```sh
 cd oss_llm/nanoGPT
 ../../.venv/bin/python sample.py \
 	--out_dir=out-shakespeare-char --device=cpu --dtype=float32 \
 	--num_samples=2 --max_new_tokens=220 --temperature=0.8 --top_k=200
 ```
 3. Prompted continuation (more conservative sampling):
 ```sh
 cd oss_llm/nanoGPT
 ../../.venv/bin/python sample.py \
 	--out_dir=out-shakespeare-char --device=cpu --dtype=float32 \
 	--num_samples=1 --max_new_tokens=220 --temperature=0.4 --top_k=50 \
 	--start="To be, or not to be"
 ```
 ## One-command smoke test
 From the workspace root:
 ```sh
 bash oss_llm/testNanoGPT/99_smoke_test_all.sh
 ```
 ## One-command MPS smoke test
 From the workspace root:
 ```sh
 bash oss_llm/testNanoGPT/98_smoke_test_mps.sh
 ```
 ## Common commands
 - Install deps only:
 ```sh
 bash oss_llm/testNanoGPT/00_setup_env.sh
 ```
 - Quick CPU train + sample:
 ```sh
 bash oss_llm/testNanoGPT/30_train_cpu_quick.sh
 bash oss_llm/testNanoGPT/40_sample_cpu.sh
 ```
 - Quick MPS train + sample:
 ```sh
 bash oss_llm/testNanoGPT/31_train_mps_quick.sh
 bash oss_llm/testNanoGPT/40_sample_cpu.sh  # sampling can still run on CPU
 ```
 ## Notes
 - The CPU/MPS training scripts intentionally use a tiny model + small iteration count to finish quickly.
 - If `ckpt.pt` is missing, training probably never reached an eval step: nanoGPT writes checkpoints only at eval steps after iteration 0, so these scripts set a low `--eval_interval` together with `--always_save_checkpoint=True`.
 - Scripts will auto-create `./.venv` on first run if it does not exist.
 - Dataset download starts from a `github.com/.../raw/...` URL, which GitHub redirects to `raw.githubusercontent.com`, so that host must be reachable as well.
--- a/__LOCAL_LLMs/oss_llm/testNanoGPT/_common.sh
+++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/_common.sh
@ -0,0 +1,61 @@
 #!/usr/bin/env bash
 set -euo pipefail
 # Resolve workspace root (this repo) regardless of where the script is called from.
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 WORKSPACE_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
 # Some managed macOS environments set TMPDIR to slow/unusual locations.
 # Force a local temp directory for predictable behavior.
 export TMPDIR="/tmp"
 VENV_PY="${WORKSPACE_ROOT}/.venv/bin/python"
 NANOGPT_DIR="${WORKSPACE_ROOT}/oss_llm/nanoGPT"
 require_python3() {
  if ! command -v python3 >/dev/null 2>&1; then
    echo "ERROR: python3 not found in PATH" >&2
    exit 1
  fi
 }
 require_curl() {
  if ! command -v curl >/dev/null 2>&1; then
    echo "ERROR: curl not found in PATH" >&2
    exit 1
  fi
 }
 ensure_venv() {
  if [[ -x "${VENV_PY}" ]]; then
    return 0
  fi
  require_python3
  echo "Creating workspace venv at: ${WORKSPACE_ROOT}/.venv" >&2
  python3 -m venv "${WORKSPACE_ROOT}/.venv"
 }
 require_venv() {
  ensure_venv
  if [[ ! -x "${VENV_PY}" ]]; then
    echo "ERROR: Failed to create venv python at: ${VENV_PY}" >&2
    exit 1
  fi
 }
 require_git() {
  if ! command -v git >/dev/null 2>&1; then
    echo "ERROR: git not found in PATH" >&2
    exit 1
  fi
 }
 cd_nanogpt() {
  if [[ ! -d "${NANOGPT_DIR}" ]]; then
    echo "ERROR: nanoGPT repo not found at ${NANOGPT_DIR}" >&2
    echo "Run: bash oss_llm/testNanoGPT/10_clone_nanogpt.sh" >&2
    exit 1
  fi
  cd "${NANOGPT_DIR}"
 }