diff --git a/__LOCAL_LLMs/oss_llm/README.md b/__LOCAL_LLMs/oss_llm/README.md new file mode 100644 index 00000000..00fa11ef --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/README.md @@ -0,0 +1,383 @@ +# Kimi “2.5” local deployment on macOS (what’s реально possible) + +## Executive summary + +- **Running the real “Kimi 2.5 / Kimi K2-class” model fully locally on a Mac is not practical**: the official open-weight Kimi K2 deployment guidance targets **multi-node NVIDIA GPU clusters** (vLLM/SGLang/TensorRT-LLM) and assumes CUDA. +- **What _is_ practical on a Mac:** use **Kimi Code CLI** (official) or the **Moonshot API** (official). That’s not “local inference” (weights on your laptop), but it is “local usage” (client runs on your laptop). +- **VPN / proxy:** yes, clients can usually work through VPN/proxy, but you must be able to reach the provider’s API endpoints. If your network blocks access, you’ll need allowlisting or a different route. + +## What GitHub shows (official sources) + +### 1) Kimi K2 (open-weight model series) + +- Official repo: https://github.com/MoonshotAI/Kimi-K2 +- The repo includes a deployment guide at `docs/deploy_guidance.md`. +- Key reality check from the guide: **the smallest FP8 128k-seqlen deployment is described as ~16 GPUs (H200/H20-class)** for mainstream setups (vLLM/SGLang). This is fundamentally not macOS-laptop friendly. + +### 2) Kimi Code CLI (best option for macOS) + +- Official repo: https://github.com/MoonshotAI/kimi-cli +- Docs: https://moonshotai.github.io/kimi-cli/en/guides/getting-started.html +- Kimi Code CLI is an **agent client** (terminal tool) that talks to a remote provider. +- First-run authentication options: + - `/login` (browser login; auto-configures models) + - `/setup` (API key flow) + +## Can we locally deploy “Kimi 2.5” on this Mac? + +### Practically: no (for K2 / K2-class) + +- **macOS has no CUDA**, so GPU-first inference stacks referenced for K2 deployment (vLLM / TensorRT-LLM) are not an option. +- Even if you tried CPU inference, the K2-class model scale (MoE, 1T total params / 32B active) is far beyond what is reasonable on a laptop. + +### What’s feasible instead + +1. **Use Kimi Code CLI on macOS** (recommended) +2. **Use Moonshot/Kimi APIs from your own scripts** (Python/Node/etc) once your network allows access +3. If you truly need **local weights on Mac**, use a Mac-friendly model/runtime (e.g., MLX/Ollama) — but that would be **a different model**, not Kimi 2.5/K2. + +## Options: smaller, Mac-runnable open models (local inference) + +If your goal is “no network calls at runtime”, pick a **local runtime** + a **small-enough model + quantization**. + +## If you can only access GitHub + +If your enterprise network only allows `github.com`, that severely limits how you obtain model weights because most model hosting is **not** on GitHub. + +What still works: + +- **GitHub Releases assets**: some projects publish quantized model files (often `.gguf`) as release assets. +- **Git LFS inside a repo**: occasionally weights are stored in-repo via LFS (still uncommon for large models). + +What usually won’t work: + +- Downloading weights from external model registries / hosting sites (blocked by policy). + +Reality check: GitHub has practical size limits, so **very large models are rarely hosted there**. Your best bet is to use **small models (7B–14B) in quantized form**. + +### How to find downloadable models on GitHub + +- Use [oss_llm/find_models_on_github.py](oss_llm/find_models_on_github.py) to search GitHub repos and list any release assets that look like model files. +- Prefer assets ending in `.gguf` if you plan to run with `llama.cpp`. + +Example: + +```sh +python3 oss_llm/find_models_on_github.py --query "gguf qwen 2.5" --limit 20 +``` + +### Recommended enterprise pattern + +If you need a specific model but can’t download it directly: + +1. Ask security for an **approved internal mirror** / artifact store. +2. Mirror the model files there. +3. Point your local runtime (Ollama/llama.cpp/MLX) at the internal location. + +### “No security review” learning path (best effort): train a tiny model yourself + +If you want to avoid downloading any third-party model weights at all, the cleanest option is to: + +- clone training code from GitHub +- train a small model on a local/public-domain text file + +This won’t produce a state-of-the-art assistant, but it’s excellent for learning tokenization, training loops, sampling, and basic eval. + +One popular repo for this is **karpathy/nanoGPT**. + +High-level steps: + +```sh +git clone https://github.com/karpathy/nanoGPT.git +cd nanoGPT + +# create a python env (choose your preferred method) +python3 -m venv .venv && source .venv/bin/activate +pip install -r requirements.txt + +# run the built-in Shakespeare example (it downloads a small text file) +python data/shakespeare_char/prepare.py +python train.py data/shakespeare_char --device=cpu --compile=False + +# sample +python sample.py --out_dir=out-shakespeare-char --device=cpu --compile=False +``` + +If even small downloads are restricted, replace the dataset step with your own local text file and adjust the dataset script accordingly. + +### nanoGPT demo (this repo) + +This workspace includes a reproducible nanoGPT workflow (CPU and Apple Silicon MPS) plus sampling demo commands: + +- See [oss_llm/testNanoGPT/README.md](oss_llm/testNanoGPT/README.md) + +### If you do download GGUF weights from GitHub + +It can work (some repos commit a `.gguf` directly), but treat it like any third-party binary: + +- Prefer repos with **clear licensing** (repo license + explicit model license/provenance) +- Prefer “original publisher” repos over re-uploads +- Keep models small (e.g., ~100M–3B) for macOS learning + +### Recommended runtimes (macOS) + +- **Ollama**: easiest “download + chat + local HTTP API” experience. +- **LM Studio**: easy GUI; also exposes a local API server. +- **llama.cpp**: most portable; great for CPU/Metal, quantized GGUF models. +- **MLX** (Apple): best when you want Python-native workflows on Apple Silicon. + +### Model size guidance (rule of thumb) + +Quantized models are what make laptops viable. + +- **8–10B @ 4-bit**: typically comfortable on 16GB unified memory. +- **14B @ 4-bit**: better with 24–32GB unified memory. +- **30B+**: usually needs 64GB+ and will still be slow. + +### Good “starter” model families (pick one) + +These are widely supported by the runtimes above and have strong general utility: + +- **Llama 3.x (8B class)**: strong general chat + coding for the size. +- **Qwen 2.5 (7B/14B class)**: strong multilingual + coding. +- **Mistral 7B class**: fast and solid baseline. +- **Gemma 2 (9B class)**: good general-purpose quality. +- **Phi-3.x (mini/small class)**: very fast and lightweight. + +### Suggested picks by Mac memory + +- **16GB unified memory**: start with an 8–9B model at 4-bit. +- **32GB unified memory**: 14B at 4-bit is a good sweet spot. +- **64GB unified memory**: 27–34B at 4-bit becomes feasible (still slower). + +### Practical setup: Ollama (quickest) + +1. Install runtime (choose the install method your enterprise allows; Homebrew is common): + +```sh +brew install ollama +``` + +2. Start the local service: + +```sh +ollama serve +``` + +3. Pull/run a model (example placeholder): + +```sh +ollama run +``` + +4. Use the local API (optional): + +```sh +curl http://127.0.0.1:11434/api/tags +``` + +### Practical setup: llama.cpp (most controllable) + +If you can obtain a quantized GGUF model file via an approved internal mirror: + +```sh +brew install llama.cpp +llama-cli -m /path/to/model.gguf -p "Hello" -n 256 +``` + +### Practical setup: MLX (Python-centric on Apple Silicon) + +If your environment allows Python packages and you have an MLX-converted model available internally: + +```sh +python3 -m venv .venv && source .venv/bin/activate +pip install mlx mlx-lm +python -m mlx_lm.generate --model /path/to/mlx-model --prompt "Hello" --max-tokens 256 +``` + +### Enterprise note (important) + +Because model hosting sites may be blocked in your network category, the usual pattern in enterprise is: + +1. Security-approved model list +2. Internal artifact store / mirror for model files +3. Local runtime (Ollama/llama.cpp/MLX) pointing to those internally hosted artifacts + +## Steps (macOS): run Kimi Code CLI locally (client-side) + +Source: Kimi CLI docs. + +1. Install + +```sh +# Install via uv (Python package manager) +uv tool install --python 3.13 kimi-cli +``` + +2. Verify + +```sh +kimi --version +``` + +3. Start in a project directory + +```sh +cd /Users/sd9235/code/mygh/learning_ai_2nd_brain +kimi +``` + +4. Authenticate + +- Preferred: + - Run `/login` inside the CLI and complete the browser auth. +- Alternative: + - Run `/setup` and choose an API platform + API key + model. + +5. If models don’t show up + +- Kimi CLI FAQ: verify network access to your configured provider’s API endpoints. + +## Steps (NOT macOS): deploy Kimi K2 weights (server-side) + +If you have access to Linux + NVIDIA GPUs, use the official K2 deployment guide: + +- vLLM (requires CUDA; the guide notes vLLM v0.10.0rc1+) +- SGLang +- TensorRT-LLM + +This is the realistic path if you truly need “local” (self-hosted) K2/K2-class inference: **run the model on a GPU box/cluster** and call it from your Mac. + +## VPN / proxy: are we able to access through it? + +### What you need + +You generally need outbound access (through your VPN/proxy) to at least: + +- your chosen provider’s API host (varies by provider) +- and, if downloading open weights, the model hosting site you plan to use. + +### Quick connectivity checks + +```sh +# DNS + HTTPS reachability +curl -I https:// + +# If you plan to download weights later +curl -I https:// +``` + +### Configure proxy in a shell (typical) + +```sh +export HTTP_PROXY="http://127.0.0.1:7890" +export HTTPS_PROXY="http://127.0.0.1:7890" +export ALL_PROXY="socks5://127.0.0.1:7890" +export NO_PROXY="localhost,127.0.0.1" +``` + +### Configure proxy for Git + +```sh +git config --global http.proxy "$HTTPS_PROXY" +git config --global https.proxy "$HTTPS_PROXY" +``` + +### Configure proxy for Python/pip + +```sh +pip config set global.proxy "$HTTPS_PROXY" +``` + +### Notes about your current network + +From within this VS Code environment, requests to some provider/model-hosting sites were redirected to a corporate/web-filter “blockpage” URL. If you see that on your Mac too, you’ll need one of: + +- VPN that routes around the filter +- proxy that’s allowed +- allowlist/exception for those domains + +## Recommendation + +- If your goal is to _use_ Kimi on this Mac: **install Kimi Code CLI** and make sure your VPN/proxy allows access to your configured provider. +- If your goal is “true local inference”: **host Kimi K2 on a CUDA GPU server** (or use a smaller Mac-native model instead). + +## Your Mac (detected) + +- macOS: 15.7.3 (24G419) +- CPU arch: arm64 +- Machine: MacBook Pro (Mac16,7) +- Chip: Apple M4 Pro (14 cores) +- Memory: 48 GB +- Python: 3.13.10 + +## Will nanoGPT work on this laptop? + +Yes for learning, with a couple of caveats. + +What will work well: + +- **Small CPU/MPS runs** (toy datasets like Shakespeare, short experiments, sampling). +- With 48GB RAM and Apple Silicon, you have plenty of headroom for nanoGPT-style demos. + +What might block you: + +- Installing dependencies (notably PyTorch) typically requires access to package indexes that may be blocked in your network. + - In this workspace, PyTorch was successfully installed into the venv and `mps_available` is `True`. + - If you can’t reach package indexes in a different environment, use an internal Python package mirror, or install from a pre-approved wheelhouse. + +Practical suggestion: + +- Start with CPU (`--device=cpu`) to keep it simple. +- If your PyTorch build supports Apple Metal (MPS), you can later try `--device=mps` for speed. + +Quick verification (after installing torch): + +```sh +python3 -c "import torch; print(torch.__version__); print('mps', torch.backends.mps.is_available())" +``` + +## nanoGPT: validated end-to-end in this workspace + +This is a minimal, fast run that was verified on this machine. + +### 1) Install deps (workspace venv) + +```sh +# from the workspace root +/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python -m pip install torch numpy transformers datasets tiktoken wandb tqdm +``` + +### 2) Clone nanoGPT + +```sh +git clone https://github.com/karpathy/nanoGPT.git oss_llm/nanoGPT +``` + +### 3) Prepare dataset + +```sh +cd oss_llm/nanoGPT +/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python data/shakespeare_char/prepare.py +``` + +### 4) Short CPU training (writes `out-shakespeare-char/ckpt.pt`) + +```sh +/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python train.py config/train_shakespeare_char.py \ + --device=cpu --compile=False \ + --eval_interval=10 --eval_iters=10 --log_interval=10 \ + --block_size=64 --batch_size=12 \ + --n_layer=4 --n_head=4 --n_embd=128 \ + --max_iters=60 --lr_decay_iters=60 --dropout=0.0 \ + --always_save_checkpoint=True +``` + +### 5) Sample + +```sh +/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python sample.py \ + --out_dir=out-shakespeare-char --device=cpu --max_new_tokens=200 +``` + +Tip: for speed on Apple Silicon, try `--device=mps` once you’re comfortable. diff --git a/__LOCAL_LLMs/oss_llm/find_models_on_github.py b/__LOCAL_LLMs/oss_llm/find_models_on_github.py new file mode 100644 index 00000000..036d9294 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/find_models_on_github.py @@ -0,0 +1,251 @@ +#!/usr/bin/env python3 +"""Find model-like files hosted on GitHub. + +This script ONLY talks to GitHub (api.github.com) and is designed for +restricted enterprise networks where only GitHub is reachable. + +It searches repositories by keyword, then inspects recent releases to find +assets that look like model files (e.g., .gguf, .safetensors, .bin, .zip). + +Usage: + python oss_llm/find_models_on_github.py --query "qwen2.5 gguf" --limit 10 + +Optional: + - Set GITHUB_TOKEN to increase API rate limits. + +Notes: + - GitHub rarely hosts very large model weights due to size constraints. + - Prefer smaller models and quantized artifacts. +""" + +import argparse +import os +import sys +import textwrap +import urllib.parse +import json +import subprocess + +MODEL_EXTENSIONS = ( + ".gguf", + ".safetensors", + ".bin", + ".pt", + ".pth", + ".onnx", + ".zip", + ".tar.gz", + ".tgz", +) + + +def _http_get_json(url: str) -> object: + token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN") + + cmd = [ + "curl", + "-L", + "--fail", + "--silent", + "--show-error", + "-H", + "Accept: application/vnd.github+json", + ] + if token: + cmd += ["-H", f"Authorization: Bearer {token}"] + cmd.append(url) + + proc = subprocess.run(cmd, capture_output=True, text=True) + if proc.returncode != 0: + raise RuntimeError(proc.stderr.strip() or f"curl failed with exit code {proc.returncode}") + return json.loads(proc.stdout) + + +def _looks_like_model_asset(name: str) -> bool: + lower = name.lower() + return any(lower.endswith(ext) for ext in MODEL_EXTENSIONS) + + +def _safe_int(value: object) -> int | None: + try: + return int(value) # type: ignore[arg-type] + except Exception: + return None + + +def _scan_repo_tree_for_models(full_name: str, default_branch: str) -> list[str]: + """Return matching file paths from the repo tree. + + This is useful when model files are stored in-repo (often via Git LFS), + and therefore don't show up as release assets. + """ + + # The Trees API can be large; callers should keep repo counts low. + url = f"https://api.github.com/repos/{full_name}/git/trees/{urllib.parse.quote(default_branch)}?recursive=1" + data = _http_get_json(url) + if not isinstance(data, dict): + return [] + tree = data.get("tree") + if not isinstance(tree, list): + return [] + + hits: list[str] = [] + for node in tree: + if not isinstance(node, dict): + continue + path = node.get("path") + ntype = node.get("type") + if ntype != "blob" or not isinstance(path, str): + continue + if _looks_like_model_asset(path): + hits.append(path) + return hits + + +def _human_size(num_bytes: int) -> str: + n = float(num_bytes) + for unit in ("B", "KB", "MB", "GB", "TB"): + if n < 1024.0: + return f"{n:.1f}{unit}" + n /= 1024.0 + return f"{n:.1f}PB" + + +def main() -> int: + p = argparse.ArgumentParser( + formatter_class=argparse.RawDescriptionHelpFormatter, + description="Search GitHub releases for model-like assets.", + epilog=textwrap.dedent( + """ + Examples: + python oss_llm/find_models_on_github.py --query "llama gguf" --limit 20 + python oss_llm/find_models_on_github.py --query "qwen 2.5 gguf" --limit 10 + + Tips: + - Add qualifiers to narrow results, e.g. `language:python`, `in:name`, `topic:llama-cpp`. + - Set GITHUB_TOKEN to avoid low unauthenticated rate limits. + """ + ), + ) + p.add_argument("--query", required=True, help="GitHub search query") + p.add_argument("--limit", type=int, default=10, help="Max repositories to inspect") + p.add_argument( + "--per-page", + type=int, + default=10, + help="Repos per page from search API (max 100)", + ) + p.add_argument( + "--max-releases", + type=int, + default=5, + help="Max recent releases per repo to inspect", + ) + p.add_argument( + "--scan-tree", + action="store_true", + help="Also scan repository file trees for model-like files (slower, more API calls)", + ) + args = p.parse_args() + + query = args.query.strip() + if not query: + print("--query must be non-empty", file=sys.stderr) + return 2 + + per_page = max(1, min(100, args.per_page)) + limit = max(1, args.limit) + + search_q = urllib.parse.quote(query) + search_url = f"https://api.github.com/search/repositories?q={search_q}&per_page={per_page}" + + try: + search = _http_get_json(search_url) + except Exception as e: + print(f"GitHub search failed: {e}", file=sys.stderr) + return 1 + + items = search.get("items") if isinstance(search, dict) else None + if not items: + print("No repositories found.") + return 0 + + inspected = 0 + found_any = False + + for repo in items: + if inspected >= limit: + break + + full_name = repo.get("full_name") + html_url = repo.get("html_url") + default_branch = repo.get("default_branch") + if not full_name or not html_url: + continue + + inspected += 1 + print(f"\n== {full_name} ==") + print(html_url) + + releases_url = f"https://api.github.com/repos/{full_name}/releases?per_page={args.max_releases}" + try: + releases = _http_get_json(releases_url) + except Exception as e: + print(f" releases: error: {e}") + continue + + if not isinstance(releases, list) or len(releases) == 0: + print(" releases: none") + releases = [] + + repo_hit = False + for rel in releases[: args.max_releases]: + tag = rel.get("tag_name") or "(no tag)" + name = rel.get("name") or "(no name)" + assets = rel.get("assets") or [] + model_assets = [a for a in assets if _looks_like_model_asset(a.get("name", ""))] + + if not model_assets: + continue + + repo_hit = True + found_any = True + print(f" release: {tag} — {name}") + for a in model_assets: + aname = a.get("name") or "(no name)" + size = a.get("size") + url = a.get("browser_download_url") or "" + size_str = _human_size(int(size)) if isinstance(size, int) else "?" + print(f" - {aname} ({size_str})") + if url: + print(f" {url}") + + if releases and not repo_hit: + print(" releases: present, but no model-like assets found") + + if args.scan_tree and isinstance(default_branch, str) and default_branch: + try: + paths = _scan_repo_tree_for_models(full_name, default_branch) + except Exception as e: + print(f" repo files: error: {e}") + continue + + if paths: + found_any = True + print(f" repo files: found {len(paths)} model-like paths") + for pth in paths[:30]: + print(f" - {pth}") + if len(paths) > 30: + print(" - (more omitted)") + else: + print(" repo files: no model-like paths found") + + if not found_any: + print("\nNo model-like release assets found in inspected repositories.") + print("Try refining your query (add 'gguf', model family name, or 'quant').") + + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/__LOCAL_LLMs/oss_llm/quick_checks.sh b/__LOCAL_LLMs/oss_llm/quick_checks.sh new file mode 100644 index 00000000..a93739e8 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/quick_checks.sh @@ -0,0 +1,25 @@ +#!/usr/bin/env bash +set -euo pipefail + +echo "== Network reachability ==" + +: "${CHECK_URLS:=}" + +if [[ -z "${CHECK_URLS}" ]]; then + cat <<'TXT' +No URLs configured. + +Set CHECK_URLS to a space-separated list of HTTPS URLs you want to test, e.g.: + CHECK_URLS="https://github.com https://example.com" ./oss_llm/quick_checks.sh + +This script intentionally does not hardcode any provider endpoints. +TXT +else + for url in ${CHECK_URLS}; do + echo "\n--- $url" + curl -I --max-time 10 "$url" | head -n 5 || true + done +fi + +echo "\n== Proxy env vars (current) ==" +env | grep -E '^(HTTP|HTTPS|ALL|NO)_PROXY=' || echo "(none set)" diff --git a/__LOCAL_LLMs/oss_llm/testNanoGPT/00_setup_env.sh b/__LOCAL_LLMs/oss_llm/testNanoGPT/00_setup_env.sh new file mode 100755 index 00000000..345a9701 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/00_setup_env.sh @@ -0,0 +1,12 @@ +#!/usr/bin/env bash +set -euo pipefail + +source "$(cd "$(dirname "$0")" && pwd)/_common.sh" +require_venv + +# nanoGPT README lists these deps. +"${VENV_PY}" -m pip install --upgrade pip +"${VENV_PY}" -m pip install torch numpy transformers datasets tiktoken wandb tqdm + +# Quick sanity check +"${VENV_PY}" -c "import torch; print('torch', torch.__version__); print('mps', torch.backends.mps.is_available())" diff --git a/__LOCAL_LLMs/oss_llm/testNanoGPT/10_clone_nanogpt.sh b/__LOCAL_LLMs/oss_llm/testNanoGPT/10_clone_nanogpt.sh new file mode 100755 index 00000000..51bc1641 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/10_clone_nanogpt.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash +set -euo pipefail + +source "$(cd "$(dirname "$0")" && pwd)/_common.sh" +require_git + +mkdir -p "${WORKSPACE_ROOT}/oss_llm" + +if [[ -d "${NANOGPT_DIR}/.git" ]]; then + echo "nanoGPT already cloned; updating…" + cd "${NANOGPT_DIR}" + git pull --ff-only +else + echo "Cloning nanoGPT into ${NANOGPT_DIR}…" + rm -rf "${NANOGPT_DIR}" + git clone https://github.com/karpathy/nanoGPT.git "${NANOGPT_DIR}" +fi diff --git a/__LOCAL_LLMs/oss_llm/testNanoGPT/20_prepare_shakespeare.sh b/__LOCAL_LLMs/oss_llm/testNanoGPT/20_prepare_shakespeare.sh new file mode 100755 index 00000000..d0496d69 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/20_prepare_shakespeare.sh @@ -0,0 +1,20 @@ +#!/usr/bin/env bash +set -euo pipefail + +source "$(cd "$(dirname "$0")" && pwd)/_common.sh" +require_venv +require_curl +cd_nanogpt + +# Prefer a github.com URL (some networks block raw.githubusercontent.com). +# This writes the input file where nanoGPT's prepare script expects it. +INPUT_TXT="data/shakespeare_char/input.txt" +if [[ ! -f "${INPUT_TXT}" ]]; then + mkdir -p "$(dirname "${INPUT_TXT}")" + echo "Downloading tiny Shakespeare to ${INPUT_TXT}" >&2 + curl -fL --retry 3 --retry-delay 1 \ + -o "${INPUT_TXT}" \ + "https://github.com/karpathy/char-rnn/raw/master/data/tinyshakespeare/input.txt" +fi + +"${VENV_PY}" data/shakespeare_char/prepare.py diff --git a/__LOCAL_LLMs/oss_llm/testNanoGPT/30_train_cpu_quick.sh b/__LOCAL_LLMs/oss_llm/testNanoGPT/30_train_cpu_quick.sh new file mode 100755 index 00000000..8ed000e0 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/30_train_cpu_quick.sh @@ -0,0 +1,16 @@ +#!/usr/bin/env bash +set -euo pipefail + +source "$(cd "$(dirname "$0")" && pwd)/_common.sh" +require_venv +cd_nanogpt + +"${VENV_PY}" train.py config/train_shakespeare_char.py \ + --device=cpu --compile=False --dtype=float32 \ + --eval_interval=10 --eval_iters=10 --log_interval=10 \ + --block_size=64 --batch_size=12 \ + --n_layer=4 --n_head=4 --n_embd=128 \ + --max_iters=60 --lr_decay_iters=60 --dropout=0.0 \ + --always_save_checkpoint=True + +echo "Done. Checkpoint should exist at: ${NANOGPT_DIR}/out-shakespeare-char/ckpt.pt" diff --git a/__LOCAL_LLMs/oss_llm/testNanoGPT/31_train_mps_quick.sh b/__LOCAL_LLMs/oss_llm/testNanoGPT/31_train_mps_quick.sh new file mode 100755 index 00000000..db6dde76 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/31_train_mps_quick.sh @@ -0,0 +1,24 @@ +#!/usr/bin/env bash +set -euo pipefail + +source "$(cd "$(dirname "$0")" && pwd)/_common.sh" +require_venv +cd_nanogpt + +# Requires torch with MPS support (Apple Silicon). If MPS isn't available, +# fall back to CPU. +DEVICE="mps" +if ! "${VENV_PY}" -c "import torch; import sys; sys.exit(0 if torch.backends.mps.is_available() else 1)"; then + echo "MPS not available; falling back to CPU" >&2 + DEVICE="cpu" +fi + +"${VENV_PY}" train.py config/train_shakespeare_char.py \ + --device="${DEVICE}" --compile=False --dtype=float32 \ + --eval_interval=10 --eval_iters=10 --log_interval=10 \ + --block_size=64 --batch_size=12 \ + --n_layer=4 --n_head=4 --n_embd=128 \ + --max_iters=60 --lr_decay_iters=60 --dropout=0.0 \ + --always_save_checkpoint=True + +echo "Done. Checkpoint should exist at: ${NANOGPT_DIR}/out-shakespeare-char/ckpt.pt" diff --git a/__LOCAL_LLMs/oss_llm/testNanoGPT/40_sample_cpu.sh b/__LOCAL_LLMs/oss_llm/testNanoGPT/40_sample_cpu.sh new file mode 100755 index 00000000..6c405417 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/40_sample_cpu.sh @@ -0,0 +1,14 @@ +#!/usr/bin/env bash +set -euo pipefail + +source "$(cd "$(dirname "$0")" && pwd)/_common.sh" +require_venv +cd_nanogpt + +if [[ ! -f "out-shakespeare-char/ckpt.pt" ]]; then + echo "ERROR: ckpt.pt not found. Run training first:" >&2 + echo " bash oss_llm/testNanoGPT/30_train_cpu_quick.sh" >&2 + exit 1 +fi + +"${VENV_PY}" sample.py --out_dir=out-shakespeare-char --device=cpu --max_new_tokens=200 diff --git a/__LOCAL_LLMs/oss_llm/testNanoGPT/98_smoke_test_mps.sh b/__LOCAL_LLMs/oss_llm/testNanoGPT/98_smoke_test_mps.sh new file mode 100755 index 00000000..434ecee1 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/98_smoke_test_mps.sh @@ -0,0 +1,32 @@ +#!/usr/bin/env bash +set -euo pipefail + +# One-shot validation that prefers Apple Silicon MPS. + +bash oss_llm/testNanoGPT/10_clone_nanogpt.sh +bash oss_llm/testNanoGPT/00_setup_env.sh +bash oss_llm/testNanoGPT/20_prepare_shakespeare.sh + +# Train with MPS if available (script falls back to CPU) +bash oss_llm/testNanoGPT/31_train_mps_quick.sh + +# Sample with MPS if available, else CPU +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +# shellcheck source=./_common.sh +source "${SCRIPT_DIR}/_common.sh" +require_venv +cd_nanogpt + +DEVICE="mps" +if ! "${VENV_PY}" -c "import torch; import sys; sys.exit(0 if torch.backends.mps.is_available() else 1)"; then + DEVICE="cpu" +fi + +if [[ ! -f "out-shakespeare-char/ckpt.pt" ]]; then + echo "ERROR: ckpt.pt not found after training" >&2 + exit 1 +fi + +"${VENV_PY}" sample.py --out_dir=out-shakespeare-char --device="${DEVICE}" --dtype=float32 --max_new_tokens=200 + +printf '\nOK: nanoGPT MPS smoke test completed (device=%s).\n' "${DEVICE}" \ No newline at end of file diff --git a/__LOCAL_LLMs/oss_llm/testNanoGPT/99_smoke_test_all.sh b/__LOCAL_LLMs/oss_llm/testNanoGPT/99_smoke_test_all.sh new file mode 100755 index 00000000..f1c56178 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/99_smoke_test_all.sh @@ -0,0 +1,11 @@ +#!/usr/bin/env bash +set -euo pipefail + +# One-shot end-to-end validation. +bash oss_llm/testNanoGPT/10_clone_nanogpt.sh +bash oss_llm/testNanoGPT/00_setup_env.sh +bash oss_llm/testNanoGPT/20_prepare_shakespeare.sh +bash oss_llm/testNanoGPT/30_train_cpu_quick.sh +bash oss_llm/testNanoGPT/40_sample_cpu.sh + +echo "\nOK: nanoGPT smoke test completed." diff --git a/__LOCAL_LLMs/oss_llm/testNanoGPT/README.md b/__LOCAL_LLMs/oss_llm/testNanoGPT/README.md new file mode 100644 index 00000000..81461c5c --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/README.md @@ -0,0 +1,105 @@ +# nanoGPT: use & test (local) + +This folder contains runnable scripts to validate **nanoGPT** end-to-end in this workspace. + +## Prereqs + +- `git`, `python3`, and `curl` are available. +- GitHub is reachable (for cloning nanoGPT and downloading the tiny Shakespeare text). + +## What the scripts do + +- `00_setup_env.sh`: installs Python deps into the workspace venv. +- `10_clone_nanogpt.sh`: clones `karpathy/nanoGPT` into `oss_llm/nanoGPT` (or updates it). +- `20_prepare_shakespeare.sh`: downloads/prepares the tiny Shakespeare dataset. +- `30_train_cpu_quick.sh`: quick CPU training run (writes `out-shakespeare-char/ckpt.pt`). +- `31_train_mps_quick.sh`: quick MPS (Metal) training run (faster on Apple Silicon). +- `40_sample_cpu.sh`: samples from the trained checkpoint. +- `98_smoke_test_mps.sh`: runs clone → deps → prepare → train (MPS) → sample (MPS). +- `99_smoke_test_all.sh`: runs clone → deps → prepare → train → sample. + +## What nanoGPT demonstrates (current capabilities) + +With the default tiny Shakespeare **character-level** example, the checkpoint you train here supports: + +- **Unconditional generation**: start from a newline and generate Shakespeare-ish character patterns. +- **Prompted continuation**: provide a short prompt (e.g. a phrase) and generate a continuation. +- **Sampling controls**: + - `--temperature` controls randomness (lower is more deterministic). + - `--top_k` clamps sampling to the top-K next-token candidates (lower is more conservative). + +Reality check: with the default quick config (small model, short training), output will often look like _Shakespeare-shaped gibberish_. That’s expected; the goal is validating the end-to-end training + sampling workflow. + +### Reproduce the demo generations + +From the workspace root: + +1. Ensure you have a checkpoint (CPU): + +```sh +bash oss_llm/testNanoGPT/99_smoke_test_all.sh +``` + +2. Unconditional samples (2 short samples): + +```sh +cd oss_llm/nanoGPT +./../.venv/bin/python sample.py \ + --out_dir=out-shakespeare-char --device=cpu --dtype=float32 \ + --num_samples=2 --max_new_tokens=220 --temperature=0.8 --top_k=200 +``` + +3. Prompted continuation (more conservative sampling): + +```sh +cd oss_llm/nanoGPT +./../.venv/bin/python sample.py \ + --out_dir=out-shakespeare-char --device=cpu --dtype=float32 \ + --num_samples=1 --max_new_tokens=220 --temperature=0.4 --top_k=50 \ + --start="To be, or not to be" +``` + +## One-command smoke test + +From the workspace root: + +```sh +bash oss_llm/testNanoGPT/99_smoke_test_all.sh +``` + +## One-command MPS smoke test + +From the workspace root: + +```sh +bash oss_llm/testNanoGPT/98_smoke_test_mps.sh +``` + +## Common commands + +- Install deps only: + +```sh +bash oss_llm/testNanoGPT/00_setup_env.sh +``` + +- Quick CPU train + sample: + +```sh +bash oss_llm/testNanoGPT/30_train_cpu_quick.sh +bash oss_llm/testNanoGPT/40_sample_cpu.sh +``` + +- Quick MPS train + sample: + +```sh +bash oss_llm/testNanoGPT/31_train_mps_quick.sh +bash oss_llm/testNanoGPT/40_sample_cpu.sh # sampling can still run on CPU +``` + +## Notes + +- The CPU/MPS training scripts intentionally use a tiny model + small iteration count to finish quickly. +- If `ckpt.pt` is missing, it usually means training didn’t run an eval step; these scripts set `--eval_interval` low to force checkpoint writes. +- Scripts will auto-create `./.venv` on first run if it does not exist. +- Dataset download is via a `github.com/.../raw/...` URL to avoid reliance on `raw.githubusercontent.com`. diff --git a/__LOCAL_LLMs/oss_llm/testNanoGPT/_common.sh b/__LOCAL_LLMs/oss_llm/testNanoGPT/_common.sh new file mode 100755 index 00000000..ed93c0a1 --- /dev/null +++ b/__LOCAL_LLMs/oss_llm/testNanoGPT/_common.sh @@ -0,0 +1,61 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Resolve workspace root (this repo) regardless of where the script is called from. +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +WORKSPACE_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)" + +# Some managed macOS environments set TMPDIR to slow/unusual locations. +# Force a local temp directory for predictable behavior. +export TMPDIR="/tmp" + +VENV_PY="${WORKSPACE_ROOT}/.venv/bin/python" +NANOGPT_DIR="${WORKSPACE_ROOT}/oss_llm/nanoGPT" + +require_python3() { + if ! command -v python3 >/dev/null 2>&1; then + echo "ERROR: python3 not found in PATH" >&2 + exit 1 + fi +} + +require_curl() { + if ! command -v curl >/dev/null 2>&1; then + echo "ERROR: curl not found in PATH" >&2 + exit 1 + fi +} + +ensure_venv() { + if [[ -x "${VENV_PY}" ]]; then + return 0 + fi + + require_python3 + echo "Creating workspace venv at: ${WORKSPACE_ROOT}/.venv" >&2 + python3 -m venv "${WORKSPACE_ROOT}/.venv" +} + +require_venv() { + ensure_venv + if [[ ! -x "${VENV_PY}" ]]; then + echo "ERROR: Failed to create venv python at: ${VENV_PY}" >&2 + exit 1 + fi +} + +require_git() { + if ! command -v git >/dev/null 2>&1; then + echo "ERROR: git not found in PATH" >&2 + exit 1 + fi +} + +cd_nanogpt() { + if [[ ! -d "${NANOGPT_DIR}" ]]; then + echo "ERROR: nanoGPT repo not found at ${NANOGPT_DIR}" >&2 + echo "Run: bash oss_llm/testNanoGPT/10_clone_nanogpt.sh" >&2 + exit 1 + fi + cd "${NANOGPT_DIR}" +}