move oss_llm/ from learning_ai_2nd_brain

saravanakumardb1 2026-02-28 00:03:37 -08:00
parent a32978f9c3
commit 1a4b3c1fb3
13 changed files with 971 additions and 0 deletions


@ -0,0 +1,383 @@
# Kimi “2.5” local deployment on macOS (what's actually possible)
## Executive summary
- **Running the real “Kimi 2.5 / Kimi K2-class” model fully locally on a Mac is not practical**: the official open-weight Kimi K2 deployment guidance targets **multi-node NVIDIA GPU clusters** (vLLM/SGLang/TensorRT-LLM) and assumes CUDA.
- **What _is_ practical on a Mac:** use **Kimi Code CLI** (official) or the **Moonshot API** (official). That's not “local inference” (weights on your laptop), but it is “local usage” (client runs on your laptop).
- **VPN / proxy:** yes, clients can usually work through a VPN/proxy, but you must be able to reach the provider's API endpoints. If your network blocks access, you'll need allowlisting or a different route.
## What GitHub shows (official sources)
### 1) Kimi K2 (open-weight model series)
- Official repo: https://github.com/MoonshotAI/Kimi-K2
- The repo includes a deployment guide at `docs/deploy_guidance.md`.
- Key reality check from the guide: **the smallest FP8 128k-seqlen deployment is described as ~16 GPUs (H200/H20-class)** for mainstream setups (vLLM/SGLang). This is fundamentally not macOS-laptop friendly.
### 2) Kimi Code CLI (best option for macOS)
- Official repo: https://github.com/MoonshotAI/kimi-cli
- Docs: https://moonshotai.github.io/kimi-cli/en/guides/getting-started.html
- Kimi Code CLI is an **agent client** (terminal tool) that talks to a remote provider.
- First-run authentication options:
  - `/login` (browser login; auto-configures models)
  - `/setup` (API key flow)
## Can we locally deploy “Kimi 2.5” on this Mac?
### Practically: no (for K2 / K2-class)
- **macOS has no CUDA**, so GPU-first inference stacks referenced for K2 deployment (vLLM / TensorRT-LLM) are not an option.
- Even if you tried CPU inference, the K2-class model scale (MoE, 1T total params / 32B active) is far beyond what is reasonable on a laptop.
### What's feasible instead
1. **Use Kimi Code CLI on macOS** (recommended)
2. **Use Moonshot/Kimi APIs from your own scripts** (Python/Node/etc) once your network allows access
3. If you truly need **local weights on Mac**, use a Mac-friendly model/runtime (e.g., MLX/Ollama) — but that would be **a different model**, not Kimi 2.5/K2.
## Options: smaller, Mac-runnable open models (local inference)
If your goal is “no network calls at runtime”, pick a **local runtime** + a **small-enough model + quantization**.
## If you can only access GitHub
If your enterprise network only allows `github.com`, that severely limits how you obtain model weights because most model hosting is **not** on GitHub.
What still works:
- **GitHub Releases assets**: some projects publish quantized model files (often `.gguf`) as release assets.
- **Git LFS inside a repo**: occasionally weights are stored in-repo via LFS (still uncommon for large models).
What usually won't work:
- Downloading weights from external model registries / hosting sites (blocked by policy).
Reality check: GitHub has practical size limits, so **very large models are rarely hosted there**. Your best bet is to use **small models (7B–14B) in quantized form**.
### How to find downloadable models on GitHub
- Use [oss_llm/find_models_on_github.py](oss_llm/find_models_on_github.py) to search GitHub repos and list any release assets that look like model files.
- Prefer assets ending in `.gguf` if you plan to run with `llama.cpp`.
Example:
```sh
python3 oss_llm/find_models_on_github.py --query "gguf qwen 2.5" --limit 20
```
### Recommended enterprise pattern
If you need a specific model but can't download it directly:
1. Ask security for an **approved internal mirror** / artifact store.
2. Mirror the model files there.
3. Point your local runtime (Ollama/llama.cpp/MLX) at the internal location.
### “No security review” learning path (best effort): train a tiny model yourself
If you want to avoid downloading any third-party model weights at all, the cleanest option is to:
- clone training code from GitHub
- train a small model on a local/public-domain text file
This won't produce a state-of-the-art assistant, but it's excellent for learning tokenization, training loops, sampling, and basic eval.
One popular repo for this is **karpathy/nanoGPT**.
High-level steps:
```sh
git clone https://github.com/karpathy/nanoGPT.git
cd nanoGPT
# create a python env (choose your preferred method)
python3 -m venv .venv && source .venv/bin/activate
pip install torch numpy transformers datasets tiktoken wandb tqdm
# run the built-in Shakespeare example (it downloads a small text file)
python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py --device=cpu --compile=False
# sample
python sample.py --out_dir=out-shakespeare-char --device=cpu --compile=False
```
If even small downloads are restricted, replace the dataset step with your own local text file and adjust the dataset script accordingly.
### nanoGPT demo (this repo)
This workspace includes a reproducible nanoGPT workflow (CPU and Apple Silicon MPS) plus sampling demo commands:
- See [oss_llm/testNanoGPT/README.md](oss_llm/testNanoGPT/README.md)
### If you do download GGUF weights from GitHub
It can work (some repos commit a `.gguf` directly), but treat it like any third-party binary:
- Prefer repos with **clear licensing** (repo license + explicit model license/provenance)
- Prefer “original publisher” repos over re-uploads
- Keep models small (e.g., ~100M–3B) for macOS learning
### Recommended runtimes (macOS)
- **Ollama**: easiest “download + chat + local HTTP API” experience.
- **LM Studio**: easy GUI; also exposes a local API server.
- **llama.cpp**: most portable; great for CPU/Metal, quantized GGUF models.
- **MLX** (Apple): best when you want Python-native workflows on Apple Silicon.
### Model size guidance (rule of thumb)
Quantized models are what make laptops viable.
- **8–10B @ 4-bit**: typically comfortable on 16GB unified memory.
- **14B @ 4-bit**: better with 24–32GB unified memory.
- **30B+**: usually needs 64GB+ and will still be slow.
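The rule of thumb above follows from simple arithmetic: weight memory is roughly the parameter count times the bits per weight, divided by 8. The sketch below is a rough estimator only. The function name `estimate_model_memory_gb` and the flat 20% overhead for KV cache and runtime buffers are illustrative assumptions, not measured values.

```python
def estimate_model_memory_gb(
    params_billions: float,
    bits_per_weight: float = 4.0,
    overhead_fraction: float = 0.20,  # assumed allowance for KV cache and buffers
) -> float:
    """Rough memory estimate in GB (10^9 bytes) for a quantized model."""
    # 1e9 params * (bits / 8) bytes per param / 1e9 bytes per GB
    weight_gb = params_billions * bits_per_weight / 8.0
    return weight_gb * (1.0 + overhead_fraction)


if __name__ == "__main__":
    for size in (8, 14, 30):
        print(f"{size}B @ 4-bit: ~{estimate_model_memory_gb(size):.1f} GB")
```

The estimate covers the model alone. The memory tiers listed above are more conservative because the OS, other applications, and longer contexts all share the same unified memory.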
### Good “starter” model families (pick one)
These are widely supported by the runtimes above and have strong general utility:
- **Llama 3.x (8B class)**: strong general chat + coding for the size.
- **Qwen 2.5 (7B/14B class)**: strong multilingual + coding.
- **Mistral 7B class**: fast and solid baseline.
- **Gemma 2 (9B class)**: good general-purpose quality.
- **Phi-3.x (mini/small class)**: very fast and lightweight.
### Suggested picks by Mac memory
- **16GB unified memory**: start with an 8–9B model at 4-bit.
- **32GB unified memory**: 14B at 4-bit is a good sweet spot.
- **64GB unified memory**: 27–34B at 4-bit becomes feasible (still slower).
### Practical setup: Ollama (quickest)
1. Install runtime (choose the install method your enterprise allows; Homebrew is common):
```sh
brew install ollama
```
2. Start the local service:
```sh
ollama serve
```
3. Pull/run a model (example placeholder):
```sh
ollama run <model-name>
```
4. Use the local API (optional):
```sh
curl http://127.0.0.1:11434/api/tags
```
### Practical setup: llama.cpp (most controllable)
If you can obtain a quantized GGUF model file via an approved internal mirror:
```sh
brew install llama.cpp
llama-cli -m /path/to/model.gguf -p "Hello" -n 256
```
### Practical setup: MLX (Python-centric on Apple Silicon)
If your environment allows Python packages and you have an MLX-converted model available internally:
```sh
python3 -m venv .venv && source .venv/bin/activate
pip install mlx mlx-lm
python -m mlx_lm.generate --model /path/to/mlx-model --prompt "Hello" --max-tokens 256
```
### Enterprise note (important)
Because model hosting sites may be blocked in your network category, the usual pattern in enterprise is:
1. Security-approved model list
2. Internal artifact store / mirror for model files
3. Local runtime (Ollama/llama.cpp/MLX) pointing to those internally hosted artifacts
## Steps (macOS): run Kimi Code CLI locally (client-side)
Source: Kimi CLI docs.
1. Install
```sh
# Install via uv (Python package manager)
uv tool install --python 3.13 kimi-cli
```
2. Verify
```sh
kimi --version
```
3. Start in a project directory
```sh
cd /Users/sd9235/code/mygh/learning_ai_2nd_brain
kimi
```
4. Authenticate
- Preferred:
  - Run `/login` inside the CLI and complete the browser auth.
- Alternative:
  - Run `/setup` and choose an API platform + API key + model.
5. If models don't show up
- Kimi CLI FAQ: verify network access to your configured provider's API endpoints.
## Steps (NOT macOS): deploy Kimi K2 weights (server-side)
If you have access to Linux + NVIDIA GPUs, use the official K2 deployment guide:
- vLLM (requires CUDA; the guide notes vLLM v0.10.0rc1+)
- SGLang
- TensorRT-LLM
This is the realistic path if you truly need “local” (self-hosted) K2/K2-class inference: **run the model on a GPU box/cluster** and call it from your Mac.
## VPN / proxy: are we able to access through it?
### What you need
You generally need outbound access (through your VPN/proxy) to at least:
- your chosen provider's API host (varies by provider)
- and, if downloading open weights, the model hosting site you plan to use.
### Quick connectivity checks
```sh
# DNS + HTTPS reachability
curl -I https://<YOUR_PROVIDER_API_HOST>
# If you plan to download weights later
curl -I https://<YOUR_MODEL_HOSTING_SITE>
```
### Configure proxy in a shell (typical)
```sh
export HTTP_PROXY="http://127.0.0.1:7890"
export HTTPS_PROXY="http://127.0.0.1:7890"
export ALL_PROXY="socks5://127.0.0.1:7890"
export NO_PROXY="localhost,127.0.0.1"
```
### Configure proxy for Git
```sh
# Git reads http.proxy for both http:// and https:// remotes.
git config --global http.proxy "$HTTPS_PROXY"
```
### Configure proxy for Python/pip
```sh
pip config set global.proxy "$HTTPS_PROXY"
```
### Notes about your current network
From within this VS Code environment, requests to some provider/model-hosting sites were redirected to a corporate/web-filter “blockpage” URL. If you see that on your Mac too, you'll need one of:
- VPN that routes around the filter
- proxy that's allowed
- allowlist/exception for those domains
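A blockpage redirect like the one described above can be detected from a script by comparing the host you asked for with the host you ended up on. The following is a minimal sketch using only the standard library. The helper names `redirected_off_host` and `check_url` are hypothetical, and the heuristic will also flag legitimate cross-host redirects (for example to a `www.` subdomain), so treat a hit as a prompt to look closer rather than as proof of filtering.

```python
import sys
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def redirected_off_host(requested_url: str, final_url: str) -> bool:
    """Return True when the final URL's host differs from the requested host."""
    requested_host = (urlsplit(requested_url).hostname or "").lower()
    final_host = (urlsplit(final_url).hostname or "").lower()
    return requested_host != final_host


def check_url(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and classify the outcome.

    urllib's default opener honors the HTTP_PROXY / HTTPS_PROXY / NO_PROXY
    environment variables, so the shell settings above apply here too.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            final_url = response.geturl()
    except urllib.error.HTTPError as exc:
        # Something answered (the server or a filter), so the network path works.
        return f"answered with HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        return f"unreachable ({exc})"
    if redirected_off_host(url, final_url):
        return f"redirected to another host: {final_url}"
    return "reachable"


if __name__ == "__main__":
    for target in sys.argv[1:]:
        print(target, "->", check_url(target))
```

Run it with the URLs you care about as arguments, in the same way you would pass them to `curl -I`.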
## Recommendation
- If your goal is to _use_ Kimi on this Mac: **install Kimi Code CLI** and make sure your VPN/proxy allows access to your configured provider.
- If your goal is “true local inference”: **host Kimi K2 on a CUDA GPU server** (or use a smaller Mac-native model instead).
## Your Mac (detected)
- macOS: 15.7.3 (24G419)
- CPU arch: arm64
- Machine: MacBook Pro (Mac16,7)
- Chip: Apple M4 Pro (14 cores)
- Memory: 48 GB
- Python: 3.13.10
## Will nanoGPT work on this laptop?
Yes for learning, with a couple of caveats.
What will work well:
- **Small CPU/MPS runs** (toy datasets like Shakespeare, short experiments, sampling).
- With 48GB RAM and Apple Silicon, you have plenty of headroom for nanoGPT-style demos.
What might block you:
- Installing dependencies (notably PyTorch) typically requires access to package indexes that may be blocked in your network.
- In this workspace, PyTorch was successfully installed into the venv and `mps_available` is `True`.
- If you can't reach package indexes in a different environment, use an internal Python package mirror, or install from a pre-approved wheelhouse.
Practical suggestion:
- Start with CPU (`--device=cpu`) to keep it simple.
- If your PyTorch build supports Apple Metal (MPS), you can later try `--device=mps` for speed.
Quick verification (after installing torch):
```sh
python3 -c "import torch; print(torch.__version__); print('mps', torch.backends.mps.is_available())"
```
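The same availability check can drive the `--device` choice from Python instead of being run by hand. This is a small sketch, assuming you want a silent fallback to CPU. The function name `pick_device` is hypothetical, and returning `"cpu"` when PyTorch is not installed is a simplification made for illustration.

```python
def pick_device(prefer_mps: bool = True) -> str:
    """Return "mps" when PyTorch reports Metal support, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Simplification: without torch there is nothing to run on either device.
        return "cpu"
    # Older torch builds may not expose torch.backends.mps at all.
    mps_backend = getattr(torch.backends, "mps", None)
    if prefer_mps and mps_backend is not None and mps_backend.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"--device={pick_device()}")
```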
## nanoGPT: validated end-to-end in this workspace
This is a minimal, fast run that was verified on this machine.
### 1) Install deps (workspace venv)
```sh
# from the workspace root
/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python -m pip install torch numpy transformers datasets tiktoken wandb tqdm
```
### 2) Clone nanoGPT
```sh
git clone https://github.com/karpathy/nanoGPT.git oss_llm/nanoGPT
```
### 3) Prepare dataset
```sh
cd oss_llm/nanoGPT
/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python data/shakespeare_char/prepare.py
```
### 4) Short CPU training (writes `out-shakespeare-char/ckpt.pt`)
```sh
/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python train.py config/train_shakespeare_char.py \
--device=cpu --compile=False \
--eval_interval=10 --eval_iters=10 --log_interval=10 \
--block_size=64 --batch_size=12 \
--n_layer=4 --n_head=4 --n_embd=128 \
--max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
--always_save_checkpoint=True
```
### 5) Sample
```sh
/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python sample.py \
--out_dir=out-shakespeare-char --device=cpu --max_new_tokens=200
```
Tip: for speed on Apple Silicon, try `--device=mps` once you're comfortable.


@ -0,0 +1,251 @@
#!/usr/bin/env python3
"""Find model-like files hosted on GitHub.
This script ONLY talks to GitHub (api.github.com) and is designed for
restricted enterprise networks where only GitHub is reachable.
It searches repositories by keyword, then inspects recent releases to find
assets that look like model files (e.g., .gguf, .safetensors, .bin, .zip).
Usage:
python oss_llm/find_models_on_github.py --query "qwen2.5 gguf" --limit 10
Optional:
- Set GITHUB_TOKEN to increase API rate limits.
Notes:
- GitHub rarely hosts very large model weights due to size constraints.
- Prefer smaller models and quantized artifacts.
"""
import argparse
import os
import sys
import textwrap
import urllib.parse
import json
import subprocess

MODEL_EXTENSIONS = (
    ".gguf",
    ".safetensors",
    ".bin",
    ".pt",
    ".pth",
    ".onnx",
    ".zip",
    ".tar.gz",
    ".tgz",
)


def _http_get_json(url: str) -> object:
    token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
    cmd = [
        "curl",
        "-L",
        "--fail",
        "--silent",
        "--show-error",
        "-H",
        "Accept: application/vnd.github+json",
    ]
    if token:
        cmd += ["-H", f"Authorization: Bearer {token}"]
    cmd.append(url)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or f"curl failed with exit code {proc.returncode}")
    return json.loads(proc.stdout)


def _looks_like_model_asset(name: str) -> bool:
    lower = name.lower()
    return any(lower.endswith(ext) for ext in MODEL_EXTENSIONS)


def _safe_int(value: object) -> int | None:
    try:
        return int(value)  # type: ignore[arg-type]
    except Exception:
        return None


def _scan_repo_tree_for_models(full_name: str, default_branch: str) -> list[str]:
    """Return matching file paths from the repo tree.

    This is useful when model files are stored in-repo (often via Git LFS),
    and therefore don't show up as release assets.
    """
    # The Trees API can be large; callers should keep repo counts low.
    url = f"https://api.github.com/repos/{full_name}/git/trees/{urllib.parse.quote(default_branch)}?recursive=1"
    data = _http_get_json(url)
    if not isinstance(data, dict):
        return []
    tree = data.get("tree")
    if not isinstance(tree, list):
        return []
    hits: list[str] = []
    for node in tree:
        if not isinstance(node, dict):
            continue
        path = node.get("path")
        ntype = node.get("type")
        if ntype != "blob" or not isinstance(path, str):
            continue
        if _looks_like_model_asset(path):
            hits.append(path)
    return hits


def _human_size(num_bytes: int) -> str:
    n = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1024.0:
            return f"{n:.1f}{unit}"
        n /= 1024.0
    return f"{n:.1f}PB"


def main() -> int:
    p = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description="Search GitHub releases for model-like assets.",
        epilog=textwrap.dedent(
            """
            Examples:
              python oss_llm/find_models_on_github.py --query "llama gguf" --limit 20
              python oss_llm/find_models_on_github.py --query "qwen 2.5 gguf" --limit 10
            Tips:
              - Add qualifiers to narrow results, e.g. `language:python`, `in:name`, `topic:llama-cpp`.
              - Set GITHUB_TOKEN to avoid low unauthenticated rate limits.
            """
        ),
    )
    p.add_argument("--query", required=True, help="GitHub search query")
    p.add_argument("--limit", type=int, default=10, help="Max repositories to inspect")
    p.add_argument(
        "--per-page",
        type=int,
        default=10,
        help="Repos per page from search API (max 100)",
    )
    p.add_argument(
        "--max-releases",
        type=int,
        default=5,
        help="Max recent releases per repo to inspect",
    )
    p.add_argument(
        "--scan-tree",
        action="store_true",
        help="Also scan repository file trees for model-like files (slower, more API calls)",
    )
    args = p.parse_args()

    query = args.query.strip()
    if not query:
        print("--query must be non-empty", file=sys.stderr)
        return 2
    per_page = max(1, min(100, args.per_page))
    limit = max(1, args.limit)

    search_q = urllib.parse.quote(query)
    search_url = f"https://api.github.com/search/repositories?q={search_q}&per_page={per_page}"
    try:
        search = _http_get_json(search_url)
    except Exception as e:
        print(f"GitHub search failed: {e}", file=sys.stderr)
        return 1
    items = search.get("items") if isinstance(search, dict) else None
    if not items:
        print("No repositories found.")
        return 0

    inspected = 0
    found_any = False
    for repo in items:
        if inspected >= limit:
            break
        full_name = repo.get("full_name")
        html_url = repo.get("html_url")
        default_branch = repo.get("default_branch")
        if not full_name or not html_url:
            continue
        inspected += 1
        print(f"\n== {full_name} ==")
        print(html_url)

        releases_url = f"https://api.github.com/repos/{full_name}/releases?per_page={args.max_releases}"
        try:
            releases = _http_get_json(releases_url)
        except Exception as e:
            print(f" releases: error: {e}")
            continue
        if not isinstance(releases, list) or len(releases) == 0:
            print(" releases: none")
            releases = []

        repo_hit = False
        for rel in releases[: args.max_releases]:
            tag = rel.get("tag_name") or "(no tag)"
            name = rel.get("name") or "(no name)"
            assets = rel.get("assets") or []
            model_assets = [a for a in assets if _looks_like_model_asset(a.get("name", ""))]
            if not model_assets:
                continue
            repo_hit = True
            found_any = True
            # Separate tag and release name so they do not run together.
            print(f" release: {tag} - {name}")
            for a in model_assets:
                aname = a.get("name") or "(no name)"
                size = a.get("size")
                url = a.get("browser_download_url") or ""
                size_str = _human_size(int(size)) if isinstance(size, int) else "?"
                print(f" - {aname} ({size_str})")
                if url:
                    print(f" {url}")
        if releases and not repo_hit:
            print(" releases: present, but no model-like assets found")

        if args.scan_tree and isinstance(default_branch, str) and default_branch:
            try:
                paths = _scan_repo_tree_for_models(full_name, default_branch)
            except Exception as e:
                print(f" repo files: error: {e}")
                continue
            if paths:
                found_any = True
                print(f" repo files: found {len(paths)} model-like paths")
                for pth in paths[:30]:
                    print(f" - {pth}")
                if len(paths) > 30:
                    print(" - (more omitted)")
            else:
                print(" repo files: no model-like paths found")

    if not found_any:
        print("\nNo model-like release assets found in inspected repositories.")
        print("Try refining your query (add 'gguf', model family name, or 'quant').")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())


@ -0,0 +1,25 @@
#!/usr/bin/env bash
set -euo pipefail
echo "== Network reachability =="
: "${CHECK_URLS:=}"
if [[ -z "${CHECK_URLS}" ]]; then
cat <<'TXT'
No URLs configured.
Set CHECK_URLS to a space-separated list of HTTPS URLs you want to test, e.g.:
CHECK_URLS="https://github.com https://example.com" ./oss_llm/quick_checks.sh
This script intentionally does not hardcode any provider endpoints.
TXT
else
for url in ${CHECK_URLS}; do
printf '\n--- %s\n' "$url"
curl -I --max-time 10 "$url" | head -n 5 || true
done
fi
printf '\n== Proxy env vars (current) ==\n'
env | grep -E '^(HTTP|HTTPS|ALL|NO)_PROXY=' || echo "(none set)"


@ -0,0 +1,12 @@
#!/usr/bin/env bash
set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
require_venv
# nanoGPT README lists these deps.
"${VENV_PY}" -m pip install --upgrade pip
"${VENV_PY}" -m pip install torch numpy transformers datasets tiktoken wandb tqdm
# Quick sanity check
"${VENV_PY}" -c "import torch; print('torch', torch.__version__); print('mps', torch.backends.mps.is_available())"


@ -0,0 +1,17 @@
#!/usr/bin/env bash
set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
require_git
mkdir -p "${WORKSPACE_ROOT}/oss_llm"
if [[ -d "${NANOGPT_DIR}/.git" ]]; then
echo "nanoGPT already cloned; updating…"
cd "${NANOGPT_DIR}"
git pull --ff-only
else
echo "Cloning nanoGPT into ${NANOGPT_DIR}"
rm -rf "${NANOGPT_DIR}"
git clone https://github.com/karpathy/nanoGPT.git "${NANOGPT_DIR}"
fi


@ -0,0 +1,20 @@
#!/usr/bin/env bash
set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
require_venv
require_curl
cd_nanogpt
# Prefer a github.com URL (some networks block raw.githubusercontent.com).
# This writes the input file where nanoGPT's prepare script expects it.
INPUT_TXT="data/shakespeare_char/input.txt"
if [[ ! -f "${INPUT_TXT}" ]]; then
mkdir -p "$(dirname "${INPUT_TXT}")"
echo "Downloading tiny Shakespeare to ${INPUT_TXT}" >&2
curl -fL --retry 3 --retry-delay 1 \
-o "${INPUT_TXT}" \
"https://github.com/karpathy/char-rnn/raw/master/data/tinyshakespeare/input.txt"
fi
"${VENV_PY}" data/shakespeare_char/prepare.py


@ -0,0 +1,16 @@
#!/usr/bin/env bash
set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
require_venv
cd_nanogpt
"${VENV_PY}" train.py config/train_shakespeare_char.py \
--device=cpu --compile=False --dtype=float32 \
--eval_interval=10 --eval_iters=10 --log_interval=10 \
--block_size=64 --batch_size=12 \
--n_layer=4 --n_head=4 --n_embd=128 \
--max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
--always_save_checkpoint=True
echo "Done. Checkpoint should exist at: ${NANOGPT_DIR}/out-shakespeare-char/ckpt.pt"


@ -0,0 +1,24 @@
#!/usr/bin/env bash
set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
require_venv
cd_nanogpt
# Requires torch with MPS support (Apple Silicon). If MPS isn't available,
# fall back to CPU.
DEVICE="mps"
if ! "${VENV_PY}" -c "import torch; import sys; sys.exit(0 if torch.backends.mps.is_available() else 1)"; then
echo "MPS not available; falling back to CPU" >&2
DEVICE="cpu"
fi
"${VENV_PY}" train.py config/train_shakespeare_char.py \
--device="${DEVICE}" --compile=False --dtype=float32 \
--eval_interval=10 --eval_iters=10 --log_interval=10 \
--block_size=64 --batch_size=12 \
--n_layer=4 --n_head=4 --n_embd=128 \
--max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
--always_save_checkpoint=True
echo "Done. Checkpoint should exist at: ${NANOGPT_DIR}/out-shakespeare-char/ckpt.pt"


@ -0,0 +1,14 @@
#!/usr/bin/env bash
set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
require_venv
cd_nanogpt
if [[ ! -f "out-shakespeare-char/ckpt.pt" ]]; then
echo "ERROR: ckpt.pt not found. Run training first:" >&2
echo " bash oss_llm/testNanoGPT/30_train_cpu_quick.sh" >&2
exit 1
fi
"${VENV_PY}" sample.py --out_dir=out-shakespeare-char --device=cpu --max_new_tokens=200


@ -0,0 +1,32 @@
#!/usr/bin/env bash
set -euo pipefail
# One-shot validation that prefers Apple Silicon MPS.
bash oss_llm/testNanoGPT/10_clone_nanogpt.sh
bash oss_llm/testNanoGPT/00_setup_env.sh
bash oss_llm/testNanoGPT/20_prepare_shakespeare.sh
# Train with MPS if available (script falls back to CPU)
bash oss_llm/testNanoGPT/31_train_mps_quick.sh
# Sample with MPS if available, else CPU
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
# shellcheck source=./_common.sh
source "${SCRIPT_DIR}/_common.sh"
require_venv
cd_nanogpt
DEVICE="mps"
if ! "${VENV_PY}" -c "import torch; import sys; sys.exit(0 if torch.backends.mps.is_available() else 1)"; then
DEVICE="cpu"
fi
if [[ ! -f "out-shakespeare-char/ckpt.pt" ]]; then
echo "ERROR: ckpt.pt not found after training" >&2
exit 1
fi
"${VENV_PY}" sample.py --out_dir=out-shakespeare-char --device="${DEVICE}" --dtype=float32 --max_new_tokens=200
printf '\nOK: nanoGPT MPS smoke test completed (device=%s).\n' "${DEVICE}"


@ -0,0 +1,11 @@
#!/usr/bin/env bash
set -euo pipefail
# One-shot end-to-end validation.
bash oss_llm/testNanoGPT/10_clone_nanogpt.sh
bash oss_llm/testNanoGPT/00_setup_env.sh
bash oss_llm/testNanoGPT/20_prepare_shakespeare.sh
bash oss_llm/testNanoGPT/30_train_cpu_quick.sh
bash oss_llm/testNanoGPT/40_sample_cpu.sh
printf '\nOK: nanoGPT smoke test completed.\n'


@ -0,0 +1,105 @@
# nanoGPT: use & test (local)
This folder contains runnable scripts to validate **nanoGPT** end-to-end in this workspace.
## Prereqs
- `git`, `python3`, and `curl` are available.
- GitHub is reachable (for cloning nanoGPT and downloading the tiny Shakespeare text).
## What the scripts do
- `00_setup_env.sh`: installs Python deps into the workspace venv.
- `10_clone_nanogpt.sh`: clones `karpathy/nanoGPT` into `oss_llm/nanoGPT` (or updates it).
- `20_prepare_shakespeare.sh`: downloads/prepares the tiny Shakespeare dataset.
- `30_train_cpu_quick.sh`: quick CPU training run (writes `out-shakespeare-char/ckpt.pt`).
- `31_train_mps_quick.sh`: quick MPS (Metal) training run (faster on Apple Silicon).
- `40_sample_cpu.sh`: samples from the trained checkpoint.
- `98_smoke_test_mps.sh`: runs clone → deps → prepare → train (MPS) → sample (MPS).
- `99_smoke_test_all.sh`: runs clone → deps → prepare → train → sample.
## What nanoGPT demonstrates (current capabilities)
With the default tiny Shakespeare **character-level** example, the checkpoint you train here supports:
- **Unconditional generation**: start from a newline and generate Shakespeare-ish character patterns.
- **Prompted continuation**: provide a short prompt (e.g. a phrase) and generate a continuation.
- **Sampling controls**:
- `--temperature` controls randomness (lower is more deterministic).
- `--top_k` clamps sampling to the top-K next-token candidates (lower is more conservative).
Reality check: with the default quick config (small model, short training), output will often look like _Shakespeare-shaped gibberish_. That's expected; the goal is validating the end-to-end training + sampling workflow.
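To make the two sampling controls concrete, here is a pure-Python sketch of temperature scaling followed by top-k filtering. It is not nanoGPT's code (which works on PyTorch tensors); the function name `sample_next_token` and its list-of-floats input are assumptions made for illustration.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample an index from `logits` after temperature scaling and top-k filtering."""
    if not logits:
        raise ValueError("logits must be non-empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [value / temperature for value in logits]
    # Rank candidate indices by score, best first.
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        # Keep only the top_k highest-scoring candidates (at least one).
        candidates = candidates[: max(1, top_k)]
    # Softmax weights over the survivors; subtract the max for numerical stability.
    peak = scaled[candidates[0]]
    weights = [math.exp(scaled[i] - peak) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [0.1, 5.0, 0.3, 2.0]
    print(sample_next_token(toy_logits, temperature=0.8, top_k=2))
```

With `top_k=1` the call always returns the highest-scoring index, which is the "most conservative" end of the range described above.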
### Reproduce the demo generations
From the workspace root:
1. Ensure you have a checkpoint (CPU):
```sh
bash oss_llm/testNanoGPT/99_smoke_test_all.sh
```
2. Unconditional samples (2 short samples):
```sh
cd oss_llm/nanoGPT
../../.venv/bin/python sample.py \
--out_dir=out-shakespeare-char --device=cpu --dtype=float32 \
--num_samples=2 --max_new_tokens=220 --temperature=0.8 --top_k=200
```
3. Prompted continuation (more conservative sampling):
```sh
cd oss_llm/nanoGPT
../../.venv/bin/python sample.py \
--out_dir=out-shakespeare-char --device=cpu --dtype=float32 \
--num_samples=1 --max_new_tokens=220 --temperature=0.4 --top_k=50 \
--start="To be, or not to be"
```
## One-command smoke test
From the workspace root:
```sh
bash oss_llm/testNanoGPT/99_smoke_test_all.sh
```
## One-command MPS smoke test
From the workspace root:
```sh
bash oss_llm/testNanoGPT/98_smoke_test_mps.sh
```
## Common commands
- Install deps only:
```sh
bash oss_llm/testNanoGPT/00_setup_env.sh
```
- Quick CPU train + sample:
```sh
bash oss_llm/testNanoGPT/30_train_cpu_quick.sh
bash oss_llm/testNanoGPT/40_sample_cpu.sh
```
- Quick MPS train + sample:
```sh
bash oss_llm/testNanoGPT/31_train_mps_quick.sh
bash oss_llm/testNanoGPT/40_sample_cpu.sh # sampling can still run on CPU
```
## Notes
- The CPU/MPS training scripts intentionally use a tiny model + small iteration count to finish quickly.
- If `ckpt.pt` is missing, it usually means training didn't run an eval step; these scripts set `--eval_interval` low to force checkpoint writes.
- Scripts will auto-create `./.venv` on first run if it does not exist.
- Dataset download is via a `github.com/.../raw/...` URL to avoid reliance on `raw.githubusercontent.com`.
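
The checkpoint note above can be illustrated with a small sketch of when a run would write `ckpt.pt`. It assumes, based on a reading of upstream `train.py` that is not verified here, that checkpoints are written only at evaluation steps after iteration 0, that the loop reaches `max_iters` inclusive, and that `--always_save_checkpoint=True` is set. The function name `checkpoint_iterations` is hypothetical.

```python
def checkpoint_iterations(max_iters: int, eval_interval: int) -> list[int]:
    """Iterations at which a checkpoint would be written under the stated assumptions."""
    if eval_interval <= 0:
        raise ValueError("eval_interval must be positive")
    # Iteration 0 evaluates but is assumed not to save.
    return [i for i in range(1, max_iters + 1) if i % eval_interval == 0]


if __name__ == "__main__":
    # Values used by the quick training scripts in this folder.
    print(checkpoint_iterations(max_iters=60, eval_interval=10))
    # A hypothetical interval larger than max_iters yields no checkpoint at all.
    print(checkpoint_iterations(max_iters=60, eval_interval=250))
```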


@ -0,0 +1,61 @@
#!/usr/bin/env bash
set -euo pipefail
# Resolve workspace root (this repo) regardless of where the script is called from.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
WORKSPACE_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
# Some managed macOS environments set TMPDIR to slow/unusual locations.
# Force a local temp directory for predictable behavior.
export TMPDIR="/tmp"
VENV_PY="${WORKSPACE_ROOT}/.venv/bin/python"
NANOGPT_DIR="${WORKSPACE_ROOT}/oss_llm/nanoGPT"
require_python3() {
if ! command -v python3 >/dev/null 2>&1; then
echo "ERROR: python3 not found in PATH" >&2
exit 1
fi
}
require_curl() {
if ! command -v curl >/dev/null 2>&1; then
echo "ERROR: curl not found in PATH" >&2
exit 1
fi
}
ensure_venv() {
if [[ -x "${VENV_PY}" ]]; then
return 0
fi
require_python3
echo "Creating workspace venv at: ${WORKSPACE_ROOT}/.venv" >&2
python3 -m venv "${WORKSPACE_ROOT}/.venv"
}
require_venv() {
ensure_venv
if [[ ! -x "${VENV_PY}" ]]; then
echo "ERROR: Failed to create venv python at: ${VENV_PY}" >&2
exit 1
fi
}
require_git() {
if ! command -v git >/dev/null 2>&1; then
echo "ERROR: git not found in PATH" >&2
exit 1
fi
}
cd_nanogpt() {
if [[ ! -d "${NANOGPT_DIR}" ]]; then
echo "ERROR: nanoGPT repo not found at ${NANOGPT_DIR}" >&2
echo "Run: bash oss_llm/testNanoGPT/10_clone_nanogpt.sh" >&2
exit 1
fi
cd "${NANOGPT_DIR}"
}