move oss_llm/ from learning_ai_2nd_brain
This commit is contained in:
parent
a32978f9c3
commit
1a4b3c1fb3
383
__LOCAL_LLMs/oss_llm/README.md
Normal file
383
__LOCAL_LLMs/oss_llm/README.md
Normal file
@ -0,0 +1,383 @@
|
||||
# Kimi “2.5” local deployment on macOS (what’s реально possible)
|
||||
|
||||
## Executive summary
|
||||
|
||||
- **Running the real “Kimi 2.5 / Kimi K2-class” model fully locally on a Mac is not practical**: the official open-weight Kimi K2 deployment guidance targets **multi-node NVIDIA GPU clusters** (vLLM/SGLang/TensorRT-LLM) and assumes CUDA.
|
||||
- **What _is_ practical on a Mac:** use **Kimi Code CLI** (official) or the **Moonshot API** (official). That’s not “local inference” (weights on your laptop), but it is “local usage” (client runs on your laptop).
|
||||
- **VPN / proxy:** yes, clients can usually work through VPN/proxy, but you must be able to reach the provider’s API endpoints. If your network blocks access, you’ll need allowlisting or a different route.
|
||||
|
||||
## What GitHub shows (official sources)
|
||||
|
||||
### 1) Kimi K2 (open-weight model series)
|
||||
|
||||
- Official repo: https://github.com/MoonshotAI/Kimi-K2
|
||||
- The repo includes a deployment guide at `docs/deploy_guidance.md`.
|
||||
- Key reality check from the guide: **the smallest FP8 128k-seqlen deployment is described as ~16 GPUs (H200/H20-class)** for mainstream setups (vLLM/SGLang). This is fundamentally not macOS-laptop friendly.
|
||||
|
||||
### 2) Kimi Code CLI (best option for macOS)
|
||||
|
||||
- Official repo: https://github.com/MoonshotAI/kimi-cli
|
||||
- Docs: https://moonshotai.github.io/kimi-cli/en/guides/getting-started.html
|
||||
- Kimi Code CLI is an **agent client** (terminal tool) that talks to a remote provider.
|
||||
- First-run authentication options:
|
||||
- `/login` (browser login; auto-configures models)
|
||||
- `/setup` (API key flow)
|
||||
|
||||
## Can we locally deploy “Kimi 2.5” on this Mac?
|
||||
|
||||
### Practically: no (for K2 / K2-class)
|
||||
|
||||
- **macOS has no CUDA**, so GPU-first inference stacks referenced for K2 deployment (vLLM / TensorRT-LLM) are not an option.
|
||||
- Even if you tried CPU inference, the K2-class model scale (MoE, 1T total params / 32B active) is far beyond what is reasonable on a laptop.
|
||||
|
||||
### What’s feasible instead
|
||||
|
||||
1. **Use Kimi Code CLI on macOS** (recommended)
|
||||
2. **Use Moonshot/Kimi APIs from your own scripts** (Python/Node/etc) once your network allows access
|
||||
3. If you truly need **local weights on Mac**, use a Mac-friendly model/runtime (e.g., MLX/Ollama) — but that would be **a different model**, not Kimi 2.5/K2.
|
||||
|
||||
## Options: smaller, Mac-runnable open models (local inference)
|
||||
|
||||
If your goal is “no network calls at runtime”, pick a **local runtime** + a **small-enough model + quantization**.
|
||||
|
||||
## If you can only access GitHub
|
||||
|
||||
If your enterprise network only allows `github.com`, that severely limits how you obtain model weights because most model hosting is **not** on GitHub.
|
||||
|
||||
What still works:
|
||||
|
||||
- **GitHub Releases assets**: some projects publish quantized model files (often `.gguf`) as release assets.
|
||||
- **Git LFS inside a repo**: occasionally weights are stored in-repo via LFS (still uncommon for large models).
|
||||
|
||||
What usually won’t work:
|
||||
|
||||
- Downloading weights from external model registries / hosting sites (blocked by policy).
|
||||
|
||||
Reality check: GitHub has practical size limits, so **very large models are rarely hosted there**. Your best bet is to use **small models (7B–14B) in quantized form**.
|
||||
|
||||
### How to find downloadable models on GitHub
|
||||
|
||||
- Use [oss_llm/find_models_on_github.py](oss_llm/find_models_on_github.py) to search GitHub repos and list any release assets that look like model files.
|
||||
- Prefer assets ending in `.gguf` if you plan to run with `llama.cpp`.
|
||||
|
||||
Example:
|
||||
|
||||
```sh
|
||||
python3 oss_llm/find_models_on_github.py --query "gguf qwen 2.5" --limit 20
|
||||
```
|
||||
|
||||
### Recommended enterprise pattern
|
||||
|
||||
If you need a specific model but can’t download it directly:
|
||||
|
||||
1. Ask security for an **approved internal mirror** / artifact store.
|
||||
2. Mirror the model files there.
|
||||
3. Point your local runtime (Ollama/llama.cpp/MLX) at the internal location.
|
||||
|
||||
### “No security review” learning path (best effort): train a tiny model yourself
|
||||
|
||||
If you want to avoid downloading any third-party model weights at all, the cleanest option is to:
|
||||
|
||||
- clone training code from GitHub
|
||||
- train a small model on a local/public-domain text file
|
||||
|
||||
This won’t produce a state-of-the-art assistant, but it’s excellent for learning tokenization, training loops, sampling, and basic eval.
|
||||
|
||||
One popular repo for this is **karpathy/nanoGPT**.
|
||||
|
||||
High-level steps:
|
||||
|
||||
```sh
|
||||
git clone https://github.com/karpathy/nanoGPT.git
|
||||
cd nanoGPT
|
||||
|
||||
# create a python env (choose your preferred method)
|
||||
python3 -m venv .venv && source .venv/bin/activate
|
||||
pip install -r requirements.txt
|
||||
|
||||
# run the built-in Shakespeare example (it downloads a small text file)
|
||||
python data/shakespeare_char/prepare.py
|
||||
python train.py data/shakespeare_char --device=cpu --compile=False
|
||||
|
||||
# sample
|
||||
python sample.py --out_dir=out-shakespeare-char --device=cpu --compile=False
|
||||
```
|
||||
|
||||
If even small downloads are restricted, replace the dataset step with your own local text file and adjust the dataset script accordingly.
|
||||
|
||||
### nanoGPT demo (this repo)
|
||||
|
||||
This workspace includes a reproducible nanoGPT workflow (CPU and Apple Silicon MPS) plus sampling demo commands:
|
||||
|
||||
- See [oss_llm/testNanoGPT/README.md](oss_llm/testNanoGPT/README.md)
|
||||
|
||||
### If you do download GGUF weights from GitHub
|
||||
|
||||
It can work (some repos commit a `.gguf` directly), but treat it like any third-party binary:
|
||||
|
||||
- Prefer repos with **clear licensing** (repo license + explicit model license/provenance)
|
||||
- Prefer “original publisher” repos over re-uploads
|
||||
- Keep models small (e.g., ~100M–3B) for macOS learning
|
||||
|
||||
### Recommended runtimes (macOS)
|
||||
|
||||
- **Ollama**: easiest “download + chat + local HTTP API” experience.
|
||||
- **LM Studio**: easy GUI; also exposes a local API server.
|
||||
- **llama.cpp**: most portable; great for CPU/Metal, quantized GGUF models.
|
||||
- **MLX** (Apple): best when you want Python-native workflows on Apple Silicon.
|
||||
|
||||
### Model size guidance (rule of thumb)
|
||||
|
||||
Quantized models are what make laptops viable.
|
||||
|
||||
- **8–10B @ 4-bit**: typically comfortable on 16GB unified memory.
|
||||
- **14B @ 4-bit**: better with 24–32GB unified memory.
|
||||
- **30B+**: usually needs 64GB+ and will still be slow.
|
||||
|
||||
### Good “starter” model families (pick one)
|
||||
|
||||
These are widely supported by the runtimes above and have strong general utility:
|
||||
|
||||
- **Llama 3.x (8B class)**: strong general chat + coding for the size.
|
||||
- **Qwen 2.5 (7B/14B class)**: strong multilingual + coding.
|
||||
- **Mistral 7B class**: fast and solid baseline.
|
||||
- **Gemma 2 (9B class)**: good general-purpose quality.
|
||||
- **Phi-3.x (mini/small class)**: very fast and lightweight.
|
||||
|
||||
### Suggested picks by Mac memory
|
||||
|
||||
- **16GB unified memory**: start with an 8–9B model at 4-bit.
|
||||
- **32GB unified memory**: 14B at 4-bit is a good sweet spot.
|
||||
- **64GB unified memory**: 27–34B at 4-bit becomes feasible (still slower).
|
||||
|
||||
### Practical setup: Ollama (quickest)
|
||||
|
||||
1. Install runtime (choose the install method your enterprise allows; Homebrew is common):
|
||||
|
||||
```sh
|
||||
brew install ollama
|
||||
```
|
||||
|
||||
2. Start the local service:
|
||||
|
||||
```sh
|
||||
ollama serve
|
||||
```
|
||||
|
||||
3. Pull/run a model (example placeholder):
|
||||
|
||||
```sh
|
||||
ollama run <model-name>
|
||||
```
|
||||
|
||||
4. Use the local API (optional):
|
||||
|
||||
```sh
|
||||
curl http://127.0.0.1:11434/api/tags
|
||||
```
|
||||
|
||||
### Practical setup: llama.cpp (most controllable)
|
||||
|
||||
If you can obtain a quantized GGUF model file via an approved internal mirror:
|
||||
|
||||
```sh
|
||||
brew install llama.cpp
|
||||
llama-cli -m /path/to/model.gguf -p "Hello" -n 256
|
||||
```
|
||||
|
||||
### Practical setup: MLX (Python-centric on Apple Silicon)
|
||||
|
||||
If your environment allows Python packages and you have an MLX-converted model available internally:
|
||||
|
||||
```sh
|
||||
python3 -m venv .venv && source .venv/bin/activate
|
||||
pip install mlx mlx-lm
|
||||
python -m mlx_lm.generate --model /path/to/mlx-model --prompt "Hello" --max-tokens 256
|
||||
```
|
||||
|
||||
### Enterprise note (important)
|
||||
|
||||
Because model hosting sites may be blocked in your network category, the usual pattern in enterprise is:
|
||||
|
||||
1. Security-approved model list
|
||||
2. Internal artifact store / mirror for model files
|
||||
3. Local runtime (Ollama/llama.cpp/MLX) pointing to those internally hosted artifacts
|
||||
|
||||
## Steps (macOS): run Kimi Code CLI locally (client-side)
|
||||
|
||||
Source: Kimi CLI docs.
|
||||
|
||||
1. Install
|
||||
|
||||
```sh
|
||||
# Install via uv (Python package manager)
|
||||
uv tool install --python 3.13 kimi-cli
|
||||
```
|
||||
|
||||
2. Verify
|
||||
|
||||
```sh
|
||||
kimi --version
|
||||
```
|
||||
|
||||
3. Start in a project directory
|
||||
|
||||
```sh
|
||||
cd /Users/sd9235/code/mygh/learning_ai_2nd_brain
|
||||
kimi
|
||||
```
|
||||
|
||||
4. Authenticate
|
||||
|
||||
- Preferred:
|
||||
- Run `/login` inside the CLI and complete the browser auth.
|
||||
- Alternative:
|
||||
- Run `/setup` and choose an API platform + API key + model.
|
||||
|
||||
5. If models don’t show up
|
||||
|
||||
- Kimi CLI FAQ: verify network access to your configured provider’s API endpoints.
|
||||
|
||||
## Steps (NOT macOS): deploy Kimi K2 weights (server-side)
|
||||
|
||||
If you have access to Linux + NVIDIA GPUs, use the official K2 deployment guide:
|
||||
|
||||
- vLLM (requires CUDA; the guide notes vLLM v0.10.0rc1+)
|
||||
- SGLang
|
||||
- TensorRT-LLM
|
||||
|
||||
This is the realistic path if you truly need “local” (self-hosted) K2/K2-class inference: **run the model on a GPU box/cluster** and call it from your Mac.
|
||||
|
||||
## VPN / proxy: are we able to access through it?
|
||||
|
||||
### What you need
|
||||
|
||||
You generally need outbound access (through your VPN/proxy) to at least:
|
||||
|
||||
- your chosen provider’s API host (varies by provider)
|
||||
- and, if downloading open weights, the model hosting site you plan to use.
|
||||
|
||||
### Quick connectivity checks
|
||||
|
||||
```sh
|
||||
# DNS + HTTPS reachability
|
||||
curl -I https://<YOUR_PROVIDER_API_HOST>
|
||||
|
||||
# If you plan to download weights later
|
||||
curl -I https://<YOUR_MODEL_HOSTING_SITE>
|
||||
```
|
||||
|
||||
### Configure proxy in a shell (typical)
|
||||
|
||||
```sh
|
||||
export HTTP_PROXY="http://127.0.0.1:7890"
|
||||
export HTTPS_PROXY="http://127.0.0.1:7890"
|
||||
export ALL_PROXY="socks5://127.0.0.1:7890"
|
||||
export NO_PROXY="localhost,127.0.0.1"
|
||||
```
|
||||
|
||||
### Configure proxy for Git
|
||||
|
||||
```sh
|
||||
git config --global http.proxy "$HTTPS_PROXY"
|
||||
git config --global https.proxy "$HTTPS_PROXY"
|
||||
```
|
||||
|
||||
### Configure proxy for Python/pip
|
||||
|
||||
```sh
|
||||
pip config set global.proxy "$HTTPS_PROXY"
|
||||
```
|
||||
|
||||
### Notes about your current network
|
||||
|
||||
From within this VS Code environment, requests to some provider/model-hosting sites were redirected to a corporate/web-filter “blockpage” URL. If you see that on your Mac too, you’ll need one of:
|
||||
|
||||
- VPN that routes around the filter
|
||||
- proxy that’s allowed
|
||||
- allowlist/exception for those domains
|
||||
|
||||
## Recommendation
|
||||
|
||||
- If your goal is to _use_ Kimi on this Mac: **install Kimi Code CLI** and make sure your VPN/proxy allows access to your configured provider.
|
||||
- If your goal is “true local inference”: **host Kimi K2 on a CUDA GPU server** (or use a smaller Mac-native model instead).
|
||||
|
||||
## Your Mac (detected)
|
||||
|
||||
- macOS: 15.7.3 (24G419)
|
||||
- CPU arch: arm64
|
||||
- Machine: MacBook Pro (Mac16,7)
|
||||
- Chip: Apple M4 Pro (14 cores)
|
||||
- Memory: 48 GB
|
||||
- Python: 3.13.10
|
||||
|
||||
## Will nanoGPT work on this laptop?
|
||||
|
||||
Yes for learning, with a couple of caveats.
|
||||
|
||||
What will work well:
|
||||
|
||||
- **Small CPU/MPS runs** (toy datasets like Shakespeare, short experiments, sampling).
|
||||
- With 48GB RAM and Apple Silicon, you have plenty of headroom for nanoGPT-style demos.
|
||||
|
||||
What might block you:
|
||||
|
||||
- Installing dependencies (notably PyTorch) typically requires access to package indexes that may be blocked in your network.
|
||||
- In this workspace, PyTorch was successfully installed into the venv and `mps_available` is `True`.
|
||||
- If you can’t reach package indexes in a different environment, use an internal Python package mirror, or install from a pre-approved wheelhouse.
|
||||
|
||||
Practical suggestion:
|
||||
|
||||
- Start with CPU (`--device=cpu`) to keep it simple.
|
||||
- If your PyTorch build supports Apple Metal (MPS), you can later try `--device=mps` for speed.
|
||||
|
||||
Quick verification (after installing torch):
|
||||
|
||||
```sh
|
||||
python3 -c "import torch; print(torch.__version__); print('mps', torch.backends.mps.is_available())"
|
||||
```
|
||||
|
||||
## nanoGPT: validated end-to-end in this workspace
|
||||
|
||||
This is a minimal, fast run that was verified on this machine.
|
||||
|
||||
### 1) Install deps (workspace venv)
|
||||
|
||||
```sh
|
||||
# from the workspace root
|
||||
/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python -m pip install torch numpy transformers datasets tiktoken wandb tqdm
|
||||
```
|
||||
|
||||
### 2) Clone nanoGPT
|
||||
|
||||
```sh
|
||||
git clone https://github.com/karpathy/nanoGPT.git oss_llm/nanoGPT
|
||||
```
|
||||
|
||||
### 3) Prepare dataset
|
||||
|
||||
```sh
|
||||
cd oss_llm/nanoGPT
|
||||
/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python data/shakespeare_char/prepare.py
|
||||
```
|
||||
|
||||
### 4) Short CPU training (writes `out-shakespeare-char/ckpt.pt`)
|
||||
|
||||
```sh
|
||||
/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python train.py config/train_shakespeare_char.py \
|
||||
--device=cpu --compile=False \
|
||||
--eval_interval=10 --eval_iters=10 --log_interval=10 \
|
||||
--block_size=64 --batch_size=12 \
|
||||
--n_layer=4 --n_head=4 --n_embd=128 \
|
||||
--max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
|
||||
--always_save_checkpoint=True
|
||||
```
|
||||
|
||||
### 5) Sample
|
||||
|
||||
```sh
|
||||
/Users/sd9235/code/mygh/learning_ai_2nd_brain/.venv/bin/python sample.py \
|
||||
--out_dir=out-shakespeare-char --device=cpu --max_new_tokens=200
|
||||
```
|
||||
|
||||
Tip: for speed on Apple Silicon, try `--device=mps` once you’re comfortable.
|
||||
251
__LOCAL_LLMs/oss_llm/find_models_on_github.py
Normal file
251
__LOCAL_LLMs/oss_llm/find_models_on_github.py
Normal file
@ -0,0 +1,251 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Find model-like files hosted on GitHub.
|
||||
|
||||
This script ONLY talks to GitHub (api.github.com) and is designed for
|
||||
restricted enterprise networks where only GitHub is reachable.
|
||||
|
||||
It searches repositories by keyword, then inspects recent releases to find
|
||||
assets that look like model files (e.g., .gguf, .safetensors, .bin, .zip).
|
||||
|
||||
Usage:
|
||||
python oss_llm/find_models_on_github.py --query "qwen2.5 gguf" --limit 10
|
||||
|
||||
Optional:
|
||||
- Set GITHUB_TOKEN to increase API rate limits.
|
||||
|
||||
Notes:
|
||||
- GitHub rarely hosts very large model weights due to size constraints.
|
||||
- Prefer smaller models and quantized artifacts.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import os
|
||||
import sys
|
||||
import textwrap
|
||||
import urllib.parse
|
||||
import json
|
||||
import subprocess
|
||||
|
||||
MODEL_EXTENSIONS = (
|
||||
".gguf",
|
||||
".safetensors",
|
||||
".bin",
|
||||
".pt",
|
||||
".pth",
|
||||
".onnx",
|
||||
".zip",
|
||||
".tar.gz",
|
||||
".tgz",
|
||||
)
|
||||
|
||||
|
||||
def _http_get_json(url: str) -> object:
|
||||
token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
|
||||
|
||||
cmd = [
|
||||
"curl",
|
||||
"-L",
|
||||
"--fail",
|
||||
"--silent",
|
||||
"--show-error",
|
||||
"-H",
|
||||
"Accept: application/vnd.github+json",
|
||||
]
|
||||
if token:
|
||||
cmd += ["-H", f"Authorization: Bearer {token}"]
|
||||
cmd.append(url)
|
||||
|
||||
proc = subprocess.run(cmd, capture_output=True, text=True)
|
||||
if proc.returncode != 0:
|
||||
raise RuntimeError(proc.stderr.strip() or f"curl failed with exit code {proc.returncode}")
|
||||
return json.loads(proc.stdout)
|
||||
|
||||
|
||||
def _looks_like_model_asset(name: str) -> bool:
|
||||
lower = name.lower()
|
||||
return any(lower.endswith(ext) for ext in MODEL_EXTENSIONS)
|
||||
|
||||
|
||||
def _safe_int(value: object) -> int | None:
|
||||
try:
|
||||
return int(value) # type: ignore[arg-type]
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def _scan_repo_tree_for_models(full_name: str, default_branch: str) -> list[str]:
|
||||
"""Return matching file paths from the repo tree.
|
||||
|
||||
This is useful when model files are stored in-repo (often via Git LFS),
|
||||
and therefore don't show up as release assets.
|
||||
"""
|
||||
|
||||
# The Trees API can be large; callers should keep repo counts low.
|
||||
url = f"https://api.github.com/repos/{full_name}/git/trees/{urllib.parse.quote(default_branch)}?recursive=1"
|
||||
data = _http_get_json(url)
|
||||
if not isinstance(data, dict):
|
||||
return []
|
||||
tree = data.get("tree")
|
||||
if not isinstance(tree, list):
|
||||
return []
|
||||
|
||||
hits: list[str] = []
|
||||
for node in tree:
|
||||
if not isinstance(node, dict):
|
||||
continue
|
||||
path = node.get("path")
|
||||
ntype = node.get("type")
|
||||
if ntype != "blob" or not isinstance(path, str):
|
||||
continue
|
||||
if _looks_like_model_asset(path):
|
||||
hits.append(path)
|
||||
return hits
|
||||
|
||||
|
||||
def _human_size(num_bytes: int) -> str:
|
||||
n = float(num_bytes)
|
||||
for unit in ("B", "KB", "MB", "GB", "TB"):
|
||||
if n < 1024.0:
|
||||
return f"{n:.1f}{unit}"
|
||||
n /= 1024.0
|
||||
return f"{n:.1f}PB"
|
||||
|
||||
|
||||
def main() -> int:
|
||||
p = argparse.ArgumentParser(
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
description="Search GitHub releases for model-like assets.",
|
||||
epilog=textwrap.dedent(
|
||||
"""
|
||||
Examples:
|
||||
python oss_llm/find_models_on_github.py --query "llama gguf" --limit 20
|
||||
python oss_llm/find_models_on_github.py --query "qwen 2.5 gguf" --limit 10
|
||||
|
||||
Tips:
|
||||
- Add qualifiers to narrow results, e.g. `language:python`, `in:name`, `topic:llama-cpp`.
|
||||
- Set GITHUB_TOKEN to avoid low unauthenticated rate limits.
|
||||
"""
|
||||
),
|
||||
)
|
||||
p.add_argument("--query", required=True, help="GitHub search query")
|
||||
p.add_argument("--limit", type=int, default=10, help="Max repositories to inspect")
|
||||
p.add_argument(
|
||||
"--per-page",
|
||||
type=int,
|
||||
default=10,
|
||||
help="Repos per page from search API (max 100)",
|
||||
)
|
||||
p.add_argument(
|
||||
"--max-releases",
|
||||
type=int,
|
||||
default=5,
|
||||
help="Max recent releases per repo to inspect",
|
||||
)
|
||||
p.add_argument(
|
||||
"--scan-tree",
|
||||
action="store_true",
|
||||
help="Also scan repository file trees for model-like files (slower, more API calls)",
|
||||
)
|
||||
args = p.parse_args()
|
||||
|
||||
query = args.query.strip()
|
||||
if not query:
|
||||
print("--query must be non-empty", file=sys.stderr)
|
||||
return 2
|
||||
|
||||
per_page = max(1, min(100, args.per_page))
|
||||
limit = max(1, args.limit)
|
||||
|
||||
search_q = urllib.parse.quote(query)
|
||||
search_url = f"https://api.github.com/search/repositories?q={search_q}&per_page={per_page}"
|
||||
|
||||
try:
|
||||
search = _http_get_json(search_url)
|
||||
except Exception as e:
|
||||
print(f"GitHub search failed: {e}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
items = search.get("items") if isinstance(search, dict) else None
|
||||
if not items:
|
||||
print("No repositories found.")
|
||||
return 0
|
||||
|
||||
inspected = 0
|
||||
found_any = False
|
||||
|
||||
for repo in items:
|
||||
if inspected >= limit:
|
||||
break
|
||||
|
||||
full_name = repo.get("full_name")
|
||||
html_url = repo.get("html_url")
|
||||
default_branch = repo.get("default_branch")
|
||||
if not full_name or not html_url:
|
||||
continue
|
||||
|
||||
inspected += 1
|
||||
print(f"\n== {full_name} ==")
|
||||
print(html_url)
|
||||
|
||||
releases_url = f"https://api.github.com/repos/{full_name}/releases?per_page={args.max_releases}"
|
||||
try:
|
||||
releases = _http_get_json(releases_url)
|
||||
except Exception as e:
|
||||
print(f" releases: error: {e}")
|
||||
continue
|
||||
|
||||
if not isinstance(releases, list) or len(releases) == 0:
|
||||
print(" releases: none")
|
||||
releases = []
|
||||
|
||||
repo_hit = False
|
||||
for rel in releases[: args.max_releases]:
|
||||
tag = rel.get("tag_name") or "(no tag)"
|
||||
name = rel.get("name") or "(no name)"
|
||||
assets = rel.get("assets") or []
|
||||
model_assets = [a for a in assets if _looks_like_model_asset(a.get("name", ""))]
|
||||
|
||||
if not model_assets:
|
||||
continue
|
||||
|
||||
repo_hit = True
|
||||
found_any = True
|
||||
print(f" release: {tag} — {name}")
|
||||
for a in model_assets:
|
||||
aname = a.get("name") or "(no name)"
|
||||
size = a.get("size")
|
||||
url = a.get("browser_download_url") or ""
|
||||
size_str = _human_size(int(size)) if isinstance(size, int) else "?"
|
||||
print(f" - {aname} ({size_str})")
|
||||
if url:
|
||||
print(f" {url}")
|
||||
|
||||
if releases and not repo_hit:
|
||||
print(" releases: present, but no model-like assets found")
|
||||
|
||||
if args.scan_tree and isinstance(default_branch, str) and default_branch:
|
||||
try:
|
||||
paths = _scan_repo_tree_for_models(full_name, default_branch)
|
||||
except Exception as e:
|
||||
print(f" repo files: error: {e}")
|
||||
continue
|
||||
|
||||
if paths:
|
||||
found_any = True
|
||||
print(f" repo files: found {len(paths)} model-like paths")
|
||||
for pth in paths[:30]:
|
||||
print(f" - {pth}")
|
||||
if len(paths) > 30:
|
||||
print(" - (more omitted)")
|
||||
else:
|
||||
print(" repo files: no model-like paths found")
|
||||
|
||||
if not found_any:
|
||||
print("\nNo model-like release assets found in inspected repositories.")
|
||||
print("Try refining your query (add 'gguf', model family name, or 'quant').")
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(main())
|
||||
25
__LOCAL_LLMs/oss_llm/quick_checks.sh
Normal file
25
__LOCAL_LLMs/oss_llm/quick_checks.sh
Normal file
@ -0,0 +1,25 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
echo "== Network reachability =="
|
||||
|
||||
: "${CHECK_URLS:=}"
|
||||
|
||||
if [[ -z "${CHECK_URLS}" ]]; then
|
||||
cat <<'TXT'
|
||||
No URLs configured.
|
||||
|
||||
Set CHECK_URLS to a space-separated list of HTTPS URLs you want to test, e.g.:
|
||||
CHECK_URLS="https://github.com https://example.com" ./oss_llm/quick_checks.sh
|
||||
|
||||
This script intentionally does not hardcode any provider endpoints.
|
||||
TXT
|
||||
else
|
||||
for url in ${CHECK_URLS}; do
|
||||
echo "\n--- $url"
|
||||
curl -I --max-time 10 "$url" | head -n 5 || true
|
||||
done
|
||||
fi
|
||||
|
||||
echo "\n== Proxy env vars (current) =="
|
||||
env | grep -E '^(HTTP|HTTPS|ALL|NO)_PROXY=' || echo "(none set)"
|
||||
12
__LOCAL_LLMs/oss_llm/testNanoGPT/00_setup_env.sh
Executable file
12
__LOCAL_LLMs/oss_llm/testNanoGPT/00_setup_env.sh
Executable file
@ -0,0 +1,12 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
|
||||
require_venv
|
||||
|
||||
# nanoGPT README lists these deps.
|
||||
"${VENV_PY}" -m pip install --upgrade pip
|
||||
"${VENV_PY}" -m pip install torch numpy transformers datasets tiktoken wandb tqdm
|
||||
|
||||
# Quick sanity check
|
||||
"${VENV_PY}" -c "import torch; print('torch', torch.__version__); print('mps', torch.backends.mps.is_available())"
|
||||
17
__LOCAL_LLMs/oss_llm/testNanoGPT/10_clone_nanogpt.sh
Executable file
17
__LOCAL_LLMs/oss_llm/testNanoGPT/10_clone_nanogpt.sh
Executable file
@ -0,0 +1,17 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
|
||||
require_git
|
||||
|
||||
mkdir -p "${WORKSPACE_ROOT}/oss_llm"
|
||||
|
||||
if [[ -d "${NANOGPT_DIR}/.git" ]]; then
|
||||
echo "nanoGPT already cloned; updating…"
|
||||
cd "${NANOGPT_DIR}"
|
||||
git pull --ff-only
|
||||
else
|
||||
echo "Cloning nanoGPT into ${NANOGPT_DIR}…"
|
||||
rm -rf "${NANOGPT_DIR}"
|
||||
git clone https://github.com/karpathy/nanoGPT.git "${NANOGPT_DIR}"
|
||||
fi
|
||||
20
__LOCAL_LLMs/oss_llm/testNanoGPT/20_prepare_shakespeare.sh
Executable file
20
__LOCAL_LLMs/oss_llm/testNanoGPT/20_prepare_shakespeare.sh
Executable file
@ -0,0 +1,20 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
|
||||
require_venv
|
||||
require_curl
|
||||
cd_nanogpt
|
||||
|
||||
# Prefer a github.com URL (some networks block raw.githubusercontent.com).
|
||||
# This writes the input file where nanoGPT's prepare script expects it.
|
||||
INPUT_TXT="data/shakespeare_char/input.txt"
|
||||
if [[ ! -f "${INPUT_TXT}" ]]; then
|
||||
mkdir -p "$(dirname "${INPUT_TXT}")"
|
||||
echo "Downloading tiny Shakespeare to ${INPUT_TXT}" >&2
|
||||
curl -fL --retry 3 --retry-delay 1 \
|
||||
-o "${INPUT_TXT}" \
|
||||
"https://github.com/karpathy/char-rnn/raw/master/data/tinyshakespeare/input.txt"
|
||||
fi
|
||||
|
||||
"${VENV_PY}" data/shakespeare_char/prepare.py
|
||||
16
__LOCAL_LLMs/oss_llm/testNanoGPT/30_train_cpu_quick.sh
Executable file
16
__LOCAL_LLMs/oss_llm/testNanoGPT/30_train_cpu_quick.sh
Executable file
@ -0,0 +1,16 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
|
||||
require_venv
|
||||
cd_nanogpt
|
||||
|
||||
"${VENV_PY}" train.py config/train_shakespeare_char.py \
|
||||
--device=cpu --compile=False --dtype=float32 \
|
||||
--eval_interval=10 --eval_iters=10 --log_interval=10 \
|
||||
--block_size=64 --batch_size=12 \
|
||||
--n_layer=4 --n_head=4 --n_embd=128 \
|
||||
--max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
|
||||
--always_save_checkpoint=True
|
||||
|
||||
echo "Done. Checkpoint should exist at: ${NANOGPT_DIR}/out-shakespeare-char/ckpt.pt"
|
||||
24
__LOCAL_LLMs/oss_llm/testNanoGPT/31_train_mps_quick.sh
Executable file
24
__LOCAL_LLMs/oss_llm/testNanoGPT/31_train_mps_quick.sh
Executable file
@ -0,0 +1,24 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
|
||||
require_venv
|
||||
cd_nanogpt
|
||||
|
||||
# Requires torch with MPS support (Apple Silicon). If MPS isn't available,
|
||||
# fall back to CPU.
|
||||
DEVICE="mps"
|
||||
if ! "${VENV_PY}" -c "import torch; import sys; sys.exit(0 if torch.backends.mps.is_available() else 1)"; then
|
||||
echo "MPS not available; falling back to CPU" >&2
|
||||
DEVICE="cpu"
|
||||
fi
|
||||
|
||||
"${VENV_PY}" train.py config/train_shakespeare_char.py \
|
||||
--device="${DEVICE}" --compile=False --dtype=float32 \
|
||||
--eval_interval=10 --eval_iters=10 --log_interval=10 \
|
||||
--block_size=64 --batch_size=12 \
|
||||
--n_layer=4 --n_head=4 --n_embd=128 \
|
||||
--max_iters=60 --lr_decay_iters=60 --dropout=0.0 \
|
||||
--always_save_checkpoint=True
|
||||
|
||||
echo "Done. Checkpoint should exist at: ${NANOGPT_DIR}/out-shakespeare-char/ckpt.pt"
|
||||
14
__LOCAL_LLMs/oss_llm/testNanoGPT/40_sample_cpu.sh
Executable file
14
__LOCAL_LLMs/oss_llm/testNanoGPT/40_sample_cpu.sh
Executable file
@ -0,0 +1,14 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
source "$(cd "$(dirname "$0")" && pwd)/_common.sh"
|
||||
require_venv
|
||||
cd_nanogpt
|
||||
|
||||
if [[ ! -f "out-shakespeare-char/ckpt.pt" ]]; then
|
||||
echo "ERROR: ckpt.pt not found. Run training first:" >&2
|
||||
echo " bash oss_llm/testNanoGPT/30_train_cpu_quick.sh" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
"${VENV_PY}" sample.py --out_dir=out-shakespeare-char --device=cpu --max_new_tokens=200
|
||||
32
__LOCAL_LLMs/oss_llm/testNanoGPT/98_smoke_test_mps.sh
Executable file
32
__LOCAL_LLMs/oss_llm/testNanoGPT/98_smoke_test_mps.sh
Executable file
@ -0,0 +1,32 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
# One-shot validation that prefers Apple Silicon MPS.
|
||||
|
||||
bash oss_llm/testNanoGPT/10_clone_nanogpt.sh
|
||||
bash oss_llm/testNanoGPT/00_setup_env.sh
|
||||
bash oss_llm/testNanoGPT/20_prepare_shakespeare.sh
|
||||
|
||||
# Train with MPS if available (script falls back to CPU)
|
||||
bash oss_llm/testNanoGPT/31_train_mps_quick.sh
|
||||
|
||||
# Sample with MPS if available, else CPU
|
||||
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
|
||||
# shellcheck source=./_common.sh
|
||||
source "${SCRIPT_DIR}/_common.sh"
|
||||
require_venv
|
||||
cd_nanogpt
|
||||
|
||||
DEVICE="mps"
|
||||
if ! "${VENV_PY}" -c "import torch; import sys; sys.exit(0 if torch.backends.mps.is_available() else 1)"; then
|
||||
DEVICE="cpu"
|
||||
fi
|
||||
|
||||
if [[ ! -f "out-shakespeare-char/ckpt.pt" ]]; then
|
||||
echo "ERROR: ckpt.pt not found after training" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
"${VENV_PY}" sample.py --out_dir=out-shakespeare-char --device="${DEVICE}" --dtype=float32 --max_new_tokens=200
|
||||
|
||||
printf '\nOK: nanoGPT MPS smoke test completed (device=%s).\n' "${DEVICE}"
|
||||
11
__LOCAL_LLMs/oss_llm/testNanoGPT/99_smoke_test_all.sh
Executable file
11
__LOCAL_LLMs/oss_llm/testNanoGPT/99_smoke_test_all.sh
Executable file
@ -0,0 +1,11 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
# One-shot end-to-end validation.
|
||||
bash oss_llm/testNanoGPT/10_clone_nanogpt.sh
|
||||
bash oss_llm/testNanoGPT/00_setup_env.sh
|
||||
bash oss_llm/testNanoGPT/20_prepare_shakespeare.sh
|
||||
bash oss_llm/testNanoGPT/30_train_cpu_quick.sh
|
||||
bash oss_llm/testNanoGPT/40_sample_cpu.sh
|
||||
|
||||
echo "\nOK: nanoGPT smoke test completed."
|
||||
105
__LOCAL_LLMs/oss_llm/testNanoGPT/README.md
Normal file
105
__LOCAL_LLMs/oss_llm/testNanoGPT/README.md
Normal file
@ -0,0 +1,105 @@
|
||||
# nanoGPT: use & test (local)
|
||||
|
||||
This folder contains runnable scripts to validate **nanoGPT** end-to-end in this workspace.
|
||||
|
||||
## Prereqs
|
||||
|
||||
- `git`, `python3`, and `curl` are available.
|
||||
- GitHub is reachable (for cloning nanoGPT and downloading the tiny Shakespeare text).
|
||||
|
||||
## What the scripts do
|
||||
|
||||
- `00_setup_env.sh`: installs Python deps into the workspace venv.
|
||||
- `10_clone_nanogpt.sh`: clones `karpathy/nanoGPT` into `oss_llm/nanoGPT` (or updates it).
|
||||
- `20_prepare_shakespeare.sh`: downloads/prepares the tiny Shakespeare dataset.
|
||||
- `30_train_cpu_quick.sh`: quick CPU training run (writes `out-shakespeare-char/ckpt.pt`).
|
||||
- `31_train_mps_quick.sh`: quick MPS (Metal) training run (faster on Apple Silicon).
|
||||
- `40_sample_cpu.sh`: samples from the trained checkpoint.
|
||||
- `98_smoke_test_mps.sh`: runs clone → deps → prepare → train (MPS) → sample (MPS).
|
||||
- `99_smoke_test_all.sh`: runs clone → deps → prepare → train → sample.
|
||||
|
||||
## What nanoGPT demonstrates (current capabilities)
|
||||
|
||||
With the default tiny Shakespeare **character-level** example, the checkpoint you train here supports:
|
||||
|
||||
- **Unconditional generation**: start from a newline and generate Shakespeare-ish character patterns.
|
||||
- **Prompted continuation**: provide a short prompt (e.g. a phrase) and generate a continuation.
|
||||
- **Sampling controls**:
|
||||
- `--temperature` controls randomness (lower is more deterministic).
|
||||
- `--top_k` clamps sampling to the top-K next-token candidates (lower is more conservative).
|
||||
|
||||
Reality check: with the default quick config (small model, short training), output will often look like _Shakespeare-shaped gibberish_. That’s expected; the goal is validating the end-to-end training + sampling workflow.
|
||||
|
||||
### Reproduce the demo generations
|
||||
|
||||
From the workspace root:
|
||||
|
||||
1. Ensure you have a checkpoint (CPU):
|
||||
|
||||
```sh
|
||||
bash oss_llm/testNanoGPT/99_smoke_test_all.sh
|
||||
```
|
||||
|
||||
2. Unconditional samples (2 short samples):
|
||||
|
||||
```sh
|
||||
cd oss_llm/nanoGPT
|
||||
./../.venv/bin/python sample.py \
|
||||
--out_dir=out-shakespeare-char --device=cpu --dtype=float32 \
|
||||
--num_samples=2 --max_new_tokens=220 --temperature=0.8 --top_k=200
|
||||
```
|
||||
|
||||
3. Prompted continuation (more conservative sampling):
|
||||
|
||||
```sh
|
||||
cd oss_llm/nanoGPT
|
||||
./../.venv/bin/python sample.py \
|
||||
--out_dir=out-shakespeare-char --device=cpu --dtype=float32 \
|
||||
--num_samples=1 --max_new_tokens=220 --temperature=0.4 --top_k=50 \
|
||||
--start="To be, or not to be"
|
||||
```
|
||||
|
||||
## One-command smoke test
|
||||
|
||||
From the workspace root:
|
||||
|
||||
```sh
|
||||
bash oss_llm/testNanoGPT/99_smoke_test_all.sh
|
||||
```
|
||||
|
||||
## One-command MPS smoke test
|
||||
|
||||
From the workspace root:
|
||||
|
||||
```sh
|
||||
bash oss_llm/testNanoGPT/98_smoke_test_mps.sh
|
||||
```
|
||||
|
||||
## Common commands
|
||||
|
||||
- Install deps only:
|
||||
|
||||
```sh
|
||||
bash oss_llm/testNanoGPT/00_setup_env.sh
|
||||
```
|
||||
|
||||
- Quick CPU train + sample:
|
||||
|
||||
```sh
|
||||
bash oss_llm/testNanoGPT/30_train_cpu_quick.sh
|
||||
bash oss_llm/testNanoGPT/40_sample_cpu.sh
|
||||
```
|
||||
|
||||
- Quick MPS train + sample:
|
||||
|
||||
```sh
|
||||
bash oss_llm/testNanoGPT/31_train_mps_quick.sh
|
||||
bash oss_llm/testNanoGPT/40_sample_cpu.sh # sampling can still run on CPU
|
||||
```
|
||||
|
||||
## Notes
|
||||
|
||||
- The CPU/MPS training scripts intentionally use a tiny model + small iteration count to finish quickly.
|
||||
- If `ckpt.pt` is missing, it usually means training didn’t run an eval step; these scripts set `--eval_interval` low to force checkpoint writes.
|
||||
- Scripts will auto-create `./.venv` on first run if it does not exist.
|
||||
- Dataset download is via a `github.com/.../raw/...` URL to avoid reliance on `raw.githubusercontent.com`.
|
||||
61
__LOCAL_LLMs/oss_llm/testNanoGPT/_common.sh
Executable file
61
__LOCAL_LLMs/oss_llm/testNanoGPT/_common.sh
Executable file
@ -0,0 +1,61 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
# Resolve workspace root (this repo) regardless of where the script is called from.
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
WORKSPACE_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
|
||||
|
||||
# Some managed macOS environments set TMPDIR to slow/unusual locations.
|
||||
# Force a local temp directory for predictable behavior.
|
||||
export TMPDIR="/tmp"
|
||||
|
||||
VENV_PY="${WORKSPACE_ROOT}/.venv/bin/python"
|
||||
NANOGPT_DIR="${WORKSPACE_ROOT}/oss_llm/nanoGPT"
|
||||
|
||||
require_python3() {
|
||||
if ! command -v python3 >/dev/null 2>&1; then
|
||||
echo "ERROR: python3 not found in PATH" >&2
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
require_curl() {
|
||||
if ! command -v curl >/dev/null 2>&1; then
|
||||
echo "ERROR: curl not found in PATH" >&2
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
ensure_venv() {
|
||||
if [[ -x "${VENV_PY}" ]]; then
|
||||
return 0
|
||||
fi
|
||||
|
||||
require_python3
|
||||
echo "Creating workspace venv at: ${WORKSPACE_ROOT}/.venv" >&2
|
||||
python3 -m venv "${WORKSPACE_ROOT}/.venv"
|
||||
}
|
||||
|
||||
require_venv() {
|
||||
ensure_venv
|
||||
if [[ ! -x "${VENV_PY}" ]]; then
|
||||
echo "ERROR: Failed to create venv python at: ${VENV_PY}" >&2
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
require_git() {
|
||||
if ! command -v git >/dev/null 2>&1; then
|
||||
echo "ERROR: git not found in PATH" >&2
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
cd_nanogpt() {
|
||||
if [[ ! -d "${NANOGPT_DIR}" ]]; then
|
||||
echo "ERROR: nanoGPT repo not found at ${NANOGPT_DIR}" >&2
|
||||
echo "Run: bash oss_llm/testNanoGPT/10_clone_nanogpt.sh" >&2
|
||||
exit 1
|
||||
fi
|
||||
cd "${NANOGPT_DIR}"
|
||||
}
|
||||
Loading…
Reference in New Issue
Block a user