docs(devops): add secure single-vm api exposure guidance
parent ac6d1e0911
commit 626e19f776
@ -4,6 +4,10 @@
> Nothing pre-installed required: the script handles everything from a blank Ubuntu machine.
> Two files: this README and `setup.sh`. Copy both to the VM and run the script.

Related:

- [`SECURE_API_EXPOSURE.md`](./SECURE_API_EXPOSURE.md): recommended public API exposure model, alternatives, and security guidance for client-facing URLs

---

## Prerequisites

456
docs/devops/single_azure_vm/docker/SECURE_API_EXPOSURE.md
Normal file
@ -0,0 +1,456 @@
# Secure API Exposure for Single-Azure-VM Docker Deployment

> Decision note for the single-VM Docker deployment in this folder.
> Goal: stop exposing raw `http://<vm-ip>:<port>` backend URLs directly to web and mobile clients.

---

## Summary

For the current ByteLyst single-VM Docker deployment, the recommended public access pattern is:

1. Use one public DNS name such as `api.<domain>`.
2. Terminate TLS on the VM at the HTTP gateway layer.
3. Keep backend containers on the Docker network and avoid publishing their service ports publicly.
4. Route requests by path prefix at the gateway:
   - `/platform`
   - `/extraction`
   - `/mcp`
   - `/peakpulse`
   - `/chronomind`
   - `/jarvisjr`
   - `/nomgap`
   - `/mindlyst`
   - `/lysnrai`
   - `/notelett`
   - `/flowmonk`
   - `/actiontrail`
   - `/localmemgpt`

For this repo and deployment model, the best near-term choice is:

- **Primary recommendation:** keep the existing gateway pattern and harden it into a single HTTPS API entrypoint.
- **Implementation preference:** either:
  - **Traefik with ACME/Let's Encrypt** if you want to stay aligned with the existing stack, or
  - **Caddy** if you want the simplest single-VM HTTPS setup.

Do **not** keep exposing `4003`, `4005`, `4007`, and `4010-4019` as public app-facing endpoints for production-like clients.

---

## Problem

The current single-VM documentation and deployment model expose many direct backend ports:

- platform services on `4003`, `4005`, `4007`
- product backends on `4010` through `4019`

That creates avoidable issues:

- raw VM IPs leak into app config
- every public port must be opened in the Azure NSG
- no central TLS termination point
- no single place for CORS, auth forwarding, rate limiting, or request logging
- harder DNS migration if the VM changes
- mobile and web clients become coupled to infrastructure details

The right abstraction for clients is a stable HTTPS domain, not a VM IP and port.

---

## Options Compared

### Option A: Keep direct IP:port exposure

Example:

- `http://<vm-ip>:4003`
- `http://<vm-ip>:4012`

**Pros**

- simplest to bootstrap
- no reverse-proxy configuration required
- easy to debug locally

**Cons**

- poor security posture
- difficult to hide infrastructure details
- not suitable for polished web/mobile clients
- many inbound ports remain publicly reachable
- forces client config changes if host or port mapping changes
- HTTPS must be set up separately for each exposed service

**Recommendation**

- acceptable only for temporary prototype smoke testing
- not recommended for client-facing environments

---

### Option B: Single HTTPS domain with path-based routing on the VM

Example:

- `https://api.<domain>/platform/...`
- `https://api.<domain>/chronomind/...`

**Pros**

- one public URL for all apps
- only `80/443` need to be public
- TLS termination is centralized
- easiest client configuration model
- easy to add middleware for auth, CORS, rate limits, headers, and logging
- easiest migration path from one VM to another

**Cons**

- gateway config must be maintained
- path-prefix design requires route hygiene across services

**Recommendation**

- **best fit for the current single-VM ByteLyst deployment**

---

### Option C: Per-service HTTPS subdomains

Example:

- `https://platform-api.<domain>`
- `https://chronomind-api.<domain>`
- `https://jarvisjr-api.<domain>`

**Pros**

- clean separation by product/service
- natural fit if products will be separated operationally later
- easier to apply product-specific policies at DNS/cert level

**Cons**

- more DNS records
- more certificates/listeners to manage
- more client env variables than a single API hostname

**Recommendation**

- good alternative if product isolation is a stronger goal than client simplicity
- still better than raw IP:port exposure

---

### Option D: Azure Application Gateway in front of the VM

**Pros**

- Azure-native L7 entrypoint
- TLS termination
- host/path routing
- WAF support
- health probes and backend pools

**Cons**

- extra Azure cost and operational layer
- more infrastructure to provision and maintain
- heavier than needed for a single VM unless WAF/compliance is required

**Recommendation**

- good when you need Azure-managed edge controls or WAF
- otherwise more than this deployment currently needs

---

### Option E: Azure Front Door in front of the VM

**Pros**

- global edge entrypoint
- TLS termination
- WAF support
- good for multi-region or globally distributed traffic

**Cons**

- overkill for a single-region, single-VM deployment
- extra cost and configuration
- pays off mainly when you have multiple regions or origins

**Recommendation**

- not the first choice for this repo's current single-VM architecture

---

### Option F: Azure API Management in front of the VM

**Pros**

- strongest API product/governance story
- subscriptions, API versioning, developer portal, transformations, quotas
- very good if third parties consume your APIs

**Cons**

- significantly heavier and costlier than a simple gateway
- more than needed for first-party web/mobile clients on one VM

**Recommendation**

- consider later if ByteLyst needs external developer APIs or formal API governance
- not the first thing to add for this deployment

---

### Option G: Caddy on the VM

**Pros**

- very simple configuration
- automatic HTTPS
- good default fit for one VM and one domain
- lower operational overhead than a more feature-rich proxy

**Cons**

- less aligned with the current repo's Traefik-based stack references
- introduces a second gateway pattern if the team already standardizes on Traefik

**Recommendation**

- best choice if simplicity is the priority and gateway features stay modest

---

### Option H: Traefik on the VM

**Pros**

- already present in the ecosystem stack and docs
- supports host/path routing and ACME
- works well with Docker labels and compose-driven deployments
- consistent with the repo's current gateway direction

**Cons**

- current docs and compose posture still expose too many ports directly
- must be hardened:
  - remove insecure dashboard exposure
  - add TLS
  - stop publishing backend ports publicly

**Recommendation**

- best choice if we want minimal architectural change from the current stack

---

## Decision Matrix

| Option                       | Security | Complexity  | Azure cost  | Fit for current repo          | Recommended |
| ---------------------------- | -------- | ----------- | ----------- | ----------------------------- | ----------- |
| Direct IP:port               | Low      | Low         | Low         | Prototype only                | No          |
| Single domain + path routing | High     | Medium      | Low         | Excellent                     | **Yes**     |
| Per-service subdomains       | High     | Medium      | Low         | Good                          | Yes         |
| Application Gateway          | High     | Medium/High | Medium      | Good                          | Maybe later |
| Front Door                   | High     | High        | Medium/High | Weak for single VM            | No for now  |
| API Management               | High     | High        | High        | Weak for first-party apps now | Later only  |
| Caddy                        | High     | Low         | Low         | Good                          | **Yes**     |
| Traefik                      | High     | Medium      | Low         | Excellent                     | **Yes**     |

---

## Recommended Path

### Recommendation 1: Public clients should know only one API hostname

Use:

- `https://api.<domain>`

Do not use:

- `http://<vm-ip>:4003`
- `http://<vm-ip>:4012`
- `http://<vm-ip>:4019`

### Recommendation 2: Use path-based routing for all APIs

Suggested mapping:

- `https://api.<domain>/platform`
- `https://api.<domain>/extraction`
- `https://api.<domain>/mcp`
- `https://api.<domain>/peakpulse`
- `https://api.<domain>/chronomind`
- `https://api.<domain>/jarvisjr`
- `https://api.<domain>/nomgap`
- `https://api.<domain>/mindlyst`
- `https://api.<domain>/lysnrai`
- `https://api.<domain>/notelett`
- `https://api.<domain>/flowmonk`
- `https://api.<domain>/actiontrail`
- `https://api.<domain>/localmemgpt`

This is the cleanest model for web and mobile apps because they only need:

- one host
- one TLS policy
- one place for future cross-cutting controls
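
To make the path-prefix idea concrete, here is a minimal Python sketch of the lookup a gateway performs. It is illustrative only, not the gateway's actual implementation: the upstream container names and the prefix-to-port pairing are assumptions, since this note does not state which service listens on which port, and stripping the prefix before forwarding is also an assumed design choice.

```python
from typing import Optional, Tuple

# Hypothetical prefix -> upstream table. The prefixes come from the mapping
# above; the container names and port assignments are illustrative assumptions.
ROUTES = {
    "/platform": "http://platform:4003",
    "/extraction": "http://extraction:4005",
    "/chronomind": "http://chronomind:4012",
}


def resolve_upstream(path: str) -> Optional[Tuple[str, str]]:
    """Return (upstream, forwarded_path) for a request path, or None if no prefix matches."""
    for prefix, upstream in ROUTES.items():
        # Match whole path segments so "/platformx" does not hit "/platform".
        if path == prefix or path.startswith(prefix + "/"):
            # Assumed behavior: strip the prefix before forwarding to the backend.
            forwarded = path[len(prefix):] or "/"
            return upstream, forwarded
    return None
```

Matching on whole segments is the "route hygiene" concern in practice: a naive `startswith(prefix)` check would let one service's prefix shadow another's.
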

### Recommendation 3: Keep only `80/443` public for app traffic

Publicly reachable:

- `80`
- `443`

Not publicly reachable for clients:

- `4003`
- `4005`
- `4007`
- `4010-4019`

Operational ports should also be restricted or removed from the public internet where possible:

- `3000` Grafana
- `3300` Gitea
- `8025` Mailpit
- `8080` Traefik dashboard
- `10000` Azurite
- `11434` Ollama
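
The port policy above can be expressed as a small audit helper. This is a sketch under assumptions: the list of open inbound ports is supplied by the caller (for example, copied from the NSG rules), and any port not named in this note is reported as `unlisted` for manual review rather than treated as allowed.

```python
# Port groups taken from the lists above.
PUBLIC_PORTS = {80, 443}
BACKEND_PORTS = {4003, 4005, 4007, *range(4010, 4020)}
OPERATIONAL_PORTS = {3000, 3300, 8025, 8080, 10000, 11434}


def find_violations(open_ports):
    """Return sorted (port, reason) pairs for inbound ports that should not be public."""
    violations = []
    for port in sorted(set(open_ports)):
        if port in PUBLIC_PORTS:
            continue  # 80/443 are the only ports meant to be public for app traffic
        if port in BACKEND_PORTS:
            violations.append((port, "backend"))
        elif port in OPERATIONAL_PORTS:
            violations.append((port, "operational"))
        else:
            # Not covered by this note; flag it for a human to decide.
            violations.append((port, "unlisted"))
    return violations
```

For example, `find_violations([443, 4012, 8080])` reports `4012` as a backend port and `8080` as an operational port, while `443` passes.
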

### Recommendation 4: Choose Traefik or Caddy based on team preference

#### If staying close to the current repo

Use **Traefik** and harden it:

- enable ACME / Let's Encrypt
- remove insecure dashboard exposure
- use host/path routers for each backend
- stop exposing backend container ports publicly
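
As a rough illustration of the host/path router step, the following Python sketch builds the Docker labels Traefik would read for one backend. The label keys follow Traefik's Docker provider naming (v2 and later). Everything else is an assumption for illustration: the entrypoint name `websecure`, the certificate resolver name `letsencrypt`, the host `api.example.com`, the service name and port, and the decision to strip the prefix before forwarding.

```python
def traefik_labels(name: str, prefix: str, port: int, host: str = "api.example.com") -> dict:
    """Build Traefik Docker-provider labels for one path-routed backend (illustrative)."""
    router = f"traefik.http.routers.{name}"
    middleware = f"{name}-strip"
    return {
        "traefik.enable": "true",
        # Route only requests for the API host that start with this prefix.
        f"{router}.rule": f"Host(`{host}`) && PathPrefix(`{prefix}`)",
        # Assumed names: an HTTPS entrypoint and an ACME resolver defined in static config.
        f"{router}.entrypoints": "websecure",
        f"{router}.tls.certresolver": "letsencrypt",
        # Assumed choice: strip the prefix so the backend sees its original paths.
        f"{router}.middlewares": middleware,
        f"traefik.http.middlewares.{middleware}.stripprefix.prefixes": prefix,
        # Traefik reaches the container over the Docker network on this port,
        # so the port does not need to be published on the host.
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }
```

Because the gateway reaches each backend over the Docker network, the corresponding `ports:` entries can be dropped from the compose file once the labels are in place.
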

#### If optimizing for the fewest moving parts

Use **Caddy**:

- simpler config
- automatic HTTPS
- excellent fit for one VM and one API domain
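
To show how small the Caddy configuration could be, this Python sketch renders a Caddyfile from a prefix table. It uses Caddy's `handle_path` directive, which strips the matched prefix, together with `reverse_proxy`. The host name, upstream container names, and ports are hypothetical, and prefix stripping is an assumed design choice.

```python
def render_caddyfile(host: str, routes: dict) -> str:
    """Render a Caddyfile with one handle_path block per prefix (illustrative)."""
    lines = [f"{host} {{"]
    for prefix, upstream in routes.items():
        # handle_path strips the matched prefix before proxying.
        lines.append(f"\thandle_path {prefix}/* {{")
        lines.append(f"\t\treverse_proxy {upstream}")
        lines.append("\t}")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical upstreams; the real service-to-port mapping is not given in this note.
    print(render_caddyfile("api.example.com", {
        "/platform": "platform:4003",
        "/chronomind": "chronomind:4012",
    }))
```

Note that a matcher such as `/platform/*` does not match the bare path `/platform`, so requests without a trailing path segment may need their own rule.
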

### Recommendation 5: Do not add Azure Front Door or API Management yet

Those make sense later if one of these becomes true:

- global traffic and multi-region failover matter
- a WAF is mandatory at the Azure edge
- third-party API consumers need subscriptions, products, quotas, or a developer portal

For the current single-VM Docker deployment, they add more complexity than value.

---

## Concrete Recommendation for ByteLyst

### Near-term

- Keep the current single VM.
- Keep Docker Compose.
- Keep a gateway in front of services.
- Move clients to **one HTTPS API hostname**.
- Use **path-based routing**.
- Remove public exposure of backend ports.

### Preferred implementation

- **Preferred:** harden the existing Traefik-based gateway approach
- **Alternative:** replace Traefik with Caddy if the team values simpler config over consistency with current stack references

### Why this is the best fit

- smallest change from the current architecture
- lowest added Azure cost
- best client developer experience
- hides VM IP and port details
- keeps future migration options open

---

## Suggested Client Base URLs

### Web and mobile

Use one shared base host:

- `https://api.<domain>`

Then per-service prefixes:

- platform: `https://api.<domain>/platform`
- extraction: `https://api.<domain>/extraction`
- mcp: `https://api.<domain>/mcp`
- peakpulse: `https://api.<domain>/peakpulse`
- chronomind: `https://api.<domain>/chronomind`
- jarvisjr: `https://api.<domain>/jarvisjr`
- nomgap: `https://api.<domain>/nomgap`
- mindlyst: `https://api.<domain>/mindlyst`
- lysnrai: `https://api.<domain>/lysnrai`
- notelett: `https://api.<domain>/notelett`
- flowmonk: `https://api.<domain>/flowmonk`
- actiontrail: `https://api.<domain>/actiontrail`
- localmemgpt: `https://api.<domain>/localmemgpt`
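
On the client side, this means a single configured base host plus a service name. The helper below is a sketch of that idea, shown in Python for brevity; the function name, the example host `api.example.com`, and the endpoint path in the usage note are hypothetical.

```python
# Service prefixes from the list above.
SERVICES = {
    "platform", "extraction", "mcp", "peakpulse", "chronomind", "jarvisjr",
    "nomgap", "mindlyst", "lysnrai", "notelett", "flowmonk", "actiontrail",
    "localmemgpt",
}


def service_url(base: str, service: str, path: str = "") -> str:
    """Join the shared API base host, a service prefix, and an optional endpoint path."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service}")
    root = f"{base.rstrip('/')}/{service}"
    suffix = path.lstrip("/")
    return f"{root}/{suffix}" if suffix else root
```

With this shape, a client stores only the base host in its environment config; `service_url("https://api.example.com", "chronomind", "/v1/timers")` yields `https://api.example.com/chronomind/v1/timers`.
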

---

## Implementation Notes for This Folder

To put this into practice in the single-VM Docker deployment, this folder should eventually be updated to:

1. change the documented public access model from `http://<vm-ip>:<port>` to `https://api.<domain>/<service>`
2. reduce Azure NSG public inbound rules to `80/443` for app traffic
3. stop publishing backend ports publicly in compose for client access
4. add gateway routing rules for all backend services
5. add TLS automation
6. lock down or remove public access to operational tools
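
Step 1 is mostly a mechanical rewrite of existing URLs, which could be scripted along these lines. This is a sketch only: the port-to-service table is hypothetical (this note does not list which service owns which port), and `api.example.com` stands in for the real API host.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical port -> service table; fill in from the real compose file.
PORT_TO_SERVICE = {4003: "platform", 4012: "chronomind"}


def migrate_url(legacy_url: str, api_host: str = "api.example.com") -> str:
    """Rewrite http://<vm-ip>:<port>/<path> to https://<api_host>/<service>/<path>."""
    parts = urlsplit(legacy_url)
    service = PORT_TO_SERVICE.get(parts.port)
    if service is None:
        raise ValueError(f"no service mapped for port {parts.port!r}")
    # Keep the original path under the service prefix; drop a bare "/" or empty path.
    tail = parts.path if parts.path not in ("", "/") else ""
    return urlunsplit(("https", api_host, f"/{service}{tail}", parts.query, parts.fragment))
```

Raising on unmapped ports, rather than passing the URL through, makes leftover direct-port references show up during the migration instead of in production.
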

---

## Final Recommendation

**Recommended architecture for this repo's single-Azure-VM Docker deployment:**

- **One HTTPS API domain**
- **Path-based routing**
- **Backends private on the Docker network**
- **Traefik hardened and kept, or Caddy adopted for simplicity**
- **Do not expose raw service ports to web/mobile clients**

If the team wants the least disruptive route, use **Traefik + ACME + path routing**.
If the team wants the simplest operator experience, use **Caddy + path routing**.

For the current deployment, **Azure Front Door**, **Application Gateway**, and **API Management** are valid alternatives, but they are not the best first move.

---

## References

- Microsoft Learn, Application Gateway components:
  - https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-components
- Microsoft Learn, Azure Front Door guidance (from the Azure SignalR Service with Front Door how-to):
  - https://learn.microsoft.com/en-us/azure/azure-signalr/signalr-howto-work-with-azure-front-door
- Caddy documentation, automatic HTTPS and reverse proxy:
  - https://caddyserver.com/docs/
  - https://caddyserver.com/docs/caddyfile/directives/reverse_proxy
- Traefik documentation:
  - https://doc.traefik.io/traefik/
- Existing repo context:
  - `docs/devops/single_azure_vm/docker/README.md`
  - `docs/devops/SINGLE_VM_ENHANCED_PLAN.md`
  - `docs/audits/AI_SECURITY_AUDIT_REPORT.md`