Private AI clusters pooled over the internet, on your hardware.
No VPS, no API fees.
Run massive open-source models no single machine could host. Send one invite link, everyone joins, everyone shares the compute.
$ pip install progresspals
Three commands. Real distributed inference.
No coordination overhead, no Kubernetes, no public swarms. Just your people, your hardware, your model.
Create a swarm
Pick a model. The CLI detects your hardware, claims the layers you can hold, and generates a single-use invite link.
Invite your team
Share one link. Each peer joins, downloads only their assigned slice of the model, and contributes their GPU/RAM.
Run it like OpenAI
Start the local OpenAI-compatible endpoint. Point Cursor, Aider, Continue, or any SDK at it. Inference flows through the chain.
A drop-in replacement for OpenAI — running on your team's hardware.
pals serve exposes the swarm as a local OpenAI-compatible endpoint at http://localhost:11434/v1. Any tool that speaks the OpenAI API works unchanged — point it at your endpoint and it codes, chats, and reasons through your private cluster.
POST /v1/chat/completions · POST /v1/completions · GET /v1/models · SSE streaming
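To see what that contract looks like on the wire, here is a minimal Python sketch using requests. The model ID and prompt are placeholders; everything else is the standard OpenAI SSE format, nothing ProgressPals-specific beyond the local URL.

import json
import requests

BASE = "http://localhost:11434/v1"

# GET /v1/models — list what the swarm is currently serving
print(requests.get(f"{BASE}/models").json())

# POST /v1/chat/completions with SSE streaming enabled
resp = requests.post(
    f"{BASE}/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-405B-Instruct",  # placeholder model ID
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue  # skip SSE keep-alives and blank separators
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":  # standard OpenAI stream terminator
        break
    chunk = json.loads(payload)
    print(chunk["choices"][0]["delta"].get("content", ""), end="")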
Plug it into the tools your team
already uses.
Because the swarm exposes a standard OpenAI-compatible endpoint, anything in your agent stack — coding harnesses, gateways, frameworks — just works.
Coding agents
Your team's swarm becomes the brain inside the IDE. Point the agent at the local endpoint and it codes, edits, and refactors against your shared cluster.
# Cursor → Settings → Models → Custom OpenAI Base URL
http://localhost:11434/v1
# Aider
aider --openai-api-base http://localhost:11434/v1 \
  --openai-api-key any-string

Personal AI agents
Self-hosted agents and assistants that already speak the OpenAI API. Swap the provider URL for the swarm and they run on your team's hardware instead of someone else's GPUs.
# Most gateways read these standard env vars:
export OPENAI_API_BASE=http://localhost:11434/v1
export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=any-string
Agent frameworks & SDKs
Stack your own agents on top. Anything built on the OpenAI SDK accepts a base_url override — your swarm becomes the model layer underneath multi-agent orchestration, RAG, evals, anything.
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="any-string",
)
client.chat.completions.create(
    model="meta-llama/Llama-3.1-405B-Instruct",
    messages=[{"role":"user","content":"..."}],
)

Every tool listed accepts a custom OpenAI base URL. If yours does, it will too — there is no special integration, just the standard /v1/chat/completions contract with SSE streaming.
Built for teams that want
their own models, privately.
Everything you need to stand up a serious cluster with people you trust — without renting a single GPU.
Invite-only swarms
No public discovery, no random peers. Single-use tokens, regenerable, expiring. Only people you invite can join.
OpenAI-compatible endpoint
`pals serve` exposes /v1/chat/completions, /v1/completions, /v1/models with SSE streaming. Cursor, Aider, Continue work unchanged.
Encrypted activations
Per-swarm AES-256 key derived from the invite token via HKDF. Activation tensors are encrypted before leaving each peer.
Pipeline parallelism
Built on Petals. Each peer holds a contiguous slice of the model. The coordinator balances layer assignment automatically (sketched below).
Run 405B on consumer GPUs
Llama 405B, Mixtral 8x22B, Falcon 180B, BLOOM 176B — models no single machine can hold. Spread them across 4, 8, 20 peers.
Member controls
Live peer list with status, layers held, throughput. Kick peers, regenerate invites, cap swarm size — all from the CLI or TUI.
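The layer balancing described in the pipeline-parallelism card can be pictured in a few lines. This is an illustrative sketch only, not the coordinator's actual algorithm: it hands each peer a contiguous block of layers proportional to its free VRAM.

# Illustrative only: proportional contiguous layer assignment.
# The real coordinator's algorithm may differ.
def assign_layers(num_layers: int, peer_vram_gb: dict[str, float]) -> dict[str, range]:
    total = sum(peer_vram_gb.values())
    peers = list(peer_vram_gb.items())
    assignments, start = {}, 0
    for i, (peer, vram) in enumerate(peers):
        if i == len(peers) - 1:
            count = num_layers - start  # last peer takes the remainder
        else:
            count = round(num_layers * vram / total)
        assignments[peer] = range(start, start + count)
        start += count
    return assignments

# Llama 3.1 405B has 126 transformer blocks (per its HF config):
print(assign_layers(126, {"alice": 24.0, "bob": 24.0, "carol": 48.0}))
# {'alice': range(0, 32), 'bob': range(32, 64), 'carol': range(64, 126)}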
Llama 405B on a few laptops.
Anything Petals supports out of the box, your swarm can run. New architectures land as upstream Petals adds them.
HuggingFace model IDs work directly — just pass --model to pals create.
Llama (Meta): 2 70B · 3 8B · 3 70B · 3.1 8B · 3.1 70B · 3.1 405B · 3.3 70B
Mixtral (Mistral): 8x7B · 8x22B
Falcon (TII): 40B · 180B
BLOOM (BigScience): 176B
We tell you exactly what the trust model is.
Only invite people you trust.
We are not a public network. There is no swarm discovery, no stranger prompts, no content moderation queue. Your swarm is exactly the people you sent the link to.
Activations are encrypted in transit.
We derive a 256-bit AES key from your invite token via HKDF. Tensors are encrypted before leaving a peer and decrypted on arrival. The key never leaves member machines — Supabase only stores a hash.
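As a sketch of the scheme described above, here is the derivation in Python with the cryptography package. The salt and info parameters are placeholders chosen for illustration; the real values are internal to ProgressPals.

# Sketch of the described scheme: a 256-bit AES key derived from the
# invite token with HKDF-SHA256. Salt and info are placeholders; the
# actual parameters are not documented here.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def swarm_key(invite_token: str) -> bytes:
    return HKDF(
        algorithm=hashes.SHA256(),
        length=32,                          # 256 bits, as advertised
        salt=None,                          # placeholder: actual salt unknown
        info=b"progresspals-activations",   # placeholder context string
    ).derive(invite_token.encode())

key = swarm_key("example-invite-token")
assert len(key) == 32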
What we do not pretend.
P2P inference exposes IP addresses to other swarm members. The first peer in the chain sees decrypted inputs. We sandbox computation where the OS allows it, but this is not a hardware enclave. Use a VPN if the threat model demands it.
Free. The whole thing.
No paid tier yet. We will add one based on what teams actually ask for — not before.
Everything. No card. No usage caps beyond the 50-peer hard ceiling.
- Create private swarms up to 50 peers
- Single-use invite tokens, regenerable
- Encrypted activations (AES-256, HKDF-derived)
- Member list, kick, throughput stats
- Full CLI access (pals init / create / join / run)
- OpenAI-compatible local endpoint (pals serve)
- TUI dashboard (Textual)
- Supabase-backed auth and invite verification
Questions teams ask
before they install.
If yours is not here, the answer is probably either in how it works or in the trust model.
What is ProgressPals?
Private, peer-to-peer AI inference. You and a small group of trusted people pool your hardware over the internet to run large open-source models that no single machine could host on its own. One CLI, one invite link, one local OpenAI-compatible endpoint.
What models can my swarm run?
Llama 2, 3, 3.1 and 3.3 up to 405B, Mixtral 8x7B and 8x22B, Falcon 40B and 180B, BLOOM 176B. Pass any supported HuggingFace model ID to pals create --model.
Can my team use it with Cursor, Aider, or our agent framework?
Yes. pals serve exposes an OpenAI-compatible endpoint at http://localhost:11434/v1. Point Cursor, Cline, Roo Code, Continue, Aider, Zed, OpenClaw, Open WebUI, n8n, LangChain, LlamaIndex, AutoGen, CrewAI, the Vercel AI SDK, or anything that uses the OpenAI SDK directly at it — no code changes.
Who can see my prompts?
The first peer in your chain decrypts your input to run their layers — that is inherent to how pipelined transformer inference works. Activations between subsequent peers are encrypted with a per-swarm AES-256 key derived from the invite token. The trust model is simple and honest: only invite people you would trust to see your prompts.
Why private swarms only?
Public AI swarms create content moderation queues, expose users to stranger prompts, and pile on legal liability. Removing public swarms removes all three. You only compute on (and decrypt inputs from) people you actually invited.
Do I need a GPU?
Strongly recommended. Each peer's contribution scales with how many model layers their VRAM can hold. CPU-only peers can technically join, but throughput will be slow enough that you probably want at least one consumer GPU per peer.
Does it work on Apple Silicon (M1, M2, M3, M4)?
Yes. Apple Silicon Macs can join any swarm and contribute layers — the install needs one extra build step the CLI handles for you. Per-pal throughput is lower than on equivalent NVIDIA hardware in v1 (we use PyTorch's Metal path), so a Mac is best as one pal in a mixed swarm or as a client running pals serve. A Mac-native MLX backend is on the roadmap.
How many peers do I need for a big model?
It depends on the model and how aggressively it is quantized, but the rule is intuitive: more layers in the model, or less VRAM per peer, means more peers. The CLI detects each peer's hardware on join and the coordinator auto-assigns layer ranges to balance the chain.
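A back-of-the-envelope sketch, with assumed numbers rather than official guidance: estimate the weight footprint from parameter count and quantization, then divide by usable VRAM per peer.

import math

def peers_needed(params_b: float, bytes_per_param: float, usable_vram_gb: float) -> int:
    """Rough lower bound: weight storage only, ignoring KV cache and overhead."""
    weight_gb = params_b * bytes_per_param  # 1B params at 1 byte each is ~1 GB
    return math.ceil(weight_gb / usable_vram_gb)

# Llama 3.1 405B at 4-bit (~0.5 bytes/param) on 24 GB cards with ~20 GB usable:
print(peers_needed(405, 0.5, 20.0))  # 11 peers, before KV cache and margins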
Is it really free?
Yes. No paid tier yet. We will add one when we have real signal from teams about what is worth charging for — not before.
Start your first swarm
in under five minutes.
Linux and macOS. Works with any consumer GPU Petals supports.