⚡ Quick Start
Get your first AI response in under 5 minutes.
Step 1: Install
pip install aicortex-core
Step 2: Your First Chat
from aicortex import chat
response = chat("What is quantum computing?")
print(response)
That's it. No API key, no config file, and no server setup required.
AI Cortex automatically selects a model and routes your request to an available Ollama server from its bundled endpoint registry.
Step 3: Pick a Model
AI Cortex comes pre-loaded with metadata for hundreds of models across five families. You can specify exactly which one you want:
from aicortex import chat, families, models
# See what model families are available
print(families())
# → ['llama', 'mistral', 'gemma', 'deepseek', 'qwen']
# List all models in a family
print(models("mistral"))
# → ['mistral:7b', 'mistral:instruct', ...]
# Chat with a specific model
response = chat(
    "Explain transformer architecture in plain English.",
    model="mistral:7b",
    temperature=0.6,
    max_tokens=300,
)
print(response)
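When a script should keep working even if a particular tag is missing from the registry, one option is to choose from the list that `models()` returns instead of hard-coding a tag. The helper below is a minimal sketch of that idea and is not part of AI Cortex: `pick_model` is a hypothetical name, and the sample list only stands in for the output of `models("mistral")`.

```python
from typing import Optional, Sequence


def pick_model(available: Sequence[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred tag that is available, else the first available tag."""
    available_set = set(available)
    for tag in preferred:
        if tag in available_set:
            return tag
    # Nothing preferred is on offer: fall back to whatever is listed first.
    return available[0] if available else None


# Stand-in for models("mistral"); real output depends on the installed registry.
sample = ["mistral:7b", "mistral:instruct"]
choice = pick_model(sample, preferred=["mistral:latest", "mistral:instruct"])
print(choice)
```

The returned tag can then be passed as `model=` to `chat()`.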
Step 4: Stream Responses in Real Time
Streaming delivers output token by token as the model generates it, which suits chatbots and interactive UIs:
from aicortex import chat
stream = chat("Write a short poem about the ocean.", stream=True)
for event in stream:
    if event.type == "start":
        print("🟢 Generating...\n")
    elif event.type == "token":
        print(event.content, end="", flush=True)
    elif event.type == "end":
        print("\n\n✅ Done!")
💡 Tip: Use stream.text() to get the full concatenated response after iterating: full_text = stream.text()
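The event loop from this step can be wrapped in a reusable function when the text needs to be both displayed and kept. The sketch below runs without AI Cortex installed: `FakeEvent` is a made-up stand-in that models only the `type` and `content` attributes used in this step, and `consume` is a hypothetical helper name. Whether non-token events carry `content` is not stated here, so the sketch reads it only for `token` events.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class FakeEvent:
    """Stand-in for a stream event; only the attributes used in this step are modelled."""
    type: str
    content: str = ""


def consume(events: Iterable, on_token: Callable[[str], None] = lambda t: None) -> str:
    """Iterate a stream once, forward each token to on_token, return the joined text."""
    parts = []
    for event in events:
        if event.type == "token":
            parts.append(event.content)
            on_token(event.content)
        elif event.type == "end":
            break
    return "".join(parts)


demo = [
    FakeEvent("start"),
    FakeEvent("token", "Blue "),
    FakeEvent("token", "waves"),
    FakeEvent("end"),
]
text = consume(demo, on_token=lambda t: print(t, end="", flush=True))
print()
```

With the real library, `consume(chat("...", stream=True))` would be the equivalent call, assuming the events behave as shown in this step.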
Step 5: Explore Model Metadata
from aicortex import get_model_info, list_model_servers, get_server_info
# Full metadata for a model (size, family, quantization, etc.)
info = get_model_info("llama3.2:3b")
print(info)
# See all Ollama servers hosting a specific model
servers = list_model_servers("llama3.2:3b")
for s in servers:
    print(f" {s['url']} - {s['location']['city']}, {s['location']['country']}")
# Get connection params for use with LangChain's OllamaLLM
from aicortex import get_llm_params
params = get_llm_params("llama3.2:3b")
print(params) # {'model': 'llama3.2:3b', 'base_url': 'http://...'}
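The dictionaries returned by `list_model_servers()` can be reshaped with plain Python, for example to see which countries host a model. This sketch assumes only the keys used in the loop in this step (`url`, and `city` and `country` under `location`). The sample records and the `group_by_country` helper are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_country(servers: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each country to the URLs of the servers located there."""
    grouped = defaultdict(list)
    for server in servers:
        # .get() keeps the sketch tolerant of records without location data.
        country = server.get("location", {}).get("country", "unknown")
        grouped[country].append(server["url"])
    return dict(grouped)


# Invented sample data in the assumed shape; these are not real endpoints.
sample_servers = [
    {"url": "http://198.51.100.7:11434", "location": {"city": "Berlin", "country": "DE"}},
    {"url": "http://203.0.113.9:11434", "location": {"city": "Munich", "country": "DE"}},
    {"url": "http://192.0.2.44:11434", "location": {}},
]
by_country = group_by_country(sample_servers)
for country, urls in sorted(by_country.items()):
    print(f"{country}: {len(urls)} server(s)")
```

The format of the real `country` values (codes or full names) is not specified here, so treat the `"DE"` strings as placeholders.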
Step 6: Run the OpenAI-Compatible Server (Optional)
Turn AI Cortex into an OpenAI-compatible REST API with one call:
pip install "aicortex-core[server]"
from aicortex.tools import run_server
run_server(host="127.0.0.1", port=8000, default_model="llama3.2:3b")
Then use it with curl, the openai Python SDK, or any OpenAI-compatible tool:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
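The same request can be made from Python with only the standard library. This is a sketch, not an official client. The helper names are hypothetical, and `extract_reply` assumes the server returns the usual OpenAI chat completion shape (`choices[0].message.content`), which is implied by "OpenAI-compatible" but not shown on this page. Nothing is sent over the network unless `send()` is called.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions route used in the curl example."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion body (assumed shape)."""
    return body["choices"][0]["message"]["content"]


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the reply text; requires the server to be running."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.load(resp))


request = build_chat_request("http://localhost:8000", "llama3.2:3b", "Hello!")
print(request.get_method(), request.full_url)
# To actually call the server once it is running: print(send(request))
```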
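Summary: What You Just Learned
| Task | How |
|---|---|
| Simple chat | `chat("...")` |
| Specific model | `chat("...", model="mistral:7b")` |
| Streaming | `chat("...", stream=True)` |
| List models | `models("mistral")` |
| Model info | `get_model_info("llama3.2:3b")` |
| Server mode | `run_server(host="127.0.0.1", port=8000, default_model="llama3.2:3b")` |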
Where to Go Next
Basic Usage: all parameters, error handling, and advanced patterns
Streaming: deep dive into StreamEvent types and real-time patterns
Model Management: how the model registry works
Server Mode: full OpenAI-compatible proxy docs
Tools: update the model database with live endpoint scanning