Circular vDAG Tutorial: A Deep Dive into AIOSv1 Policies
Introduction and Overview
Welcome! This tutorial demonstrates how to build and run a sophisticated, circular vDAG using AIOSv1. We'll construct a multi-turn debate system featuring two debaters (A and B) and a judge. By the end of this tutorial, you'll be able to build your own self-correcting, debating, or collaborative systems.
In this tutorial, you will learn:
- How to implement a circular workflow using post-processing routing policies.
- How to use a pre-processing policy to create and inject a "running summary" for long-term context.
- How to deploy and manage the vDAG on a Kubernetes cluster.
This directory contains local copies of all the policy and model inference code, and the vDAG itself is defined inline, making this a self-contained example.
Tutorial Overview
- What are Policies and Blocks in AIOSv1?: Understand the core components that separate logic from inference.
- Implementation Deep Dive: See the high-level architecture of our three-agent debate system.
- Code and Policies: A Closer Look: Examine the local Python scripts that define our agent behaviors.
- Circular vDAG spec: Review the declarative JSON that defines the vDAG structure.
- The Debate Flow and History: Trace the data packets as they move between agents.
- Knobs to Tune for More Control: Learn how to change the debate's behavior by adjusting policy parameters.
- Deploying and Managing the vDAG: Use API calls to register and deploy the vDAG on a cluster.
- Running and Inspecting the Debate: Trigger the debate with an inference call and see the results.
- Observing logs in K8s: Monitor the real-time interaction between agents via Kubernetes logs.
- Troubleshooting: Get tips for debugging common issues in a circular vDAG.
- Cleanup: Learn how to properly remove the vDAG and its controller from the cluster.
1. What are Policies in AIOSv1?
A policy is dynamically loadable, executable Python code used in various places and use cases across the AIOS system. Because policies are dynamic, they let developers implement custom functionality throughout AIOS. Below are the policy types most relevant to this tutorial:
- Preprocessing Policies: Modify incoming requests. In this tutorial, we use one to manage a conversation summary.
- Postprocessing Policies: Modify outgoing responses. Here, we use them to route the conversation between the debaters and the judge.
📖 Further Reading: AIOSv1 Policies System Overview
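As a rough mental model (the exact interface is in the policies documentation linked above; the class and method names below are illustrative placeholders, not the real AIOSv1 policy SDK), a policy is a Python object whose entry point receives a packet and returns a possibly modified packet:

```python
# Illustrative sketch only -- the class name and `eval` entry point are
# placeholders, not the actual AIOSv1 policy SDK interface.
class SummarizerPreprocessingPolicy:
    def __init__(self, parameters: dict):
        # Policy knobs are supplied when the policy is loaded.
        self.summarize_every_n = parameters.get("summarize_every_n_messages", 3)
        self.turn_count = 0

    def eval(self, packet: dict) -> dict:
        """Called on every incoming request before it reaches the model."""
        self.turn_count += 1
        if self.turn_count % self.summarize_every_n == 0:
            # In the real policy this is where an LLM call would build a
            # running summary; here we only mark that it would fire.
            packet.setdefault("router_meta", {})["summary_due"] = True
        return packet
```

The key idea is that the policy carries its own state and parameters, while the block's inference code stays stateless and simple.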
What is a "Block" in AIOSv1?
A Block is the core component of AIOSv1 responsible for instantiating, serving, scaling, and managing AI inference or any general computational workload defined using the AIOSv1 Instance SDK. In this tutorial, a block is the component that executes your model code (e.g., running a Llama.cpp model) and has several policies associated with it. It is intentionally kept simple and focused on inference; the complex logic is offloaded to the policies that wrap it. Beyond the other policies described in the link below, a block is composed of:
1. Preprocessing Policy (Optional): Acts on the request before it hits your model.
2. Inference Code: The actual model execution.
3. Postprocessing Policy (Optional): Acts on the response from your model.
📖 Further Reading: What is a Block?
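Conceptually, a block's request path chains those three stages. The sketch below is a simplified mental model, not the real block runtime; `pre`, `infer`, and `post` stand in for the two optional policies and the inference code:

```python
from typing import Callable, Optional

def run_block(
    packet: dict,
    infer: Callable[[dict], dict],
    pre: Optional[Callable[[dict], dict]] = None,
    post: Optional[Callable[[dict], dict]] = None,
) -> dict:
    """Sketch of a block's request path: preprocess -> infer -> postprocess."""
    if pre is not None:
        packet = pre(packet)      # 1. preprocessing policy (optional)
    packet = infer(packet)        # 2. the actual model execution
    if post is not None:
        packet = post(packet)     # 3. postprocessing policy (optional)
    return packet
```

In our debate system, the postprocessing stage is where the routing decision is made, which is what turns a linear graph into a circular one.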
2. Implementation Deep Dive
Here is a high-level overview of the components in our circular debate system.
- Preprocessing (Summarizer Policy)
  - Builds a per-session `recent_turns` window, summarizing the conversation periodically.
  - The summary is then injected back into the prompt, giving the models long-term context.
  - Summarization has its own knobs, like cadence (`summarize_every_n_messages`) and token thresholds (`min_tokens_for_summarization`).
- Debater Inference Code (`main_debate_simple.py`)
  - A simple inference wrapper with a fixed role (A or B), using AIOSv1's LLM SDK.
  - It constructs a prompt from the topic and the opponent's last turn.
  - If a `running_summary` is available (from the preprocessor), it's injected as a system message.
- Debater Postprocessing (Router Policy)
  - This is where the core routing logic lives.
  - It decides whether to send the response to the opponent or escalate to the judge based on a set of rules.
  - For example, it uses `judge_interval_rounds` to escalate the debate to the judge every N rounds for a periodic review. It also uses `max_consec_by_same_role` as a safeguard to prevent one debater from dominating the conversation.
- Judge Inference Code (`main_judge_capped.py`)
  - Builds a prompt for the judge using the topic and the full recent history.
  - The judge's job is to assess the state of the debate and decide whether it should continue or end.
- Judge Postprocessing (Router Policy)
  - Parses the judge's decision from the model's output. A `CONTINUE` decision routes the conversation back to one of the debaters, maintaining the circular flow. A `FINAL_JUDGMENT` decision terminates the vDAG execution.
  - It also enforces global caps like `max_rounds` to ensure the debate eventually concludes, preventing infinite loops.
- Stopping Conditions
  - The debate ends when the judge returns `FINAL_JUDGMENT` or when a hard cap (like `max_rounds`) is reached.
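To make the debater-side routing concrete, here is a minimal sketch of its decision rule, using the parameter names from this tutorial (`judge_interval_rounds`, `max_consec_by_same_role`). The real policy tracks this state in `router_meta` and is more involved; this is illustrative only:

```python
def next_hop(round_no: int, consec_same_role: int,
             judge_interval_rounds: int = 4,
             max_consec_by_same_role: int = 3) -> str:
    """Decide where the debater's output goes next (sketch).

    Escalate to the judge on a periodic cadence, or when one role has
    spoken too many times in a row; otherwise pass to the opponent.
    """
    if round_no > 0 and round_no % judge_interval_rounds == 0:
        return "judge"        # periodic review
    if consec_same_role >= max_consec_by_same_role:
        return "judge"        # safeguard against one side dominating
    return "opponent"         # normal circular flow
```

The same two conditions appear as tunable knobs later in this tutorial.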
3. Code and Policies: A Closer Look
This tutorial is self-contained. All the code for the model inferences and policies is located in this directory for easy reference. The policies are assumed to be pre-registered with AIOS.
Local Code and Policies
- Model Inference Code:
  - [Debater (`main_debate_simple.py`)](./model_inference/main_debate_simple.py)
  - [Judge (`main_judge_capped.py`)](./model_inference/main_judge_capped.py)
- Policy Code:
  - Preprocessing Summarizer
  - Postprocessing Debater Router
  - Postprocessing Judge Router
A Note on the Inference Code and Policies
It's important to understand how the inference code and policies collaborate.
Inference Code: The Python scripts (main_debate_simple.py and main_judge_capped.py) are intentionally minimal. Their main job is to:
1. Receive a request.
2. Format a prompt based on the input data.
3. Call the LLM for inference.
4. Return the raw output.
They use specific system messages to guide the behavior of the models:
- Judge System Message:
You are an impartial debate judge. You will be given TOPIC, ROUNDS, and the latest turns from A and B. Decide strictly by clarity, relevance to the topic/instruction, coherence, and factual plausibility. Return exactly: DECISION: CONTINUE_A|CONTINUE_B|FINAL_JUDGMENT If FINAL_JUDGMENT, also return: WINNER: A|B|DRAW REASON: <very short reason> Output only the specified fields with no extra text.
- Debater System Message:
You are a debate participant debater-A or role A in a router-orchestrated exchange. Provide a concise, on-topic argument that advances your side, directly addressing the latest message and also based on your past arguments. Do not declare a winner, do not judge, and do not ask who speaks next. Be specific, factual where possible, and avoid meta-comments or system/control tokens. Keep the response short and self-contained in 100 words.
Policies Calling LLMs: Policies can also execute their own inference calls. For instance, the preprocessing summarizer can call an LLM to generate a summary. This is typically done with a utility client inside the policy code that calls an "external" LLM service; that service could even be another AIOS Block optimized for summarization. This powerful feature lets you build complex, multi-model workflows where policies act as intelligent agents, preparing and routing data between specialized models. You can even reuse the same models for summarization and review via REST or gRPC calls.
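A minimal sketch of that pattern, assuming a generic OpenAI-style HTTP chat endpoint. The URL, payload shape, and response shape are assumptions for illustration, not the AIOS utility client:

```python
import json
import urllib.request

def build_summary_request(turns: list) -> dict:
    """Build an OpenAI-style chat payload asking for a debate summary."""
    transcript = "\n".join(f'{t["role"]}: {t["reply"]}' for t in turns)
    return {
        "messages": [
            {"role": "system",
             "content": "Summarize this debate so far in a few bullet points."},
            {"role": "user", "content": transcript},
        ]
    }

def summarize_turns(turns: list, llm_url: str) -> str:
    """Sketch: a policy calling an external LLM service for a running summary.

    `llm_url` and the OpenAI-style request/response shapes are assumptions;
    substitute your cluster's actual summarization endpoint and schema.
    """
    req = urllib.request.Request(
        llm_url,
        data=json.dumps(build_summary_request(turns)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

In the real summarizer policy, the returned summary would be written into `router_meta.running_summary` so the debaters can inject it as a system message.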
4. Circular vDAG specification
# This Python dictionary defines the entire circular vDAG.
# Notice how each node is a "model_inference" and specifies its pre- and post-processing policies.
# The "graph" structure below defines the static connections between the nodes for the initial request.
# However, the circular behavior (e.g., Debater -> Debater -> Judge -> Debater) is not defined here.
# Instead, it is dynamically managed by the post-processing router policies, which decide where to send the packet next based on the rules we've defined.
circular_vdag_spec = {
"parser_version": "Parser/V1",
"body": {
"spec": {
"values": {
"vdagName": "llm-circular-vdag-demo-17",
"vdagVersion": { "version": "1.0.0", "release-tag": "stable" },
"discoveryTags": ["vdag-llm", "circular-vdag"],
"controller": {},
"nodes": [
{
"spec": {
"values": {
"nodeLabel": "debater-A",
"nodeType": "block",
"manualBlockId": "llama4-scout-17b-block-circular",
"preprocessingPolicyRule": {"policyRuleURI": "preprocessing_policy_for_summarization:0.0.1-stable"},
"postprocessingPolicyRule": {"policyRuleURI": "postprocessing_policy_router_debater:0.0.1-stable"},
"modelParameters": {}
},
"IOMap": [
{
"inputs": [{ "name": "input_0", "reference": "input_0" }],
"outputs": [{ "name": "output_0", "reference": "output_0" }]
}
]
}
},
{
"spec": {
"values": {
"nodeLabel": "debater-B",
"nodeType": "block",
"manualBlockId": "magistral-small-2506-llama-cpp-block-circular",
"preprocessingPolicyRule": {"policyRuleURI": "preprocessing_policy_for_summarization:0.0.1-stable"},
"postprocessingPolicyRule": {"policyRuleURI": "postprocessing_policy_router_debater:0.0.1-stable"},
"modelParameters": {}
},
"IOMap": [
{
"inputs": [{ "name": "input_0", "reference": "input_0" }],
"outputs": [{ "name": "output_0", "reference": "output_0" }]
}
]
}
},
{
"spec": {
"values": {
"nodeLabel": "judge-llm",
"nodeType": "block",
"manualBlockId": "deepseek-r1-distill-70b-block-circular",
"preprocessingPolicyRule": {"policyRuleURI": "preprocessing_policy_for_summarization:0.0.1-stable"},
"postprocessingPolicyRule": {
"policyRuleURI": "postprocessing_policy_router_judge:0.0.1-stable"
},
"modelParameters": {}
},
"IOMap": [
{
"inputs": [{ "name": "input_0", "reference": "input_0" }],
"outputs": [{ "name": "output_0", "reference": "output_0" }]
}
]
}
}
],
"graph": {
"input": [
{
"nodeLabel": "debater-A",
"inputNames": [
"input_0"
]
}
],
"connections": [
{
"nodeLabel": "debater-B",
"inputs": [
{
"nodeLabel": "debater-A",
"outputNames": [
"output_0"
]
}
]
},
{
"nodeLabel": "judge-llm",
"inputs": [
{
"nodeLabel": "debater-B",
"outputNames": [
"output_0"
]
}
]
}
],
"output": [
{
"nodeLabel": "judge-llm",
"outputNames": [
"output_0"
]
}
]
}
}
}
}
}
5. The Debate Flow and History
The data packets flowing between the model_inferences have a canonical schema enforced by the policies. This ensures that each component gets the information it needs in a predictable format.
Canonical Packet Examples
Debater model_inference → Debater Router
The model output is simple. The router will use this to construct the next packet.
{
"reply": "Opening argument...",
"prev_role": "A",
"topic": "Is remote work more productive?",
"session_id": "debate_207",
"router_meta": {
"running_summary": "...",
"recent_turns": [{"role":"A","reply":"Opening argument..."}]
}
}
Debater Router → Opponent
The router transforms the packet into the canonical format for the next debater.
{
"prev_turn_text": "Opening argument...",
"prev_turn_role": "A",
"receiver_role": "B",
"topic": "Is remote work more productive?",
"session_id": "debate_207",
"router_meta": {"recent_turns": [{"role":"A","text":"Opening argument..."}]}
}
Debater Router → Judge (on escalation)
When it's time for the judge to review, the packet contains the history needed to make a decision.
{
"topic": "Is remote work more productive?",
"session_id": "debate_207",
"router_meta": {"router_counts": {"A": 2, "B": 2}, "recent_turns": [/* ... */]}
}
Judge model_inference → Judge Router
The judge outputs a decision, which the judge's router will interpret.
{
"judge_text": "... DECISION: CONTINUE_A",
"opponent_last": "...",
"bump_round": true,
"topic": "Is remote work more productive?",
"session_id": "debate_207",
"router_meta": {"router_counts": {"A": 2, "B": 2}, "recent_turns": [/* ... */]}
}
Judge Router → A/B (on continue)
If the judge decides to continue, the router sends a packet to the appropriate debater.
{
"prev_turn_text": "Opponent last reply...",
"prev_turn_role": "B",
"receiver_role": "A",
"topic": "Is remote work more productive?",
"session_id": "debate_207"
}
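Because every hop relies on this canonical schema, it is easy to check packets mechanically. The field names below are taken from the examples above; the validator itself is an illustrative sketch, not part of the tutorial's policy code:

```python
# Canonical fields a debater expects on an incoming packet
# (field names from the packet examples above).
REQUIRED_DEBATER_FIELDS = {
    "prev_turn_text",   # opponent's last argument
    "prev_turn_role",   # who said it ("A" or "B")
    "receiver_role",    # who should respond next
    "topic",
    "session_id",
}

def missing_fields(packet: dict) -> set:
    """Return which canonical fields a debater-bound packet is missing."""
    return REQUIRED_DEBATER_FIELDS - packet.keys()
```

A check like this at the top of a router policy makes schema drift fail fast instead of producing confusing model prompts downstream.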
6. Knobs to Tune for More Control
The behavior of the debate is controlled by parameters compiled into the policies. You can adjust these to change the dynamics of the conversation.
- Preprocessing Summarizer (`preprocessing_policy_for_summarization:0.0.1-stable`)
  - `summarize_every_n_messages` (default 3): Controls the frequency of summarization. A lower number means more frequent summaries, which provides better context but increases computational overhead.
  - `history_max_messages` (default 3): Defines the size of the sliding window of conversation turns used for the summary. A larger window provides more context to the summarizer model.
  - `min_tokens_for_summarization` (default 300): A gate to prevent summarizing very short exchanges, saving resources.
  - `include_last_summary_in_prompt` (true): Enables chained summaries, where the previous summary is included in the prompt for the next one, creating a continuous thread of context.
- Debater Router (`postprocessing_policy_router_debater:0.0.1-stable`)
  - `review_threshold` (default 0.5): An optional quality gate. If the model output includes a confidence score, the router can use this threshold to decide whether to accept the response or retry. (Note: this is not used in the current simple debater.)
  - `judge_interval_rounds` (default 4): Sets the cadence for periodic review by the judge. After this many rounds, the debate is automatically escalated to the judge.
  - `max_consec_by_same_role` (default 3): A safety measure to prevent one debater from making multiple consecutive arguments, ensuring a balanced conversation.
- Judge Router (`postprocessing_policy_router_judge:0.0.1-stable`)
  - `max_rounds` (default 20): A hard cap on the total number of rounds in the debate. This acts as a failsafe to prevent infinite loops and control costs.
  - `judge_continue_cap` (default 5): Limits the number of times the judge can return a `CONTINUE` decision. After this cap is reached, the judge is forced to make a `FINAL_JUDGMENT`, ensuring the debate concludes.
Tip: Watch the pod logs to see the exact prompts being constructed, especially when summaries are used.
7. Deploying and Managing the vDAG
The following commands show how to register the vDAG, create a controller, and run inference.
Note: Adjust the IP addresses to match your cluster's endpoints.
a. Create the vDAG with the AIOS createvDAG Endpoint
import requests
createvDAG_URL = "http://MANAGEMENTMASTER:30501/api/createvDAG"
response = requests.post(createvDAG_URL, json=circular_vdag_spec)
print(f"Parser Response Status: {response.status_code}")
print('Parser Response Body:', response.json())
Parser Response Status: 200
Parser Response Body: {'result': {'task_id': 'a572ebf5-2220-43ed-87be-7df0dd397e37', 'vdagURI': 'llm-circular-vdag-demo-17:1.0.0-stable'}, 'success': True, 'task_id': ''}
b. Verify the vDAG is registered
!curl -X GET http://MANAGEMENTMASTER:30103/vdag/llm-circular-vdag-demo-17:1.0.0-stable | json_pp
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3226 100 3226 0 0 499k 0 --:--:-- --:--:-- --:--:-- 525k
{
"data" : {
"assignment_info" : {
"debater-A" : "llama4-scout-17b-block-circular",
"debater-B" : "magistral-small-2506-llama-cpp-block-circular",
"judge-llm" : "deepseek-r1-distill-70b-block-circular"
},
"compiled_graph_data" : {
"head" : "llama4-scout-17b-block-circular",
"rev_mapping" : {
"deepseek-r1-distill-70b-block-circular" : "judge-llm",
"llama4-scout-17b-block-circular" : "debater-A",
"magistral-small-2506-llama-cpp-block-circular" : "debater-B"
},
"t2_graph" : {
"deepseek-r1-distill-70b-block-circular" : [],
"llama4-scout-17b-block-circular" : [
"magistral-small-2506-llama-cpp-block-circular"
],
"magistral-small-2506-llama-cpp-block-circular" : [
"deepseek-r1-distill-70b-block-circular"
]
},
"t3_graph" : {
"deepseek-r1-distill-70b-block-circular" : {
"outputs" : []
},
"llama4-scout-17b-block-circular" : {
"outputs" : [
{
"block_id" : "magistral-small-2506-llama-cpp-block-circular",
"host" : "magistral-small-2506-llama-cpp-block-circular-executor-svc.blocks.svc.cluster.local",
"port" : 6379,
"queue_name" : "magistral-small-2506-llama-cpp-block-circular_inputs"
}
]
},
"magistral-small-2506-llama-cpp-block-circular" : {
"outputs" : [
{
"block_id" : "deepseek-r1-distill-70b-block-circular",
"host" : "deepseek-r1-distill-70b-block-circular-executor-svc.blocks.svc.cluster.local",
"port" : 6379,
"queue_name" : "deepseek-r1-distill-70b-block-circular_inputs"
}
]
}
},
"tail" : [
"deepseek-r1-distill-70b-block-circular"
]
},
"controller" : {
"initParameters" : {},
"initSettings" : {},
"inputSources" : [],
"policies" : []
},
"discoveryTags" : [
"vdag-llm",
"circular-vdag"
],
"graph" : {
"connections" : [
{
"inputs" : [
{
"nodeLabel" : "debater-A",
"outputNames" : [
"output_0"
]
}
],
"nodeLabel" : "debater-B"
},
{
"inputs" : [
{
"nodeLabel" : "debater-B",
"outputNames" : [
"output_0"
]
}
],
"nodeLabel" : "judge-llm"
}
],
"input" : [
{
"inputNames" : [
"input_0"
],
"nodeLabel" : "debater-A"
}
],
"output" : [
{
"nodeLabel" : "judge-llm",
"outputNames" : [
"output_0"
]
}
]
},
"metadata" : {},
"nodes" : [
{
"IOMap" : [],
"assignmentPolicyRule" : {},
"inputProtocol" : {},
"manualBlockId" : "llama4-scout-17b-block-circular",
"modelParameters" : {},
"nodeLabel" : "debater-A",
"nodeType" : "block",
"outputProtocol" : {},
"postprocessingPolicyRule" : {
"policyRuleURI" : "postprocessing_policy_router_debater:0.0.1-stable"
},
"preprocessingPolicyRule" : {
"policyRuleURI" : "preprocessing_policy_for_summarization:0.0.1-stable"
},
"vdagURI" : ""
},
{
"IOMap" : [],
"assignmentPolicyRule" : {},
"inputProtocol" : {},
"manualBlockId" : "magistral-small-2506-llama-cpp-block-circular",
"modelParameters" : {},
"nodeLabel" : "debater-B",
"nodeType" : "block",
"outputProtocol" : {},
"postprocessingPolicyRule" : {
"policyRuleURI" : "postprocessing_policy_router_debater:0.0.1-stable"
},
"preprocessingPolicyRule" : {
"policyRuleURI" : "preprocessing_policy_for_summarization:0.0.1-stable"
},
"vdagURI" : ""
},
{
"IOMap" : [],
"assignmentPolicyRule" : {},
"inputProtocol" : {},
"manualBlockId" : "deepseek-r1-distill-70b-block-circular",
"modelParameters" : {},
"nodeLabel" : "judge-llm",
"nodeType" : "block",
"outputProtocol" : {},
"postprocessingPolicyRule" : {
"policyRuleURI" : "postprocessing_policy_router_judge:0.0.1-stable"
},
"preprocessingPolicyRule" : {
"policyRuleURI" : "preprocessing_policy_for_summarization:0.0.1-stable"
},
"vdagURI" : ""
}
],
"status" : "assigned",
"vdagURI" : "llm-circular-vdag-demo-17:1.0.0-stable",
"vdag_name" : "llm-circular-vdag-demo-17",
"vdag_version" : {
"release-tag" : "stable",
"version" : "1.0.0"
}
},
"success" : true
}
You can also confirm that each underlying block is healthy via its block health endpoint:
http://MANAGEMENTMASTER:30201/block/health/llama4-scout-17b-block-circular
http://MANAGEMENTMASTER:30201/block/health/magistral-small-2506-llama-cpp-block-circular
http://MANAGEMENTMASTER:30201/block/health/deepseek-r1-distill-70b-block-circular
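A small convenience loop over those health endpoints (a sketch: `MANAGEMENTMASTER` must be replaced with your management node's address, and nothing is assumed about the response body beyond the HTTP status):

```python
import requests

BLOCK_IDS = [
    "llama4-scout-17b-block-circular",
    "magistral-small-2506-llama-cpp-block-circular",
    "deepseek-r1-distill-70b-block-circular",
]

def check_block_health(base_url: str, block_ids: list) -> dict:
    """Return {block_id: True/False} based on the health endpoint's HTTP status."""
    results = {}
    for block_id in block_ids:
        try:
            resp = requests.get(f"{base_url}/block/health/{block_id}", timeout=10)
            results[block_id] = resp.ok
        except requests.exceptions.RequestException:
            results[block_id] = False
    return results

# Example (replace MANAGEMENTMASTER with your management node's address):
# check_block_health("http://MANAGEMENTMASTER:30201", BLOCK_IDS)
```

If any block reports unhealthy, fix that before creating the controller; the vDAG cannot route packets through a block that is down.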
c. Create a vDAG Controller to deploy the pipeline
This command tells AIOS to deploy a controller that uses the blocks already running for our vDAG. For more information, see the vDAG controller documentation.
%%bash
curl -X POST http://MANAGEMENTMASTER:30600/vdag-controller/gcp-cluster-2 \
-H "Content-Type: application/json" \
-d '{
"action": "create_controller",
"payload": {
"vdag_controller_id": "llm-circular-vdag-demo-17",
"vdag_uri": "llm-circular-vdag-demo-17:1.0.0-stable",
"config": {
"policy_execution_mode": "local",
"replicas": 1
},
"search_tags": []
}
}'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 354 100 58 100 296 557 2844 --:--:-- --:--:-- --:--:-- 3403
{"data":"Controller created successfully","success":true}
d. Get the controller details
!curl -X GET http://MANAGEMENTMASTER:30103/vdag-controller/llm-circular-vdag-demo-17 | json_pp
8. Running and Inspecting the Debate
Once the controller is active, you can start a debate by sending a request to the first model_inference in the chain (debater-A).
- Trigger the first turn: POST to the `v1/infer` endpoint with a `session_id` and `topic`.
- Observe the logs: The most valuable insights come from watching the logs of the pods.
  - Look for the summarizer's output when it fires.
  - Inspect the debater and judge prompts, which are logged in detail.
import requests
import json
# Example: Start a debate on remote work
# We make the request directly in Python to avoid shell escaping issues.
INFERENCE_URL = "http://CLUSTER1MASTER:32647/v1/infer"
debate_payload = {
"model": "llama4-scout-17b-block-circular",
"session_id": "debate_220",
"seq_no": 1,
"data": {
"mode": "chat",
"message": "Please begin with your opening.",
# "session_id": "debate_212",
# "topic": "Debate for atleast 20 rounds,Propose a chain-snatching action recognition system from traffic CCTV, integrating detector+tracker+temporal model; target ≥0.85 F1 with FPPI ≤0.05/hr; include dataset plan and domain adaptation.?"
"topic": "Debate for atleast 20 rounds,Design an end-to-end real-time computer vision pipeline for multi-camera loitering and chain-snatching detection in urban CCTV. Constraints: 1080p@15–25 FPS, ≤300 ms alert latency, ≥90% recall at FPPI ≤0.1/hr/camera, 200 cameras on mixed Jetson Orin/T4 edge nodes, intermittent connectivity. Specify: detection (persons/riders/hand–object), MOT + re-ID, dwell-time estimation, action recognition, geo-fencing, alerting; model choices (e.g., RT-DETR/YOLOv8 vs lightweight MobileNet-SSD), trackers (ByteTrack/OC-SORT), re-ID (FastReID), temporal models; data/labeling plan and hard-negative mining; robustness (night/rain/occlusion/domain shift); privacy (on-device blur/redaction) and bias checks; monitoring/drift/A/B; throughput and GPU/CPU/power budgets; accuracy–latency trade-offs and fallback modes.Propose an evaluation plan and benchmarks (mAP, IDF1, FPPI, E2E latency) plus synthetic stress tests (night/rain/crowds) for the above pipelines; include A/B and rollout strategy."
},
"graph": {},
"selection_query": {}
}
debate_output = None
try:
response = requests.post(INFERENCE_URL, json=debate_payload, timeout=600) # Added a long timeout for long debates
response.raise_for_status()
debate_output = response.json()
print("Inference request successful. The raw JSON output is stored in 'debate_output'.")
except requests.exceptions.RequestException as e:
print(f"Failed to get response from inference service: {e}")
except json.JSONDecodeError:
print("Failed to parse JSON from response. Raw text:")
print(response.text)
Inference request successful. The raw JSON output is stored in 'debate_output'.
print(debate_output)
{'data': {'bump_round': True, 'judge_text': "Alright, so I'm trying to design an end-to-end real-time computer vision pipeline for detecting loitering and chain-snatching incidents using urban CCTV cameras. The constraints are pretty tight: 1080p resolution at 15–25 FPS, maximum alert latency of 300 ms, and a high recall rate of at least 90% with a low false positive rate (FPPI ≤0.1/hr/camera). Plus, I have to handle 200 cameras on a mix of Jetson Orin and T4 edge nodes, which means I need to be mindful of computational resources and potential connectivity issues.\n\nFirst, I need to break down the problem into manageable components. The pipeline should include detection of persons, riders, and hand-object interactions, multi-object tracking (MOT) with re-identification (re-ID), dwell-time estimation, action recognition, geo-fencing, and alerting. Each of these components will require careful selection of models and algorithms to meet the performance and latency constraints.\n\nFor detection, I'm considering models like RT-DETR and YOLOv8. RT-DETR is known for its accuracy, especially in detecting small objects, which is crucial for identifying something like a chain being snatched. However, YOLOv8 might offer better speed, which is essential for real-time processing. I need to weigh the trade-offs between accuracy and speed here.\n\nNext, for MOT, options like ByteTrack and OC-SORT come to mind. ByteTrack is lightweight and efficient, which is great for edge devices, but OC-SORT might handle crowded scenes better, which is common in urban areas. I need to think about which is more critical: handling crowds or keeping the computational load low.\n\nRe-ID is another important aspect. FastReID is a solid choice, but I might need a lightweight version to ensure it doesn't bog down the system. Maybe a distilled version could maintain accuracy while reducing computational demands.\n\nDwell-time estimation can be approached with a Kalman filter or a Gaussian process. 
Kalman filters are computationally efficient, which is a plus for edge devices, but Gaussian processes might offer better accuracy, which could be critical for correctly identifying loitering behavior.\n\nAction recognition is where I'm a bit stuck. A Graph Convolutional Network (GCN) could model complex interactions between objects, which might be beneficial for understanding the dynamics of a chain-snatching incident. On the other hand, a lightweight 3D CNN could process spatial and temporal features simultaneously, which is advantageous for real-time action recognition.\n\nGeo-fencing and alerting need to be integrated seamlessly. I should consider how to define geo-fences and ensure that alerts are generated and transmitted efficiently, especially with intermittent connectivity.\n\nRobustness is a big concern. The system needs to handle various environmental conditions like night, rain, and occlusions. Techniques like data augmentation, hard-negative mining, domain adaptation, and adversarial training could help improve robustness. Privacy is another aspect; on-device blurring or redaction will be necessary to protect individuals' privacy.\n\nMonitoring for drift and bias is essential. I need to implement checks to ensure the system doesn't develop biases over time and that its performance doesn't degrade due to concept drift. A/B testing could help compare different model configurations and ensure the system remains accurate and reliable.\n\nIn terms of evaluation, I'll need to use metrics like mAP, IDF1, FPPI, and E2E latency to assess the system's performance comprehensively. Synthetic stress tests simulating night, rain, and crowded conditions will help validate the system's robustness in challenging scenarios.\n\nThe rollout strategy should be phased, starting with a small-scale deployment to test the system in real-world conditions before scaling up to all 200 cameras. 
This approach will help identify and address any issues early on.\n\nOverall, I'm leaning towards using YOLOv8 for detection due to its speed-accuracy balance, OC-SORT for MOT in crowded areas, a lightweight 3D CNN for action recognition to handle real-time processing, and a Kalman filter for dwell-time estimation to keep things efficient on edge devices. I'll also prioritize domain adaptation and adversarial training for robustness and ensure privacy measures are in place. The evaluation plan will focus on both accuracy and latency metrics, with thorough stress testing to ensure reliability under various conditions.\n</think>\n\nDECISION: FINAL_JUDGMENT \nWINNER: B \nREASON: B's pipeline offers a better balance of speed and accuracy, crucial for real-time processing, with efficient models like YOLOv8 and OC-SORT, and robust techniques for action recognition and dwell-time estimation.", 'opponent_last': '', 'router_meta': {'recent_turns': [{'reply': 'I propose a multi-stage chain-snatching action recognition system from traffic CCTV footage. The system integrates a detector, tracker, and temporal model to accurately identify and classify chain-snatching incidents. The detector, based on YOLOv4, will identify potential chain-snatching events. The tracker, utilizing DeepSORT, will then track the movement of individuals involved. A temporal model, such as a Graph Convolutional Network (GCN), will analyze the tracked sequences to recognize the chain-snatching action. Our goal is to achieve ≥0.85 F1 score with ≤0.05/hr FPPI. We plan to use a dataset of 10,000 annotated CCTV clips and apply domain adaptation techniques, such as adversarial training.', 'role': 'A'}, {'reply': 'I agree on the multi-stage approach, but propose a more robust system. Instead of YOLOv4, I suggest using CenterNet for the detector due to its superior performance in detecting small objects, which is crucial for chain-snatching incidents where the chain is often small and fast-moving. 
For the tracker, I recommend using FairMOT, which has shown excellent performance in crowded scenes, a common scenario in traffic CCTV footage. The temporal model should be a 3D CNN, which is better suited for action recognition than GCN. Our dataset should include 15,000 annotated clips to ensure diversity and robustness. Domain adaptation should involve style transfer and adversarial training to handle variations in lighting and camera angles.', 'role': 'B'}, {'reply': "For real-time multi-camera loitering and chain-snatching detection, I propose a pipeline integrating RT-DETR for detection, ByteTrack for MOT, and a temporal model for action recognition. For re-ID, I suggest using FastReID. To meet constraints, we'll use Jetson Orin for compute-intensive tasks and T4 for others. Our detection model will identify persons, riders, and hand-object interactions. Dwell-time estimation will be based on tracked trajectories. Geo-fencing and alerting will be implemented using a separate module. To ensure robustness, we'll use data augmentation, hard-negative mining, and domain adaptation. For privacy, on-device blur/redaction will be applied. Evaluation will include mAP, IDF1, FPPI, and E2E latency benchmarks, with synthetic stress tests for night, rain, and crowds. A/B testing and monitoring will ensure accuracy and drift detection.", 'role': 'A'}, {'reply': "I propose a more efficient pipeline using YOLOv8 for detection, OC-SORT for MOT, and a lightweight 3D CNN for action recognition. YOLOv8 offers better speed-accuracy trade-offs than RT-DETR, crucial for real-time processing. OC-SORT is more robust in crowded scenes than ByteTrack. For re-ID, I suggest using a lightweight version of FastReID to reduce computational load. To meet constraints, we'll distribute tasks across Jetson Orin and T4 nodes based on their compute capabilities. Dwell-time estimation will be done using a Kalman filter for smoother trajectories. 
Geo-fencing and alerting will be integrated into the pipeline for efficiency. For robustness, we'll use a combination of data augmentation, hard-negative mining, and adversarial training. Evaluation will include mAP, IDF1, FPPI, and E2E latency benchmarks, with synthetic stress tests for night, rain, and crowds. A/B testing and monitoring will ensure accuracy and drift detection.", 'role': 'B'}, {'reply': 'I agree on using YOLOv8 and OC-SORT, but argue that a Graph Convolutional Network (GCN) is more suitable for action recognition than a 3D CNN, as it can model complex interactions between objects. For dwell-time estimation, I suggest using a more accurate method, such as a Gaussian process, instead of a Kalman filter. To ensure robustness, I recommend incorporating additional techniques, such as domain adaptation and style transfer, to handle variations in lighting and camera angles. Our evaluation plan should prioritize recall and FPPI metrics, as false negatives and positives have significant consequences in loitering and chain-snatching detection.', 'role': 'A'}, {'reply': 'While GCNs can model complex interactions, 3D CNNs are more efficient for action recognition in real-time systems, as they process temporal and spatial features simultaneously. For dwell-time estimation, Kalman filters are computationally lighter and more suitable for edge devices like Jetson Orin/T4, ensuring lower latency. Domain adaptation and style transfer are valuable, but adversarial training provides better robustness against adversarial attacks and domain shifts. Our evaluation plan should also prioritize E2E latency and IDF1, as they are crucial for real-time performance and tracking accuracy.', 'role': 'B'}], 'router_counts': {'B': 3}, 'running_summary': '- A: Proposes a multi-stage system for real-time chain-snatching and loitering detection using RT-DETR for detection, ByteTrack for MOT, and a temporal model for action recognition. 
Recommends RT-DETR for detection, ByteTrack for tracking, and a temporal model for action recognition. Suggests using Jetson Orin and T4 for compute distribution, FastReID for re-ID, and Gaussian process for dwell-time estimation. Emphasizes robustness through data augmentation, domain adaptation, and style transfer. Evaluation includes mAP, IDF1, FPPI, and E2E latency, with synthetic stress tests and A/B testing.\n- B: Proposes a more efficient pipeline using YOLOv8 for detection, OC-SORT for MOT, and a lightweight 3D CNN for action recognition. Advocates for YOLOv8 and OC-SORT for better speed-accuracy trade-offs and robustness in crowded scenes. Recommends a lightweight FastReID for re-ID and a Kalman filter for dwell-time estimation. Suggests domain adaptation, adversarial training, and synthetic stress tests for robustness. Evaluation includes mAP, IDF1, FPPI, and E2E latency, with A/B testing and monitoring.\n\nBoth A and B agree on using YOLOv8 and OC-SORT for real-time efficiency and robustness, but differ on the choice of action recognition model (GCN vs. 3D CNN) and dwell-time estimation method (Gaussian process vs. Kalman filter). Both emphasize domain adaptation and comprehensive evaluation metrics.'}, 'session_id': 'vdag::llm-circular-vdag-demo-17:1.0.0-stable::debate_220', 'topic': 'Debate for atleast 20 rounds,Design an end-to-end real-time computer vision pipeline for multi-camera loitering and chain-snatching detection in urban CCTV. Constraints: 1080p@15–25 FPS, ≤300 ms alert latency, ≥90% recall at FPPI ≤0.1/hr/camera, 200 cameras on mixed Jetson Orin/T4 edge nodes, intermittent connectivity. 
Specify: detection (persons/riders/hand–object), MOT + re-ID, dwell-time estimation, action recognition, geo-fencing, alerting; model choices (e.g., RT-DETR/YOLOv8 vs lightweight MobileNet-SSD), trackers (ByteTrack/OC-SORT), re-ID (FastReID), temporal models; data/labeling plan and hard-negative mining; robustness (night/rain/occlusion/domain shift); privacy (on-device blur/redaction) and bias checks; monitoring/drift/A/B; throughput and GPU/CPU/power budgets; accuracy–latency trade-offs and fallback modes.Propose an evaluation plan and benchmarks (mAP, IDF1, FPPI, E2E latency) plus synthetic stress tests (night/rain/crowds) for the above pipelines; include A/B and rollout strategy.'}, 'seq_no': 1, 'session_id': 'debate_220', 'ts': 1755766184.35858}
```python
import json
import re

# The 'debate_output' variable now holds the Python dictionary from the request.
try:
    # Check if the request was successful and debate_output is a dictionary
    if debate_output and isinstance(debate_output, dict):
        # Extract the main data object from the response
        data = debate_output.get("data", {})
        # The debate history is in router_meta.recent_turns
        recent_turns = data.get("router_meta", {}).get("recent_turns", [])
        topic = data.get("topic", "N/A")

        print("=" * 50)
        print(" DEBATE REPLAY")
        print("=" * 50)
        print(f"Topic: {topic}\n")

        if not recent_turns:
            print("No turns found in the output. The debate may have ended immediately or an error occurred.")
            print("\nRaw Output:")
            print(json.dumps(debate_output, indent=2))
        else:
            for i, turn in enumerate(recent_turns):
                role = turn.get("role", "Unknown")
                reply = turn.get("reply", turn.get("text", "No content"))
                print(f"--- Turn {i+1}: Role '{role}' ---")
                print(reply)
                print("-" * (22 + len(role)))

        # Display the final judgment if available
        judge_text = data.get("judge_text", "")
        if "FINAL_JUDGMENT" in judge_text:
            # Use regex to extract the relevant parts of the judge's decision
            winner_match = re.search(r"WINNER: (A|B|DRAW)", judge_text)
            reason_match = re.search(r"REASON: (.*)", judge_text, re.DOTALL)
            winner = winner_match.group(1) if winner_match else "Not specified"
            reason = reason_match.group(1).strip() if reason_match else "Not specified"
            print("\n" + "=" * 50)
            print(" FINAL JUDGMENT")
            print("=" * 50)
            print("Decision: FINAL_JUDGMENT")
            print(f"Winner: {winner}")
            print(f"Reason: {reason}")
            print("=" * 50)
    else:
        print("Debate output not available or is in an incorrect format.")
        if debate_output:
            print("\nRaw Output Received:")
            print(json.dumps(debate_output, indent=2))
except Exception as e:
    print(f"An error occurred while processing the debate output: {e}")
    if debate_output:
        print("\nRaw Output Received:")
        print(json.dumps(debate_output, indent=2))
```
==================================================
DEBATE REPLAY
==================================================
Topic: Debate for atleast 20 rounds,Design an end-to-end real-time computer vision pipeline for multi-camera loitering and chain-snatching detection in urban CCTV. Constraints: 1080p@15–25 FPS, ≤300 ms alert latency, ≥90% recall at FPPI ≤0.1/hr/camera, 200 cameras on mixed Jetson Orin/T4 edge nodes, intermittent connectivity. Specify: detection (persons/riders/hand–object), MOT + re-ID, dwell-time estimation, action recognition, geo-fencing, alerting; model choices (e.g., RT-DETR/YOLOv8 vs lightweight MobileNet-SSD), trackers (ByteTrack/OC-SORT), re-ID (FastReID), temporal models; data/labeling plan and hard-negative mining; robustness (night/rain/occlusion/domain shift); privacy (on-device blur/redaction) and bias checks; monitoring/drift/A/B; throughput and GPU/CPU/power budgets; accuracy–latency trade-offs and fallback modes.Propose an evaluation plan and benchmarks (mAP, IDF1, FPPI, E2E latency) plus synthetic stress tests (night/rain/crowds) for the above pipelines; include A/B and rollout strategy.
--- Turn 1: Role 'A' ---
I propose a multi-stage chain-snatching action recognition system from traffic CCTV footage. The system integrates a detector, tracker, and temporal model to accurately identify and classify chain-snatching incidents. The detector, based on YOLOv4, will identify potential chain-snatching events. The tracker, utilizing DeepSORT, will then track the movement of individuals involved. A temporal model, such as a Graph Convolutional Network (GCN), will analyze the tracked sequences to recognize the chain-snatching action. Our goal is to achieve ≥0.85 F1 score with ≤0.05/hr FPPI. We plan to use a dataset of 10,000 annotated CCTV clips and apply domain adaptation techniques, such as adversarial training.
-----------------------
--- Turn 2: Role 'B' ---
I agree on the multi-stage approach, but propose a more robust system. Instead of YOLOv4, I suggest using CenterNet for the detector due to its superior performance in detecting small objects, which is crucial for chain-snatching incidents where the chain is often small and fast-moving. For the tracker, I recommend using FairMOT, which has shown excellent performance in crowded scenes, a common scenario in traffic CCTV footage. The temporal model should be a 3D CNN, which is better suited for action recognition than GCN. Our dataset should include 15,000 annotated clips to ensure diversity and robustness. Domain adaptation should involve style transfer and adversarial training to handle variations in lighting and camera angles.
-----------------------
--- Turn 3: Role 'A' ---
For real-time multi-camera loitering and chain-snatching detection, I propose a pipeline integrating RT-DETR for detection, ByteTrack for MOT, and a temporal model for action recognition. For re-ID, I suggest using FastReID. To meet constraints, we'll use Jetson Orin for compute-intensive tasks and T4 for others. Our detection model will identify persons, riders, and hand-object interactions. Dwell-time estimation will be based on tracked trajectories. Geo-fencing and alerting will be implemented using a separate module. To ensure robustness, we'll use data augmentation, hard-negative mining, and domain adaptation. For privacy, on-device blur/redaction will be applied. Evaluation will include mAP, IDF1, FPPI, and E2E latency benchmarks, with synthetic stress tests for night, rain, and crowds. A/B testing and monitoring will ensure accuracy and drift detection.
-----------------------
--- Turn 4: Role 'B' ---
I propose a more efficient pipeline using YOLOv8 for detection, OC-SORT for MOT, and a lightweight 3D CNN for action recognition. YOLOv8 offers better speed-accuracy trade-offs than RT-DETR, crucial for real-time processing. OC-SORT is more robust in crowded scenes than ByteTrack. For re-ID, I suggest using a lightweight version of FastReID to reduce computational load. To meet constraints, we'll distribute tasks across Jetson Orin and T4 nodes based on their compute capabilities. Dwell-time estimation will be done using a Kalman filter for smoother trajectories. Geo-fencing and alerting will be integrated into the pipeline for efficiency. For robustness, we'll use a combination of data augmentation, hard-negative mining, and adversarial training. Evaluation will include mAP, IDF1, FPPI, and E2E latency benchmarks, with synthetic stress tests for night, rain, and crowds. A/B testing and monitoring will ensure accuracy and drift detection.
-----------------------
--- Turn 5: Role 'A' ---
I agree on using YOLOv8 and OC-SORT, but argue that a Graph Convolutional Network (GCN) is more suitable for action recognition than a 3D CNN, as it can model complex interactions between objects. For dwell-time estimation, I suggest using a more accurate method, such as a Gaussian process, instead of a Kalman filter. To ensure robustness, I recommend incorporating additional techniques, such as domain adaptation and style transfer, to handle variations in lighting and camera angles. Our evaluation plan should prioritize recall and FPPI metrics, as false negatives and positives have significant consequences in loitering and chain-snatching detection.
-----------------------
--- Turn 6: Role 'B' ---
While GCNs can model complex interactions, 3D CNNs are more efficient for action recognition in real-time systems, as they process temporal and spatial features simultaneously. For dwell-time estimation, Kalman filters are computationally lighter and more suitable for edge devices like Jetson Orin/T4, ensuring lower latency. Domain adaptation and style transfer are valuable, but adversarial training provides better robustness against adversarial attacks and domain shifts. Our evaluation plan should also prioritize E2E latency and IDF1, as they are crucial for real-time performance and tracking accuracy.
-----------------------
==================================================
FINAL JUDGMENT
==================================================
Decision: FINAL_JUDGMENT
Winner: B
Reason: B's pipeline offers a better balance of speed and accuracy, crucial for real-time processing, with efficient models like YOLOv8 and OC-SORT, and robust techniques for action recognition and dwell-time estimation.
==================================================
9. Observability: Kubernetes Dashboard and Logs
- Kubernetes Dashboard (if enabled): open https://CLUSTER1MASTER:32319/ in a browser.
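The dashboard gives a cluster-wide view, but the fastest way to watch the agents interact is to tail the block pods' logs directly. A minimal sketch, assuming `kubectl` access to the cluster; the namespace and pod names below are placeholders — substitute the ones your deployment actually uses:

```bash
# List the pods backing the vDAG blocks (replace <namespace> with yours).
kubectl get pods -n <namespace>

# Follow one block's logs in real time, e.g. Debater A's pod.
kubectl logs -f <debater-a-pod-name> -n <namespace>

# Show only the most recent activity of the judge's pod.
kubectl logs --tail=100 <judge-pod-name> -n <namespace>
```

With all three pods tailed in separate terminals, you can watch each turn of the debate hop from debater to debater and finally to the judge.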
10. Troubleshooting
- Missing summaries: Check the `summarize_every_n_messages` and `min_tokens_for_summarization` gates. Short conversations won't trigger summaries.
- Unexpected judge finalization: The `max_rounds` or `judge_continue_cap` limit was likely hit. The judge router has a "force-finalize" handshake to guarantee termination.
- Schema mismatches: Ensure all components use the canonical fields; legacy fields are ignored.
- Near-duplicate turns in the summarizer window: Turns whose role or text differs only slightly can slip past deduplication. The current deduplication is role-aware and whitespace-normalized, so only exact matches (after whitespace normalization) are dropped.
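The deduplication behavior described above can be sketched as follows. This is a hypothetical illustration of role-aware, whitespace-normalized matching — the actual logic lives in the summarizer policy script:

```python
import re

def normalize(text: str) -> str:
    # Collapse every run of whitespace to a single space and trim the ends.
    return re.sub(r"\s+", " ", text or "").strip()

def dedupe_turns(turns):
    """Keep only the first occurrence of each (role, normalized text) pair."""
    seen = set()
    deduped = []
    for turn in turns:
        key = (turn.get("role"), normalize(turn.get("reply", turn.get("text", ""))))
        if key not in seen:
            seen.add(key)
            deduped.append(turn)
    return deduped
```

Because matching is exact after whitespace normalization, two turns that differ by a single word are treated as distinct — which is why near-duplicates can still appear in the summarizer window.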
11. Cleanup
Always remove controllers and VDAGs that are no longer in use to free up cluster resources.
a. Remove the controller
```bash
%%bash
curl -X POST http://MANAGEMENTMASTER:30600/vdag-controller/gcp-cluster-2 \
  -H "Content-Type: application/json" \
  -d '{
    "action": "remove_controller",
    "payload": {
      "vdag_controller_id": "llm-circular-vdag-demo-17"
    }
  }'
```
{"data":"Controller removed successfully","success":true}
b. Delete the vDAG definition
```bash
%%bash
curl -X DELETE http://MANAGEMENTMASTER:30103/vdag/llm-circular-vdag-demo-17:1.0.0-stable
```
{"data":{"message":"vDAG deleted"},"success":true}
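The two curl calls above can also be issued from Python, which is convenient when cleanup is part of a notebook or script. A minimal sketch using only the standard library — the helper names are hypothetical, and the hosts, ports, and paths simply mirror the curl examples:

```python
import json
import urllib.request

def build_remove_controller_request(host: str, cluster_id: str, controller_id: str):
    """Build the POST request that removes a vDAG controller."""
    url = f"http://{host}:30600/vdag-controller/{cluster_id}"
    body = json.dumps({
        "action": "remove_controller",
        "payload": {"vdag_controller_id": controller_id},
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def build_delete_vdag_request(host: str, vdag_uri: str):
    """Build the DELETE request that removes the vDAG definition."""
    return urllib.request.Request(f"http://{host}:30103/vdag/{vdag_uri}", method="DELETE")

# Sending is left to the caller, e.g.:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read()))
```

Separating request construction from sending keeps the helpers testable without a live cluster.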
12. Circular vDAG Summary
Here is a visual summary of the circular vDAG we have built in this tutorial. This diagram shows how the Debaters and the Judge interact, with the policies routing the conversation between them in a loop until a final judgment is reached.
