In this tutorial, we explore how we can orchestrate a team of specialized AI agents locally using an efficient manager-agent architecture powered by TinyLlama. We walk through how we build structured task decomposition, inter-agent collaboration, and autonomous reasoning loops without relying on any external APIs. By running everything directly through the transformers library, we create a fully offline, lightweight, and transparent multi-agent system that we can customize, inspect, and extend. Through the snippets, we observe how each component, from task structures to agent prompts to result synthesis, comes together to form a coherent human-AI workflow that we control end-to-end. Check out the FULL CODES here.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import json
import re
from typing import List, Dict, Any
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class Task:
    id: str
    description: str
    assigned_to: str = None
    status: str = "pending"
    result: Any = None
    dependencies: List[str] = None

    def __post_init__(self):
        if self.dependencies is None:
            self.dependencies = []


@dataclass
class Agent:
    name: str
    role: str
    expertise: str
    system_prompt: str
We set up all the core imports and define the fundamental data structures needed to manage tasks and agents. We define Task and Agent as structured entities to cleanly orchestrate work. By doing this, we ensure that every part of the system has a consistent and reliable foundation. Check out the FULL CODES here.
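As a quick, illustrative check (the task id, the "reviewer" agent, and all field values below are made up for demonstration and are not part of the registry that follows), we can instantiate the dataclasses directly and confirm that __post_init__ fills in an empty dependency list:

# Sanity check of the data structures; values here are illustrative only.
example_task = Task(id="task_1", description="Summarize binary search")
print(example_task.status)        # "pending"
print(example_task.dependencies)  # [] (filled in by __post_init__)

example_agent = Agent(
    name="reviewer",              # hypothetical agent, not in the registry below
    role="Code Reviewer",
    expertise="Spotting bugs and style issues",
    system_prompt="You are a meticulous code reviewer."
)
print(asdict(example_task))       # dataclasses serialize cleanly for logging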
AGENT_REGISTRY = {
    "researcher": Agent(
        name="researcher",
        role="Research Specialist",
        expertise="Information gathering, analysis, and synthesis",
        system_prompt="You are a research specialist. Provide thorough research on topics."
    ),
    "coder": Agent(
        name="coder",
        role="Software Engineer",
        expertise="Writing clean, efficient code with best practices",
        system_prompt="You are an expert programmer. Write clean, well-documented code."
    ),
    "writer": Agent(
        name="writer",
        role="Content Writer",
        expertise="Clear communication and documentation",
        system_prompt="You are a professional writer. Create clear, engaging content."
    ),
    "analyst": Agent(
        name="analyst",
        role="Data Analyst",
        expertise="Data interpretation and insights",
        system_prompt="You are a data analyst. Provide clear insights from data."
    )
}
class LocalLLM:
    def __init__(self, model_name: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16
        ) if torch.cuda.is_available() else None
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            quantization_config=quantization_config,
            device_map="auto",
            low_cpu_mem_usage=True
        )
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

    def generate(self, prompt: str, max_tokens: int = 300) -> str:
        formatted_prompt = f"<|system|>\nYou are a helpful AI assistant.</s>\n<|user|>\n{prompt}</s>\n<|assistant|>\n"
        inputs = self.tokenizer(
            formatted_prompt,
            return_tensors="pt",
            truncation=True,
            max_length=1024,
            padding=True
        )
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_tokens,
                temperature=0.7,
                do_sample=True,
                top_p=0.9,
                pad_token_id=self.tokenizer.pad_token_id,
                eos_token_id=self.tokenizer.eos_token_id,
                use_cache=True
            )
        full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        if "<|assistant|>" in full_response:
            return full_response.split("<|assistant|>")[-1].strip()
        return full_response[len(formatted_prompt):].strip()
We register all our specialized agents and implement the local LLM wrapper that powers the system. We load TinyLlama in an efficient 4-bit mode so we can run everything smoothly on Colab or local hardware. With this, we give ourselves a flexible and fully local way to generate responses for each agent. Check out the FULL CODES here.
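Before wiring the model into the manager, it can be worth smoke-testing the wrapper on its own. The sketch below assumes the LocalLLM class above has already been run; the prompt and token budget are arbitrary, and the first call will download TinyLlama from the Hugging Face Hub.

# Minimal smoke test for the LocalLLM wrapper (prompt and max_tokens are arbitrary).
llm = LocalLLM()
reply = llm.generate("In one sentence, what does a manager agent do?", max_tokens=60)
print(reply)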
class ManagerAgent:
    def __init__(self, model_name: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
        self.llm = LocalLLM(model_name)
        self.agents = AGENT_REGISTRY
        self.tasks: Dict[str, Task] = {}
        self.execution_log = []

    def log(self, message: str):
        timestamp = datetime.now().strftime("%H:%M:%S")
        log_entry = f"[{timestamp}] {message}"
        self.execution_log.append(log_entry)
        print(log_entry)

    def decompose_goal(self, goal: str) -> List[Task]:
        self.log(f"🎯 Decomposing goal: {goal}")
        agent_info = "\n".join([f"- {name}: {agent.expertise}" for name, agent in self.agents.items()])
        prompt = f"""Break down this goal into 3 specific subtasks. Assign each to the best agent.
Goal: {goal}
Available agents:
{agent_info}
Respond ONLY with a JSON array."""
        response = self.llm.generate(prompt, max_tokens=250)
        try:
            json_match = re.search(r'\[\s*\{.*?\}\s*\]', response, re.DOTALL)
            if json_match:
                tasks_data = json.loads(json_match.group())
            else:
                raise ValueError("No JSON found")
        except Exception:
            tasks_data = self._create_default_tasks(goal)
        tasks = []
        for i, task_data in enumerate(tasks_data[:3]):
            task = Task(
                id=task_data.get('id', f'task_{i+1}'),
                description=task_data.get('description', f'Work on: {goal}'),
                assigned_to=task_data.get('assigned_to', list(self.agents.keys())[i % len(self.agents)]),
                dependencies=task_data.get('dependencies', [] if i == 0 else [f'task_{i}'])
            )
            self.tasks[task.id] = task
            tasks.append(task)
            self.log(f"  ✓ {task.id}: {task.description[:50]}... → {task.assigned_to}")
        return tasks
We begin constructing the ManagerAgent class and focus on how we decompose a high-level goal into well-defined subtasks. We generate structured JSON-based tasks and automatically assign them to the right agent. By doing this, we allow the system to think step by step and organize work just like a human project manager. Check out the FULL CODES here.
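For reference, this is roughly the shape of the JSON array that decompose_goal tries to extract with the regex and json.loads. The actual plan a 1.1B model emits will vary (which is why the fallback below exists), and the task descriptions here are only illustrative.

# Illustrative plan format; the keys mirror what the parsing loop reads via .get().
example_plan = [
    {"id": "task_1", "description": "Research binary search", "assigned_to": "researcher", "dependencies": []},
    {"id": "task_2", "description": "Implement binary search in Python", "assigned_to": "coder", "dependencies": ["task_1"]},
    {"id": "task_3", "description": "Document the implementation with examples", "assigned_to": "writer", "dependencies": ["task_2"]},
]
print(json.dumps(example_plan, indent=2))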
    def _create_default_tasks(self, goal: str) -> List[Dict[str, Any]]:
        if any(word in goal.lower() for word in ['code', 'program', 'implement', 'algorithm']):
            return [
                {"id": "task_1", "description": f"Research and explain the concept: {goal}", "assigned_to": "researcher", "dependencies": []},
                {"id": "task_2", "description": f"Write code implementation for: {goal}", "assigned_to": "coder", "dependencies": ["task_1"]},
                {"id": "task_3", "description": "Create documentation and examples", "assigned_to": "writer", "dependencies": ["task_2"]}
            ]
        return [
            {"id": "task_1", "description": f"Research: {goal}", "assigned_to": "researcher", "dependencies": []},
            {"id": "task_2", "description": "Analyze findings and structure content", "assigned_to": "analyst", "dependencies": ["task_1"]},
            {"id": "task_3", "description": "Write comprehensive response", "assigned_to": "writer", "dependencies": ["task_2"]}
        ]
    def execute_task(self, task: Task, context: Dict[str, Any] = None) -> str:
        self.log(f"🤖 Executing {task.id} with {task.assigned_to}")
        task.status = "in_progress"
        agent = self.agents[task.assigned_to]
        context_str = ""
        if context and task.dependencies:
            context_str = "\n\nContext from previous tasks:\n"
            for dep_id in task.dependencies:
                if dep_id in context:
                    context_str += f"- {context[dep_id][:150]}...\n"
        prompt = f"""{agent.system_prompt}
Task: {task.description}{context_str}
Provide a clear, concise response:"""
        result = self.llm.generate(prompt, max_tokens=250)
        task.result = result
        task.status = "completed"
        self.log(f"  ✓ Completed {task.id}")
        return result
We define fallback task logic and the full execution flow for each task. We guide each agent with its own system prompt and provide contextual information to keep results coherent. This allows us to execute tasks intelligently while respecting dependency order. Check out the FULL CODES here.
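Once the full class is assembled, a single task can also be run in isolation, which is handy for debugging one agent's prompt on its own. The task fields below are hypothetical.

# Running one task by hand to inspect a single agent's behavior (illustrative values).
manager = ManagerAgent()
standalone = Task(id="demo", description="List two properties of binary search", assigned_to="researcher")
print(manager.execute_task(standalone, context={}))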
    def synthesize_results(self, goal: str, results: Dict[str, str]) -> str:
        self.log("🔄 Synthesizing final results")
        results_text = "\n\n".join([f"Task {tid}:\n{res[:200]}" for tid, res in results.items()])
        prompt = f"""Combine these task results into one final coherent answer.
Original Goal: {goal}
Task Results:
{results_text}
Final comprehensive answer:"""
        return self.llm.generate(prompt, max_tokens=350)
    def execute_goal(self, goal: str) -> Dict[str, Any]:
        self.log(f"\n{'='*60}\n🎬 Starting Manager Agent\n{'='*60}")
        tasks = self.decompose_goal(goal)
        results = {}
        completed = set()
        max_iterations = len(tasks) * 2
        iteration = 0
        while len(completed) < len(tasks) and iteration < max_iterations:
            iteration += 1
            for task in tasks:
                if task.id in completed:
                    continue
                deps_met = all(dep in completed for dep in task.dependencies)
                if deps_met:
                    result = self.execute_task(task, results)
                    results[task.id] = result
                    completed.add(task.id)
        final_output = self.synthesize_results(goal, results)
        self.log(f"\n{'='*60}\n✅ Execution Complete!\n{'='*60}\n")
        return {
            "goal": goal,
            "tasks": [asdict(task) for task in tasks],
            "final_output": final_output,
            "execution_log": self.execution_log
        }
We synthesize the outputs from all subtasks and convert them into one unified final answer. We also implement an orchestration loop that ensures each task runs only after its dependencies are complete. This snippet shows how we bring everything together into a smooth multi-step reasoning pipeline. Check out the FULL CODES here.
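Because execute_goal returns a plain dictionary, the whole run stays inspectable. One possible way to audit how tasks were routed (the goal string here is arbitrary) might look like this:

# Inspecting the routing and results after a run (goal string is arbitrary).
outcome = ManagerAgent().execute_goal("Outline the pros and cons of binary search")
for t in outcome["tasks"]:
    print(t["id"], "→", t["assigned_to"], "|", t["status"])
print(outcome["final_output"][:300])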
def demo_basic():
    manager = ManagerAgent()
    goal = "Explain binary search algorithm with a simple example"
    result = manager.execute_goal(goal)
    print("\n" + "="*60)
    print("FINAL OUTPUT")
    print("="*60)
    print(result["final_output"])
    return result
def demo_coding():
    manager = ManagerAgent()
    goal = "Implement a function to find the maximum element in a list"
    result = manager.execute_goal(goal)
    print("\n" + "="*60)
    print("FINAL OUTPUT")
    print("="*60)
    print(result["final_output"])
    return result


def demo_custom(custom_goal: str):
    manager = ManagerAgent()
    result = manager.execute_goal(custom_goal)
    print("\n" + "="*60)
    print("FINAL OUTPUT")
    print("="*60)
    print(result["final_output"])
    return result


if __name__ == "__main__":
    print("🤖 Manager Agent Tutorial – APIless Local Version")
    print("="*60)
    print("Using TinyLlama (1.1B) – Fast & efficient!\n")
    result = demo_basic()
    print("\n\n💡 Try more:")
    print("  - demo_coding()")
    print("  - demo_custom('your goal here')")
We provide demonstration functions to easily test our system with different goals. We run sample tasks to observe how the manager decomposes, executes, and synthesizes work in real time. This gives us an interactive way to understand the entire workflow and refine it further.
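For example, we can point the manager at any goal of our own (the goal string below is just a placeholder):

# Run the full pipeline on a custom goal of your choosing.
demo_custom("Compare bubble sort and insertion sort for small inputs")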
In conclusion, we demonstrate how to design and operate a complete multi-agent orchestration system locally with minimal dependencies. We now understand how the manager breaks down goals, routes tasks to the right expert agents, collects their outputs, resolves dependencies, and synthesizes the final result. This implementation allows us to appreciate how modular, predictable, and powerful local agentic patterns can be when built from scratch.

