TaskOSS is an open-source Task Operating System designed to define, track, evaluate, and learn from every unit of work. It works across humans, AI agents, and QA workflows.

How does TaskOSS handle task evaluation?

TaskOSS uses YAML-based task specifications that include QA criteria. Each task defines its own success criteria, and the system automatically evaluates outcomes.

Is TaskOSS open source?

Yes, TaskOSS is fully open source under the MIT License.

Open Source • MIT Licensed

TaskOSS - The Task Operating System

Define once. Evaluate everywhere. Learn continuously.
The open infrastructure for making every task observable, retryable, and improvable - across humans, AI agents, and automated systems.

task.yaml

# TaskOSS Spec v1.0
task_id: "deploy-auth-service"
version: "2.1.0"
goal: "Deploy authentication service to production"

owner:
  type: "agent"
  id: "claude-ops-v3"

qa:
  eval_type: "functional"
  criteria:
    - "All endpoints return 200 OK"
    - "Auth tokens validate correctly"
    - "Latency under 200ms p99"
  retry:
    enabled: true
    max_attempts: 3
    backoff: "exponential"

status: "completed"
eval_score: 0.96
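
For a feel of how a tool might sanity-check a spec like this before running it, here is a minimal Python sketch. It assumes the YAML has already been parsed into a dictionary. The `validate_spec` helper and its rules are hypothetical illustrations, not a documented TaskOSS API; the required fields are taken from the "Define" step (task_id, goal, owner, and QA criteria).

```python
# Hypothetical validator; field names mirror the task.yaml example above.
REQUIRED_FIELDS = ("task_id", "goal", "owner", "qa")


def validate_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks usable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in spec]

    qa = spec.get("qa", {})
    if "qa" in spec and not qa.get("criteria"):
        problems.append("qa.criteria must list at least one criterion")

    # Assumed rule: an enabled retry policy needs at least one attempt.
    retry = qa.get("retry", {})
    if retry.get("enabled") and retry.get("max_attempts", 0) < 1:
        problems.append("qa.retry.max_attempts must be >= 1 when retry is enabled")

    return problems


spec = {
    "task_id": "deploy-auth-service",
    "version": "2.1.0",
    "goal": "Deploy authentication service to production",
    "owner": {"type": "agent", "id": "claude-ops-v3"},
    "qa": {
        "eval_type": "functional",
        "criteria": [
            "All endpoints return 200 OK",
            "Auth tokens validate correctly",
            "Latency under 200ms p99",
        ],
        "retry": {"enabled": True, "max_attempts": 3, "backoff": "exponential"},
    },
}

print(validate_spec(spec))  # prints []
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with a spec in a single pass.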

The Problem

Every Day, Billions in Work Disappear Without a Trace

Tasks complete without evaluation - work finishes but quality is unknown
AI agents don't know if they succeeded - no structured feedback loops
Platform teams lack retry logic - failures repeat without learning
No visibility into task quality - outcomes are opaque

deploy-frontend-v3: Failed • No retry configured • 2h ago

sync-user-database: Completed • No evaluation • Unknown quality

generate-monthly-report: Stale • Last updated 3 days ago

migrate-auth-schema: Failed 4x • No learning loop • Blocked

Core Benefits

QA-Native. Agent-Ready. Retry-Smart.

Every task gets an ID, a score, and a chance to improve. No more black-box workflows.

Define Once, Run Anywhere

Declarative YAML specs that work across CI/CD, agents, and manual workflows.

Built-In QA & Eval

Every task carries its own success criteria and automatic evaluation logic.

Traceable Outputs

Full lineage from definition to execution to evaluation results.

Agent-First Interop

Native support for AI agents with structured feedback and retry loops.

Live Example

See It In Action: From Definition to Insight

One YAML file. Full observability. Every task becomes a learning opportunity.

onboard-user.yaml

task_id: "onboard-new-user"
version: "1.2.0"
goal: "Complete user onboarding with verification"

owner:
  type: "agent"
  id: "onboarding-agent-v2"

inputs:
  user_email: "required"
  plan_type: "optional"

qa:
  eval_type: "functional"
  criteria:
    - "User record created in database"
    - "Welcome email sent successfully"
    - "Session token generated"
  timeout: "30s"
  retry:
    enabled: true
    max_attempts: 3

evaluation.json

{
  "task_id": "onboard-new-user",
  "execution_id": "exec-7f3a9b2c",
  "evaluation": {
    "overall_score": 0.94,
    "status": "passed",
    "criteria_results": [
      {"name": "User record created", "passed": true, "latency": "45ms"},
      {"name": "Welcome email sent", "passed": true, "latency": "1.2s"},
      {"name": "Session token generated", "passed": true, "latency": "12ms"}
    ]
  },
  "completed_at": "2024-01-15T14:32:07Z"
}
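
To make the shape of an evaluation result more concrete, the following Python sketch collapses per-criterion results into a pass/fail summary. The `summarize` function is a hypothetical helper. The page does not say how `overall_score` is calculated, so this sketch deliberately derives only the status and the pass counts.

```python
def summarize(criteria_results: list[dict]) -> dict:
    """Collapse per-criterion results into a single pass/fail summary.

    Assumption: a run passes only if every criterion passed.
    An empty list is treated as passed in this sketch.
    """
    failed = [c["name"] for c in criteria_results if not c["passed"]]
    return {
        "status": "failed" if failed else "passed",
        "passed": len(criteria_results) - len(failed),
        "total": len(criteria_results),
        "failed_criteria": failed,
    }


# Criterion results copied from the evaluation example above.
results = [
    {"name": "User record created", "passed": True, "latency": "45ms"},
    {"name": "Welcome email sent", "passed": True, "latency": "1.2s"},
    {"name": "Session token generated", "passed": True, "latency": "12ms"},
]

print(summarize(results))
```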

retry-log.yaml

task_id: "onboard-new-user"

retry_history:
  - attempt: 1
    status: "failed"
    reason: "Email service timeout"
    timestamp: "14:31:45Z"

  - attempt: 2
    status: "failed"
    reason: "Email service timeout"
    backoff: "2s"

  - attempt: 3
    status: "success"
    backoff: "4s"

learning:
  pattern: "email_service_latency"
  recommendation: "Increase timeout to 5s"
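
One plausible way to arrive at a `learning` entry like the one above is to look for the failure reason that recurs most often across attempts. The Python sketch below does only that. The `dominant_failure` helper is hypothetical, and how TaskOSS actually derives patterns or recommendations is not described here.

```python
from collections import Counter
from typing import Optional


def dominant_failure(retry_history: list[dict]) -> Optional[tuple[str, int]]:
    """Return the most frequent failure reason and its count, or None if nothing failed."""
    reasons = Counter(
        attempt["reason"]
        for attempt in retry_history
        if attempt["status"] == "failed" and "reason" in attempt
    )
    if not reasons:
        return None
    return reasons.most_common(1)[0]


# Attempts copied from the retry log above.
retry_history = [
    {"attempt": 1, "status": "failed", "reason": "Email service timeout"},
    {"attempt": 2, "status": "failed", "reason": "Email service timeout", "backoff": "2s"},
    {"attempt": 3, "status": "success", "backoff": "4s"},
]

print(dominant_failure(retry_history))  # prints ('Email service timeout', 2)
```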

onboard-new-user: Passed

Version: 1.2.0

Owner: onboarding-agent-v2

Execution Time: 22.3s

Retry Attempts: 3 / 3

Eval Score: 94%

Criteria passed:

User record created

Welcome email sent

Session token generated

How It Works

Three Steps to Task Intelligence

Define

Create a task spec with task_id, goal, owner, and QA criteria in YAML.

Track

Execute the task and stream updates - status, progress, logs, and retries.

Evaluate & Learn

Run QA evaluations, capture scores, and feed learnings back into the system.
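
The three steps above can be sketched as a small Python object that is defined, receives tracked updates, and finally records an evaluation score. Everything here is an assumption made for illustration: the `TaskRun` class, its method names, and the "defined" and "running" status values do not come from a published TaskOSS interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskRun:
    # Define: the minimum identity of a task.
    task_id: str
    goal: str
    status: str = "defined"  # hypothetical initial status
    updates: list[str] = field(default_factory=list)
    eval_score: Optional[float] = None

    def track(self, status: str, message: str) -> None:
        """Track: record a status change together with a log message."""
        self.status = status
        self.updates.append(f"{status}: {message}")

    def evaluate(self, score: float) -> None:
        """Evaluate: store a QA score between 0.0 and 1.0."""
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be between 0.0 and 1.0")
        self.eval_score = score


run = TaskRun(task_id="onboard-new-user", goal="Complete user onboarding with verification")
run.track("running", "sending welcome email")
run.track("completed", "all steps finished")
run.evaluate(0.94)

print(run.status, run.eval_score, len(run.updates))  # prints completed 0.94 2
```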

Use Cases

Built for Every Team That Ships

From product launches to AI agent orchestration - TaskOSS brings structure to chaos.

Product Manager

Full Visibility

See every task's real status - not just "done" but quality, retries, and blockers.

QA Lead

Eval Automation

Define criteria once, run on every execution. Track quality trends over time.

DevOps

Retry & Resilience

Built-in retry policies with exponential backoff. Tasks self-heal without intervention.

Agent Framework

Structured Feedback

Give AI agents clear success signals. Enable learning loops with versioned specs.
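
The retry policy mentioned under DevOps can be illustrated with a short Python sketch of exponential backoff. It is a generic implementation of the technique, not TaskOSS's own code. The `run_with_retry` name, the 2-second base delay (chosen to match the 2s and 4s backoffs in the retry log example), and the injectable `sleep` parameter are all assumptions.

```python
import time
from typing import Callable


def run_with_retry(
    task: Callable[[], bool],
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, list[float]]:
    """Call task until it returns True or attempts run out.

    The wait doubles after each failure: base_delay, 2 * base_delay, and so on.
    Returns whether the task succeeded and the delays that were applied.
    """
    delays: list[float] = []
    for attempt in range(1, max_attempts + 1):
        if task():
            return True, delays
        if attempt < max_attempts:  # no point waiting after the final attempt
            delay = base_delay * 2 ** (attempt - 1)
            delays.append(delay)
            sleep(delay)
    return False, delays


# Simulate a task that fails twice and then succeeds; skip real sleeping.
outcomes = iter([False, False, True])
ok, delays = run_with_retry(lambda: next(outcomes), sleep=lambda seconds: None)

print(ok, delays)  # prints True [2.0, 4.0]
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; in real use the default `time.sleep` would apply the waits.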