Agentic QA Engineer – Generative AI & Multi-Agent Systems

United States | Onsite | Contract | Posted 2 months ago | Visa sponsorship available

Role Summary

Seeking a hands-on Agentic QA Engineer to lead end-to-end testing for agentic and multi-agent AI systems. This role involves defining QA strategy, building scalable test frameworks, and ensuring the reliability, accuracy, latency, and orchestration correctness of AI systems from development through production. Responsibilities include designing tests for agent orchestration, tool usage, planner-executor loops, and inter-agent workflows, validating critical AI components like state, memory, and prompts, and implementing resiliency and chaos tests. The engineer will also define and measure performance SLOs, integrate testing into CI/CD pipelines, and collaborate with cross-functional teams.

Key Responsibilities

Own QA strategy for agentic/multi-agent systems across Dev, Staging, Prod

Design tests for agent orchestration, tool usage, planner-executor loops, inter-agent workflows

Validate state, memory, prompts, context windows, and agent graph correctness

Build resiliency & chaos tests (failover, retries, circuit breakers, degraded modes)

Define and measure latency SLOs and reliability targets; run soak tests and canary releases

Implement accuracy validation frameworks (semantic similarity, factuality, hallucination, guardrails – PII/toxicity)

Perform load/stress testing for multi-agent systems (scale, concurrency, throughput)

Create reusable test artifacts (synthetic data, prompt libraries, simulators, agent fixtures)

Integrate testing into CI/CD pipelines & production monitoring

Drive release readiness, incident triage, and operational excellence

Collaborate with Agentic Ops, Data Science, MLOps, and Platform teams

Required Skills

7+ years in QA; 2+ years in AI/ML/LLM systems & agentic architectures

Strong Python or TypeScript/JavaScript (test frameworks, simulators)

Experience with LLM evaluation (BLEU, ROUGE, BERTScore, embeddings, semantic similarity)

Knowledge of prompt testing, guardrails, hallucination detection

Expertise in distributed systems testing, latency profiling, chaos engineering

Experience with LangChain, LangGraph, LlamaIndex, DSPy, OpenAI/Azure OpenAI orchestration

Strong CI/CD experience (GitHub Actions, Azure DevOps)

Observability: OpenTelemetry, Prometheus/Grafana, Datadog

Knowledge of security, PII, compliance in AI systems

Preferred Skills

Multi-agent simulation & agent graph testing

MLOps & evaluation pipelines, A/B testing

AWS, serverless, containers, event-driven architectures

Managing SLAs, cost, and latency for AI systems

Ready to apply?

You'll be redirected to PGC Digital (America) Inc: CMMI Level 3 Company's application page.