iVedha Inc.
Information Technology and Services

Lead AI Engineer - SRE, LLM Agents, Full-Stack Architecture

United States | Onsite | Full Time | Lead | Posted 2 months ago | Visa sponsorship available

Role summary

iVedha Inc. is seeking a Lead AI Engineer for a leading financial institution to design, build, and operationalize next-generation agentic AI systems. This leadership role focuses on LLM agents, Site Reliability Engineering (SRE), and full-stack architecture within a regulated banking environment. Responsibilities include architecting multi-agent LLM systems, implementing Model Context Protocol (MCP) servers, developing RAG pipelines, and leading AI observability with the ELK stack. The role requires expert proficiency in Node.js (TypeScript) and Python, a deep understanding of AI/ML, SRE best practices, and experience with enterprise AI tools. Candidates must demonstrate awareness of banking and compliance requirements such as SOC 2 Type II and PII protection. This is a high-impact opportunity to shape AI transformation in financial services.

About iVedha:

iVedha Inc. is a global AI-first digital transformation company with over 25 years of excellence. Powered by the iVedha Fabric, our AI-native operating system, we unify cloud, data, AI, security, and people to deliver measurable, resilient outcomes. Our expertise spans Agentic AI, Generative AI, Cloud Engineering, Cybersecurity, Data Modernization, Application Transformation, and Talent Enablement.

Join our team of forward-thinking innovators shaping the future of intelligent enterprises, where automation, observability, and AI-driven quality assurance redefine delivery velocity.

About the Opportunity

A leading financial institution is seeking a highly experienced Lead AI Engineer to join its advanced technology division. This is a high-impact, leadership-track role at the intersection of AI engineering, Site Reliability, and enterprise-grade software architecture. The successful candidate will design, build, and operationalize the next generation of agentic AI systems within a regulated banking environment — driving intelligent automation while maintaining the rigorous security, compliance, and availability standards demanded by the financial services industry.

You will architect multi-agent LLM systems, implement Model Context Protocol (MCP) servers, build production-grade RAG pipelines, and lead AI observability practices using the ELK stack. This role requires deep technical expertise combined with the leadership acumen to mentor engineers and influence cross-functional technical decisions.

Key Responsibilities

Pillar 1 — AI Architecture & Agentic Systems

  • Design and implement sophisticated LLM-powered agentic workflows and multi-agent architectures capable of autonomous reasoning, planning, and tool execution within secure financial system boundaries.
  • Architect and deploy scalable Model Context Protocol (MCP) servers to enable standardized, secure, and rich context management between AI models, internal banking APIs, and external data sources.
  • Develop production-grade Retrieval-Augmented Generation (RAG) and GraphRAG pipelines that ground AI agents in accurate, real-time enterprise financial data with full auditability.
  • Leverage expertise in Meta AI (Llama ecosystem), Google AI (Gemini, Vertex AI), and Microsoft Copilot to build and integrate cutting-edge AI features while adhering to financial data handling policies.
  • Implement prompt versioning, model drift detection, and automated evaluation pipelines to maintain AI system quality and regulatory compliance over time.

Pillar 2 — Full-Stack Engineering

  • Lead end-to-end development of robust, scalable AI applications using Node.js (TypeScript) and Python (FastAPI/Django) — both languages are required.
  • Champion AI-assisted developer workflows ('Vibe Coding') using advanced tools such as Cursor and GitHub Copilot to improve team productivity and code quality.
  • Design and implement secure, high-performance RESTful and GraphQL APIs to serve LLM inference results and agentic actions to frontend and downstream systems.
  • Develop and maintain Bash and Python automation scripts for infrastructure management, deployment orchestration, and operational efficiency.
  • Mentor junior and mid-level engineers in AI-native development practices and modern architectural patterns.

Pillar 3 — Site Reliability Engineering & AI Observability

  • Implement comprehensive observability stacks using the ELK Stack (Elasticsearch, Logstash, Kibana) specifically tuned for LLM performance metrics: latency, token usage, hallucination rates, and model drift indicators.
  • Apply SRE best practices to AI workloads — ensuring high availability, fault tolerance, incident response playbooks, and SLO/SLA management for LLM inference services.
  • Build and maintain CI/CD pipelines tailored for machine learning models, including prompt versioning, model evaluation gates, shadow deployments, and automated rollback.
  • Design alerting, on-call runbooks, and escalation paths for AI system incidents within a regulated financial services environment.

Required Qualifications:

  • Programming Languages: Expert-level proficiency in Node.js (TypeScript/JavaScript) and Python. Both are required. Bash scripting for infrastructure automation is mandatory.
  • AI & Machine Learning: Deep understanding of LLM architectures, prompt engineering, fine-tuning techniques (LoRA/QLoRA), and embedding models. Proven experience building and operating production-grade LLM applications.
  • Agentic Frameworks: Hands-on experience designing autonomous agents and implementing Model Context Protocol (MCP) servers for standardized tool and context management.
  • RAG & Vector Databases: Strong experience building RAG and GraphRAG pipelines. Proficiency with vector databases (Pinecone, Milvus, or Weaviate) and embedding model selection strategies.
  • Observability & SRE: Extensive hands-on experience with the ELK Stack (Elasticsearch, Logstash, Kibana) for distributed system logging, monitoring, and AI-specific metrics tracking.
  • Cloud & Infrastructure: Proven experience with cloud-native architectures. Azure and AKS (Azure Kubernetes Service) experience strongly preferred for this engagement.
  • Enterprise AI Tools: Demonstrated expertise with Microsoft Copilot (Copilot Studio extensibility, custom connectors), Meta AI open-source models, and Google AI infrastructure (Gemini/Vertex AI).
  • Leadership: 8+ years of progressive software engineering experience. Minimum 3 years in a technical leadership or architectural role with a focus on AI/ML systems.

Banking & Compliance Requirements:

Given the regulated nature of this environment, candidates must demonstrate awareness of and experience with the following:

  • Working knowledge of SOC 2 Type II compliance principles and their impact on AI system design and data handling.
  • Understanding of financial data classification, PII protection, and audit trail requirements for AI-generated outputs.
  • Experience implementing secure credential management (e.g., Azure Key Vault, HashiCorp Vault) in production AI systems.
  • Familiarity with model governance requirements — including explainability, version control, and documentation for AI systems in regulated environments.
  • Knowledge of zero-trust security principles and least-privilege access patterns for AI agent tool integrations.

Preferred Qualifications:

  • Experience building or integrating AI observability platforms with OpenTelemetry for unified tracing across AI and infrastructure layers.
  • Elastic Certified Engineer or Elastic Certified Observability Engineer certification.
  • Familiarity with Elastic Agent and Fleet management for centralized log collection in enterprise environments.
  • Prior experience in financial services, banking technology, or fintech with exposure to trading systems, fraud detection, or compliance platforms.
  • Contributions to open-source AI/ML projects or published research in LLM applications.

Why This Role

This is a rare opportunity to be at the forefront of AI engineering within a major financial institution — building systems that push the boundaries of what autonomous agents can achieve within a complex, regulated enterprise. You will have direct architectural influence over the institution's AI transformation roadmap, work with cutting-edge models and frameworks, and lead a high-caliber engineering team. Your decisions will shape how AI is responsibly deployed in financial services for years to come.
