AI & Automation

Agentic Workflow

~40%

of knowledge work tasks could be automated by AI agents

Source: McKinsey Global Institute 2023

3–5×

faster completion of repetitive multi-step tasks with agentic AI

Source: Anthropic internal benchmarks

What is an agentic workflow?

Standard AI: you ask, the AI responds. One exchange. An agentic workflow is different: you give a goal, the AI plans steps, calls tools, evaluates results, and loops until done, running multiple actions without you directing each one.

The loop is what makes it "agentic": plan → act → observe → plan again. Where a chatbot answers a question, an agentic workflow executes a process (pulling data, writing documents, checking results, revising) from a single goal input.

The word "agentic" comes from agency: the capacity to act independently and purposefully. An agentic system has agency: it doesn't wait for your next instruction at every step. IBM defines it as AI that can "accomplish a specific goal with limited supervision." Anthropic draws a sharper line: in an agentic system, the LLM dynamically directs its own processes and tool usage rather than following predefined code paths.

Three tiers of AI automation

Not all "AI automation" is the same. There are three meaningfully different tiers, and only the third is truly agentic.

Tier 1: Traditional automation

Rule-based. If X then Y. Robotic process automation (RPA) sits here. No LLM, no reasoning: just scripted logic that breaks when reality doesn't match the script. Reliable for simple, repetitive tasks; brittle for anything that requires judgement.

Tier 2: Non-agentic AI

An LLM generates a response from a prompt. One call, one output. Useful for drafting, summarising, or classifying, but the model doesn't take actions, use tools, or remember anything between calls. Your ChatGPT conversation is Tier 2.

Tier 3: Agentic workflow

An agent uses an LLM to plan, calls tools to act, reflects on results, and loops until the goal is met. It can adapt mid-task: if one tool fails, it substitutes another. IBM documented an example where a web search API failed mid-task and the agent automatically switched to a Wikipedia search tool and completed the job. The process is dynamic, not scripted.

The agent loop

GOAL → OBSERVE (read inputs and results) → PLAN (pick the next action) → ACT (call a tool) → REFLECT (goal met?). If yes, output and stop; if not, loop back to OBSERVE.
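A minimal sketch of this loop in Python, with a hypothetical plan function standing in for the LLM call (all names here are illustrative, not a real framework's API):

```python
# Sketch of the plan -> act -> observe -> reflect loop. The plan callable
# stands in for an LLM call; tools maps tool names to plain Python functions.

def run_agent(goal, tools, plan, max_steps=10):
    """Loop until the plan step declares the goal met or the step budget runs out."""
    observations = []  # short-term memory for this single run
    for _ in range(max_steps):
        action = plan(goal, observations)        # PLAN: pick the next action
        if action["done"]:                       # REFLECT: goal met?
            return action["output"]
        result = tools[action["tool"]](action["input"])  # ACT: call a tool
        observations.append(result)              # OBSERVE: record the result
    raise RuntimeError("step budget exhausted before the goal was met")
```

The hard step limit is not decoration: it is the standard guardrail against an agent looping forever on a goal it can't reach.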

What "tools" means

An agent's "tools" are functions it can call: web search, API requests, code execution, file reading/writing, email sending, database queries. You define which tools are available; the agent decides which one to call next based on what it needs to do. When the LLM selects a tool, it emits a structured request naming the function and its arguments: this is called function calling. Your code executes the call and feeds the result back into the loop. This is how an agent reaches beyond its training data and interacts with the real world: an LLM by itself can't directly touch external tools or databases in real time, it can only ask for them to be used.
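A sketch of that dispatch step, assuming a generic tool registry (the tool names and the shape of the call dict are illustrative, not a specific vendor's API):

```python
# The model only *names* a tool and its arguments in a structured call;
# this harness code executes it and returns the result (or the error,
# which is fed back so the model can recover).

TOOLS = {
    "web_search": lambda query: f"results for {query!r}",   # stand-in tool
    "read_file": lambda path: f"contents of {path}",        # stand-in tool
}

def execute_tool_call(call):
    """Dispatch a model-emitted tool call to the matching Python function."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"error": f"unknown tool {call['name']!r}"}
    return {"result": fn(**call["arguments"])}
```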

Memory: how agents remember

A plain LLM has no memory between calls: each conversation starts blank. Memory is what turns a stateless model into a persistent, learning agent. Agentic frameworks use three distinct memory types, and understanding them matters when you're deciding what your agent can actually do.

Short-term memory

Conversation history and intermediate results held in the context window during a single task run. The agent can refer back to what it already tried, what tools returned, and what decisions it made. Ends when the task ends; nothing carries over to the next run. Also called in-context memory.

Long-term memory (persistent)

Information stored externally and retrieved in future sessions. A client brief an agent processed last month can be recalled for a new task today. Also called persistent memory. This is what lets an agent personalise responses, build on past work, and avoid relearning the same context each time it runs.

External / vector memory

Large knowledge stores queried by semantic similarity: a vector database holds embeddings of your documents, past outputs, or client data. The agent queries it like a search engine, pulling only the relevant chunks into context. Common storage options: vector stores (Pinecone, Weaviate) for unstructured content, key/value stores (Redis) for fast structured lookups, and knowledge graphs (Neo4j) for complex relational data where connections between facts matter.
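A toy sketch of the retrieval logic: a real system embeds text with a model and queries a vector store such as Pinecone or Weaviate, but a bag-of-words cosine similarity stands in for embeddings here so the ranking step is visible:

```python
# Toy semantic retrieval: embed() is a crude stand-in for a real embedding
# model; retrieve() pulls only the k most similar documents into context.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```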

Standard AI vs agentic workflow

                    Standard AI prompt        Agentic workflow
Input               Your question             Your goal
Steps               One response              Multiple planned steps
Tool use            No                        Yes: web, APIs, files, databases
Memory              None between calls        Short-term + optional persistent
Adapts to failure   No                        Yes: substitutes tools, retries steps
Human direction     Every exchange            Set goal, review output
Best for            Answering questions       Executing multi-step processes

Five common workflow patterns

Most production agentic systems are built from a small set of composable patterns. You can combine them. Anthropic's engineering team, which works with hundreds of production agent deployments, found that the most effective systems use the simplest pattern that gets the job done, not the most sophisticated one available.

Prompt chaining

Each LLM call feeds its output into the next. Good for tasks with a fixed sequence of steps where each step is an easier problem than tackling the whole. Each step can include a programmatic check before proceeding. Example: research topic → write outline → check outline meets criteria → write draft.
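A minimal sketch of that chain with a programmatic gate between steps (the llm callable is a stub standing in for a real API call):

```python
# Prompt chaining: each call feeds the next, with an optional programmatic
# check between steps so bad output stops the chain instead of propagating.

def chain(topic, llm, checks=()):
    outline = llm(f"Write an outline for: {topic}")
    for check in checks:                 # gate before spending the next call
        if not check(outline):
            raise ValueError("outline failed a check; stopping the chain")
    return llm(f"Write a draft following this outline:\n{outline}")
```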

Routing

An initial LLM classifies the input, then routes it to a specialised downstream path. Allows each path to be optimised for its type of input without one path's requirements degrading another. Example: a support triage agent that routes billing questions, technical bugs, and feature requests to three separate specialised workflows.
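The same triage example as a sketch. The keyword classifier here is a stand-in for an LLM call that returns one of the labels; the handler names are illustrative:

```python
# Routing: a classifier picks a path, then a specialised handler processes
# the input. In production, classify() would be an LLM call.

HANDLERS = {
    "billing": lambda msg: f"billing workflow handled: {msg}",
    "bug": lambda msg: f"technical workflow handled: {msg}",
    "feature": lambda msg: f"product workflow handled: {msg}",
}

def classify(message):
    if "invoice" in message or "charge" in message:
        return "billing"
    if "error" in message or "crash" in message:
        return "bug"
    return "feature"

def route(message):
    return HANDLERS[classify(message)](message)
```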

Parallelisation

Multiple agents work on independent sub-tasks simultaneously, then their outputs are combined. Two variants: sectioning (breaking a task into parallel independent parts) and voting (running the same task multiple times, then aggregating results for higher confidence). Useful when sub-tasks don't depend on each other: auditing five client accounts in parallel rather than one at a time.
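A sectioning sketch using a thread pool, with a placeholder audit function standing in for real per-account work (voting would follow the same shape, running one task repeatedly and aggregating):

```python
# Sectioning: independent sub-tasks run concurrently, outputs are combined.
from concurrent.futures import ThreadPoolExecutor

def audit_account(account):
    return {"account": account, "status": "ok"}  # stand-in for a real audit

def audit_all(accounts):
    with ThreadPoolExecutor() as pool:
        return list(pool.map(audit_account, accounts))  # preserves input order
```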

Orchestrator-workers

A central "orchestrator" LLM dynamically breaks down a complex task and delegates sub-tasks to specialist "worker" LLMs. The orchestrator synthesises their outputs. Unlike parallelisation, the sub-tasks aren't predefined; the orchestrator determines them based on the specific input. Best for open-ended tasks where the exact steps aren't known upfront: complex research, multi-file code changes, or analysing information from multiple sources.
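A sketch of the shape of this pattern. The decompose callable stands in for the orchestrator LLM deciding sub-tasks at run time; the worker names and the join step are illustrative:

```python
# Orchestrator-workers: sub-tasks are determined dynamically by decompose(),
# dispatched to specialist workers, then synthesised (naively here).

WORKERS = {
    "research": lambda sub: f"notes on {sub}",
    "write": lambda sub: f"section about {sub}",
}

def orchestrate(task, decompose):
    subtasks = decompose(task)                   # dynamic, not predefined
    results = [WORKERS[kind](sub) for kind, sub in subtasks]
    return " | ".join(results)                   # naive synthesis step
```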

Evaluator-optimiser

One LLM generates output; a second LLM critiques it and provides feedback; the first revises. The loop runs until the evaluator is satisfied or a step limit is hit. Works well when LLM responses demonstrably improve given structured critique: writing, translation, complex analysis. Analogous to the iterative editing process a human writer uses to reach a polished final draft.
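A sketch of the loop, with generate, critique, and revise as stubs for the two LLM roles (the "OK" sentinel is an assumption; real evaluators usually return structured feedback):

```python
# Evaluator-optimiser: generate, critique, revise, repeat until the critic
# passes the draft or the round limit is hit.

def refine(task, generate, critique, revise, max_rounds=3):
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback == "OK":             # evaluator satisfied
            return draft
        draft = revise(draft, feedback)
    return draft                         # round limit hit; return best effort
```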

Multi-agent systems

A single agent has one context window: it can only hold so much information at once and can only use one set of tools at a time. Multi-agent systems solve this by running multiple specialised agents in parallel or in sequence, each with its own role, tools, and memory slice. Tasks that would overflow a single agent's context become tractable when split across agents.

IBM describes two main multi-agent architectures:

Vertical: conductor + workers

A conductor agent (typically a more capable LLM) oversees the task, handles planning, and delegates to simpler specialist worker agents. Workers handle specific subtasks: data retrieval, code execution, email sending, document writing. Fast for sequential workflows with clear handoffs, but the conductor is a single point of failure and a potential bottleneck.

Horizontal: peer agents

Agents operate as equals, each contributing domain expertise. A research agent, a writing agent, and a fact-checker might collaborate on a content piece, each reviewing and passing work back to the others. More resilient than vertical because there's no single bottleneck, but slower and harder to coordinate cleanly.

Coordination and risk

The orchestration layer handles handoffs, passing task state between agents, monitoring progress, managing failures. As more agents work in series, the risk of cascading errors grows: a flawed output from one agent can corrupt every downstream agent that depends on it. IBM notes that multi-agent systems can also produce traffic jams, bottlenecks, and resource conflicts. Guardrails, human checkpoints, and maximum step limits are not optional: they're the architecture.
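A sketch of two of those guardrails, a hard step limit and a human checkpoint before consequential actions; the approve callable is a stand-in for whatever review mechanism you use (a queue, a Slack message, a UI):

```python
# Guardrails: abort past a step budget; block consequential steps that a
# human has not approved, keeping an audit record either way.

class StepLimitExceeded(RuntimeError):
    pass

def guarded_run(steps, approve, max_steps=20):
    outputs = []
    for i, step in enumerate(steps):
        if i >= max_steps:
            raise StepLimitExceeded(f"aborted after {max_steps} steps")
        if step.get("consequential") and not approve(step):
            outputs.append({"step": step["name"], "status": "blocked"})
            continue                     # skip the action, keep the record
        outputs.append({"step": step["name"], "status": "done"})
    return outputs
```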

Orchestration frameworks

You don't build agentic workflows from scratch. These frameworks handle the plumbing (LLM calls, tool definitions, memory management, agent coordination) so you focus on the logic that's specific to your use case.

Framework   Best for                                               Style
LangChain   Chaining LLM calls with tools and retrieval            Code-first (Python / JS)
LangGraph   Stateful multi-agent graphs with cycles and branching  Code-first (Python)
CrewAI      Role-based multi-agent teams with defined tasks        Code-first (Python)
AutoGen     Conversational multi-agent collaboration (Microsoft)   Code-first (Python)
n8n         Visual workflow builder with AI node support           Low-code / visual

Anthropic's engineering team recommends starting with direct LLM API calls before adopting a framework. Frameworks add abstraction layers that can hide what's happening under the hood and make debugging harder. Start simple, graduate to a framework once you understand the shape of your problem.

Agency use cases

Weekly client reporting

Agent pulls data from Google Ads and GA4, writes a plain-English summary, flags anomalies, and drops a draft for account manager review. That replaces 2 hours of manual work per client per week.

Competitive monitoring

Agent searches competitor mentions across news and social, categorises by sentiment and relevance, and produces a weekly briefing without a human touching it until review. No dashboard to check, no news tabs to maintain.

Post-launch QA

Agent crawls every page of a newly launched site, checks for broken links, missing meta tags, slow images, and console errors, then produces a prioritised fix list before the account manager's morning coffee.

Content production pipeline

Brief goes in: agent researches the topic via web search, drafts a structure, writes a first draft, self-critiques for tone and accuracy (evaluator-optimiser pattern), and flags any claims it couldn't verify with a source. A human reviews the flagged items. The whole draft takes minutes, not a day.

Lead research and enrichment

Agent takes a prospect list, queries company websites and news sources for each name, and returns a scored profile (company size, recent news, tech stack signals, open roles) ready to push into your CRM. Replaces 20–30 minutes of manual pre-call research per prospect.

Proposal and SOW generation

Agent reads discovery call notes, retrieves relevant case studies from your knowledge base via vector search, maps the project scope to your service tiers, and assembles a first-draft proposal document. The account manager edits the judgement calls and signs off; the agent handles the assembly and research retrieval.

When not to use agentic workflows

Anthropic's engineering team is direct about this: "we recommend finding the simplest solution possible, and only increasing complexity when needed." Agentic workflows trade latency and cost for capability. That trade isn't always worth making.

Simple, deterministic tasks

If a single LLM call (or a simple rule) reliably does the job, adding an agent loop is unnecessary complexity and cost. Draft a one-paragraph summary? One call. Research and produce a 10-source briefing? Agent. The question to ask: does this task actually require planning and tool use, or does it just require good prompting?

Latency-sensitive operations

Each loop iteration means another LLM call. Multi-agent systems multiply that by the number of agents. A task that needs an answer in under two seconds can't wait for three rounds of planning and reflection. Use direct LLM calls for real-time, user-facing responses.

High cost per iteration

LLM API calls cost tokens. A multi-agent workflow that runs 20 tool calls and reflection passes to produce a report costs 20× the token budget of a single call. For high-volume, low-margin tasks (think bulk email personalisation at scale) the economics can make agents impractical.

Compounding errors in long pipelines

Each step in an agentic workflow can introduce error, and errors compound. In a 10-step pipeline, a wrong assumption at step 3 propagates through steps 4–10. For consequential outputs (client-facing documents, financial data, legal text) the autonomous nature of agents increases the blast radius of mistakes. Human review checkpoints within the workflow, not just at the end, are the mitigation.

Poorly specified goals

Agents optimise for the goal they're given. If the goal is underspecified (say, "increase engagement" instead of "increase replies from decision-makers") the agent will find ways to technically satisfy it that you didn't intend. IBM calls this reward hacking. Precise goal definition isn't optional; it's the most important design decision in any agentic system.

Frequently Asked Questions

Is agentic AI the same as AI automation?
Not quite. Traditional automation (like Zapier or Make) follows fixed, predetermined steps (if A then B). Agentic AI reasons through tasks: it can handle variation, make decisions mid-process, and recover from errors. You give an agent a goal, not a rigid script.
What tools enable agentic workflows?
The major options are OpenAI's Assistants API, Anthropic's Claude with tool use, LangChain/LangGraph, CrewAI, and AutoGen. Many of these are wrapped in no-code or low-code products like Zapier AI, n8n, and Dust. You don't need to be a developer to use agentic workflows at a basic level.
How much human oversight is needed for an agentic workflow?
It depends on what the workflow touches. For internal research or first-draft generation, minimal oversight is fine: you review the output before acting on it. For anything client-facing (sending emails, posting content, making API calls that affect live systems), you should build in a human review checkpoint. This is called 'human-in-the-loop' design.
What's the difference between an agentic workflow and a chatbot?
A chatbot responds to one message at a time and doesn't take independent action. An agentic workflow executes a multi-step process on your behalf: it calls APIs, writes files, searches the web, and works toward a goal without you directing each step.
Can agentic workflows replace account managers?
No. That's not the right framing. Agentic workflows eliminate the low-value repetitive work: pulling data, formatting reports, drafting first versions. Account managers still own the client relationship, the strategy, and any decision that requires judgement, empathy, or context an AI doesn't have.

Sagely

Put it into practice

Sagely helps agencies manage clients without the chaos: branded portals, approval workflows, and structured communication in one place.

Start free trial