
What really is an agent? Build your first AI Agent from scratch

Yes, nobody is writing any code by hand these days. And yes, many devs are not even reading the code written by AI Models anymore. They do indeed seem to have gotten that good.

And yet, it is probably still a good idea to learn the fundamentals of what is happening, so you’re not completely at sea if the AI Gods do ask you to choose some architectural options. And indeed it has never been easier to learn - given that everyone now has access to a patient, knowledgeable tutor of their own.

And thus you should build your own AI Agent from scratch.

This was the advice my friend - one of the brilliant co-founders of Hasura, where I used to work - had given me one fine evening.

Of course, I took his advice, and going through the exercise unlocked intuition that made it much easier to see what was going on.

And then recently I came across fly.io’s wonderful blog post “Everyone Can Write an Agent” and figured I should build a Gemini-API-friendly version of it!

What follows is a tutorial created almost entirely by Claude Code. It guides you through building an AI agent incrementally - starting from the simplest possible thing and adding features one at a time. Each step has complete, working code you can copy-paste and run.

But first things first…

What is an Agent really?

What an agent really means has been a long-standing debate in technical circles. But according to Simon Willison and Thomas Ptacek at fly.io, an LLM agent is basically (1) an LLM running in a loop that (2) uses tools.

What is an LLM running in a loop? What is a tool? What does it mean for an LLM to actually use a tool? Hopefully you’ll build this intuition as you build this agent.
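To make those two ingredients concrete before we write any real code, here is a toy, self-contained sketch of the idea. The fake_llm function and the TOOLBOX dict are stand-ins I made up - not the Gemini API:

```python
# Toy sketch of "an LLM running in a loop that uses tools".
# fake_llm is a made-up stand-in for a real model API call.
def fake_llm(history):
    # Pretend the model asks for a tool once, then answers in text.
    if not any("tool_result" in msg for msg in history):
        return {"tool_call": {"name": "add", "args": {"a": 2, "b": 3}}}
    return {"text": f"The answer is {history[-1]['tool_result']}."}

TOOLBOX = {"add": lambda a, b: a + b}  # tools are just functions we can run

def agent(user_message):
    history = [{"user": user_message}]
    while True:                                      # (1) the loop
        reply = fake_llm(history)
        if "tool_call" in reply:                     # (2) the model asked for a tool
            call = reply["tool_call"]
            result = TOOLBOX[call["name"]](**call["args"])
            history.append({"tool_result": result})  # feed the result back
        else:
            return reply["text"]                     # a text answer ends the loop

print(agent("What is 2 + 3?"))  # The answer is 5.
```

Everything we build below is this same shape, with the fake parts swapped out for real API calls and real tools.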

A note for non-technical readers

Unfortunately, this tutorial will be most helpful if you already know what a terminal is, know a little Python, and understand what an API is. You’ll need to know how to open a code editor (like VSCode, Cursor etc) and run some Python code. It will also be helpful to work with an AI Coding Agent (like Cursor, Claude Code or Antigravity).

Depending on where you start, all of this might sound intimidating. But I sincerely hope you keep in mind that none of this is out of your reach. Just ask your favourite AI App (ChatGPT, Gemini etc) to help you with the setup; it might be a little frustrating, but you’ll get there.

And equally valid - feel free to skip this altogether. You really don’t need to know this at all!

One more thing to keep in mind… with all this effort, what you’ll end up with is actually pretty basic! This “agent” will look very prosaic: it’s just a bunch of text that’s not even easy to read (you’ll see a bunch of ## and other tags).

Hopefully I write something more useful for non-technical folks in the future. If you do try it out, I’d love to hear from you.


Prerequisites

Before you start, make sure you have:

  1. Python 3.7+ installed
  2. A Gemini API key from Google AI Studio
  3. A project folder with these files:
your-project/
├── .env
├── .venv/           (virtual environment)
└── agent.py         (we'll create this)

Quick Setup Commands

# Create project folder
mkdir my-agent && cd my-agent

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install requests python-dotenv

# Create .env file with your API key
echo "GOOGLE_API_KEY=your_api_key_here" > .env

Step 1: The Simplest Thing - Call the API Once

Let’s start with the absolute basics: send one message to Gemini, get one response back.

What We’re Building

A script that sends one message to Gemini and prints the response.

That’s it. No conversation, no tools, no loop.

The Code

Create a file called agent.py:

"""
Step 1: The simplest possible Gemini API call.
Just send one message and get one response.
"""

import os
import requests
from dotenv import load_dotenv

# Load API key from .env file
load_dotenv()
API_KEY = os.getenv("GOOGLE_API_KEY")

# Gemini API endpoint
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent"

# The message we want to send
message = "What is the capital of France? Answer in one word."

# Build the request body
# Gemini expects a "contents" array with messages
body = {
    "contents": [
        {
            "role": "user",
            "parts": [{"text": message}]
        }
    ]
}

# Make the API request
response = requests.post(
    API_URL,
    headers={
        "Content-Type": "application/json",
        "x-goog-api-key": API_KEY
    },
    json=body
)

# Parse and print the response
data = response.json()
answer = data["candidates"][0]["content"]["parts"][0]["text"]
print(f"Question: {message}")
print(f"Answer: {answer}")

Run It

python agent.py

Expected Output

Question: What is the capital of France? Answer in one word.
Answer: Paris

What You Learned

  - The Gemini API is just an HTTP endpoint you can call with requests - no SDK needed
  - Requests carry a “contents” array of messages, each with a role and parts
  - The model’s answer comes back as text buried in the response JSON

Step 2: Multi-Turn Conversation

Now let’s add memory. The key insight from the fly.io blog:

“The conversation we’re having is an illusion we cast, on ourselves.”

The AI has zero memory between API calls. To have a conversation, WE must store all previous messages and send them every time.
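You can see this illusion in miniature without any network calls. This toy snippet (mine, not from the fly.io post) just builds the request bodies each turn would send:

```python
# The "memory" is just us resending everything. No API calls here;
# we only build the JSON bodies that each turn would send to Gemini.
history = []

def build_request(user_message):
    history.append({"role": "user", "parts": [{"text": user_message}]})
    return {"contents": list(history)}  # a snapshot of EVERYTHING so far

turn1 = build_request("My name is Alice.")
history.append({"role": "model", "parts": [{"text": "Hi Alice!"}]})
turn2 = build_request("What's my name?")

print(len(turn1["contents"]))  # 1 message
print(len(turn2["contents"]))  # 3 messages: the whole chat rides along
```

The second request carries the first question and the model’s reply - that is the only reason the model can “remember” Alice’s name.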

What’s New in This Step

  - A conversation_history list that stores every message, ours and the AI’s
  - A chat() function that appends to the history and sends all of it each time
  - An interactive loop so you can keep typing messages

The Code

Replace your agent.py with this:

"""
Step 2: Multi-turn conversation.
The AI has no memory - we store the conversation and replay it each time.
"""

import os
import requests
from dotenv import load_dotenv

# Load API key from .env file
load_dotenv()
API_KEY = os.getenv("GOOGLE_API_KEY")
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent"

# THIS IS THE KEY: We store the entire conversation here
# Every time we call the API, we send ALL of this
conversation_history = []


def chat(user_message):
    """Send a message and get a response."""

    # Add the user's message to history
    conversation_history.append({
        "role": "user",
        "parts": [{"text": user_message}]
    })

    # Call the API with the FULL conversation history
    response = requests.post(
        API_URL,
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": API_KEY
        },
        json={"contents": conversation_history}
    )

    # Get the AI's response
    data = response.json()
    assistant_message = data["candidates"][0]["content"]["parts"][0]["text"]

    # Add the AI's response to history too!
    conversation_history.append({
        "role": "model",  # Gemini uses "model" not "assistant"
        "parts": [{"text": assistant_message}]
    })

    return assistant_message


# Main chat loop
print("Chat with Gemini! Type 'quit' to exit.\n")

while True:
    user_input = input("You: ").strip()

    if user_input.lower() == 'quit':
        print("Goodbye!")
        break

    if not user_input:
        continue

    response = chat(user_input)
    print(f"AI: {response}\n")

Run It

python agent.py

Try This Conversation

You: My name is Alice.
AI: Nice to meet you, Alice! How can I help you today?

You: What's my name?
AI: Your name is Alice!

The AI “remembers” your name because we sent the entire conversation history with the second message.

What You Learned

  - The AI is stateless: the “memory” is just us replaying the whole conversation on every call
  - Gemini labels AI messages with the role “model” (not “assistant”)

Step 3: Add a Tool (Ping)

Now let’s give the AI the ability to do things in the real world. We’ll add a ping tool that checks if a website is reachable.

What’s New in This Step

  - A real ping() function that runs the ping command on your machine
  - A TOOLS declaration that describes the function so the AI knows it exists
  - Code that detects when the AI asks for a tool (a “functionCall” part)

The Code

Replace your agent.py with this:

"""
Step 3: Add a tool (ping).
We tell the AI about the tool, but don't handle tool calls yet.
"""

import os
import subprocess
import requests
from dotenv import load_dotenv

# Load API key from .env file
load_dotenv()
API_KEY = os.getenv("GOOGLE_API_KEY")
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent"

conversation_history = []


# ============================================
# NEW: Define the ping function
# ============================================
def ping(host):
    """
    Ping a host to check if it's reachable.

    This actually runs the 'ping' command on your computer!
    """
    result = subprocess.run(
        ["ping", "-c", "4", host],  # -c 4 = send 4 pings
        capture_output=True,
        text=True,
        timeout=30
    )
    return result.stdout + result.stderr


# ============================================
# NEW: Describe the tool for the AI
# ============================================
# This tells Gemini: "You have access to this tool"
# The AI reads this description to understand when/how to use it
TOOLS = {
    "function_declarations": [
        {
            "name": "ping",
            "description": "Ping a hostname or IP address to check if it's reachable on the network. Use this when someone asks to check connectivity to a server or website.",
            "parameters": {
                "type": "object",
                "properties": {
                    "host": {
                        "type": "string",
                        "description": "The hostname (like google.com) or IP address (like 8.8.8.8) to ping"
                    }
                },
                "required": ["host"]
            }
        }
    ]
}


def chat(user_message):
    """Send a message and get a response."""

    conversation_history.append({
        "role": "user",
        "parts": [{"text": user_message}]
    })

    # NEW: Include tools in the request
    response = requests.post(
        API_URL,
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": API_KEY
        },
        json={
            "contents": conversation_history,
            "tools": [TOOLS]  # <-- Tell the AI about our tools
        }
    )

    data = response.json()
    parts = data["candidates"][0]["content"]["parts"]
    first_part = parts[0]

    # Check if the AI wants to use a tool
    if "functionCall" in first_part:
        # The AI wants to use a tool!
        function_call = first_part["functionCall"]
        print(f"\n[AI wants to call: {function_call['name']}({function_call['args']})]")
        print("[But we haven't implemented tool handling yet...]\n")
        return "I wanted to use a tool but my creator hasn't taught me how yet!"

    # Normal text response
    assistant_message = first_part["text"]
    conversation_history.append({
        "role": "model",
        "parts": [{"text": assistant_message}]
    })

    return assistant_message


# Main chat loop
print("Chat with Gemini! I know about the 'ping' tool but can't use it yet.")
print("Type 'quit' to exit.\n")

while True:
    user_input = input("You: ").strip()

    if user_input.lower() == 'quit':
        break

    if not user_input:
        continue

    response = chat(user_input)
    print(f"AI: {response}\n")

Run It

python agent.py

Try This

You: Can you check if google.com is reachable?

[AI wants to call: ping({'host': 'google.com'})]
[But we haven't implemented tool handling yet...]

AI: I wanted to use a tool but my creator hasn't taught me how yet!

The AI recognizes it should use the ping tool! It even figured out the right argument (host: google.com). But we haven’t written the code to actually run it yet.

What You Learned

  - A tool is two things: a text description the AI reads, and a function your code runs
  - When the AI wants a tool, it replies with a “functionCall” instead of text
  - Nothing actually runs until your code decides to run it

Step 4: The Agentic Loop

Now for the magic: let’s actually run the tools and feed the results back to the AI.

This is called the “agentic loop”:

1. User says something
2. Send to AI
3. If AI wants to use a tool:
   a. Run the tool
   b. Send the result back to AI
   c. Go to step 2
4. If AI gives text response:
   → Show it to user, done!

The loop keeps running until the AI gives a final text answer.

What’s New in This Step

  - A TOOL_FUNCTIONS table mapping tool names to Python functions
  - handle_tool_call(), which actually executes the tool the AI requested
  - A loop inside chat() that feeds tool results back until the AI answers in text

The Code

Replace your agent.py with this:

"""
Step 4: The Agentic Loop.
Now we actually RUN the tools and feed results back to the AI.
"""

import os
import subprocess
import requests
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.getenv("GOOGLE_API_KEY")
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent"

conversation_history = []


def ping(host):
    """Ping a host to check if it's reachable."""
    result = subprocess.run(
        ["ping", "-c", "4", host],
        capture_output=True,
        text=True,
        timeout=30
    )
    return result.stdout + result.stderr


# Map tool names to actual functions
TOOL_FUNCTIONS = {
    "ping": ping,
}

TOOLS = {
    "function_declarations": [
        {
            "name": "ping",
            "description": "Ping a hostname or IP address to check if it's reachable on the network.",
            "parameters": {
                "type": "object",
                "properties": {
                    "host": {
                        "type": "string",
                        "description": "The hostname or IP address to ping"
                    }
                },
                "required": ["host"]
            }
        }
    ]
}


# ============================================
# NEW: Function to execute a tool
# ============================================
def handle_tool_call(function_call):
    """Actually run the tool the AI requested."""
    tool_name = function_call["name"]
    tool_args = function_call.get("args", {})

    print(f"  [Running: {tool_name}({tool_args})]")

    if tool_name in TOOL_FUNCTIONS:
        result = TOOL_FUNCTIONS[tool_name](**tool_args)
        return result
    else:
        return f"Unknown tool: {tool_name}"


def chat(user_message):
    """Send a message and get a response, handling tool calls."""

    conversation_history.append({
        "role": "user",
        "parts": [{"text": user_message}]
    })

    # ============================================
    # NEW: The Agentic Loop
    # Keep calling the API until we get a text response
    # ============================================
    while True:
        response = requests.post(
            API_URL,
            headers={
                "Content-Type": "application/json",
                "x-goog-api-key": API_KEY
            },
            json={
                "contents": conversation_history,
                "tools": [TOOLS]
            }
        )

        data = response.json()
        parts = data["candidates"][0]["content"]["parts"]
        first_part = parts[0]

        # Case 1: AI wants to use a tool
        if "functionCall" in first_part:
            function_call = first_part["functionCall"]

            # Add AI's tool request to history
            conversation_history.append({
                "role": "model",
                "parts": [{"functionCall": function_call}]
            })

            # Actually run the tool
            tool_result = handle_tool_call(function_call)

            # Add tool result to history
            # This is a special format Gemini expects
            conversation_history.append({
                "role": "user",
                "parts": [{
                    "functionResponse": {
                        "name": function_call["name"],
                        "response": {"result": tool_result}
                    }
                }]
            })

            # Loop again - AI will see the result and continue
            continue

        # Case 2: AI gave a text response - we're done!
        if "text" in first_part:
            assistant_message = first_part["text"]
            conversation_history.append({
                "role": "model",
                "parts": [{"text": assistant_message}]
            })
            return assistant_message

        return "Unexpected response format"


# Main chat loop
print("Chat with Gemini! I can now use the ping tool!")
print("Try: 'Can you check if google.com is reachable?'")
print("Type 'quit' to exit.\n")

while True:
    user_input = input("You: ").strip()

    if user_input.lower() == 'quit':
        break

    if not user_input:
        continue

    response = chat(user_input)
    print(f"AI: {response}\n")

Run It

python agent.py

Try This

You: Is google.com reachable?
  [Running: ping({'host': 'google.com'})]
AI: Yes, google.com is reachable! I pinged it 4 times and got responses
    with an average round-trip time of about 12ms. The connection looks good!

You: What about some-fake-website-12345.com?
  [Running: ping({'host': 'some-fake-website-12345.com'})]
AI: No, some-fake-website-12345.com doesn't appear to be reachable.
    The ping command couldn't resolve the hostname.

The AI used the tool, saw the results, and explained them to you in plain English!

What You Learned


Deep Dive: How Tools Actually Work

Before we add more tools, let’s make sure you understand something important:

The AI Never Runs Anything

The AI cannot touch your computer. It can only read and write text. When we say the AI “uses a tool”, here’s what actually happens:

┌─────────────────────────────────────────────────────────────┐
│  WHERE THINGS RUN                                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Google's Servers (far away)      Your Computer (right here)│
│  ┌─────────────────────┐          ┌─────────────────────┐   │
│  │                     │          │                     │   │
│  │  Gemini AI          │          │  agent.py           │   │
│  │  - Thinks           │          │  - Runs ping        │   │
│  │  - Reads text       │          │  - Runs bash        │   │
│  │  - Writes text      │          │  - Touches files    │   │
│  │                     │          │  - Full access!     │   │
│  │  (Cannot touch      │          │                     │   │
│  │   your computer)    │          │                     │   │
│  └─────────────────────┘          └─────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

The Two Parts of Every Tool

Part 1: The Tool Declaration (a description for the AI)

# This is just TEXT that tells the AI the tool exists
# It never runs - it's like a menu at a restaurant
{
    "name": "ping",
    "description": "Ping a hostname to check connectivity...",
    "parameters": { ... }
}

Part 2: The Actual Function (code that does real work)

# This is the code that actually runs ON YOUR COMPUTER
def ping(host):
    result = subprocess.run(["ping", "-c", "4", host], ...)
    return result.stdout

The Flow Step by Step

You: "Can you ping google.com?"


┌─────────────────────────────────────────┐
│  We send to Gemini:                     │
│  - Your question                        │
│  - Tool descriptions ("you have ping")  │
└─────────────────────────────────────────┘


┌─────────────────────────────────────────┐
│  Gemini thinks and responds:            │
│  {                                      │
│    "functionCall": {                    │
│      "name": "ping",                    │
│      "args": {"host": "google.com"}     │
│    }                                    │
│  }                                      │
│                                         │
│  (AI is just ASKING us to run ping)     │
└─────────────────────────────────────────┘


┌─────────────────────────────────────────┐
│  OUR CODE sees this request and runs:   │
│                                         │
│  subprocess.run(["ping", "-c", "4",     │
│                  "google.com"])         │
│                                         │
│  This runs ON YOUR COMPUTER             │
│  Result: "64 bytes from 142.250..."     │
└─────────────────────────────────────────┘


┌─────────────────────────────────────────┐
│  We send the result back to Gemini:     │
│  "Here's what ping returned: ..."       │
└─────────────────────────────────────────┘


┌─────────────────────────────────────────┐
│  Gemini reads the result and responds:  │
│  "google.com is reachable! The ping     │
│   took about 12ms on average."          │
└─────────────────────────────────────────┘

The Lookup Table

This code connects the AI’s request to the actual function:

# When AI says "ping", we look up which function to call
TOOL_FUNCTIONS = {
    "ping": ping,           # AI says "ping" → run ping()
    "run_bash": run_bash,   # AI says "run_bash" → run run_bash()
}

def handle_tool_call(function_call):
    tool_name = function_call["name"]       # e.g., "ping"
    tool_args = function_call.get("args", {})  # e.g., {"host": "google.com"}

    # Look up and call the actual function
    func = TOOL_FUNCTIONS[tool_name]
    result = func(**tool_args)  # Same as: ping(host="google.com")

    return result

Key Insight

The AI is like a boss giving orders. Your code is the employee who actually does the work.

In Step 3, we ignored the boss’s orders. In Step 4, we started listening and doing what the boss asks.


Step 5: Add the Bash Tool (Full Power!)

Let’s add a second tool: bash. This lets the AI run ANY shell command.

What is Bash?

Bash is the command line / terminal on Mac and Linux. Instead of clicking buttons, you type text commands. Examples:

Command                     What it does
ls                          List files in current folder
pwd                         Print current folder path
date                        Show current date and time
whoami                      Show your username
cat file.txt                Display contents of a file
echo "hello"                Print “hello”
python --version            Check Python version
curl https://example.com    Fetch a webpage
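For reference, running one of these commands from Python is a one-liner with subprocess - the same call our bash tool will make in the code below:

```python
import subprocess

# Run a shell command and capture its output as text,
# exactly as the run_bash tool does further down.
result = subprocess.run("echo hello", shell=True, capture_output=True, text=True)
print(result.stdout.strip())  # hello
```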

Why This is Powerful (and a bit scary!)

The bash tool can run any command. That means:

You: "Delete all files in the current folder"

AI thinks: "That's rm -rf *"
AI responds: {"functionCall": {"name": "run_bash", "args": {"command": "rm -rf *"}}}

Your code runs: subprocess.run("rm -rf *", shell=True)

Your files are gone!

The AI doesn’t have judgment about what’s safe. It just does what you ask.

That’s why we’ll add safety checks in our code to block dangerous commands.

What’s New in This Step

  - A run_bash() tool that can execute any shell command, with basic safety checks
  - Simple handling of API errors
  - A ‘clear’ command to reset the conversation

The Code

Replace your agent.py with this:

"""
Step 5: Add the bash tool.
Now our agent can run any shell command!
"""

import os
import subprocess
import requests
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.getenv("GOOGLE_API_KEY")
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent"

conversation_history = []


# ============================================
# Tool Functions
# ============================================

def ping(host):
    """Ping a host to check if it's reachable."""
    result = subprocess.run(
        ["ping", "-c", "4", host],
        capture_output=True,
        text=True,
        timeout=30
    )
    return result.stdout + result.stderr


def run_bash(command):
    """
    Run a bash command with safety checks.

    We block dangerous commands to prevent accidents!
    """
    # Safety check - block dangerous commands
    dangerous_patterns = [
        "rm -rf",      # Delete files recursively
        "rm -r /",     # Delete root
        "sudo",        # Superuser commands
        "mkfs",        # Format disks
        "> /dev",      # Write to devices
        "dd if=",      # Low-level disk operations
        "chmod -R 777",# Dangerous permissions
        ":(){:|:&};:", # Fork bomb
    ]

    for pattern in dangerous_patterns:
        if pattern in command.lower():
            return f"BLOCKED: I won't run commands containing '{pattern}' for safety reasons."

    result = subprocess.run(
        command,
        shell=True,
        capture_output=True,
        text=True,
        timeout=60
    )
    return result.stdout + result.stderr


# Map tool names to functions
TOOL_FUNCTIONS = {
    "ping": ping,
    "run_bash": run_bash,
}


# ============================================
# Tool Declarations (tell the AI what's available)
# ============================================
TOOLS = {
    "function_declarations": [
        {
            "name": "ping",
            "description": "Ping a hostname or IP address to check network connectivity.",
            "parameters": {
                "type": "object",
                "properties": {
                    "host": {
                        "type": "string",
                        "description": "The hostname or IP address to ping"
                    }
                },
                "required": ["host"]
            }
        },
        {
            "name": "run_bash",
            "description": "Run a bash shell command. Use for: listing files (ls), checking current directory (pwd), reading files (cat), getting system info, running scripts, etc.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "The bash command to execute"
                    }
                },
                "required": ["command"]
            }
        }
    ]
}


def handle_tool_call(function_call):
    """Execute the tool the AI requested."""
    tool_name = function_call["name"]
    tool_args = function_call.get("args", {})

    print(f"  [Running: {tool_name}({tool_args})]")

    if tool_name in TOOL_FUNCTIONS:
        try:
            result = TOOL_FUNCTIONS[tool_name](**tool_args)
            return result
        except Exception as e:
            return f"Error: {str(e)}"
    else:
        return f"Unknown tool: {tool_name}"


def chat(user_message):
    """Send a message and get a response, handling tool calls."""

    conversation_history.append({
        "role": "user",
        "parts": [{"text": user_message}]
    })

    while True:
        response = requests.post(
            API_URL,
            headers={
                "Content-Type": "application/json",
                "x-goog-api-key": API_KEY
            },
            json={
                "contents": conversation_history,
                "tools": [TOOLS]
            }
        )

        data = response.json()

        # Handle API errors
        if "candidates" not in data:
            print(f"API Error: {data}")
            return "Sorry, there was an API error."

        parts = data["candidates"][0]["content"]["parts"]
        first_part = parts[0]

        if "functionCall" in first_part:
            function_call = first_part["functionCall"]

            conversation_history.append({
                "role": "model",
                "parts": [{"functionCall": function_call}]
            })

            tool_result = handle_tool_call(function_call)

            conversation_history.append({
                "role": "user",
                "parts": [{
                    "functionResponse": {
                        "name": function_call["name"],
                        "response": {"result": tool_result}
                    }
                }]
            })
            continue

        if "text" in first_part:
            assistant_message = first_part["text"]
            conversation_history.append({
                "role": "model",
                "parts": [{"text": assistant_message}]
            })
            return assistant_message

        return "Unexpected response format"


# ============================================
# Main Loop
# ============================================
print("=" * 50)
print("GEMINI AGENT - Now with ping AND bash!")
print("=" * 50)
print()
print("Try these:")
print("  'What files are in the current directory?'")
print("  'What is my username?'")
print("  'What time is it?'")
print("  'Check if github.com is reachable'")
print("  'What Python version do I have?'")
print()
print("Type 'quit' to exit, 'clear' to reset conversation.")
print()

while True:
    user_input = input("You: ").strip()

    if user_input.lower() == 'quit':
        print("Goodbye!")
        break

    if user_input.lower() == 'clear':
        conversation_history = []
        print("[Conversation cleared]\n")
        continue

    if not user_input:
        continue

    response = chat(user_input)
    print(f"AI: {response}\n")

Run It

python agent.py

Try These

You: What files are in the current directory?
  [Running: run_bash({'command': 'ls'})]
AI: Here are the files in your current directory:
    - agent.py
    - .env
    - .venv/
    ...

You: What's the current date and time?
  [Running: run_bash({'command': 'date'})]
AI: The current date and time is Monday, January 20, 2025 at 4:30 PM.

You: What Python version do I have?
  [Running: run_bash({'command': 'python --version'})]
AI: You have Python 3.11.5 installed.

You: Can you read the .env file?
  [Running: run_bash({'command': 'cat .env'})]
AI: Your .env file contains: GOOGLE_API_KEY=AIza...

What You Learned

  - With a shell tool, the agent can inspect and act on your real machine
  - That power cuts both ways: it will happily cat your .env and reveal secrets, so safety checks matter

Summary: What We Built

Step 1: Call API once          →  "Hello world" of AI
Step 2: Conversation memory    →  Multi-turn chat (the "illusion")
Step 3: Define a tool          →  Tell AI about ping
Step 4: Agentic loop           →  Actually run tools, feed results back
Step 5: Add bash               →  Full power shell access

The entire agent is about 300 lines of code. The fly.io blog was right:

“You’d be surprisingly close to having a working coding agent.”

Key Concepts

  1. The AI is stateless - WE manage conversation history
  2. Tools are just descriptions - AI says what to call, WE run it
  3. The agentic loop - Keep calling AI until it gives a text answer
  4. Tools are powerful - With bash, you have a real assistant

What’s Next?

Ideas to extend your agent:

  - Add more tools: read and write files, fetch a URL, search the web
  - Give the agent a system prompt with a persona or standing instructions
  - Ask for confirmation before running risky or destructive commands
  - Stream responses instead of waiting for the whole reply

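For instance, adding a file-reading tool follows the same two-part pattern as Step 5: a Python function that does the work, plus a declaration the AI reads. This read_file tool is a hypothetical sketch of mine, not code from the steps above:

```python
# Hypothetical extra tool: the function your code would actually run...
def read_file(path):
    """Return the contents of a text file (capped so payloads stay small)."""
    with open(path, "r") as f:
        return f.read()[:4000]

# ...and the declaration the AI reads, in the same schema as ping/run_bash.
READ_FILE_DECLARATION = {
    "name": "read_file",
    "description": "Read a text file and return its contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path to the file to read"}
        },
        "required": ["path"],
    },
}

# Wiring it in (assuming the Step 5 structures):
#   TOOL_FUNCTIONS["read_file"] = read_file
#   TOOLS["function_declarations"].append(READ_FILE_DECLARATION)
```

Every new capability is just another entry in the lookup table and another declaration in the list.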
Congratulations - you’ve built an AI agent!


