AI agents, AI agents everywhere. This is one of the most popular and quickly evolving technologies out there. I'm not sure about you, but to me it seems like everyone is trying to use AI for literally everything: collecting data, writing letters, booking hotels, and even shopping. While I still prefer doing many of these things manually, automating boring tasks seems really tempting. Thus, in this article, we're going to see how to automate browser interactions with the help of BrowserUse.
We'll write a few Python scripts to achieve some impressive results: scraping data, logging into websites, sending emails, and assigning to-do cards. Finally, we'll learn how to start a web interface that allows users to write their own prompts without any coding knowledge.
So, let's dive in!
What are AI agents?
I'm sure you've heard the term AI agents thrown around. Yeah, it's a new buzzword but what does it actually mean?
To put it simply, an AI agent is a digital assistant that can observe and act on its own without you having to micromanage every step. Instead of following a script (like a "classic" auto-test program), it can adapt to different situations, making it more flexible. In other words, you don't have to spell out every minor detail: agents are somewhat clever. Though, I'd say that any software is only as smart as the person who created it.
To summarize, we have:
- Basic scripts with very specific steps. It's like a train that follows the same track every time.
- AI agents that (to a certain extent) can make their own decisions. It's like a self-driving car that checks the road, avoids obstacles, and adjusts its path accordingly.
In the context of browser automation, AI agents can:
- Navigate websites like us humans typically do. They click buttons, scroll, read text, and fill out forms.
- Handle pop-ups or layout changes (yes, including your "favorite" annoying cookie consent pop-ups).
- Recognize and extract information from complex or poorly organized pages.
- Make decisions based on the collected data.
AI agents can understand things on the fly: just like you would. Still, that doesn't mean they can do everything on their own. Agents need structured prompts that clearly explain the task and provide the necessary steps.
Why automate browser interactions?
So, why would you want to automate your browser interactions in the first place? After all, we've been using the internet for years without any issues. Why use some obscure AI tools?
When it comes to tedious and time-consuming tasks, AI browser automation absolutely makes sense. Think about it this way: AI should help you file taxes, not write a novel for you (well, unless you really need to have it written asap). It's here to handle the boring stuff so you don't have to. Or, at least, we're getting there. Slowly but steadily.
Here’s what it can do for you:
Auto-filling forms – Filling out the same form over and over? Job applications, registrations, or surveys — you name it. Automation can save you quite a bit of time.
Scraping useful info – Need to collect prices, stock availability, or news updates? AI can fetch and organize data way faster than you could manually. For example, in one of my previous articles I've demonstrated how to scrape Amazon with the help of AI.
Checking for updates – Waiting for tickets to drop or a product to restock? AI can revisit the page for you on a regular basis and notify you when something changes.
Managing social media – Scheduling posts, auto-liking, or filtering content? AI can handle routine tasks while you're focusing on the important stuff or just chilling.
Bulk downloading – If you need to download a bunch of files but clicking one by one is a nightmare, automation can grab them all at once.
Filtering & sorting data – Need to extract only the useful bits from a cluttered website? AI can pull what you need and skip the rest.
Automating logins – Logging into multiple accounts across different sites? A simple script can do it for you safely.
Quick demo
Check out this short GIF to see the AI agent in action. In this example, it summarizes the latest revisions of a Wikipedia page, logs into Trello, creates a new to-do card, provides information about the edits, and assigns a reviewer.
We'll see how to create the corresponding script later in this article.
But how does it really work?
Now that's a fair question! How in the world can AI do something on the internet — open pages, perform searches, fill out forms — if it literally doesn't have hands to click buttons?
Well, as long as we're using browsers to surf the web, and these browsers are just regular software, AI can interact with them programmatically. This means it can perform the same actions a human user would, just without physically clicking anywhere. Instead, AI sends commands to the browser telling it what to do. Just like how you might give instructions to a friend over the phone.
At its core, browser automation works by using scripts to control a web browser. It can open sites, visit different pages, click buttons, work with forms, extract texts and images, and handle various interactive elements.
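To make the "AI sends commands to the browser" idea concrete, here's a toy, pure-Python sketch. This is my own illustration, not BrowserUse's actual internals: the agent emits named actions, and a thin "driver" layer translates each one into a concrete browser command (in real life, a library like Playwright performs the clicks).

def make_driver(log):
    # Each action name maps to a function that would drive a real browser;
    # here we just record what would happen.
    return {
        "goto":  lambda url: log.append(f"navigate to {url}"),
        "type":  lambda selector, text: log.append(f"type {text!r} into {selector}"),
        "click": lambda selector: log.append(f"click {selector}"),
    }

def run_actions(actions, driver):
    # Replay a sequence of (action, arguments) pairs through the driver.
    for name, args in actions:
        driver[name](*args)

log = []
run_actions([
    ("goto",  ("https://www.microcenter.com",)),
    ("type",  ("#search", "podcast microphone")),
    ("click", ("#search-button",)),
], make_driver(log))
print(log)

The key difference with an AI agent is that the action sequence isn't hard-coded: the LLM decides which action to emit next based on what it observes on the page.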
Actually, this isn't a new concept. Developers have been using browser automation for years to run automated tests. These are just scripts that check if a website works as expected. Such tests simulate user actions like logging in, making a purchase, or searching for content. AI agents build on this existing functionality but take it further by adding cool decision-making capabilities.
Prerequisites for AI browser automation
Before we dive into coding, let's make sure you have the necessary tools and a bit of background knowledge. You don't need to be an expert, but basic Python skills like installing packages, writing simple scripts, and using the terminal will definitely help.
Python installed – Python 3.9 or later should work. Check your version by running python --version or python3 --version in your terminal.
A terminal or command-line tool – You'll need this to run scripts and install dependencies.
A code editor – Any editor will do, but something like VS Code or PyCharm is recommended.
Playwright – We'll be using Playwright to automate browser interactions. Follow the official setup guide to install and configure it.
Poetry (optional) – I'll be using Poetry to manage dependencies and the Python environment. It's not mandatory, but it can make dependency management smoother. Take your pick as there are other similar managers available.
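If you're unsure whether your interpreter is recent enough, you can also check from within Python itself; this is just a small convenience snippet of my own:

import sys

# Abort early with a clear message when the interpreter is too old.
if sys.version_info < (3, 9):
    raise RuntimeError(
        f"Python 3.9+ is required, but you're running {sys.version.split()[0]}"
    )
print("Python version OK:", ".".join(map(str, sys.version_info[:3])))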
Create and configure a new project
Let's start by creating a new Poetry project. First, initialize a new project and navigate to the newly created directory.
poetry new browser-demo
cd browser-demo
Next, install browser-use as a dependency. This package allows us to automate browser interactions with AI.
poetry add browser-use
Or
pip install browser-use
playwright install
BrowserUse supports multiple AI engines. For this guide, we'll be using ChatGPT by OpenAI. To work with it, you'll need an API key. Visit platform.openai.com/api-keys, click Create new secret key, choose Owned by – you, give it a name, then click Create secret key and copy its value.
Now, return to your Python project and create a .env file in the root directory. Add the OpenAI API key inside.
OPENAI_API_KEY=YOUR_KEY_HERE
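Our script will call load_dotenv() from the python-dotenv package to pull that key into the environment. Conceptually, it does something like this simplified sketch of my own (the real library also handles quoting, exports, and variable interpolation):

import os

# Simplified illustration of what python-dotenv's load_dotenv() does:
# read KEY=VALUE lines from a file and put them into os.environ.
def load_env_file(path=".env"):
    try:
        with open(path) as f:
            lines = f.readlines()
    except FileNotFoundError:
        return
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())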
Finally, let's add some boilerplate code to the src/browser_demo/main.py file (note that Poetry converts the hyphen in the project name to an underscore for the package directory).
from langchain_openai import ChatOpenAI
from browser_use import Agent
import asyncio
from dotenv import load_dotenv

load_dotenv()

async def main():
    task = ""
    agent = Agent(
        task=task,
        llm=ChatOpenAI(model="gpt-4o"),
    )
    await agent.run()
    input('Press Enter to close...')

asyncio.run(main())
This script initializes an AI agent, sets up a task variable for the instructions, configures the model, and starts the agent. Of course, the agent currently won't do anything, so let's assign it some real work!
Customizing browser
After installing Playwright, you'll have a default Chromium build ready to use. However, you might prefer to use the up-to-date Chrome already installed on your PC. This can be done by customizing the Agent in your script.
To achieve this, specify the path to your Chrome executable and pass it to the Agent. Adjust the path as needed based on where Chrome is installed on your system.
# ... other imports ...
from browser_use.browser.browser import Browser, BrowserConfig

load_dotenv()

chrome_path = r'C:\Program Files\Google\Chrome\Application\chrome.exe'
browser = Browser(
    config=BrowserConfig(
        chrome_instance_path=chrome_path,
    )
)

async def main():
    task = ""
    agent = Agent(
        task=task,
        llm=ChatOpenAI(model="gpt-4o"),
        browser=browser,
    )
    await agent.run()
    await browser.close()
    input('Press Enter to close...')

asyncio.run(main())
Using a custom browser can be beneficial not just because it's regularly updated, but also because it may already have your login details saved, making automation smoother and more convenient.
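Since the Chrome executable lives in a different place on each operating system, a small helper can pick a sensible default. The paths below are common install locations, but they're assumptions, so adjust them to your machine:

import platform

# Common default Chrome locations per OS — adjust to your own installation.
def default_chrome_path() -> str:
    system = platform.system()
    if system == "Windows":
        return r"C:\Program Files\Google\Chrome\Application\chrome.exe"
    if system == "Darwin":  # macOS
        return "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
    return "/usr/bin/google-chrome"  # many Linux distributions

chrome_path = default_chrome_path()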
Example: Summarize product information and send an email via Gmail
As our first example, let's create a task for the AI agent to scrape and summarize shopping information and send it via email. While this might sound like a complex goal, it's actually quite manageable if you know how to structure a proper prompt.
Suppose we're choosing a new microphone for podcasting. After all, I record videos quite often, so having a solid microphone sounds like a good idea!
Writing the AI prompt
A prompt is simply a set of instructions given to the AI. Writing good prompts is a skill in itself, and there are even specialized resources dedicated to it. As someone once said, "English will become the next big programming language," which is quite true, since AI instructions are typically written in plain English.
One important thing to understand: while AI might interact like a human, it's still a complex computer program. This means you need to be quite specific about what you want it to do. Vague or incomplete instructions can lead to incorrect or failed tasks, especially for more complex operations.
So, let's make sure we provide clear and detailed instructions:
import os
# ... other imports ...

async def main():
    google_login = os.getenv("GOOGLE_LOGIN")
    google_password = os.getenv("GOOGLE_PASSWORD")
    task = f"""
### **AI Agent Task: Podcast Microphone Pricing Summary & Email Report via Gmail**
#### **Objective:**
Gather pricing information on available podcast microphones from [Microcenter](https://www.microcenter.com) and generate a summarized report. Send the report via Gmail.
---
### **Step 1: Retrieve Pricing Information**
1. Open [Microcenter](https://www.microcenter.com).
- Wait until the page **fully loads** before proceeding.
- If loading takes too long, retry **up to 3 times** before failing.
2. Locate the **search box** in the top menu.
- It should have a **magnifying glass ("lens") icon** next to it.
- If the search box is not found, **log an error and stop execution**.
3. Click inside the search box and type **"podcast microphone"**, then **press Enter**.
- If the search button needs to be clicked manually, **click the magnifying glass icon**.
- Wait for the search results to load before continuing.
4. Identify the **search results area**.
- If no products are listed, **log an error and stop execution**.
- Scroll down if necessary to load additional results.
5. **Verify at least 1 product is found before proceeding.**
- If no products are found, **log an error and stop execution**.
---
### **Step 2: Summarize Pricing**
1. Collect pricing information for **up to 5 microphones**.
- Extract **name, price, rating, and availability**.
2. Format the summary as **clean, readable text**.
- Each microphone should be listed **in a separate paragraph**.
- Avoid raw data dumps—this should look like a human-generated report.
3. **Verify that the summary is not empty** before proceeding.
- If the summary is empty or incomplete, **log an error and stop execution**.
---
### **Step 3: Send the Report via Gmail**
1. Open [Gmail](https://mail.google.com/).
- Wait for the page to **fully load** before proceeding.
2. Log in using the provided credentials:
- **Email:** `{google_login}`
- **Password:** `{google_password}`
- If login fails, **log an error and stop execution**.
3. Click **Compose** to start a new email.
- Wait for the compose window to appear before continuing.
4. In the **To** field, enter `{google_login}`.
- After typing, Gmail may display a **suggested email selection below the field**.
- **You must click on `{google_login}`** in this suggestion box to confirm the recipient.
- **Verify that `{google_login}` appears in the To field** before continuing.
- If the recipient is not confirmed, **log an error and stop execution**.
5. In the **Subject** field (located below the **To** field), enter:
- `Microcenter: available microphones`
6. In the **Email Body**, enter the summarized information from Step 2.
- **The Email Body is the large, empty text area located directly below the Subject field.**
- Ensure the **email is formatted for human readability**.
- Use **proper paragraphs** and **line breaks** (press **Enter**, avoid inserting `\n`).
- **Verify that the text appears correctly in the email body** before proceeding.
7. Click **Send** to complete the process.
- Wait for a **confirmation message** that the email was sent successfully.
- If sending fails, **log an error and stop execution**.
---
### **Key Requirements & Error Handling**
- **Ensure the search results page actually loads** before scraping data.
- **Verify extracted pricing details** before proceeding to email.
- **Do not skip required interactions** (clicking, typing, confirming selections).
- **Log errors properly** instead of continuing with missing data.
- **Handle login credentials securely**, as they are highly sensitive.
"""
# ... other code here ...
In this example, we're asking the agent to:
- Visit the Microcenter website.
- Search for podcasting microphones.
- Summarize fetched information.
- Open Gmail, log in with the provided credentials, and compose an email containing the summary.
However, AI can sometimes struggle with certain actions unless you're quite specific. For example, it failed to enter a recipient when I simply said, "enter this value into the To field." Being explicit in every step is crucial for success. Sometimes you might need to try different prompts and see which performs better.
My Google login and password are stored in the .env file, so be sure to add yours there:
GOOGLE_LOGIN=YOUR_LOGIN
GOOGLE_PASSWORD=YOUR_PASSWORD
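Since the script now depends on several secrets, it's worth failing fast when one is missing. A small guard like this (my own addition, not part of BrowserUse) gives a clearer error than a failed login halfway through the task:

import os

# Names of the environment variables this example expects in .env.
REQUIRED_VARS = ("OPENAI_API_KEY", "GOOGLE_LOGIN", "GOOGLE_PASSWORD")

def check_env(required=REQUIRED_VARS):
    # Raise a single, readable error listing every missing variable.
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

Call check_env() right after load_dotenv() so the script stops before opening any browser.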
Seeing it in action
Now that we've set everything up, it's time to run the script. Before doing so, make sure you don't have an active Chrome instance open.
poetry run python src\browser_demo\main.py
You'll see the browser open automatically, AI navigating to the specified page, browsing it (and even scrolling!), and then composing a new email.
It might feel like magic, but it works surprisingly well. Brilliant job!
Example: Checking page history and creating a Trello card
Now let's try something different. Suppose we're working on a Wikipedia article and want to create a todo task on Trello for recent edits.
Planning the steps
Which steps should we perform?
- Open English Wikipedia.
- Find the article we're working on (let's say "Artificial Intelligence").
- Open the edit history and locate the latest revisions.
- Log into Trello.
- Open the relevant board.
- Create a new card (aka "todo").
- Add relevant details about the edits.
- Assign the task to a reviewer.
- Save the changes.
Not too shabby, right? Luckily, it's pretty straightforward to automate. Let's get to writing a new prompt.
Writing the prompt
Open your Python script and adjust the task variable accordingly.
async def main():
    google_login = os.getenv("GOOGLE_LOGIN")
    google_password = os.getenv("GOOGLE_PASSWORD")
    task = f"""
### **AI Agent Task: Wikipedia Edit History & Trello Card Creation**
#### **Objective:**
Retrieve the last three edits from the **"Artificial intelligence"** article on [Wikipedia](https://en.wikipedia.org) and create a **new Trello card** on [Trello](https://trello.com) with links to these revisions for review.
---
### **Step 1: Retrieve Recent Edits from Wikipedia**
1. Proceed to [English Wikipedia](https://en.wikipedia.org) website.
2. On Wikipedia, use the **search bar** to find the article **"Artificial intelligence"**.
3. Open the article and navigate to the **"View history"** tab.
4. Locate the last **three revisions** in the history log.
- Each revision includes the **date, username, and change summary**.
5. Extract links to these three most recent edits.
---
### **Step 2: Create a Trello Card**
1. Open [Trello](https://trello.com).
2. Locate the **"Ilya Demo Board"** on the main page.
- If the board is not visible, navigate to the **login page** and log in using:
- **Email:** {google_login}
- **Password:** {google_password}
3. Once logged in, open **"Ilya Demo Board"**.
4. Find the **"To-Do"** list and click **"Add a card"**.
5. In the title field, enter: **"Artificial Intelligence Edits"** and press **Enter** to save.
6. Click on the newly created card to open it.
7. In the **Description** field, add a brief summary of the last three revisions, including the extracted links.
- The **Description field** will initially be empty with a placeholder like *"Need formatting help?"*.
- Use an unordered list. Note that Trello might automatically insert bullet points when you proceed to the next list item.
8. Click the **"Members"** button to assign the task.
9. In the **Board members** list, locate **Ilya** and click on the name.
10. Confirm that **Ilya's avatar** (I) appears under the **Members** section.
11. Click **Save** to store the changes.
---
### **Key Requirements:**
- Ensure all extracted **revision links are accurate and relevant**.
- Keep the **Trello card description clear and well-structured**.
- **Handle login credentials securely**, as they are sensitive.
"""
Make sure to adjust your credentials and board details as needed. That's it!
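If you find yourself writing many prompts in this step-by-step style, you can assemble them programmatically. This is just a convenience sketch of my own (build_prompt is not a BrowserUse API):

# Assemble a structured, numbered prompt from an objective and a list of steps.
def build_prompt(title: str, objective: str, steps: list[str]) -> str:
    lines = [
        f"### **AI Agent Task: {title}**",
        "",
        "#### **Objective:**",
        objective,
        "",
        "### **Steps:**",
    ]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)

task = build_prompt(
    "Wikipedia Edit History & Trello Card Creation",
    "Retrieve recent edits and create a Trello card for review.",
    [
        "Open https://en.wikipedia.org and find the article.",
        "Open the 'View history' tab and extract the last three revisions.",
        "Log into Trello and create a card with links to those revisions.",
    ],
)
print(task)

Keeping the steps in a plain Python list also makes it easy to tweak a single step without touching the rest of the prompt.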
Using a web-based UI
Writing prompts for AI directly in the Python script can be a bit tedious, but luckily, we can use the web UI to make things easier.
Installing the web UI for browser automation
To get started, make sure you have Git installed on your PC, then run the following commands to clone and navigate to the web UI repository.
git clone https://github.com/browser-use/web-ui.git
cd web-ui
Next, install all required dependencies. This might take a little time, depending on your system.
pip install -r requirements.txt
Once the installation is complete, copy .env.example to .env.
# Windows
copy .env.example .env

# Linux / macOS
cp .env.example .env
Inside the .env file, paste the OpenAI API key we obtained earlier.
OPENAI_API_KEY=YOUR_KEY
You can also adjust other configuration values as needed.
Running web UI
Now you're all set! Start the web UI locally on port 7788.
python webui.py --ip 127.0.0.1 --port 7788
Once it's running, open your browser and go to http://127.0.0.1:7788/. You'll see the following page:
Here, you can configure various settings. For example, you can adjust the maximum number of steps the agent will attempt. You can also tweak your language model (LLM) settings under the corresponding tab, though the default configuration should work fine.
Example: Browsing cinema schedule
Now, switch to the Run agent tab:
Here, you'll see a text box where you can enter a prompt.
For example, suppose I'd like to go to a local cinema tomorrow to watch a new animation film, "Straume". If you haven't heard about it, it's a Latvian animated film about a cute black cat. What makes it unique is that it has no voice-over, relying entirely on music and visual storytelling to guide the viewer through the story.
So, let's enter the following prompt to find its showtimes.
### **AI Agent Task: Find Cinema Showtimes for "Flow" (Straume)**
#### **Objective:**
Check the local cinema schedule for the movie **"Flow" (original title: "Straume")**, and retrieve available showtimes for tomorrow.
---
### **Step 1: Access the Cinema Website**
1. Open [Forum Cinemas Latvia](https://www.forumcinemas.lv/eng/).
---
### **Step 2: Find the Movie "Flow" (Straume)**
1. Click on the "Today" dropdown in the menu and select "Tomorrow" from the options list.
2. In the list, find the film titled **"Flow"** (original title: "Straume").
3. Open the movie's information page to access details such as its description, duration, and screening schedule.
---
### **Step 3: Retrieve Showtimes for Tomorrow**
1. Locate the **schedule or showtimes section** on the movie's page.
2. Extract the **available screening times** along with relevant details such as:
- Cinema hall (if available).
- Any special formats (e.g., IMAX, 3D, VIP).
---
### **Step 4: Return the Information**
1. Provide a **clear and structured summary** of the available showtimes.
2. If no showtimes are available for tomorrow, return a message stating that no screenings are scheduled.
3. Ensure accuracy by verifying that the extracted data corresponds to the correct date and movie title.
---
### **Key Requirements:**
- Confirm that the retrieved showtimes are specifically for **tomorrow**.
- Provide the information in a **clean and readable format**.
- If any steps fail (e.g., the movie is not listed, or no showtimes exist), return an appropriate message.
So, I'm asking it to visit the cinema website, switch to the English version (for the readers' convenience), and find information about the film.
Once you're ready, hit Run Agent.
The browser will open automatically, and the AI agent will perform all the steps as needed. It will even create a GIF file (agent_history.gif) in the project root, showing its reasoning and actions in real time. Pretty neat!
🤖 Having trouble scraping websites that end up blocking you? Bypass anti-web scraping technology such as Cloudflare with our guide on Web Scraping without getting blocked
Conclusion
So, in this post we explored AI-driven browser automation using BrowserUse. We've seen how to configure an AI agent and write effective prompts to automate tedious tasks.
Hopefully, by now you're feeling a bit more confident about AI usage and understand that it's not some kind of magic that only the chosen ones can use.
Thank you for staying with me, and until next time!

Ilya is an IT tutor and author, web developer, and ex-Microsoft/Cisco specialist. His primary programming languages are Ruby, JavaScript, Python, and Elixir. He enjoys coding, teaching people and learning new things. In his free time he writes educational posts, participates in OpenSource projects, tweets, goes in for sports and plays music.