FieldWorkArena
An evaluation environment for video analytics AI agent service.
Overview
The introduction of AI agents is being considered to address challenges faced by many workplaces, such as an aging population, labor shortages, and delays in decision-making. To help improve AI agents, we have developed and released a benchmark suite that evaluates them by extending web-operation evaluation methods to field operations.
FieldWorkArena is a groundbreaking benchmark suite for evaluating AI agents. Using data and tasks from Fujitsu's actual factories and warehouses, it quantitatively evaluates how effectively AI agents work in the field. This clarifies the challenges of AI adoption and provides evidence for field deployment.
See below for more details.
https://en-documents.research.global.fujitsu.com/fieldworkarena/
Update
- 2025-06-30: The Retail dataset has been released on Hugging Face. If you would like to obtain it, please apply here.
- 2025-06-30: The Warehouse dataset has been released on Hugging Face. If you would like to obtain it, please apply here.
- 2025-02-27: The Factory dataset has been released on Hugging Face.
Getting Started
The current reporting functionality of FieldWorkArena relies on
BrowserGym and WorkArena, so this implementation requires a ServiceNow instance.
The implementation may change in the future as the action space and task definitions are revised.
Create ServiceNow Instance
- Go to https://developer.servicenow.com/ and create an account.
- Click on Request an instance and select the Washington release (initializing the instance will take a few minutes). If you cannot select a release, first request an instance for the default release, then do Release instance and click Request an instance again.
- Once the instance is ready, you should see your instance URL and credentials. If not, click Return to the Developer Portal, then navigate to Manage instance password and click Reset instance password.
- You should now see your URL and credentials. Based on this information, set the following environment variables:
  - SNOW_INSTANCE_URL: the URL of your ServiceNow developer instance
  - SNOW_INSTANCE_UNAME: the username, which should be "admin"
  - SNOW_INSTANCE_PWD: the password; place the value in quotes ("") and be mindful of escaping special shell characters. Running echo $SNOW_INSTANCE_PWD should print the correct password.
- Log into your instance via a browser using the admin credentials. Close any popup that appears on the main screen (e.g., agreeing to analytics).
Warning: Feel free to look around the platform, but please make sure you revert any changes (e.g., changes to list views, pinning some menus, etc.) as these changes will be persistent and affect the benchmarking process.
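On Linux/macOS, the three variables can be exported in the shell like this (the URL and password below are placeholders, not real instance values):

```shell
# Placeholder values -- substitute the URL and password from your own instance.
export SNOW_INSTANCE_URL="https://dev123456.service-now.com"
export SNOW_INSTANCE_UNAME="admin"
export SNOW_INSTANCE_PWD='s3cret!pass$'   # single quotes stop the shell expanding $ and other special characters
echo "$SNOW_INSTANCE_PWD"                 # should print the password exactly as entered
```

Add these lines to your shell profile (e.g. ~/.bashrc) if you want them to persist across sessions.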
Install FieldWorkArena and Initialize your instance
git clone https://github.com/FujitsuResearch/FieldWorkArena.git
cd FieldWorkArena
pip install -r requirements.txt
pip install .
Then, install Playwright
playwright install
Finally, run this command in a terminal to upload the benchmark data to your ServiceNow instance:
workarena-install
Download dataset
- Go to https://en-documents.research.global.fujitsu.com/fieldworkarena/ .
- Click the link on Evaluation dataset and apply via the Forms page.
- Confirm the download URL in the email sent from FieldWorkArena. (It may take a few business days.)
- Unzip downloaded file. The files should be organized in the following directory structure:
FieldWorkArena
├── ...
├── data
│   ├── document
│   ├── image
│   └── movie
└── ...
Use Sample Agent
OpenAI API setting (for demo agent)
Set the environment variable:
OPENAI_API_KEY=<your OpenAI API key>
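For example, on Linux (the key value here is a placeholder, not a real key):

```shell
# Placeholder key -- replace with your real OpenAI API key.
export OPENAI_API_KEY="sk-example-key"
# On Windows (cmd.exe) the equivalent is:  set OPENAI_API_KEY=sk-example-key
echo "$OPENAI_API_KEY"
```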
Demo
To run the demos, an X server environment is required.
In these demos, the task is to search the image for incidents according to the query and to report any incidents found.
python demo/run_demo.py --task_name fieldworkarena.demo.1.report
python demo/run_demo.py --task_name fieldworkarena.demo.2.report
python demo/run_demo.py --task_name fieldworkarena.demo.3.report
python demo/run_demo.py --task_name fieldworkarena.demo.4.report
Benchmark
Run the following script; the results will be saved in the results directory.
Linux
All tasks
bash run_tasks.sh all
Individual tasks
# for factory
bash run_tasks.sh factory
# for warehouse
bash run_tasks.sh warehouse
# for retail
bash run_tasks.sh retail
Windows
All tasks
.\run_all_tasks.bat all
Individual tasks
# for factory
.\run_tasks.bat factory
# for warehouse
.\run_tasks.bat warehouse
# for retail
.\run_tasks.bat retail
Test Your Agent
Edit Agent
The agent is defined in 'demo/agent.py'. To test your own agent, you should mainly modify the 'get_action()' method.
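A minimal sketch of the kind of logic that might go into 'get_action()'. The observation key and action strings below are assumptions for illustration, not the real FieldWorkArena API; check 'demo/agent.py' for the actual base class and signature.

```python
# Hypothetical stand-in: the real base class and observation format are
# defined in demo/agent.py; adapt the keys and action strings accordingly.
class MyAgent:
    def get_action(self, obs: dict) -> str:
        """Decide the next action from an observation dict."""
        page_text = obs.get("text", "").lower()  # assumed key holding visible page text
        if "incident" in page_text:
            return 'report("incident found")'    # assumed action-string format
        return "noop()"
```

The point is that the method maps one observation to one action string; any LLM call or custom heuristic would go inside it.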
<span style="color: red; ">Attention!!</span>
After adding your own agents and scenarios, please run pip install . again.
Submit Your Result
Compress the results directory and send it as a reply to the email that contained the download URL of the evaluation data.
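One way to package the directory before replying (a sketch; it assumes the results live in ./results and that tar is available — the mkdir line only makes the example runnable as-is):

```shell
# Create a placeholder results directory so the command runs as-is;
# in practice ./results is produced by the benchmark scripts.
mkdir -p results
tar -czf results.tar.gz results   # archive to attach to your reply
```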
Inquiries and Support
To submit an inquiry, please follow these steps:
- Visit our page
- Click the "Inquiry" button on the bottom.
- Fill out the form completely and accurately.
It may take a few business days to reply.
Acknowledgment
This implementation was created with reference to the source code for WorkArena, developed by ServiceNow Research.
- github: https://github.com/ServiceNow/WorkArena
- arxiv:
Troubleshooting
If a proxy authentication dialog blocks startup when the browser launches, install the Chrome extension "Proxy Helper", then fill in the PAC URL and your account/password.
https://chromewebstore.google.com/detail/%E3%83%97%E3%83%AD%E3%82%AD%E3%82%B7%E3%83%BC%E3%83%98%E3%83%AB%E3%83%91%E3%83%BC/mnloefcpaepkpmhaoipjkpikbnkmbnic?hl=ja&pli=1
