Exploring Microsoft’s Copilot Actions: Automating Web Tasks and Making Reservations – An Early Look

At Microsoft’s Build conference in April, AI agents took center stage, and Microsoft Copilot Actions was one of the standout features. Integrated with a range of third-party web services, Copilot Actions is an AI that acts on your behalf rather than simply responding to your prompts. It is now available to anyone for free and can handle tasks such as booking show tickets or making restaurant reservations. Google’s Project Mariner promises a similar service.
Copilot Actions offers an enticing glimpse of where AI is heading, but in its current state it isn’t ready for mainstream use: in many cases, going to a website or service directly is still quicker. Even so, I encourage readers to try Copilot Actions, because it may eventually become the standard way we interact with the web.
This article covers the consumer version of Copilot Actions, which aims to automate everyday tasks. To use it, sign in to your Microsoft account and open the web version of Copilot. Click inside the prompt text box and select ‘Action’. Note that the feature is not available in the EU because of privacy regulations.
Accounts get a limited number of interactions, and a Copilot Pro subscription raises that limit. Microsoft doesn’t disclose the exact number, but in my testing a free account was cut off after four sessions.
Once you select ‘Action’, Copilot shows suggested tasks below the text entry box. The feature works with most public websites, except those deemed harmful, illegal, or offensive. The microphone icon stays visible in the input box after you switch to Action mode, but voice input doesn’t work in this mode.
To start an action, ask Copilot to perform a specific task on a named website. If you don’t specify a site, Copilot searches with Bing. As a test, I asked it to book dinner at a nearby Japanese restaurant through OpenTable. Be as detailed as possible in your request.
After you submit the prompt, Copilot gets to work, opening a new browser window alongside the original Copilot window. This browser runs on a virtual machine in the cloud, supports multiple tabs, and clicks and types according to your prompt. In my test, Copilot eventually completed the OpenTable reservation once my personal details had been entered and the verification code sent to the phone number I provided had been supplied.
One oddity was that Copilot didn’t know my location in Action mode and assumed I was in Chicago. This may be because the virtual machine running the Action’s browser is located in a different region. In standard Copilot mode, by contrast, it identified my location correctly and even recommended local restaurants, although it couldn’t make reservations itself.
In another test, I asked Copilot to find and buy a recent book on the Barnes & Noble website. When I gave “literary” as my preferred genre, Copilot suggested Chris Whitaker’s 2024 bestseller All the Colors of the Dark.
You can take control of Copilot’s virtual browser at any time to enter information such as a phone number or other personal details. The chat panel on the right can often be used for this, but occasionally you’ll need to interact with the page directly.
Ideally, I’d want a fully autonomous experience that needs no intervention from me. Copilot does guide you through each process, but it doesn’t yet save time, because many websites require checks or permissions that can’t be completed without a human.
Microsoft describes Copilot Actions as “experimental” and “early stage”, and it runs into several obstacles. Its slow performance and inability to determine the user’s location are the most noticeable. Privacy is another concern when it interacts with third-party sites and services, because Copilot takes screenshots of the websites it visits in order to analyze them.
It’s unclear whether Copilot Actions can overcome these challenges, but its potential is clear. Better site navigation, access to your personal details, and a more proactive approach could transform the way we use the web. Only time will tell whether the technology lives up to that promise.