OpenAI's New Product Takes the Spotlight: What Are Agents and Their Impact?

Introduction

On January 24, 2025, OpenAI released its first AI Agent product, Operator, which drew significant attention from domestic and international media for its ability to perform tasks such as booking flights and ordering food. Shortly thereafter, the AI search product Perplexity also launched an Agent capable of automatically invoking other apps on Android devices to perform similar functions. On January 23, Wang Yuquan discussed the latest trends in the AI industry during a live broadcast and predicted that 2025 would be the year Agents become the hottest trend. The curtain has now been lifted on the competition in this field.

In this article, we look at what Agents are, why OpenAI's Agent is an industry signal, and what impact this technology may have.

1. What Exactly Are Agents?

Before diving into OpenAI's latest product, it helps to understand what an Agent is and how the concept has evolved.

Historical Development of Agents

Early Theoretical Stage (1950s - 1990s): The concept of Intelligent Agents emerged alongside the birth of artificial intelligence. In 1959, Selfridge formally proposed the idea of Agents in a paper. At this stage, Agents were primarily theoretical and referred to computer systems capable of perceiving the environment and responding to it.
Software Agent Era (1990s - 2010s): With the development of the internet, the first generation of software Agents appeared, such as automated crawlers and email filters. These systems could complete tasks automatically based on predefined rules.
Intelligent Assistant Era (2010s - 2022): The emergence of smart assistants like Siri and Alexa brought about a new wave of interest in Agents. Leveraging early voice recognition technology, IT products gained some simple natural language interaction capabilities, giving the appearance of more intelligent Agent functions.
Large-Model-Driven Agent Revolution (2022 - Present): The breakthroughs in large AI models at the end of 2022 gave Agents powerful understanding and reasoning capabilities. Agents began to handle complex tasks, understand context, and make relatively intelligent decisions. With suitable framework designs, they also gained the ability to reflect on their own output. Agents have evolved from simple programs into intelligent entities with a certain degree of autonomy.

2. OpenAI's Operator

OpenAI's Operator is officially defined as "an AI agent that can perform web tasks for you." It is a highly autonomous intelligent system capable of automatically executing user tasks, such as ordering groceries, booking flights, and filling out forms. Users simply need to give Operator a command, and it can understand the user's intent and perform the corresponding operation.

Core Technology of Operator

CUA (Computer-Using Agent) Model: Operator's core technology is the CUA model, which combines OpenAI's multimodal GPT-4o large language model with reinforcement learning technology. This enables Operator to "see" and "operate" the computer screen like a human.
Safety Enhancements: To improve safety, Operator accesses websites through an embedded browser and performs operations using a virtual mouse and keyboard. It regularly takes screenshots to check the status of task execution.
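The screenshot-then-act cycle described above can be illustrated with a minimal sketch. This is not OpenAI's implementation: FakeBrowser, Action, choose_action, and run_agent are hypothetical names invented for this example, the "screenshot" is a plain dictionary rather than an image, and a few hand-written rules stand in for the CUA model's decision. Only the shape of the loop (observe, decide, act with a virtual mouse and keyboard, observe again) is taken from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    target: str = ""   # element to click (hypothetical element names)
    text: str = ""     # text to type into the focused field


@dataclass
class FakeBrowser:
    """Stand-in for the embedded browser; tracks form state instead of pixels."""
    fields: dict = field(default_factory=lambda: {"name": "", "date": ""})
    focused: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive an image; this sketch returns state directly.
        return {
            "fields": dict(self.fields),
            "focused": self.focused,
            "submitted": self.submitted,
        }

    def apply(self, action: Action) -> None:
        # Virtual mouse: clicking focuses a field or presses the submit button.
        if action.kind == "click":
            if action.target == "submit":
                self.submitted = True
            else:
                self.focused = action.target
        # Virtual keyboard: typing fills whichever field is focused.
        elif action.kind == "type" and self.focused in self.fields:
            self.fields[self.focused] = action.text


def choose_action(observation: dict, goal: dict) -> Action:
    """Rule-based placeholder for the model's decision (an assumption)."""
    if observation["submitted"]:
        return Action("done")
    for name, wanted in goal.items():
        if observation["fields"][name] != wanted:
            if observation["focused"] != name:
                return Action("click", target=name)
            return Action("type", text=wanted)
    return Action("click", target="submit")


def run_agent(browser: FakeBrowser, goal: dict, max_steps: int = 20) -> list:
    """Observe, decide, act, and repeat until the task is done or steps run out."""
    history = []
    for _ in range(max_steps):
        observation = browser.screenshot()   # check the state of the task
        action = choose_action(observation, goal)
        if action.kind == "done":
            break
        browser.apply(action)
        history.append(action)
    return history


if __name__ == "__main__":
    browser = FakeBrowser()
    steps = run_agent(browser, {"name": "Ada", "date": "2025-02-01"})
    print([(a.kind, a.target or a.text) for a in steps])
```

The max_steps cap is a design choice of this sketch: an agent that re-checks the screen after every action needs some bound so that a misread screen cannot keep it looping indefinitely.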

Current Availability and Partnerships

Currently, Operator is available to ChatGPT Pro users in the United States. OpenAI is also collaborating with companies such as DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber to ensure that Operator can truly assist users in completing tasks.

3. Will Operator Lead the Agent Revolution?

Many media outlets and users believe that OpenAI is about to spark a new revolution with Operator. However, we see this more as an important signal that the industry is about to take off than as a guarantee that OpenAI will lead the Agent revolution.

Challenges and Limitations

Not the First to Innovate: The computer-use Agent paradigm did not originate with OpenAI. Anthropic launched its own Computer Use capability in October 2024. Anthropic's Claude 3.5 Sonnet model can already move the mouse and click on relevant positions according to user instructions, mimicking the way humans interact with computers.
Performance Gaps: While Operator represents significant technological progress, its real-world performance is still far from perfect. OpenAI's official test data shows a maximum accuracy rate of only 87%. Errors remain common, and users often need to step in to correct the Agent or even start over.
Application Difficulties: OpenAI has acknowledged that Operator faces many challenges in its applications. For example, it has a high error rate when dealing with complex interfaces like calendars, and some websites block OpenAI's web crawlers, preventing Operator from accessing them.

The True Value of Operator

Empowering Developers: The true value of Operator lies in giving more developers the opportunity to join the wave of Agent startups. Even before the release of Operator and Claude's computer-use functions, the concept of Agents had already attracted widespread attention in the ToB (business-to-business) field.
Existing Frameworks and Services: Agent development frameworks such as LangChain and Dify have already been applied in enterprise development. Cloud giants like Microsoft, Google, and Amazon have also been providing Agent development services. However, these technologies are highly specialized, and the high cost of frequently invoking large AI models means they can only serve a small number of developers.
Industry Signal: OpenAI's entry into the Agent product market is more like Apple releasing a demonstration app for a new phone feature, or Microsoft launching its own first-party computer. It showcases the potential of new functions to downstream integrators and developers, encouraging more people to invest in development and launch more innovative products.

Conclusion

While many view Agents as a key technological breakthrough, it is important to recognize that an Agent is not a single technology but a product composed of multiple technologies. Creating truly revolutionary products requires not only technological advances but also a deep understanding of user needs and the ability to iterate quickly to gain a first-mover advantage in the market. For OpenAI, which is in the midst of the large-model competition, these efforts may be too resource-intensive to become a priority.
