Intelligent Agents (AIMA Ch. 2)

Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (4th edition), 2022. ISBN 978-0134610993.
CHAPTER 2: INTELLIGENT AGENTS

In which we discuss the nature of agents, perfect or otherwise, the diversity of environments, and the resulting menagerie of agent types.

Chapter 1 identified the concept of rational agents as central to our approach to artificial intelligence. In this chapter, we make this notion more concrete. We will see that the concept of rationality can be applied to a wide variety of agents operating in any imaginable environment. Our plan in this book is to use this concept to develop a small set of design principles for building successful agents—systems that can reasonably be called intelligent.

We begin by examining agents, environments, and the coupling between them. The observation that some agents behave better than others leads naturally to the idea of a rational agent—one that behaves as well as possible. How well an agent can behave depends on the nature of the environment; some environments are more difficult than others. We give a crude categorization of environments and show how properties of an environment influence the design of suitable agents for that environment. We describe a number of basic "skeleton" agent designs, which we flesh out in the rest of the book.
2.1 Agents and Environments
An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators. This simple idea is illustrated in Figure 2.1. A human agent has eyes, ears, and other organs for sensors and hands, legs, vocal tract, and so on for actuators. A robotic agent might have cameras and infrared range finders for sensors and various motors for actuators. A software agent receives file contents, network packets, and human input (keyboard/mouse/touchscreen/voice) as sensory inputs and acts on the environment by writing files, sending network packets, and displaying information or generating sounds. The environment could be everything—the entire universe! In practice it is just that part of the universe whose state we care about when designing this agent—the part that affects what the agent perceives and that is affected by the agent's actions.

We use the term percept to refer to the content an agent's sensors are perceiving. An agent's percept sequence is the complete history of everything the agent has ever perceived. In general, an agent's choice of action at any given instant can depend on its built-in knowledge and on the entire percept sequence observed to date, but not on anything it hasn't perceived. By specifying the agent's choice of action for every possible percept sequence, we have said more or less everything there is to say about the agent. Mathematically speaking, we say that an agent's behavior is described by the agent function that maps any given percept sequence to an action.

Figure 2.1: Agents interact with environments through sensors and actuators.
We can imagine tabulating the agent function that describes any given agent; for most agents, this would be a very large table—infinite, in fact, unless we place a bound on the length of percept sequences we want to consider. Given an agent to experiment with, we can, in principle, construct this table by trying out all possible percept sequences and recording which actions the agent does in response.1 The table is, of course, an external characterization of the agent. Internally, the agent function for an artificial agent will be implemented by an agent program. It is important to keep these two ideas distinct. The agent function is an abstract mathematical description; the agent program is a concrete implementation, running within some physical system.
To illustrate these ideas, we use a simple example—the vacuum-cleaner world, which consists of a robotic vacuum-cleaning agent in a world consisting of squares that can be either dirty or clean. Figure 2.2 shows a configuration with just two squares, A and B. The vacuum agent perceives which square it is in and whether there is dirt in the square. The agent starts in square A. The available actions are to move to the right, move to the left, suck up the dirt, or do nothing.2 One very simple agent function is the following: if the current square is dirty, then suck; otherwise, move to the other square. A partial tabulation of this agent function is shown in Figure 2.3 and an agent program that implements it appears in Figure 2.8 on page 49.

Looking at Figure 2.3, we see that various vacuum-world agents can be defined simply by filling in the right-hand column in various ways. The obvious question, then, is this: What is the right way to fill out the table? In other words, what makes an agent good or bad, intelligent or stupid? We answer these questions in the next section.
1 If the agent uses some randomization to choose its actions, then we would have to try each sequence many times to identify the probability of each action. One might imagine that acting randomly is rather silly, but we show later in this chapter that it can be very intelligent.
2 In a real robot, it would be unlikely to have actions like "move right" and "move left." Instead the actions would be "spin wheels forward" and "spin wheels backward." We have chosen the actions to be easier to follow on the page, not for ease of implementation in an actual robot.

Figure 2.2: A vacuum-cleaner world with just two locations, A and B. Each location can be clean or dirty, and the agent can move left or right and can clean the square that it occupies. Different versions of the vacuum world allow for different rules about what the agent can perceive, whether its actions always succeed, and so on.
Percept sequence                             Action
[A, Clean]                                   Right
[A, Dirty]                                   Suck
[B, Clean]                                   Left
[B, Dirty]                                   Suck
[A, Clean], [A, Clean]                       Right
[A, Clean], [A, Dirty]                       Suck
...                                          ...
[A, Clean], [A, Clean], [A, Clean]           Right
[A, Clean], [A, Clean], [A, Dirty]           Suck
...                                          ...

Figure 2.3: Partial tabulation of a simple agent function for the vacuum-cleaner world shown in Figure 2.2. The agent cleans the current square if it is dirty; otherwise it moves to the other square. Note that the table is of unbounded size unless there is a restriction on the length of possible percept sequences.
Before closing this section, we should emphasize that the notion of an agent is meant to be a tool for analyzing systems, not an absolute characterization that divides the world into agents and non-agents. One could view a hand-held calculator as an agent that chooses the action of displaying "4" when given the percept sequence "2 + 2 =," but such an analysis would hardly aid our understanding of the calculator. In a sense, all areas of engineering can be seen as designing artifacts that interact with the world; AI operates at (what the authors consider to be) the most interesting end of the spectrum, where the artifacts have significant computational resources and the task environment requires nontrivial decision making.

2.2 Good Behavior: The Concept of Rationality
A rational agent is one that does the right thing. Obviously, doing the right thing is better than doing the wrong thing, but what does it mean to do the right thing?
2.2.1 Performance measures
Moral philosophy has developed several different notions of the "right thing," but AI has generally stuck to one notion called consequentialism: we evaluate an agent's behavior by its consequences. When an agent is plunked down in an environment, it generates a sequence of actions according to the percepts it receives. This sequence of actions causes the environment to go through a sequence of states. If the sequence is desirable, then the agent has performed well. This notion of desirability is captured by a performance measure that evaluates any given sequence of environment states.

Humans have desires and preferences of their own, so the notion of rationality as applied to humans has to do with their success in choosing actions that produce sequences of environment states that are desirable from their point of view. Machines, on the other hand, do not have desires and preferences of their own; the performance measure is, initially at least, in the mind of the designer of the machine, or in the mind of the users the machine is designed for. We will see that some agent designs have an explicit representation of (a version of) the performance measure, while in other designs the performance measure is entirely implicit—the agent may do the right thing, but it doesn't know why.
Recalling Norbert Wiener's warning to ensure that "the purpose put into the machine is the purpose which we really desire" (page 33), notice that it can be quite hard to formulate a performance measure correctly. Consider, for example, the vacuum-cleaner agent from the preceding section. We might propose to measure performance by the amount of dirt cleaned up in a single eight-hour shift. With a rational agent, of course, what you ask for is what you get. A rational agent can maximize this performance measure by cleaning up the dirt, then dumping it all on the floor, then cleaning it up again, and so on. A more suitable performance measure would reward the agent for having a clean floor. For example, one point could be awarded for each clean square at each time step (perhaps with a penalty for electricity consumed and noise generated). As a general rule, it is better to design performance measures according to what one actually wants to be achieved in the environment, rather than according to how one thinks the agent should behave.
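To make the "one point per clean square at each time step" idea concrete, here is a minimal sketch (not from the book) of a performance measure as a function over a sequence of environment states; the state representation and the penalty weights are illustrative assumptions.

```python
def performance(history, electricity_cost=0.01, noise_cost=0.005):
    """Score a sequence of environment states for the two-square vacuum world.

    Each state is assumed to be a dict such as
    {"clean": {"A": True, "B": False}, "energy_used": 1, "noise": 1}.
    One point is awarded per clean square per time step, with small
    penalties for electricity consumed and noise generated.
    """
    score = 0.0
    for state in history:
        score += sum(1 for is_clean in state["clean"].values() if is_clean)
        score -= electricity_cost * state.get("energy_used", 0)
        score -= noise_cost * state.get("noise", 0)
    return score

history = [{"clean": {"A": True, "B": False}, "energy_used": 1, "noise": 1},
           {"clean": {"A": True, "B": True}, "energy_used": 1, "noise": 1}]
print(performance(history))   # 3 points minus small penalties
```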
Even when the obvious pitfalls are avoided, some knotty problems remain. For example, the notion of "clean floor" in the preceding paragraph is based on average cleanliness over time. Yet the same average cleanliness can be achieved by two different agents, one of which does a mediocre job all the time while the other cleans energetically but takes long breaks. Which is preferable might seem to be a fine point of janitorial science, but in fact it is a deep philosophical question with far-reaching implications. Which is better—a reckless life of highs and lows, or a safe but humdrum existence? Which is better—an economy where everyone lives in moderate poverty, or one in which some live in plenty while others are very poor? We leave these questions as an exercise for the diligent reader.
For most of the book, we will assume that the performance measure can be specified correctly. For the reasons given above, however, we must accept the possibility that we might put the wrong purpose into the machine—precisely the King Midas problem described on page 33. Moreover, when designing one piece of software, copies of which will belong to different users, we cannot anticipate the exact preferences of each individual user. Thus, we may need to build agents that reflect initial uncertainty about the true performance measure and learn more about it as time goes by; such agents are described in Chapters 16, 18, and 22.
2.2.2 Rationality
What is rational at any given time depends on four things:

• The performance measure that defines the criterion of success.
• The agent's prior knowledge of the environment.
• The actions that the agent can perform.
• The agent's percept sequence to date.

This leads to a definition of a rational agent:

    For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has.
Consider the simple vacuum-cleaner agent that cleans a square if it is dirty and moves to the other square if not; this is the agent function tabulated in Figure 2.3. Is this a rational agent? That depends! First, we need to say what the performance measure is, what is known about the environment, and what sensors and actuators the agent has. Let us assume the following:

• The performance measure awards one point for each clean square at each time step, over a "lifetime" of 1000 time steps.
• The "geography" of the environment is known a priori (Figure 2.2) but the dirt distribution and the initial location of the agent are not. Clean squares stay clean and sucking cleans the current square. The Right and Left actions move the agent one square except when this would take the agent outside the environment, in which case the agent remains where it is.
• The only available actions are Right, Left, and Suck.
• The agent correctly perceives its location and whether that location contains dirt.

Under these circumstances the agent is indeed rational; its expected performance is at least as good as any other agent's.
One can see easily that the same agent would be irrational under different circumstances. For example, once all the dirt is cleaned up, the agent will oscillate needlessly back and forth; if the performance measure includes a penalty of one point for each movement, the agent will fare poorly. A better agent for this case would do nothing once it is sure that all the squares are clean. If clean squares can become dirty again, the agent should occasionally check and re-clean them if needed. If the geography of the environment is unknown, the agent will need to explore it. Exercise 2.VACR asks you to design agents for these cases.
2.2.3 Omniscience, learning, and autonomy
We need to be careful to distinguish between rationality and omniscience. An omniscient agent knows the actual outcome of its actions and can act accordingly; but omniscience is impossible in reality. Consider the following example: I am walking along the Champs Élysées one day and I see an old friend across the street. There is no traffic nearby and I'm not otherwise engaged, so, being rational, I start to cross the street. Meanwhile, at 33,000 feet, a cargo door falls off a passing airliner,3 and before I make it to the other side of the street I am flattened. Was I irrational to cross the street? It is unlikely that my obituary would read "Idiot attempts to cross street."
This example shows that rationality is not the same as perfection. Rationality maximizes expected performance, while perfection maximizes actual performance. Retreating from a requirement of perfection is not just a question of being fair to agents. The point is that if we expect an agent to do what turns out after the fact to be the best action, it will be impossible to design an agent to fulfill this specification—unless we improve the performance of crystal balls or time machines.
Our definition of rationality does not require omniscience, then, because the rational choice depends only on the percept sequence to date. We must also ensure that we haven't inadvertently allowed the agent to engage in decidedly underintelligent activities. For example, if an agent does not look both ways before crossing a busy road, then its percept sequence will not tell it that there is a large truck approaching at high speed. Does our definition of rationality say that it's now OK to cross the road? Far from it!

First, it would not be rational to cross the road given this uninformative percept sequence: the risk of accident from crossing without looking is too great. Second, a rational agent should choose the "looking" action before stepping into the street, because looking helps maximize the expected performance. Doing actions in order to modify future percepts—sometimes called information gathering—is an important part of rationality and is covered in depth in Chapter 16. A second example of information gathering is provided by the exploration that must be undertaken by a vacuum-cleaning agent in an initially unknown environment.
Our definition requires a rational agent not only to gather information but also to learn as much as possible from what it perceives. The agent's initial configuration could reflect some prior knowledge of the environment, but as the agent gains experience this may be modified and augmented. There are extreme cases in which the environment is completely known a priori and completely predictable. In such cases, the agent need not perceive or learn; it simply acts correctly.
Of course, such agents are fragile. Consider the lowly dung beetle. After digging its nest and laying its eggs, it fetches a ball of dung from a nearby heap to plug the entrance. If the ball of dung is removed from its grasp en route, the beetle continues its task and pantomimes plugging the nest with the nonexistent dung ball, never noticing that it is missing. Evolution has built an assumption into the beetle's behavior, and when it is violated, unsuccessful behavior results.
Slightly more intelligent is the sphex wasp. The female sphex will dig a burrow, go out and sting a caterpillar and drag it to the burrow, enter the burrow again to check all is well, drag the caterpillar inside, and lay its eggs. The caterpillar serves as a food source when the eggs hatch. So far so good, but if an entomologist moves the caterpillar a few inches away while the sphex is doing the check, it will revert to the "drag the caterpillar" step of its plan and will continue the plan without modification, re-checking the burrow, even after dozens of caterpillar-moving interventions. The sphex is unable to learn that its innate plan is failing, and thus will not change it.

3 See N. Henderson, "New door latches urged for Boeing 747 jumbo jets," Washington Post, August 24, 1989.

To the extent that an agent relies on the prior knowledge of its designer rather than on its own percepts and learning processes, we say that the agent lacks autonomy. A rational agent should be autonomous—it should learn what it can to compensate for partial or incorrect prior knowledge. For example, a vacuum-cleaning agent that learns to predict where and when additional dirt will appear will do better than one that does not.

As a practical matter, one seldom requires complete autonomy from the start: when the agent has had little or no experience, it would have to act randomly unless the designer gave some assistance. Just as evolution provides animals with enough built-in reflexes to survive long enough to learn for themselves, it would be reasonable to provide an artificial intelligent agent with some initial knowledge as well as an ability to learn. After sufficient experience of its environment, the behavior of a rational agent can become effectively independent of its prior knowledge. Hence, the incorporation of learning allows one to design a single rational agent that will succeed in a vast variety of environments.
2.3 The Nature of Environments
Now that we have a definition of rationality, we are almost ready to think about building rational agents. First, however, we must think about task environments, which are essentially the "problems" to which rational agents are the "solutions." We begin by showing how to specify a task environment, illustrating the process with a number of examples. We then show that task environments come in a variety of flavors. The nature of the task environment directly affects the appropriate design for the agent program.
2.3.1 Specifying the task environment
In our discussion of the rationality of the simple vacuum-cleaner agent, we had to specify the performance measure, the environment, and the agent's actuators and sensors. We group all these under the heading of the task environment. For the acronymically minded, we call this the PEAS (Performance, Environment, Actuators, Sensors) description. In designing an agent, the first step must always be to specify the task environment as fully as possible.
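As an illustration only (the book does not prescribe a data structure), a PEAS description can be captured in a small record type; the field contents below are taken from the taxi example in Figure 2.4.

```python
from dataclasses import dataclass

@dataclass
class PEAS:
    """A PEAS (Performance, Environment, Actuators, Sensors) description."""
    performance_measure: list[str]
    environment: list[str]
    actuators: list[str]
    sensors: list[str]

taxi_peas = PEAS(
    performance_measure=["safe", "fast", "legal", "comfortable trip",
                         "maximize profits", "minimize impact on other road users"],
    environment=["roads", "other traffic", "police", "pedestrians",
                 "customers", "weather"],
    actuators=["steering", "accelerator", "brake", "signal", "horn",
               "display", "speech"],
    sensors=["cameras", "radar", "speedometer", "GPS", "engine sensors",
             "accelerometer", "microphones", "touchscreen"],
)
```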
The vacuum world was a simple example; let us consider a more complex problem: an automated taxi driver. Figure 2.4 summarizes the PEAS description for the taxi's task environment. We discuss each element in more detail in the following paragraphs.

First, what is the performance measure to which we would like our automated driver to aspire? Desirable qualities include getting to the correct destination; minimizing fuel consumption and wear and tear; minimizing the trip time or cost; minimizing violations of traffic laws and disturbances to other drivers; maximizing safety and passenger comfort; maximizing profits. Obviously, some of these goals conflict, so tradeoffs will be required.
Next, what is the driving environment that the taxi will face? Any taxi driver must deal with a variety of roads, ranging from rural lanes and urban alleys to 12-lane freeways. The roads contain other traffic, pedestrians, stray animals, road works, police cars, puddles, and potholes. The taxi must also interact with potential and actual passengers. There are also some optional choices. The taxi might need to operate in Southern California, where snow is seldom a problem, or in Alaska, where it seldom is not. It could always be driving on the right, or we might want it to be flexible enough to drive on the left when in Britain or Japan. Obviously, the more restricted the environment, the easier the design problem.

Agent Type:           Taxi driver
Performance Measure:  Safe, fast, legal, comfortable trip; maximize profits; minimize impact on other road users
Environment:          Roads, other traffic, police, pedestrians, customers, weather
Actuators:            Steering, accelerator, brake, signal, horn, display, speech
Sensors:              Cameras, radar, speedometer, GPS, engine sensors, accelerometer, microphones, touchscreen

Figure 2.4: PEAS description of the task environment for an automated taxi driver.
The actuators for an automated taxi include those available to a human driver: control over the engine through the accelerator and control over steering and braking. In addition, it will need output to a display screen or voice synthesizer to talk back to the passengers, and perhaps some way to communicate with other vehicles, politely or otherwise.

The basic sensors for the taxi will include one or more video cameras so that it can see, as well as lidar and ultrasound sensors to detect distances to other cars and obstacles. To avoid speeding tickets, the taxi should have a speedometer, and to control the vehicle properly, especially on curves, it should have an accelerometer. To determine the mechanical state of the vehicle, it will need the usual array of engine, fuel, and electrical system sensors. Like many human drivers, it might want to access GPS signals so that it doesn't get lost. Finally, it will need touchscreen or voice input for the passenger to request a destination.
In Figure 2.5, we have sketched the basic PEAS elements for a number of additional agent types. Further examples appear in Exercise 2.PEAS. The examples include physical as well as virtual environments. Note that virtual task environments can be just as complex as the "real" world: for example, a software agent (or software robot or softbot) that trades on auction and reselling Web sites deals with millions of other users and billions of objects, many with real images.
2.3.2 Properties of task environments
The range of task environments that might arise in AI is obviously vast. We can, however, identify a fairly small number of dimensions along which task environments can be categorized. These dimensions determine, to a large extent, the appropriate agent design and the applicability of each of the principal families of techniques for agent implementation. First we list the dimensions, then we analyze several task environments to illustrate the ideas. The definitions here are informal; later chapters provide more precise statements and examples of each kind of environment.
Medical diagnosis system
    Performance Measure: Healthy patient, reduced costs
    Environment: Patient, hospital, staff
    Actuators: Display of questions, tests, diagnoses, treatments
    Sensors: Touchscreen/voice entry of symptoms and findings

Satellite image analysis system
    Performance Measure: Correct categorization of objects, terrain
    Environment: Orbiting satellite, downlink, weather
    Actuators: Display of scene categorization
    Sensors: High-resolution digital camera

Part-picking robot
    Performance Measure: Percentage of parts in correct bins
    Environment: Conveyor belt with parts; bins
    Actuators: Jointed arm and hand
    Sensors: Camera, tactile and joint angle sensors

Refinery controller
    Performance Measure: Purity, yield, safety
    Environment: Refinery, raw materials, operators
    Actuators: Valves, pumps, heaters, stirrers, displays
    Sensors: Temperature, pressure, flow, chemical sensors

Interactive English tutor
    Performance Measure: Student's score on test
    Environment: Set of students, testing agency
    Actuators: Display of exercises, feedback, speech
    Sensors: Keyboard entry, voice

Figure 2.5: Examples of agent types and their PEAS descriptions.

Fully observable vs. partially observable: If an agent's sensors give it access to the complete state of the environment at each point in time, then we say that the task environment is fully observable. A task environment is effectively fully observable if the sensors detect all aspects that are relevant to the choice of action; relevance, in turn, depends on the performance measure. Fully observable environments are convenient because the agent need not maintain any internal state to keep track of the world. An environment might be partially observable because of noisy and inaccurate sensors or because parts of the state are simply missing from the sensor data—for example, a vacuum agent with only a local dirt sensor cannot tell whether there is dirt in other squares, and an automated taxi cannot see what other drivers are thinking. If the agent has no sensors at all then the environment is unobservable. One might think that in such cases the agent's plight is hopeless, but, as we discuss in Chapter 4, the agent's goals may still be achievable, sometimes with certainty.
Single-agent vs. multiagent: The distinction between single-agent and multiagent environments may seem simple enough. For example, an agent solving a crossword puzzle by itself is clearly in a single-agent environment, whereas an agent playing chess is in a two-agent environment. However, there are some subtle issues. First, we have described how an entity may be viewed as an agent, but we have not explained which entities must be viewed as agents. Does an agent A (the taxi driver for example) have to treat an object B (another vehicle) as an agent, or can it be treated merely as an object behaving according to the laws of physics, analogous to waves at the beach or leaves blowing in the wind? The key distinction is whether B's behavior is best described as maximizing a performance measure whose value depends on agent A's behavior.

For example, in chess, the opponent entity B is trying to maximize its performance measure, which, by the rules of chess, minimizes agent A's performance measure. Thus, chess is a competitive multiagent environment. On the other hand, in the taxi-driving environment, avoiding collisions maximizes the performance measure of all agents, so it is a partially cooperative multiagent environment. It is also partially competitive because, for example, only one car can occupy a parking space.

The agent-design problems in multiagent environments are often quite different from those in single-agent environments; for example, communication often emerges as a rational behavior in multiagent environments; in some competitive environments, randomized behavior is rational because it avoids the pitfalls of predictability.
Deterministic vs. nondeterministic: If the next state of the environment is completely determined by the current state and the action executed by the agent(s), then we say the environment is deterministic; otherwise, it is nondeterministic. In principle, an agent need not worry about uncertainty in a fully observable, deterministic environment. If the environment is partially observable, however, then it could appear to be nondeterministic.

Most real situations are so complex that it is impossible to keep track of all the unobserved aspects; for practical purposes, they must be treated as nondeterministic. Taxi driving is clearly nondeterministic in this sense, because one can never predict the behavior of traffic exactly; moreover, one's tires may blow out unexpectedly and one's engine may seize up without warning. The vacuum world as we described it is deterministic, but variations can include nondeterministic elements such as randomly appearing dirt and an unreliable suction mechanism (Exercise 2.VFIN).
One final note: the word stochastic is used by some as a synonym for "nondeterministic," but we make a distinction between the two terms; we say that a model of the environment is stochastic if it explicitly deals with probabilities (e.g., "there's a 25% chance of rain tomorrow") and "nondeterministic" if the possibilities are listed without being quantified (e.g., "there's a chance of rain tomorrow").
Episodic vs. sequential: In an episodic task environment, the agent's experience is divided into atomic episodes. In each episode the agent receives a percept and then performs a single action. Crucially, the next episode does not depend on the actions taken in previous episodes. Many classification tasks are episodic. For example, an agent that has to spot defective parts on an assembly line bases each decision on the current part, regardless of previous decisions; moreover, the current decision doesn't affect whether the next part is defective. In sequential environments, on the other hand, the current decision could affect all future decisions.4 Chess and taxi driving are sequential: in both cases, short-term actions can have long-term consequences. Episodic environments are much simpler than sequential environments because the agent does not need to think ahead.
Static vs. dynamic: If the environment can change while an agent is deliberating, then we say the environment is dynamic for that agent; otherwise, it is static. Static environments are easy to deal with because the agent need not keep looking at the world while it is deciding on an action, nor need it worry about the passage of time. Dynamic environments, on the other hand, are continuously asking the agent what it wants to do; if it hasn't decided yet, that counts as deciding to do nothing. If the environment itself does not change with the passage of time but the agent's performance score does, then we say the environment is semidynamic. Taxi driving is clearly dynamic: the other cars and the taxi itself keep moving while the driving algorithm dithers about what to do next. Chess, when played with a clock, is semidynamic. Crossword puzzles are static.

4 The word "sequential" is also used in computer science as the antonym of "parallel." The two meanings are largely unrelated.
Discrete vs. continuous: The discrete/continuous distinction applies to the state of the environment, to the way time is handled, and to the percepts and actions of the agent. For example, the chess environment has a finite number of distinct states (excluding the clock). Chess also has a discrete set of percepts and actions. Taxi driving is a continuous-state and continuous-time problem: the speed and location of the taxi and of the other vehicles sweep through a range of continuous values and do so smoothly over time. Taxi-driving actions are also continuous (steering angles, etc.). Input from digital cameras is discrete, strictly speaking, but is typically treated as representing continuously varying intensities and locations.
Known vs. unknown: Strictly speaking, this distinction refers not to the environment itself but to the agent's (or designer's) state of knowledge about the "laws of physics" of the environment. In a known environment, the outcomes (or outcome probabilities if the environment is nondeterministic) for all actions are given. Obviously, if the environment is unknown, the agent will have to learn how it works in order to make good decisions.

The distinction between known and unknown environments is not the same as the one between fully and partially observable environments. It is quite possible for a known environment to be partially observable—for example, in solitaire card games, I know the rules but am still unable to see the cards that have not yet been turned over. Conversely, an unknown environment can be fully observable—in a new video game, the screen may show the entire game state but I still don't know what the buttons do until I try them.
As noted on page 39, the performance measure itself may be unknown, either because the designer is not sure how to write it down correctly or because the ultimate user—whose preferences matter—is not known. For example, a taxi driver usually won't know whether a new passenger prefers a leisurely or speedy journey, a cautious or aggressive driving style. A virtual personal assistant starts out knowing nothing about the personal preferences of its new owner. In such cases, the agent may learn more about the performance measure based on further interactions with the designer or user. This, in turn, suggests that the task environment is necessarily viewed as a multiagent environment.

The hardest case is partially observable, multiagent, nondeterministic, sequential, dynamic, continuous, and unknown. Taxi driving is hard in all these senses, except that the driver's environment is mostly known. Driving a rented car in a new country with unfamiliar geography, different traffic laws, and nervous passengers is a lot more exciting.
Figure 2.6 lists the properties of a number of familiar environments. Note that the properties are not always cut and dried. For example, we have listed the medical-diagnosis task as single-agent because the disease process in a patient is not profitably modeled as an agent; but a medical-diagnosis system might also have to deal with recalcitrant patients and skeptical staff, so the environment could have a multiagent aspect. Furthermore, medical diagnosis is episodic if one conceives of the task as selecting a diagnosis given a list of symptoms; the problem is sequential if the task can include proposing a series of tests, evaluating progress over the course of treatment, handling multiple patients, and so on.

Task Environment      Observable   Agents   Deterministic    Episodic     Static    Discrete
Crossword puzzle      Fully        Single   Deterministic    Sequential   Static    Discrete
Chess with a clock    Fully        Multi    Deterministic    Sequential   Semi      Discrete
Poker                 Partially    Multi    Stochastic       Sequential   Static    Discrete
Backgammon            Fully        Multi    Stochastic       Sequential   Static    Discrete
Taxi driving          Partially    Multi    Stochastic       Sequential   Dynamic   Continuous
Medical diagnosis     Partially    Single   Stochastic       Sequential   Dynamic   Continuous
Image analysis        Fully        Single   Deterministic    Episodic     Semi      Continuous
Part-picking robot    Partially    Single   Stochastic       Episodic     Dynamic   Continuous
Refinery controller   Partially    Single   Stochastic       Sequential   Dynamic   Continuous
English tutor         Partially    Multi    Stochastic       Sequential   Dynamic   Discrete

Figure 2.6: Examples of task environments and their characteristics.
We have not included a "known/unknown" column because, as explained earlier, this is not strictly a property of the environment. For some environments, such as chess and poker, it is quite easy to supply the agent with full knowledge of the rules, but it is nonetheless interesting to consider how an agent might learn to play these games without such knowledge.

The code repository associated with this book (aima.cs.berkeley.edu) includes multiple environment implementations, together with a general-purpose environment simulator for evaluating an agent's performance. Experiments are often carried out not for a single environment but for many environments drawn from an environment class. For example, to evaluate a taxi driver in simulated traffic, we would want to run many simulations with different traffic, lighting, and weather conditions. We are then interested in the agent's average performance over the environment class.
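The following sketch (not the book's code, and independent of the aima.cs.berkeley.edu repository) illustrates the idea of averaging an agent's performance over an environment class; the env and agent interfaces used here are hypothetical placeholders.

```python
import random
from statistics import mean

def run_episode(agent, env, steps=1000):
    """Run one agent in one environment and return its performance score."""
    for _ in range(steps):
        percept = env.percept(agent)        # assumed environment interface
        action = agent.program(percept)     # assumed agent interface
        env.execute(agent, action)
    return env.performance(agent)

def average_performance(make_agent, environment_factories, trials=100):
    """Estimate average performance over environments drawn from a class."""
    scores = []
    for _ in range(trials):
        env = random.choice(environment_factories)()   # sample one environment
        scores.append(run_episode(make_agent(), env))
    return mean(scores)
```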
2.4 The Structure of Agents
So far we have talked about agents by describing behavior—the action that is performed after any given sequence of percepts. Now we must bite the bullet and talk about how the insides work. The job of AI is to design an agent program that implements the agent function—the mapping from percepts to actions. We assume this program will run on some sort of computing device with physical sensors and actuators—we call this the agent architecture:

    agent = architecture + program.

Obviously, the program we choose has to be one that is appropriate for the architecture. If the program is going to recommend actions like Walk, the architecture had better have legs. The architecture might be just an ordinary PC, or it might be a robotic car with several onboard computers, cameras, and other sensors. In general, the architecture makes the percepts from the sensors available to the program, runs the program, and feeds the program's action choices to the actuators as they are generated. Most of this book is about designing agent programs, although Chapters 25 and 26 deal directly with the sensors and actuators.

function TABLE-DRIVEN-AGENT(percept) returns an action
    persistent: percepts, a sequence, initially empty
                table, a table of actions, indexed by percept sequences, initially fully specified

    append percept to the end of percepts
    action ← LOOKUP(percepts, table)
    return action

Figure 2.7: The TABLE-DRIVEN-AGENT program is invoked for each new percept and returns an action each time. It retains the complete percept sequence in memory.
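For readers who prefer running code, here is a straightforward Python rendering of the pseudocode above; it is a sketch rather than the book's official implementation, and the example table is a tiny assumed fragment of the vacuum-world function from Figure 2.3.

```python
def make_table_driven_agent(table):
    """Return an agent program that looks up actions by the full percept sequence."""
    percepts = []                       # persistent percept history

    def program(percept):
        percepts.append(percept)
        # The table is indexed by the entire percept sequence observed so far.
        return table.get(tuple(percepts))

    return program

# Tiny assumed fragment of the Figure 2.3 table:
vacuum_table = {
    (("A", "Clean"),): "Right",
    (("A", "Dirty"),): "Suck",
    (("B", "Clean"),): "Left",
    (("B", "Dirty"),): "Suck",
    (("A", "Clean"), ("A", "Dirty")): "Suck",
}

agent = make_table_driven_agent(vacuum_table)
print(agent(("A", "Clean")))   # Right
print(agent(("A", "Dirty")))   # Suck
```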
2.4.1 Agent programs
The agent programs that we design in this book all have the same skeleton: they take the current percept as input from the sensors and return an action to the actuators.5 Notice the difference between the agent program, which takes the current percept as input, and the agent function, which may depend on the entire percept history. The agent program has no choice but to take just the current percept as input because nothing more is available from the environment; if the agent's actions need to depend on the entire percept sequence, the agent will have to remember the percepts.

We describe the agent programs in the simple pseudocode language that is defined in Appendix B. (The online code repository contains implementations in real programming languages.) For example, Figure 2.7 shows a rather trivial agent program that keeps track of the percept sequence and then uses it to index into a table of actions to decide what to do. The table—an example of which is given for the vacuum world in Figure 2.3—represents explicitly the agent function that the agent program embodies. To build a rational agent in this way, we as designers must construct a table that contains the appropriate action for every possible percept sequence.
It is instructive to consider why the table-driven approach to agent construction is doomed to failure. Let P be the set of possible percepts and let T be the lifetime of the agent (the total number of percepts it will receive). The lookup table will contain ∑_{t=1}^{T} |P|^t entries. Consider the automated taxi: the visual input from a single camera (eight cameras is typical) comes in at the rate of roughly 70 megabytes per second (30 frames per second, 1080×720 pixels with 24 bits of color information). This gives a lookup table with over 10^600,000,000,000 entries for an hour's driving. Even the lookup table for chess—a tiny, well-behaved fragment of the real world—has (it turns out) at least 10^150 entries. In comparison, the number of atoms in the observable universe is less than 10^80. The daunting size of these tables means that (a) no physical agent in this universe will have the space to store the table; (b) the designer would not have time to create the table; and (c) no agent could ever learn all the right table entries from its experience.

Despite all this, TABLE-DRIVEN-AGENT does do what we want, assuming the table is filled in correctly: it implements the desired agent function.
5 There are other choices for the agent program skeleton; for example, we could have the agent programs be coroutines that run asynchronously with the environment. Each such coroutine has an input and output port and consists of a loop that reads the input port for percepts and writes actions to the output port.

function REFLEX-VACUUM-AGENT([location, status]) returns an action
    if status = Dirty then return Suck
    else if location = A then return Right
    else if location = B then return Left

Figure 2.8: The agent program for a simple reflex agent in the two-location vacuum environment. This program implements the agent function tabulated in Figure 2.3.
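A direct Python rendering of this agent program, together with a minimal two-square environment loop for trying it out, might look as follows; the environment code is an illustrative assumption, not part of the book's pseudocode.

```python
def reflex_vacuum_agent(percept):
    """Simple reflex agent for the two-location vacuum world (Figure 2.8)."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

# Minimal assumed environment: both squares start dirty, agent starts in A.
dirt = {"A": True, "B": True}
location = "A"
score = 0
for _ in range(10):
    action = reflex_vacuum_agent((location, "Dirty" if dirt[location] else "Clean"))
    if action == "Suck":
        dirt[location] = False
    elif action == "Right":
        location = "B"
    elif action == "Left":
        location = "A"
    score += sum(1 for d in dirt.values() if not d)  # 1 point per clean square per step
print(score)
```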
The key challenge for AI is to find out how to write programs that, to the extent possible, produce rational behavior from a smallish program rather than from a vast table.

We have many examples showing that this can be done successfully in other areas: for example, the huge tables of square roots used by engineers and schoolchildren prior to the 1970s have now been replaced by a five-line program for Newton's method running on electronic calculators. The question is, can AI do for general intelligent behavior what Newton did for square roots? We believe the answer is yes.
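For the curious, here is one way such a five-line square-root routine can be written (a sketch in Python, for non-negative x; the stopping tolerance is an arbitrary choice):

```python
def newton_sqrt(x, tolerance=1e-12):
    """Approximate sqrt(x) by Newton's method: repeatedly average the guess and x/guess."""
    guess = x if x > 1 else 1.0
    while abs(guess * guess - x) > tolerance * max(x, 1.0):
        guess = (guess + x / guess) / 2.0
    return guess
```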
In the remainder of this section, we outline four basic kinds of agent programs that embody the principles underlying almost all intelligent systems:

• Simple reflex agents;
• Model-based reflex agents;
• Goal-based agents; and
• Utility-based agents.

Each kind of agent program combines particular components in particular ways to generate actions. Section 2.4.6 explains in general terms how to convert all these agents into learning agents that can improve the performance of their components so as to generate better actions. Finally, Section 2.4.7 describes the variety of ways in which the components themselves can be represented within the agent. This variety provides a major organizing principle for the field and for the book itself.
2.4.2 Simple reflex agents
The simplest kind of agent is the simple reflex agent. These agents select actions on the basis of the current percept, ignoring the rest of the percept history. For example, the vacuum agent whose agent function is tabulated in Figure 2.3 is a simple reflex agent, because its decision is based only on the current location and on whether that location contains dirt. An agent program for this agent is shown in Figure 2.8.

Notice that the vacuum agent program is very small indeed compared to the corresponding table. The most obvious reduction comes from ignoring the percept history, which cuts down the number of relevant percept sequences from 4^T to just 4. A further, small reduction comes from the fact that when the current square is dirty, the action does not depend on the location. Although we have written the agent program using if-then-else statements, it is simple enough that it can also be implemented as a Boolean circuit.
Simple reflex behaviors occur even in more complex environments. Imagine yourself as the driver of the automated taxi. If the car in front brakes and its brake lights come on, then you should notice this and initiate braking. In other words, some processing is done on the visual input to establish the condition we call "The car in front is braking." Then, this triggers some established connection in the agent program to the action "initiate braking." We call such a connection a condition–action rule,6 written as

    if car-in-front-is-braking then initiate-braking.

Humans also have many such connections, some of which are learned responses (as for driving) and some of which are innate reflexes (such as blinking when something approaches the eye). In the course of the book, we show several different ways in which such connections can be learned and implemented.

Figure 2.9: Schematic diagram of a simple reflex agent. Rectangles denote the current internal state of the agent's decision process, and ovals represent the background information used in the process.
The program in Figure 2.8 is specific to one particular vacuum environment. A more general and flexible approach is first to build a general-purpose interpreter for condition–action rules and then to create rule sets for specific task environments. Figure 2.9 gives the structure of this general program in schematic form, showing how the condition–action rules allow the agent to make the connection from percept to action. Do not worry if this seems trivial; it gets more interesting shortly.
An agent program for Figure 2.9 is shown in Figure 2.10. The INTERPRET-INPUT function generates an abstracted description of the current state from the percept, and the RULE-MATCH function returns the first rule in the set of rules that matches the given state description. Note that the description in terms of "rules" and "matching" is purely conceptual; as noted above, actual implementations can be as simple as a collection of logic gates implementing a Boolean circuit. Alternatively, a "neural" circuit can be used, where the logic gates are replaced by the nonlinear units of artificial neural networks (see Chapter 21).

Simple reflex agents have the admirable property of being simple, but they are of limited intelligence. The agent in Figure 2.10 will work only if the correct decision can be made on the basis of just the current percept—that is, only if the environment is fully observable.

6 Also called situation–action rules, productions, or if–then rules.

function SIMPLE-REFLEX-AGENT(percept) returns an action
    persistent: rules, a set of condition–action rules

    state ← INTERPRET-INPUT(percept)
    rule ← RULE-MATCH(state, rules)
    action ← rule.ACTION
    return action

Figure 2.10: A simple reflex agent. It acts according to a rule whose condition matches the current state, as defined by the percept.
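A small Python sketch of the same idea follows; the rule representation (condition predicates paired with actions) and the interpret_input function are assumptions chosen for illustration.

```python
def simple_reflex_agent(rules, interpret_input):
    """Build an agent program that acts on the first rule matching the current state."""
    def program(percept):
        state = interpret_input(percept)
        for condition, action in rules:        # RULE-MATCH: first matching rule wins
            if condition(state):
                return action
        return None                            # no rule matched
    return program

# Example: the braking rule from the text, with a hypothetical state encoding.
rules = [(lambda state: state.get("car_in_front_is_braking"), "initiate-braking")]
agent = simple_reflex_agent(rules, interpret_input=lambda percept: percept)
print(agent({"car_in_front_is_braking": True}))   # initiate-braking
```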
Even a little bit of unobservability can cause serious trouble. For example, the braking rule given earlier assumes that the condition car-in-front-is-braking can be determined from the current percept—a single frame of video. This works if the car in front has a centrally mounted (and hence uniquely identifiable) brake light. Unfortunately, older models have different configurations of taillights, brake lights, and turn-signal lights, and it is not always possible to tell from a single image whether the car is braking or simply has its taillights on. A simple reflex agent driving behind such a car would either brake continuously and unnecessarily, or, worse, never brake at all.

We can see a similar problem arising in the vacuum world. Suppose that a simple reflex vacuum agent is deprived of its location sensor and has only a dirt sensor. Such an agent has just two possible percepts: [Dirty] and [Clean]. It can Suck in response to [Dirty]; what should it do in response to [Clean]? Moving Left fails (forever) if it happens to start in square A, and moving Right fails (forever) if it happens to start in square B. Infinite loops are often unavoidable for simple reflex agents operating in partially observable environments.
Escape from infinite loops is possible if the agent can randomize its actions. For example, if the vacuum agent perceives [Clean], it might flip a coin to choose between Right and Left. It is easy to show that the agent will reach the other square in an average of two steps. Then, if that square is dirty, the agent will clean it and the task will be complete. Hence, a randomized simple reflex agent might outperform a deterministic simple reflex agent.
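The "average of two steps" claim follows because each coin flip moves the agent to the other square with probability 1/2, so the number of flips needed is geometrically distributed with mean 2; the short simulation below (an illustrative sketch, not from the book) checks this.

```python
import random

def steps_to_other_square(start="A"):
    """Count coin flips until a location-blind agent leaves its starting square."""
    location, steps = start, 0
    while location == start:
        steps += 1
        move = random.choice(["Right", "Left"])
        if start == "A" and move == "Right":
            location = "B"
        elif start == "B" and move == "Left":
            location = "A"
        # otherwise the move bumps into the wall and the agent stays put
    return steps

print(sum(steps_to_other_square() for _ in range(100000)) / 100000)  # ≈ 2.0
```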
We mentioned in Section 2.3 that randomized behavior of the right kind can be rational in some multiagent environments. In single-agent environments, randomization is usually not rational. It is a useful trick that helps a simple reflex agent in some situations, but in most cases we can do much better with more sophisticated deterministic agents.
2.4.3 Model-based reflex agents
The most effective way to handle partial observability is for the agent to keep track of the part of the world it can't see now. That is, the agent should maintain some sort of internal state that depends on the percept history and thereby reflects at least some of the unobserved aspects of the current state. For the braking problem, the internal state is not too extensive—just the previous frame from the camera, allowing the agent to detect when two red lights at the edge of the vehicle go on or off simultaneously. For other driving tasks such as changing lanes, the agent needs to keep track of where the other cars are if it can't see them all at once. And for any driving to be possible at all, the agent needs to keep track of where its keys are.

Figure 2.11: A model-based reflex agent.
Updating this internal state information as time goes by requires two kinds of knowledge to be encoded in the agent program in some form. First, we need some information about how the world changes over time, which can be divided roughly into two parts: the effects of the agent's actions and how the world evolves independently of the agent. For example, when the agent turns the steering wheel clockwise, the car turns to the right, and when it's raining the car's cameras can get wet. This knowledge about "how the world works"—whether implemented in simple Boolean circuits or in complete scientific theories—is called a transition model of the world.

Second, we need some information about how the state of the world is reflected in the agent's percepts. For example, when the car in front initiates braking, one or more illuminated red regions appear in the forward-facing camera image, and, when the camera gets wet, droplet-shaped objects appear in the image partially obscuring the road. This kind of knowledge is called a sensor model.

Together, the transition model and sensor model allow an agent to keep track of the state of the world—to the extent possible given the limitations of the agent's sensors. An agent that uses such models is called a model-based agent.
Figure 2.11 gives the structure of the model-based reflex agent with internal state, showing how the current percept is combined with the old internal state to generate the updated description of the current state, based on the agent's model of how the world works. The agent program is shown in Figure 2.12. The interesting part is the function UPDATE-STATE, which is responsible for creating the new internal state description. The details of how models and states are represented vary widely depending on the type of environment and the particular technology used in the agent design.

function MODEL-BASED-REFLEX-AGENT(percept) returns an action
    persistent: state, the agent's current conception of the world state
                transition model, a description of how the next state depends on
                    the current state and action
                sensor model, a description of how the current world state is reflected
                    in the agent's percepts
                rules, a set of condition–action rules
                action, the most recent action, initially none

    state ← UPDATE-STATE(state, action, percept, transition model, sensor model)
    rule ← RULE-MATCH(state, rules)
    action ← rule.ACTION
    return action

Figure 2.12: A model-based reflex agent. It keeps track of the current state of the world, using an internal model. It then chooses an action in the same way as the reflex agent.
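A compact Python sketch of this structure follows; here the transition and sensor models are folded into a single update_state function supplied by the designer, and all of the names are assumptions for illustration rather than the book's code.

```python
def model_based_reflex_agent(update_state, rules, initial_state=None):
    """Build an agent program that maintains internal state across percepts."""
    memory = {"state": initial_state, "action": None}

    def program(percept):
        # Fold the latest percept and the previous action into the state estimate.
        memory["state"] = update_state(memory["state"], memory["action"], percept)
        for condition, action in rules:
            if condition(memory["state"]):
                memory["action"] = action
                return action
        memory["action"] = None
        return None

    return program
```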
Regardless of the kind of representation used, it is seldom possible for the agent to determine the current state of a partially observable environment exactly. Instead, the box labeled "what the world is like now" (Figure 2.11) represents the agent's "best guess" (or sometimes best guesses, if the agent entertains multiple possibilities). For example, an automated taxi may not be able to see around the large truck that has stopped in front of it and can only guess about what may be causing the hold-up. Thus, uncertainty about the current state may be unavoidable, but the agent still has to make a decision.
2.4.4 Goal-based agents
Knowing something about the current state of the environment is not always enough to decide what to do. For example, at a road junction, the taxi can turn left, turn right, or go straight on. The correct decision depends on where the taxi is trying to get to. In other words, as well as a current state description, the agent needs some sort of goal information that describes situations that are desirable—for example, being at a particular destination. The agent program can combine this with the model (the same information as was used in the model-based reflex agent) to choose actions that achieve the goal. Figure 2.13 shows the goal-based agent's structure.

Sometimes goal-based action selection is straightforward—for example, when goal satisfaction results immediately from a single action. Sometimes it will be more tricky—for example, when the agent has to consider long sequences of twists and turns in order to find a way to achieve the goal. Search (Chapters 3 to 5) and planning (Chapter 11) are the subfields of AI devoted to finding action sequences that achieve the agent's goals.
Notice that decision making of this kind is fundamentally different from the condition–action rules described earlier, in that it involves consideration of the future—both "What will happen if I do such-and-such?" and "Will that make me happy?" In the reflex agent designs, this information is not explicitly represented, because the built-in rules map directly from percepts to actions. The reflex agent brakes when it sees brake lights, period. It has no idea why. A goal-based agent brakes when it sees brake lights because that's the only action that it predicts will achieve its goal of not hitting other cars.
Although the goal-based agent appears less efficient, it is more flexible because the knowledge that supports its decisions is represented explicitly and can be modified. For example, a goal-based agent's behavior can easily be changed to go to a different destination, simply by specifying that destination as the goal. The reflex agent's rules for when to turn and when to go straight will work only for a single destination; they must all be replaced to go somewhere new.

Figure 2.13: A model-based, goal-based agent. It keeps track of the world state as well as a set of goals it is trying to achieve, and chooses an action that will (eventually) lead to the achievement of its goals.
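To make the contrast with condition–action rules concrete, here is a small sketch (an illustration, not the book's algorithm) of goal-based action selection: the agent uses a transition model to look ahead one step and picks an action predicted to reach a goal state; deeper lookahead is the subject of the search and planning chapters.

```python
def goal_based_agent(transition_model, actions, goal_test):
    """Choose an action whose predicted outcome satisfies the goal (one-step lookahead)."""
    def program(state):
        for action in actions:
            predicted = transition_model(state, action)   # "What will happen if I do this?"
            if goal_test(predicted):                      # "Will that achieve my goal?"
                return action
        return None   # no single action reaches the goal; a real agent would search deeper
    return program

# Hypothetical road-junction example: the goal is to be heading toward the destination.
transition = lambda state, action: {"heading": action}
agent = goal_based_agent(transition, ["left", "right", "straight"],
                         goal_test=lambda s: s["heading"] == "right")
print(agent({"heading": "straight"}))   # right
```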
2.4.5 Utility-based agents
Goals alone are not enough to generate high-quality behavior in most environments. For example, many action sequences will get the taxi to its destination (thereby achieving the goal), but some are quicker, safer, more reliable, or cheaper than others. Goals just provide a crude binary distinction between "happy" and "unhappy" states. A more general performance measure should allow a comparison of different world states according to exactly how happy they would make the agent. Because "happy" does not sound very scientific, economists and computer scientists use the term utility instead.7
We have already seen that a performance measure assigns a score to any given sequence of environment states, so it can easily distinguish between more and less desirable ways of getting to the taxi's destination. An agent's utility function is essentially an internalization of the performance measure. Provided that the internal utility function and the external performance measure are in agreement, an agent that chooses actions to maximize its utility will be rational according to the external performance measure.

Let us emphasize again that this is not the only way to be rational—we have already seen a rational agent program for the vacuum world (Figure 2.8) that has no idea what its utility function is—but, like goal-based agents, a utility-based agent has many advantages in terms of flexibility and learning. Furthermore, in two kinds of cases, goals are inadequate but a utility-based agent can still make rational decisions. First, when there are conflicting goals, only some of which can be achieved (for example, speed and safety), the utility function specifies the appropriate tradeoff. Second, when there are several goals that the agent can aim for, none of which can be achieved with certainty, utility provides a way in which the likelihood of success can be weighed against the importance of the goals.

7 The word "utility" here refers to "the quality of being useful," not to the electric company or waterworks.

Figure 2.14: A model-based, utility-based agent. It uses a model of the world, along with a utility function that measures its preferences among states of the world. Then it chooses the action that leads to the best expected utility, where expected utility is computed by averaging over all possible outcome states, weighted by the probability of the outcome.
Partial observability and nondeterminism are ubiquitous in the real world, and so, therefore, is decision making under uncertainty. Technically speaking, a rational utility-based agent chooses the action that maximizes the expected utility of the action outcomes—that is, the utility the agent expects to derive, on average, given the probabilities and utilities of each outcome. (Appendix A defines expectation more precisely.) In Chapter 16, we show that any rational agent must behave as if it possesses a utility function whose expected value it tries to maximize. An agent that possesses an explicit utility function can make rational decisions with a general-purpose algorithm that does not depend on the specific utility function being maximized. In this way, the "global" definition of rationality—designating as rational those agent functions that have the highest performance—is turned into a "local" constraint on rational-agent designs that can be expressed in a simple program.
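Such a "simple program" might look like the following sketch, which chooses the action maximizing expected utility EU(a) = Σ_s P(s | a) · U(s); the outcome model and the utility values here are arbitrary illustrations, not part of the book.

```python
def expected_utility(action, outcome_model, utility):
    """EU(a) = sum over outcomes s of P(s | a) * U(s)."""
    return sum(prob * utility(state) for state, prob in outcome_model(action))

def utility_based_agent(actions, outcome_model, utility):
    """Pick the action with the highest expected utility."""
    def program(_percept):
        return max(actions, key=lambda a: expected_utility(a, outcome_model, utility))
    return program

# Hypothetical example: a fast route that sometimes hits traffic vs. a reliable slow route.
outcomes = {"fast": [("arrive early", 0.7), ("stuck in traffic", 0.3)],
            "slow": [("arrive on time", 1.0)]}
utilities = {"arrive early": 10, "arrive on time": 6, "stuck in traffic": -5}
agent = utility_based_agent(["fast", "slow"],
                            outcome_model=lambda a: outcomes[a],
                            utility=lambda s: utilities[s])
print(agent(None))   # slow: EU(slow) = 6.0 beats EU(fast) = 0.7*10 + 0.3*(-5) = 5.5
```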
The utility-based agent structure appears in Figure 2.14. Utility-based agent programs appear in Chapters 16 and 17, where we design decision-making agents that must handle the uncertainty inherent in nondeterministic or partially observable environments. Decision making in multiagent environments is also studied in the framework of utility theory, as explained in Chapter 18.
Atthis point, the reader maybe wondering, “Is it that simple? Wejust build agents that
maximize expected utility, and we’re done?” It’s true that such agents would be intelligent,
but it’s not simple. A utility-based agent has to model and keep track of its environment,
tasks that have involved a great deal of research on perception, representation, reasoning,
and learning. The results of this research fill many of the chapters of this book. Choosing
theutility-maximizing courseofactionisalsoadifficulttask,requiringingeniousalgorithms
that fill several more chapters. Even with these algorithms, perfect rationality is usually

Figure 2.15 A general learning agent. The "performance element" box represents what we have previously considered to be the whole agent program. Now, the "learning element" box gets to modify that program to improve its performance. (In the diagram, a critic compares percepts against a fixed performance standard and sends feedback to the learning element, which makes changes to the performance element and sets learning goals for a problem generator that suggests exploratory actions.)
We also note that not all utility-based agents are model-based; we will see in Chapters 22 and 26 that a model-free agent can learn what action is best in a particular situation without ever learning exactly how that action changes the environment.
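As a concrete contrast with the expected-utility sketch above, here is a minimal model-free learner in the spirit of tabular Q-learning (one of the techniques of Chapter 22): it maintains only a table of action values and never represents how an action changes the environment. The class name, parameters, and default values are illustrative assumptions, not the book's code.

import random
from collections import defaultdict

class ModelFreeLearner:
    """Minimal sketch of a model-free learner: no transition model, only Q[state][action]."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(lambda: defaultdict(float))   # learned action values
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_action(self, state):
        # Occasionally try a non-greedy action to gather informative experience.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[state][a])

    def update(self, state, action, reward, next_state):
        # Nudge Q(state, action) toward reward plus the discounted value of the next state.
        target = reward + self.gamma * max(self.q[next_state][a] for a in self.actions)
        self.q[state][action] += self.alpha * (target - self.q[state][action])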
Finally, all of this assumes that the designer can specify the utility function correctly; Chapters 17, 18, and 22 consider the issue of unknown utility functions in more depth.
2.4.6 Learning agents
We have described agent programs with various methods for selecting actions. We have not, so far, explained how the agent programs come into being. In his famous early paper, Turing (1950) considers the idea of actually programming his intelligent machines by hand. He estimates how much work this might take and concludes, "Some more expeditious method seems desirable." The method he proposes is to build learning machines and then to teach them. In many areas of AI, this is now the preferred method for creating state-of-the-art systems. Any type of agent (model-based, goal-based, utility-based, etc.) can be built as a learning agent (or not).
Learning has another advantage, as we noted earlier: it allows the agent to operate in initially unknown environments and to become more competent than its initial knowledge alone might allow. In this section, we briefly introduce the main ideas of learning agents. Throughout the book, we comment on opportunities and methods for learning in particular kinds of agents. Chapters 19–22 go into much more depth on the learning algorithms themselves.
A learning agent can be divided into four conceptual components, as shown in Figure 2.15. The most important distinction is between the learning element, which is responsible for making improvements, and the performance element, which is responsible for selecting external actions. The performance element is what we have previously considered to be the entire agent: it takes in percepts and decides on actions. The learning element uses feedback from the critic on how the agent is doing and determines how the performance element should be modified to do better in the future.
The design of the learning element depends very much on the design of the performance element. When trying to design an agent that learns a certain capability, the first question is not "How am I going to get it to learn this?" but "What kind of performance element will my agent use to do this once it has learned how?" Given a design for the performance element, learning mechanisms can be constructed to improve every part of the agent.
The critic tells the learning element how well the agent is doing with respect to a fixed performance standard. The critic is necessary because the percepts themselves provide no indication of the agent's success. For example, a chess program could receive a percept indicating that it has checkmated its opponent, but it needs a performance standard to know that this is a good thing; the percept itself does not say so. It is important that the performance standard be fixed. Conceptually, one should think of it as being outside the agent altogether because the agent must not modify it to fit its own behavior.
The last component of the learning agent is the problem generator. It is responsible for suggesting actions that will lead to new and informative experiences. If the performance element had its way, it would keep doing the actions that are best, given what it knows, but if the agent is willing to explore a little and do some perhaps suboptimal actions in the short run, it might discover much better actions for the long run. The problem generator's job is to suggest these exploratory actions. This is what scientists do when they carry out experiments. Galileo did not think that dropping rocks from the top of a tower in Pisa was valuable in itself. He was not trying to break the rocks or to modify the brains of unfortunate pedestrians. His aim was to modify his own brain by identifying a better theory of the motion of objects.
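Putting the four components together, the following is a minimal structural sketch of a learning agent; the class and method names are invented for illustration and do not correspond to the book's pseudocode.

class LearningAgent:
    """Minimal sketch of the four components of a learning agent (Figure 2.15)."""

    def __init__(self, performance_element, learning_element, critic, problem_generator):
        self.performance_element = performance_element   # maps percepts to actions
        self.learning_element = learning_element         # improves the performance element
        self.critic = critic                             # judges behavior against a fixed standard
        self.problem_generator = problem_generator       # proposes exploratory actions

    def step(self, percept):
        # The critic scores how well the agent is doing, using a performance
        # standard that the agent itself cannot modify.
        feedback = self.critic.evaluate(percept)
        # The learning element uses that feedback to change the performance element.
        self.learning_element.update(self.performance_element, feedback)
        # Usually act on current knowledge; occasionally try an informative experiment.
        action = self.performance_element.choose_action(percept)
        exploratory = self.problem_generator.suggest(percept)
        return exploratory if exploratory is not None else action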
The learning element can make changes to any of the "knowledge" components shown in the agent diagrams (Figures 2.9, 2.11, 2.13, and 2.14). The simplest cases involve learning directly from the percept sequence. Observation of pairs of successive states of the environment can allow the agent to learn "What my actions do" and "How the world evolves" in response to its actions. For example, if the automated taxi exerts a certain braking pressure when driving on a wet road, then it will soon find out how much deceleration is actually achieved, and whether it skids off the road. The problem generator might identify certain parts of the model that are in need of improvement and suggest experiments, such as trying out the brakes on different road surfaces under different conditions.
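For instance, learning "What my actions do" from successive percepts can be as simple as keeping running averages of observed effects. Here is a minimal sketch for the braking example; the choice of features (road condition, brake pressure) is an illustrative assumption, not part of the book's taxi design.

from collections import defaultdict

class BrakingModel:
    """Running average of the deceleration achieved for each (road, brake pressure) pair."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def observe(self, road, brake_pressure, speed_before, speed_after, dt):
        decel = (speed_before - speed_after) / dt
        key = (road, brake_pressure)
        self.totals[key] += decel
        self.counts[key] += 1

    def predicted_deceleration(self, road, brake_pressure):
        key = (road, brake_pressure)
        if self.counts[key] == 0:
            return None   # no experience yet; a problem generator might suggest trying it
        return self.totals[key] / self.counts[key]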
Improving the model components of a model-based agent so that they conform better with reality is almost always a good idea, regardless of the external performance standard. (In some cases, it is better from a computational point of view to have a simple but slightly inaccurate model rather than a perfect but fiendishly complex model.) Information from the external standard is needed when trying to learn a reflex component or a utility function.
For example, suppose the taxi-driving agent receives no tips from passengers who have been thoroughly shaken up during the trip. The external performance standard must inform the agent that the loss of tips is a negative contribution to its overall performance; then the agent might be able to learn that violent maneuvers do not contribute to its own utility. In a sense, the performance standard distinguishes part of the incoming percept as a reward (or penalty) that provides direct feedback on the quality of the agent's behavior. Hard-wired performance standards such as pain and hunger in animals can be understood in this way.
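In code, such a critic might look like the following minimal sketch, which treats one part of the percept (a hypothetical tip field) as a reward and another part (a hypothetical deceleration reading) as grounds for a penalty; the field names and numbers are invented for illustration.

def critic_reward(percept, comfort_threshold=3.0):
    """Turn the externally supplied parts of a percept into a scalar reward signal."""
    reward = percept.get("tip", 0.0)                    # tips count directly as reward
    if percept.get("max_deceleration", 0.0) > comfort_threshold:
        reward -= 5.0                                   # penalty for shaking up the passenger
    return reward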

More generally, human choices can provide information about human preferences. For example, suppose the taxi does not know that people generally don't like loud noises, and settles on the idea of blowing its horn continuously as a way of ensuring that pedestrians know it's coming. The consequent human behavior—covering ears, using bad language, and possibly cutting the wires to the horn—would provide evidence to the agent with which to update its utility function. This issue is discussed further in Chapter 22.
In summary, agents have a variety of components, and those components can be represented in many ways within the agent program, so there appears to be great variety among learning methods. There is, however, a single unifying theme. Learning in intelligent agents can be summarized as a process of modification of each component of the agent to bring the components into closer agreement with the available feedback information, thereby improving the overall performance of the agent.
2.4.7 How the components of agent programs work
We have described agent programs (in very high-level terms) as consisting of various components, whose function it is to answer questions such as: "What is the world like now?" "What action should I do now?" "What do my actions do?" The next question for a student of AI is, "How on Earth do these components work?" It takes about a thousand pages to begin to answer that question properly, but here we want to draw the reader's attention to some basic distinctions among the various ways that the components can represent the environment that the agent inhabits.
Roughly speaking, we can place the representations along an axis of increasing complexity and expressive power—atomic, factored, and structured. To illustrate these ideas, it helps to consider a particular agent component, such as the one that deals with "What my actions do." This component describes the changes that might occur in the environment as the result of taking an action, and Figure 2.16 provides schematic depictions of how those transitions might be represented.
Figure 2.16 Three ways to represent states and the transitions between them. (a) Atomic representation: a state (such as B or C) is a black box with no internal structure; (b) Factored representation: a state consists of a vector of attribute values; values can be Boolean, real-valued, or one of a fixed set of symbols. (c) Structured representation: a state includes objects, each of which may have attributes of its own as well as relationships to other objects.

In an atomic representation each state of the world is indivisible—it has no internal structure. Consider the task of finding a driving route from one end of a country to the other via some sequence of cities (we address this problem in Figure 3.1 on page 64). For the purposes of solving this problem, it may suffice to reduce the state of the world to just the name of the city we are in—a single atom of knowledge, a "black box" whose only discernible property is that of being identical to or different from another black box. The standard algorithms underlying search and game-playing (Chapters 3–5), hidden Markov models (Chapter 14), and Markov decision processes (Chapter 17) all work with atomic representations.
A factored representation splits up each state into a fixed set of variables or attributes, each of which can have a value. Consider a higher-fidelity description for the same driving problem, where we need to be concerned with more than just atomic location in one city or another; we might need to pay attention to how much gas is in the tank, our current GPS coordinates, whether or not the oil warning light is working, how much money we have for tolls, what station is on the radio, and so on. While two different atomic states have nothing in common—they are just different black boxes—two different factored states can share some attributes (such as being at some particular GPS location) and not others (such as having lots of gas or having no gas); this makes it much easier to work out how to turn one state into another. Many important areas of AI are based on factored representations, including constraint satisfaction algorithms (Chapter 6), propositional logic (Chapter 7), planning (Chapter 11), Bayesian networks (Chapters 12–16), and various machine learning algorithms.
For many purposes, we need to understand the world as having things in it that are related to each other, not just variables with values. For example, we might notice that a large truck ahead of us is reversing into the driveway of a dairy farm, but a loose cow is blocking the truck's path. A factored representation is unlikely to be pre-equipped with the attribute Truck Ahead Backing Into Dairy Farm Driveway Blocked By Loose Cow with value true or false. Instead, we would need a structured representation, in which objects such as cows and trucks and their various and varying relationships can be described explicitly (see Figure 2.16(c)). Structured representations underlie relational databases and first-order logic (Chapters 8, 9, and 10), first-order probability models (Chapter 15), and much of natural language understanding (Chapters 23 and 24). In fact, much of what humans express in natural language concerns objects and their relationships.
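To see the three levels side by side, here is a minimal sketch of how the same driving situation might be written down in each style; all attribute and relation names are invented for illustration and are not from the book.

# (a) Atomic: the state is an opaque label; two states are either equal or not.
state_atomic = "Bucharest"

# (b) Factored: the state is a fixed set of attributes, each with a value.
state_factored = {
    "city": "Bucharest",
    "gas_level": 0.7,                 # fraction of a full tank
    "oil_warning_light_ok": True,
    "toll_money": 12.50,
    "radio_station": "FM 101.3",
}

# (c) Structured: the state contains objects with attributes and relationships.
state_structured = {
    "objects": {
        "truck1": {"type": "Truck", "activity": "reversing"},
        "cow1": {"type": "Cow"},
        "driveway1": {"type": "Driveway", "of": "DairyFarm"},
    },
    "relations": [
        ("Ahead", "truck1", "self"),
        ("BackingInto", "truck1", "driveway1"),
        ("Blocking", "cow1", "truck1"),
    ],
}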
As we mentioned earlier, the axis along which atomic, factored, and structured representations lie is the axis of increasing expressiveness. Roughly speaking, a more expressive representation can capture, at least as concisely, everything a less expressive one can capture, plus some more. Often, the more expressive language is much more concise; for example, the rules of chess can be written in a page or two of a structured-representation language such as first-order logic but require thousands of pages when written in a factored-representation language such as propositional logic and around 10^38 pages when written in an atomic language such as that of finite-state automata. On the other hand, reasoning and learning become more complex as the expressive power of the representation increases. To gain the benefits of expressive representations while avoiding their drawbacks, intelligent systems for the real world may need to operate at all points along the axis simultaneously.
Another axis for representation involves the mapping of concepts to locations in physical memory, whether in a computer or in a brain. If there is a one-to-one mapping between concepts and memory locations, we call that a localist representation. On the other hand, if the representation of a concept is spread over many memory locations, and each memory location is employed as part of the representation of multiple different concepts, we call that a distributed representation. Distributed representations are more robust against noise and information loss. With a localist representation, the mapping from concept to memory location is arbitrary, and if a transmission error garbles a few bits, we might confuse Truck with the unrelated concept Truce. But with a distributed representation, you can think of each concept representing a point in multidimensional space, and if you garble a few bits you move to a nearby point in that space, which will have similar meaning.
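A minimal sketch of the contrast, assuming made-up vectors: a localist code assigns each concept its own one-hot slot, while a distributed code places each concept at a point in a shared space, so small corruptions decode to a nearby, similar concept.

import numpy as np

concepts = ["Truck", "Truce", "Car"]

# Localist: one dedicated position per concept; the assignment is arbitrary,
# so a single flipped bit can turn Truck into the unrelated Truce.
one_hot = {c: np.eye(len(concepts))[i] for i, c in enumerate(concepts)}

# Distributed: each concept is a point in a shared space; related concepts
# (Truck, Car) are nearby, unrelated ones (Truce) are far away.
embedding = {
    "Truck": np.array([0.9, 0.8, 0.1, 0.2]),
    "Car":   np.array([0.85, 0.75, 0.15, 0.1]),
    "Truce": np.array([0.1, 0.0, 0.9, 0.7]),
}

def nearest(vec, table):
    """Decode a possibly corrupted vector to the closest stored concept."""
    return min(table, key=lambda c: float(np.linalg.norm(table[c] - vec)))

# Add a little noise ("garble a few bits"): we still land near Truck, not Truce.
noisy = embedding["Truck"] + np.array([0.05, -0.05, 0.05, 0.0])
print(nearest(noisy, embedding))   # -> "Truck"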
Summary
This chapter has been something of a whirlwind tour of AI, which we have conceived of as the science of agent design. The major points to recall are as follows:
• An agent is something that perceives and acts in an environment. The agent function for an agent specifies the action taken by the agent in response to any percept sequence.
• The performance measure evaluates the behavior of the agent in an environment. A rational agent acts so as to maximize the expected value of the performance measure, given the percept sequence it has seen so far.
• A task environment specification includes the performance measure, the external environment, the actuators, and the sensors. In designing an agent, the first step must always be to specify the task environment as fully as possible.
• Task environments vary along several significant dimensions. They can be fully or partially observable, single-agent or multiagent, deterministic or nondeterministic, episodic or sequential, static or dynamic, discrete or continuous, and known or unknown.
• In cases where the performance measure is unknown or hard to specify correctly, there is a significant risk of the agent optimizing the wrong objective. In such cases the agent design should reflect uncertainty about the true objective.
• The agent program implements the agent function. There exists a variety of basic agent program designs reflecting the kind of information made explicit and used in the decision process. The designs vary in efficiency, compactness, and flexibility. The appropriate design of the agent program depends on the nature of the environment.
• Simple reflex agents respond directly to percepts, whereas model-based reflex agents maintain internal state to track aspects of the world that are not evident in the current percept. Goal-based agents act to achieve their goals, and utility-based agents try to maximize their own expected "happiness."
• All agents can improve their performance through learning.
Bibliographical and Historical Notes
The central role of action in intelligence—the notion of practical reasoning—goes back at least as far as Aristotle's Nicomachean Ethics. Practical reasoning was also the subject of McCarthy's influential paper "Programs with Common Sense" (1958). The fields of robotics and control theory are, by their very nature, concerned principally with physical agents. The concept of a controller in control theory is identical to that of an agent in AI. Perhaps surprisingly, AI has concentrated for most of its history on isolated components of agents—question-answering systems, theorem-provers, vision systems, and so on—rather than on whole agents. The discussion of agents in the text by Genesereth and Nilsson (1987) was an influential exception. The whole-agent view is now widely accepted and is a central theme in recent texts (Padgham and Winikoff, 2004; Jones, 2007; Poole and Mackworth, 2017).
Chapter 1 traced the roots of the concept of rationality in philosophy and economics. In AI, the concept was of peripheral interest until the mid-1980s, when it began to suffuse many discussions about the proper technical foundations of the field. A paper by Jon Doyle (1983) predicted that rational agent design would come to be seen as the core mission of AI, while other popular topics would spin off to form new disciplines.
Careful attention to the properties of the environment and their consequences for rational agent design is most apparent in the control theory tradition—for example, classical control systems (Dorf and Bishop, 2004; Kirk, 2004) handle fully observable, deterministic environments; stochastic optimal control (Kumar and Varaiya, 1986; Bertsekas and Shreve, 2007) handles partially observable, stochastic environments; and hybrid control (Henzinger and Sastry, 1998; Cassandras and Lygeros, 2006) deals with environments containing both discrete and continuous elements. The distinction between fully and partially observable environments is also central in the dynamic programming literature developed in the field of operations research (Puterman, 1994), which we discuss in Chapter 17.
Although simple reflex agents were central to behaviorist psychology (see Chapter 1), most AI researchers view them as too simple to provide much leverage. (Rosenschein (1985) and Brooks (1986) questioned this assumption; see Chapter 26.) A great deal of work has gone into finding efficient algorithms for keeping track of complex environments (Bar-Shalom et al., 2001; Choset et al., 2005; Simon, 2006), most of it in the probabilistic setting.
Goal-based agents are presupposed in everything from Aristotle's view of practical reasoning to McCarthy's early papers on logical AI. Shakey the Robot (Fikes and Nilsson, 1971; Nilsson, 1984) was the first robotic embodiment of a logical, goal-based agent. A full logical analysis of goal-based agents appeared in Genesereth and Nilsson (1987), and a goal-based programming methodology called agent-oriented programming was developed by Shoham (1993). The agent-based approach is now extremely popular in software engineering (Ciancarini and Wooldridge, 2001). It has also infiltrated the area of operating systems, where autonomic computing refers to computer systems and networks that monitor and control themselves with a perceive–act loop and machine learning methods (Kephart and Chess, 2003). Noting that a collection of agent programs designed to work well together in a true multiagent environment necessarily exhibits modularity—the programs share no internal state and communicate with each other only through the environment—it is common within the field of multiagent systems to design the agent program of a single agent as a collection of autonomous sub-agents. In some cases, one can even prove that the resulting system gives the same optimal solutions as a monolithic design.
The goal-based view of agents also dominates the cognitive psychology tradition in the area of problem solving, beginning with the enormously influential Human Problem Solving (Newell and Simon, 1972) and running through all of Newell's later work (Newell, 1990). Goals, further analyzed as desires (general) and intentions (currently pursued), are central to the influential theory of agents developed by Michael Bratman (1987).

As noted in Chapter 1, the development of utility theory as a basis for rational behavior goes back hundreds of years. In AI, early research eschewed utilities in favor of goals, with some exceptions (Feldman and Sproull, 1977). The resurgence of interest in probabilistic methods in the 1980s led to the acceptance of maximization of expected utility as the most general framework for decision making (Horvitz et al., 1988). The text by Pearl (1988) was the first in AI to cover probability and utility theory in depth; its exposition of practical methods for reasoning and decision making under uncertainty was probably the single biggest factor in the rapid shift towards utility-based agents in the 1990s (see Chapter 16). The formalization of reinforcement learning within a decision-theoretic framework also contributed to this shift (Sutton, 1988). Somewhat remarkably, almost all AI research until very recently has assumed that the performance measure can be exactly and correctly specified in the form of a utility function or reward function (Hadfield-Menell et al., 2017a; Russell, 2019).
The general design for learning agents portrayed in Figure 2.15 is classic in the machine learning literature (Buchanan et al., 1978; Mitchell, 1997). Examples of the design, as embodied in programs, go back at least as far as Arthur Samuel's (1959, 1967) learning program for playing checkers. Learning agents are discussed in depth in Chapters 19–22.
Some early papers on agent-based approaches are collected by Huhns and Singh (1998) and Wooldridge and Rao (1999). Texts on multiagent systems provide a good introduction to many aspects of agent design (Weiss, 2000a; Wooldridge, 2009). Several conference series devoted to agents began in the 1990s, including the International Workshop on Agent Theories, Architectures, and Languages (ATAL), the International Conference on Autonomous Agents (AGENTS), and the International Conference on Multi-Agent Systems (ICMAS). In 2002, these three merged to form the International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS). From 2000 to 2012 there were annual workshops on Agent-Oriented Software Engineering (AOSE). The journal Autonomous Agents and Multi-Agent Systems was founded in 1998. Finally, Dung Beetle Ecology (Hanski and Cambefort, 1991) provides a wealth of interesting information on the behavior of dung beetles. YouTube has inspiring video recordings of their activities.