agent Temporal Evolution
Processing: Russell and Norvig - 2022 - Intelligent Agents.pdf Document 3 of 7
Russell and Norvig - 2022 - Intelligent Agents.pdf
Created December 16, 2025 at 02:21 PM
Russell and Norvig - 2022 - Intelligent Agents.pdf (233.0 KB)
Processing Operations
Note: Processing accuracy depends on underlying NLP tools (spaCy, NLTK, transformers, etc.).
Future versions will include adjustable settings and LLM augmentation for improved quality control.
Segmentation
Embeddings
Entity & Concept Extraction
Temporal Analysis
Content (79.1 kB, cleaned)
CHAPTER 2
INTELLIGENT AGENTS

In which we discuss the nature of agents, perfect or otherwise, the diversity of environments, and the resulting menagerie of agent types.

Chapter 1 identified the concept of rational agents as central to our approach to artificial intelligence. In this chapter, we make this notion more concrete. We will see that the concept of rationality can be applied to a wide variety of agents operating in any imaginable environment. Our plan in this book is to use this concept to develop a small set of design principles for building successful agents: systems that can reasonably be called intelligent.

We begin by examining agents, environments, and the coupling between them. The observation that some agents behave better than others leads naturally to the idea of a rational agent, one that behaves as well as possible. How well an agent can behave depends on the nature of the environment; some environments are more difficult than others. We give a crude categorization of environments and show how properties of an environment influence the design of suitable agents for that environment. We describe a number of basic "skeleton" agent designs, which we flesh out in the rest of the book.

2.1 Agents and Environments

An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators. This simple idea is illustrated in Figure 2.1. A human agent has eyes, ears, and other organs for sensors and hands, legs, vocal tract, and so on for actuators. A robotic agent might have cameras and infrared range finders for sensors and various motors for actuators. A software agent receives file contents, network packets, and human input (keyboard/mouse/touchscreen/voice) as sensory inputs and acts on the environment by writing files, sending network packets, and displaying information or generating sounds. The environment could be everything, the entire universe! In practice it is just that part of the universe whose state we care about when designing this agent: the part that affects what the agent perceives and that is affected by the agent's actions.

We use the term percept to refer to the content an agent's sensors are perceiving. An agent's percept sequence is the complete history of everything the agent has ever perceived. In general, an agent's choice of action at any given instant can depend on its built-in knowledge and on the entire percept sequence observed to date, but not on anything it hasn't perceived. By specifying the agent's choice of action for every possible percept sequence, we have said more or less everything there is to say about the agent. Mathematically speaking, we say that an agent's behavior is described by the agent function that maps any given percept sequence to an action.

Figure 2.1 Agents interact with environments through sensors and actuators.

We can imagine tabulating the agent function that describes any given agent; for most agents, this would be a very large table, infinite, in fact, unless we place a bound on the length of percept sequences we want to consider. Given an agent to experiment with, we can, in principle, construct this table by trying out all possible percept sequences and recording which actions the agent does in response. [Footnote 1: If the agent uses some randomization to choose its actions, then we would have to try each sequence many times to identify the probability of each action. One might imagine that acting randomly is rather silly, but we show later in this chapter that it can be very intelligent.] The table is, of course, an external characterization of the agent. Internally, the agent function for an artificial agent will be implemented by an agent program. It is important to keep these two ideas distinct. The agent function is an abstract mathematical description; the agent program is a concrete implementation, running within some physical system.
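The distinction can be made concrete with a small Python sketch (the names and the percept encoding here are illustrative inventions, not code from the book's repository): the agent function is modeled as a mapping over complete percept sequences, while the agent program receives one percept at a time and must remember any history it needs.

from typing import Callable, Hashable, List

Percept = Hashable
Action = str

# The agent *function*: an abstract mapping from a whole percept sequence to an action.
AgentFunction = Callable[[List[Percept]], Action]

class AgentProgram:
    """A concrete agent *program*: it is handed one percept at a time and must
    remember the history itself if its behavior depends on it."""

    def __init__(self, agent_function: AgentFunction):
        self.agent_function = agent_function
        self.percepts: List[Percept] = []          # remembered percept sequence

    def __call__(self, percept: Percept) -> Action:
        self.percepts.append(percept)
        return self.agent_function(self.percepts)

Any agent function, including one that ignores all but the latest percept, can be wrapped in such a program unchanged.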
To illustrate these ideas, we use a simple example: the vacuum-cleaner world, which consists of a robotic vacuum-cleaning agent in a world consisting of squares that can be either dirty or clean. Figure 2.2 shows a configuration with just two squares, A and B. The vacuum agent perceives which square it is in and whether there is dirt in the square. The agent starts in square A. The available actions are to move to the right, move to the left, suck up the dirt, or do nothing. [Footnote 2: In a real robot, it would be unlikely to have actions like "move right" and "move left." Instead the actions would be "spin wheels forward" and "spin wheels backward." We have chosen the actions to be easier to follow on the page, not for ease of implementation in an actual robot.] One very simple agent function is the following: if the current square is dirty, then suck; otherwise, move to the other square. A partial tabulation of this agent function is shown in Figure 2.3 and an agent program that implements it appears in Figure 2.8 on page 49.

Figure 2.2 A vacuum-cleaner world with just two locations. Each location can be clean or dirty, and the agent can move left or right and can clean the square that it occupies. Different versions of the vacuum world allow for different rules about what the agent can perceive, whether its actions always succeed, and so on.

  Percept sequence                          Action
  [A, Clean]                                Right
  [A, Dirty]                                Suck
  [B, Clean]                                Left
  [B, Dirty]                                Suck
  [A, Clean], [A, Clean]                    Right
  [A, Clean], [A, Dirty]                    Suck
  ...                                       ...
  [A, Clean], [A, Clean], [A, Clean]        Right
  [A, Clean], [A, Clean], [A, Dirty]        Suck
  ...                                       ...

Figure 2.3 Partial tabulation of a simple agent function for the vacuum-cleaner world shown in Figure 2.2. The agent cleans the current square if it is dirty, otherwise it moves to the other square. Note that the table is of unbounded size unless there is a restriction on the length of possible percept sequences.

Looking at Figure 2.3, we see that various vacuum-world agents can be defined simply by filling in the right-hand column in various ways. The obvious question, then, is this: What is the right way to fill out the table? In other words, what makes an agent good or bad, intelligent or stupid? We answer these questions in the next section.

Before closing this section, we should emphasize that the notion of an agent is meant to be a tool for analyzing systems, not an absolute characterization that divides the world into agents and non-agents. One could view a hand-held calculator as an agent that chooses the action of displaying "4" when given the percept sequence "2 + 2 =," but such an analysis would hardly aid our understanding of the calculator. In a sense, all areas of engineering can be seen as designing artifacts that interact with the world; AI operates at (what the authors consider to be) the most interesting end of the spectrum, where the artifacts have significant computational resources and the task environment requires nontrivial decision making.
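For use in later examples, the two-square world of Figure 2.2 can be sketched as a tiny Python class. This is a hypothetical illustration; the class and method names are inventions, and the online code repository has its own, more general environment classes.

import random

class TrivialVacuumEnvironment:
    """A minimal two-square vacuum world in the spirit of Figure 2.2 (illustrative sketch)."""

    def __init__(self, dirt=None, location="A"):
        # The dirt distribution and the initial location are what the agent does not know a priori.
        self.dirt = dirt if dirt is not None else {"A": random.choice([True, False]),
                                                   "B": random.choice([True, False])}
        self.location = location

    def percept(self):
        """What the agent's sensors report: its location and that square's status."""
        return (self.location, "Dirty" if self.dirt[self.location] else "Clean")

    def execute(self, action):
        """Apply an action; moving past the edge leaves the agent where it is."""
        if action == "Suck":
            self.dirt[self.location] = False
        elif action == "Right":
            self.location = "B"
        elif action == "Left":
            self.location = "A"
        # any other action (e.g., "NoOp") changes nothing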
2.2 Good Behavior: The Concept of Rationality

A rational agent is one that does the right thing. Obviously, doing the right thing is better than doing the wrong thing, but what does it mean to do the right thing?

2.2.1 Performance measures

Moral philosophy has developed several different notions of the "right thing," but AI has generally stuck to one notion called consequentialism: we evaluate an agent's behavior by its consequences. When an agent is plunked down in an environment, it generates a sequence of actions according to the percepts it receives. This sequence of actions causes the environment to go through a sequence of states. If the sequence is desirable, then the agent has performed well. This notion of desirability is captured by a performance measure that evaluates any given sequence of environment states.

Humans have desires and preferences of their own, so the notion of rationality as applied to humans has to do with their success in choosing actions that produce sequences of environment states that are desirable from their point of view. Machines, on the other hand, do not have desires and preferences of their own; the performance measure is, initially at least, in the mind of the designer of the machine, or in the mind of the users the machine is designed for. We will see that some agent designs have an explicit representation of (a version of) the performance measure, while in other designs the performance measure is entirely implicit: the agent may do the right thing, but it doesn't know why.

Recalling Norbert Wiener's warning to ensure that "the purpose put into the machine is the purpose which we really desire" (page 33), notice that it can be quite hard to formulate a performance measure correctly. Consider, for example, the vacuum-cleaner agent from the preceding section. We might propose to measure performance by the amount of dirt cleaned up in a single eight-hour shift. With a rational agent, of course, what you ask for is what you get. A rational agent can maximize this performance measure by cleaning up the dirt, then dumping it all on the floor, then cleaning it up again, and so on. A more suitable performance measure would reward the agent for having a clean floor. For example, one point could be awarded for each clean square at each time step (perhaps with a penalty for electricity consumed and noise generated). As a general rule, it is better to design performance measures according to what one actually wants to be achieved in the environment, rather than according to how one thinks the agent should behave.

Even when the obvious pitfalls are avoided, some knotty problems remain. For example, the notion of "clean floor" in the preceding paragraph is based on average cleanliness over time. Yet the same average cleanliness can be achieved by two different agents, one of which does a mediocre job all the time while the other cleans energetically but takes long breaks. Which is preferable might seem to be a fine point of janitorial science, but in fact it is a deep philosophical question with far-reaching implications. Which is better: a reckless life of highs and lows, or a safe but humdrum existence? Which is better: an economy where everyone lives in moderate poverty, or one in which some live in plenty while others are very poor? We leave these questions as an exercise for the diligent reader.

For most of the book, we will assume that the performance measure can be specified correctly. For the reasons given above, however, we must accept the possibility that we might put the wrong purpose into the machine, precisely the King Midas problem described on page 33. Moreover, when designing one piece of software, copies of which will belong to different users, we cannot anticipate the exact preferences of each individual user.
Thus, we may need to build agents that reflect initial uncertainty about the true performance measure and learn more about it as time goes by; such agents are described in Chapters 16, 18, and 22.

2.2.2 Rationality

What is rational at any given time depends on four things:
• The performance measure that defines the criterion of success.
• The agent's prior knowledge of the environment.
• The actions that the agent can perform.
• The agent's percept sequence to date.

This leads to a definition of a rational agent:

  For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has.

Consider the simple vacuum-cleaner agent that cleans a square if it is dirty and moves to the other square if not; this is the agent function tabulated in Figure 2.3. Is this a rational agent? That depends! First, we need to say what the performance measure is, what is known about the environment, and what sensors and actuators the agent has. Let us assume the following:
• The performance measure awards one point for each clean square at each time step, over a "lifetime" of 1000 time steps.
• The "geography" of the environment is known a priori (Figure 2.2) but the dirt distribution and the initial location of the agent are not. Clean squares stay clean and sucking cleans the current square. The Right and Left actions move the agent one square except when this would take the agent outside the environment, in which case the agent remains where it is.
• The only available actions are Right, Left, and Suck.
• The agent correctly perceives its location and whether that location contains dirt.

Under these circumstances the agent is indeed rational; its expected performance is at least as good as any other agent's.

One can see easily that the same agent would be irrational under different circumstances. For example, once all the dirt is cleaned up, the agent will oscillate needlessly back and forth; if the performance measure includes a penalty of one point for each movement, the agent will fare poorly. A better agent for this case would do nothing once it is sure that all the squares are clean. If clean squares can become dirty again, the agent should occasionally check and re-clean them if needed. If the geography of the environment is unknown, the agent will need to explore it. Exercise 2.VACR asks you to design agents for these cases.
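The expected performance under the measure just described can be estimated by simulation. The following rough sketch assumes the hypothetical TrivialVacuumEnvironment sketched in Section 2.1 and averages over the eight equally likely initial configurations (dirt or no dirt in each of A and B, starting in A or B).

import itertools

def run(env, agent_program, steps=1000):
    """Run an agent program and return its score: one point per clean square per time step."""
    score = 0
    for _ in range(steps):
        env.execute(agent_program(env.percept()))
        score += sum(not dirty for dirty in env.dirt.values())   # computed by the designer, not the agent
    return score

# The simple reflex agent function of Figure 2.3, written inline.
reflex_agent = lambda p: "Suck" if p[1] == "Dirty" else ("Right" if p[0] == "A" else "Left")

configs = [({"A": a, "B": b}, loc)
           for a, b in itertools.product([True, False], repeat=2) for loc in "AB"]
scores = [run(TrivialVacuumEnvironment(dict(dirt), loc), reflex_agent) for dirt, loc in configs]
print(sum(scores) / len(scores))    # estimate of the agent's expected performance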
2.2.3 Omniscience, learning, and autonomy

We need to be careful to distinguish between rationality and omniscience. An omniscient agent knows the actual outcome of its actions and can act accordingly; but omniscience is impossible in reality. Consider the following example: I am walking along the Champs-Élysées one day and I see an old friend across the street. There is no traffic nearby and I'm not otherwise engaged, so, being rational, I start to cross the street. Meanwhile, at 33,000 feet, a cargo door falls off a passing airliner [Footnote 3: See N. Henderson, "New door latches urged for Boeing 747 jumbo jets," Washington Post, August 24, 1989.], and before I make it to the other side of the street I am flattened. Was I irrational to cross the street? It is unlikely that my obituary would read "Idiot attempts to cross street."

This example shows that rationality is not the same as perfection. Rationality maximizes expected performance, while perfection maximizes actual performance. Retreating from a requirement of perfection is not just a question of being fair to agents. The point is that if we expect an agent to do what turns out after the fact to be the best action, it will be impossible to design an agent to fulfill this specification, unless we improve the performance of crystal balls or time machines.

Our definition of rationality does not require omniscience, then, because the rational choice depends only on the percept sequence to date. We must also ensure that we haven't inadvertently allowed the agent to engage in decidedly underintelligent activities. For example, if an agent does not look both ways before crossing a busy road, then its percept sequence will not tell it that there is a large truck approaching at high speed. Does our definition of rationality say that it's now OK to cross the road? Far from it!

First, it would not be rational to cross the road given this uninformative percept sequence: the risk of accident from crossing without looking is too great. Second, a rational agent should choose the "looking" action before stepping into the street, because looking helps maximize the expected performance. Doing actions in order to modify future percepts, sometimes called information gathering, is an important part of rationality and is covered in depth in Chapter 16. A second example of information gathering is provided by the exploration that must be undertaken by a vacuum-cleaning agent in an initially unknown environment.

Our definition requires a rational agent not only to gather information but also to learn as much as possible from what it perceives. The agent's initial configuration could reflect some prior knowledge of the environment, but as the agent gains experience this may be modified and augmented. There are extreme cases in which the environment is completely known a priori and completely predictable. In such cases, the agent need not perceive or learn; it simply acts correctly. Of course, such agents are fragile.

Consider the lowly dung beetle. After digging its nest and laying its eggs, it fetches a ball of dung from a nearby heap to plug the entrance. If the ball of dung is removed from its grasp en route, the beetle continues its task and pantomimes plugging the nest with the nonexistent dung ball, never noticing that it is missing. Evolution has built an assumption into the beetle's behavior, and when it is violated, unsuccessful behavior results. Slightly more intelligent is the sphex wasp. The female sphex will dig a burrow, go out and sting a caterpillar and drag it to the burrow, enter the burrow again to check all is well, drag the caterpillar inside, and lay its eggs. The caterpillar serves as a food source when the eggs hatch. So far so good, but if an entomologist moves the caterpillar a few inches away while the sphex is doing the check, it will revert to the "drag the caterpillar" step of its plan and will continue the plan without modification, re-checking the burrow, even after dozens of caterpillar-moving interventions. The sphex is unable to learn that its innate plan is failing, and thus will not change it.

To the extent that an agent relies on the prior knowledge of its designer rather than on its own percepts and learning processes, we say that the agent lacks autonomy. A rational agent should be autonomous: it should learn what it can to compensate for partial or incorrect prior knowledge. For example, a vacuum-cleaning agent that learns to predict where and when additional dirt will appear will do better than one that does not.

As a practical matter, one seldom requires complete autonomy from the start: when the agent has had little or no experience, it would have to act randomly unless the designer gave some assistance.
Just as evolution provides animals with enough built-in reflexes to survive long enough to learn for themselves, it would be reasonable to provide an artificial intelligent agent with some initial knowledge as well as an ability to learn. After sufficient experience of its environment, the behavior of a rational agent can become effectively independent of its prior knowledge. Hence, the incorporation of learning allows one to design a single rational agent that will succeed in a vast variety of environments.

2.3 The Nature of Environments

Now that we have a definition of rationality, we are almost ready to think about building rational agents. First, however, we must think about task environments, which are essentially the "problems" to which rational agents are the "solutions." We begin by showing how to specify a task environment, illustrating the process with a number of examples. We then show that task environments come in a variety of flavors. The nature of the task environment directly affects the appropriate design for the agent program.

2.3.1 Specifying the task environment

In our discussion of the rationality of the simple vacuum-cleaner agent, we had to specify the performance measure, the environment, and the agent's actuators and sensors. We group all these under the heading of the task environment. For the acronymically minded, we call this the PEAS (Performance, Environment, Actuators, Sensors) description. In designing an agent, the first step must always be to specify the task environment as fully as possible.

The vacuum world was a simple example; let us consider a more complex problem: an automated taxi driver. Figure 2.4 summarizes the PEAS description for the taxi's task environment. We discuss each element in more detail in the following paragraphs.

First, what is the performance measure to which we would like our automated driver to aspire? Desirable qualities include getting to the correct destination; minimizing fuel consumption and wear and tear; minimizing the trip time or cost; minimizing violations of traffic laws and disturbances to other drivers; maximizing safety and passenger comfort; maximizing profits. Obviously, some of these goals conflict, so tradeoffs will be required.

Next, what is the driving environment that the taxi will face? Any taxi driver must deal with a variety of roads, ranging from rural lanes and urban alleys to 12-lane freeways. The roads contain other traffic, pedestrians, stray animals, road works, police cars, puddles, and potholes. The taxi must also interact with potential and actual passengers. There are also some optional choices. The taxi might need to operate in Southern California, where snow is seldom a problem, or in Alaska, where it seldom is not. It could always be driving on the right, or we might want it to be flexible enough to drive on the left when in Britain or Japan. Obviously, the more restricted the environment, the easier the design problem.

  Agent Type: Taxi driver
    Performance Measure: Safe, fast, legal, comfortable trip; maximize profits; minimize impact on other road users
    Environment: Roads, other traffic, police, pedestrians, customers, weather
    Actuators: Steering, accelerator, brake, signal, horn, display, speech
    Sensors: Cameras, radar, speedometer, GPS, engine sensors, accelerometer, microphones, touchscreen

Figure 2.4 PEAS description of the task environment for an automated taxi driver.

The actuators for an automated taxi include those available to a human driver: control over the engine through the accelerator and control over steering and braking.
In addition, it will need output to a display screen or voice synthesizer to talk back to the passengers, and perhaps some way to communicate with other vehicles, politely or otherwise.

The basic sensors for the taxi will include one or more video cameras so that it can see, as well as lidar and ultrasound sensors to detect distances to other cars and obstacles. To avoid speeding tickets, the taxi should have a speedometer, and to control the vehicle properly, especially on curves, it should have an accelerometer. To determine the mechanical state of the vehicle, it will need the usual array of engine, fuel, and electrical system sensors. Like many human drivers, it might want to access GPS signals so that it doesn't get lost. Finally, it will need touchscreen or voice input for the passenger to request a destination.

In Figure 2.5, we have sketched the basic PEAS elements for a number of additional agent types. Further examples appear in Exercise 2.PEAS. The examples include physical as well as virtual environments. Note that virtual task environments can be just as complex as the "real" world: for example, a software agent (or software robot or softbot) that trades on auction and reselling Web sites deals with millions of other users and billions of objects, many with real images.

  Medical diagnosis system
    Performance Measure: Healthy patient, reduced costs
    Environment: Patient, hospital, staff
    Actuators: Display of questions, tests, diagnoses, treatments
    Sensors: Touchscreen/voice entry of symptoms and findings

  Satellite image analysis system
    Performance Measure: Correct categorization of objects, terrain
    Environment: Orbiting satellite, downlink, weather
    Actuators: Display of scene categorization
    Sensors: High-resolution digital camera

  Part-picking robot
    Performance Measure: Percentage of parts in correct bins
    Environment: Conveyor belt with parts; bins
    Actuators: Jointed arm and hand
    Sensors: Camera, tactile and joint angle sensors

  Refinery controller
    Performance Measure: Purity, yield, safety
    Environment: Refinery, raw materials, operators
    Actuators: Valves, pumps, heaters, stirrers, displays
    Sensors: Temperature, pressure, flow, chemical sensors

  Interactive English tutor
    Performance Measure: Student's score on test
    Environment: Set of students, testing agency
    Actuators: Display of exercises, feedback, speech
    Sensors: Keyboard entry, voice

Figure 2.5 Examples of agent types and their PEAS descriptions.
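Since a PEAS description is essentially structured documentation, it can also be recorded as data. The sketch below is hypothetical (the class and field names are inventions); the taxi entries are taken from Figure 2.4.

from dataclasses import dataclass
from typing import List

@dataclass
class PEAS:
    """A PEAS (Performance, Environment, Actuators, Sensors) description."""
    agent_type: str
    performance: List[str]
    environment: List[str]
    actuators: List[str]
    sensors: List[str]

taxi = PEAS(
    agent_type="Taxi driver",
    performance=["safe", "fast", "legal", "comfortable trip",
                 "maximize profits", "minimize impact on other road users"],
    environment=["roads", "other traffic", "police", "pedestrians", "customers", "weather"],
    actuators=["steering", "accelerator", "brake", "signal", "horn", "display", "speech"],
    sensors=["cameras", "radar", "speedometer", "GPS", "engine sensors",
             "accelerometer", "microphones", "touchscreen"],
)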
2.3.2 Properties of task environments

The range of task environments that might arise in AI is obviously vast. We can, however, identify a fairly small number of dimensions along which task environments can be categorized. These dimensions determine, to a large extent, the appropriate agent design and the applicability of each of the principal families of techniques for agent implementation. First we list the dimensions, then we analyze several task environments to illustrate the ideas. The definitions here are informal; later chapters provide more precise statements and examples of each kind of environment.

Fully observable vs. partially observable: If an agent's sensors give it access to the complete state of the environment at each point in time, then we say that the task environment is fully observable. A task environment is effectively fully observable if the sensors detect all aspects that are relevant to the choice of action; relevance, in turn, depends on the performance measure. Fully observable environments are convenient because the agent need not maintain any internal state to keep track of the world. An environment might be partially observable because of noisy and inaccurate sensors or because parts of the state are simply missing from the sensor data: for example, a vacuum agent with only a local dirt sensor cannot tell whether there is dirt in other squares, and an automated taxi cannot see what other drivers are thinking. If the agent has no sensors at all then the environment is unobservable. One might think that in such cases the agent's plight is hopeless, but, as we discuss in Chapter 4, the agent's goals may still be achievable, sometimes with certainty.

Single-agent vs. multiagent: The distinction between single-agent and multiagent environments may seem simple enough. For example, an agent solving a crossword puzzle by itself is clearly in a single-agent environment, whereas an agent playing chess is in a two-agent environment. However, there are some subtle issues. First, we have described how an entity may be viewed as an agent, but we have not explained which entities must be viewed as agents. Does an agent A (the taxi driver for example) have to treat an object B (another vehicle) as an agent, or can it be treated merely as an object behaving according to the laws of physics, analogous to waves at the beach or leaves blowing in the wind? The key distinction is whether B's behavior is best described as maximizing a performance measure whose value depends on agent A's behavior.

For example, in chess, the opponent entity B is trying to maximize its performance measure, which, by the rules of chess, minimizes agent A's performance measure. Thus, chess is a competitive multiagent environment. On the other hand, in the taxi-driving environment, avoiding collisions maximizes the performance measure of all agents, so it is a partially cooperative multiagent environment. It is also partially competitive because, for example, only one car can occupy a parking space.

The agent-design problems in multiagent environments are often quite different from those in single-agent environments; for example, communication often emerges as a rational behavior in multiagent environments; in some competitive environments, randomized behavior is rational because it avoids the pitfalls of predictability.

Deterministic vs. nondeterministic: If the next state of the environment is completely determined by the current state and the action executed by the agent(s), then we say the environment is deterministic; otherwise, it is nondeterministic. In principle, an agent need not worry about uncertainty in a fully observable, deterministic environment. If the environment is partially observable, however, then it could appear to be nondeterministic.

Most real situations are so complex that it is impossible to keep track of all the unobserved aspects; for practical purposes, they must be treated as nondeterministic. Taxi driving is clearly nondeterministic in this sense, because one can never predict the behavior of traffic exactly; moreover, one's tires may blow out unexpectedly and one's engine may seize up without warning. The vacuum world as we described it is deterministic, but variations can include nondeterministic elements such as randomly appearing dirt and an unreliable suction mechanism (Exercise 2.VFIN).
One final note: the word stochastic is used by some as a synonym for "nondeterministic," but we make a distinction between the two terms; we say that a model of the environment is stochastic if it explicitly deals with probabilities (e.g., "there's a 25% chance of rain tomorrow") and "nondeterministic" if the possibilities are listed without being quantified (e.g., "there's a chance of rain tomorrow").

Episodic vs. sequential: In an episodic task environment, the agent's experience is divided into atomic episodes. In each episode the agent receives a percept and then performs a single action. Crucially, the next episode does not depend on the actions taken in previous episodes. Many classification tasks are episodic. For example, an agent that has to spot defective parts on an assembly line bases each decision on the current part, regardless of previous decisions; moreover, the current decision doesn't affect whether the next part is defective. In sequential environments, on the other hand, the current decision could affect all future decisions. [Footnote 4: The word "sequential" is also used in computer science as the antonym of "parallel." The two meanings are largely unrelated.] Chess and taxi driving are sequential: in both cases, short-term actions can have long-term consequences. Episodic environments are much simpler than sequential environments because the agent does not need to think ahead.

Static vs. dynamic: If the environment can change while an agent is deliberating, then we say the environment is dynamic for that agent; otherwise, it is static. Static environments are easy to deal with because the agent need not keep looking at the world while it is deciding on an action, nor need it worry about the passage of time. Dynamic environments, on the other hand, are continuously asking the agent what it wants to do; if it hasn't decided yet, that counts as deciding to do nothing. If the environment itself does not change with the passage of time but the agent's performance score does, then we say the environment is semidynamic. Taxi driving is clearly dynamic: the other cars and the taxi itself keep moving while the driving algorithm dithers about what to do next. Chess, when played with a clock, is semidynamic. Crossword puzzles are static.

Discrete vs. continuous: The discrete/continuous distinction applies to the state of the environment, to the way time is handled, and to the percepts and actions of the agent. For example, the chess environment has a finite number of distinct states (excluding the clock). Chess also has a discrete set of percepts and actions. Taxi driving is a continuous-state and continuous-time problem: the speed and location of the taxi and of the other vehicles sweep through a range of continuous values and do so smoothly over time. Taxi-driving actions are also continuous (steering angles, etc.). Input from digital cameras is discrete, strictly speaking, but is typically treated as representing continuously varying intensities and locations.

Known vs. unknown: Strictly speaking, this distinction refers not to the environment itself but to the agent's (or designer's) state of knowledge about the "laws of physics" of the environment. In a known environment, the outcomes (or outcome probabilities if the environment is nondeterministic) for all actions are given. Obviously, if the environment is unknown, the agent will have to learn how it works in order to make good decisions.
The distinction between known and unknown environments is not the same as the one between fully and partially observable environments. It is quite possible for a known environment to be partially observable: for example, in solitaire card games, I know the rules but am still unable to see the cards that have not yet been turned over. Conversely, an unknown environment can be fully observable: in a new video game, the screen may show the entire game state but I still don't know what the buttons do until I try them.

As noted on page 39, the performance measure itself may be unknown, either because the designer is not sure how to write it down correctly or because the ultimate user, whose preferences matter, is not known. For example, a taxi driver usually won't know whether a new passenger prefers a leisurely or speedy journey, a cautious or aggressive driving style. A virtual personal assistant starts out knowing nothing about the personal preferences of its new owner. In such cases, the agent may learn more about the performance measure based on further interactions with the designer or user. This, in turn, suggests that the task environment is necessarily viewed as a multiagent environment.

The hardest case is partially observable, multiagent, nondeterministic, sequential, dynamic, continuous, and unknown. Taxi driving is hard in all these senses, except that the driver's environment is mostly known. Driving a rented car in a new country with unfamiliar geography, different traffic laws, and nervous passengers is a lot more exciting.

Figure 2.6 lists the properties of a number of familiar environments. Note that the properties are not always cut and dried. For example, we have listed the medical-diagnosis task as single-agent because the disease process in a patient is not profitably modeled as an agent; but a medical-diagnosis system might also have to deal with recalcitrant patients and skeptical staff, so the environment could have a multiagent aspect. Furthermore, medical diagnosis is episodic if one conceives of the task as selecting a diagnosis given a list of symptoms; the problem is sequential if the task can include proposing a series of tests, evaluating progress over the course of treatment, handling multiple patients, and so on.

  Task Environment      Observable   Agents   Deterministic   Episodic     Static    Discrete
  Crossword puzzle      Fully        Single   Deterministic   Sequential   Static    Discrete
  Chess with a clock    Fully        Multi    Deterministic   Sequential   Semi      Discrete
  Poker                 Partially    Multi    Stochastic      Sequential   Static    Discrete
  Backgammon            Fully        Multi    Stochastic      Sequential   Static    Discrete
  Taxi driving          Partially    Multi    Stochastic      Sequential   Dynamic   Continuous
  Medical diagnosis     Partially    Single   Stochastic      Sequential   Dynamic   Continuous
  Image analysis        Fully        Single   Deterministic   Episodic     Semi      Continuous
  Part-picking robot    Partially    Single   Stochastic      Episodic     Dynamic   Continuous
  Refinery controller   Partially    Single   Stochastic      Sequential   Dynamic   Continuous
  English tutor         Partially    Multi    Stochastic      Sequential   Dynamic   Discrete

Figure 2.6 Examples of task environments and their characteristics.

We have not included a "known/unknown" column because, as explained earlier, this is not strictly a property of the environment. For some environments, such as chess and poker, it is quite easy to supply the agent with full knowledge of the rules, but it is nonetheless interesting to consider how an agent might learn to play these games without such knowledge.
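These dimensions are easy to record mechanically. The sketch below encodes a few rows of Figure 2.6 in a hypothetical data structure, so that one can, for example, filter for environments with a given property.

# A few rows of Figure 2.6 encoded as data (hypothetical representation).
ENVIRONMENT_PROPERTIES = {
    "Crossword puzzle": dict(observable="Fully", agents="Single", deterministic="Deterministic",
                             episodic="Sequential", static="Static", discrete="Discrete"),
    "Taxi driving":     dict(observable="Partially", agents="Multi", deterministic="Stochastic",
                             episodic="Sequential", static="Dynamic", discrete="Continuous"),
    "Image analysis":   dict(observable="Fully", agents="Single", deterministic="Deterministic",
                             episodic="Episodic", static="Semi", discrete="Continuous"),
}

# Example query: which of these tasks are episodic (and so need no lookahead)?
print([name for name, p in ENVIRONMENT_PROPERTIES.items() if p["episodic"] == "Episodic"])
# -> ['Image analysis']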
The code repository associated with this book (aima.cs.berkeley.edu) includes multiple environment implementations, together with a general-purpose environment simulator for evaluating an agent's performance. Experiments are often carried out not for a single environment but for many environments drawn from an environment class. For example, to evaluate a taxi driver in simulated traffic, we would want to run many simulations with different traffic, lighting, and weather conditions. We are then interested in the agent's average performance over the environment class.

2.4 The Structure of Agents

So far we have talked about agents by describing behavior: the action that is performed after any given sequence of percepts. Now we must bite the bullet and talk about how the insides work. The job of AI is to design an agent program that implements the agent function, the mapping from percepts to actions. We assume this program will run on some sort of computing device with physical sensors and actuators, which we call the agent architecture:

  agent = architecture + program.

Obviously, the program we choose has to be one that is appropriate for the architecture. If the program is going to recommend actions like Walk, the architecture had better have legs. The architecture might be just an ordinary PC, or it might be a robotic car with several onboard computers, cameras, and other sensors. In general, the architecture makes the percepts from the sensors available to the program, runs the program, and feeds the program's action choices to the actuators as they are generated. Most of this book is about designing agent programs, although Chapters 25 and 26 deal directly with the sensors and actuators.

  function TABLE-DRIVEN-AGENT(percept) returns an action
    persistent: percepts, a sequence, initially empty
                table, a table of actions, indexed by percept sequences, initially fully specified
    append percept to the end of percepts
    action ← LOOKUP(percepts, table)
    return action

Figure 2.7 The TABLE-DRIVEN-AGENT program is invoked for each new percept and returns an action each time. It retains the complete percept sequence in memory.

2.4.1 Agent programs

The agent programs that we design in this book all have the same skeleton: they take the current percept as input from the sensors and return an action to the actuators. [Footnote 5: There are other choices for the agent program skeleton; for example, we could have the agent programs be coroutines that run asynchronously with the environment. Each such coroutine has an input and output port and consists of a loop that reads the input port for percepts and writes actions to the output port.] Notice the difference between the agent program, which takes the current percept as input, and the agent function, which may depend on the entire percept history. The agent program has no choice but to take just the current percept as input because nothing more is available from the environment; if the agent's actions need to depend on the entire percept sequence, the agent will have to remember the percepts.

We describe the agent programs in the simple pseudocode language that is defined in Appendix B. (The online code repository contains implementations in real programming languages.) For example, Figure 2.7 shows a rather trivial agent program that keeps track of the percept sequence and then uses it to index into a table of actions to decide what to do. The table, an example of which is given for the vacuum world in Figure 2.3, represents explicitly the agent function that the agent program embodies. To build a rational agent in this way, we as designers must construct a table that contains the appropriate action for every possible percept sequence.
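Figure 2.7 translates almost line for line into Python. The sketch below is illustrative (the function name and the tuple encoding of percept sequences are assumptions), with a few vacuum-world entries from Figure 2.3 standing in for the fully specified table.

def table_driven_agent_program(table):
    """Return an agent program that looks up the full percept sequence in `table`.
    `table` maps tuples of percepts to actions and must be fully specified."""
    percepts = []                        # persistent percept sequence

    def program(percept):
        percepts.append(percept)
        return table[tuple(percepts)]    # LOOKUP(percepts, table)
    return program

# Partial table for the vacuum world (Figure 2.3); a complete table would be unbounded.
table = {
    (("A", "Clean"),): "Right",
    (("A", "Dirty"),): "Suck",
    (("B", "Clean"),): "Left",
    (("B", "Dirty"),): "Suck",
    (("A", "Clean"), ("A", "Clean")): "Right",
    (("A", "Clean"), ("A", "Dirty")): "Suck",
}
agent = table_driven_agent_program(table)
print(agent(("A", "Dirty")))   # -> Suck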
It is instructive to consider why the table-driven approach to agent construction is doomed to failure. Let P be the set of possible percepts and let T be the lifetime of the agent (the total number of percepts it will receive). The lookup table will contain ∑_{t=1}^{T} |P|^t entries. Consider the automated taxi: the visual input from a single camera (eight cameras is typical) comes in at the rate of roughly 70 megabytes per second (30 frames per second, 1080 × 720 pixels with 24 bits of color information). This gives a lookup table with over 10^600,000,000,000 entries for an hour's driving. Even the lookup table for chess, a tiny, well-behaved fragment of the real world, has (it turns out) at least 10^150 entries. In comparison, the number of atoms in the observable universe is less than 10^80. The daunting size of these tables means that (a) no physical agent in this universe will have the space to store the table; (b) the designer would not have time to create the table; and (c) no agent could ever learn all the right table entries from its experience.

Despite all this, TABLE-DRIVEN-AGENT does do what we want, assuming the table is filled in correctly: it implements the desired agent function.

  function REFLEX-VACUUM-AGENT([location, status]) returns an action
    if status = Dirty then return Suck
    else if location = A then return Right
    else if location = B then return Left

Figure 2.8 The agent program for a simple reflex agent in the two-location vacuum environment. This program implements the agent function tabulated in Figure 2.3.

The key challenge for AI is to find out how to write programs that, to the extent possible, produce rational behavior from a smallish program rather than from a vast table. We have many examples showing that this can be done successfully in other areas: for example, the huge tables of square roots used by engineers and schoolchildren prior to the 1970s have now been replaced by a five-line program for Newton's method running on electronic calculators. The question is, can AI do for general intelligent behavior what Newton did for square roots? We believe the answer is yes.

In the remainder of this section, we outline four basic kinds of agent programs that embody the principles underlying almost all intelligent systems:
• Simple reflex agents;
• Model-based reflex agents;
• Goal-based agents; and
• Utility-based agents.
Each kind of agent program combines particular components in particular ways to generate actions. Section 2.4.6 explains in general terms how to convert all these agents into learning agents that can improve the performance of their components so as to generate better actions. Finally, Section 2.4.7 describes the variety of ways in which the components themselves can be represented within the agent. This variety provides a major organizing principle for the field and for the book itself.

2.4.2 Simple reflex agents

The simplest kind of agent is the simple reflex agent. These agents select actions on the basis of the current percept, ignoring the rest of the percept history. For example, the vacuum agent whose agent function is tabulated in Figure 2.3 is a simple reflex agent, because its decision is based only on the current location and on whether that location contains dirt. An agent program for this agent is shown in Figure 2.8.

Notice that the vacuum agent program is very small indeed compared to the corresponding table. The most obvious reduction comes from ignoring the percept history, which cuts down the number of relevant percept sequences from 4^T to just 4. A further, small reduction comes from the fact that when the current square is dirty, the action does not depend on the location. Although we have written the agent program using if-then-else statements, it is simple enough that it can also be implemented as a Boolean circuit.
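In Python, the program of Figure 2.8 is equally small. A sketch, assuming the (location, status) percept encoding used in the earlier examples:

def reflex_vacuum_agent(percept):
    """Figure 2.8: suck if the current square is dirty, otherwise move to the other square."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    elif location == "B":
        return "Left"

The program consults only the current percept and keeps no stored history, which is exactly what makes it a simple reflex agent.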
Simple reflex behaviors occur even in more complex environments. Imagine yourself as the driver of the automated taxi. If the car in front brakes and its brake lights come on, then you should notice this and initiate braking. In other words, some processing is done on the visual input to establish the condition we call "The car in front is braking." Then, this triggers some established connection in the agent program to the action "initiate braking." We call such a connection a condition-action rule [Footnote 6: Also called situation-action rules, productions, or if-then rules.], written as

  if car-in-front-is-braking then initiate-braking.

Humans also have many such connections, some of which are learned responses (as for driving) and some of which are innate reflexes (such as blinking when something approaches the eye). In the course of the book, we show several different ways in which such connections can be learned and implemented.

The program in Figure 2.8 is specific to one particular vacuum environment. A more general and flexible approach is first to build a general-purpose interpreter for condition-action rules and then to create rule sets for specific task environments. Figure 2.9 gives the structure of this general program in schematic form, showing how the condition-action rules allow the agent to make the connection from percept to action. Do not worry if this seems trivial; it gets more interesting shortly.

Figure 2.9 Schematic diagram of a simple reflex agent. We use rectangles to denote the current internal state of the agent's decision process, and ovals to represent the background information used in the process.

An agent program for Figure 2.9 is shown in Figure 2.10. The INTERPRET-INPUT function generates an abstracted description of the current state from the percept, and the RULE-MATCH function returns the first rule in the set of rules that matches the given state description. Note that the description in terms of "rules" and "matching" is purely conceptual; as noted above, actual implementations can be as simple as a collection of logic gates implementing a Boolean circuit. Alternatively, a "neural" circuit can be used, where the logic gates are replaced by the nonlinear units of artificial neural networks (see Chapter 21).

  function SIMPLE-REFLEX-AGENT(percept) returns an action
    persistent: rules, a set of condition-action rules
    state ← INTERPRET-INPUT(percept)
    rule ← RULE-MATCH(state, rules)
    action ← rule.ACTION
    return action

Figure 2.10 A simple reflex agent. It acts according to a rule whose condition matches the current state, as defined by the percept.

Simple reflex agents have the admirable property of being simple, but they are of limited intelligence. The agent in Figure 2.10 will work only if the correct decision can be made on the basis of just the current percept, that is, only if the environment is fully observable.
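A rule interpreter in the spirit of Figure 2.10 might be sketched as follows. INTERPRET-INPUT, the rule set, and the percept format are hypothetical placeholders to be supplied per task environment; the braking rule from the text serves as the example.

def simple_reflex_agent_program(rules, interpret_input):
    """Sketch of Figure 2.10: rules are (condition, action) pairs; the first
    condition that matches the abstracted state wins (RULE-MATCH)."""
    def program(percept):
        state = interpret_input(percept)          # INTERPRET-INPUT
        for condition, action in rules:           # RULE-MATCH
            if condition(state):
                return action
        return "NoOp"
    return program

# Hypothetical rule set and input interpreter for the braking example.
rules = [(lambda state: state.get("car_in_front_is_braking", False), "initiate-braking")]
interpret_input = lambda frame: {"car_in_front_is_braking": frame.get("brake_lights_on", False)}

driver = simple_reflex_agent_program(rules, interpret_input)
print(driver({"brake_lights_on": True}))          # -> initiate-braking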
Even a little bit of unobservability can cause serious trouble. For example, the braking rule given earlier assumes that the condition car-in-front-is-braking can be determined from the current percept: a single frame of video. This works if the car in front has a centrally mounted (and hence uniquely identifiable) brake light. Unfortunately, older models have different configurations of taillights, brake lights, and turn-signal lights, and it is not always possible to tell from a single image whether the car is braking or simply has its taillights on. A simple reflex agent driving behind such a car would either brake continuously and unnecessarily, or, worse, never brake at all.

We can see a similar problem arising in the vacuum world. Suppose that a simple reflex vacuum agent is deprived of its location sensor and has only a dirt sensor. Such an agent has just two possible percepts: [Dirty] and [Clean]. It can Suck in response to [Dirty]; what should it do in response to [Clean]? Moving Left fails (forever) if it happens to start in square A, and moving Right fails (forever) if it happens to start in square B. Infinite loops are often unavoidable for simple reflex agents operating in partially observable environments.

Escape from infinite loops is possible if the agent can randomize its actions. For example, if the vacuum agent perceives [Clean], it might flip a coin to choose between Right and Left. It is easy to show that the agent will reach the other square in an average of two steps. Then, if that square is dirty, the agent will clean it and the task will be complete. Hence, a randomized simple reflex agent might outperform a deterministic simple reflex agent.

We mentioned in Section 2.3 that randomized behavior of the right kind can be rational in some multiagent environments. In single-agent environments, randomization is usually not rational. It is a useful trick that helps a simple reflex agent in some situations, but in most cases we can do much better with more sophisticated deterministic agents.

2.4.3 Model-based reflex agents

The most effective way to handle partial observability is for the agent to keep track of the part of the world it can't see now. That is, the agent should maintain some sort of internal state that depends on the percept history and thereby reflects at least some of the unobserved aspects of the current state. For the braking problem, the internal state is not too extensive: just the previous frame from the camera, allowing the agent to detect when two red lights at the edge of the vehicle go on or off simultaneously. For other driving tasks such as changing lanes, the agent needs to keep track of where the other cars are if it can't see them all at once. And for any driving to be possible at all, the agent needs to keep track of where its keys are.

Figure 2.11 A model-based reflex agent.

Updating this internal state information as time goes by requires two kinds of knowledge to be encoded in the agent program in some form. First, we need some information about how the world changes over time, which can be divided roughly into two parts: the effects of the agent's actions and how the world evolves independently of the agent. For example, when the agent turns the steering wheel clockwise, the car turns to the right, and when it's raining the car's cameras can get wet. This knowledge about "how the world works," whether implemented in simple Boolean circuits or in complete scientific theories, is called a transition model of the world.

Second, we need some information about how the state of the world is reflected in the agent's percepts.
For example, when the car in front initiates braking, one or more illuminated red regions appear in the forward-facing camera image, and, when the camera gets wet, droplet-shaped objects appear in the image partially obscuring the road. This kind of knowledge is called a sensor model.

Together, the transition model and sensor model allow an agent to keep track of the state of the world, to the extent possible given the limitations of the agent's sensors. An agent that uses such models is called a model-based agent.

Figure 2.11 gives the structure of the model-based reflex agent with internal state, showing how the current percept is combined with the old internal state to generate the updated description of the current state, based on the agent's model of how the world works. The agent program is shown in Figure 2.12. The interesting part is the function UPDATE-STATE, which is responsible for creating the new internal state description. The details of how models and states are represented vary widely depending on the type of environment and the particular technology used in the agent design.

  function MODEL-BASED-REFLEX-AGENT(percept) returns an action
    persistent: state, the agent's current conception of the world state
                transition model, a description of how the next state depends on the current state and action
                sensor model, a description of how the current world state is reflected in the agent's percepts
                rules, a set of condition-action rules
                action, the most recent action, initially none
    state ← UPDATE-STATE(state, action, percept, transition model, sensor model)
    rule ← RULE-MATCH(state, rules)
    action ← rule.ACTION
    return action

Figure 2.12 A model-based reflex agent. It keeps track of the current state of the world, using an internal model. It then chooses an action in the same way as the reflex agent.

Regardless of the kind of representation used, it is seldom possible for the agent to determine the current state of a partially observable environment exactly. Instead, the box labeled "what the world is like now" (Figure 2.11) represents the agent's "best guess" (or sometimes best guesses, if the agent entertains multiple possibilities). For example, an automated taxi may not be able to see around the large truck that has stopped in front of it and can only guess about what may be causing the hold-up. Thus, uncertainty about the current state may be unavoidable, but the agent still has to make a decision.
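Figure 2.12 adds persistent state and the models. In the sketch below, a single user-supplied update_state function stands in for UPDATE-STATE together with the transition and sensor models, since how those models are represented depends entirely on the environment; everything else mirrors the pseudocode.

def model_based_reflex_agent_program(update_state, rules, initial_state=None):
    """Sketch of Figure 2.12. Here update_state(state, action, percept) is a
    user-supplied function that folds the transition model and sensor model
    into one step; the pseudocode passes the two models explicitly."""
    state = initial_state
    last_action = None

    def program(percept):
        nonlocal state, last_action
        state = update_state(state, last_action, percept)    # UPDATE-STATE
        for condition, action in rules:                       # RULE-MATCH
            if condition(state):
                last_action = action
                return action
        last_action = "NoOp"
        return last_action
    return program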
2.4.4 Goal-based agents

Knowing something about the current state of the environment is not always enough to decide what to do. For example, at a road junction, the taxi can turn left, turn right, or go straight on. The correct decision depends on where the taxi is trying to get to. In other words, as well as a current state description, the agent needs some sort of goal information that describes situations that are desirable, for example, being at a particular destination. The agent program can combine this with the model (the same information as was used in the model-based reflex agent) to choose actions that achieve the goal. Figure 2.13 shows the goal-based agent's structure.

Figure 2.13 A model-based, goal-based agent. It keeps track of the world state as well as a set of goals it is trying to achieve, and chooses an action that will (eventually) lead to the achievement of its goals.

Sometimes goal-based action selection is straightforward, for example, when goal satisfaction results immediately from a single action. Sometimes it will be more tricky, for example, when the agent has to consider long sequences of twists and turns in order to find a way to achieve the goal. Search (Chapters 3 to 5) and planning (Chapter 11) are the subfields of AI devoted to finding action sequences that achieve the agent's goals.

Notice that decision making of this kind is fundamentally different from the condition-action rules described earlier, in that it involves consideration of the future: both "What will happen if I do such-and-such?" and "Will that make me happy?" In the reflex agent designs, this information is not explicitly represented, because the built-in rules map directly from percepts to actions. The reflex agent brakes when it sees brake lights, period. It has no idea why. A goal-based agent brakes when it sees brake lights because that's the only action that it predicts will achieve its goal of not hitting other cars.

Although the goal-based agent appears less efficient, it is more flexible because the knowledge that supports its decisions is represented explicitly and can be modified. For example, a goal-based agent's behavior can easily be changed to go to a different destination, simply by specifying that destination as the goal. The reflex agent's rules for when to turn and when to go straight will work only for a single destination; they must all be replaced to go somewhere new.
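The lookahead that distinguishes a goal-based agent can be caricatured in a few lines. The sketch below does only one step of lookahead with hypothetical helper functions; real goal-based agents rely on the search and planning algorithms of Chapters 3 to 5 and 11.

def goal_based_one_step(state, actions, transition_model, goal_test):
    """Toy one-step lookahead: pick any action whose predicted result satisfies
    the goal."""
    for action in actions:
        predicted = transition_model(state, action)   # "What will it be like if I do action A?"
        if goal_test(predicted):                       # "Will that achieve my goal?"
            return action
    return None   # no single action reaches the goal; deeper search would be needed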
2.4.5 Utility-based agents

Goals alone are not enough to generate high-quality behavior in most environments. For example, many action sequences will get the taxi to its destination (thereby achieving the goal), but some are quicker, safer, more reliable, or cheaper than others. Goals just provide a crude binary distinction between "happy" and "unhappy" states. A more general performance measure should allow a comparison of different world states according to exactly how happy they would make the agent. Because "happy" does not sound very scientific, economists and computer scientists use the term utility instead. [Footnote 7: The word "utility" here refers to "the quality of being useful," not to the electric company or waterworks.]

We have already seen that a performance measure assigns a score to any given sequence of environment states, so it can easily distinguish between more and less desirable ways of getting to the taxi's destination. An agent's utility function is essentially an internalization of the performance measure. Provided that the internal utility function and the external performance measure are in agreement, an agent that chooses actions to maximize its utility will be rational according to the external performance measure.

Let us emphasize again that this is not the only way to be rational: we have already seen a rational agent program for the vacuum world (Figure 2.8) that has no idea what its utility function is. But, like goal-based agents, a utility-based agent has many advantages in terms of flexibility and learning. Furthermore, in two kinds of cases, goals are inadequate but a utility-based agent can still make rational decisions. First, when there are conflicting goals, only some of which can be achieved (for example, speed and safety), the utility function specifies the appropriate tradeoff. Second, when there are several goals that the agent can aim for, none of which can be achieved with certainty, utility provides a way in which the likelihood of success can be weighed against the importance of the goals.

Partial observability and nondeterminism are ubiquitous in the real world, and so, therefore, is decision making under uncertainty. Technically speaking, a rational utility-based agent chooses the action that maximizes the expected utility of the action outcomes, that is, the utility the agent expects to derive, on average, given the probabilities and utilities of each outcome. (Appendix A defines expectation more precisely.) In Chapter 16, we show that any rational agent must behave as if it possesses a utility function whose expected value it tries to maximize. An agent that possesses an explicit utility function can make rational decisions with a general-purpose algorithm that does not depend on the specific utility function being maximized. In this way, the "global" definition of rationality, designating as rational those agent functions that have the highest performance, is turned into a "local" constraint on rational-agent designs that can be expressed in a simple program.

The utility-based agent structure appears in Figure 2.14. Utility-based agent programs appear in Chapters 16 and 17, where we design decision-making agents that must handle the uncertainty inherent in nondeterministic or partially observable environments. Decision making in multiagent environments is also studied in the framework of utility theory, as explained in Chapter 18.

Figure 2.14 A model-based, utility-based agent. It uses a model of the world, along with a utility function that measures its preferences among states of the world. Then it chooses the action that leads to the best expected utility, where expected utility is computed by averaging over all possible outcome states, weighted by the probability of the outcome.

At this point, the reader may be wondering, "Is it that simple? We just build agents that maximize expected utility, and we're done?" It's true that such agents would be intelligent, but it's not simple. A utility-based agent has to model and keep track of its environment, tasks that have involved a great deal of research on perception, representation, reasoning, and learning. The results of this research fill many of the chapters of this book. Choosing the utility-maximizing course of action is also a difficult task, requiring ingenious algorithms that fill several more chapters. Even with these algorithms, perfect rationality is usually unachievable in practice because of computational complexity, as we noted in Chapter 1. We also note that not all utility-based agents are model-based; we will see in Chapters 22 and 26 that a model-free agent can learn what action is best in a particular situation without ever learning exactly how that action changes the environment.

Finally, all of this assumes that the designer can specify the utility function correctly; Chapters 17, 18, and 22 consider the issue of unknown utility functions in more depth.
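Maximizing expected utility is a short computation once a model supplies outcome probabilities. A sketch with hypothetical interfaces (outcomes and utility are assumed to be supplied by the agent's model):

def expected_utility(action, state, outcomes, utility):
    """Average utility of the outcome states, weighted by their probabilities.
    outcomes(state, action) yields (probability, outcome_state) pairs."""
    return sum(p * utility(s) for p, s in outcomes(state, action))

def rational_choice(state, actions, outcomes, utility):
    """Choose the action with the maximum expected utility."""
    return max(actions, key=lambda a: expected_utility(a, state, outcomes, utility))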
2.4.6 Learning agents

We have described agent programs with various methods for selecting actions. We have not, so far, explained how the agent programs come into being. In his famous early paper, Turing (1950) considers the idea of actually programming his intelligent machines by hand. He estimates how much work this might take and concludes, "Some more expeditious method seems desirable." The method he proposes is to build learning machines and then to teach them. In many areas of AI, this is now the preferred method for creating state-of-the-art systems. Any type of agent (model-based, goal-based, utility-based, etc.) can be built as a learning agent (or not).

Learning has another advantage, as we noted earlier: it allows the agent to operate in initially unknown environments and to become more competent than its initial knowledge alone might allow. In this section, we briefly introduce the main ideas of learning agents. Throughout the book, we comment on opportunities and methods for learning in particular kinds of agents. Chapters 19-22 go into much more depth on the learning algorithms themselves.

A learning agent can be divided into four conceptual components, as shown in Figure 2.15. The most important distinction is between the learning element, which is responsible for making improvements, and the performance element, which is responsible for selecting external actions. The performance element is what we have previously considered to be the entire agent: it takes in percepts and decides on actions. The learning element uses feedback from the critic on how the agent is doing and determines how the performance element should be modified to do better in the future.

Figure 2.15 A general learning agent. The "performance element" box represents what we have previously considered to be the whole agent program. Now, the "learning element" box gets to modify that program to improve its performance.

The design of the learning element depends very much on the design of the performance element. When trying to design an agent that learns a certain capability, the first question is not "How am I going to get it to learn this?" but "What kind of performance element will my agent use to do this once it has learned how?" Given a design for the performance element, learning mechanisms can be constructed to improve every part of the agent.

The critic tells the learning element how well the agent is doing with respect to a fixed performance standard. The critic is necessary because the percepts themselves provide no indication of the agent's success. For example, a chess program could receive a percept indicating that it has checkmated its opponent, but it needs a performance standard to know that this is a good thing; the percept itself does not say so. It is important that the performance standard be fixed. Conceptually, one should think of it as being outside the agent altogether because the agent must not modify it to fit its own behavior.

The last component of the learning agent is the problem generator. It is responsible for suggesting actions that will lead to new and informative experiences. If the performance element had its way, it would keep doing the actions that are best, given what it knows, but if the agent is willing to explore a little and do some perhaps suboptimal actions in the short run, it might discover much better actions for the long run. The problem generator's job is to suggest these exploratory actions. This is what scientists do when they carry out experiments. Galileo did not think that dropping rocks from the top of a tower in Pisa was valuable in itself. He was not trying to break the rocks or to modify the brains of unfortunate pedestrians. His aim was to modify his own brain by identifying a better theory of the motion of objects.
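To summarize the architecture in code, here is a schematic Python sketch of how the four components might be composed. The class name, the callable interfaces, and the simple random exploration rule are illustrative assumptions rather than the book's design: the critic scores behavior against the fixed standard, the learning element uses that feedback to adjust the performance element, and the problem generator occasionally overrides the performance element with an exploratory action.

import random

class LearningAgent:
    """Schematic composition of the four components of a general learning agent."""

    def __init__(self, performance_element, critic, learning_element,
                 problem_generator, exploration_rate=0.1):
        self.performance_element = performance_element   # maps percepts to actions
        self.critic = critic                              # scores behavior vs. a fixed standard
        self.learning_element = learning_element          # improves the performance element
        self.problem_generator = problem_generator        # proposes informative experiments
        self.exploration_rate = exploration_rate

    def step(self, percept):
        feedback = self.critic(percept)
        self.learning_element(self.performance_element, feedback)
        if random.random() < self.exploration_rate:
            return self.problem_generator(percept)        # explore: try something new
        return self.performance_element(percept)          # exploit: act on current knowledge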
The learning element can make changes to any of the "knowledge" components shown in the agent diagrams (Figures 2.9, 2.11, 2.13, and 2.14). The simplest cases involve learning directly from the percept sequence. Observation of pairs of successive states of the environment can allow the agent to learn "What my actions do" and "How the world evolves" in response to its actions. For example, if the automated taxi exerts a certain braking pressure when driving on a wet road, then it will soon find out how much deceleration is actually achieved, and whether it skids off the road. The problem generator might identify certain parts of the model that are in need of improvement and suggest experiments, such as trying out the brakes on different road surfaces under different conditions.

Improving the model components of a model-based agent so that they conform better with reality is almost always a good idea, regardless of the external performance standard. (In some cases, it is better from a computational point of view to have a simple but slightly inaccurate model rather than a perfect but fiendishly complex model.) Information from the external standard is needed when trying to learn a reflex component or a utility function. For example, suppose the taxi-driving agent receives no tips from passengers who have been thoroughly shaken up during the trip. The external performance standard must inform the agent that the loss of tips is a negative contribution to its overall performance; then the agent might be able to learn that violent maneuvers do not contribute to its own utility. In a sense, the performance standard distinguishes part of the incoming percept as a reward (or penalty) that provides direct feedback on the quality of the agent's behavior. Hard-wired performance standards such as pain and hunger in animals can be understood in this way.

More generally, human choices can provide information about human preferences. For example, suppose the taxi does not know that people generally don't like loud noises, and settles on the idea of blowing its horn continuously as a way of ensuring that pedestrians know it's coming. The consequent human behavior -- covering ears, using bad language, and possibly cutting the wires to the horn -- would provide evidence to the agent with which to update its utility function. This issue is discussed further in Chapter 22.

In summary, agents have a variety of components, and those components can be represented in many ways within the agent program, so there appears to be great variety among learning methods. There is, however, a single unifying theme. Learning in intelligent agents can be summarized as a process of modification of each component of the agent to bring the components into closer agreement with the available feedback information, thereby improving the overall performance of the agent.
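Returning to the wet-road braking example above, the following minimal Python sketch (the class, state, and action names are illustrative assumptions, not the book's code) shows one way the "What my actions do" component could be learned directly from the percept sequence: count the observed (state, action, next-state) transitions and convert the counts into empirical outcome probabilities.

from collections import Counter, defaultdict

class TransitionModel:
    """Empirical model of 'What my actions do', estimated from observed transitions."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def probabilities(self, state, action):
        outcome_counts = self.counts[(state, action)]
        total = sum(outcome_counts.values())
        return {s: n / total for s, n in outcome_counts.items()} if total else {}

model = TransitionModel()
model.observe("wet_road", "brake_hard", "skid")
model.observe("wet_road", "brake_hard", "slowed_smoothly")
model.observe("wet_road", "brake_hard", "slowed_smoothly")
print(model.probabilities("wet_road", "brake_hard"))
# -> {'skid': 0.33..., 'slowed_smoothly': 0.66...}

After only a few observations the model already reflects that hard braking on a wet road sometimes produces a skid, which is exactly the kind of model improvement that the problem generator's suggested experiments are meant to accelerate.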
2.4.7 How the components of agent programs work

We have described agent programs (in very high-level terms) as consisting of various components, whose function it is to answer questions such as: "What is the world like now?" "What action should I do now?" "What do my actions do?" The next question for a student of AI is, "How on Earth do these components work?" It takes about a thousand pages to begin to answer that question properly, but here we want to draw the reader's attention to some basic distinctions among the various ways that the components can represent the environment that the agent inhabits. Roughly speaking, we can place the representations along an axis of increasing complexity and expressive power: atomic, factored, and structured.

To illustrate these ideas, it helps to consider a particular agent component, such as the one that deals with "What my actions do." This component describes the changes that might occur in the environment as the result of taking an action, and Figure 2.16 provides schematic depictions of how those transitions might be represented.

Figure 2.16 Three ways to represent states and the transitions between them. (a) Atomic representation: a state (such as B or C) is a black box with no internal structure. (b) Factored representation: a state consists of a vector of attribute values; values can be Boolean, real-valued, or one of a fixed set of symbols. (c) Structured representation: a state includes objects, each of which may have attributes of its own as well as relationships to other objects.

In an atomic representation each state of the world is indivisible -- it has no internal structure. Consider the task of finding a driving route from one end of a country to the other via some sequence of cities (we address this problem in Figure 3.1 on page 64). For the purposes of solving this problem, it may suffice to reduce the state of the world to just the name of the city we are in -- a single atom of knowledge, a "black box" whose only discernible property is that of being identical to or different from another black box. The standard algorithms underlying search and game-playing (Chapters 3-5), hidden Markov models (Chapter 14), and Markov decision processes (Chapter 17) all work with atomic representations.

A factored representation splits up each state into a fixed set of variables or attributes, each of which can have a value. Consider a higher-fidelity description for the same driving problem, where we need to be concerned with more than just atomic location in one city or another; we might need to pay attention to how much gas is in the tank, our current GPS coordinates, whether or not the oil warning light is working, how much money we have for tolls, what station is on the radio, and so on. While two different atomic states have nothing in common -- they are just different black boxes -- two different factored states can share some attributes (such as being at some particular GPS location) and not others (such as having lots of gas or having no gas); this makes it much easier to work out how to turn one state into another. Many important areas of AI are based on factored representations, including constraint satisfaction algorithms (Chapter 6), propositional logic (Chapter 7), planning (Chapter 11), Bayesian networks (Chapters 12-16), and various machine learning algorithms.

For many purposes, we need to understand the world as having things in it that are related to each other, not just variables with values. For example, we might notice that a large truck ahead of us is reversing into the driveway of a dairy farm, but a loose cow is blocking the truck's path. A factored representation is unlikely to be pre-equipped with the attribute TruckAheadBackingIntoDairyFarmDrivewayBlockedByLooseCow with value true or false. Instead, we would need a structured representation, in which objects such as cows and trucks and their various and varying relationships can be described explicitly (see Figure 2.16(c)). Structured representations underlie relational databases and first-order logic (Chapters 8, 9, and 10), first-order probability models (Chapter 15), and much of natural language understanding (Chapters 23 and 24). In fact, much of what humans express in natural language concerns objects and their relationships.
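The three kinds of representation can be sketched directly in Python. The snippet below is purely illustrative (the attribute names, the dataclass, and the relation tuples are assumptions made for this example, not the book's notation); it encodes the same driving state at the three levels of structure described above.

from dataclasses import dataclass, field

# (a) Atomic: the state is an opaque label; all we can test is identity.
state_atomic = "Bucharest"

# (b) Factored: a fixed set of attributes, each with a value.
state_factored = {
    "city": "Bucharest",
    "fuel_liters": 32.5,
    "oil_warning_light_working": True,
    "toll_money": 14.0,
}

# (c) Structured: objects with their own attributes plus explicit relations between them.
@dataclass
class Thing:
    name: str
    attributes: dict = field(default_factory=dict)

truck = Thing("truck", {"reversing": True})
cow = Thing("cow", {"loose": True})
our_car = Thing("our_car")
relations = [("ahead_of", truck, our_car), ("blocks_path_of", cow, truck)]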
As we mentioned earlier, the axis along which atomic, factored, and structured representations lie is the axis of increasing expressiveness. Roughly speaking, a more expressive representation can capture, at least as concisely, everything a less expressive one can capture, plus some more. Often, the more expressive language is much more concise; for example, the rules of chess can be written in a page or two of a structured-representation language such as first-order logic but require thousands of pages when written in a factored-representation language such as propositional logic, and around 10^38 pages when written in an atomic language such as that of finite-state automata. On the other hand, reasoning and learning become more complex as the expressive power of the representation increases. To gain the benefits of expressive representations while avoiding their drawbacks, intelligent systems for the real world may need to operate at all points along the axis simultaneously.

Another axis for representation involves the mapping of concepts to locations in physical memory, whether in a computer or in a brain. If there is a one-to-one mapping between concepts and memory locations, we call that a localist representation. On the other hand, if the representation of a concept is spread over many memory locations, and each memory location is employed as part of the representation of multiple different concepts, we call that a distributed representation. Distributed representations are more robust against noise and information loss. With a localist representation, the mapping from concept to memory location is arbitrary, and if a transmission error garbles a few bits, we might confuse Truck with the unrelated concept Truce. But with a distributed representation, you can think of each concept as representing a point in multidimensional space; if you garble a few bits, you move to a nearby point in that space, which will have a similar meaning.

Summary

This chapter has been something of a whirlwind tour of AI, which we have conceived of as the science of agent design. The major points to recall are as follows:

• An agent is something that perceives and acts in an environment. The agent function for an agent specifies the action taken by the agent in response to any percept sequence.

• The performance measure evaluates the behavior of the agent in an environment. A rational agent acts so as to maximize the expected value of the performance measure, given the percept sequence it has seen so far.

• A task environment specification includes the performance measure, the external environment, the actuators, and the sensors. In designing an agent, the first step must always be to specify the task environment as fully as possible.

• Task environments vary along several significant dimensions. They can be fully or partially observable, single-agent or multiagent, deterministic or nondeterministic, episodic or sequential, static or dynamic, discrete or continuous, and known or unknown.

• In cases where the performance measure is unknown or hard to specify correctly, there is a significant risk of the agent optimizing the wrong objective. In such cases the agent design should reflect uncertainty about the true objective.

• The agent program implements the agent function. There exists a variety of basic agent program designs reflecting the kind of information made explicit and used in the decision process. The designs vary in efficiency, compactness, and flexibility. The appropriate design of the agent program depends on the nature of the environment.
⢠Simplereflexagentsresponddirectlytopercepts,whereasmodel-basedreflexagents maintain internal state to track aspects of the world that are not evident in the current percept. Goal-based agents act to achieve their goals, and utility-based agents try to maximizetheirownexpected âhappiness.â ⢠Allagentscanimprovetheirperformance throughlearning. Bibliographical and Historical Notes The central role of action in intelligenceâthe notion of practical reasoningâgoes back at least as far as Aristotleâs Nicomachean Ethics. Practical reasoning was also the subject of Mc Carthyâsinfluential paperâProgramswith Common Senseâ(1958). Thefieldsofrobotics andcontrol theory are, bytheir verynature, concerned principally withphysical agents. The Bibliographicaland Historical Notes 61 concept of a controller in control theory is identical to that of an agent in AI. Perhaps sur- Controller prisingly, AI has concentrated for most of its history on isolated components of agentsâ question-answering systems, theorem-provers, vision systems, and so onârather than on wholeagents. Thediscussion ofagents inthetextby Genesereth and Nilsson(1987) wasan influentialexception. Thewhole-agentviewisnowwidelyacceptedandisacentralthemein recenttexts(Padghamand Winikoff, 2004;Jones,2007;Pooleand Mackworth,2017). Chapter 1traced the roots ofthe concept of rationality in philosophy and economics. In AI,theconceptwasofperipheralinterestuntilthemid-1980s,whenitbegantosuffusemany discussions aboutthepropertechnical foundations ofthefield. Apaperby Jon Doyle(1983) predicted that rational agent design would come to be seen as the core mission of AI, while otherpopulartopicswouldspinofftoformnewdisciplines. Careful attention to the properties of the environment and their consequences for rational agent design is most apparent in the control theory traditionâfor example, classical control systems (Dorf and Bishop, 2004; Kirk, 2004) handle fully observable, deterministic environments; stochastic optimal control (Kumar and Varaiya, 1986; Bertsekas and Shreve, 2007) handles partially observable, stochastic environments; and hybrid control (Henzinger and Sastry, 1998; Cassandras and Lygeros, 2006) deals with environments containing both discrete andcontinuous elements. Thedistinction betweenfully andpartially observable environments is also central in the dynamic programming literature developed in the field of operations research (Puterman,1994), whichwediscuss in Chapter17. Although simple reflex agents were central to behaviorist psychology (see Chapter 1), most AIresearchersviewthemastoosimpletoprovidemuchleverage. (Rosenschein (1985) and Brooks (1986) questioned this assumption; see Chapter 26.) A great deal of work has gone into finding efficient algorithms for keeping track of complex environments (Bar Shalometal.,2001;Chosetetal.,2005;Simon,2006),mostofitintheprobabilistic setting. Goal-based agents are presupposed in everything from Aristotleâs view of practical reasoning to Mc Carthyâs early papers on logical AI. Shakey the Robot (Fikes and Nilsson, 1971; Nilsson, 1984) was the first robotic embodiment of a logical, goal-based agent. A full logical analysis of goal-based agents appeared in Genesereth and Nilsson (1987), and a goal-basedprogrammingmethodologycalledagent-orientedprogrammingwasdevelopedby Shoham (1993). The agent-based approach is now extremely popular in software engineering (Ciancarini and Wooldridge, 2001). 
It has also infiltrated the area of operating systems, where autonomic computing refers to computer systems and networks that monitor and control themselves with a perceive-act loop and machine learning methods (Kephart and Chess, 2003). Noting that a collection of agent programs designed to work well together in a true multiagent environment necessarily exhibits modularity -- the programs share no internal state and communicate with each other only through the environment -- it is common within the field of multiagent systems to design the agent program of a single agent as a collection of autonomous sub-agents. In some cases, one can even prove that the resulting system gives the same optimal solutions as a monolithic design.

The goal-based view of agents also dominates the cognitive psychology tradition in the area of problem solving, beginning with the enormously influential Human Problem Solving (Newell and Simon, 1972) and running through all of Newell's later work (Newell, 1990). Goals, further analyzed as desires (general) and intentions (currently pursued), are central to the influential theory of agents developed by Michael Bratman (1987).

As noted in Chapter 1, the development of utility theory as a basis for rational behavior goes back hundreds of years. In AI, early research eschewed utilities in favor of goals, with some exceptions (Feldman and Sproull, 1977). The resurgence of interest in probabilistic methods in the 1980s led to the acceptance of maximization of expected utility as the most general framework for decision making (Horvitz et al., 1988). The text by Pearl (1988) was the first in AI to cover probability and utility theory in depth; its exposition of practical methods for reasoning and decision making under uncertainty was probably the single biggest factor in the rapid shift towards utility-based agents in the 1990s (see Chapter 16). The formalization of reinforcement learning within a decision-theoretic framework also contributed to this shift (Sutton, 1988). Somewhat remarkably, almost all AI research until very recently has assumed that the performance measure can be exactly and correctly specified in the form of a utility function or reward function (Hadfield-Menell et al., 2017a; Russell, 2019).

The general design for learning agents portrayed in Figure 2.15 is classic in the machine learning literature (Buchanan et al., 1978; Mitchell, 1997). Examples of the design, as embodied in programs, go back at least as far as Arthur Samuel's (1959, 1967) learning program for playing checkers. Learning agents are discussed in depth in Chapters 19-22.

Some early papers on agent-based approaches are collected by Huhns and Singh (1998) and Wooldridge and Rao (1999). Texts on multiagent systems provide a good introduction to many aspects of agent design (Weiss, 2000a; Wooldridge, 2009). Several conference series devoted to agents began in the 1990s, including the International Workshop on Agent Theories, Architectures, and Languages (ATAL), the International Conference on Autonomous Agents (AGENTS), and the International Conference on Multi-Agent Systems (ICMAS). In 2002, these three merged to form the International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS). From 2000 to 2012 there were annual workshops on Agent-Oriented Software Engineering (AOSE). The journal Autonomous Agents and Multi-Agent Systems was founded in 1998. Finally, Dung Beetle Ecology (Hanski and Cambefort, 1991) provides a wealth of interesting information on the behavior of dung beetles. YouTube has inspiring video recordings of their activities.