Joseph Psotka, Ph.D.
U. S. Army Research Institute
ATTN: PERI-IIC
5001 Eisenhower Avenue
Alexandria, VA 22333-5600
(703)274-5540/5545/5569
Psotka@alexandria-emh2.army.mil or psotka@26.1.0.50
FAX: 274-5461
Exploring the characteristics of virtual space is a psychologist's dream. The problems in understanding "immersion" or "presence" are fundamentally psychological, and the applied issues depend on good cognitive design and engineering. The accurate location of one's virtual egocenter in a geometric space was not really possible before the advent of the suite of new technologies we now call Virtual Reality (VR). Yet understanding the factors that affect the psychological situation of one's sense of self is of critical importance for immersion technologies. Exploring virtual space with a tracked helmet mounted display usually leads to headaches and many missteps. Explorers may have an error-prone time rounding corners or picking up objects because they cannot accurately locate those objects or themselves. This paper describes the implications of an experiment that has taken a unique step toward increasing our understanding of virtual space. The experiment investigated the role of field of view (FOV) and observer station points in the perception of the location of one's egocenter (the personal viewpoint) in virtual space. Fourteen subjects binocularly viewed an animated 3D model of the room in which they sat from different Eye Station Points (ESP). They saw four animations of the room rendered with four geometric field of view (FOVg) conditions of 18, 48, 86, and 140 degrees. They drew the apparent paths of the camera generating the images on a bitmap of the room as seen from infinity above. Large differences in the paths of the camera were seen as a function of both FOVg and ESP. The results fit well with predictions from an equation that multiplies the Geometric Eye Point (GEP) of the image by the ratio of the human eyefield horizontal FOV (roughly 180 degrees) to FOVg:
predicted station point = GEP x (180 / FOVg)
Ecologically, we have not been adequately prepared for pictures, let alone TV, video, and now Cyberspace. These technologies try to recreate reality for us, but inevitably they distort reality. In doing so, they create new wonders that stir us to invest our energies and explore their possibilities (Piantaneda, Boma, and Gille, 1993).
It is a wonder that pictures, TV and movies let us see ourselves immersed in their possibilities at all. Clearly, they are not as effective as live theater in the affordances they provide us to immerse ourselves and indulge in a "willing suspension of disbelief", but they are remarkably effective. A brief look at their limitations compared to theater, or even to a window on the world, makes it surprising that they are as effective as they are. A video has movement, and that generates important differences (such as kinetic depth effects and multiple views of the same object) that we will ignore for the moment. The primary, fundamental difference between a flat representation like a TV or picture and a three-dimensional representation like a scene viewed through an aperture such as a glass window or stage is that self-motion produces drastically different results. Looking at a picture, TV, or movie screen, head motion specifies what is actually there: a flat screen (Hochberg, 1986); whereas looking through a glass window onto the world outside, head motion specifies our real relationships with objects. The fact is, we hardly ever hold our heads still, so motion parallax should constantly be telling us that what we are looking at on TV or in a picture is fake. Yet we ignore this evidence with ease, although its effects are there to be seen: in fact, various effective proposals have been offered to make pictures more realistic by manipulating binocular views to give the two eyes the same view (e.g. Klein & Dultz, 1992).
Since modern TV and movies are filmed with superb optics, the images have an unequivocal, single projection point. Given that we have two eyes, one of them must be in the wrong place, yet we easily ignore this with minor loss of fidelity. In fact, most pictures can be viewed from widely eccentric locations with no appreciable apparent distortions. Psychological theories abound trying to explain this, with varying degrees of success (e.g. Cutting, 1991). Yet most of them ignore a salient and compelling difference. When you look out a window or other natural aperture, your sense of self remains rooted where you are, without any conflicts. But look at a picture, and immediately you must adopt an alter ego, a sense of location at a virtual point in space that is only accidentally in your real space and mainly exists in the virtual space of the picture or video. This new viewpoint on that space is a kind of immersion, and the new viewpoint can be said to be at the virtual egocenter.
Immersion might be defined as the degree of compatibility between the location of the sense of self in the real world and in the virtual world. If, when you examine a picture or representation, your sense of self or egocenter remains rooted to the same spot, your degree of immersion ought to be high. If you are forced to adopt a separate egocenter in the space of the representation, immersion should be reduced. Should immersion then reflect the distance in space that the egocenter moves? To some extent the answer might be yes, since for small distances the rivalry should hardly be noticed. For smaller distances, adaptation to the discrepancy should also take place quickly (cf. Rock, 1975; Welch, 1986) and remove the rivalry.
However, immersion also depends on the compatibility of different body spaces within an individual. The visual space has to be coordinated with the proprioceptive space of the eyes, head, and hands. This is not always coordinated in virtual environments, nor is it easy to engineer, since the psychological cues for these spaces remain complex and poorly understood.
Yet, there is more to immersion than this. For example, one can become completely immersed by donning a display that takes you to a completely different place, transporting you thousands of miles in an instant. Surely this distance in itself should have little effect on the quality of immersion. So total immersion is somehow different from partial immersion in a local representation like a picture or a movie. The visual coordination of real and virtual spaces becomes less important than the coordination of virtual and proprioceptive spaces.
The new technologies of virtual reality offer experimenters an opportunity to explore many factors that affect the perception and cognition of virtual spaces in ways that were either not possible before, or that were so difficult to control that they were virtually impossible. One important new psychological effect of these technologies is a re-emphasis of research perspectives on internal, cognitive constructs such as egocenter and virtual space, rather than external, objective constructs such as distance, size, and slant (Beer, 1992). Within cyberspace, the accurate location of one's sense of self assumes a new importance because it can be manipulated. You can be made to feel vividly that your visual center is not coordinated with your hands and head, and that you are at the wrong "distance" from objects. Perceived "distance" remains an important psychological variable, but perceived egocenter location assumes a new simplifying functionality for understanding effects within virtual space.
The accurate location of one's virtual egocenter in a geometric space is of critical importance for immersion. Furness (1992) and Howlett (1990) report that immersion is only experienced when the field of view (FOV) is greater than 60 degrees, or at least in the 60 to 90 degree FOV range. Why this should be so is not understood, nor are there theoretical frameworks for beginning to understand this phenomenon. The question is also important for dealing with simulation or motion sickness. Immersion environments are notorious for producing motion sickness, and an inaccurate location of virtual egocenters may be implicated in this noxious effect. Jex (1991) reports that simulator sickness is hardly ever felt with FOV less than 60 degrees (the complement of immersion FOV). Perhaps a key variable is the quality of immersion and the accuracy of self-localization. Informal comments by users of immersion environments have yielded many descriptions of surprising errors of self-localization. As a start, this research explores how egocenters are determined from perceptual arrays.
Some work exists that may help in understanding the psychology of egocenters (Howard, 1982; Ono, 1981). Kubovy (1986) provides an insightful description of the techniques Renaissance artists used to manipulate the location of virtual egocenters, and thus manipulate attitudes and emotions. He and many art critics point out that da Vinci's "Last Supper" is painted without the usual trompe l'oeil foreshortening even though the painting is well above most observers' eye level, some fifteen feet up in the air.
Figure 2. Da Vinci's Last Supper. The viewpoint is straight ahead even though the picture is actually viewed from fifteen feet below.
Figure 3. A demonstration of our ability to hold only one viewpoint at a time. A Necker Cube. Imagine looking "down" at it from above, then "up" at it from below. The appearance of the cube will transform, although it will also continue to alternate.
As the art critics point out, observers naturally think that they are looking at the picture straight on (see Figure 2), and this might give them an elevated feeling that could be interpreted metaphorically and spiritually. You can get something of this effect with much more mundane images: try controlling your viewpoint by looking "up" at the Necker cube in Figure 3, or "down" at it. You will see reversals or alternations of these viewpoints continue, but for a brief while the viewpoint should be in your control. It is a very clear demonstration that we can only hold one viewpoint at a time. A more complicated set of Necker-like rectangles can be arranged into a rising staircase. Because we usually only examine stairs from above, its appearance does not reverse nearly as much as that of the Necker cube itself. Experience dramatically affects our perceptions. The point can be made even more clearly with the "house" outline (Fig. 4). It is almost impossible to see it from its alternate viewpoint, looking up from below, because houses or house models are generally seen from ground level or from above, never from below.
Franklin, Tversky, and Coon (1992) have conducted a long series of experiments examining the cues that control placement of point of view in spatial mental models derived from textual descriptions. One of their solid replicated findings across many experiments is that readers will adopt only one, unique viewpoint for every described scene, adopting vantage points that can oversee the entire scene whenever possible. When there is no unique point that can encompass an entire scene, only then will readers adopt two viewpoints. Clearly, they hold those viewpoints sequentially, and not at the same time.
The question of whether we hold two viewpoints simultaneously when we examine a picture in a museum (one viewpoint of the picture and the other in the space of the picture) seems much more difficult and tricky to answer. It is not clear what kind of experiment would discern the difference between holding two viewpoints in rapid succession and holding them simultaneously. Reaction time differences between variables affected by the distance of the pictures and those affected by the distance of objects in the pictorial representation might reveal whether a consistent pair of viewpoints (one within the space of the pictorial representation and the other in the space of the room holding the picture) could be held simultaneously or whether the viewer switched back and forth between the two. But reaction time experiments have not been done in this mode, and might not be able to detect such small switching times.
Introspection on the question is fraught with complex biases and speculation; but without empirical data, some speculation is needed to begin the development of an appropriate theoretical framework.
His Confusions: His first and primary confusion was simply that he was surprised to be moved in the fish world by an invisible force, since there was no person in the VR who could move him. Apparently he had become so immersed that he had "forgotten" about the real world where he "really" stood in a room full of people. But even when the memories of the real world flooded back in, he still could not see it, so it remained less than real. He was not "immersed" in the real world but in the fish world. His second confusion seemed to be that he thought that if he were moved in the real world he would not be moved in the VR environment. This was wrong and created a conflict of expectations when he realized he was being moved. Since he was in fact moved in the VR environment, there was some confusion about how to reconcile this with his surroundings in the fish world, since there was no one there to move him. It seems that he had separated the two worlds completely and assumed that they could not interact. It appears that a "dual reality" state existed rather than a single reality that was liberally interconnected. We say "dual" reality because he still believed in the real world, but it may in fact be better to call this a single reality state, because he truly was only immersed in the fish world. The real world had disappeared. Since the two realities did not interact, the movement in the fish world was assumed to be caused by a strange *action at a distance*, an invisible force. Somehow he recognized that the violation of conservation of causality in the VR environment might be reconciled by realizing that the two worlds were in fact one, but he could not easily make this leap of faith and was confused by it. The difficulty was that although he "knew" he was in a lab room, he could not see it; all he could see was the fish world. His third confusion stemmed from his inability to share his internal state with the person who moved him.
He was frustrated by not being able to share this state with his mover in the real world: they could not share views even though the mover was standing right beside him! Although he "knew" his mover was standing right beside him, he could not "see" him. Even though he could see the "space" right beside him filled with water and fish, there was clearly no person there. Yet he knew the mover was there. More precisely, he knew the mover was there somewhere, but until the mover spoke he was not sure where the mover stood, because he had already become disoriented relative to the "real" world. This left him confused about what he could do to communicate with his mover. His fourth confusion was that he thought that by moving he would lose track of where he was in the real world. In fact, he had already moved himself around a great deal, but it seemed that he was only moving in the fish world, not the real world. He had already lost track of where he was in the real world, so this added motion was not going to make any difference. But somehow he was under the misapprehension that he had not moved in the real world and that when he took his helmet mounted display (HMD) off, he would find himself in the same place where he had been. Obviously this was a momentary confusion, and something illogical; but we point it out not to show how stupid he was, but to offer some insight into the default workings of our presence analyzer when it is preoccupied with synthetic reality tasks. His final confusion arose because the hands around his waist moving him also pulled him back into the real world. This was the real source of confusion, because he could literally see himself swapping all the memories of his real surroundings into the presence store, and quite visibly swapping out the vision of the fish and water in the VR. He began seeing in his mind's eye the computer monitors and people standing around him, and of course that meant not seeing the fish and water.
But at the same time he realized how useless his vision of all those real surroundings was. From their voices he could roughly tell where the people were, but some were silent, so it was not possible to locate them accurately in space. Besides, his point of view (POV) or frame of reference (FOR) was still the old one from when he first put the HMD on, and he realized he had no idea where he was relative to that initial POV. He had immersed himself almost immediately in the VR and lost all track of his motion. So there was nothing more to do than complain loudly that he didn't need any help and get on with chasing the fish. But of course, he now had to redefine his POV in the VR. That meant swapping out again all the old representations of the real world and scanning the VR again to find where he was and where the fish were. As he re-entered the fish world, he became much more proficient at catching the fish. In the beginning, he overreached and missed them by many inches and even feet. He adapted quickly to these errors and soon was able to reach with greater accuracy. However, it was not just the distance between him and objects that was distorted. He felt clearly in the "wrong place", especially when he turned steadily. Then he felt as if he was flying in a tight circle, rather than pivoting on the spot.
Some Speculations: The momentary distraction and confusion must have lasted several seconds. Its theoretical importance for us is its demonstration, in retrospect, that he could not hold these two realities intact separately. There seems to be an all-or-none store of our immediate surroundings. In order to retrieve the memories of one reality after dealing with the other, there appeared to be a real time delay and a difficult period of swapping memories from one store into another. This was a relatively slow process, bandwidth limited and possibly serial, driven by salient cues in each environment. For instance, the hands on his hips led him to seek to identify the owner, who conveniently spoke. That led to memories of his position, his earlier point of view, and the salient items (e.g. monitor, doorway, cables, wall) that came flooding in with it. Could it be that with practice we could become more adept at holding these two realities intact, synchronically? We think not. If anything, our experiences in VR are making it easier and easier to enter VR completely immersed, and easier to forget entirely our real world surroundings. On the other hand, it is easier to leave and return completely too. It seems that greater experience with VR results in greater serial intermingling of the two realities, moving quickly back and forth between them.
Another common analogy is immersion in the work of reading or writing. A book can transport us to another environment, and it can be something of a shock when a voice or touch recalls us to the present.
Nemire and Ellis (1991) added some evidence for this hypothesis by demonstrating that the enhanced structure of a pitched optic array does bias the perception of gravity-referenced eye level. This finding is a direct replication of Kubovy's arguments about egocenters and Renaissance artists, although on a much smaller scale.
A simple experiment was conducted to begin to examine these kinds of errors.
An accurate model of an office was constructed using 3D Studio on a 386 PC with VGA graphics. The model contained walls, floor, and ceiling, three tables with computers and displays, two bookshelves with empty shelves, and two wastebaskets in the room. It was rendered with Phong shading at 320 by 200 pixels with 256 colors, and looked like a reasonable cartoon of the actual office holding the equipment (see Figure 7).
Animations of this model were then created from a panning camera located at the geometric center of the room, rotating slowly through 360 degrees. Four animations were created with four different lenses for the scene: 17, 28, 50, and 135 mm. The geometric fields of view for these lenses were 140, 86, 48, and 18 degrees, respectively, where 140 degrees is similar to a fish-eye lens and 18 degrees is a telephoto view. The animations were viewed on a flat screen Zenith monitor whose screen dimensions were 190 by 245 mm. Subjects viewed the animations from two locations, 800 and 300 mm from the screen. At those sites the screen subtended a FOV of approximately 17 and 45 degrees, respectively. FOV is calculated as 2 * atan(0.5 * monitor width / eye point distance). Although their heads were not restrained mechanically, Ss held their positions reasonably well.
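The viewing-angle formula above can be checked numerically. The following is a minimal Python sketch, assuming that 245 mm is the horizontal width of the screen (the text does not say which of the two dimensions is horizontal); the function name subtended_fov_deg is ours and purely illustrative.

```python
import math

def subtended_fov_deg(screen_width_mm: float, eye_distance_mm: float) -> float:
    """Horizontal angle (degrees) subtended by a flat screen at the eye.

    Implements FOV = 2 * atan(0.5 * width / distance) from the text.
    """
    return math.degrees(2.0 * math.atan(0.5 * screen_width_mm / eye_distance_mm))

# Assumption: 245 mm is the horizontal width of the 190 by 245 mm screen.
SCREEN_WIDTH_MM = 245.0

for distance_mm in (800.0, 300.0):
    fov = subtended_fov_deg(SCREEN_WIDTH_MM, distance_mm)
    print(f"viewing distance {distance_mm:.0f} mm -> FOV of about {fov:.1f} degrees")
```

Under that assumption the two viewing sites come out near 17 and 44 degrees, in line with the approximate values of 17 and 45 degrees reported above.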
The geometric eye points of these lenses were 40, 140, 290, and 800 mm in the room, respectively. These projection points are independent of the viewer's location, but they depend on the actual size of the viewing screen. Thus the two viewing sites for the subjects corresponded approximately to the geometric eye points for the 135 and 50 mm lenses.
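One way to see why the geometric eye points depend on screen size but not on the viewer is to invert the viewing-angle relation: the eye point is the distance at which the screen subtends exactly FOVg. The sketch below is an illustration under assumptions, not the procedure stated in the text: it assumes the standard perspective relation GEP = 0.5 * width / tan(FOVg / 2) and a 245 mm horizontal screen width, and geometric_eye_point_mm is a hypothetical helper name.

```python
import math

def geometric_eye_point_mm(screen_width_mm: float, fovg_deg: float) -> float:
    """Distance from the screen at which it subtends exactly fovg_deg.

    Assumed relation (standard perspective geometry, not quoted from the paper):
    GEP = 0.5 * width / tan(FOVg / 2).
    """
    return 0.5 * screen_width_mm / math.tan(math.radians(fovg_deg) / 2.0)

# Assumption: 245 mm is the horizontal width of the screen.
SCREEN_WIDTH_MM = 245.0

# Lens focal length (mm) -> geometric field of view (degrees), as listed in the text.
LENS_FOVG_DEG = {17: 140.0, 28: 86.0, 50: 48.0, 135: 18.0}

for lens_mm, fovg in LENS_FOVG_DEG.items():
    gep = geometric_eye_point_mm(SCREEN_WIDTH_MM, fovg)
    print(f"{lens_mm:>3} mm lens, FOVg {fovg:>5.0f} deg -> GEP of about {gep:.0f} mm")
```

With these assumptions the computed distances are roughly 45, 131, 275, and 773 mm, close to but not identical with the values of 40, 140, 290, and 800 mm given above; the differences may reflect rounding or a slightly different screen measurement.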
Subjects were asked to view the animations binocularly, with corrected vision, and determine the location and path of the camera in each animation. The room was normally lit by recessed ceiling lights. They were told that the animation was of the very same room where they sat. They were shown a bitmap hardcopy of the room from an overhead view and asked to trace the path of the camera on it. (See Figure 8.) They were not specifically told that the geometric "camera" was mathematically or "theoretically" stationary in the animations.
Fourteen students and colleagues with a variety of psychological training served as experimental subjects without pay. Ten of these Ss were asked at the end of the experiment to select for each animation the viewing station that produced the least camera motion.
Frame Effects. Ss repeatedly remarked that they appeared to be using the frame of the monitor as the frame of reference of their retinal field. When asked to describe what was happening, they said they appeared to be contracting their field of attention to the frame of the monitor, and then treating that as if it were their entire 180 degree visual field. If they were in fact doing this at a processing level, then the geometric eye point of the animation would not be determined by the size of the monitor, but by the virtual size of their expanded attentional field, roughly 180 degrees. The geometric eye point would then be expanded by a similar ratio, yielding the enlarged path of the camera with smaller FOVg. In fact, if one proposed that the zero station point is determined by the product of the animation's geometric eye point (GEP) and the ratio 180/FOVg, one could calculate the predicted station points for zero camera motion.
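The proposed relation, predicted station point = GEP * (180 / FOVg), can be tabulated for the four lens conditions. This is a minimal sketch using the GEP and FOVg values reported earlier in the text; the function name predicted_station_point_mm and the constant EYEFIELD_FOV_DEG are illustrative, and the resulting numbers are arithmetic consequences of the formula, not measured results.

```python
EYEFIELD_FOV_DEG = 180.0  # approximate horizontal extent of the human eyefield, per the text

def predicted_station_point_mm(gep_mm: float, fovg_deg: float) -> float:
    """Viewing distance predicted to yield zero apparent camera motion."""
    return gep_mm * (EYEFIELD_FOV_DEG / fovg_deg)

# (FOVg in degrees, reported GEP in mm) for the 17, 28, 50, and 135 mm lenses.
CONDITIONS = [(140.0, 40.0), (86.0, 140.0), (48.0, 290.0), (18.0, 800.0)]

for fovg, gep in CONDITIONS:
    station = predicted_station_point_mm(gep, fovg)
    print(f"FOVg {fovg:>5.0f} deg, GEP {gep:>5.0f} mm -> predicted station point {station:.0f} mm")
```

The ratio 180/FOVg grows as FOVg shrinks, so the predicted zero-motion distance moves far behind the geometric eye point for the telephoto condition (8000 mm for the 18 degree lens) and stays close to it for the wide-angle condition (about 51 mm for the 140 degree lens).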
This is very reminiscent of the proportional frame effects found throughout the perceptual literature (Rock, 1975). In these situations (see Figure 9) objects are shown in reduced vision situations, often monocularly in the dark with only the objects visible hanging in space. In these situations objects in smaller frames are judged to be proportionately larger than objects in larger frames. In fact, there is a powerful tendency to base size judgements on a compromise between the absolute or physical size of an object and its proportional size in the frame.
The question naturally and frequently arises whether or not this effect is one created by years of television and movie watching. Is it produced by the many powerful movie techniques of pans, cuts, and dollies (Hochberg, 1986)? This seems unlikely since the perceptual frame effects precede TV by decades. It seems much more likely that our ability to transform the visual image in these metaphoric ways relies on our awesome powers of image manipulation, so well described by Kosslyn (1991). His evidence strongly suggests that we can zoom and rotate or otherwise manipulate our images very quickly.
Size constancy effects may in fact be related to the egocenter effects found in this paper. A brief review of the literature (Hochberg, 1978; Yonas and Hagen, 1973) indicates no general awareness of the possible effects of FOV size or FOVg size on the perception of distance to objects or object size constancy. The suggestion made above, that frame effects on size judgments may be mediated by FOV effects and the eyefield metaphor, appears to have no historic precedent in the literature. This appears to be a promising avenue of research. In fact, there is very little research on the nature of virtual space as perceived from geometrically created views of everyday scenes.
Clearly much work remains to be done if we wish to specify exactly how people interpret constructed geometric displays to select their egocentric viewing spot. Yet this work is very necessary if we wish to be able to create three-dimensional models that have the power to generate a truly satisfying and natural immersion experience.
For psychological theory, this research opens the possibility of dealing quantitatively with very abstract constructs, like virtual egocenters, in ways that were either impossible or very difficult without the new VR technologies. Clearly parametric studies need to be carried out in detail to create a nomograph of functions relating egocenter to FOVg and viewing station points. This pilot work suggests that even very close viewing station points, such as those with head mounted displays (HMDs), are not immune to illusions caused by FOVs that are smaller than 180 degrees. Their possible implication in more severe phenomena like simulator sickness, or less severe discomfort and dislike of HMDs, is only one further direction that needs exploration. It is clear, for instance, that these sorts of egocenter illusions adapt out very quickly in a VR environment. However, after adaptation is more or less complete, are there still physiological conflicts that can be detected in response to the conflicting cues of linear perspective and reduced FOV? Are there aftereffects that carry over on return to the real visual world?
Other, broader theoretical issues that need exploration are the higher order cognitive implications of these new relations between multiple realities. When we view the animation apparently rotating on the monitor, somehow we build up a model of the room. That model is also somehow projected into the same space as the real room that we occupy. While viewing the animation, we have both an egocenter in real space and a virtual egocenter in the space of the animation. It appears from these experiments that those egocenters interact with each other so that we feel some conflict as we rotate and move in one and remain stationary in the other. What are the long term effects of this conflict? What are the memory implications for conflicts between one reality and another? What are the physiological processing correlates of immersion? These are only some of the interesting psychological questions that need a firm base of experimental data on which to rest the initial creation of exploratory theoretical frameworks.
Beer, J. M. A. (1993) Perceiving scene layout through an aperture during visually simulated self-motion. Journal of Experimental Psychology: Human Perception and Performance, In Press.
Cutting, J. E. (1991) On the efficacy of cinema, or what the visual system did not evolve to do. In Ellis, S. R. (Ed.), Pictorial Communication in Virtual and Real Environments. London: Taylor and Francis, 486-497.
Ellis, S. R. (Ed.), (1991) Pictorial Communication in Virtual and Real Environments. London: Taylor and Francis.
Franklin, N., Tversky, B., and Coon, V. (1992) Switching points of view in spatial mental models. Memory & Cognition, 20(5), 507 - 518.
Furness, T. (1992) Personal communication.
Hochberg, J. E. (1978) Perception. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Hochberg, J. (1986) Representation of motion and space in video and cinematic displays. In K. J. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance (Vol. 1, 22:1 - 22:64). New York: Wiley.
Howard, I. P. (1982) Human Visual Orientation. New York: Wiley.
Howlett, E. M. (1990) Wide angle orthostereo. In Merritt, J. O. and Fisher, S. S. (Eds.) Stereoscopic displays and Applications. Bellingham, WA: The International Society for Optical Engineering.
Intraub, H. and Richardson, M. (1989) Wide-angle memories of close-up scenes. Journal of Experimental Psychology: Learning, Memory and Cognition, 15(2), 179-187.
Jex, H. R. (1991) Some criteria for teleoperators and virtual environments from experiences with vehicle/operator simulation. In Durlach, N. I., Sheridan, T. B., and Ellis, S. R. Human Machine Interfaces for Teleoperators and Virtual Environments. Moffett Field, CA: NASA Conference Publication 10071.
Klein, S. and Dultz, W. (1992) Monocular depth cues in conflict with the stereoscopic parallax on the television screen. In J. O. Merritt and S. S. Fisher (Eds.) Stereoscopic Displays and Applications III, 142 - 145. Bellingham: SPIE-The International Society for Optical Engineering.
Kosslyn, S. M. (1991) A cognitive neuroscience of visual cognition: Further developments. In R. H. Logie and M. Denis (Eds.) Mental Images in Human Cognition, 351 - 381. New York, North-Holland.
Kubovy, M. (1986) The psychology of perspective and Renaissance art. Cambridge: Cambridge University Press.
McGreevy, M. W. and Ellis, S. R. (1986) The effect of perspective geometry on judged direction in spatial information instruments. Human Factors, 28, 439 - 456.
Neisser, U. A sense of where you are: functions of the spatial module. In P. Ellen and C. Thinus-Blanc (Eds.), Cognitive Processes and spatial orientation in animal and man. Volume II. Neurophysiology and developmental aspects. 293-310. Boston: Martinus Nijhoff.
Nemire, and Ellis, S. R. (1991) Optic bias of perceived eye level depends on structure of the pitched optic array. Presented at the Psychonomic Society, San Francisco, CA.
Ono, H. (1981) On Wells' (1792) law of visual direction. Perception and Psychophysics, 30, 403-406.
Piantaneda, T., Boma, D., and Gille, J. (1993) Human perceptual issues and virtual reality. Virtual Reality Systems, 1(1), 43 - 52.
Rock, I. (1975) An introduction to perception. New York: MacMillan.
Welch, R. B. (1986) Adaptation of space perception. In K. J. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance (Vol. 1, 24:1 - 24:45). New York: Wiley.
Yonas, A. and Hagen, M. (1973) Effects of static and kinetic depth information on the perception of size in children and adults. Journal of Experimental Child Psychology, 15, 254-265.
FOOTNOTES
{1} The opinions in this paper do not necessarily imply or express the view of the U.S. Army Research Institute (USARI). This research was funded by the USARI Basic Research Office. We thank Don King, Sandy Ressler, Marc Sebrechts and Bob Seidel for many stimulating discussions.