From VR Systems Fall, 1993, 1(2), pp 70-82

Exploring Immersion in Virtual Space

Joseph Psotka, USARI

Sharon A. Davison, Catholic U.

and

Sonya A. Lewis, Howard U.

Joseph Psotka, Ph.D.

U. S. Army Research Institute

ATTN: PERI-IIC

5001 Eisenhower Avenue

Alexandria, VA 22333-5600

(703)274-5540/5545/5569

Psotka@alexandria-emh2.army.mil or psotka@26.1.0.50

FAX: 274-5461

ABSTRACT

Exploring the characteristics of virtual space is a psychologist's dream. The problems in understanding "immersion" or "presence" are fundamentally psychological and the applied issues depend on good cognitive design and engineering. The accurate location of one's virtual egocenter in a geometric space has really not been possible before the advent of the suite of new technologies we now call Virtual Reality (VR). Yet, understanding the factors that affect the psychological situation of one's sense of self is of critical importance for immersion technologies. Exploring virtual space with a tracked helmet mounted display usually leads to headaches and many missteps. Explorers may have an error-prone time rounding corners or picking up objects because they cannot accurately locate those objects or themselves. The implications of an experiment are described that has taken a unique step forward in increasing our understanding of virtual space. This experiment was conducted to investigate the role of field of view (FOV) and observer station points in the perception of the location of one's egocenter (the personal viewpoint) in virtual space. Fourteen subjects viewed an animated 3D model of the room in which they sat, binocularly, from different Eye Station Points (ESP). They saw four models of the room designed with four geometric field of view (FOVg) conditions of 18, 48, 86, and 140 degrees. They drew the apparent paths of the camera generating the images in the room on a bitmap of the room as seen from infinity above. Large differences in the paths of the camera were seen as a function of both FOVg and ESP. The results fit well with predictions from an equation that takes the ratio of human eyefield horizontal FOV (roughly 180 degrees) to FOVg times the Geometric Eye Point (GEP) of the image:

Zero Station Point = (180/FOVg)*GEP

The most striking phenomenon in these results occurred with the close-up images of 18 degrees. When the observers saw the model of the TV screen panning past on the screen they were viewing with this lens, the geometry of the model should have made them appear to see it at the ESP where they sat. Instead, their virtual egocenter was a much shorter distance from the virtual monitor, even though the absolute size and FOV of the two monitors (virtual and real) were roughly the same. The implications of this are discussed in terms of an "eyefield constancy" and rivalrous cues. Rivalrous cues to the accurate location of one's egocenter may be one factor involved in simulator sickness. Implications of the findings are used to understand Renaissance techniques and to suggest extensions to cyberspace technologies.

INTRODUCTION

In a car you're always in a compartment, and because you're used to it you don't realize that through the car window everything you see is just more TV. You're a passive observer and it is all moving by you boringly in a frame. On a cycle the frame is gone. You're completely in contact with it all. You're in the scene, not just watching it anymore, and the sense of presence is overwhelming. ---

Robert Pirsig in Zen and the Art of Motorcycle Maintenance.

Ecologically, we have not been adequately prepared for pictures, let alone TV, video, and now Cyberspace. These technologies try to recreate reality for us, but inevitably they distort reality. In the doing, they create new wonders that stir us to invest our energies and explore their possibilities (Piantaneda, Boma, and Gille, 1993).

It is a wonder that pictures, TV and movies let us see ourselves immersed in their possibilities at all. Clearly, they are not as effective as live theater in the affordances they provide us to immerse ourselves and indulge in a "willing suspension of disbelief", but they are remarkably effective. A brief look at their limitations compared to theater, or even to a window on the world, makes it surprising that they are as effective as they are. A video has movement, and that generates important differences (such as kinetic depth effects and multiple views of the same object) that we will ignore for the moment. The primary, fundamental difference between a flat representation like a TV or picture and a three dimensional representation like a scene through an aperture such as a glass window or stage is that self-motion produces drastically different results. Looking at a picture, TV, or movie screen, head motion specifies what is actually there: a flat screen (Hochberg, 1986); whereas looking through a glass window onto the world outside, head motion specifies our real relationships with objects. The fact is, we hardly ever hold our heads still, so motion parallax should constantly be telling us that what we are looking at on TV or in a picture is fake. Yet, we ignore this evidence with ease, although its effects are there to be seen: in fact, various effective proposals have been offered to make pictures more realistic by manipulating binocular views to give the two eyes the same view (e.g., Klein & Dultz, 1992).

Figure 1. The ambiguity of objects represented in flat screens.

Since modern TV and movies are filmed with superb optics, the images have an unequivocal, single projection point. Given that we have two eyes, one of them must be in the wrong place, yet we easily ignore this with minor loss of fidelity. In fact, most pictures can be viewed from widely eccentric locations with no appreciable apparent distortions. Psychological theories abound trying to explain this, with varying degrees of success (e.g. Cutting, 1991). Yet, most of them ignore a salient and compelling difference. When you look out a window or other natural aperture, your sense of self remains rooted where you are, without any conflicts. But, look at a picture, and immediately you must adopt an alter ego, a sense of location at a virtual point in space that really is only accidentally in your real space, but mainly exists in the virtual space of the picture or video. This new viewpoint on that space is a kind of immersion, and the new viewpoint can be said to be at the virtual egocenter.

Immersion: Partial and Total

Immersion might be defined as the degree of compatibility between the location of the sense of self in the real world and in the virtual world. If, when you examine a picture or representation, your sense of self or egocenter remains rooted to the same spot, your degree of immersion ought to be high. If you are forced to adopt a separate egocenter in the space of the representation, immersion should be reduced. Should immersion then reflect the distance in space that the egocenter moves? To some extent the answer might be yes, since for small distances the rivalry should hardly be noticed. Also, for smaller distances, adaptation to the discrepancy ought to take place quickly (cf. Rock, 1975; Welch, 1986) and remove the rivalry.

However, immersion also depends on the compatibility of different body spaces within an individual. The visual space has to be coordinated with the proprioceptive space of the eyes, head, and hands. This is not always coordinated in virtual environments, nor is it easy to engineer, since the psychological cues for these spaces remain complex and poorly understood.

Yet, there is more to immersion than this. For example, one can become completely immersed by donning a display that takes you to a completely different place, transporting you thousands of miles in an instant. Surely this distance in itself should have little effect on the quality of immersion. So total immersion is somehow different from partial immersion in a local representation like a picture or a movie. The visual coordination of real and virtual spaces becomes less important than the coordination of virtual and proprioceptive spaces.

The new technologies of virtual reality offer experimenters an opportunity to explore many factors that affect the perception and cognition of virtual spaces in ways that were either not possible before, or that were so difficult to control that they were virtually impossible. One important new psychological effect of these technologies is a re-emphasis of research perspectives on internal, cognitive constructs such as egocenter and virtual space, rather than external, objective constructs such as distance, size, and slant (Beer, 1992). Within cyberspace, the accurate location of one's sense of self assumes an importance because it can be manipulated. You can be made to feel vividly that your visual center is not coordinated with your hands and head, as well as being the wrong "distance" from objects. Perceived "distance" remains an important psychological variable, but perceived egocenter location assumes a new simplifying functionality for understanding effects within virtual space.

Immersion in Cyberspace

The accurate location of one's virtual egocenter in a geometric space is of critical importance for immersion. Furness (1992) and Howlett (1990) report that immersion is only experienced when the field of view (FOV) is greater than 60 degrees, or at least in the 60 to 90 degree FOV range. Why this should be so is not understood, nor are there theoretical frameworks for beginning to understand this phenomenon. The question is also important for dealing with simulation or motion sickness. Immersion environments are notorious for producing motion sickness, and an inaccurate location of virtual egocenters may be implicated in this noxious effect. Jex (1991) reports that simulator sickness is hardly ever felt with FOV less than 60 degrees (the complement of immersion FOV). Perhaps a key variable is the quality of immersion and the accuracy of self-localization. Informal comments by users of immersion environments have yielded many descriptions of surprising errors of self-localization. As a start, this research begins to explore how egocenters are determined from perceptual arrays.

Some work exists that may be helpful to understand the psychology of egocenters (Howard, 1982; Ono, 1981). Kubovy (1986) provides an insightful description of the use of techniques by Renaissance artists to manipulate the location of virtual egocenters, and thus manipulate attitudes and emotions. He and many art critics point out that Da Vinci's "Last Supper" is painted without the usual trompe l'oeil foreshortening even though the painting is well above most observers' eye level, some fifteen feet up in the air, in fact.

Figure 2. Da Vinci's Last Supper. The viewpoint is straight ahead even though the picture is actually viewed from fifteen feet below.

Figure 3. A demonstration of our ability to hold only one viewpoint at a time. A Necker Cube. Imagine looking "down" at it from above, then "up" at it from below. The appearance of the cube will transform, although it will also continue to alternate.

Figure 4. A demonstration of the ecological validity of our perceptions, and the role of experience in determining point of view.

A Necker Cube with a "roof". Imagine looking "down" at it from above, then "up" at it from below. It is much more difficult to see it from below, since we generally see houses or house models from above or from straight on; never from below ground.

As the art critics point out, observers naturally think that they are looking at the picture straight on (see Figure 2), and this might give them an elevated feeling that could be interpreted metaphorically and spiritually. You can get something of this effect with much more mundane images: Try controlling your viewpoint by looking "up" at the Necker cube in Figure 3, or "down" at it. You will see reversals or alternations of these viewpoints continue, but for a brief while, the viewpoint should be in your control. It is a very clear demonstration that we can only hold one viewpoint at a time. A more complicated arrangement of Necker-like rectangles can be arranged into a rising staircase. Because we usually only examine stairs from above, its appearance does not reverse nearly as much as the Necker cube itself. Experience dramatically affects our perceptions. The point can be made even more clearly with the "house" outline (Fig. 4). It is almost impossible to see it from its alternate viewpoint, looking up from below, because houses or house models are generally seen from ground level or from above, never from below.

Viewpoint Constancy:

Kubovy (1986) provides an excellent example of something I would like to call viewpoint constancy. In the image of a horse and rider on a pedestal painted by Uccello (see Figure 5) the artist has ingeniously combined at least two different viewpoints. The viewpoint of the pedestal is clearly seen from below, while the horse and rider are seen from the front, about saddle height. Most observers do not notice this duality and generally judge the viewpoint to be somewhere in between. Similar effects can be created by distorting the images of CAD models so that there are multiple viewpoints. In general, only one viewpoint is seen.

Figure 5. Uccello's portrait of Sir John Hawkwood.

The viewpoint is one even though the picture actually has at least two distinct projection points of view.

Franklin, Tversky, and Coon (1992) have conducted a long series of experiments examining the cues that control placement of point of view in spatial mental models derived from textual descriptions. One of their solid replicated findings across many experiments is that readers will adopt only one, unique viewpoint for every described scene, adopting vantage points that can oversee the entire scene whenever possible. When there is no unique point that can encompass an entire scene, only then will readers adopt two viewpoints. Clearly, they hold those viewpoints sequentially, and not at the same time.

The question of whether we hold two viewpoints simultaneously when we examine a picture in a museum (one viewpoint of the picture and the other in the space of the picture) seems much more difficult and tricky to answer. It is not clear what kind of experiment would discern the difference between holding two viewpoints in rapid succession and holding them simultaneously. Reaction time differences between variables affected by the distance of the pictures, versus the distance of objects in the pictorial representation, might be able to pick up whether a consistent pair of viewpoints (one within the space of the pictorial representation and the other in the space of the room holding the picture) could be held simultaneously or whether the viewer switched back and forth between the two. But reaction time experiments have not been done in this mode, and might not be able to detect such small switching times.

Introspection on the question is fraught with complex biases and speculation; but without empirical data, some speculation is needed to begin the development of an appropriate theoretical framework.

A Story:

Recently, while one of us (JP) was chasing fish in a fish world synthetic environment (and just beginning to catch them) someone in the "real" world pushed him gently out of the way of some (to him) invisible obstacle in the real world. This set off a state of confusion that was particularly difficult for him to understand and explain. The explanation seems to demonstrate the fundamental separation among different immersed worlds, and our inability to hold more than one point of view at a time.

His Confusions: His first and primary confusion was simply that he was surprised to be moved in the fish world by an invisible force, since there was no person in the VR who could move him. Apparently he had become so immersed that he had "forgotten" about the real world where he "really" stood in a room full of people. But even when the memories of the real world flooded back in, he still could not see it, so it remained less than real. He was not "immersed" in the real world but in the fish world. His secondary confusion seemed to be that he thought that if he were moved in the real world he would not be moved in the VR environment. This was wrong and created a conflict of expectations when he realized he was being moved. Since he was in fact moved in the VR environment, there was some confusion about how to reconcile this with his surroundings in the fish world, since there was no one there to move him. It seems that he had separated the two worlds completely and assumed that they could not interact. It appears that a "dual reality" state existed rather than a single reality that was liberally interconnected. We say "dual" reality because he still believed in the real world, but it may in fact be better to call this a single reality state, because he truly was only immersed in the fish world. The real world had disappeared. Since the two realities did not interact, the movement in the fish world was assumed to be caused by a strange *action at a distance*, an invisible force. Somehow he recognized the violation of conservation of causality in the VR environment might be reconciled by realizing that the two worlds were in fact one, but he could not easily make this leap of faith and was confused by it. The difficulty was that although he "knew" he was in a lab room, he could not see it; all he could see was the fish world. His third confusion stemmed from his inability to share his internal state with the person who moved him. 
He was frustrated by not being able to share this state with his mover in the real world because they could not share views even though he was standing right beside him! Although he "knew" his mover was standing right beside him, he could not "see" him. Even though he could see the "space" right beside him filled with water and fish, there was clearly no person there. Yet, he knew he was there. Actually he knew he was there somewhere, but until he spoke he was not sure of where he stood, because he had already become disoriented relative to the "real" world. This left him confused about what he could do to communicate with his mover. His fourth confusion was that he thought that by moving, he would lose track of where he was in the real world. In fact, he had already moved himself around a great deal, but it seemed that he was only moving in the fish world, not the real world. He had already lost track of where he was in the real world, so this added motion was not going to make any difference. But somehow he was under the misapprehension that he had not moved in the real world and that when he took his helmet mounted display (HMD) off, he would find himself in the same place where he had been. Obviously this was a momentary confusion, and something illogical; but we point it out not to show how stupid he is, but to present something of an insight into the default workings of our presence analyzer when it is preoccupied with synthetic reality tasks. His final confusion arose because the hands around his waist moving him also pulled him back into that reality. This was the real source of confusion, because literally he could see himself swapping all the memories of his real surroundings into the presence store, and quite visibly swapping out the vision of the fish and water in the VR. He began seeing in his mind's eye the computer monitors and people standing around him, and of course that meant not seeing the fish and water.
But at the same time he realized how useless his vision of all those real surroundings was. From their voices he could roughly tell where the people were but some were silent and so it was not possible to locate them accurately in space. Besides, his point of view (POV) or frame of reference (FOR) was still the old one from when he first put the HMD on, and he realized he had no idea where he was relative to that initial POV. He had immersed himself almost immediately in the VR and lost all track of his motion. So there was nothing more to do than complain loudly that he didn't need any help and get on with chasing the fish. But of course, he now had to redefine his POV in the VR. That meant swapping out again all the old representations of the real world and scanning the VR again to find where he was and where the fish were. As he re-entered the fish world, he became much more proficient at catching the fish. In the beginning, he overreached and missed them by many inches and even feet. He adapted quickly to these errors and soon was able to reach with greater accuracy. However, it was not just the distance between him and objects that was distorted. He felt clearly in the "wrong place", especially when he turned steadily. Then he felt as if he was flying in a tight circle, rather than pivoting on the spot.

Some Speculations

The momentary distraction and confusion must have lasted several seconds. Its theoretical importance for us is its demonstration in retrospect that he could not hold these two realities intact separately. There seems to be an all or none store of our immediate surroundings. In order to retrieve the memories of one reality after dealing with the other there appeared to be a real time delay and a difficult period of swapping memories from one memory into another. This was a relatively slow process, bandwidth limited and possibly serial, driven by salient cues in each environment. For instance, the hands on his hips led him to seek to identify the owner, who conveniently spoke. That led to memories of his position, his earlier point of view and the salient items -- e.g. monitor, doorway, cables, wall -- that came flooding in with it. Could it be that with practice we could become more adept at holding these two realities intact, synchronically? We think not. If anything, our experiences in VR are making it easier and easier to enter VR completely immersed, and easier to forget entirely our real world surroundings. On the other hand, it is easier to leave and return completely too. It seems that greater experience with VR results in greater serial intermingling of the two realities, moving quickly back and forth between them.

Synthetic Environments and Pictures on a Wall

In many respects these speculations about two realities are analogous to many other experiences we have with split attention. Many of us have experienced driving routinely home with no recollection of the drive because we were deep in thought or conversation with others. The most straightforward analogy is with pictures on a wall. When we look at a picture, it is often hard to see the picture as a flat object, a piece of paper hanging on the wall. Instead its contents are seen as objects with form, depth, and real space that have a virtual reality of their own. It is particularly difficult to see the picture at the same time as a flat object and as a representation in depth. We see either one or the other, and it is much easier to see the representation in depth. Look again at the Necker cube house in Figure 4 and see how much easier it is to see it as a representation in depth rather than as flat lines, rectangles, and triangles on paper. Notice how much less attention you pay to the surrounding text when the drawing is seen in depth rather than as flat lines. Constructing the virtual space of the picture, when that does not coincide with the geometry of the surrounding space, is an effortful enterprise that seems to diminish our attention to the surrounding wall space. In a trompe l'oeil representation, where the geometries of the real and virtual spaces coincide, it is even more difficult to see the representation as a flat object, and the virtual space and real space truly merge into one.

Another common analogy is immersion in the work of reading or writing. A book can transport us to another environment, and it can be something of a shock when a voice or touch recalls us to the present.

A Presence Store: The Memory of Presence

The intriguing possibility this switching among realities suggests is that there is a special memory mechanism or store that deals specifically with our surrounding "presence". It is filled up as we scan and touch and listen to our surroundings. When we enter another VR it gets overlaid with the sensory experiences of that VR. If the VR is brand new, the overlay of the "Presence Store" is perhaps piecemeal, overlaying parts of the earlier reality as it is discovered. If the new VR is one that has been experienced before, it appears to be pulled in from long term memory in a default mode. There is an intriguing relationship and similarity between this "Presence Store" and what psychologists have traditionally called short term memory. Could the two be one? It seems to me that psychologists will have a great deal of fun over the next few years teasing apart the similarities and differences among these possibilities.

Errors of Localization:

Movement in virtual environments often displays many errors that gradually adapt out. In the beginning users overreach and miss objects by many inches and even feet. On reflection, there seem to be many possible sources for these errors: inaccurate settings for interocular distance; lack of convergence and accommodation cues; lack of good texture gradients for depth; an improperly designed model; and many more. Although these errors seem easy to correct and unlikely in many situations, they are easily overlooked and remain a factor in many VR applications. The remaining issue that is not easy to change with most displays, and that seemed to need further investigation, was the effect of reduced field of view: 90 degrees versus a normal eyefield of about 180 degrees FOV.

Errors of Immersion:

A series of experiments by Ellis (McGreevy and Ellis, 1986; Tharp and Ellis, 1990; Nemire and Ellis, 1991) probably indirectly reflects on virtual egocenters. McGreevy and Ellis (1986) discovered a systematic error in pointing the direction of objects in a virtual display. The error was a function of the geometric FOV of the display. They developed a complex model that accurately predicted these errors on the basis of memory for the size and shape of objects and geometric "distortion" based on linear projections. Tharp and Ellis (1990) provided an explanation based on errors of estimation of the pitch and yaw of the viewing direction used to produce the perspective projection. They argued that people have acquired, through experience of observing the world, a way of determining the effects of viewpoint rotations and perspective transformations. People use this experience to build a "table" of perspective transformations relating target azimuth to projected angle. They then use the wrong table. This is a little like saying that people project themselves at the wrong point, and so it may be possible to find an effect on the location of virtual egocenters in these conditions. The regular shape of the error (see Figure 6) could be produced by an altered location of the virtual egocenter in the display such that, in their experiment, observers felt themselves closer to the objects than the geometry of the scene should have made them feel they were. In other experiments to follow these up, Ellis and his associates found the direction of these errors systematically reversed. The cues that produce these effects are unknown but may have something to do with the relationship of the actual FOV of the display and the computed geometric FOV of the display image (FOVg).
When the ratio of FOV/FOVg is greater than 1, the observers may have located the virtual egocenter too near to the objects; and when the ratio of FOV/FOVg is less than 1, the observers may have located the virtual egocenter too far from the objects. It is not clear from their data which case held, but these relationships appear to be appropriate for their results.

Figure 6. A schematic diagram showing the pattern of errors found by Ellis.

Nemire and Ellis (1991) added some evidence for this hypothesis by demonstrating that the enhanced structure of a pitched optic array does bias the perception of gravity-referenced eye level. This finding is a direct replication of Kubovy's arguments about egocenters and Renaissance artists, although on a much smaller scale.

A simple experiment was conducted to begin to examine these kinds of errors.

An accurate model of an office was constructed using 3D Studio on a 386 PC with VGA graphics. The model contained walls, floor, and ceiling, three tables with computers and displays, two bookshelves with empty shelves, and two wastebaskets in the room. It was rendered with Phong shading at 320 by 200 pixels with 256 colors, and looked like a reasonable cartoon of the actual office holding the equipment (see Figure 7).

Figure 7.

A black and white photocopy of the color screenprint of a 135 mm lens view of the experimental room.

Animations of this model were then created from a panning camera located at the geometric center of the room rotating slowly 360 degrees around the room. Four animations were created with four different lenses for the scene: 17, 28, 50, and 135 mm. The geometric field of view for each of these lenses was 140, 86, 48, and 18 degrees, respectively, where 140 degrees is similar to a fish-eye lens and 18 degrees is a telephoto view. The animations were viewed on a flat screen Zenith monitor whose screen dimensions were 190 by 245 mm. Subjects viewed the animations from two locations, 800 and 300 mm from the screen. At those sites the screen subtended a FOV of approximately 17 and 45 degrees, respectively. FOV is calculated as 2 times atan(0.5 times the width of the monitor divided by the distance of the eye point). Although their heads were not restrained mechanically, Ss held their positions reasonably well.
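The visual angle relation stated above can be checked numerically. The following is a minimal Python sketch, not part of the original study: the function name subtended_fov_deg is hypothetical, and the sketch assumes that the 245 mm dimension is the horizontal width of the screen and that the eye is centered on it.

```python
import math


def subtended_fov_deg(screen_width_mm: float, eye_distance_mm: float) -> float:
    """Visual angle of a flat screen: 2 * atan(0.5 * width / distance), in degrees."""
    return math.degrees(2.0 * math.atan(0.5 * screen_width_mm / eye_distance_mm))


# Assumption: 245 mm is the horizontal dimension of the 190 by 245 mm screen.
SCREEN_WIDTH_MM = 245.0

# The two viewing distances reported in the text.
for distance_mm in (800.0, 300.0):
    fov = subtended_fov_deg(SCREEN_WIDTH_MM, distance_mm)
    print(f"viewing distance {distance_mm:.0f} mm -> FOV {fov:.1f} degrees")
```

Under these assumptions the sketch gives roughly 17 and 44 degrees for the two viewing sites, in line with the approximate values quoted in the text.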

The geometric eye point of each of these lenses was 40, 140, 290, and 800 mm in the room. These projection points are independent of the viewer's location. They are dependent on the actual size of the viewing screen. Thus the two viewing sites for the subjects corresponded approximately to the geometric eye points for the lenses of 135 and 50 mm.
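The dependence of the geometric eye point on screen size can be illustrated by inverting the visual angle relation, that is, by asking at what distance a screen of a given width subtends exactly FOVg. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes simple pinhole geometry and a 245 mm horizontal screen width, and the function name geometric_eye_point_mm is hypothetical. The values reported in the text are rounded and may have been derived differently.

```python
import math


def geometric_eye_point_mm(screen_width_mm: float, fov_g_deg: float) -> float:
    """Distance from which a screen of the given width subtends fov_g_deg."""
    return 0.5 * screen_width_mm / math.tan(math.radians(fov_g_deg) / 2.0)


# Assumption: 245 mm is the horizontal dimension of the screen.
SCREEN_WIDTH_MM = 245.0

# Lens focal length (mm) -> geometric field of view (degrees), as given in the text.
LENSES = {17: 140.0, 28: 86.0, 50: 48.0, 135: 18.0}

for focal_mm, fov_g in LENSES.items():
    gep = geometric_eye_point_mm(SCREEN_WIDTH_MM, fov_g)
    print(f"{focal_mm} mm lens, FOVg {fov_g:.0f} degrees -> eye point about {gep:.0f} mm")
```

The computed distances are of the same order as the reported 40, 140, 290, and 800 mm. They grow as FOVg shrinks and scale directly with the assumed screen width.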

PROCEDURE

Subjects were asked to view the animations binocularly, with corrected vision, and determine the location and path of the camera in each animation. The room was normally lit by recessed ceiling lights. They were told that the animation was of the very same room where they sat. They were shown a bitmap hardcopy of the room from an overhead view and asked to trace the path of the camera on it. (See Figure 8.) They were not specifically told that the geometric "camera" was mathematically or "theoretically" stationary in the animations.

Figure 8

An overhead view of the experimental room. Subjects traced the camera path as shown with representative traces derived from the four views of the experiment.

Fourteen students and colleagues with a variety of psychological training served as experimental subjects without pay. Ten of these Ss were asked at the end of the experiment to select for each animation the viewing station that produced the least camera motion.

RESULTS

In general, the subjects had no difficulty describing the apparent paths of the camera, which they saw as oval paths of varying eccentricity centered on the geometric center of the room. The diameters of the ovals varied with the focal length of the lens.

Frame Effects. Ss repeatedly remarked that they appeared to be using the frame of the monitor as the frame of reference of their retinal field. When asked to describe what was happening, they said they appeared to be contracting their field of attention to the frame of the monitor, and then treating that as if it were their entire 180 degree visual field. If they were in fact doing this at a processing level, then the geometric eye point of the animation would not be determined by the size of the monitor, but by the virtual size of their expanded attentional field, roughly 180 degrees. The geometric eye point would then be expanded by a similar ratio, yielding the enlarged path of the camera with smaller FOVg. In fact, if one proposed that the zero station point is determined by the product of the animation's geometric eye point (GEP) times the ratio of 180/FOVg, one could calculate the predicted station points for zero camera motion.

Zero Station Point = (180/FOVg)*GEP

For this experiment these predictions are 8000, 1100, 287, and 50 mm, quite close to the empirical values of 9112, 1092, 291, and 53 mm. This seems to indicate that when the FOVg is 180 degrees, the egocenter is located correctly, but when the FOVg is less than 180 degrees, the egocenter is displaced proportionately.
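The arithmetic behind this comparison can be sketched in a few lines of Python. The sketch is illustrative only: the function names are hypothetical, the pairing of each reported value with an FOVg condition assumes the order 18, 48, 86, and 140 degrees, and the 800 mm GEP in the usage note is a made-up input chosen for illustration, not a value reported in this section.

```python
# Approximate horizontal extent of the human eyefield, as used in the text.
HUMAN_FOV_DEG = 180.0


def zero_station_point(fov_g_deg, gep_mm, human_fov_deg=HUMAN_FOV_DEG):
    """Predicted viewing distance (mm) at which the camera appears stationary.

    Implements: Zero Station Point = (180 / FOVg) * GEP.
    """
    if fov_g_deg <= 0:
        raise ValueError("FOVg must be positive")
    return (human_fov_deg / fov_g_deg) * gep_mm


def percent_error(predicted, observed):
    """Signed difference of the observed value from the prediction, in percent."""
    return 100.0 * (observed - predicted) / predicted


# Values reported in the text (mm). The pairing with FOVg is an assumption.
FOV_G_DEG = [18, 48, 86, 140]
PREDICTED_MM = [8000, 1100, 287, 50]
EMPIRICAL_MM = [9112, 1092, 291, 53]

ERRORS = [
    round(percent_error(p, e), 1) for p, e in zip(PREDICTED_MM, EMPIRICAL_MM)
]

if __name__ == "__main__":
    for fov, p, e, err in zip(FOV_G_DEG, PREDICTED_MM, EMPIRICAL_MM, ERRORS):
        print(f"FOVg {fov:>3} deg: predicted {p} mm, observed {e} mm ({err:+.1f}%)")
```

As a usage check, `zero_station_point(18, 800)` returns 8000.0 for the hypothetical 800 mm GEP, and an FOVg of 180 degrees returns the GEP unchanged, which corresponds to the case in which the egocenter is located correctly. On the reported values, the largest discrepancy between prediction and observation falls in the first pair (8000 versus 9112 mm).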

This is very reminiscent of the proportional frame effects found throughout the perceptual literature (Rock, 1975). In these situations (see Figure 9) objects are shown in reduced vision situations, often monocularly in the dark with only the objects visible hanging in space. In these situations objects in smaller frames are judged to be proportionately larger than objects in larger frames. In fact, there is a powerful tendency to base size judgements on a compromise between the absolute or physical size of an object and its proportional size in the frame.

Figure 9.

The relative size of the frame has an effect on both its apparent distance and the apparent size of the objects in the frame. Which line, B or D, looks to be the same size as A? If you chose B, you may have based it on relative size (line vs. frame). If you chose D, you may have based it on physical size. A is 7 units (the size of the smaller house's vertical wall) and D is 6 units on the paper. Under reduced vision conditions, most people choose C, something in between (Rock, 1975, p. 59).

Frames as Metaphors for Eyefields:

The frame effects in this experiment are quite different from those reported by earlier experimenters. The most dramatic difference occurred with the model at a FOV of 18 degrees squeezed onto the screen viewed at a 17 degree FOV. When the camera rotated to a view of the same monitor on which subjects were viewing the experiment, the geometry of the scene ought to have made them see both the real and virtual monitors at roughly the same distance. Instead, the real monitor was seen in its proper place, but the virtual monitor was seen close up. The frame effects reported by Rock (1975) and others make objects in smaller frames appear farther away, not closer. The evidence suggests that observers in these experiments really are using the frames and the limited FOV as metaphors for their natural FOV of 160 by 180 degrees. It also suggests that these effects might be made even more dramatic if the frames really were delimited by edges that looked like noses, cheeks, and eyebrows. The experiment has not been done but seems modestly worthwhile. The concreteness and vividness of this spatial metaphor are striking in the clarity and unambiguity of the perception it produces. There are few other examples of a metaphor that has such strong and unequivocal perceptual effects.

The question naturally and frequently arises whether or not this effect is one created by years of television and movie watching. Is it produced by the many powerful movie techniques of pans, cuts, and dollies (Hochberg, 1986)? This seems unlikely since the perceptual frame effects precede TV by decades. It seems much more likely that our ability to transform the visual image in these metaphoric ways relies on our awesome powers of image manipulation, so well described by Kosslyn (1991). His evidence strongly suggests that we can zoom and rotate or otherwise manipulate our images very quickly.

Effect of Size.

The familiar size of objects might be affecting this illusion of virtual egocenter placement. Objects like chairs and tables and monitors have roughly expected sizes or degrees of visual angle from every distance. Egocenter location could be computed from that information as well as from the perspective lines of the image or the kinetic depth effects of the turning motion. It is not easy to specify exactly what the role of size is. At the smallest FOVg in this experiment (18 degrees) objects had roughly a 1:1 size ratio with objects in the real world; yet the impression was not one of being the real world distance from them, but of being very close to them, 785 mm or 98% of the distance closer, in fact. Still it is very easy to redo this experiment with objects that have no familiar size, and even to remove linear perspective cues by using balloons in a spherical room (see Figure 9). Preliminary explorations with these figures indicate some differences in the perception of relative motion (namely, it is very difficult to perceive these figures as stationary), but apparently none in the main findings of this paper.

Size constancy effects may in fact be related to the egocenter effects found in this paper. A brief review of the literature (Hochberg, 1978; Yonas and Hagen, 1973) indicates no general awareness of the possible effects of FOV size or FOVg size on the perception of distance to objects or on object size constancy. The suggestion made above, that frame effects on size judgments may be mediated by FOV effects and the eyefield metaphor, appears to have no historic precedent in the literature. This appears to be a promising avenue of research. In fact there is very little research on the nature of virtual space as perceived from geometrically created views of everyday scenes.

Viewpoint Constancy.

A striking result of these experiments is that although the visual scenes were often very distorted, especially in the wide field of view (140 degree FOVg) condition, where objects appeared to contract and dilate in odd ways as the camera rotated, observers always continued to maintain one unique viewpoint on the scene. This suggests that viewpoint constancy, the maintenance of one unique point of view on every scene, is a more powerful constancy than shape or size or any of the other constancies. It suggests that viewpoint constancy may be a primary and fundamental construct for the interpretation of all visual stimuli.

Simulation Sickness.

Although no one became nauseated, everyone reported some degree of discomfort when viewing the displays with FOVg larger than 60 degrees, especially the largest. Several people asked to look away from the 140 degree FOVg display to reorient themselves during the experiment. This effect may be related to the nonlinear compression of 140 degrees into an 18 or 45 degree FOV. This distortion effect needs to be investigated separately to determine how sensitive viewers are to FOV and compression-based distortions.
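One way to see why such compression is nonlinear is to compare, point by point across the screen, the angle a feature occupies in the modelled scene with the angle at which the viewer actually sees it. The Python sketch below is a minimal illustration that assumes a planar perspective projection onto a flat screen centered on the line of sight; that assumption, the function names, and the normalized screen coordinate x are not taken from the experiment itself.

```python
import math


def off_axis_angle_deg(x, fov_deg):
    """Angle off the central axis (degrees) for normalized screen position x.

    x runs from -1 (left edge) to 1 (right edge); fov_deg is the full
    horizontal field of view spanned by the screen.
    """
    half_tan = math.tan(math.radians(fov_deg / 2.0))
    return math.degrees(math.atan(x * half_tan))


def angular_rate(x, fov_deg):
    """Derivative of the off-axis angle (radians) with respect to x."""
    t = math.tan(math.radians(fov_deg / 2.0))
    return t / (1.0 + (x * t) ** 2)


def local_compression(x, fov_g_deg, fov_display_deg):
    """Scene angle packed into each unit of viewed angle near position x."""
    return angular_rate(x, fov_g_deg) / angular_rate(x, fov_display_deg)


if __name__ == "__main__":
    # A 140 degree FOVg image shown on a screen subtending 18 degrees.
    for x in (0.0, 0.5, 1.0):
        ratio = local_compression(x, 140.0, 18.0)
        print(f"x = {x:.1f}: local compression = {ratio:.2f}")
```

Under these assumptions the compression is far from uniform: it is about 17 to 1 at the center of the screen and falls to about 2 to 1 at the edge, so objects would appear to change shape as a rotating camera sweeps them across the display.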

Figure 9.

Top view of a room full of round balls suspended in space.

CONCLUSION

Clearly much work remains to be done if we wish to specify exactly how people interpret constructed geometric displays to select their egocentric viewing spot. Yet this work is very necessary if we wish to be able to create three-dimensional models that have the power to generate a truly satisfying and natural immersion experience.

For psychological theory, this research opens the possibility of dealing quantitatively with very abstract constructs, like virtual egocenters, in ways that were either impossible or very difficult without the new VR technologies. Clearly parametric studies need to be carried out in detail to create a nomograph of functions relating egocenter to FOVg and viewing station points. This pilot work suggests that even very close viewing station points such as those with head mounted displays (HMDs) are not immune to illusions caused by FOVs that are smaller than 180 degrees. Their possible implication in more severe phenomena like simulator sickness, or less severe discomfort and dislike of HMDs, is only one further direction that needs exploration. It is clear, for instance, that these sorts of egocenter illusions adapt out very quickly in a VR environment. However, after adaptation is more or less complete, are there still physiological conflicts that can be detected in response to the conflicting cues of linear perspective and reduced FOV? Are there aftereffects that return to the real visual world?

Figure 10. Michelangelo's Last Judgement. Note the absence of a dominating point of view.

Viewpoint Manipulation:

A clear implication of this research is that it is possible to manipulate the apparent location of one's virtual egocenter in many complex and mutually interacting ways. We do not understand how this is to be done, nor can we yet contrive good purposes for carrying it out. The artists of the Renaissance and Chinese landscape art offer some interesting clues about possibilities. Take for instance Michelangelo's Last Judgement (see Figure 10). In this stunning representation there is a grand conceptual unity of action encompassing huge areas of space with masses of saved humanity struggling upwards on the left, and the damned ricocheting down on the right. In the center there is a massive stillness. What is dramatically unique about this conception is that there is no unique viewpoint portrayed by the artist. Every creature is at the center of the visual field, and yet the artist has subtly contrived to unify the whole without the dominating constraint of a unique point of view. How has he done this? In part, he has created local areas of fixed viewpoint, so that the central figure of Christ creates peripheral views of the surrounding figures. Without this device the figures would all be flat, as if seen from infinity. Yet, he must have distorted the projection so that the surrounding figures are not as peripheral as they would have been with a true isomorphic projection. He may even have got the inspiration for this from the dozens of larger panels that he created throughout the chapel. There each one has a central viewpoint, yet he must have merged them all, at least in his own conception, if not physically throughout the chapel. In any event, the Last Judgement itself is unified both by projection and by thematic constraints that affect projection in complex ways. It is not clear how a computer model of this image ought to be carried out.

Certainly, no modelling system can combine the multiple viewpoints and integrate them pleasingly the way Michelangelo intuitively did. Yet, it could serve us well to do that. Instead of the strictly projective view of dataspaces as cities and rooms that we currently have (see Figure 11), we might be able to lay out dataspaces giving equal emphasis to more distant and proximal areas, much as a Chinese landscape does. Clearly there are many potential uses for such a distortion of virtual space.

Figure 11.

A city as a dataspace, with a dominating point of view that forces distant objects into unrecognizable blobs.

Other, broader theoretical issues that need exploration are the higher order cognitive implications of these new relations between multiple realities. When we view the animation apparently rotating on the monitor, somehow we build up a model of the room. That model is also somehow projected into the same space as the real room that we occupy. While viewing the animation, we have both an egocenter in real space and a virtual egocenter in the space of the animation. It appears from these experiments that those egocenters interact with each other so that we feel some conflict as we rotate and move in one and remain stationary in the other. What are the long term effects of this conflict? What are the memory implications for conflicts between one reality and another? What are the physiological processing correlates of immersion? These are only some of the interesting psychological questions that need a firm base of experimental data on which to rest the initial creation of exploratory theoretical frameworks.

REFERENCES

Beer, J. M. A. (1993) Perceiving scene layout through an aperture during visually simulated self-motion. Journal of Experimental Psychology: Human Perception and Performance, In Press.

Cutting, J. E. (1991) On the efficacy of cinema, or what the visual system did not evolve to do. In Ellis, S. R. (Ed.), Pictorial Communication in Virtual and Real Environments. London: Taylor and Francis, 486-497.

Ellis, S. R. (Ed.), (1991) Pictorial Communication in Virtual and Real Environments. London: Taylor and Francis.

Franklin, N., Tversky, B., and Coon, V. (1992) Switching points of view in spatial mental models. Memory & Cognition, 20(5), 507 - 518.

Furness, T. (1992) Personal communication.

Hochberg, J. E. (1978) Perception. Englewood Cliffs, NJ: Prentice-Hall, Inc.

Hochberg, J. (1986) Representation of motion and space in video and cinematic displays. In K. J. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance (Vol. 1, 22:1 - 22:64). New York: Wiley.

Howard, I. P. (1982) Human Visual Orientation. New York: Wiley.

Howlett, E. M. (1990) Wide angle orthostereo. In Merritt, J. O. and Fisher, S. S. (Eds.) Stereoscopic displays and Applications. Bellingham, WA: The International Society for Optical Engineering.

Intraub, H. and Richardson, M. (1989) Wide-angle memories of close-up scenes. Journal of Experimental Psychology: Learning, Memory and Cognition, 15(2), 179-187.

Jex, H. R. (1991) Some criteria for teleoperators and virtual environments from experiences with vehicle/operator simulation. In Durlach, N. I., Sheridan, T. B., and Ellis, S. R. Human Machine Interfaces for Teleoperators and Virtual Environments. Moffett Field, CA: NASA Conference Publication 10071.

Klein, S. and Dultz, W. (1992) Monocular depth cues in conflict with the stereoscopic parallax on the television screen. In J. O. Merritt and S. S. Fisher (Eds.) Stereoscopic Displays and Applications III, 142 - 145. Bellingham: SPIE-The International Society for Optical Engineering.

Kosslyn, S. M. (1991) A cognitive neuroscience of visual cognition: Further developments. In R. H. Logie and M. Denis (Eds.) Mental Images in Human Cognition, 351 - 381. New York: North-Holland.

Kubovy, M. (1986) The psychology of perspective and Renaissance art. Cambridge: Cambridge University Press.

McGreevy, M. W. and Ellis, S. R. (1986) The effect of perspective geometry on judged direction in spatial information instruments. Human Factors, 28, 439 - 456.

Neisser, U. A sense of where you are: functions of the spatial module. In P. Ellen and C. Thinus-Blanc (Eds.), Cognitive Processes and spatial orientation in animal and man. Volume II. Neurophysiology and developmental aspects. 293-310. Boston: Martinus Nijhoff.

Nemire, and Ellis, S. R. (1991) Optic bias of perceived eye level depends on structure of the pitched optic array. Presented at the Psychonomic Society, San Francisco, CA.

Ono, H. (1981) On Wells (1792) law of visual direction. Perception and Psychophysics, 30, 403-406.

Piantaneda, T., Boma, D., and Gille, J. (1993) Human perceptual issues and virtual reality. Virtual Reality Systems, 1(1), 43 - 52.

Rock, I. (1975) An introduction to perception. New York: MacMillan.

Welch, R. B. Adaptation of space perception. In K. J. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance (Vol. 1, 24:1 - 24:45). New York: Wiley.

Yonas, A. and Hagen, M. (1973) Effects of static and kinetic depth information on the perception of size in children and adults. Journal of Experimental Child Psychology, 15, 254-265.

FOOTNOTES

{1} The opinions in this paper do not necessarily imply or express the view of the U.S. Army Research Institute (USARI). This research was funded by the USARI Basic Research Office. We thank Don King, Sandy Ressler, Marc Sebrechts and Bob Seidel for many stimulating discussions.