Interacting with a smart environment involves agreeing on what to do and when, based on a joint understanding of where things and people are or should be. Face-to-face interaction between humans, or between humans and robots, involves clearly identifiable perspectives on the environment that can be used to establish such a joint understanding. A smart environment, in contrast, is ubiquitous and thus perspective-independent. This paper reviews the challenges this situation poses for establishing joint spatial reference between humans and smart systems, and presents a somewhat unconventional solution as an opportunity.