How Do Models Work?
(Chapter 5)
[Draft!]

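Though the word itself is a noun, "model" describes a process which connects systems to one another through some sort of regular interface. Viewed as a process, it can usefully be divided into three components organized around the interfaces between systems: reference, which connects a model to whatever it represents; inference, the internal logic through which a model "says more than it is told"; and access, which connects a model to the system that uses it.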

When a model changes or breaks down, or when we try to evaluate a model's fit to its purpose(s), it is profitable to look at the model in terms of these components. Often, we will see an attempt to change one component which also ends up changing others.

Reference: What's What

Reference is how a model connects to whatever it is representing. In discussing reference, we need to be especially careful to avoid the map/territory confusion I have already mentioned. The historian and philosopher of science Thomas Kuhn, who monitored and mentored me in these areas when I was starting to think about them in graduate school, cautioned me whenever I spoke of "models of the world" since I almost always meant "models of other models". Because we cannot escape models, our models are almost always linked to other models, be they perceptual, literal, visual, or acoustic. Reference is as much an issue for how models refer to each other as for how models ultimately connect to the world.

The most pervasive account of reference in modern thought is called "model theory" and was introduced by mathematicians around the turn of the 20th century to talk about how mathematical theories related to one another and to the objects they discussed.

The 19th century had been an immensely productive time for mathematical discovery around the shrinking globe. The formalization of the calculus, Boolean logic, differential equations, and the fruitful cross-fertilization of mathematics and physics were changing the way that the sciences approached the world. But the mathematicians of the time, most of whom had had some philosophical training, were concerned about the relationship between their theories and the "things" those theories were about. They were also troubled by the formulation, in the early 1800s, of "non-Euclidean geometries", which were mathematically consistent but didn't seem to correspond to a world they recognized. And finally, mathematicians may well have been struck by the consequences that formal mathematical foundations had had for the physical sciences and wanted to leverage some of that same power for their own discipline.

Mathematicians believed, thought, and acted as though things like numbers, functions, and geometric points (to name a few instances) actually existed and that their theories were statements about these things in the same way that statements like "The dog is at my feet" are statements about a physical world. So they introduced model theory to talk about how the formal statements in their proofs, demonstrations, and counterexamples corresponded to these abstract objects.

They started by introducing the ideas of "theory" and "structure" to describe this linkage of concrete theory to ideal objects. The theory was the set of statements or expressions which they made in their theorizing; the structure was the ideal thing about which they were making statements. The structure consisted of objects, kinds of relations, and a truth function which would indicate whether a particular kind of relation held for, between, or among particular objects. The theory, on the other hand, consisted just of a set of expressions, possibly with inference rules for generating more expressions from the initial set. Finally, there was also a model function which tied the two together, mapping expressions in the theory into relations among objects and sets in the structure.

Given this account of mathematical models, "meaning" comes from translating expressions in the theory into relations in the structure. The first and most important constraint on this translation is called systematicity.

Systematicity

Systematicity is the consistency of translations between theory and structure. For instance, suppose we say that the symbol "greater-than" in theory X represents the relation ">" in structure Y and that the symbol "five" in X represents the number 5 in Y while the symbol "twelve" in X represents the number 12 in Y. Systematicity then demands that the expression "twelve greater-than five" in the theory X represents the "fact" that "12>5" in the structure Y.

Another way to describe systematicity is as a requirement for consistency in translation. If we at some point translate "twelve greater-than five" into "12>5", we cannot subsequently translate "eight greater-than three" into "8<3". Consistency means that once we've decided on a translation of one expression or subexpression, we have ruled out many translations of other expressions. The advantage of systematicity, however, is that rules which we discover or invent for simple expressions can be applied to translating more complex expressions. This allows us to connect the "logic" of the theory (which expressions are related to one another) to the "truth" of the structure (how objects and sets are related to one another).
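A minimal sketch can make systematicity concrete. It assumes a toy structure whose objects are integers and whose truth function is ordinary numeric comparison; the symbol names and the helpers `translate` and `true_in_structure` are hypothetical. Because each atomic symbol always maps to the same object or relation, the translation of a compound expression is fixed by the translations of its parts:

```python
import operator

# Hypothetical model function for atomic symbols: each symbol in theory X
# has exactly one fixed meaning in structure Y.
CONSTANTS = {"three": 3, "five": 5, "eight": 8, "twelve": 12}
RELATIONS = {"greater-than": operator.gt, "less-than": operator.lt}


def translate(expression: str) -> tuple:
    """Systematically translate 'A rel B' into (relation, a, b).

    The meaning of the whole expression is built only from the fixed
    meanings of its parts; there is no per-sentence special case.
    """
    left, rel, right = expression.split()
    return RELATIONS[rel], CONSTANTS[left], CONSTANTS[right]


def true_in_structure(expression: str) -> bool:
    """Apply the structure's truth function to the translated relation."""
    relation, a, b = translate(expression)
    return relation(a, b)


print(true_in_structure("twelve greater-than five"))  # True
print(true_in_structure("eight greater-than three"))  # True
print(true_in_structure("three greater-than eight"))  # False
```

In this sketch, the only way to make "eight greater-than three" mean "8<3" would be to change the entry for "greater-than", which would change the reading of every expression that uses it, including "twelve greater-than five".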

Translations between human languages are generally not systematic. For example, the English sentence "The book is by May Sarton" can be translated to French as "Le livre est de May Sarton" with (roughly) the following component translations:

  The → Le
  book → livre
  is → est
  by → de
  May Sarton → May Sarton

However, the English sentence "The map is by the Metro" translates to "La carte est près du métro" with the following components:

  The → La
  map → carte
  is → est
  by → près de
  the → le (contracted with "de" into "du")
  Metro → métro

where "by" translates to "près de" rather than "de". (With apologies to the lost Belgian truck driver who helped me unknowingly generate this example with my accidentally mystical comment that "There is a map written by the subway".)

It is mostly the non-systematicity of translation between natural languages which makes computer translation between them so difficult. In the early days of computers, before the depth of this non-systematicity was fully understood, engineers thought that computers would soon be translating languages routinely and effectively. The profound dependence of natural language on context was one of the lessons learned from applying computers to natural language.

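But in mathematics, at least, systematic translation is possible, and this systematicity makes it possible to say interesting things about the mapping between theories and structures. Two especially important properties are soundness (everything expressed in the theory is true in the structure) and completeness (everything which is true in the structure has an expression in the theory).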

Ten Pounds in a Five-Pound Bag

One striking and important result of mathematical model theory is that we can have a complete theory which is "smaller" than the structure it is modelling. This is the equivalent of putting ten pounds of potatoes into a five-pound bag, and so is worth talking about. It also shows one way in which models can achieve completeness: by coordinating the logic of the model with the rules for reference, we can have a fully consistent finite model for an infinite structure.

Suppose we have the following structure consisting of four sets:

This structure is infinite, since there are infinitely many odd and even integers. However, we can build an adequate theory of this structure with just four sentences, first articulated by Euclid:

  1. Even Op even is even.
  2. Even Op odd is odd.
  3. Odd Op even is odd.
  4. Odd Op odd is even.

provided that we systematically translate Op to Plus, being odd into being a member of the set Odd, and being even into being a member of the set Even. The relation of this theory to the structure is sound because any translation of any of the above sentences into the structure will describe a real relation. The relation is complete because we can take any addition relation in the structure and pick one of the four sentences to describe it.

Another interesting property of mathematical model theory is that we can have one theory for several structures. With a slightly different translation, mapping Op to Minus rather than Plus, we can link our four-sentence theory to a structure where Minus is the relation between integers; for example, 12 Minus 5 = 7 is described by the sentence saying that even Op odd is odd.
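The claims of soundness and completeness can be checked, though not proved, on a finite sample of the structure. The sketch below is a hypothetical illustration: it writes the four sentences as parity rules for the two operands and the result, uses the integers from -20 to 20 as its sample, and tries both Plus and Minus as the translation of Op. The helper names are assumptions.

```python
import operator
from itertools import product

# The four sentences, written as (left operand, right operand, result).
THEORY = {
    ("Even", "Even", "Even"),
    ("Even", "Odd", "Odd"),
    ("Odd", "Even", "Odd"),
    ("Odd", "Odd", "Even"),
}


def parity(n: int) -> str:
    """The reference rule: which of the sets Odd and Even an integer belongs to."""
    return "Even" if n % 2 == 0 else "Odd"


def sound(op, sample) -> bool:
    """Every sentence, translated, describes only relations that really hold."""
    return all(
        parity(op(x, y)) == result
        for left, right, result in THEORY
        for x, y in product(sample, repeat=2)
        if parity(x) == left and parity(y) == right
    )


def complete(op, sample) -> bool:
    """Every relation x Op y = z in the sample is described by some sentence."""
    return all(
        (parity(x), parity(y), parity(op(x, y))) in THEORY
        for x, y in product(sample, repeat=2)
    )


sample = range(-20, 21)
for name, op in [("Plus", operator.add), ("Minus", operator.sub)]:
    print(name, sound(op, sample), complete(op, sample))  # True True for both
```

A finite check like this can only find counterexamples; the guarantee for the whole infinite structure comes from the argument in the text, not from the sampling.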

When mathematicians think about mathematical model theory, they often think about the kinds of structures which might satisfy a particular theory (though the theories are usually more complicated than ours).

We can only manage this "magic trick" of putting ten pounds in a five pound bag because the relation in the structure, "Plus" (or "Minus" in our alternative version), has a very simple behavior with respect to odds and evens. We can make the model relation incomplete by just adding another relation to the structure.

Add a relation ">" (representing order between numbers) to the structure, and the model relation is no longer complete. And we cannot make it complete by simply adding new expressions to the theory, since ">" is not nearly as well behaved with respect to odd and even numbers as "Plus" is. However, the original theory is still sound, since its expressions still translate uniformly into relations about the addition of odds and evens which are true in the structure.
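A short, hypothetical check shows why no sentence about odds and evens can capture ">": on even a small sample, every combination of parities occurs both with x > y true and with it false, so knowing which sets x and y belong to says nothing about their order. The function name is an assumption for this sketch.

```python
from itertools import product


def parity(n: int) -> str:
    return "Even" if n % 2 == 0 else "Odd"


def order_by_parity(sample):
    """For each pair of parities, collect every truth value that x > y takes."""
    outcomes = {}
    for x, y in product(sample, repeat=2):
        outcomes.setdefault((parity(x), parity(y)), set()).add(x > y)
    return outcomes


for pattern, values in sorted(order_by_parity(range(1, 6)).items()):
    print(pattern, sorted(values))  # each pattern prints [False, True]
```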

Not all extensions to the structure force incompleteness. We could, for instance, merge the Plus and Minus structures and construct a theory where Op could systematically map to either Plus or Minus, or where there were two operations, Op1 and Op2, which mapped to Plus and Minus respectively. This model relation would be both sound and complete, even though the structure has been extended.

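Some sorts of extensions to the structure can make the relation of the theory to the structure unsound. For instance, suppose we expand the sets Odd and Even to include decimal numbers, assigning each to Odd or Even by whether its first digit is odd or even (so that 1.33 joins Odd and 2.86 joins Even). Our original theory would no longer be sound, since we could translate the sentence "odd Op odd is even" into a claim such as "1.5 Plus 1.5 is a member of Even", which is not true: 3.0 starts with an odd digit and so has been placed in Odd.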

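However, adding those new elements (e.g. 1.33 or 2.86) alone need not have made the theory unsound. If we had assigned each to Odd or Even based on its last digit rather than its first (writing every element, whole numbers included, with the same number of decimal places, so that 1.34, for instance, would join Even), the theory we constructed would still be sound.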

To outsiders, it may seem odd to call these theories complete when they ignore the actual identity of the numbers in the structure. However, one of the advantages of reference --- elegantly captured by mathematical model theory --- is that it allows us to ignore things which don't matter. Of course, what matters depends on what you're describing, and what you're choosing to describe depends on your purpose.

Lessons from the Example

First, a caveat. This example has been based on an informal and incomplete account of the model theory developed by mathematicians. In particular, model theory as studied by mathematicians generally deals with potentially infinite theories and generally involves more complex mappings between theory and structure. Indeed, what makes model theory especially challenging and satisfying as an area of mathematics is the demonstration of connections among infinite systems, which the examples above only addressed superficially.

We can still draw some useful lessons from this example:

  1. By relying on a flexible but not arbitrary mapping from theory to structure, we can have theories which are drastically (indeed, infinitely) smaller than the structures they describe. This is part of what makes models so powerful: we can use finite models to think about infinite things.
  2. Two important properties of such a mapping are soundness (everything represented in the theory is true in the structure) and completeness (everything which is true in the structure has an expression in the theory).
  3. These properties depend on both the expressions in the theory and the character of the structure; if we change either, we may undermine either the completeness, the soundness, or both.
  4. Adding new relations to a structure will not make a theory unsound, but it may make it incomplete.
  5. Adding new objects (connected by existing relations) to a structure need not make a theory incomplete, but may make it unsound.

Some Problems of Mathematical Model Theory

Because models use reference to connect to their subject matter, reference relations affect the "identity" of elements in the model. When the rules for reference change, the rules for identity also change.

Consider the example of the Copernican revolution, where astronomy went from an earth-centered model to a sun-centered model. We probably learned in grade school that, in the Copernican revolution, scientists discovered that the planets orbited the sun rather than the Earth. However, this statement obscures the real change underlying this revolution. As Thomas Kuhn has pointed out, the Copernican revolution changed the way in which cosmological theories referred. Before the revolution, the sun was a planet like Mars, Jupiter, and the Moon. After the revolution, the sun was a different kind of thing and the Earth was a planet; likewise, the Moon also stopped being a planet and became what the planets (including the sun) had always been thought to be: a body circling the Earth. Interestingly, one of the factors in the growing acceptance of the Copernican framework was telescopic observation showing that the Moon looked like the Earth, with discernible mountains and valleys! This change in reference relations also opened the way for an even more striking transformation: seeing the sun as being like the other stars, with all the incredible possibilities opening from that simple but profound revelation.

In building computer models, we often run into the problem of reference when we find that our initial design of data structures doesn't quite fit the evolving or emerging application. For instance, if we have a database describing financial transactions, we may wish to represent the fact that certain transactions (for instance, a mortgage payment on a house) are actually multiple transactions (a check to the lending company, a deposit in an equity account, and a payment of interest). If we design an application so that every transaction is independent, we may lose important opportunities for recognizing errors or making inferences.
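One way to keep such relationships visible is to let a transaction contain its component transactions. The sketch below is hypothetical (the class, its fields, and the amounts are invented for illustration): it treats the check to the lender as the whole and the equity deposit and interest payment as its parts, so that a mismatch between them can be flagged as a likely error.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Transaction:
    description: str
    amount: Decimal
    parts: list[Transaction] = field(default_factory=list)

    def inconsistent(self) -> bool:
        """A composite transaction is suspect if its parts do not add up to it."""
        if not self.parts:
            return False
        return sum((p.amount for p in self.parts), Decimal("0")) != self.amount


payment = Transaction("check to lender", Decimal("1500.00"), [
    Transaction("deposit to equity account", Decimal("400.00")),
    Transaction("interest payment", Decimal("1100.00")),
])
typo = Transaction("check to lender", Decimal("1500.00"), [
    Transaction("deposit to equity account", Decimal("400.00")),
    Transaction("interest payment", Decimal("1010.00")),
])
print(payment.inconsistent(), typo.inconsistent())  # False True
```

If each of these transactions were stored independently, the check that catches the mistyped interest payment would have nothing to compare against.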

For another example, closer to artificial intelligence, suppose we have a computer program working in a world of blocks and constructions built out of blocks. If we represent each block by a name in our program, what do we do when we use a saw to chop a block in two? Certain properties, like color or density, may be sustained across the divide, but others no longer hold. In particular, the relations of the pieces to other blocks are now independent in a way in which they could not be before.

This is a problem which I call symbol-splitting: what happens when you need to change a model so that one individual now becomes two? Which of the things you know about the one individual apply to both new individuals and which apply only to one or the other? Suppose (there's a soap opera plot here) that you learn that one of your friends was actually identical twins living a single life. Knowing that there were two individuals, how could you decide what you really knew about either one of the twins? While this seems fanciful, we encounter milder forms of it every day, when we learn that people, places, or objects have aspects which we'd never before noticed or known.
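The difficulty shows up even in a small sketch of the blocks-world example. This sketch is hypothetical and deliberately naive: it assumes we already know which properties are intrinsic (here, color and density, which the saw leaves intact) and simply copies those to both pieces, drops the rest, and sets aside every relation that mentioned the original block, because there is no longer one obvious thing for that relation to be about.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: these properties survive the saw unchanged.
INTRINSIC = {"color", "density"}


@dataclass
class World:
    properties: dict = field(default_factory=dict)  # block name -> {property: value}
    relations: set = field(default_factory=set)     # (relation, subject, object)

    def split(self, name: str, left: str, right: str) -> set:
        """Replace block `name` by two pieces; return the relations left undecided.

        Intrinsic properties are copied to both pieces; other properties
        (such as length) are dropped.
        """
        old = self.properties.pop(name)
        kept = {k: v for k, v in old.items() if k in INTRINSIC}
        self.properties[left] = dict(kept)
        self.properties[right] = dict(kept)
        # A fact like ("on", "B", "A") may now hold for one piece, both, or
        # neither, so it cannot simply be copied; withdraw it and report it.
        undecided = {r for r in self.relations if name in r[1:]}
        self.relations -= undecided
        return undecided


world = World(
    properties={"A": {"color": "red", "density": 0.6, "length": 4}},
    relations={("on", "B", "A"), ("on", "A", "table")},
)
undecided = world.split("A", "A1", "A2")
print(world.properties["A1"])  # {'color': 'red', 'density': 0.6}
print(sorted(undecided))       # [('on', 'A', 'table'), ('on', 'B', 'A')]
```

Everything interesting about the split ends up in the returned set, which is exactly the part the program cannot settle on its own.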

Symbol-splitting is one of the hardest problems for conventional accounts of reference. If it were rare, we might be able to think of it as a marginal problem, but it is a common part of our activities.

Inference

In our description of the virtues of models, we listed "completeness" as one function served by models. Most of the burden of completeness is carried by the model's mechanisms for inference. Inference is the way in which a model "says more than it is told". Indeed, we could describe inference as the way models get completeness: inference takes partial descriptions and makes them complete. It is also through inference that models get their internal logic, whose constraints may drive changes in both the patterns of reference and access provided by the model.

In scientific theories, and especially in the physical sciences, the inference mechanisms often involve mathematics. For instance, Newton's, Maxwell's, and Einstein's theories were all mathematical accounts of physical behavior, with a particular set of conventions about how the terms in the mathematical account correspond to experimental phenomena. As we saw in our introductory example, it was partly a conflict in the mathematical model of Maxwell's laws --- an inferential question --- which made scientists change the way the theory referred, abandoning the reference to absolute space and time of Newton's physics. This is one reason that, in my version of the "systems model" of models, I give inference --- the logic of the interface --- an important role of its own. As often as not, it is these properties of the interface which drive changes and adaptations in reference and access.

Inference And Mathematical Model Theory

I mentioned above that mathematical model theory usually assumes infinite theories. It does this by describing ways of combining expressions to generate other expressions. These methods are called rules of inference. One of the most popular rules of inference is called modus ponens:

If X implies Y is in the theory and
   X is in the theory,
 then Y is in the theory.

while another possible rule might be:

If X implies Y on Tuesday is in the theory and
   X implies Y on Wednesday is in the theory,
 then X implies Y is in the theory.

A rule like this one is risky because this sort of combinational inference is both "blind" and relentless. There is no simple way to tell the system not to do simple combinations and deny (for example) that a store open on Tuesdays and Wednesdays will be open on Sundays. Blind inference is a little like the broom enchanted by the sorcerer's apprentice to carry water: once it starts going, it can't stop. For this reason, great care has to be taken in specifying both the rules of inference and the starting place (called the axioms) for such theories. But if we want to use such rules to talk about the real world, this makes it difficult to say certain kinds of quite natural things.
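Blind forward chaining is easy to sketch, and the sketch shows how the second rule overreaches. The representation is hypothetical: ("implies", X, Y, day) is an implication that applies on one day (or on every day when `day` is None), and ("holds", X, day) says X is true on that day. The engine applies its rules until nothing new appears, with no way to ask whether a conclusion is sensible.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def modus_ponens(facts):
    """If X implies Y on day d (or on every day) and X holds on d, then Y holds on d."""
    derived = set()
    for _, x, y, day in (f for f in facts if f[0] == "implies"):
        for d in (DAYS if day is None else [day]):
            if ("holds", x, d) in facts:
                derived.add(("holds", y, d))
    return derived


def drop_the_day(facts):
    """The risky rule: 'on Tuesday' plus 'on Wednesday' yields an unqualified rule."""
    return {
        ("implies", x, y, None)
        for _, x, y, day in (f for f in facts if f[0] == "implies")
        if day == "Tuesday" and ("implies", x, y, "Wednesday") in facts
    }


def closure(facts, rules):
    """Blind inference: apply every rule until no new facts appear."""
    facts = set(facts)
    while True:
        derived = set().union(*(rule(facts) for rule in rules)) - facts
        if not derived:
            return facts
        facts |= derived


# Hypothetical axioms: the corner store exists every day; it opens Tuesday and Wednesday.
axioms = {("holds", "corner-store", d) for d in DAYS}
axioms |= {("implies", "corner-store", "open", "Tuesday"),
           ("implies", "corner-store", "open", "Wednesday")}

careful = closure(axioms, [modus_ponens])
blind = closure(axioms, [modus_ponens, drop_the_day])
print(("holds", "open", "Sunday") in careful)  # False
print(("holds", "open", "Sunday") in blind)    # True
```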

One classic example is the useful commonsense rule that living creatures which have wings can fly. With this rule, a hunter can guess (for instance) that a delicious-looking pheasant can fly and have a better chance of successfully "bagging" it for dinner. The problem of combining this rule with "blind inference" is that there is no way to say that ostriches and penguins cannot fly.

One solution to the problems of blind inference is the addition of components to the rules for inference which allow certain paths of combination to be invalidated. This solution, proposed by AI founder John McCarthy, is called "circumscription". The idea is that every rule like "if something is a bird, it can fly" is rewritten to read something like: "if something is a bird, it can fly, unless there's something bird-fly-strange about it" with a modified inference mechanism allowing the rule to be used until --- at some later point --- an assertion of the form "there's something bird-fly-strange about penguins" is added to the model, changing everything proven about penguins using the original rule.
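The effect of the bird-fly-strange device can be imitated in a few lines. This is a hypothetical, closed-world simplification rather than McCarthy's formal circumscription (which is stated in logic, by minimizing which things count as abnormal): the conclusion is recomputed from the current facts each time it is asked for, so adding a fact can withdraw a conclusion that blind inference would have kept.

```python
def can_fly(creature: str, facts: set) -> bool:
    """If something is a bird, it can fly, unless it is known to be bird-fly-strange."""
    return ("bird", creature) in facts and ("bird-fly-strange", creature) not in facts


facts = {("bird", "pheasant"), ("bird", "penguin")}
print(can_fly("penguin", facts))   # True: nothing strange is known yet
facts.add(("bird-fly-strange", "penguin"))
print(can_fly("penguin", facts))   # False: the earlier conclusion is withdrawn
print(can_fly("pheasant", facts))  # True: the rule still works for pheasants
```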

The need for circumscription comes, in some respects, from the power of reference. A theory like "if something is a bird, it can fly" is sound for a suitably constrained world where penguins and ostriches don't exist. However, if we add something to the model which changes some of the existing terms (as in our example with classifying decimal numbers as odd or even), the theory may stop being sound. In this sense, the introduction of the predicate bird-fly-strange is a sort of prophylactic solution to possible changes in reference. It is a solution to maintaining the power of reference to parts of a structure while still handling the extension of the structure to include changes which would otherwise make the relationship between theory and structure unsound.

Circumscription solves one of the problems of logical inference, but others still remain. Many scientists feel that there are so many problems with this approach that entirely new kinds of inference are necessary. Some of the favorite candidates involve devices or simulations which are roughly based on the means of biological computation used by our own brains. This is tricky because we are only now beginning to understand these mechanisms. However, we can learn things from devices whose general characteristics reflect those which we have long identified with biological computers: massive parallelism (our brains contain billions of neurons) and rich patterns of interconnection (our brains contain hundreds of trillions of connections).

Neural Networks: Play-dough Inference

As long as there have been electronic computing devices, there have also been efforts to use these devices to simulate the operation of the human brain. In the early 1950s, at the same time that the first experiments with logical reasoning by computers were taking place, a graduate student named Marvin Minsky was nursing an assemblage which wired together hundreds of tubes and other components into a miniature neural network. The connections between the components allowed changes in one part of the device to have effects on many other parts of the device, spreading both response and change throughout the mechanism. The most striking thing about this network, in comparison to the logical reasoning systems, was that it learned: it adapted its behavior based on experience and training.

Though it is not difficult to make logical systems which learn, consistent logical systems (relying on "blind inference") need to be so simple and free of contradictions that it nearly always makes more sense to design them or at least anticipate possible designs rather than letting them evolve or develop. But this initial design effort may make them inflexible, while a system which has been learning from the start can often adapt to a new situation by learning just a little bit more.

Why is this so? An analogy may be helpful. Logical systems like those described by mathematical model theory are something like the "Lego system" of toy blocks: pieces can be assembled to make complex structures, but the basic pieces have a certain size and shape which limits the final texture and contours of Lego constructions. Network machines, like those of Minsky and his successors, are more like Play-dough modelling clay, with lots of flexibility and nothing "preset" other than color and volume. While one can quickly build things taller and stronger with Lego blocks, one can make complex shapes more quickly with Play-dough, by simply pressing the clay against some complex shape in the world to get a complex impression in the Play-dough clay.

It is this small-scale flexibility which makes it so easy for such machines to learn. It is also what enables them to provide a certain kind of completeness without much initial design or human intervention. For example, we can make a Play-dough impression of an object and then pour hot wax into the impression to make a copy of its shape without having measured any of the details and contours of its surface.

Even more striking, suppose we make an impression of an object and then break the object in two: putting one part of the object back in the impression and adding hot wax (providing that the materials can all stand the hot wax) will make the fragment complete, congealing into a replica of the missing part.

Likewise, we can take a neural network and present it with a variety of inputs describing, for instance, the spelling and pronunciation of words. [Ref: Sejnowski] After some learning and training with these examples, we can present the network with just spelling or pronunciation and the network will generate, with impressive if imperfect fidelity, the missing component. More impressively, the same network will often respond appropriately even to incomplete examples of spelling/pronunciation pairs which it has never seen.
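The "hot wax" style of completion can be imitated with a very small network. The sketch below swaps in a Hopfield network, a different and much simpler design than Sejnowski's spelling-to-pronunciation network (which was trained by back-propagation); the patterns and sizes are invented for illustration. Two patterns of +1/-1 values are impressed on the connection weights, and the network is then shown only the first half of one of them, with the unknown units set to 0.

```python
PATTERN_A = [1, 1, 1, 1, -1, -1, -1, -1]
PATTERN_B = [1, -1, 1, -1, 1, -1, 1, -1]


def train(patterns):
    """Hebbian learning: strengthen links between units that agree; no self-links."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, probe, sweeps=5):
    """Update one unit at a time toward the sign of its weighted input."""
    state = list(probe)
    for _ in range(sweeps):
        for i in range(len(state)):
            total = sum(weights[i][j] * state[j] for j in range(len(state)))
            if total != 0:
                state[i] = 1 if total > 0 else -1
    return state


weights = train([PATTERN_A, PATTERN_B])
print(recall(weights, [1, 1, 1, 1, 0, 0, 0, 0]))  # [1, 1, 1, 1, -1, -1, -1, -1]
```

Like the fragment set back into the impression, the known half selects which stored shape the rest of the network settles into.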

The key to this performance is the ability to accumulate many small changes into a complex system which can also tolerate small changes in its inputs. In this way, the system both learns and adapts without having to be explicitly programmed. Furthermore, the effects of the changes are spread out through the system, so it can sometimes perform quite well even when some of its components are damaged or removed. All of these advantages are conferred because the network is a sort of "semi-fluid" device in much the same way that Play-dough is but Lego bricks are not.

On the other hand, one would not want to use Play-dough to build a tower, or a bridge, or a staircase. For these, something less flexible, like Lego bricks, makes much more sense. In addition, we have to remember that Play-dough, like Lego bricks, has its own "logic" of a strange sort. For example, impressions in Play-dough cannot model internal hollows or other shapes into which the flexibility of the Play-dough cannot extend. When we look at the mathematics of neural nets (which looks more like physics than logic), we can see some of these limitations.

Again, we note the importance of purpose. Depending on what we want to do, different kinds of models are appropriate.

Reference in Neural Networks

What does reference look like for "Play-dough" inference on neural-network-like devices? The logical account of inference, though it was actually derived from the sciences of rhetoric (making arguments which convince people), fits well with the mathematical model theory of reference developed millennia later. But "Play-dough" computing has its own referential structures, which are difficult to see. In particular, selecting the inputs to a simulated neural network involves a great number of choices about what to provide as an input and what to either ignore or rely on the system to learn. Choosing the right set of inputs can make it immensely easier for the system to "learn the right thing".

For instance, in an experiment with neural networks in the mid 1980s, Michael Mozer built a network (called RAMBOT) to play a simple video game. The input to the network was the appearance of the video game display, and the outputs were commands to move an animated character around the display and to have it perform certain actions.

In order to have the network learn to play the game in reasonable time, Mozer arranged the inputs so that rather than presenting the entire screen of the game in "absolute coordinates" (where the upper right corner of the screen, for instance, was always the same input), the inputs were always relative to the player. For example, one input to the network always described what was visible in the square three up and three across from the player, another the contents of the square two up and two across, and so forth.

This made it much easier for the network to learn tactics relating to playing the game, because referential assumptions simplify both learning and acting. For example, RAMBOT just needed to learn what to do when a monster was to its upper right and not what to do if (for example) it was at <8,8> and a monster was at <7,7> or it was at <5,12> and a monster was at <4,11> or it was at <3,9> and a monster was at <2,8> or .... The referential assumptions of this model made situations which were tactically identical into situations which were descriptively identical.
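The effect of relative inputs can be shown with a hypothetical encoding (the function and the coordinate pairs are illustrative, not taken from RAMBOT itself): describing each monster by its offset from the player turns the three situations listed above into a single input.

```python
def relative_offsets(player, monsters):
    """Describe each monster by its offset from the player instead of its absolute square."""
    px, py = player
    return sorted((mx - px, my - py) for mx, my in monsters)


# The three situations from the text: (player position, monster positions).
situations = [((8, 8), [(7, 7)]), ((5, 12), [(4, 11)]), ((3, 9), [(2, 8)])]
encodings = {tuple(relative_offsets(player, monsters)) for player, monsters in situations}
print(encodings)  # {((-1, -1),)}
```

Three descriptively different situations collapse into one, so whatever the network learns in one of them applies directly to the others.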

These referential assumptions might break, however, if absolute coordinates suddenly became important. This might be the case, for instance, if some particular threat (such as a ravenous, quick-moving monster) always emerged in the upper right corner of the display. In fact, Mozer also represented things at a coarser, global scale, another example of multiple models being used to simplify learning and reasoning.

So neural networks, though they have the advantages of flexibility and plasticity, also make referential assumptions, just as conventional models do. And the choice of appropriate sorts of reference is just as important for such networks as it is for the logical systems we discussed earlier in the chapter. Indeed, it may be even more important.

When a neural network fails to perform a task adequately, fixing the problem may be quite complicated, as it may not be possible to explain why the network takes an inappropriate action or ignores some important aspect of the situation. In the logical system, it is at least possible to say "the model can't say this easily" or "this factor is not represented".

In the case of the neural network, whose inputs may include all the "details" of the situation in some formulation, it is more difficult to say why some particular detail or combination of details is not recognized by the network. One of the penalties of these networks' admirable flexibility is a certain inexplicability which makes repair or alteration more difficult.

The Importance of Systematicity

In the mathematical account of reference, one of the most important characteristics of the reference between theory and structure is the systematicity of the translation between expressions and subexpressions in the theory and elements in the structure.

Systematicity remains important because it is the basis for the reliability of the connection between the interface and its subject. Even for a neural network, the reliable connection of inputs to the world is of vital importance if the system is to act and learn effectively. All of the purposes of models --- safety, completeness, simplicity, and community --- are served by the systematicity of the mapping.

But exactly what does systematicity mean for the inputs to a neural network such as RAMBOT? It is clear when we are translating expressions into relations that systematicity has to do with the consistency of the mapping in more complex expressions. What does it mean for a neural network whose inputs are finite and cannot be readily composed into new combinations?

Though the inputs to the neural network are finite, the aspects of the world it is describing are not. Each situation the system might find itself in --- at each successive moment --- is different in some aspects, if only by the progression of time we observe as outsiders. Systematicity demands that the inputs map to the same aspects of different situations. Only with this guarantee can the network's behavior and learning be reliable.

This also gives us a way, when looking at a model, to figure out what patterns of reference it involves. If we look for systematicity --- across both expressions and situations --- we will have found the model's reference relations.

To look at our first examples with this tool, we know that the reference relations for the computer dates were years in the 20th century because those were the phenomena for which the mapping was systematic. For the part of human models of time which we discussed, the reference was to events and their ordering, because that was what we saw preserved or lost as we asked questions and probed knowledge. And for the Newtonian inertial frames, we can tell that they refer to absolute distances and intervals because that is exactly what is systematically preserved by the Galilean transform upon which Newton relied. This also shows us that Einstein's innovation was a change in reference for the theory that included the Lorentz transformation, because the new theory systematically transformed relative distances and intervals and did away with the need for two different kinds of measurements.
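The contrast between the two transforms can be written out for motion along a single axis, for a frame moving at speed v, with c the speed of light. These are the standard textbook forms, given as a gloss on the argument rather than in the notation of the earlier chapter. Under the Galilean transform, time intervals, and distances between simultaneous events, come out the same in every frame; under the Lorentz transform neither does, but the combined interval is preserved:

```latex
\begin{aligned}
&\text{Galilean:} && x' = x - vt, \quad t' = t
  \;\Rightarrow\; \Delta t' = \Delta t, \quad \Delta x' = \Delta x \ \text{when } \Delta t = 0 \\
&\text{Lorentz:} && x' = \gamma\,(x - vt), \quad t' = \gamma\left(t - \frac{v x}{c^{2}}\right),
  \quad \gamma = \frac{1}{\sqrt{1 - v^{2}/c^{2}}} \\
& && c^{2}\Delta t'^{2} - \Delta x'^{2}
  = \gamma^{2}\left(1 - \frac{v^{2}}{c^{2}}\right)\left(c^{2}\Delta t^{2} - \Delta x^{2}\right)
  = c^{2}\Delta t^{2} - \Delta x^{2}
\end{aligned}
```

In these terms, what each transform maps systematically from frame to frame is what each theory refers to: separate times and distances for Newton, the single combined interval for Einstein.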

Systematicity is not just a relation between formal expressions, as mathematical model theory took it to be; it can also include procedures for turning descriptions in a theory into actions or experiments in the world. By changing patterns of reference while maintaining the constraint of systematicity, we transform models by "choosing" what they are about. But another way of transforming models is to change the ways they are used, to which we now turn.

Access

The pivotal role played by the "user" of the model is what makes access important. Access involves the connection of the logic of the model to the structure and process of the system using it. In the systems model of models, this "user" could be a computer program, a human being or human community, or even a part of a human organization or a biological or ecological system. In this book, we will speak mostly of computer programs and human beings and communities, but it is interesting to think about what the view has to say about complex biological and ecological systems.

Access is connected to the effort a system expends to use and manipulate a model. In the previous chapter, I compared addition in an analog model (which took months) to addition in a digital model (which took minutes). The striking difference between these models is one of access: the digital model is much easier to work with for purposes of addition and most other arithmetic operations.

But the digital model is also harder to work with for other purposes, such as comparing absolute magnitudes: comparing the lengths of two analog models is always the same as comparing the magnitudes they represent. This is not the case for digital models when the magnitudes are of similar size, since writing "90" takes the same space as writing "20". Access, like reference and inference, is necessarily tied to purpose rather than to any absolute measure.
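A small Python sketch makes the contrast concrete. It is an illustration under simple assumptions, not the author's formalism: an "analog" magnitude is modeled as a row of strokes whose length is the magnitude, a "digital" magnitude as a decimal string, and the function names are hypothetical.

```python
def analog(n: int) -> str:
    # Analog model: the representation's length *is* the magnitude.
    return "|" * n


def digital(n: int) -> str:
    # Digital model: decimal digits.
    return str(n)


def digital_greater(s: str, t: str) -> bool:
    # For non-negative integers without leading zeros: more digits means
    # bigger; with equal length, the first differing digit decides.
    if len(s) != len(t):
        return len(s) > len(t)
    return s > t


a, b = 90, 20
print(len(analog(a)) > len(analog(b)))   # True: just compare lengths
print(len(digital(a)), len(digital(b)))  # 2 2: length tells us nothing here
print(digital_greater(digital(a), digital(b)))  # True, but only via a procedure
```

The analog comparison is immediate, while the digital one needs an extra rule; for addition the advantage runs the other way.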

Access is one of the primary ways in which models can be modified to achieve the purpose of "simplicity". The reason is that we can change the way a model is accessed without changing its underlying logic or patterns of reference, giving us something which "still works" but is easier to use. We can see this kind of model change as similar to reorganizing one's kitchen, workshop, or computer desktop so that the things used most are easiest to get at. However, there is an important constraint on these sorts of changes: we can't have everything next to us. This makes convenience into what scientists call a "conserved quantity".

Convenience is a conserved quantity

When we hear about "conservation" in the popular press, it is usually in the context of needing to limit our use of some scarce resource such as paper or water or oil. When we hear about "conservation" in science, it usually denotes some way of measuring something so that there is always the same "amount" of whatever is being measured. The popular definition is really just half of the scientific definition: the popular definition says "there's not going to be any more" while the scientific definition adds "there's not going to be any less either". Of course, the popular and scientific voices are often talking about different things, but in both cases you have quantities which one can divide and rearrange but whose total cannot change.

What does it mean that convenience is a conserved quantity? It means that when we build a model, we need to make choices about which things we make convenient and which things we make less convenient or even awkward to access. Within any given model, we cannot make everything equally convenient; in choosing to make something convenient, we will almost always end up making something else less convenient. If the loser in this tradeoff is seldom used, the overall convenience of the model may be improved, as the increased convenience of the winner makes up for the new awkwardness of the loser.

This tradeoff is still an issue even if we simply decide to have two completely separate models (for instance, Ptolemaic for stars and Copernican for planets). We are still making the overall model more complicated because we need to keep track of which model we are using. This is like having two desks, where each desk is conveniently organized but we will sometimes need to switch from desk to desk (which isn't very convenient!).

Different models have different purposes because our purpose determines which things we want to make convenient and which it is acceptable to make awkward. A different purpose or set of purposes means we want to rearrange our choices about convenience and access, giving us a different model.

Why is Convenience Conserved?

To see why this tradeoff holds, let's take a very simple example, drawn from Longfellow's poem "Paul Revere's Ride", that illustrates a basic theorem of what is called "information theory". According to the poem (which differs somewhat from the actual history), when Paul Revere set out to warn the colonists about the British march on Concord, he faced the problem that he could not move much faster than the British troops, no matter how fast his horse or how desperate his mission. Yet he needed to tell his comrades how the British were moving, which he couldn't know until they started moving!

His solution was to leave before he had the information and have the information travel by a different channel: lights in the tower of the Old North Church. He instructed his friend to find out how the British were leaving and to light lamps in the steeple: one lamp if the British were coming by land, two if they were coming by sea. Revere, across the river in Charlestown, could see the lights and carry the message when the British had barely started.

Suppose that Revere and his comrade had wanted to convey more information, such as whether the British were bringing any artillery, how many troops they were gathering, and so forth. It would be necessary to use more lamps and more spaces on the steeple, and for his friend left in Boston to trudge up and down the tower steps more and more times. Because some messages take more effort than others, the patriots would have to decide which messages were more likely to be needed or more important to get through quickly.
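One standard result in information theory that makes this tradeoff precise is the Kraft inequality; choosing it here is our illustration, since the text does not name a specific theorem. Treat each message as a sequence of on/off signals, with no message's sequence being the beginning of another's (a "prefix code"). Then a code with sequence lengths l1, l2, ... exists exactly when the sum of 2^(-l) over all messages is at most 1. Shortening one message spends more of that fixed budget, so others must grow longer. The function name `kraft_sum` is hypothetical.

```python
from fractions import Fraction


def kraft_sum(lengths: list[int]) -> Fraction:
    """Sum of 2**-l over codeword lengths; a binary prefix code exists iff <= 1."""
    return sum((Fraction(1, 2 ** l) for l in lengths), Fraction(0))


# Four messages, all equally convenient: two signals each.
balanced = [2, 2, 2, 2]
# Make one message very convenient (one signal); two others must get longer.
favoured = [1, 2, 3, 3]
# Try to favour one message without penalizing any other: impossible.
too_greedy = [1, 2, 2, 2]

for lengths in (balanced, favoured, too_greedy):
    s = kraft_sum(lengths)
    print(lengths, s, "feasible" if s <= 1 else "infeasible")
```

The total budget of 1 is the "conserved quantity": it can be divided among messages in different ways, but never enlarged.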

By being clever and using position or color as well as the number of lights, Revere and his friend might be able to convey more messages with fewer lights, but they would still have to choose which messages used fewer lights. In fact, by using a line of lights along the edge of the steeple, they could take advantage of the same logarithmic reduction that digital numerals exploit and describe exponentially many messages with a given number of lamps (for instance, 256 messages with 8 lamps). But as the number of possible messages grew, choices would still have to be made.
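The arithmetic behind "256 messages with 8 lamps" can be checked directly. This sketch idealizes each lamp position along the steeple edge as an independent on/off slot (an assumption for illustration). It also shows why choices remain: patterns that need few lit lamps, and so little trudging up the stairs, are scarce.

```python
from itertools import product
from math import comb

LAMPS = 8

# Each pattern is a tuple of on/off states, one per lamp position.
patterns = list(product((False, True), repeat=LAMPS))
print(len(patterns))  # 256 == 2 ** 8

# Counting only *how many* lamps are lit (the "one if by land" scheme)
# distinguishes far fewer messages with the same lamps.
print(len({sum(p) for p in patterns}))  # 9: zero through eight lamps lit

# Number of patterns with exactly k lamps lit, for k = 0..8.
print([comb(LAMPS, k) for k in range(LAMPS + 1)])  # [1, 8, 28, 56, 70, 56, 28, 8, 1]
```

Only nine of the 256 patterns use at most one lamp, so deciding which messages get those cheap patterns is still a choice about convenience.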

Because models rely on differences (a light on the steeple or not) and differences involve choices (putting it there or not), models always involve choices about what will be described. This is one of the limits on the power of models, since these choices and the reasons which motivate them may turn out to be inappropriate or incorrect.

Constructive Ignorance

What is obvious in the case of encodings like the lamps on the tower is less obvious with more complicated models. Sometimes, a model can readily be made simpler without introducing any additional complexity elsewhere. For instance, the subway maps in the introduction both describe the same subway system, but, as we commented, the idealized map was more useful than the positionally accurate map. For the purposes of planning one's transfers, nothing is made more complex in this map. However, it doesn't accurately represent either the physical location of the stations or the distance and time between stops. As we pointed out before, if we wanted to use that map for scheduling transfers (where distance and time between stations are important) or for figuring out how to reroute one's commute during a rail strike (where physical location is important), the added detail and description might easily make the map more complicated than the positionally accurate map it replaced.

Designing models for humans or machines involves making choices. The choices we make depend on the purposes we have, and a multiplicity of purposes gives us a multiplicity of models.

Familiarity: Access and Metaphor

While one element of access is the choices that have to be made regarding convenience, another element is the actual connection of the model to the structure of the system which is "using" the model. Because these systems are often quite complex (using multiple models, in fact), this mapping, a kind of "inverse reference", is very important. This importance is why metaphor, the systematic linking of different structures in thought and language, plays such an important role in models.

In the late 1970s, a linguist (George Lakoff) and a philosopher (Mark Johnson) began arguing for the radical assertion that nearly all language use is metaphorical. Previously, linguists and philosophers had claimed that metaphor was a "marginal" phenomenon, of interest primarily for its historical role in language change and evolution. They thought that the only connection between "prices rose" and "balloons rose" was historical. According to this established theory, there had once been an act of imagination connecting them, but the shared words no longer signaled any connection between their meanings.

What Lakoff and Johnson demonstrated was that everyday language remained systematic in its use of metaphor. This is exactly the sense of systematic which we used in describing formal model theory above: metaphorical mappings between systems of words ("prices rise" and "balloons rise") retained the relations between the words (if "balloons crash", then "prices crash"). Interestingly, this became visible in the words or turns of phrase which were not used: one does not generally say "prices rose until they hit the floor", and linguists and non-linguists alike will sense that something sounds a little funny about it.

Given this strong evidence of how meanings were connected, Lakoff and Johnson began two parallel studies. The first reexamined all of human cognition and language in the light of their new understanding of understanding. The second looked at what common foundations these metaphors might have, which they believed lay in the physical and bodily experience of human beings. From this second agenda, we come to the question of access in models where the "user" is a human being.

Convenience is important to access, but so is "familiarity". In the same way that the two-digit model of calendar years fitted the architecture of early computers, the metaphors articulated in the logic of language fit the architecture of human language users. In language, we do not need to expend great effort to "figure out" what is meant by "prices fell to the floor" or "negotiations turned around". Our experience and inner lives, which we substantially share by being of similar construction in relatively similar worlds, connect naturally to such models, linking the logic of our experience to the logic of the model.

This has two important consequences. The first is that when we build computer programs, whose "experience" and "bodies" are different from ours, we need to understand how the structure of our own bodies, experience, and culture is linked to the models we can comfortably use. This doesn't necessarily mean that we need to give bodies and experience to computers, but it does mean that we need to understand the natural modes of human activity and human models. Otherwise, we will end up with computers which are dehumanizing, hard to use, and possibly more trouble than they are worth.

The second consequence is that we cannot get away from thinking about the user and their purposes (which is often the human element) when thinking about models. In the next part of this book, I will use this way of looking at models, in terms of reference, access, and inference, to examine a variety of models and their uses. In addition to describing the logic of the models themselves, which is necessary, it will be important to repeatedly ask two questions:

Copyright (C) 1997, 1998 by Kenneth Haase
Draft, not for citation or circulation