- Home
- Martin Erwig
Once Upon an Algorithm Page 7
Once Upon an Algorithm Read online
Page 7
Similarly, the pebbles in the forest are signifiers that represent locations for Hansel and Gretel and belong to the computation representation. In addition, the individual locations represent positions of danger in the problem representation. To distinguish between different locations, one could further quantify the degree of danger by the distance of a location from Hansel and Gretel’s home. Moving from pebble to pebble means simply a change of location in the computation representation but also means a decrease in danger in the problem representation if the location moved to is closer to home. The transitivity of signs is the reason “He worked at CHH” means that he worked at Charing Cross Hospital, not that he worked at “Charing Cross Hospital,” which would not make any sense, since one cannot work at the name of a hospital.
Making Sense of Signifiers
A sign that works across different representation levels provides an example of a signifier that is bound to multiple signifieds. The sign “1” in the roulette example signifies the number one and a bet; the pebbles used by Hansel and Gretel represent locations as well as danger; and the collection of pebbles represents a path as well as a way from danger to safety. Thus any abbreviation represents the name it stands for as well as the concept the name stands for.
However, one signifier can also represent different, unrelated concepts, and it is also possible for one concept to be represented by different, unrelated signifiers. For example, “10” signifies the number ten in the decimal representation and the number two in the binary representation. Moreover, the number two is represented by the signifier “2” in the decimal representation and by the signifier “10” in the binary representation. Of course, multiple representations exist also on the level of problem representation. Clearly, the number one can be used to represent other things than a bet on black at the roulette table.
These two phenomena are well known in linguistics. A word is said to be a homonym if it stands for two or more different signified concepts. For example, the word “trunk” can stand for the upright part of a tree, the nose of an elephant, or the luggage compartment of a car. In contrast, two words are said to be synonyms if they both represent the same concept. Examples are “bike” and “bicycle” or “hound” and “dog.” In the context of computation, homonyms raise several important questions.
For example, if one signifier can represent different signifieds, which of the representations is actually active when the signifier is used? Not surprisingly, the representation that a signifier invokes depends on the context in which it is used. For example, when we ask the question, What does “CCH” stand for?, we are inquiring about the meaning of the abbreviation, which is the name “Charing Cross Hospital.” In contrast, a question such as, Have you ever been to “CCH”? makes use of the fact that it refers to a hospital and thus picks out the second representation. Moreover, the signifier “10” represents ten or two, depending on whether one uses the decimal or the binary representation. The story of Hansel and Gretel also illustrates that usage context matters for a sign to play a particular representational role. For example, when the pebbles were lying in front of Hansel and Gretel’s house, they didn’t represent anything in particular. In contrast, once they were deliberately placed in the forest, they represented locations used for finding a path.
The same signifier can also have different meanings for different agents interpreting a sign. For example, the breadcrumbs used by Hansel and Gretel during the second night signified locations for Hansel and Gretel. However, the birds of the forest interpreted them as food. Both interpretations of the breadcrumbs make sense and work, from either Hansel and Gretel’s or the birds’ point of view. It doesn’t require much imagination to see that homonyms can cause problems in algorithms, since they essentially present an ambiguity that has to be resolved somehow. Why would anyone want or even need one name representing different values in an algorithm? See chapter 13, where I also explain how to resolve the apparently resulting ambiguities.
Finally, it is also possible to be mistaken about a representation and to associate an incorrect signified with a signifier. An example is Sherlock Holmes’s inference that the inscription on Dr. Mortimer’s walking stick represents his retirement, whereas it actually represents Dr. Mortimer’s wedding anniversary (see figure 3.1). This particular misrepresentation is discussed and resolved as part of the story The Hound of the Baskervilles itself.
Figure 3.1 Signs are the basis for representation. A sign consists of a signifier that represents some concept, called the signified. One signifier can stand for different concepts.
The correctness of representation is crucial for computation because if a computation receives an incorrect representation as input, it will produce incorrect results. This fact is sometimes referred to as “garbage in, garbage out.” Unsurprisingly, computations based on incorrect input that lead to incorrect results can have devastating consequences. If the pebbles represented a path farther into the forest, no correct path-finding algorithm could help find Hansel and Gretel their way home, and they would die in the forest.
A stark reminder of the importance of carefully choosing representations is the loss of the Mars Climate Orbiter, an unmanned spacecraft that was launched in 1998 by NASA to explore the climate and atmosphere of Mars. During a maneuver to correct its trajectory, the spacecraft came too close to the surface and disintegrated. The reason for the failed maneuver was the use of two different representations of numbers by the control software and the spacecraft. The software computed thrusts in English units while the thrust controller expected numbers in metric units. This failure of representation came with a heavy price tag of $655 million. I discuss methods to avoid these kind of mistakes in chapter 14.
Three Ways to Signify
Given the importance of accurate representations, how is the relationship between a sign and what it signifies established? This can happen in several different ways, and signs can be classified accordingly. The logician, scientist, and philosopher Charles Sanders Peirce identified three different kinds of signs.
First, an icon represents an object based on its similarity or likeness to the object. An example is a drawing of a person that represents the person by highlighting specific features. An obvious example of an iconic representation in The Hound of the Baskervilles is the portrait of Sir Hugo Baskerville, which represents him through likeness. The portrait also looks similar to the murderer and is thus another example of a signifier that can stand for different signifieds. The fact that the portrait is effectively two signs helps Sherlock Holmes solve the case. Further examples are the abbreviations CCH and MRCS when they are used to represent the phrases they stand for. Here the likeness is established by the characters that a phrase and its abbreviation have in common. Finally, Sherlock Holmes employs a map of the Devonshire moor to understand the location where the murder has happened. A map is iconic, since the features it contains (paths, rivers, forest, etc.) resemble, mostly in shape and position, the objects they represent.
Second, an index represents an object through some lawlike relationship that lets the viewer of the index infer the object through this relationship. An example is a weather vane from whose direction the wind direction can be inferred. Other examples are all kinds of gauges that have been engineered as indexes for different physical phenomena (temperature, pressure, speed, etc.). The saying “Where there is smoke, there is fire” is based on smoke being an index for fire. An index sign is determined by the object it signifies through the lawlike relationship between them. Other important index signs in The Hound of the Baskervilles (and in other Sherlock Holmes stories) are footprints. For example, the dog footprints found near the deceased Charles Baskerville signify a gigantic hound. Moreover, the fact that the footprints stop at some distance from Sir Charles is interpreted by Sherlock Holmes to indicate that the hound did not have physical contact with him. The special form of Sir Charles’s footprints indicates that he was fleeing the hound. Another index is the amount of ashes
from Sir Charles’s cigar found at the crime scene, indicating the time he was waiting at the location of his death. Incidentally, Peirce himself used the example of a murderer and his victim as an example of an index. Applied to the story, this means the dead Sir Charles is an index for his murderer.
Third, a symbol represents an object by convention only; no likeness or lawlike connection is involved in the representation. Since the link between the signifier and the signified is completely arbitrary, the creator and the user of the sign must agree on the definition and interpretation of the sign for it to work. Most modern languages are symbolic. The fact that the word “tree” stands for a tree cannot be inferred but is a fact that has to be learned. Similarly, that “11” is a symbol for eleven as well as three and that the pebbles are symbols for locations are conventions. Of the signs we have mentioned from The Hound of the Baskervilles, the abbreviations MRCS and CCH are symbols if used to represent the surgeon society membership and the hospital, respectively, because they bear no resemblance nor do they result from any lawlike relationship. Also, the symbol 2704 represents a cab that Holmes and Watson follow to identify a person suspected to be threatening the Baskerville heir Sir Henry.
Using Representations Systematically
Having distinguished icon, index, and symbol, we can now look at how these different forms of representation are used in computation. Since computation works through transforming representations, the different representation mechanisms of icon, index, and symbol lead to different forms of computation.
For example, since icons represent through likeness, they can be transformed to reveal or hide specific aspects of the represented object. Photo editing tools offer numerous effects to alter pictures in a systematic way, for example, by changing colors or distorting image proportions. The computation effectively changes the likeness of the icon. Another method of computing with iconic representations that is relevant to the profession of Sherlock Holmes is the creation of drawings of suspects, facial composites, according to descriptions of eyewitnesses. The eye witness reports features about the face, such as the size and shape of the nose, or the color and length of the hair, which are interpreted by a police sketch artist as drawing instructions to produce a drawing of the suspect. The computation of a suspect drawing is the result of an algorithm that is given by the eye witness and executed by the police sketch artist. Given its algorithmic nature, it is not surprising that this method has been automated.
The method is due to Alphonse Bertillon, whose ideas about anthropometry, the measurement of people’s body parts, were adopted by the Paris police in 1883 as a method for the identification of criminals. He created a classification system for facial features that was originally used to find specific suspects in a large collection of mug shots of criminals. This method is an example of a computation that uses sketches for searching, an important algorithmic problem (see chapter 5). Sherlock Holmes is an admirer of Bertillon’s work, even though he does not speak too highly of him in The Hound of the Baskervilles. Another computation with sketches is the process of inferring the identity of suspects from facial composites. This computation is effectively establishing a sign where the signifier is the sketch and the suspect is the signified. A sign is also established when a suspect is recognized with the help of a sketch or picture. This happens when Sherlock Holmes recognizes the murderer in the portrait of Sir Hugo Baskerville.
As an example of a computation with an index sign, recall the map of the Devonshire moor. To find the location where a particular road crosses a river, Sherlock Holmes could compute the intersection of the two lines that represent the river and the road, respectively. In fact, the map representation has the point effectively computed already, so that it could be simply read off the map. Assuming that the map is accurate, the resulting point then would represent the sought location.4 Sherlock Holmes could also compute the length of a path on the map and, using the scale of the map, determine the length of the path in the moor and also the time needed to traverse it. Again, this works if the map is drawn to scale. Index computations exploit the lawlike relationship between sign and signified to move from one signified to another via a transformation of the index.
Computations with symbols are probably the most common in computer science, because symbols enable the representation of arbitrary problems. The most obvious symbol-based computations involve numbers and arithmetic, and we can find such an example in Sherlock Holmes’s first adventure, A Study in Scarlet,5 where he computes the height of a suspect from the length of the suspect’s stride. This is a very simple computation, consisting of only one multiplication, and the corresponding algorithm consists of simply one step.
More interesting from a computational perspective are Sherlock Holmes’s attempts to decode encrypted messages. In The Valley of Fear he tries to understand a message sent to him that starts as follows: 534 C2 13 127 36 …. This code signifies a message. The first task for Sherlock Holmes is to figure out the algorithm that was used to generate the code, since this allows him to decode the message. He concludes that 534 must be a page number of some book, that C2 means “second column,” and that the following numbers represent words in that column.
But is the code really a symbolic sign? Since the code is generated by an algorithm from a given message, it seems that the encoding algorithm establishes a lawlike relationship between the message and its code. Therefore, the code is not a symbol but an index. This illustrates another way in which computation is involved with signs. Whereas interpretation produces the signified for a given signifier, an algorithm for generating index values operates in the opposite direction and generates a signifier from a given signified.
The bottom line is this: Representation is the basis for computation. Its nature and basic properties can be understood through the lens of signs. And much like works of art can be based on many different materials (clay, marble, paint, etc.), computation can be based on different representations. The importance of representation was emphasized in chapter 1: No computation without representation.
In Your Office
You have arrived at your office and face the task of working through a collection of documents. Before you can start the actual work, you have to decide on the order in which to process the documents and how to manage the document collection to ensure that order. These questions are also relevant in other contexts. Think, for example, of a car mechanic who has to repair several different cars or a physician who has to treat a number of patients in the waiting room.
The order in which the elements of a collection (documents or cars or people) are to be processed is often determined by a policy, such as first come, first serve, which means that the elements are dealt with in the order in which they arrive. Such a policy requires that the maintenance of a collection follow a particular pattern that defines the interplay between adding, accessing, and removing elements. In computer science a collection with a specific access pattern is called a data type, and the data type that captures the first come, first serve principle is called a queue.
While queues are widely used to determine the order in which the elements of a collection are to be processed, there are also other strategies. For example, if elements are processed according to some priority instead of in the order they arrive, the collection is a priority queue data type. Some of the documents in your office might be of that type, for example, an urgent inquiry that has to be answered immediately, a memo that you have to respond to by lunch time, or an offer that must go out today. Other examples are patients in the emergency room, who are treated in order of the severity of their symptoms, or travelers who can board an airplane according to their frequent flier status.
Another pattern is the processing of requests in the opposite order of their appearance. While this may seem odd at first, this situation arises quite frequently. For example, imagine you are working on your tax return. You may start with the main tax form, but when one of the fields asks for your deductions, thi
s requires you to fill out another form first. To do this, you have to retrieve the corresponding receipts and add the amounts. You finish dealing with the three kinds of documents in the opposite order in which they became relevant: You first enter the amounts from the receipts and put them away. Then you can complete the deductions form, and finally you go back to working on the main tax form. A collection whose elements are processed in this order is referred to as a stack data type, since it behaves like a stack of pancakes: the pancake that ended up last on the stack will be eaten first, and the bottom-most pancake, which is the first element put on the stack, gets eaten last. The pattern described by a stack data type occurs in a variety of tasks, ranging from baking to assembling furniture. For example, the egg whites have to be whisked before being added to the batter, and the drawer has to be screwed together before being inserted into the cupboard.
Knowing the access pattern for processing the elements of a collection leads to the second question of how to arrange the elements to best support that pattern. Such an arrangement of elements is called a data structure. Let’s take the queue data type as an example and see how it can be implemented in different ways. If you have enough desk space (which may be too optimistic an assumption), you can line up the documents, add them at one end and remove them from the other. Every now and then you shift all documents to move the empty space at the front of the queue to its end. This is similar to how people line up one after another at a coffee shop. Each person enters the line at one end and makes their way to the front after all other people in front of them have left the line. A different method, employed in many government offices, is for everybody to pull a number and then wait to be called for their turn. You can also use this system for the documents in your office by attaching sticky notes with consecutive numbers to them.