Free Novel Read

Once Upon an Algorithm Page 2


  Problem Solving and Its Limitations

  An algorithm is a method for solving problems, be it finding the square root of a number or baking a cake. And computer science is a discipline that is concerned with systematic problem solving.

  Of the many problems that can be solved through algorithms, two deserve to be discussed in detail. In chapter 5, I explain the problem of searching, one of the most frequently used computations on data. Chapter 6 then explains the problem of sorting, which illustrates a powerful problem-solving method and also the notion of intrinsic problem complexity. In chapter 7, I describe the class of so-called intractable problems. Algorithms exist for these problems, but take too long to execute, so the problems are not solvable in practice.

  Chapters 5, 6, and 7 clarify

  why searching can be difficult and time consuming,

  methods to improve searching,

  different algorithms for sorting,

  that some computations can support others, such as sorting can support searching,

  that algorithms with exponential runtimes cannot really be considered solutions to problems.

  Why does it matter?

  We spend countless hours of our lives searching, be it for our car keys or for information on the internet. It is therefore helpful to understand searching and to know about techniques that can help make it more efficient. In addition, the problem of searching illustrates how the choice of representation affects the efficiency of algorithms, reflecting John Dewey’s observation, “A problem well put is half solved.”1

  Knowing when a problem cannot be solved efficiently is just as important as knowing algorithms for solvable problems because it helps us avoid searching for efficient solutions where none can exist. It suggests that in some cases we should be content with approximate answers.

  In the realm of technology, the most obvious instance of searching is given by internet search engines such as Google. Search results for a query are not presented in arbitrary order but are typically sorted according to their anticipated importance or relevance. The knowledge about the hardness of problems is used to develop algorithms that compute approximate solutions where exact solutions would take too long to compute. One famous example is the traveling salesman problem, which is to find the round-trip that visits a certain number of cities in an order that minimizes the total distance traveled.

  Knowledge about the lack of an efficient algorithm to solve a problem can also be exploited in a positive way. One example is public key encryption, which enables private transactions on the internet, including managing bank accounts and shopping online. This encryption only works because currently no efficient algorithm is known for prime factorization (that is, writing a number as the product of prime numbers). If that were to change, public key encryption would not be safe anymore.

  Language and Meaning

  Any algorithm has to be expressed in some language. Current computers cannot be programmed in English because natural languages contain too many ambiguities, which humans, but not machines, can deal with easily. Therefore, algorithms that are to be executed by machines have to be written in languages that have a well-defined structure and meaning.

  In chapter 8, I explain what a language is and how we can define its syntax. A syntax definition for a language ensures that each of its sentences has a well-defined structure, which is the basis for understanding and defining the meaning of sentences and languages. In chapter 9, I discuss the meaning of languages and the problem of ambiguity.

  Chapters 8 and 9 describe

  how a grammar defines a language,

  how a grammar can be used to construct all sentences belonging to the language,

  what syntax trees are,

  how syntax trees represent the structure of sentences and resolve ambiguities in the meaning of sentences.

  Why does it matter?

  We employ languages to communicate meaning. For communication to work, the communicating partners have to agree on what counts as a proper sentence and what each sentence means. For example, instructions in recipes must be precise about measurements, oven temperature, cooking time, and so on, to produce the desired outcomes.

  In most areas of our lives we have produced special terminology and languages that facilitate more effective communication. This is particularly true in computer science, in which a crucial part of communication happens via machines. Since machines are inferior to humans in their abilities to process language, the precise definition of languages is important to ensure that programmed machines behave as expected.

  In the realm of technology, a widely used programming language is the formula language of spreadsheets. Anybody who has ever put a formula into a spreadsheet has written a spreadsheet program. Spreadsheets are notorious for being erroneous at times and causing losses of billions of dollars because of incorrect formulas. Another ubiquitous language is HTML (hypertext markup language). Whenever you load a web page onto your laptop, PC, or cell phone, it is very likely that the content is presented in your browser in HTML, which is a language that makes the structure of the web page explicit and presents it in an unambiguous way. While HTML is just for representing information and does not itself describe computation, another language any web browser understands these days is JavaScript, a language that especially defines the dynamic behavior of web pages.

  Control Structures and Loops

  Instructions in an algorithm have two distinct functions: they either directly manipulate data, or they decide which instructions are to be performed next and how often. The latter kind of instructions are called control structures. Just like the plot of a movie or a story ties together individual actions and scenes into a coherent narrative, control structures build algorithms out of individual instructions.

  In chapter 10, I explain different control structures and focus specifically on loops, which are used for expressing the repetition of actions. An important question, discussed in chapter 11, is whether a loop terminates or runs forever and whether this can be decided by an algorithm.

  Chapters 10 and 11 discuss

  what control structures are,

  why control structures are a crucial part of any language for expressing algorithms,

  how repetitive tasks can be expressed by loops,

  what the halting problem is, and how it exemplifies a fundamental property of computation.

  Why does it matter?

  You have to grease the pan before baking the pancakes. The order of steps in recipes matters. Moreover, recipes sometimes contain decisions that are based on properties of the ingredients or cooking utensils. For example, if you are using a convection oven, you have to use a shorter baking time or a lower temperature (or both). A recipe contains a loop when it instructs you to repeatedly perform some actions, such as repeatedly adding an egg and whisking the batter for a bundt cake.

  The distinction between control structures and other operations amounts to the difference between doing something and organizing when and how often to do it. For any process or algorithm we may want to know whether it does what it is supposed to do, or even simply whether it can be completed at all. This rather simple question, posed by the halting problem, is only one example of many properties of algorithms that one would like to know about. Knowing which properties of algorithms can be determined automatically by other algorithms tells us about the reach of algorithms and the limits of computation.

  In the realm of technology, control structures are used wherever algorithms are used, and thus they are everywhere. Any information sent over the internet is transmitted in a loop repeatedly until it has been properly received. Traffic lights are controlled by endlessly repeating loops, and many manufacturing processes contain tasks that are repeated until a quality measure is met. Predicting the behavior of algorithms for unknown future inputs has many applications in security. For example, one would like to know if a system is vulnerable to attacks by hackers. It also applies to rescue robots that have to be used in
situations different from the ones they are trained in. Accurately predicting robot behavior in unknown situations can mean the difference between life and death.

  Recursion

  The principle of reduction—the process of explaining or implementing a complex system by simpler parts—plays an important role in much of science and technology. Recursion is a special form of reduction that refers to itself. Many algorithms are recursive. Consider, for example, the instructions for looking up a word in a dictionary that contains one entry per page: “Open the dictionary. If you can see the word, stop. Otherwise, look up the word in the dictionary part before or after the current page.” Notice how the look-up instruction in the last sentence is a recursive reference to the whole process that brings you back to the beginning of the instructions. There is no need for adding something like “repeat this until the word is found” to the description.

  In chapter 12, I explain recursion, which is a control structure but is also used in the definition of data organization. In chapter 13, I illustrate different approaches for understanding recursion.

  Chapters 12 and 13 examine

  the idea of recursion,

  how to distinguish between different forms of recursion,

  two different methods to unravel and make sense of recursive definitions,

  how these methods help in understanding recursion and the relationship between its different forms.

  Why does it matter?

  The recursive definition of “season to taste” is as follows: “Taste the dish. If it tastes fine, stop. Otherwise, add a pinch of seasoning, and then season to taste.” Any repeated action can be described recursively by using the action to be repeated (here “season to taste”) in its description and a condition when to stop.

  Recursion is an essential principle for obtaining finite descriptions of potentially infinite data and computations. Recursion in the grammar of a language facilitates an infinite number of sentences, and a recursive algorithm allows it to process inputs of arbitrary size.

  Since recursion is a general control structure and a mechanism for organizing data, it is part of many software systems. In addition, there are several direct applications of recursion. For example, the Droste effect, in which a picture contains a smaller version of itself, can be obtained as a result of a feedback loop between a signal (a picture) and a receiver (a camera). The feedback loop is a recursive description of the repetitious effect. Fractals are self-similar geometric patterns that can be described through recursive equations. Fractals can be found in nature, for example, in snowflakes and crystals, and are also used in analyzing protein and DNA structures. Moreover, fractals are employed in nanotechnology for designing self-assembling nanocircuits. Self-replicating machines are a recursive concept because once they are operating, they reproduce copies of themselves that reproduce further copies, and so on. Self-replicating machines are investigated for space exploration.

  Types and Abstraction

  Computation works by transforming representations. But not every transformation is applicable to every representation. While we can multiply numbers, we cannot multiply lines, and similarly, while we can compute the length of a line or the area of a rectangle, it does not make sense to do that for a number.

  Representations and transformations can be classified into different groups to facilitate the distinction between transformations that are viable and those that don’t make sense. These groups are called types, and the rules that determine which combinations of transformations and representations are allowed are called typing rules. Types and typing rules support the design of algorithms. For example, if you need to compute a number, you should employ an operation that produces numbers, and if you need to process a list of numbers, you have to use an operation that accepts lists of numbers as input.

  In chapter 14, I explain what types are and how they can be used to formulate rules for describing regularities of computations. Such rules can be used to find errors in algorithms. The power of types lies in their ability to ignore details about individual objects and therefore to formulate rules on a more general level. The process of ignoring details is called abstraction, which is the subject of chapter 15, where I explain why abstraction is central to computer science and how it applies not only to types but also to algorithms, and even computers and languages.

  Chapters 14 and 15 discuss

  what types and typing rules are,

  how they can be used to describe laws about computation that help to detect errors in algorithms and to construct reliable algorithms,

  that types and typing rules are just a special case of the more general idea of abstraction,

  that algorithms are abstractions of computation,

  that types are abstractions of representations,

  that runtime complexity is an abstraction of execution time.

  Why does it matter?

  If a recipe requires the opening of a can of beans, you’d be surprised if someone approached this task using a spoon because that would violate a typing rule that deems spoons inadequate for that task.

  The use of types and other abstractions to describe rules and processes is ubiquitous. Any procedure that has to be repeated suggests an algorithmic abstraction, that is, a description that ignores unnecessary details and replaces variable parts by parameters. Recipes also contain algorithmic abstractions. For example, many cookbooks contain a section describing basic techniques, such as peeling and seeding tomatoes, which can then be referred to in recipes by simply requesting a particular amount of peeled and seeded tomatoes. Moreover, the roles of the different objects that take part in such an abstraction are summarized by types that characterize their requirements.

  In the realm of technology, there are many examples of types and abstractions. Examples of physical types are all kinds of differently shaped plugs and outlets, screws and screwdrivers and drill bits, and locks and keys. The shapes’ purpose is to prevent improper combinations. Software examples of types can be found in web forms that require entering phone numbers or email addresses in a specific format. There are many examples of costly mistakes that were caused by ignoring types. For instance, in 1998, NASA lost its $655 million Mars Climate Orbiter because of incompatible number representations. This was a type error that could have been prevented by a type system. Finally, the notion of a computer is itself an abstraction of humans, machines, or other actors that are capable of executing algorithms.

  How to Read This Book

  Figure 2 presents an overview of the concepts discussed in this book and how they are related. Chapters 7, 11, and 13 (dark-shaded boxes in figure 2) contain more technical material. These chapters may be skipped and are not required in order to understand the rest of the book.

  While the material in this book is arranged in a specific order, it doesn’t have to be read in that order. Many of the chapters can be read independently of one another, even though later chapters of the book sometimes refer to concepts an examples that have been introduced earlier.

  The following provides guidance for selecting chapters to read and the order in which to read them. It also provides exits and shortcuts that allow readers to skip ahead while reading chapters. While I discuss the computing concepts via events, people, and objects that occur in stories, I also sometimes introduce new notation and work through examples to demonstrate important aspects in more detail. Hence, some parts of the book are easier to follow than others. As a reader of many popular science books, I am well aware of the fact that one’s appetite for such details may vary. That is why I hope this guidance will help the reader to better navigate the book’s content.

  Figure 2 Concepts of computation and how they are related.

  I suggest reading chapters 1 and 2 first, since they introduce fundamental concepts of computing that appear throughout the book, such as algorithm, parameter, computer, and runtime complexity. These two chapters should be easy to follow.

  The other six topic areas (light-shaded bo
xes in figure 2) are largely independent of one another, but within each topic area the chapters should be read in order. Chapter 4 introduces several data structures, so it should be read before chapters 5, 6, 8, 12, and 13 (see diagram).

  Finally, the Glossary at the end of the book provides additional information about how the chapters relate by grouping definitions into topic areas and cross-linking their entries.

  Part I

  ALGORITHMS

  Computation and Algorithms

  Hansel and Gretel

  Getting up

  It is early in the morning. The alarm clock rings. You move your body out of bed—eventually. You get dressed. This simple daily getting-up routine solves a recurring problem through a series of well-defined steps. In computer science such a routine is called an algorithm. Taking a shower, brushing your teeth, having breakfast, and so on, are more examples of algorithms that solve specific problems.

  But wait a minute. Except perhaps for not getting enough sleep, what is the problem here? We usually do not perceive mundane everyday activities as producing solutions to problems. Maybe this is because these problems have obvious solutions or the solutions are easily obtained. However, the word problem is commonly used for situations or questions that have well-known solutions. Think of exams, which contain problems with well-defined answers. A problem is thus any question or situation that demands a solution, even if it is clear how to get it. In this sense, having to get up in the morning is a problem that has a well-known method for producing a solution.