About this text

This revision of the book was last updated on May 21, 2025.

This is an old revision of the Remix. The current revision can be viewed here.

This chapter was last updated on February 2, 2025.
Revised subsection "Partial list of changes made (or to be made) to the Remix" after reordering the chapters for the Spring 2024 semester.

This work, "Discrete Math - The MKD Remix" by Mark Kelly Davis, is adapted from “Discrete Math” by Mohamed Jamaloodeen, Kathy Pinzon, Daniel Pragel, Joshua Roberts, and Sebastien Siva. “Discrete Math” is used under CC BY-NC 4.0. All files for “Discrete Math” are available at its associated GitHub repository.

This work, "Discrete Math - The MKD Remix", is licensed under CC BY-NC 4.0 by Mark Kelly Davis.

If you were a student in my course during the Spring 2024 semester, the version of the book you used can be found here.

If you were a student in one of my courses during the Fall 2024 semester, the version of the book you used can be found here.

How does the Remix differ from the original work?

The remixer’s goal is to adapt the original “Discrete Math” to create an OER textbook for a one-semester Discrete Mathematics/Discrete Structures course for Computer Science students who have some experience coding in Java, Python, or another high-level programming language, and have completed an Algebra Ⅱ-level (or equivalent) mathematics course. The Remix is a work-in-progress that will continue to evolve over time toward this goal.

The remixer’s intent is that the Remix:

  • address topics and learning outcomes listed for

  • give students the opportunity to learn new content based on what they already know, then to move toward building a more formal understanding of the content (e.g., pointing out that the set of odd integers and the set of even integers are the two equivalence classes of an equivalence relation, and that the rules for adding and multiplying "odds" and "evens" are an example of modular arithmetic).

If you are looking for an OER textbook for a Discrete Mathematics course intended primarily for Mathematics majors (e.g., one that does not include topics like analysis of algorithms and binary tree traversal), there are many suitable ones that exist. For example, see Oscar Levin’s Discrete Mathematics: An Open Introduction, 4th edition.

About the use of Python in the Remix

The Remix is intended for a course that does not require programming. Python is not part of the course content.

The original “Discrete Math” uses Python code samples throughout the textbook and includes "Introduction to Python" as its 3rd chapter. The Remix repurposes this content: Code samples in the Remix are used as "pseudocode that can run on a computer," with coding that uses "just enough Python" to illustrate important abstract ideas and concepts. Most of the existing Python examples were altered, and many new Python examples were introduced throughout the Remix. Note that, in order to illustrate concepts and ideas in the style of pseudocode, much of the Python code shown in the Remix avoids using built-in functions and often uses less efficient data structures and algorithms! For example, in the chapter "Algorithms & Big-O", code samples for sorting and searching avoid using built-in Python functions in order to illustrate all steps needed by the algorithm. In many cases, a comment can be found near a non-optimal code example that explains or illustrates a more Pythonic way of coding.

Partial list of changes made (or to be made) to the Remix.

  • Terminology, definitions, notation and symbols were changed throughout the Remix to align with other commonly-used textbooks. For example, the Remix defines the set of natural numbers \(\mathbb{N}\) to include the integer 0 as an element; this definition is very common and is in fact a "standard" that appears in International Standard ISO 80000-2:2019, Quantities and units — Part 2: Mathematics.

  • In the chapter "Introducing Discrete Mathematics," informal definitions of foundational mathematical ideas needed in the course are introduced. This is done so that learners can see what they do (or do not) already know and create the necessary basis to learn the course content. In addition, a new Appendix, "On-Demand Math Resources" was written which includes material that learners can refer to as needed.

  • The original chapter "Introduction to Python" was moved to the appendices.

  • The original chapter "Counting" was split into two chapters, "Counting: Arithmetic Techniques" and "Counting: Permutations And Combinations". The first of these chapters is placed near the beginning of the book, but the second is placed much later, after sequences and recurrence relations have been discussed.

  • The order of the chapters "Set Theory" and "Logic" was swapped. New material was inserted into each of the two chapters.

  • A new chapter, "Proofs: Basic Techniques," was written and inserted after "Logic."

  • The chapter "Number Bases" is based on the original chapter "Number Theory," but the content on divisibility, congruence, and modular arithmetic was moved into the remixed chapters "Introducing Discrete Mathematics" and "Relations."

  • The chapter "Sequences and Recursion" is based on the original chapter "Sequences, Recursive Definitions, and Induction," which was split into two new chapters, "Sequences and Recursion" and "Proofs: Mathematical Induction." "Sequences and Recursion" appears before and as a lead-in to "Functions" since sequences are a special case of functions and recursion is often used to define functions.

  • The chapter "Functions" was moved to its new position, several chapters after "Set Theory." This was done for the following reasons:

    • The learner is expected to have a basic working understanding, from previous classes, of the one-to-one correspondence concept: A unique pairing of each element in one set with elements in another set.

    • The learner is expected also to have a basic working understanding of the function concept: A rule/mapping/association that takes certain objects as inputs and assigns each such input to exactly one output object.

    • It is likely that the learner has some ability to work with function notation and operations such as composition and inversion of functions from previous mathematics courses.

    • The remixer felt that a precise, formal definition of function, as well as properties such as injectivity and surjectivity, could be delayed until after learners had used their previous knowledge of functions.

  • A new chapter, "Relations," was written to include topics listed in the ACM/IEEE-CS/AAAI and CCC C-ID courses but absent from the original work, and was inserted after "Functions". This chapter also includes some of the content on divisibility, congruence, and modular arithmetic from the "Number Theory" chapter of the original work.

  • The chapter "Proofs: Mathematical Induction" is based in part on the original chapter "Sequences, Recursive Definitions, and Induction," but the content of this chapter was heavily rewritten and new content was inserted. This chapter was placed immediately before the chapters "Rates of Growth of Functions" and "Algorithms and Their Analysis" so that mathematical induction can be viewed as a way of validating algorithms rather than as just another more complicated proof technique.

  • The order of the chapters "Algorithms" and "Growth of Functions" was swapped, then the title "Growth of Functions" was changed to "Rates of Growth of Functions" and the title "Algorithms" was changed to "Algorithms and Their Analysis." New content was inserted into each of the chapters and existing content was revised.
    Note that algorithms and their analysis are not mentioned explicitly as topics to be included in the ACM/IEEE-CS/AAAI and CCC C-ID courses, but these topics fit naturally as a motivation to learn much of the other content of the Remix.

  • The original chapter "Graph Theory" was split into two chapters, "Graphs" and "Trees". Additional content will be introduced into each of the new chapters.

1. Introducing Discrete Mathematics

This chapter was last updated on January 30, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Welcome to the Remix! I hope this textbook provides you an opportunity for a stimulating and intellectually enjoyable learning experience.

Mathematics is one of human civilization’s greatest tools: It involves pattern noticing, collecting, comparing, counting, generalizing, formalizing, and abstracting. The development of mathematics is a continuing work that spans at least 5,000 years and many different peoples and cultures. This development will continue long after you and I are gone - never forget that we are living during an era that will be someone else’s Ancient History!

1.1. What is "Discrete Mathematics"?

There seems to be no universally agreed-upon definition of "discrete mathematics," but I will describe to you my understanding of what the phrase "discrete mathematics" means.

Discrete mathematics is the mathematics people use to study and understand structures that are built from individual objects in a way that the individual objects can still be treated as separate from one another within the structure. In such a structure, the individual objects can be put into categories and counted; it makes sense to ask questions like "What is the next object in the structure after this one?" or "Which other objects are the closest to this one?" (where "closest" could refer to physical distance or could mean most similar in color or size) or "How many objects in the structure have a certain property?" Discrete mathematics is a collection of tools that can be used to answer these kinds of questions.

Vintage Stone And Brick Wall

As an example, consider a wall, a structure built from individual stones and bricks, part of which is shown in the image. The individual objects in the structure are the stones and bricks. You can still identify individual objects even though they were combined to build the larger structure, and you can classify an individual object by type (either stone or brick), by color, or by how close to the top of the wall the object is. You can try to count the total number of individual objects, and you can count the number of individual objects that are next to any one object you choose. The wall is a (non-mathematical) example of a discrete structure.
Image credit: "Vintage Stone And Brick Wall" by Paul Brennan. The image is dedicated to the public domain under CC0.

Here are two more examples of discrete structures.

  • A family of humans can be treated as a discrete structure.
    The humans in a family are seen as individuals who can be distinguished from one another. Questions like "Which humans are siblings of this human?" and "Which humans are parents of this human?" and "How many children does this human have?" make sense.

  • The set of integers, in the usual order, as represented on a number line is a discrete structure.
    It makes sense to ask questions like "What is the next integer after -2?" or "What are the integers that are closest to -2?" for this structure.

Notice that in these examples, the individual objects are of a different nature than the entire structure: Individual bricks and stones aren’t considered a wall, individual humans aren’t considered a family, and individual integers are not a collection of integers.

So what kind of structure is not discrete?

Water Glass

Consider the water in the glass shown in the image. The water in the glass is a structure, but for most of human history it has not been seen as made up of objects that are of a different nature than the structure. That is, many generations of humans have recognized the difference between a wall and the individual stones and bricks that make up the wall, but likely have perceived water as being made up of… water. This is why humans tend to measure quantities of water, using units such as fluid ounces or milliliters. In our current era, we humans understand that the water is built from molecules, which are of a different nature than the water "structure," but because the molecules are so tiny, numerous, and densely distributed throughout the structure, we humans (except perhaps some scientists and engineers) still use measurement instead of counting and ignore the individual molecules. For example, a recipe might call for "8 ounces of water" but never would ask for "\(7.6 \times 10^{24}\) molecules of water." Notice that the measurement units used do not correspond in any natural way to individual molecules or groupings of individual molecules (unless, perhaps, you are a scientist or engineer).
Also, notice that the glass container itself is another structure in which the individual molecules tend to be ignored.
Image credit: "Water Glass" by Peter Griffin. The image is dedicated to the public domain under CC0.

Humans use continuous mathematics like calculus to study a structure that is built from objects that are densely distributed throughout the structure. For such a structure, measurement and approximation are more appropriate than counting.

  • The set of real numbers, in the usual order, as represented on a number line is NOT a discrete structure.
    It does NOT make sense to ask about "the next real number after \(\pi\) on the number line." Suppose we think c is "the next real number after \(\pi\) on the number line." Then we can compute the number \(c_{1} = \frac{c+\pi}{2},\) which is the midpoint of the interval with endpoints \(\pi\) and c, so \(c_{1}\) is closer to \(\pi\) than c is, which means \(c_{1}\) is a better candidate for "the next real number after \(\pi\) on the number line" than c. So the concept "the next real number after \(\pi\) on the number line" does not make sense for this structure, as such a number cannot exist! We can keep computing numbers that get closer and closer to \(\pi\) over and over again. Likewise, "the real numbers that are closest to \(\pi\)" do not exist.
    Instead, it makes sense to talk about "the real numbers that differ from \(\pi\) by less than \(\epsilon\)" where \(\epsilon\) is some positive real number (The symbol \(\epsilon\) is the Greek letter "epsilon".) By choosing \(\epsilon\) as small as we like, we can describe the real numbers that are as close to \(\pi\) as needed to use as approximations to \(\pi.\) This is why and how limits are defined and used in courses like precalculus and calculus, subjects that involve the real number line.
    Note 1: You could use any other real number instead of \(\pi\) in the discussion above because the argument will still be valid. For example, "the next real number after \(-2\) on the number line" and "the real numbers that are closest to \(-2\) on the number line" do not exist since you can choose numbers that get closer and closer to \(-2\) such as \(-1,\) \(-1.5,\) \(-1.75,\) and so on.
    Note 2: The technique used to justify that "the next real number after \(\pi\) on the number line" does not exist is called proof by contradiction and will be discussed along with other techniques in the Proofs: Basic Techniques chapter of this textbook.
    Note 3: Is \(\pi\) really on the number line? Click here to view an artist’s explanation about where \(\pi\) lies on the number line. Be warned that some of what this artist says (about history, about \(\pi\) being equal to 3.14) is not correct, but the visualization still may be helpful to you.
    Note 4: FYI, the set of real numbers is called the continuum in advanced mathematics courses.
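The midpoint argument above can be tried out with a few lines of Python. This is a minimal sketch, not part of the original text: it uses \(-2\) from Note 1 instead of \(\pi\) so that the arithmetic is exact, and the names `target`, `candidate`, and `closer_candidate` are hypothetical names chosen for illustration.

```python
from fractions import Fraction  # exact arithmetic, so no rounding hides the pattern

def closer_candidate(target, candidate):
    """Return the midpoint of the interval with endpoints target and candidate."""
    return (target + candidate) / 2

target = Fraction(-2)
candidate = Fraction(-1)      # a first guess at "the next real number after -2"
history = [candidate]

# Every guess can be replaced by a better one, so no guess can be "the next" number.
for _ in range(5):
    candidate = closer_candidate(target, candidate)
    history.append(candidate)

for c in history:
    print(c, "is greater than -2 by", c - target)
```

Each candidate in `history` is still greater than \(-2\) but closer to \(-2\) than the one before it, and the loop could be run for as many steps as you like.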

1.2. To The Student: Some Things To Know Before You Begin

Here are some things to orient you.

1.2.1. How To Use This Textbook

The Remix is designed to build on your previous knowledge, and then build new knowledge and understanding by visiting the topics over and over again. In the next subsection, foundational mathematical ideas are discussed. You are encouraged to read through all of those foundational topics and to work through the Questions and Challenges.

There are two analogies that I, the remixer, like to use here:

  • Think of the course presented in the Remix as a language course.
    You will use this language to talk about mathematics and computer science as you continue along your professional path. You cannot master a new language by learning some words or grammar in the first few weeks of a language course and then "forgetting" that content later, but still succeed in the course and master the language to use beyond the course. You need to assume that everything you learn in this textbook will apply later in the textbook and in your later learning. One of the goals of this textbook is to help you build a broad and rich vocabulary in discrete mathematics and a way of thinking that will apply to your future work.

  • Think of the course presented in the Remix as exercising at a gym.
    You will build your strength and awareness about your abilities by working out. It may be tempting to watch others "demonstrate" how to do certain exercises and then choose not to do them yourself, but you are selling yourself short. A physical trainer usually knows already what they are capable of doing, so it is no compliment if you tell them "Wow, you’re really strong"…​ the trainer’s goal is to get you to say "Wow, I’m starting to get really strong!"

1.2.2. Foundations

Learning discrete mathematics requires putting together old ideas in new ways and adding new ideas to your mix.

This subsection highlights some of the ideas you will need to work with in order to get the most out of this textbook. Also, you’ll find opportunities here to practice with the built-in tools that are part of the textbook. Some of these ideas may be old for you, but others will probably be new. To use this textbook effectively, you’ll need to be able to work with each of these ideas with relative ease. If a few of these ideas are brand new to you, that is fine: All of these ideas will be discussed again, much more formally and in greater detail, later in the textbook.

I encourage you to read all of the topics discussed below. Don’t skip anything, even if it looks "old" because there may be some new ways of understanding the old ideas that are introduced and that will be used later in the course.

  • A set is an unordered collection of zero or more objects. You can think of a set as a list of the names of the objects included, but we do not care about the order of the names in the list and we do not care if the list contains duplicate names.

    An example is the set of the names of the additive primary colors of light which can be written as \[\{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \}\] which contains exactly three elements: "blue", "green", and "red." It does not matter that "red" appears twice, and it does not matter that the order of the colors in the list "blue", "green", and "red" is different from the order used in the previous set notation. We could define the same set with any other list that contains the same three elements, for example, \(\{ \text{"green"},\,\text{"red"},\,\text{"blue"} \}.\)

    As you will see in the Set Theory chapter, it is common practice to use uppercase English letters to stand for sets; this is similar to how lowercase letters are used as variables or constants in algebra. For example, you could write \[P = \{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \}\] which allows you to refer to the set as P instead of needing to read the list of elements every time you want to talk about the set. This is like using the Greek letter π instead of needing to read off the first few digits of the non-repeating decimal expansion 3.14159265359… every time you refer to that number.

    • NOTE 1: In mathematics, a set like \(P,\) that is, \(\{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \}\) is treated as a constant, so you cannot remove elements or insert new elements once a set has been defined and described. As an example, if you wanted to insert the name "white" into the set, you would need to define a new set \[ C = \{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"},\,\text{"white"} \}\] that contains four elements by appending "white" to the list used to define the old set P.

    • NOTE 2:

      • One way of creating a new set from two other sets is to form the union of the two sets. The union of two sets is formed by joining the sets.
        For example, if \(P = \{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \}\) and \(T = \{ \text{"yellow"},\,\text{"red"},\,\text{"blue"} \}\), then the union of the two sets is \(\{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"}, \, \text{"yellow"},\,\text{"red"},\,\text{"blue"} \},\) the set that contains the colors that are in at least one of the sets \(P\) and \(T.\) Because duplicates do not matter, this is the same set as \(\{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"yellow"} \}.\)

      • Another way of creating a new set from two other sets is to form the intersection of the two sets. The intersection is the meeting of the two sets.
        For example, if \(P = \{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \}\) and \(T = \{ \text{"yellow"},\,\text{"red"},\,\text{"blue"} \}\), then the intersection of the two sets is \(\{ \text{"red"},\,\text{"blue"} \},\) the set that contains the colors that are in both of the sets \(P\) and \(T.\)
        Likewise, if \(S = \{ \text{"yellow"},\,\text{"cyan"},\,\text{"magenta"} \},\) the set of additive secondary colors, then the intersection of \(P\) and \(S\) is the set \(\{ \},\) the set that contains zero elements. We call the set \(\{ \}\) the empty set.

    • NOTE 3: We can define a subset of any set by selecting zero or more of the members of the set. For example, \(\{ \text{"green"},\,\text{"blue"} \}\) and \(\{ \text{"green"} \}\) are subsets of \(\{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \},\) and so is \(\{ \},\) the empty set.

    • NOTE 4: A set can also be defined by a property instead of a list. This will be explained in the Set Theory chapter.
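The set ideas above can be sketched in Python. This is an illustration only: the choice of `frozenset` (an immutable set type built into Python) is an assumption made here to mirror NOTE 1, where a set is treated as a constant, and the variable names follow the letters used in the notes.

```python
# Duplicates and order are ignored when the set is built from the list of names.
P = frozenset({"red", "green", "blue", "red"})
T = frozenset({"yellow", "red", "blue"})
S = frozenset({"yellow", "cyan", "magenta"})

print(len(P))                                      # number of elements in P
print(P == frozenset({"green", "red", "blue"}))    # same set, different list

# NOTE 1: "inserting" a name means defining a new set; P itself is unchanged.
C = P | {"white"}
print(len(C), len(P))

# NOTE 2: union and intersection.
print(sorted(P | T))                # colors in at least one of P and T
print(sorted(P & T))                # colors in both P and T
print(P & S == frozenset())         # the intersection of P and S is the empty set

# NOTE 3: subsets, including the empty set.
print(frozenset({"green", "blue"}) <= P)
print(frozenset() <= P)
```

The calls to `sorted` are there only so that the printed order is predictable; the sets themselves have no order.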

  • A one-to-one correspondence between two sets A and B is a pairing of each member of the set A with exactly one member of set B, and each member of set B with exactly one member of set A.

Examples - One-To-One Correspondences

As a first example, the one-to-one correspondence between the set of uppercase English letters and the set of lowercase English letters is represented in the image. You can choose each letter in the upper row and follow the arrow down to the corresponding lowercase letter that it is paired with.

UpperLowerOneToOneCorrV3

Notice that you could choose any lowercase letter in the image and follow the arrow up to its corresponding uppercase letter. This shows that the one-to-one correspondence is invertible in the sense that there is a second one-to-one correspondence that can "undo" what the first one-to-one correspondence does. This is why the arrows in the image point in both directions: It is natural to interpret the arrow as meaning "change case" in either direction.
In fact, the image for the one-to-one correspondence actually represents two functions: The first function is the uppercase-to-lowercase conversion that uses only the down arrows, and the second function is the lowercase-to-uppercase conversion that uses only the up arrows.

THE_GAME_OF_KEN_derivative_v3

As a second example, consider the game "rock, paper, scissors." The image represents the one-to-one correspondence that pairs each element of the set \(\{ \text{"rock"},\,\text{"paper"},\,\text{"scissors"} \}\) with the element of the same set that it loses to.
Image credit: Remixer-created derivative of original work "THE_GAME_OF_KEN.(1910)-illustration-_page_315.png". According to Japanese Copyright Law (June 1, 2018 grant) the copyright on the original work has expired and is as such public domain. Also, the original work is in the public domain in the United States because it was published (or registered with the U.S. Copyright Office) before January 1, 1930.

For this one-to-one correspondence, it makes more sense to use arrows that point in only one direction. In the image, all the arrows point to the winner, so each arrow can be read as "loses to," for example "rock loses to paper".
Note 1: A one-to-one correspondence between a set and itself is called a permutation. You will see the term "permutation" again, used with a related but different meaning, in the Counting: Permutations and Combinations chapter.
Note 2: "Rock, paper, scissors" is a modern form of a very old game that appears to date back at least 1,800 years to China during the time of the Han dynasty. You can learn more about the history of "rock, paper, scissors" and other intransitive games at this Wikipedia link. The image used in the Remix is derived from one that appears in a book published in 1910 that describes the Japanese game stone-ken or jan-ken.
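The "loses to" pairing from the second example can be written as a Python dictionary. The sketch below is one possible way to check the definition of a one-to-one correspondence and to build the pairing that "undoes" it; the names `loses_to`, `beats`, and `is_one_to_one_correspondence` are hypothetical and are not taken from the Remix.

```python
moves = {"rock", "paper", "scissors"}

# Each key is paired with the element it loses to, e.g., "rock loses to paper".
loses_to = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def is_one_to_one_correspondence(pairing, A, B):
    """Return True if the dict pairs each member of A with exactly one
    member of B, and each member of B with exactly one member of A."""
    every_a_is_paired = set(pairing) == set(A)
    every_b_is_paired = set(pairing.values()) == set(B)
    no_b_is_reused = len(set(pairing.values())) == len(pairing)
    return every_a_is_paired and every_b_is_paired and no_b_is_reused

print(is_one_to_one_correspondence(loses_to, moves, moves))

# Following every arrow backward gives a second one-to-one correspondence.
beats = {winner: loser for loser, winner in loses_to.items()}
print(beats["paper"])     # the element that paper beats

# A pairing that is NOT a one-to-one correspondence: "paper" is used three times.
everything_loses_to_paper = {"rock": "paper", "paper": "paper", "scissors": "paper"}
print(is_one_to_one_correspondence(everything_loses_to_paper, moves, moves))
```

Because `loses_to` pairs the set with itself, it is also an example of a permutation in the sense of Note 1.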

  • A natural number is one of the nonnegative integers. The letter \(\mathbb{N}\) denotes the set of natural numbers, that is, \[\mathbb{N} = \{ 0,\,1,\,2,\,3,\,\ldots \}\] where the three dots are used because we cannot write down the entire list of natural numbers.

    • NOTE: We almost always use the base-ten (decimal) place-value system to represent natural numbers, but later in this textbook you will see that there are other number base systems such as the base-two (binary) place-value system, that are useful in some contexts.

    • WARNING: The definition of natural numbers used in this textbook is an ISO standard, but be aware that other textbooks and sources may use the "nonstandard" definition that the natural numbers are the positive integers. In this textbook, the set of natural numbers ALWAYS includes zero as well as the positive integers, which is the standard definition.

Example - Starting at 0

Welcome to your first opportunity to use PythonTutor in this textbook!

The idea that it is "natural" to start counting from 0 may be familiar to you from coding: Both Java arrays and Python lists are indexed starting at 0.

Notice in the Python code below that the initial item in list L is the string "Discrete" at index 0, not index 1.
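A minimal sketch of such code, assuming that list L holds the two strings "Discrete" and "Mathematics" (as described later in this chapter where Python lists are viewed as functions):

```python
# A list with two items; the valid indices are 0 and 1.
L = ["Discrete", "Mathematics"]

print(L[0])      # prints Discrete, the initial item, stored at index 0
print(L[1])      # prints Mathematics, the item stored at index 1
print(len(L))    # prints 2, so the last valid index is len(L) - 1
```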


FingersToNumerals
  • A set is called a finite set if either the set is empty or there is a one-to-one correspondence between the set and \(\{1, 2, \ldots , n \}\) for some positive integer \(n.\) For example, the image represents how a child may "count up to 5" by pairing fingers with the numbers in \(\{1, 2, \ldots , 5 \}.\)
    A set that is not a finite set is called an infinite set. Both finite and infinite sets will be discussed in more detail in the Set Theory chapter.

  • You may be surprised to see Counting listed as a topic in a university-level course because you probably learned to count by putting physical objects (like fingers) into a one-to-one correspondence with number words such as "one, two, three, four, five" when you were a young child. That way of counting one by one is fine for small sets, but the counting techniques discussed in this textbook let you count the number of elements in very large finite sets quickly.

Example - The Multiplication Principle

The multiplication principle, also called the product rule, is the following statement.

Suppose that we have a procedure that consists of completing two steps, that there are k possible ways to complete the first step, and for each possible way of completing the first step, there are m possible ways to complete the second step. Then there are k⋅m ways to complete the procedure.

As an example, the number of possible strings of 2 characters with the first character being one of the 26 uppercase English letters and second character being one of the 10 Hindu-Arabic numerals is 260 = 26⋅10.

MultRuleExampleCropped

In this case, 26 and 10 are small integers so you can list all possibilities by arranging them in a table as shown in the image. What is important to notice is that this same technique (multiplying k and m) will work for much larger values of k and m, when creating such a table may be either not helpful or impossible.
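The table of 2-character strings can also be generated in code. The sketch below is an illustration, not part of the original example: it builds every string by pairing each way of completing the first step with each way of completing the second step, and then compares the count with k⋅m. The variable names are hypothetical.

```python
from itertools import product
from string import ascii_uppercase, digits

k = len(ascii_uppercase)    # 26 ways to choose the first character
m = len(digits)             # 10 ways to choose the second character

# Pair every possible first character with every possible second character.
strings = [letter + numeral for letter, numeral in product(ascii_uppercase, digits)]

print(strings[:3], "...", strings[-1])
print(len(strings), k * m)  # the two counts agree
```

Listing the strings is practical here only because k and m are small; the product k⋅m gives the count without building the list.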

Challenge
Suppose that a U.S. state uses passenger vehicle license plate numbers that are 7 characters long, where the first character is one of the 9 nonzero Hindu-Arabic numerals, the second, third, and fourth characters are uppercase English letters, and the fifth, sixth, and seventh characters are Hindu-Arabic numerals. How many different possible passenger vehicle license plate numbers can the state create?

Hint
You can apply the multiplication rule more than once if the procedure you are trying to complete requires more than two steps.
Another challenge
The previous challenge was a bit simplified. In fact, the state places additional restrictions on the characters. To avoid confusing "0" with "O" or "Q", and "1" with "I", the state does not allow "O", "Q", and "I" as the second or fourth character on the license plate. With that change, how many different possible passenger vehicle license plate numbers can the state create?
Hint
You can still apply the multiplication rule more than once to answer this challenge question.
One last challenge
Both of the previous challenges were simplified. In fact, in addition to the restrictions on "O", "Q", and "I," the state does not use 1SWD000 through 1TZZ999, 1WAA000 through 1YZZ999, strings whose first four characters are 1UAA through 1VZZ, 1ZZA through 1ZZZ, or 3ZZA through 3ZZG for its passenger vehicle license plate numbers.
With these additional restrictions, how many different possible passenger vehicle license plate numbers can the state create?
Hint
You may want to use some addition and/or subtraction as well as the multiplication rule.
  • The Pigeonhole Principle states that if \(n\) is a positive integer and \(n+1\) objects are going to be assigned to \(n\) categories then one of the categories must be assigned at least 2 of the objects. Click here to see a commonly-used photoshopped image that illustrates this principle.
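For a small value of \(n\) the Pigeonhole Principle can be checked by brute force. The sketch below is only an illustration for \(n = 3\) (checking one value of \(n\) is not a proof), and the function name `some_category_has_two` is a hypothetical name chosen here.

```python
from collections import Counter
from itertools import product

def some_category_has_two(assignment):
    """assignment[i] is the category given to object i.
    Return True if at least one category receives 2 or more objects."""
    counts = Counter(assignment)
    return max(counts.values()) >= 2

n = 3
categories = range(n)

# Every possible way to assign n + 1 objects to n categories.
all_assignments = list(product(categories, repeat=n + 1))

print(len(all_assignments))
print(all(some_category_has_two(a) for a in all_assignments))
```

With only \(n\) objects the conclusion can fail: the assignment `(0, 1, 2)` puts exactly one object in each of the 3 categories.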

  • A function from set D to set C is a rule that assigns to each element in D (that is, to each input value) exactly one output value from C. We also say each input value is mapped to the output value. A much more formal definition will be given in the Functions chapter of the textbook, but for now it is enough to understand functions in this way.

    • The rule may be represented by a mathematical equation, a verbal description, a table of paired values, a plot of points, … or even code 😎!

    • The set D of all input values is called the domain of the function.

    • The set C that contains the output values is called the codomain of the function.

    • The range of the function is the set that contains only the output values and no other elements. The range is always a subset of the codomain C, but may not contain every element of C. That is, some elements of C may not be outputs for the function. It is often important to distinguish the range from the codomain; this is discussed in detail in the Functions chapter.

Examples - Functions

Here are some examples of functions with their rules, domains, codomains, and ranges.

  • Any one-to-one correspondence between two sets A and B can be viewed as a function \(f\) from A to B.

    • The rule for \(f\) is given by the pairing of elements in A with elements in B.

    • The domain of \(f\) is the set A.

    • The codomain of \(f\) is the set B.

    • The range of \(f\) is the set B, too, because every element in B is paired with an element in A by the one-to-one correspondence.

      Notice that the one-to-one correspondence can be used to define another function g from B to A with domain B, codomain A, range A, and rule given by the pairing of elements in B with elements in A. The two functions \(f\) and g are called inverse functions because \(f\) and g "invert" or "undo" each other. Inverse functions will be discussed in more detail in the Functions chapter.

  • The floor of x, \(\lfloor x \rfloor,\) and the ceiling of x, \(\lceil x \rceil,\) are two functions defined for all real numbers as follows:

    • \(\lfloor x \rfloor\) is the greatest integer less than or equal to x. For example, \(\lfloor -1.5 \rfloor = -2\) and \(\lfloor 6.3 \rfloor = 6.\)

    • \(\lceil x \rceil\) is the least integer greater than or equal to x. For example, \(\lceil -1.5 \rceil = -1\) and \(\lceil 6.3 \rceil = 7.\)

    • The domain of both the floor and ceiling functions is the set of all real numbers, \(\mathbb{R}.\)

    • The range of both the floor and ceiling functions is the set of all integers, \(\mathbb{Z}.\) (NOTE: The German word for "numbers" is Zahlen, which is why the letter \(\mathbb{Z}\) is used for the integers.)

    • The codomain of these two functions can be chosen depending on the context. If we only need to consider output values we could choose the codomain to be the same set as the range, \(\mathbb{Z},\) but if we want to plot these two functions in the xy coordinate plane we would choose the codomain to be the set of real numbers, \(\mathbb{R}.\) This is discussed in the Functions chapter.

      The floor and ceiling functions are discussed in more detail in Appendix: Library of Functions.
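You can check the floor and ceiling examples yourself. Here is a minimal sketch that uses math.floor and math.ceil from Python's standard math module; the sample inputs are the ones used in the definitions above.

```python
import math

# Floor: the greatest integer less than or equal to x
print(math.floor(-1.5), math.floor(6.3))    # -2 6

# Ceiling: the least integer greater than or equal to x
print(math.ceil(-1.5), math.ceil(6.3))      # -1 7

# For an integer input, the floor and the ceiling are both the input itself
print(math.floor(4) == math.ceil(4) == 4)   # True
```

Notice that both functions return a Python int, which matches the description of the range as the set of integers.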

ListAsFunction
  • In Python, a list can be used to represent a function with inputs the valid integer indices for the list and outputs the values stored in the list. In an earlier example, we defined list L to be ["Discrete", "Mathematics"] and then evaluated L[0] and L[1] to access the strings stored in the list.
    Note for programmers: In reality, the items in the list are references to objects that implement the two strings. That is, the list items are neither the strings nor the objects that implement strings but references to those objects, which are located elsewhere in memory.

  • The rule "Return the first character of the input Unicode string."

    • The domain is the set of all strings of length greater than or equal to 1 that contain Unicode characters.

    • The range is the set of all Unicode characters.

    • We would need to decide what the codomain should be in this context: It could be the same as the range, or we could use the larger set of all strings that contain Unicode characters (including the empty string "".)

  • \(f(x) = x^{2}\) from the set of real numbers to the set of real numbers.

    • The domain and codomain are both the set of real numbers.

    • The range is the set of nonnegative real numbers.

  • \(g(x,\,y) = xy + y\) where x and y are real numbers.

    • The domain is the set of ordered pairs of real numbers.

    • The range is the set of all real numbers. The codomain is the same as the range.

  • See the Appendix: Library of Functions for other functions you should be familiar with.
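Two of the examples above can be sketched in Python. The names domain, range_of_L, and first_char are assumptions chosen only for this illustration, and the sketch uses only the nonnegative indices of the list (Python also accepts negative indices such as L[-1]).

```python
# A Python list viewed as a function: the valid indices are the inputs,
# and the stored values are the outputs.
L = ["Discrete", "Mathematics"]
domain = set(range(len(L)))      # the nonnegative indices {0, 1}
range_of_L = set(L)              # the set of output values

def first_char(s):
    """Return the first character of the nonempty string s."""
    if len(s) == 0:
        # The empty string has no first character, so it is not in the domain.
        raise ValueError("the empty string is not in the domain")
    return s[0]

print(domain)                    # {0, 1}
print(L[0], L[1])                # Discrete Mathematics
print(first_char("Discrete"))    # D
```

Raising an error for the empty string is one way to make the domain restriction visible in code; other designs are possible.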

Example - Functions

The code below shows two functions that are user-defined in Python.

Click on the "Next" button to step through the code.

Here is another example.

Example - Functions and Data Types

What do you get when you "add" a Python object to itself?

Note that the answer depends on the object’s data type.

Click on the "Next" button to step through the code.
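Here is a minimal sketch of this idea, which is not necessarily the code used in the interactive example. The function name add_to_itself is an assumption used only for illustration.

```python
def add_to_itself(x):
    """Return x + x; what "+" means depends on the data type of x."""
    return x + x

print(add_to_itself(3))         # 6  (integer addition)
print(add_to_itself(2.5))       # 5.0  (floating-point addition)
print(add_to_itself("ab"))      # abab  (string concatenation)
print(add_to_itself([1, 2]))    # [1, 2, 1, 2]  (list concatenation)
```

The same rule, "add the input to itself," gives a different kind of output for each data type, so the domain you choose for a function matters.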

  • A sequence is a function from the natural numbers, or a subset of the natural numbers, into another set (e.g., the natural numbers, or the real numbers, or a nonnumerical set.) For example, we can define a sequence by the rule \[a_{i} = 2i+1 \text{ for every natural number } i \] which describes the sequence of positive odd integers \(a_{0} = 1,\) \(a_{1} = 3,\) \(a_{2} = 5,\) and so on.

    • NOTE 1: It is common to use i as a variable in sequence notation because i is the initial letter of the word index. This i has nothing to do with the complex number \(\sqrt{-1}.\) Mathematicians recycle and reuse letters!

    • NOTE 2: It is traditional to write the input variable for a sequence as a subscript instead of putting it between parentheses. In the preceding example, \(a_{i} = 2i+1\) has the same meaning as \(a(i) = 2i+1\).

    • NOTE 3: The output values of a sequence are called terms. For example, the 0th term of the sequence is \(a_{0} = 1,\) the 1st term of the sequence is \(a_{1} = 3,\) and so on.
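Because a sequence is a function, the rule \(a_{i} = 2i+1\) can be sketched as a Python function. The function name a simply mirrors the notation above.

```python
def a(i):
    """Return the term a_i of the sequence defined by a_i = 2i + 1."""
    return 2 * i + 1

# The terms a_0 through a_4
print([a(i) for i in range(5)])   # [1, 3, 5, 7, 9]
```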

NumeralsToFingersFunction
  • A finite sequence is a sequence that is defined for only a finite subset of \(\mathbb{N}.\) That is, the set of input \(i\) values that make sense for the sequence is a finite set.
    For example, a child counting up to five on the fingers of one hand is defining the sequence called \(\textit{fingers}_{i}\) that is represented by the image.
    Technically, the sequence \(\textit{fingers}_{i}\) shown in the image is the inverse of the child’s actual counting sequence. Because the child assigns to each finger "input" exactly one number "output," the arrows would point up from a finger to the corresponding number.

    The sequence \(\textit{fingers}_{i}\) can be written, formally, as \begin{equation} \begin{aligned} \textit{fingers}_{1} {} & = \text{"Thumb"} \\ \textit{fingers}_{2} {} & = \text{"Index Finger"} \\ \textit{fingers}_{3} {} & = \text{"Middle Finger"} \\ \textit{fingers}_{4} {} & = \text{"Ring Finger"} \\ \textit{fingers}_{5} {} & = \text{"Pinky Finger"} \\ \end{aligned} \end{equation} but it is much more common to list the terms of the sequence in order: "Thumb," "Index Finger," "Middle Finger," "Ring Finger," "Pinky Finger."

    Notice that Java arrays and Python lists are implementations of the mathematical concept of a finite sequence where the domain is the set of \(i\) values \(\{0, 1, \ldots , n \}\) for some natural number \(n.\)

    An infinite sequence is a sequence that is not a finite sequence, that is, there are infinitely many \(i\) values that make sense as inputs for the sequence. For example, the sequence \[ \text{isOdd}_{i} = \begin{cases} \text{1} & \text{ if } i \text{ is odd} \\ \text{0} & \text{ if } i \text{ is even} \\ \end{cases} \] is an infinite sequence with domain the natural numbers that has only two output values.

  • A bitstring is a finite sequence of the bits 0 and 1. Bitstrings are written as a string of 0s and 1s without spaces or commas between the terms of the sequence; for example, 01101011 is a bitstring of length 8. Bitstrings can be used to represent a sequence of answers to "Yes-No" or "True-False" questions, with "1" representing "Yes" or "True" and "0" representing "No" or "False." Bitstrings can also be used to represent numbers in binary notation, which will be discussed in the Number Bases chapter.

  • Summation notation is a "shortcut" used to abbreviate a sum of a finite sequence of numbers, called addends, when the sequence contains a large number of addends.
    As an example, the sum \(1+3+5+7+9+11+13+15+17+19\) is abbreviated as \(\sum\limits_{i=0}^{9}(2i+1)\).
    As another example, the sum of the first \(500\) positive odd integers, \(1+3+5+\ldots+995+997+999\), is abbreviated as \(\sum\limits_{i=0}^{499}(2i+1)\).

    • NOTE 1: The variable i used in summation notation is called the index of summation and the symbol \(\sum\) is the capital Greek letter "sigma." To compute the value of the sum, you generate the sequence of addends by substituting each integer value, starting with the lower limit written below the sigma and stopping at the upper limit written above the sigma, for i into the algebraic expression or function written to the right of sigma, then find the sum of all the numbers in the sequence.

    • NOTE 2: "Infinite sums," more properly called infinite series, are not discussed in the Remix. The sum of an infinite series is defined as the limit of its sequence of partial (finite) sums, and "limits" is a topic from continuous mathematics, not discrete mathematics.
      Another use of infinite sum notation is to represent the generating function of a sequence, which is discussed in some discrete mathematics textbooks but not in the Remix. If you want to learn about generating functions, you can read about them in Oscar Levin’s Discrete Mathematics: An Open Introduction, 4th edition.
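The two sums written in summation notation above can be computed with Python's built-in sum function. This is a small sketch; the variable names total_10 and total_500 are assumptions. Keep in mind that range(0, 10) stops at 9, so the second argument is one more than the upper limit written above the sigma.

```python
# Sum of (2i + 1) for i = 0, 1, ..., 9
total_10 = sum(2 * i + 1 for i in range(0, 10))

# Sum of (2i + 1) for i = 0, 1, ..., 499
total_500 = sum(2 * i + 1 for i in range(0, 500))

print(total_10)    # 100
print(total_500)   # 250000
```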

  • Recursion is a process that defines an object, or computes a value, or describes the construction of an object or set of objects, using steps that refer to one or more previously completed steps.

Example - A Recursively-Defined Function

In this example, a Python function is defined recursively. The function takes any natural number input n (represented as an int in Python) and returns a value that we claim is \(5^n\).

Click on the "Next" button to step through the code.

Notice that each time the function calls itself, a new instance of the function is created.

Later in the textbook, you will be able to prove that the power_of_5 function must return \(5^n\) for every natural number input n using a proof technique called mathematical induction.
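Here is a minimal sketch of how such a function might be written; the interactive example may differ in its details. The base case and the recursive step are assumptions that are consistent with the claim that the function returns \(5^n\).

```python
def power_of_5(n):
    """Return 5**n for a natural number n, computed recursively."""
    if n == 0:
        return 1                     # base case: 5^0 = 1
    return 5 * power_of_5(n - 1)     # recursive step: 5^n = 5 * 5^(n-1)

print(power_of_5(0))   # 1
print(power_of_5(3))   # 125
```

The call power_of_5(3) waits for power_of_5(2), which waits for power_of_5(1), which waits for power_of_5(0); the base case stops the chain of calls.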

  • Recurrence relations consist of one or more equations that define a sequence or a function with domain \(\mathbb{N}.\)

Examples - Recurrence Relations

The following examples show how to define a sequence from \(\mathbb{N}\) to \(\mathbb{N}\) using recursion. Notice that for each of the sequences we can compute the output value corresponding to any input value by repeatedly using the recurrence that relates a term to its preceding term in the sequence.

  • \(b_{0} = 3\) and \(b_{i+1} = b_{i} + 2\) for all natural numbers i.

  • \(c_{0} = 3\) and \(c_{i+1} = 2 c_{i}\) for all natural numbers i.

As an example, we can use the recurrence relations to compute \(b_{1}\) and \(c_{1}\) as follows. \[b_{1} = b_{0} + 2 = 3 + 2 = 5\] \[c_{1} = 2 c_{0} = 2 \cdot 3 = 6\]
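The same computation can be sketched in Python by starting from the initial term and applying the recurrence repeatedly. The function names b and c mirror the sequence names, and the variable name term is an assumption.

```python
def b(i):
    """Return b_i, where b_0 = 3 and b_(i+1) = b_i + 2."""
    term = 3                  # b_0
    for _ in range(i):        # apply the recurrence i times
        term = term + 2
    return term

def c(i):
    """Return c_i, where c_0 = 3 and c_(i+1) = 2 * c_i."""
    term = 3                  # c_0
    for _ in range(i):        # apply the recurrence i times
        term = 2 * term
    return term

print(b(1), c(1))   # 5 6
```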

Challenge
A closed form for a sequence is a formula that lets you find the value of any term of the sequence by computing directly with the index i. In an earlier example, we had a sequence defined by the closed form \(a_{i} = 2i+1\) for every natural number i: You can compute any term of the sequence by substituting directly a natural number value for the index i into the closed form, for example, \(a_{8} = 2 \cdot 8+1 = 17.\)

The challenge is to find closed forms for the two sequences in this example.

Use the following steps.

First, make a table of values that shows the values \(i,\) \(b_{i},\) and \(c_{i}\) for each natural number i that is less than or equal to 8.

Secondly, make a conjecture (that is, a guess based on the values in the table) for the closed forms of the two sequences.

Thirdly, verify that the conjectured closed forms give the correct results for each of the natural numbers i that is less than or equal to 8. Notice that this does not show that the closed forms are correct for much larger natural number values for i such as 100 or 1,000,000. A method for validating the closed form for all natural numbers i will be introduced in the Proofs: Mathematical Induction chapter.
Hint
Look for patterns in the numbers in the table of values you made.
Help!
You may want to review arithmetic sequences and geometric sequences here.
  • In English, there are four types of sentences, depending on what is being communicated: statements (or declarative sentences), commands, exclamations, and questions. A proposition is a statement that declares a fact that is either True or False (but not both!) In mathematics, we are usually most interested in analyzing and verifying propositions.

  • A predicate is an incomplete proposition that contains one or more variables that need to be filled in to complete the proposition. One example of a predicate is "My major is \(\rule{12mm}{.5pt}\)." Notice that this becomes a proposition once the blank, which represents the variable in this case, has been filled in.
    Another example of a predicate is "The positive integers m and n are prime numbers." Again, this becomes a proposition once values are substituted for the two variables.
    In this textbook, predicates will often be written in a way similar to functions: \[ P(m, n) = \text{"The positive integers } m \text{ and } n \text{ are prime numbers."} \] Notice that the output of the predicate is a statement but the output does not tell us whether the statement is True or False - think of this like a programmer: The return value is a string, not a Boolean.
    Two predicates are equivalent if for every possible substitution for the variables, the statement produced by the first predicate is true if and only if the statement produced by the second predicate is true.
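To see the "string, not a Boolean" idea in code, here is a small sketch of the predicate \(P(m, n)\) as a Python function. The function returns the completed statement and does not decide whether the statement is True or False.

```python
def P(m, n):
    """Return the proposition produced by substituting m and n into the predicate."""
    return f"The positive integers {m} and {n} are prime numbers."

print(P(2, 3))   # a proposition that happens to be True
print(P(4, 9))   # a proposition that happens to be False
```

Both calls return a string. Deciding the truth value of each proposition is a separate step.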

  • An algorithm is a finite sequence of commands and statements that describe a process for completing a task.
    One example is the following (correct but inefficient) algorithm for division of positive integers.

    • Task: Given two positive integers a and b, compute the quotient q and remainder r so that
      \(a = q \cdot b + r\) and \(0 \leq r < b.\)

    • Input: Two positive integers a and b

    • Steps:

      1. Set r equal to a and set q equal to 0.

      2. If r is greater than or equal to b

        1. set r equal to r - b

        2. add 1 to q

      3. If r is greater than or equal to b then repeat step 2

    • Output: Integers q and r such that both \(a = q \cdot b + r\) and \(0 \leq r < b.\)

      • q is the quotient, that is, the number of times each of the two assignments under step 2 was executed.

      • r is the remainder, that is, the result of the last execution of step 2, so \(r = a - q \cdot b.\)

Example - Division of Integers by Repeated Subtraction

The code below implements integer division for positive integers a and b.

Click on the "Next" button to step through the code.

Notice that each time the loop executes, the code prints an equation that shows that a is the sum of a whole number times b and a remainder r. The loop terminates when we compute a value for the remainder r that is both less than b and greater than or equal to 0.
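Here is a minimal sketch of code that follows the steps of the algorithm; it is not necessarily identical to the interactive example. The function name divide and the format of the printed equation are assumptions.

```python
def divide(a, b):
    """Return (q, r) with a == q*b + r and 0 <= r < b, for positive integers a and b."""
    r = a                  # step 1: start with remainder a and quotient 0
    q = 0
    while r >= b:          # steps 2 and 3: repeat while r is at least b
        r = r - b
        q = q + 1
        print(f"{a} = {q} * {b} + {r}")
    return q, r

print(divide(13, 3))   # (4, 1)
```

You can change the arguments in the last line to experiment with other values of a and b.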

Question
In the code, \(a = 13\) and \(b=3.\) How many times does the block of code within the loop execute?
Hint
You can answer this by stepping through the code using the "Next" button.
Question
How many times does the block of code within the loop execute if \(a = 13\) and \(b=6?\) If \(a = 13\) and \(b=9?\)
Hint
You can answer this by editing the code, changing the value of b, and stepping through the code using the "Next" button.
Question
In the code suppose that \(a = 13\) and that \(b\) can be assigned any positive integer value that is less than or equal to 13. Let’s say that the worst-case behavior for inputs of the form \((13, \, b)\) is the maximum number of executions of the block of code within the loop that occurs for one of these inputs. What value(s) of b correspond to the worst-case behavior, that is, what value(s) of b correspond to the maximum number of executions of the block of code within the loop for all inputs of the form \((13, \, b)\) where b is a positive integer value that is less than or equal to 13?
Hint
You could answer this by editing the code, changing the value of b to values other than 3, and stepping through the code using the "Next" button. BUT, it may be faster if you use reasoning about the value(s) of b instead.
Question
How many times does the block of code within the loop execute if \(a = 130\) and \(b=3?\) If \(a = 299\) and \(b=3?\)
Hint
You can answer this by editing the code, changing the value of a, and stepping through the code using the "Next" button. BUT, it may be faster if you use reasoning about the value(s) of a instead.
Challenge
Now suppose that, in the code, the ordered pair \((a, \, b)\) of variables can be assigned any ordered pair of positive integer values with a greater than or equal to b. Find a formula for a worst-case complexity function \(W(a)\) that assigns to each positive integer input a the output that is the maximum number of executions of the block of code within the loop for all positive integer pairs \((a, \, b).\)
Hint
Try to form a conjecture by editing the code, changing the value of a and then holding a constant while using various values of b that are less than or equal to a, then stepping through the code using the "Next" button. Refer back to your answers to the previous questions, too.
Challenge
The algorithm implemented in the code is correct but not very efficient. You probably learned how to do division by hand in elementary or middle school. Use your knowledge of how to do division by hand to (1) change the Python code to be more efficient (and still correct!) then (2) determine the worst-case behavior as the maximum number of times the block of code within the loop will execute, in terms of the variables a and/or b with a greater than or equal to b.
Challenge
For any integer a and any positive integer b, we can compute integers q and r so that both \(a = q \cdot b + r\) and \(0 \leq r < b.\) What changes are needed to the algorithm to compute q and r correctly if a is zero or negative?
Hint
Consider what changes are needed to the loop condition and the computations within the loop. Use the print statements to help you see what changes are needed.
  • Based on the case \(b=2\) in the division algorithm discussed above (and the Challenge), every integer a can be written in the form \(a = q \cdot 2 + r\) where q is an integer and \(0 \leq r < 2.\) The integer a is even if \(r=0\) and is odd if \(r=1.\) So we have a precise formal way of understanding and discussing odd and even integers - this may seem unnecessary (or even completely silly), but as you continue reading this textbook you will see that precise formal definitions and descriptions are useful when you need either to justify that certain statements are true or to validate that certain processes always produce correct and expected results.
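This definition of even and odd can be sketched in Python with the built-in divmod function. For a positive divisor, divmod(a, 2) returns a pair (q, r) with \(a = q \cdot 2 + r\) and \(0 \leq r < 2,\) even when a is negative. The function name parity is an assumption.

```python
def parity(a):
    """Classify the integer a as "even" or "odd" using its remainder on division by 2."""
    q, r = divmod(a, 2)      # a == q*2 + r with 0 <= r < 2
    return "even" if r == 0 else "odd"

print(parity(10), parity(7), parity(0), parity(-7))   # even odd even odd
print(divmod(-7, 2))                                  # (-4, 1)
```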

  • Suppose that \(a\) and \(b\) are integers, which could be positive or negative or zero. The integer b is called a factor of a (or divisor of a), and a is called a multiple of b if \(a = q \cdot b\) for some integer q. For example, 2 is a factor of 10, and 10 is a multiple of 2, because \(10 = 5 \cdot 2.\) As another example, \(-2\) is a factor of 10, and 10 is a multiple of \(-2\) because \(10 = (-5) \cdot (-2).\)

  • A positive integer n that is greater than 1 is called prime if the only positive integer factors of n are 1 and n itself, and is called composite otherwise. For example, 2 is prime since its set of positive integer factors is \(\{ 1,\,2 \}\), but 6 is composite since its set of positive integer factors is \(\{ 1,\,2,\,3,\,6 \}\).

    • NOTE: The integer 1 is considered neither prime nor composite. The reason for this is beyond the scope of this textbook but would be discussed in a more advanced math course in ring theory.

  • Two integers a and b are called relatively prime if the only common positive integer factor of a and b is 1; this is equivalent to stating that the two integers do not share any prime factors. For example, 10 and 21 are relatively prime integers.
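The definitions of factor, prime, and relatively prime can be checked directly for small numbers. The sketch below is written to follow the definitions, not to be efficient. The function names are assumptions; math.gcd is the standard library function that returns the greatest common divisor.

```python
import math

def positive_factors(n):
    """Return the set of positive integer factors of the positive integer n."""
    return {d for d in range(1, n + 1) if n % d == 0}

def is_prime(n):
    """An integer n > 1 is prime if its only positive factors are 1 and n."""
    return n > 1 and positive_factors(n) == {1, n}

def relatively_prime(a, b):
    """Return True if the only common positive factor of a and b is 1."""
    return math.gcd(a, b) == 1

print(sorted(positive_factors(6)))             # [1, 2, 3, 6]
print(is_prime(2), is_prime(6), is_prime(1))   # True False False
print(relatively_prime(10, 21))                # True
```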

  • A relation on the sets A and B is an association between elements from set A and set B; A and B are often the same set. Relations will be defined much more formally and precisely in the Relations chapter of the textbook.
    Here are some examples of relations:

    • The ordering relation "is less than," \(x < y,\) for real numbers x and y. So \(3 < 4\) but \(5 \nless 4.\) The slash through the "<" symbol means that "5 is not related to 4" in the way we want.
      The orderings \(>\), \(\geq\), and \(\leq\) are also examples of relations.

    • The equality relation \(s=t\) for any elements s and t of the same set A.
      A related example is inequality, \(s \neq t.\)

    • The divisibility relation "a is a divisor of b" (or "a divides b") for integers a and b; this is sometimes written as \(a \mid b.\) So for example, \(2 \mid 4\) but \(3 \nmid 4.\)

    • For two integers a and b, we say that "a has the same parity as b" if either both a and b are odd or both a and b are even.

    • Any function \(f\) with domain A and codomain B is a relation since the function associates each element a in A with exactly one element b of B, namely \(b = f(a)\).

    • A relation can also involve more than two sets. As an example, imagine a database of records that has three fields: a student’s name, a student’s college identification number, and the student’s major. The database can be viewed as a set R of ordered triples. So, for example, if a student named Chris Garcia has identification number 900123001 and is a Computer Science major, the set R would contain as an element the ordered triple ("Chris Garcia", 900123001, "Computer Science").
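In code, a relation can be sketched as a set of ordered pairs (or ordered triples). The example below restricts the divisibility relation to the small set \(\{1,\,2,\,3,\,4\}\) so that the set of pairs is finite; that restriction and the variable names are assumptions made for illustration.

```python
# The divisibility relation "a divides b" on the set {1, 2, 3, 4},
# stored as a set of ordered pairs (a, b)
A = {1, 2, 3, 4}
divides = {(a, b) for a in A for b in A if b % a == 0}

print((2, 4) in divides)   # True:  2 divides 4
print((3, 4) in divides)   # False: 3 does not divide 4

# A relation involving three sets, stored as a set of ordered triples
R = {("Chris Garcia", 900123001, "Computer Science")}
print(("Chris Garcia", 900123001, "Computer Science") in R)   # True
```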

  • A graph is a mathematical object that consists of vertices (also called nodes) that are connected by edges. Graphs are often represented by drawings like the ones shown in the following examples, but you can also represent a graph in other ways that are easier and more efficient to use in code; this will be discussed in the Graphs chapter.

    The drawing of a graph is not treated like a geometric polygon: The only two points "on an edge" are the edge’s endpoints. Edges are just connectors between vertices and points that are not indicated as endpoints of an edge are ignored. Also, in a drawing of a graph, the lengths of edges and the straightness or curvedness of edges are not important, just the connections between the edges' endpoints.

Some high school-level textbooks use the term vertex-edge graph to distinguish this type of graph from graphs (that is, plots of points in the \(xy\)-plane) for equations, functions, or statistical data.
Example - Two Drawings of One Graph

Keep in mind that a graph is NOT the same as a drawing of the graph. In fact, a graph can usually be drawn in many different ways that may look very different. What is important is the connections, represented by the edges, between pairs of vertices.

Isomorphism2av2notComplete

In the image, two different drawings are shown for the same graph. Notice that in each drawing, the connections between pairs of vertices are the same: The only pair of vertices that is not connected by an edge is \(\{ C, D \}.\)

Also notice that there is no vertex drawn where the two edges cross in the 1st drawing on the left, so this graph has exactly 4 vertices: \(A,\) \(B,\) \(C,\) and \(D.\)
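One way to represent this graph in code, without any drawing at all, is to store for each vertex the set of vertices it is connected to. This is only a sketch; the dictionary layout and the names graph, has_edge, and num_edges are assumptions.

```python
# Each vertex is mapped to the set of vertices joined to it by an edge.
# Every pair of vertices is connected except the pair {C, D}.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}

def has_edge(g, u, v):
    """Return True if vertices u and v are connected by an edge in g."""
    return v in g[u]

# Each edge is counted twice, once from each endpoint, so divide by 2.
num_edges = sum(len(neighbors) for neighbors in graph.values()) // 2

print(num_edges)                    # 5
print(has_edge(graph, "A", "B"))    # True
print(has_edge(graph, "C", "D"))    # False
```

Both drawings of the graph correspond to this same dictionary, which is another way to see that the graph and its drawings are different things.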

Example - A Network Of Students

A graph can represent relationships between pairs of people.

GraphWithSevenNodesv2

Here is a graph that represents whether pairs of students are enrolled in at least one class together. Each of the 7 vertices represents a student, and each of the 7 edges represents a pair of students who are enrolled in a class together. The graph indicates that Adil and Elias are enrolled in at least one class together and that Elias and Maya are enrolled in at least one class together.

Question
Are all three of the students Adil, Elias, and Maya enrolled in at least one class together?
Hint
Two students are enrolled in at least one class together if and only if there is an edge connecting the vertices labeled by the two students' names.
Question
Are all three of the students Sofia, Elias, and Jun enrolled in at least one class together?
Hint
Can you imagine two different scenarios, one where the answer is "Yes" and another where the answer is "No"?
Question
The vertex for Li has degree 2 because it occurs as an endpoint of an edge 2 times. Can you determine how many classes Li is enrolled in from the graph?
Hint
Remember that an edge indicates that the pair of students at the endpoints are enrolled in at least one class together.
Question
Chris is represented by an isolated vertex that is not the endpoint of any edge (so that vertex has degree 0.) Does this mean that Chris is enrolled only in Independent Studies classes with no other students?
Hint
Think carefully about what an edge represents in this graph.
Examples - Complete Graphs and Star Graphs

Here are some other examples of graphs.

KompletGraphOn4Vertices

A complete graph is a graph in which every pair of distinct vertices are the endpoints of exactly one edge. The image shows the complete graph on 4 vertices. Notice that two edges appear to "intersect" but there is no vertex drawn where the edges cross, so these edges do not have any points in common - as stated above, an edge contains only its endpoints.

StarGraphOn6Vertices

A star graph is a graph that has one central vertex that is one of the endpoints of every edge in the graph. The image shows the star graph on 6 vertices. The star graph is one example of a tree, a graph in which for every pair of distinct vertices there is exactly one path of edges that can be used to connect the vertices. Some of the many applications of trees in computer science will be discussed in the Trees chapter.
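The edge lists of these two kinds of graphs can be generated from their definitions. Here is a small sketch that uses itertools.combinations from the standard library; the function names and the vertex labels are assumptions.

```python
from itertools import combinations

def complete_graph_edges(vertices):
    """Return one edge for every pair of distinct vertices."""
    return list(combinations(vertices, 2))

def star_graph_edges(center, others):
    """Return one edge from the central vertex to each of the other vertices."""
    return [(center, v) for v in others]

k4 = complete_graph_edges(["A", "B", "C", "D"])
star6 = star_graph_edges("center", ["v1", "v2", "v3", "v4", "v5"])

print(len(k4))      # 6 edges in the complete graph on 4 vertices
print(len(star6))   # 5 edges in the star graph on 6 vertices
```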

The design of this book is to introduce each concept informally, as was done for the preceding foundational ideas, then notice properties and patterns, generalize from what has been noticed, and formalize the ideas to prepare for even deeper analysis.

And congratulations if you read through all of those foundational mathematical ideas in this subsection and worked through all the Questions and Challenges! If you compare the list of ideas to the Table of Contents, you will see that you have touched on every one of the topics that will be discussed in this textbook!

1.2.3. On-Demand Math Resources and Library Of Functions Appendices

Two appendices to this textbook contain additional mathematics that you may need to review as you work your way through the textbook.

1.2.4. Do I Need To Know How To Program In Python?

You are NOT expected to know the Python programming language before you start this course.

As you’ve seen above, this textbook contains Python code snippets that are designed to aid your understanding of the mathematical concepts. It is NOT one of the goals of this textbook to teach you Python, but instead "just enough Python" to be able to examine, run, and alter the existing code snippets.

The appendices "An Introduction to Python" and "Python Syntax Examples" cover most of the basic concepts you will need from the Python programming language.

1.3. Applications of Discrete Mathematics

Remixer’s Note: This section is taken from the original “Discrete Math” book, with only a few minor edits.

Discrete mathematics is applied in many areas including the physical, engineering, and increasingly, the social sciences.

1.3.1. Applications to Applied Mathematics

Most problems that involve computational methods need to be solved using computers. Rather than solve for the temperature map of an entire planar region, we solve for the temperature at a discrete mesh, or grid, of points on a representative subset of the planar region.

temperature distribution
Figure 1. Continuous temperature profile versus discrete meshed representation on computer

1.3.2. Applications to Information Technology and Computer Science

Discrete mathematics is needed for computer science because information and data are stored digitally. Digitally represented data is inherently discrete and is processed using discrete methods. For example, a coarse-grid discrete representation of the 2-d temperature distribution from the plate above could be:

\( \left(\begin{matrix}1&1&1\\2&4&8\\3&9&27\\4&16&64\\5&25&125\\\end{matrix}\right) \)

A voter registry may have voters in a database accessible from a list:

\( \left(\begin{matrix}John\ Smith\\Raheem\ Johnson\\.\\.\\.\\Sarah\ Muller\\\end{matrix}\right) \)

This list may need to be accessed and sorted, say geographically or alphabetically.

1.3.3. Applications to Data Science

Data science solutions to many problems use machine learning algorithms that are inherently discrete in nature. The information that needs processing is discrete, as are the basic problems in data science such as classification or clustering problems. In particular

  • Information consisting of data sets is represented using various data structures including graphical structures such as trees. Data science methods and algorithms involve procedures that manipulate graphical structures such as networks, classification trees, and decision trees.

  • Classification problems are discrete in nature. Classifying tumors as malignant or benign involves trying to predict a variable \(Y\) that takes one of two values, \(1\) (malignant) or \(0\) (benign). Various algorithms, including methods from probability, are used in classification problems such as binary tumor classification.

classification
Figure 2. Binary classification algorithm ("1" malignant, "0" benign)

1.3.4. Applications to Engineering

Digital signal processing involves taking a video, audio, or other signal like temperature, pressure, position and velocity, which is continuous, digitizing it and then processing the digital signal mathematically.

signal processing
Figure 3. Continuous vs discrete time signal

1.3.5. Applications of Combinatorics

Combinatorics involves, in part, counting the number of objects that satisfy a specified condition, from sets of variable size. Enumeration and combinatorics are important in many areas, including:

  • Calculating the number of steps an algorithm needs to process a data set of variable size \(n\). This problem is called the computational cost of the algorithm as a function of \(n\).

  • Calculating the possible number of codes in a cryptographic code system

1.3.6. Applications of Graph Theory

Graph theory, which is the study of structures constructed with nodes and the edges joining them, has applications in many fields, including:

  • Chemistry - representing molecular bonding and structure

molecular bonding
Figure 4. Graph theory and molecular bonding
  • Information technology and computer science - ranking pages on the internet, with pages considered as nodes and page links as edges.

page ranking
Figure 5. Page ranks using a graph theory model.
  • Industrial engineering and network optimization

    • Traffic routes (computer, internet, air, highway, subway systems) can be represented with stations as nodes and connections as edges.

    • Often we are interested in finding an optimal path in a network such as in the following example, finding the shortest tour over a series of towns on a map.

An example of the shortest tour problem, solved using software, is shown below.

shortest tour
Figure 6. Using software like Mathematica to solve a network optimization problem such as finding the shortest tour.

1.3.7. Applications of Probability and Statistics

Many probability assignments are based on counting and combinatorial methods.

  • If we assume that the likelihood of rain is the same on any day in the month of September, we might be interested in the probability that it rains on \(0\) days, it rains on exactly \(1\) day, exactly \(2\) days, etc. Such probability assignments are called discrete distributions, by contrast with continuous distributions like the bell curve.

  • Probability and statistical techniques are also often used in data science. The binary classification problem of, say, classifying a tumor as malignant or benign uses a statistical modeling technique called regression, specifically logistic regression, to determine the strength of the relationship between the independent variables and the dependent variable. In the tumor grading example, the independent variables would be \((x_1,x_2)\) (elastic heterogeneity, nonlinear elasticity), and the dependent variable would be \(Y\), classified as \(1\) or \(0\) (malignant or benign).

1.3.8. Applications to Social Sciences

Discrete mathematical techniques are important in understanding and analyzing social networks including social media networks.

The mathematics of voting is a thriving area of study, including mathematically analyzing the gerrymandering of congressional districts to favor and/or disfavor competing political parties. The following example illustrates some of the fundamental ideas related to gerrymandering.

Example—​Mathematics and Voting

Consider a fictitious state made up of \(10\) congressional districts with \(7\) thousand voters in each district. To win a district a party (Green or Blue) needs to win \(4\) thousand or more votes. Consider the following two districting map scenarios. In each scenario, the blue party earns \(28\) thousand votes, and the green party earns \(42\) thousand votes. In scenario \(A\), the blue party wins \(2\) out of \(10\) districts, but in scenario \(B\) it wins \(7\) out of \(10\) districts.

gerrymandering
Figure 7. Gerrymandering example with two equivalent votes
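
The arithmetic behind the two scenarios can be checked with a short Python sketch. The district-by-district vote counts below are hypothetical (they are not read from Figure 7); they were chosen only so that each district has 7 thousand voters and the Blue party earns 28 thousand votes in total.

```python
# Hypothetical Blue vote counts (in thousands) for each of the 10 districts.
# Each district has 7 thousand voters, so Green gets 7 minus Blue's count.
scenario_a = [7, 7, 2, 2, 2, 2, 2, 2, 1, 1]  # Blue voters concentrated in two districts
scenario_b = [4, 4, 4, 4, 4, 4, 4, 0, 0, 0]  # Blue voters spread to narrowly win seven

def districts_won_by_blue(blue_votes):
    """Count the districts where Blue earns at least 4 of the 7 thousand votes."""
    return sum(1 for votes in blue_votes if votes >= 4)

for name, blue_votes in [("A", scenario_a), ("B", scenario_b)]:
    total_blue = sum(blue_votes)
    total_green = 7 * len(blue_votes) - total_blue
    print(name, total_blue, total_green, districts_won_by_blue(blue_votes))
```

Under these assumed splits, both scenarios have the same statewide totals (28 thousand Blue, 42 thousand Green), but Blue wins 2 districts in scenario A and 7 districts in scenario B.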

Sets, including subsets, the empty set, unions of sets, and intersections of sets

One-to-one correspondence, including the example "rock, paper, scissors"

Counting, including the Multiplication Principle and Pigeonhole Principle

Functions, including domain, codomain, range, inverse function, and the floor and ceiling functions \(\lfloor x \rfloor\) and \(\lceil x \rceil\)

Sequences, including index and terms

Recurrence Relations, including closed forms

Algorithms, including the Division Algorithm

2. Counting: Arithmetic Techniques

This chapter was last updated on February 2, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

You probably learned how to count objects one by one when you were a child. You may have counted up to 5 on the fingers on one hand, or up to 10 on the fingers of two hands, or up to 20 on fingers and toes - that is, you paired the objects (fingers and/or toes) with number names to create a one-to-one correspondence. You may have counted all the way up to 100.

However, many problems in computer science or mathematics require you to count the number of elements in sets that contain millions of elements, or billions of elements, or even more elements, so counting one by one is inefficient or impossible. Starting with this chapter, and continuing later in the Set Theory and Counting: Permutations and Combinations chapters, you will study ways to quickly and efficiently count the number of elements of any set no matter the size of the set.

Key terms and concepts covered in this chapter:

  • Counting arguments

    • Sum Rule

    • Product Rule

    • Division Rule (also called the Rule Of Quotient)

    • Subtraction Rule (also called the Principle of Inclusion-Exclusion for two sets)

  • The pigeonhole principle

2.1. Some Foundational Counting Principles

Each of the four arithmetic operations corresponds to a rule we can use to count quickly.

2.1.1. The Sum Rule

The sum rule, also called the addition principle or the rule of sum, describes the number of possible choices of one element from a union of two sets that share no common elements (such sets are called disjoint sets).

The Sum Rule

Suppose that we have a procedure that consists of completing one step, and that the step can be chosen from either a first set of j possible ways to complete the step or from a second set of k possible ways to complete the step, and that no way of completing the step is in both the first and second sets. Then there are \(j+k\) ways to complete the procedure.

This is called the sum rule for counting because it involves adding to find a total. The sum rule can also be extended to more than two sets, as long as no two of the sets have any elements in common.

Example 1

A student in a Capstone course can choose a project from three different professors. The professors have 3, 7, and 4 possible projects, and no project is on more than one professor’s list. How many possible projects can the student choose?

Solution

The student can pick out a project by choosing from the first professor, the second professor, or the third professor. Since no project is on more than one list, the sum rule says that there are \(3 + 7 + 4 = 14\) projects to choose from.
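
The sum rule in Example 1 can also be checked with Python sets. This is only a sketch: the project labels are made up for illustration, since the problem gives only the counts 3, 7, and 4.

```python
# Hypothetical project labels; only the counts 3, 7, and 4 come from the example.
prof_1 = {f"P1-{i}" for i in range(3)}
prof_2 = {f"P2-{i}" for i in range(7)}
prof_3 = {f"P3-{i}" for i in range(4)}

# The sum rule requires that no two of the sets share an element.
assert prof_1.isdisjoint(prof_2)
assert prof_1.isdisjoint(prof_3)
assert prof_2.isdisjoint(prof_3)

all_projects = prof_1 | prof_2 | prof_3          # union of the three lists
print(len(all_projects))                         # 14
print(len(prof_1) + len(prof_2) + len(prof_3))   # 14
```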

set-of-playing-card_CROPPED
You Try

A card will be drawn from a standard 52-card deck.
How many ways can the card be an even number or a king?
Image credit: This is a cropped version of "Set Of Playing Card". George Hodan has released this “Set Of Playing Card” image under Public Domain license (CC0 Public Domain).

2.1.2. The Subtraction Rule

The subtraction rule describes the number of possible choices of one element from a union of two sets that do share one or more common elements.

The Subtraction Rule

Suppose that we have a procedure that consists of completing one step, and that the step can be chosen from either a first set of j possible ways to complete the step or from a second set of k possible ways to complete the step, but that there are m possible ways of completing the step that are in both the first and second sets. Then there are \(j+k-m\) ways to complete the procedure.

The subtraction rule is a special case of the inclusion-exclusion principle that involves only two sets. The inclusion-exclusion principle is discussed in more detail in the Set Theory and Proofs: Mathematical Induction chapters.

Example 2

In a group of students, 17 are enrolled in discrete mathematics, 13 are enrolled in probability, and 6 are enrolled in both discrete mathematics and probability.
How many students are in the group?

Solution

There are 17 students enrolled in discrete mathematics, but 6 of these students are enrolled in both discrete mathematics and probability. Likewise, there are 13 enrolled in probability, but 6 of these students are enrolled in both discrete mathematics and probability.
If we applied the addition rule by computing \(17+13 = 30\), this counts the 6 students who are enrolled in both discrete mathematics and probability twice, once as a student enrolled in discrete mathematics and then again as a student enrolled in probability. We can repair the count by subtracting 6 from the sum of 17 and 13; this will count each of the students exactly once. That is, the number of students in the group is \(17+13-6=24.\)
Alternative solution
First form three sets that have no students in common: Subtract 6 from each of 17 and 13 to find that there are \(17-6=11\) students in the set of students who are enrolled in discrete mathematics but not probability, \(13-6=7\) students in the set of students who are enrolled in probability but not discrete mathematics, and 6 students in the set of students who are enrolled in both discrete mathematics and probability. Notice that the Sum Rule can be used because no student can be in two (or more) of the sets: There are \((17-6)+(13-6)+6 = 24\) students, which is the correct count.
Also note that \((17-6)+(13-6)+6 = 17+(-6)+13+(-6)+6 = 17 + 13 -6 =24,\) which is the computation used in the first solution.
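
Both solutions to Example 2 can be mirrored with Python sets. In this sketch the students are represented by hypothetical ID numbers, chosen so that 17 are in discrete mathematics, 13 are in probability, and 6 are in both.

```python
# Hypothetical student IDs: 0-16 take discrete mathematics, 11-23 take probability.
discrete = set(range(0, 17))        # 17 students
probability = set(range(11, 24))    # 13 students
both = discrete & probability       # intersection: the 6 students counted twice

# Subtraction rule: j + k - m
print(len(discrete) + len(probability) - len(both))   # 24

# Alternative solution: three disjoint sets, then the sum rule
only_discrete = discrete - probability
only_probability = probability - discrete
print(len(only_discrete) + len(only_probability) + len(both))   # 24

# Direct count of the union, for comparison
print(len(discrete | probability))   # 24
```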

Example 3

A tech company has 200 applicants for a position. Of the applicants, 150 were computer science majors, 43 were business majors, and 25 were double majors in both computer science and business.
How many applicants did not major in either computer science or business?

Solution

By the subtraction rule, there are \(150 + 43 - 25 = 168\) applicants that majored in computer science or business (or both). The number of applicants who did not major in either computer science or business is \(200 - 168 = 32.\)

Video Example

The following video example features Dr. Joshua Roberts, Associate Professor of Mathematics at Georgia Gwinnett College.

Notice that Dr. Roberts uses a Venn diagram to represent the sets in this video. Venn diagrams are covered in the Set Theory chapter of the Remix.

2.1.3. The Product Rule

The product rule, also called the multiplication principle or the rule of product, describes the number of possible choices of two successive elements where the first element comes from one set and the second from another set (which could be the same set as the first set).

To find the total number of outcomes for two or more successive events where both events must occur, multiply the number of outcomes for each event together. For instance, if you want to find the number of outcomes possible when you roll a die and toss a coin, you could use the product rule. It is important to note that the events must be independent, meaning that the outcome of one event doesn’t affect the number of possible outcomes of the other.

The Product Rule

Suppose that we have a procedure that consists of completing two steps, that there are k possible ways to complete the first step, and for each possible way of completing the first step, there are m possible ways to complete the second step. Then there are k⋅m ways to complete the procedure.

Example 4

Suppose there are 27 computers in a computer center and each computer has 15 ports. How many different ways are there to choose a specific port?

Solution

Choosing a port means you first choose a computer and then a port on that computer. Since there are 27 computers and 15 ways to choose a port on a computer, there are \((27)(15) = 405\) ways to choose a port.
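
As a quick check of Example 4, the sketch below lists every (computer, port) pair and counts them. Numbering the computers 1 to 27 and the ports 1 to 15 is an assumption made only for illustration.

```python
from itertools import product

computers = range(1, 28)   # 27 computers, numbered 1..27 (hypothetical labels)
ports = range(1, 16)       # 15 ports on each computer, numbered 1..15

# Each choice is an ordered pair (computer, port).
choices = list(product(computers, ports))
print(len(choices))        # 405
print(27 * 15)             # 405
```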

You Try

How many functions are there from a set \(A\) with \(m\) elements to a set \(B\) with \(n\) elements?
Click on "Hint" to reveal the hidden text.

Hint
Try working out an example where \(m\) and \(n\) are small natural numbers. For example, how many functions are there from the set \(\{ 1, 2, 3, 4 \}\) to the set \(\{ \text{"red"},\,\text{"green"},\,\text{"blue"} \}\)?

Video Example

The following video example features Dr. Joshua Roberts, Associate Professor of Mathematics at Georgia Gwinnett College.

Example 5
Example 6

You can use more than one of the rules to solve a problem.

How many bitstrings of length four start with 1 or end with 00?

Solution

First, a bitstring of length four that starts with 1 will be of the form \(1~*~*~*\), where there are two choices for each \(*\), either 0 or 1. Use the product rule to compute that there are \((1)(2)(2)(2) = 2^3 = 8\) bitstrings of this form.

Secondly, a bitstring of length four that ends with 00 will be of the form \(*~*~0~0\), so there are \((2)(2)(1)(1) = 2^2 = 4\) bitstrings of this form.

Thirdly, a bitstring of length four that starts with 1 and ends with 00 will be of the form \(1~*~0~0\), so there are \((1)(2)(1)(1) = 2\) bitstrings of this form.

Now use the subtraction rule to compute the number of bitstrings of length four that start with 1 or end with 00 (or both): \(8 + 4 - 2 =10.\)
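
Because there are only \(2^4 = 16\) bitstrings of length four, the counts in this solution can be confirmed by brute force. The following sketch builds every bitstring and counts the ones of each form.

```python
from itertools import product

# All 16 bitstrings of length four, as Python strings such as "0110".
bitstrings = ["".join(bits) for bits in product("01", repeat=4)]

starts_with_1 = {s for s in bitstrings if s.startswith("1")}
ends_with_00 = {s for s in bitstrings if s.endswith("00")}
both = starts_with_1 & ends_with_00

print(len(starts_with_1), len(ends_with_00), len(both))   # 8 4 2
print(len(starts_with_1) + len(ends_with_00) - len(both)) # 10 (subtraction rule)
print(len(starts_with_1 | ends_with_00))                  # 10 (direct count)
```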

You Try

If a card is drawn from a standard 52-card deck, how many ways can the card be black or a face card (that is, either a Jack or a Queen or a King)?

2.1.4. The Division Rule

This rule is used when there are \(n\) ways to complete a procedure, but each of those ways is equivalent to \(d\) ways (including the way itself). That is, every possible outcome of the procedure can be reached in \(d\) different ways.

The Division Rule

Suppose that we have a procedure that can be completed in n possible ways, but that for each way of completing the procedure there are d possible ways with the same outcome. Then there are \(\frac{n}{d}\) ways to complete the procedure.

The next example uses both the product and division rules.

DivisionRuleCircularTableImage
Example 7

Four students will sit around a circular table, but two seatings are considered "not different" whenever each student has the same left neighbor and right neighbor. How many different seatings are there?

Solution

First use the product rule to find that there are \((4)(3)(2)(1) = 24\) possible ways for the 4 students to sit. (For example, there are 4 choices for the "North" chair, then 3 choices remaining for the "East" chair, then 2 choices remaining for the "South" chair, and 1 choice remaining for the "West" chair.) Next, notice that if all the students shifted one chair to the left once, twice, or thrice, they would all have the same neighbors that they originally had. This means that each of the 24 ways belongs to a group of 4 ways (the original way and its three shifts) that are not considered different from one another.
Therefore, there are \(\frac{24}{4}\) or 6 different seatings.
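
The division rule computation in Example 7 can be checked by listing all 24 seatings and grouping together the ones that differ only by a rotation. In this sketch the student names "A" through "D" and the helper function `canonical` are hypothetical.

```python
from itertools import permutations

students = ("A", "B", "C", "D")   # hypothetical names for the four students

def canonical(seating):
    """Rotate a seating (a tuple) so that student "A" is listed first."""
    i = seating.index("A")
    return seating[i:] + seating[:i]

all_seatings = list(permutations(students))        # product rule: 4 * 3 * 2 * 1
distinct = {canonical(s) for s in all_seatings}    # one representative per rotation group

print(len(all_seatings))   # 24
print(len(distinct))       # 6
```

Each group of 4 rotations collapses to a single representative, which is why the count is \(\frac{24}{4} = 6.\)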

2.2. The Pigeonhole Principle

A surprising number of counting problems can be solved with the so-called pigeonhole principle.

Pigeonhole Principle

If \(k+1\) pigeons are roosting in \(k\) pigeonholes then at least one pigeonhole must contain more than one pigeon.

NOTE: The Pigeonhole Principle is often attributed to Peter Gustav Lejeune Dirichlet, who called it the Schubfachprinzip. The remixer is willing to speculate that this principle has been known for at least as long as humans have kept birds such as pigeons.

Click here to see a photoshopped image that illustrates this principle.

Example

In a group of 367 people at least two will have the same birthday because there are only 366 possible birthdays (counting February 29).

You Try

How many people with English names must be in a room for at least two of the people to have first names that start with the same letter?

2.3. Exercises

  1. There are 67 mathematics majors and 124 computer science majors at a college. There is no student who is both a mathematics major and a computer science major.

    1. In how many ways can two representatives be picked so that one is a mathematics major and one is a computer science major?

    2. In how many ways can one representative be picked who is either a mathematics major or a computer science major?

  2. A multiple-choice test contains 20 questions, and each question has four choices.

    1. In how many ways can a student answer all of the questions on the test if each question must be answered?

    2. In how many ways can a student answer all of the questions if the student is allowed to not answer one or more questions?

  3. How many different three-letter initials, using uppercase English letters, are there?

  4. How many different three-letter initials, using uppercase English letters, end with "R"?

  5. How many bit strings are there of length five?

  6. How many bit strings are there of length five that begin and end with 1?

  7. How many bit strings are there of length less than \(n\), where \(n\) is a positive integer, that start and end with 1?

  8. How many license plates can be made using three digits followed by four uppercase English letters if:

    1. Digits and letters can be repeated?

    2. Digits and letters cannot be repeated?

  9. Each student in a Discrete Mathematics class is a mathematics major, a computer science major, or a double major in both mathematics and computer science. If the class has 5 mathematics majors (including double majors), 23 computer science majors (including double majors), and 7 double majors, how many students are in the class?

  10. Suppose a computer system requires a password of length no less than 7 and no more than 10 characters, and each character must be an English lowercase letter, an English uppercase letter, a digit, or one of six special characters (*, >, <, !, +, =).

    1. How many different passwords are available?

    2. Suppose a hacker can check a potential password once every nanosecond (1 nanosecond is \(1 \times 10^{-9}\) seconds). How long will it take the hacker to check every potential password?

  11. Suppose that there are 29 students in a class, all of whose last names use only English letters. Explain why at least 2 students in the class have last names that begin with the same letter.

  12. Show that in any set of 5 integers, there are at least two of them that have the same remainder when divided by 4.

  13. A bag contains 8 red balls and 7 blue balls.

    1. How many balls must be chosen to be sure of choosing 3 of the same color?

    2. How many must be chosen to be sure of choosing 3 red balls?

  14. Someone cleaning out their attic finds a box containing 12 rock CDs and 12 country CDs. What is the minimum number of CDs they can take out to guarantee at least one of each type?

  15. Give an argument that there are at least two people in California with the same number of hairs on their head.

3. Set Theory

This chapter was last updated on February 10, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Set theory, along with logic, is the foundation of mathematics in our time. In earlier eras, people tried to use arithmetic (e.g., counting and whole numbers) and geometry (e.g., measurement of lengths, areas, and volumes along with geometric constructions) as the foundations of all mathematics. However, over the last two centuries, mathematicians gained new understandings of issues with these traditional foundations which led them to seek a new, firmer foundation. For now, that firmer foundation uses set theory: The study of collections of objects and how those collections can be combined, associated, and themselves added to other collections.
NOTE: If you want to dig much more deeply into what the issues with using arithmetic or geometry as the foundation of mathematics are, you can start with this Wikipedia page which includes a brief description of the "foundational crisis of mathematics."

Key terms and concepts covered in this chapter:

  • Sets

    • Subsets of a set

    • The empty set

    • The power set of a set

    • Cartesian products

  • Venn diagrams

  • Cardinality and countability of finite and infinite sets

    • Set cardinality and counting

  • Operations with sets: Union, intersection, complement, and others

    • DeMorgan’s laws

    • Inclusion-exclusion principle

3.1. Sets

A set is an unordered collection of objects, called elements or members. A set is said to contain its elements.

If \(x\) is an element of the set \(S,\) then we write \(x \in S\). If \(x\) is not an element of the set \(S\), then we write \(x \not\in S\). For example, if \(S\) is the set of names of states in the United States of America, then “New York” is an element of \(S\) and “Ontario” is not an element of \(S,\) that is \[ \text{“New York”} \in S \text{ and } \text{“Ontario”} \not\in S. \] As another example, if \(E\) is the set of even integers, then \(2 \in E\) and \(3 \not\in E.\)

3.1.1. Describing A Set: The Roster Method

One way of describing a set is the roster method: List all the elements of the set between curly braces. For example, \[A = \{1,-2,0,1,-3\} \] is the set whose elements are \(-3,\) \(-2,\) \(0,\) and \(1.\)

  • Notice that the set \(A\) contains exactly \(4\) elements, even though the element \(1\) appears twice in the roster - duplicate entries do not matter.

  • Also, the order of the elements in the list does not matter. That is, \(\{-3,-2,0,1 \}\) and \(\{0, 1,-2,-3\}\) are two more ways of describing the same set \(A.\)

Example 1 - Checking Set Membership in Python

The code below checks to see if \(5\) and \(0\) are elements of the set \(A = \{1,-2,0,1,-3\}.\) Since \(5 \not\in A\) and \(0 \in A,\) the code prints False followed by True.
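
A minimal sketch of such a membership check, using Python's `in` operator, might look like this:

```python
A = {1, -2, 0, 1, -3}   # roster notation; the duplicate 1 is stored only once

print(5 in A)   # False, since 5 is not an element of A
print(0 in A)   # True, since 0 is an element of A
```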

Example 2 - Listing the Elements of a Set in Python

The code below lists all of the elements of the set \(A = \{1,-2,0,1,-3\}.\)
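
One possible way to list the elements is sketched below. Python sets do not keep their elements in any particular order, so this sketch sorts the elements before printing to make the output predictable.

```python
A = {1, -2, 0, 1, -3}

# sorted() returns a list of the elements in increasing order.
for element in sorted(A):
    print(element)

print(len(A))   # 4
```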

Notice that 1 appears in the set once even though it appears in the roster twice.

A WARNING ABOUT THE PYTHON EXAMPLES INVOLVING SETS: The mathematical set \(A = \{1,-2,0,1,-3\}\) is a constant - you cannot change the set by removing elements or inserting new elements. However, Python objects of type set are mutable so it is possible to remove elements or insert new elements, as shown in the following code example. The mathematically correct implementation of sets in Python uses objects of type frozenset because objects of type frozenset are immutable, just like mathematical sets. But there are advantages to using type set instead of frozenset: The roster method notation can be used to initialize or print a Python set, but cannot be used with a Python frozenset: You must call the frozenset constructor to create and initialize a frozenset. The authors of the original “Discrete Math” chose to use Python sets instead of frozensets in the code examples; the author of this remix made the same choice.

WARNING: Python Sets Are Mutable, Mathematical Sets Are "Frozen"

The mathematical set \(A\) does not allow removals or insertions, but the Python set \(A\) does. The frozenset \(F\) is a more faithful implementation of the mathematical set \(A\), but notice that the symbolism for Python sets more closely matches the symbolism used for mathematical sets.
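
The difference can be sketched as follows. The names `A` and `F` match the description above; the particular values removed and added are chosen only for illustration.

```python
A = {1, -2, 0, 1, -3}
A.remove(0)      # a Python set allows removals ...
A.add(7)         # ... and insertions
print(A == {1, -2, -3, 7})   # True: A is no longer the original set

F = frozenset({1, -2, 0, 1, -3})
try:
    F.add(7)     # a frozenset has no add (or remove) method
except AttributeError as error:
    print("A frozenset cannot be changed:", error)

print(F == frozenset({-3, -2, 0, 1}))   # True: F still has its original elements
```

Note that `F` had to be created by calling the `frozenset` constructor, while `A` could be written directly in roster notation.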

3.1.2. Describing A Set: Set Builder Notation

Another way of describing a set is the use of set builder notation. We write a set as \[\{x \in D : P(x)\}.\] This is the set of all elements \(x\) from a domain \(D\) that satisfy the predicate \(P(x).\) We can use either the colon \(:\) or the vertical bar \(|\) as the separator in this notation. For example, \(\{ x \in \mathbb{N} \, | \, x^{2} \leq 50 \}\) is the set of natural numbers that are less than \(\sqrt{50}.\)

Yet another way of describing a set is to use a function or an algebraic expression, as in \[\{ f(x) : x \in D \}.\]This is the set of all values \(f(x)\) for \(x\) in the domain \(D\). For example, \(\{ 2n : n \in \mathbb{N} \}\) is the set of the even natural numbers. Again, we can use either the colon \(:\) or the vertical bar \(|\) as the separator.

Example 3 - Set Builder Notation in Python

The set \(\{x \in D: P(x)\}\) can be expressed in Python as {x for x in D if P(x)}. For example, the code below defines the set \(B\) as the set of positive elements of the set \(A = \{1,-2,0,1,-3\}.\)
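
A minimal sketch of this set comprehension is shown here. The second comprehension illustrates the form \(\{ f(x) : x \in D \}\); because Python cannot store all of \(\mathbb{N},\) it uses `range(5)` as a small finite stand-in for the domain, which is an assumption made only for this sketch.

```python
A = {1, -2, 0, 1, -3}

# {x in A : x > 0}
B = {x for x in A if x > 0}
print(B)   # {1}

# {2n : n in D}, with D = {0, 1, 2, 3, 4} standing in for the natural numbers
evens = {2 * n for n in range(5)}
print(sorted(evens))   # [0, 2, 4, 6, 8]
```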

3.1.3. Describing A Set: Special Sets Of Numbers

You may already be familiar with the following sets of numbers, which are listed here for reference.

Special sets of numbers
  • \(\mathbb{N} = \{0, 1, 2, 3,...\}\), the set of natural numbers

  • \(\mathbb{Z} = \{...,-2, -1, 0, 1, 2,...\}\) , the set of integers

  • \(\mathbb{Z}^+ = \{1, 2, 3,...\}\), the set of positive integers

  • \(\mathbb{Q} = \left\{\left.\frac{a}{b}\right|a\in \mathbb{Z},b\in \mathbb{Z},b\neq 0\right\}\), the set of rational numbers

  • \(\mathbb{Q}^+\), the set of positive rational numbers

  • \(\mathbb{R}\), the set of real numbers

  • \(\mathbb{R}^+\), the set of positive real numbers

  • \(\mathbb{C} = \{a+bi : a\in \mathbb{R},b\in \mathbb{R},i^{2}=-1\}\), the set of complex numbers.

Other special sets will be defined as needed.

3.1.4. Describing A Set: Switching Between Representations

A set can usually be described in more than one way, as shown in the following example.

Example 4 - Switching between representations

Consider the following set: \[\{x \in \mathbb{Z} : -2 \leq x < 4\}.\] This is the set of all integers \(x\) such that \(-2\) is less than or equal to \(x\) and \(x\) is less than 4. Using the roster method, this set can be written as \[\{-2,-1,0,1,2,3\}.\]

You Try

Match each set described using set builder notation in parts (a) through (f) with the same set described using the roster method in parts (A) through (F).

  a. \(\{x \in \mathbb{Z} : x^2 = 1\}\)

  b. \(\{x \in \mathbb{Z} : x^3 = 1\}\)

  c. \(\{x \in \mathbb{Z} : |x| \leq 2\}\)

  d. \(\{x \in \mathbb{Z} : x^2 < 4\}\)

  e. \(\{x \in \mathbb{Z} : x < |x|\}\)

  f. \(\{x \in \mathbb{Z} : (x + 1)^2 = x^2 + 2x + 1\}\)

  A. \(\{-1,0,1\}\)

  B. \(\{\dots, -3,-2,-1,0,1,2,3,\dots\}\)

  C. \(\{1\}\)

  D. \(\{\dots, -3,-2,-1\}\)

  E. \(\{-1,1\}\)

  F. \(\{-2,-1,0,1,2\}\)

When there are too many elements in a set for us to be able to list each one, we often use ellipses (\(\dots\)) when the pattern is obvious. For example, we have \[\mathbb{Z} = \{\dots,-3,-2,-1,0,1,2,3,\dots\}.\]

3.2. Equality Of Sets

We say that two sets are equal if and only if they contain the same elements. When \(A\) and \(B\) are equal sets, we write \(A = B\). When \(A\) and \(B\) are not equal sets, we write \(A \neq B\).

The three sets \(\{2,3,5,7\},\) \(\{5,2,7,3\},\) and \(\{x \in \mathbb{N} : x \text{ is prime and } x < 10 \}\) are equal sets because they contain the same elements. In fact, \(\{2,3,5,7\},\) \(\{5,2,7,3\},\) and \(\{x \in \mathbb{N} : x \text{ is prime and } x < 10 \}\) are really just three different descriptions of the same set, in the same way that \(1 + 3,\) \(5 - 1,\) and \(2^{2}\) are three different descriptions of the same number, 4. The extended equality \[\{2,3,5,7\} = \{5,2,7,3\} = \{x \in \mathbb{N} : x \text{ is prime and } x < 10 \}\] is a true statement for the same reason the extended equality \(1 + 3 = 5 - 1 = 2^{2}\) is a true statement.
Note: You may be used to using the equal sign "=" as if it means "simplifies to" in your previous math experience, but "=" actually means "represents the same thing as."
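
Python's `==` operator compares sets in the same way: two sets are equal exactly when they contain the same elements. In the sketch below, the helper function `is_prime` is hypothetical, and `range(10)` lists the natural numbers less than 10.

```python
def is_prime(n):
    """Return True if n is a prime number (simple trial division)."""
    return n >= 2 and all(n % d != 0 for d in range(2, n))

S1 = {2, 3, 5, 7}
S2 = {5, 2, 7, 3}
S3 = {x for x in range(10) if is_prime(x)}

print(S1 == S2 == S3)    # True: three descriptions of the same set
print(S1 != {2, 3, 5})   # True: these sets do not contain the same elements
```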

3.3. The Empty Set

Consider the set of all natural numbers whose square is equal to 2, described using set builder notation: \(\{x \in \mathbb{N} : x^2 = 2\}.\) If you use the roster method to list all the elements you will get the set \(\{ \}\) because there are no natural numbers whose square is equal to 2!

The set \(\{ \}\) is called the empty set, or the null set. The symbol \(\emptyset\) is used to represent the empty set, too, that is, \[ \emptyset = \{ \}. \]

Example 5 - Listing the Elements of a Nonempty Set

To define the empty set in Python, we must call the constructor set(). Python interprets the empty curly braces {} as an empty object of type dict, called a dictionary, that is used to represent mappings of key:value pairs.

The function in the code below checks to see if a set is empty. If the set is nonempty, its elements are listed.
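
One way such a function might be written is sketched below. The function name `list_elements` is hypothetical, and returning the list of elements is a choice made here only so that the result is easy to inspect.

```python
def list_elements(S):
    """Print a message if S is empty; otherwise print each element of S."""
    if len(S) == 0:
        print("The set is empty.")
        return []
    listed = sorted(S)          # sort so the output order is predictable
    for element in listed:
        print(element)
    return listed

empty = set()                   # the empty set must be created with set()
list_elements(empty)
list_elements({3, 1, 2})

print(type({}))                 # <class 'dict'>, so {} is not an empty set
```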

It is important to note that \(\{\}\) and \(\emptyset\) are both ways to write the empty set. However, the mathematical set \(\{ \emptyset \}\) is not the empty set because it contains one element, namely the empty set. In general, the set \(A\) is not the same as the set \(\{ A \}.\)

Python Tip: The mathematical set \(\{ \emptyset \}\) must be implemented as \(\{ \text{frozenset()} \}\), which is the Python set that contains the empty frozenset. In general, anytime we want to implement a mathematical set \(B\) as an element of another mathematical set \(A\) in Python, we need to implement \(B\) as a frozenset in order to be used as an element of the Python set \(A\). This is due to the fact that elements of Python sets must be hashable; further explanation is beyond the scope of this textbook.
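
The tip above can be illustrated with a short sketch. The variable names are chosen only for this illustration.

```python
empty = frozenset()              # the empty set, as a frozenset
set_containing_empty = {empty}   # the set whose only element is the empty set

print(len(empty))                      # 0
print(len(set_containing_empty))       # 1
print(empty in set_containing_empty)   # True

# A mutable Python set is not hashable, so it cannot be an element of a set.
try:
    not_allowed = {set()}
except TypeError as error:
    print("A mutable set cannot be an element of a set:", error)
```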

3.4. Subsets of a Set

Suppose \(A\) and \(B\) are two sets, and that every element \(x\) of the set \(A\) is also an element of set \(B.\) We say that \(A\) is a subset of a set \(B,\) and write \(A \subseteq B\). If a set \(C\) is not a subset of \(B,\) we write \(C \not\subseteq B\).

If \(A \subseteq B\) but \(B\) contains at least one element that is not in \(A\), then \(A\) is called a proper subset of \(B\), denoted \(A \subset B\). That is, \(A\) is a proper subset of \(B\) if it is a subset of \(B\) but is not equal to \(B.\)

Example 6

Suppose that we have three sets \(R = \{1,5\},\) \(S = \{1,3,5\},\) and \(T = \{1,4,7\}.\)

  • \(R \subseteq S,\) since each element \(x\) of \(R\) also is an element of \(S.\)

  • \(R \subset S\) since \(3\) is an element of \(S\) but is not an element of \(R.\)

  • \(S \not\subseteq T\) since \(3\) is an element of \(S\) but is not an element of \(T.\) Likewise, \(T \not\subseteq S\) since \(4\) is an element of \(T\) but is not an element of \(S.\)

Theorem

For any set \(A\) \[\emptyset \subseteq A\] \[A \subseteq A\]

For any sets \(A\) and \(B\) \[ A = B \text{ if and only if both } A \subseteq B \text{ and } B \subseteq A\]

Example 7 - Subsets in Python

In Python, we can check whether a set \(A\) is a subset of a set \(B\) in one of the following ways:

A.issubset(B)
A <= B.
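
For instance, the subset relationships from Example 6 can be checked with the following sketch. It also uses the `<` operator, which Python provides to test for a proper subset.

```python
R = {1, 5}
S = {1, 3, 5}
T = {1, 4, 7}

print(R.issubset(S))   # True
print(R <= S)          # True
print(R < S)           # True: R is a proper subset of S
print(S <= T)          # False
print(T <= S)          # False

# The theorem above: the empty set and A itself are always subsets of A.
print(set() <= R, R <= R)   # True True
print(R < R)                # False: a set is not a proper subset of itself
```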

3.5. The Power Set of a Set

Given a set \(A,\) we can define a new set by collecting together all subsets of \(A\). This new set is called the power set of \(A.\) The power set of \(A\) is denoted by \(\mathcal{P}(A).\) That is, \[ \mathcal{P}(A) = \{ B \, | \, B \subseteq A \}. \] Notice that \(\mathcal{P}(A)\) is a set whose elements are themselves sets.

Example 8 - The Power Set Of A Set

Suppose that \(A = \{0,1,2\},\) then \[\mathcal{P}(A) = \{\emptyset, \{ 0 \}, \{ 1 \}, \{ 2 \}, \{0,1\}, \{0,2\}, \{1,2\}, \{0,1,2\}\}.\] Notice that the empty set is an element of \(\mathcal{P}(A)\) along with all the other subsets of \(A.\)

The empty set has only one subset, namely itself. Thus, we see that \[\mathcal{P}(\emptyset) = \{\emptyset\}.\]

We can also find the power set of a power set. For example, we have the following:

\[\begin{split} \mathcal{P}(\{ 3 \}) &= \{\emptyset, \{ 3 \}\},\\ \\ \mathcal{P}(\mathcal{P}(\{ 3 \})) &= \mathcal{P}(\{\emptyset, \{ 3 \}\})\\ &= \{\emptyset, \{\emptyset\}, \{ \{ 3 \} \}, \{\emptyset, \{ 3 \}\}\}. \end{split}\]
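
Python has no built-in power set operation, but one can be sketched with `itertools.combinations`. The function name `power_set` is hypothetical. Each subset is stored as a frozenset so that it can be an element of the resulting Python set.

```python
from itertools import combinations

def power_set(A):
    """Return the set of all subsets of A, each represented as a frozenset."""
    elements = list(A)
    return {
        frozenset(subset)
        for size in range(len(elements) + 1)       # subset sizes 0, 1, ..., |A|
        for subset in combinations(elements, size)
    }

P = power_set({0, 1, 2})
print(len(P))                                 # 8
print(frozenset() in P)                       # True: the empty set is an element
print(power_set(set()) == {frozenset()})      # True
print(len(power_set(power_set({3}))))         # 4
```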

3.6. Cartesian Products

The Cartesian product of two sets \(A\) and \(B\) is the set of ordered pairs defined by

\( A\times B=\{(a,b) \, | \, a\in A \text{ and } b\in B\}.\)

Example 9

Consider the sets \(B=\{0,1\}\), \(T=\{0,1,2\}\), and \(C=\{a,\ b,\ c,\ d\}\). Determine how many elements are in each of the following Cartesian products using the product rule, and verify by writing out each set using the roster method.

  1. \(B\ \times\ C\)

  2. \(C\times B\)

  3. \(B\ \times\ T\)

  4. \(B\ \times\ B\)

  5. \(B\ \times\ B\ \times B\)

Solution

For the set \( B\ \times C \), notice that this will be all ordered pairs of the form \((x,y)\) with \(x \in B\) and \(y \in C\), giving

\(B\ \times\ C=\{(0,a), (0,b), (0,c), (0,d),(1,a), (1,b), (1,c), (1,d)\}\), which has \(2 \times 4=8\) elements.

For \(C\ \times\ B\), switch the order of the coordinates in each ordered pair of \(B\ \times\ C\) to obtain the set with \(8\) elements,

\(C\ \times B=\{(a,0), (b,0), (c,0),(d,0),(a,1), (b,1), (c,1), (d,1)\}\).

The set \(B \times T\) will be all ordered pairs of the form \((x,y)\) with \(x \in B\) and \(y \in T\), giving the set with \(2 \times 3=6\) elements,

\(B \times T=\{(0,0),(0,1),(0,2),(1,0),(1,1),(1,2)\}\).

The set \(B \times B\) will be all ordered pairs of the form \((x,y)\) with \(x, y \in B\), giving the set with \(2 \times 2=4\) elements,

\(B \times B=\{(0,0),(0,1),(1,0),(1,1)\}\).

Finally, the set \(B \times B \times B\) will be the set of all ordered triples of the form \((x,y,z)\) with \(x, y, z \in B\), giving the set with \(2 \times 2 \times 2=8\) elements,

\(B \times B \times B=\{(0,0,0),(0,0,1),(0,1,0),(0,1,1),(1,0,0),(1,0,1),(1,1,0),(1,1,1)\}\).
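
These Cartesian products can also be built in Python with `itertools.product`, which produces tuples that play the role of ordered pairs and triples. In this sketch the elements of \(C\) are represented by the strings "a" through "d", and the variable names such as `B_x_C` are chosen only for illustration.

```python
from itertools import product

B = {0, 1}
T = {0, 1, 2}
C = {"a", "b", "c", "d"}

B_x_C = set(product(B, C))
C_x_B = set(product(C, B))
B_x_T = set(product(B, T))
B_x_B = set(product(B, B))
B_x_B_x_B = set(product(B, repeat=3))   # ordered triples

print(len(B_x_C), len(C_x_B), len(B_x_T), len(B_x_B), len(B_x_B_x_B))   # 8 8 6 4 8
print((0, "a") in B_x_C, (0, "a") in C_x_B)   # True False
print(B_x_C == C_x_B)                         # False: the order of coordinates matters
```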

Cartesian products are created using ordered pairs, so if \(A\) and \(B\) are different nonempty sets, then \(A \times B\) is different from \(B \times A\).

The Cartesian coordinate systems are Cartesian products.

The two-dimensional \(xy\)-plane is represented by \(\mathbb{R}^2=\mathbb{R}\times \mathbb{R}=\{(x,y)|x,y\in \mathbb{R}\}\), and the three-dimensional \(xyz\)-space is represented by \(\mathbb{R}^3=\mathbb{R}\times \mathbb{R}\times \mathbb{R}=\{(x,y,z)|x,y,z\in \mathbb{R}\}\).

3.7. Cardinality Of Sets: Finite Sets

Cardinality is the formalization of the idea of the count of the number of elements in a set.
In this section, we will prefer counting from 1 instead of 0. You will see below why this makes no difference.

Set \(A\) is called a finite set if either

  • \(A\) is the empty set or

  • there is a one-to-one correspondence between \(A\) and the set \(\{ i \in \mathbb{N} \, | \, 0 < i \leq n \} = \{1, 2, \ldots , n \}\) for some positive integer \(n.\)

FingersToNumerals

This definition of "finite set" may seem abstract, but it’s just a formal description of what is likely the way you learned to count when you were young: You matched objects with number names (that is, numerals) as shown in the image.

04to15

The cardinality of a finite set \(A,\) denoted by \(|A|,\) is

  • 0 if \(A = \emptyset\) or

  • the value of \(n\) for which there is a one-to-one correspondence between \(A\) and \(\{1, 2, \ldots , n \}.\)

For a finite set \(A\) the cardinality \(|A|\) is just the number of elements in the set. The image shows that \(|\{0,1,2,3,4\}| = 5.\)

Example 10 - Cardinality of Finite Sets in Python

The cardinality of a finite set \(A\) can be computed in Python as follows:

len(A)

Example 11

Suppose that \(A\) and \(B\) are finite sets.

  • The cardinality of the Cartesian product \(A × B\) is \(|A × B|=|A| \cdot |B|\).

  • The cardinality of the power set of \(A\) is \(\left|\mathcal{P}(A)\right| = 2^{|A|}.\)

Challenge
Give informal arguments to justify each of the two bulleted statements.
Hint
Your arguments can use some of the rules introduced in the Counting: Arithmetic Techniques chapter.
Answer
For the Cartesian product, the elements of \(A × B\) are ordered pairs of the form \((x,y)\) where \(x\in A\) and \(y\in B.\) There are \(|A|\) choices for the first coordinate, and for each of those choices there are \(|B|\) choices for the second coordinate. Use the product rule to conclude that there are \(|A| \cdot |B|\) different ways to choose an ordered pair, so that \(|A × B|=|A| \cdot |B|.\)

For the power set of \(A,\) to choose a subset of \(A\) you must decide for each of the \(|A|\) elements in \(A\) whether to include that element in the subset. There are 2 choices for each element, so using the product rule repeatedly you can conclude that there are \((2)(2)\cdots(2)\) ways to choose a subset, where the number of factors of \(2\) is \(|A|\). So \(\left|\mathcal{P}(A)\right| = 2^{|A|}.\) You can also think of this as counting all possible bitstrings of length \(|A|,\) where a 1 bit means "include the corresponding element" and a 0 bit means "omit the corresponding element."
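Both formulas can be spot-checked on small sets. The sketch below is only a sanity check on one pair of made-up sets, not a proof; it builds the Cartesian product with `itertools.product` and the power set with `itertools.combinations`.

```python
from itertools import combinations, product

A = {"a", "b", "c"}
B = {0, 1}

# |A x B| should equal |A| * |B|
AxB = set(product(A, B))
print(len(AxB), len(A) * len(B))      # 6 6

# Power set of A: gather the subsets of every size from 0 up to |A|
elements = sorted(A)
power_set = [set(c)
             for r in range(len(elements) + 1)
             for c in combinations(elements, r)]
print(len(power_set), 2 ** len(A))    # 8 8
```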

3.8. Venn Diagrams

A Venn diagram, named after the English mathematician John Venn, consists of one or more circles, with each circular region representing a set. An example can be seen here.

We write the elements of a set within the circular region that represents the set; anything written outside the circular region is not an element of the set. If an element is written in the overlap of two or more regions, then it is an element of each of the sets.

The circles are often drawn inside a larger rectangle which represents a universal set \(U\) that we are focusing on. In the example linked above, the rectangle was omitted because every glyph was an element of at least one of the sets represented by a circular region, but if we introduced additional glyphs like ہ we would need to draw the rectangle because that glyph would need to be written outside all three circular regions.

VennVsEulerDiagrams

In this textbook, a Venn diagram must show all the possible overlaps of the sets. This is consistent with Venn’s paper from 1880.
That is, you should NOT be able to answer the question "Is x an element of set A?" when x is written in the circular region for a different set, B. In the image, the upper right example shows a Venn diagram because you could write x in the overlap of the two regions or you could write x in the part of the region for B that is outside the circular region for A. The lower two diagrams are not Venn diagrams: In either one of those, if x is written in the region for set B, it must be true that x is not an element of A (on the lower left) or that x is an element of A (the example on the lower right). Diagrams like the lower two examples will be called Euler diagrams in this textbook.
Some sources use the term Venn diagram for all four of the examples shown in the image, but you should always assume when reading this textbook that the lower two are NOT Venn diagrams. Click here to see the light!

3.9. Set Operations

We can obtain new sets by performing operations on other sets. When performing set operations, it is often helpful to consider all of our sets as subsets of a universal set \(U.\) We can think of the universal set as the set of all of the objects under consideration.

We can represent set operations visually using Venn diagrams.

3.9.1. Union

The union of the sets \(A\) and \(B\) is the set containing those elements that are in \(A\) or \(B\) or both, and is denoted by \(A \cup B\). More formally, \[A \cup B = \{x \in U : x \in A \text{ or } x \in B\}.\]

Note that "or" is read here as the "inclusive or". We have the following Venn Diagram for \(A \cup B\):

Union

Note that, for any sets \(A\) and \(B,\) \[A \cup B = B \cup A.\]

Example 12

If we let \(A = \{1,2,3,4,5,6\}\) and \(B = \{1,3,5,7,9\},\) then \[A \cup B = \{1,2,3,4,5,6,7,9\}.\]

Example 13 - Union in Python

In Python, we can compute the union of sets \(A\) and \(B\) in one of the following ways:

A.union(B)
A | B
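For instance, using the sets from Example 12, a quick check might look like this:

```python
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}

print(sorted(A.union(B)))     # [1, 2, 3, 4, 5, 6, 7, 9]
print((A | B) == (B | A))     # True: union does not depend on the order
```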

3.9.2. Intersection

The intersection of the sets \(A\) and \(B\) is the set containing those elements that are in both \(A\) and \(B\) and is denoted by \(A \cap B\). More formally, \[A \cap B = \{x \in U : x \in A \text{ and } x \in B\}.\]

We have the following Venn Diagram for \(A \cap B\):

Intersection

Note that, for any sets \(A\) and \(B,\) \[A \cap B = B \cap A.\] If it is the case that \(A \cap B = \emptyset,\) then we say that \(A\) and \(B\) are disjoint. In other words, two sets are disjoint if and only if they contain no elements in common.

Example 14

If we let \(A = \{1,2,3,4,5,6\}\) and \(B = \{1,3,5,7,9\},\) then \[A \cap B = \{1,3,5\}.\]

Example 15 - Intersection in Python

In Python, we can compute the intersection of sets \(A\) and \(B\) in one of the following ways:

A.intersection(B)
A & B
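As an illustration with the sets from Example 14, the sketch below also tests disjointness using the built-in `isdisjoint` method. The set `C` is made up for this example.

```python
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}
C = {0, 8}

print(sorted(A & B))        # [1, 3, 5]
print((A & C) == set())     # True: A and C have no elements in common
print(A.isdisjoint(C))      # True
print(A.isdisjoint(B))      # False
```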

3.9.3. Complement

The complement of a set \(A\) is the set of all elements in the universal set \(U\) which are not elements of \(A\) and is denoted by \(\overline{A}.\) More formally, \[\overline{A} = \{x \in U: x \not\in A\}.\] Note that other textbooks and internet sources may use different notation for the complement of \(A\), such as \(A'\) and \(A^{c}\), but these all stand for the same set, so that \(\overline{A} = A' = A^{c}\).

We have the following Venn Diagram for \(\overline{A}\):

ComplementA

For any set \(A,\) \[ \overline{\overline{A}} = A \] \[ \overline{A} \cup A = U \] \[ \overline{A} \cap A = \emptyset. \]

Example 16

Suppose that our universal set is \(U = \{0,1,2,3,4,5,6,7,8,9\},\) the set of all decimal digits. If we let \(A = \{1,2,3,4,5,6\}\) and \(B = \{1,3,5,7,9\},\) then \[\overline{A} = \{0,7,8,9\}\] and \[\overline{B} = \{0,2,4,6,8\}.\]

Example 17

Suppose that our universal set is \(\mathbb{Z}.\) If we let \(E\) be the set of all even integers, then \(\overline{E}\) is the set of all odd integers.
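Python's built-in sets do not carry a universal set, so there is no built-in complement operation. One possible workaround, sketched here with a hypothetical helper named `complement`, is to store the universal set explicitly and subtract from it. The sets are the ones from Example 16.

```python
U = set(range(10))          # the decimal digits 0 through 9
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}

def complement(S, universe=U):
    """Return the elements of the universe that are not in S."""
    return universe - S

print(sorted(complement(A)))               # [0, 7, 8, 9]
print(sorted(complement(B)))               # [0, 2, 4, 6, 8]
print(complement(complement(A)) == A)      # True
print((complement(A) | A) == U)            # True
print((complement(A) & A) == set())        # True
```

This only works for a finite universal set; a set such as \(\mathbb{Z}\) in Example 17 cannot be stored this way.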

3.9.4. Other Operations

The three operators complement, intersection, and union are the most commonly used to define subsets of a universal set. You will see why this is so later in the chapter.

However, there are some other operators you should be familiar with.

Difference

The difference of the sets \(A\) and \(B\) is the set containing those elements that are in \(A\) but not in \(B\) and is denoted by \(A \setminus B\). Set difference is also denoted by \(A - B\). More formally, \[A \setminus B = \{x \in U: x \in A \text{ and } x \not\in B\}.\]

We have the following Venn Diagram for \(A \setminus B\):

A Subtract B

Note that, for any sets \(A\) and \(B\), if \(A \neq B,\) then \[A \setminus B \neq B \setminus A.\] However, if \(A = B,\) then \(A\setminus B = B \setminus A = \emptyset\).

Example 18

If we let \(A = \{1,2,3,4,5,6\}\) and \(B = \{1,3,5,7,9\},\) then \[A \setminus B = \{2,4,6\}\] and \[B \setminus A = \{7,9\}.\]

Example 19 - Difference in Python

In Python, we can compute the difference of sets \(A\) and \(B\) in one of the following ways:

A.difference(B)
A - B
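With the sets from Example 18, a short check shows that the order of the operands matters:

```python
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}

print(sorted(A - B))              # [2, 4, 6]
print(sorted(B.difference(A)))    # [7, 9]
print((A - A) == set())           # True
```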

Symmetric Difference

The symmetric difference of the sets \(A\) and \(B\) is the set containing those elements that are in \(A\) or \(B\) but not both \(A\) and \(B\). It is denoted by \(A \oplus B\) in this textbook, but other books and sources may use different notation such as \(A \Delta B\). More formally, \[A \oplus B = \{x \in U: (x \in A \text{ and } x \not\in B) \text{ or } (x \in B \text{ and } x \not\in A)\}.\]

We have the following Venn Diagram for \(A \oplus B\):

A symdif B

Note that, for any sets \(A\) and \(B,\) \[A \oplus B = B \oplus A.\]

Example 20

If we let \(A = \{1,2,3,4,5,6\}\) and \(B = \{1,3,5,7,9\},\) then \[A \oplus B = \{2,4,6,7,9\}.\]

Example 21 - Symmetric Difference in Python

In Python, we can compute the symmetric difference of sets \(A\) and \(B\) in one of the following ways:

A.symmetric_difference(B)
A ^ B
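Using the sets from Example 20, the sketch below also compares the result with two equivalent descriptions: the elements in exactly one of the two differences, and the union with the intersection removed.

```python
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}

print(sorted(A ^ B))                       # [2, 4, 6, 7, 9]
print((A ^ B) == ((A - B) | (B - A)))      # True
print((A ^ B) == ((A | B) - (A & B)))      # True
```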

3.9.5. Multiple Set Operations

We can also perform more than one set operation on a collection of sets. For example, let \(A,\) \(B,\) and \(C\) be sets and consider the following set: \[(A \setminus B) \cup (C \setminus B).\] This is the set that is obtained by taking the union of the sets \(A \setminus B\) and \(C \setminus B.\) We have \[(A \setminus B) \cup (C \setminus B) = \{x \in U: (x \in A \text{ and } x \not\in B) \text{ or } (x \in C \text{ and } x \not\in B)\}.\]

We have the following Venn Diagram for \((A \setminus B) \cup (C \setminus B)\):

AminusBunionCminusB

Note that the Venn Diagram also represents \((A \cup C ) \setminus B\). In general, there are multiple ways to describe the result of multiple set operations.
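The claim that both expressions describe the same set can be spot-checked by brute force. The sketch below tries every choice of \(A\), \(B\), and \(C\) among the subsets of a small made-up universal set; the helper `subsets` is hypothetical, and a check like this is a sanity test rather than a proof.

```python
from itertools import combinations, product

U = {0, 1, 2}

def subsets(universe):
    """Return a list of all subsets of a finite set."""
    items = sorted(universe)
    return [set(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Compare the two expressions for all 8 * 8 * 8 = 512 choices of A, B, C
all_equal = all(
    ((A - B) | (C - B)) == ((A | C) - B)
    for A, B, C in product(subsets(U), repeat=3)
)
print(all_equal)    # True
```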

Video Examples

The following two video examples feature Dr. Katherine Pinzon, Professor of Mathematics at Georgia Gwinnett College.

Video Example 1

Video Example 2

You Try

Draw Venn Diagrams for each of these combinations of the sets \(A\), \(B\), and \(C\).

  1. \(A \cap (B \cup C)\)

  2. \((A \cap B) \cup C\)

  3. \((\overline{A} \cap \overline{C}) \cup B\)

  4. \((B \cup C) \setminus A\)

3.10. Set Identities

Here is a collection of additional properties of the operations on sets. Each of these can be verified by drawing two Venn diagrams, one that represents the left-hand side of the equation and another that represents the right-hand side of the equation and showing that the resulting shadings of the Venn diagrams are the same.

Note that it is traditional to focus on complement, union, and intersection as the three primary set operations because the other operations such as difference and symmetric difference can be written in terms of those three primary operations, for example, \(A \setminus B = A \cap \overline{B}\) and \(A \oplus B = (A \cap \overline{B}) \cup (\overline{A} \cap B)\).

Associative laws: \[ A ∪ (B ∪ C) = (A ∪ B) ∪ C \] \[ A ∩ (B ∩ C) = (A ∩ B) ∩ C \]

Distributive laws: \[ A ∪ (B ∩ C) = (A ∪ B) ∩(A ∪ C) \] \[ A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) \]

De Morgan’s laws: \[ \overline{A \cup B} = \overline{A} \cap \overline{B} \] \[ \overline{A \cap B} = \overline{A} \cup \overline{B} \]
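The Venn diagram comparison can be imitated in Python. In this sketch (the encoding is an assumption made for the illustration), the universal set has 8 elements and the sets \(A\), \(B\), and \(C\) are chosen so that each of the 8 regions of a three-set Venn diagram contains exactly one element. Two expressions then shade the same regions exactly when they evaluate to equal sets.

```python
U = set(range(8))

# Element n belongs to A, B, C according to the three bits of n,
# so every region of the Venn diagram holds exactly one element.
A = {n for n in U if n & 0b100}
B = {n for n in U if n & 0b010}
C = {n for n in U if n & 0b001}

def comp(S):
    """Complement relative to U."""
    return U - S

checks = {
    "associative (union)":        (A | (B | C)) == ((A | B) | C),
    "associative (intersection)": (A & (B & C)) == ((A & B) & C),
    "distributive (union)":       (A | (B & C)) == ((A | B) & (A | C)),
    "distributive (intersection)": (A & (B | C)) == ((A & B) | (A & C)),
    "De Morgan (union)":          comp(A | B) == (comp(A) & comp(B)),
    "De Morgan (intersection)":   comp(A & B) == (comp(A) | comp(B)),
}
for name, holds in checks.items():
    print(name, holds)
```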

3.10.1. Operator Precedence (Order Of Operations)

To ensure that we can properly interpret an expression involving multiple set operations, we can either use parentheses or rely on operator precedence.

When an expression for sets involves parentheses, complementation, intersection, and union, we start by evaluating all expressions enclosed in parentheses from left to right, then all complementations from left to right, then all intersections from left to right, and finally all unions from left to right. (Set difference and symmetric difference were left out of this discussion because there does not seem to be a standard definition for where they fit in! But, as shown earlier, those two operations can be rewritten in terms of complementation, union, and intersection.)

For example, the expression \(\overline{A} \cup B \cap C\) represents the same set as \((\overline{A}) \cup (B \cap C)\). Parentheses must be used if you want to represent a different set such as \((\overline{A} \cup B) \cap C\).

This is the same way arithmetic expressions like \(-3 + 5 \cdot 2\) are evaluated: The value of \(-3 + 5 \cdot 2\) is \((-3) + (5 \cdot 2) = 7\), not \((-3 + 5) \cdot 2 = 4\).
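Python's set operators follow the same convention for intersection and union: `&` is applied before `|`. The sketch below illustrates this with made-up sets and a hypothetical `comp` helper for the complement. Note that Python also assigns precedences to `-` and `^` (`-` is applied before `&`, and `^` falls between `&` and `|`), but that is a choice made by the language, not a mathematical standard.

```python
U = set(range(10))
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}
C = {0, 1, 2}

def comp(S):
    """Complement relative to U."""
    return U - S

no_parens = comp(A) | B & C     # Python evaluates B & C first

print(sorted(no_parens))                       # [0, 1, 7, 8, 9]
print(no_parens == (comp(A) | (B & C)))        # True
print(no_parens == ((comp(A) | B) & C))        # False
print(sorted((comp(A) | B) & C))               # [0, 1]
```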

3.11. Venn Diagrams, Partitions, and Bitstrings

A partition of a set \(U\) is a set of subsets of \(U\) such that each element \(x \in U\) is a member of exactly one of the subsets in the partition.

As an example you already know, one partition of the set of integers \(\mathbb{Z}\) is the set of subsets \[\{ \text{the set of even integers}, \text{the set of odd integers} \}\] Notice that every integer \(n\) belongs to exactly one of the two elements of this set.

As another example, for any subset \(A \subseteq U\) you have a partition of \(U\) into the 2 sets that are elements of \[\{ A,\,\overline{A} \}\] Note that each element of \(U\) must be in exactly one of the subsets \(A\) and \(\overline{A}\).
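For finite sets, the definition can be tested directly. The function `is_partition` below is a hypothetical helper written for this illustration, and a finite slice of \(\mathbb{Z}\) stands in for the full set of integers.

```python
def is_partition(universe, parts):
    """Return True if every part is a subset of the universe and each
    element of the universe belongs to exactly one part."""
    return (all(part <= universe for part in parts)
            and all(sum(x in part for part in parts) == 1
                    for x in universe))

U = set(range(-10, 11))
evens = {n for n in U if n % 2 == 0}
odds = U - evens
print(is_partition(U, [evens, odds]))    # True

A = {-3, 0, 4}
print(is_partition(U, [A, U - A]))       # True
print(is_partition(U, [evens, A]))       # False: 0 is in both, 1 is in neither
```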

For two subsets \(A\) and \(B\) of a universal set \(U\), consider the Venn diagram of \(A\) and \(B\). Notice that, by considering all possible intersections of these two sets and their complements, \(U\) is partitioned into 4 subsets, namely, the 4 elements of \[\{ \overline{A} \cap \overline{B},\, \overline{A} \cap B,\,A \cap \overline{B},\,A \cap B \}\] We can refer to each of these 4 subsets by using bitstrings of length 2 as follows:

  • The leftmost bit is 1 if an element of the subset is an element of \(A\), and is 0 if an element of the subset is not an element of \(A\).

  • The rightmost bit is 1 if an element of the subset is an element of \(B\), and is 0 if an element of the subset is not an element of \(B\).

For example, in the following Venn diagram, the subset \(A \cap \overline{B}\) is labeled with the bitstring \(10\) because an element of \(A \cap \overline{B}\) is an element of \(A\) and not an element of \(B\).

ABbitstrings

If you had instead three subsets \(A\), \(B\), and \(C\) of the universal set \(U,\) you could partition the universe \(U\) into 8 subsets. In detail, if you have an element \(x \in U\), either \(x \in A\) or \(x \not\in A\), and for each of those possibilities, either \(x \in B\) or \(x \not\in B\), and for each of those possibilities, either \(x \in C\) or \(x \not\in C\). We can apply (twice) the multiplication principle that was first mentioned in chapter 2 to show that there are \(2 \cdot 2 \cdot 2\) possible subsets determined by the Venn diagrams of the 3 sets \(A\), \(B\), and \(C\). Using bitstrings of length 3, we can label these 8 subsets as shown.

ABCbitstrings

For an integer \(n > 3\), the Venn diagram is less useful for representing the partitioning of the universe created by \(n\) subsets, but we can still reason that there ought to be \(2^{n}\) subsets in the partition, where each of the subsets can be described by a unique bitstring of length \(n.\) (We will be able to give a formal mathematical proof of this for every positive integer \(n\) later in the textbook after we’ve discussed mathematical induction.)
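The bitstring labels are easy to compute. In this sketch, the sets and the helper name `bitstring` are made up for the illustration; each element of \(U\) is sorted into its region by building its label one bit per set.

```python
from collections import defaultdict

U = set(range(10))
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}
C = {0, 1, 2}
sets = [A, B, C]

def bitstring(x, sets):
    """One bit per set, leftmost bit first: 1 if x is in the set, else 0."""
    return "".join("1" if x in S else "0" for S in sets)

regions = defaultdict(set)
for x in U:
    regions[bitstring(x, sets)].add(x)

for bits in sorted(regions):
    print(bits, sorted(regions[bits]))
```

For these particular sets the region labeled 011 is empty, so only 7 of the \(2^3 = 8\) labels appear in the output.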

3.11.1. Disjunctive Normal Form (Set Version)

Suppose you have three sets \(A\), \(B\), and \(C\), and have partitioned the universe \(U\) into the 8 subsets as discussed above. A subset of \(U\) that corresponds to any shading of the Venn diagram can be written as a union of intersections of three sets, with one set chosen from each of the pairs \(\{ A,\,\overline{A} \}\), \(\{ B,\,\overline{B} \}\), and \(\{ C,\,\overline{C} \}\).

ABC DNF EXAMPLE

As an example, consider the set shown in the image, which has 4 of the 8 regions of the Venn diagram shaded:

  • \(\overline{A} \cap \overline{B} \cap \overline{C}\) which is the region outside of all three sets,

  • \(A \cap \overline{B} \cap \overline{C},\) the region in set \(A\) but in neither \(B\) nor \(C,\)

  • \(\overline{A} \cap B \cap \overline{C},\) the region in set \(B\) but in neither \(A\) nor \(C,\)

  • \(\overline{A} \cap B \cap C,\) the region in both \(B\) and \(C\) but not in \(A.\)

Write the union of these 4 subsets to create an expression that describes the entire shaded region. \[(\overline{A} \cap \overline{B} \cap \overline{C}) \cup (A \cap \overline{B} \cap \overline{C}) \cup (\overline{A} \cap B \cap \overline{C}) \cup (\overline{A} \cap B \cap C)\]

This type of expression is called a disjunctive normal form (or DNF) for the set that it represents. We will see an analog of these in a different context in the chapter on Logic.

The advantage of using the DNF is that you can write out an expression for the shaded subset using a simple algorithm. The DNF may be neither the shortest possible expression nor the most easily understood expression for the shaded part of the Venn diagram, but the DNF is a correct expression for the shaded subset.
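The simple algorithm can be sketched as follows: list the bitstrings of the shaded regions, build each region as an intersection, and take the union. The concrete sets and the helper names `comp` and `region` are assumptions made for this illustration; the four bitstrings match the four shaded regions in the example above.

```python
U = set(range(10))
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}
C = {0, 1, 2}

def comp(S):
    """Complement relative to U."""
    return U - S

def region(bits, sets):
    """Intersect S (bit 1) or its complement (bit 0) for each set."""
    result = set(U)
    for bit, S in zip(bits, sets):
        result &= S if bit == "1" else comp(S)
    return result

shaded = ["000", "100", "010", "011"]

dnf = set()
for bits in shaded:
    dnf |= region(bits, [A, B, C])

# The same set written out directly as the DNF expression
direct = ((comp(A) & comp(B) & comp(C)) | (A & comp(B) & comp(C))
          | (comp(A) & B & comp(C)) | (comp(A) & B & C))

print(sorted(dnf))      # [4, 6, 7, 8, 9]
print(dnf == direct)    # True
```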

3.12. The Principle Of Inclusion-Exclusion (PIE)

In certain application problems, we want to compute the cardinality \(|A \cup B|\) of the union of two given finite sets \(A\) and \(B\). It is tempting to simply add \(|A|\) and \(|B|\), but as the Venn diagram below shows, each element of the intersection \(A \cap B\) will be counted twice, once for each bit that is \(1\), if we do so.

ABbitstrings

The correct relationship between \(|A \cup B|\), \(|A|\), and \(|B|\) is given by \[ |A \cup B| = |A| + |B| - |A \cap B|. \]

Another way to see that this is the correct relationship is to use the partition \(\{ \overline{A} \cap \overline{B},\, \overline{A} \cap B,\,A \cap \overline{B},\,A \cap B \}\) to write

\(| A | = | A \cap \overline{B} | + | A \cap B |\),
\(| B | = | \overline{A} \cap B | + | A \cap B |\), and
\(| A \cup B | = | A \cap \overline{B} | + | A \cap B | + | \overline{A} \cap B |\), so

\(| A | + | B | = | A \cap \overline{B} | + | A \cap B | + | \overline{A} \cap B | + | A \cap B | = | A \cup B | + | A \cap B |\).

Example 22

Consider the set \(U = \{ n \in \mathbb{N} : 1 \leq n \leq 60 \}\). How many elements of \(U\) are divisible by either 2 or 3 or both? How many elements of \(U\) are divisible by neither 2 nor 3?

Let \(A\) stand for the subset of \(U\) that consists of multiples of 2, and let \(B\) stand for the subset of \(U\) that consists of multiples of 3. It’s not too difficult to see that \(|A| = \frac{60}{2} = 30\) and \(|B| = \frac{60}{3} = 20\). Also, \(A \cap B\) must be the subset of \(U\) that consists of multiples of 6, so \(| A \cap B | = \frac{60}{6} = 10\). (If these computations don’t make sense to you, just start counting off the pattern \(1,\,2,\,3,\,4,\,5,\,6,\,\ldots\) and notice that every 2nd number is divisible by 2, every 3rd number is divisible by 3, and every 6th number is divisible by both 2 and 3.) Now apply the Principle Of Inclusion-Exclusion to find the number of integers in \(U\) that are divisible by either 2 or 3 or both: \(|A \cup B| = |A| + |B| - |A \cap B| = 30 + 20 - 10 = 40\). There are 40 integers in \(U\) that are divisible by either 2 or 3 or both, so there are \(60 - 40 = 20\) integers in \(U\) that are divisible by neither 2 nor 3.
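The counts in this example can be confirmed by building the sets directly. This is a minimal check that compares the formula with an actual count:

```python
U = set(range(1, 61))
A = {n for n in U if n % 2 == 0}    # multiples of 2
B = {n for n in U if n % 3 == 0}    # multiples of 3

print(len(A), len(B), len(A & B))            # 30 20 10
print(len(A) + len(B) - len(A & B))          # 40, from the formula
print(len(A | B))                            # 40, counted directly
print(len(U - (A | B)))                      # 20
```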

If we want to compute the cardinality \(|A \cup B \cup C|\) of the union of three given finite sets \(A\), \(B\), and \(C\), we can again look at the Venn diagram of the partition of \(U\) into 8 sets to see that some of the intersections will be counted one, two, or three times, once for each bit that is \(1\).

ABCbitstrings

We can derive the following formula in much the same way that we did above; in fact, we can just apply the formula we found for two sets to \(| (A \cup B) \cup C |\) and use some of the set identities to help simplify the formula. \[ |A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|. \]

Show me all the steps!
\begin{equation} \begin{aligned} | A \cup B \cup C | {} & = | (A \cup B) \cup C | \\ & = |A \cup B| + |C| - |(A \cup B) \cap C| \\ & = (|A| + |B| - |A \cap B|) + |C| - |(A \cup B) \cap C| \\ & = (|A| + |B| - |A \cap B|) + |C| - |(A \cap C) \cup (B \cap C)| \\ & = (|A| + |B| - |A \cap B|) + |C| - (|A \cap C| + |B \cap C| - |(A \cap C) \cap (B \cap C)|) \\ & = (|A| + |B| - |A \cap B|) + |C| - (|A \cap C| + |B \cap C| - |A \cap C \cap B \cap C|) \\ & = (|A| + |B| - |A \cap B|) + |C| - (|A \cap C| + |B \cap C| - |A \cap B \cap C|) \\ & = (|A| + |B| - |A \cap B|) + |C| - |A \cap C| - |B \cap C| + |A \cap B \cap C| \\ & = |A| + |B| - |A \cap B| + |C| - |A \cap C| - |B \cap C| + |A \cap B \cap C| \\ & = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C| \\ \end{aligned} \end{equation}
Example 23

Consider the set \(U = \{ n \in \mathbb{N} : 1 \leq n \leq 60 \}\) as in the previous example. How many elements of \(U\) are divisible by at least one of 2, 3, or 5? How many elements of \(U\) are divisible by none of 2, 3, or 5?

As in the previous example, let \(A\) stand for the subset of \(U\) that consists of multiples of 2, let \(B\) stand for the subset of \(U\) that consists of multiples of 3, and now let \(C\) stand for the subset of \(U\) that consists of multiples of 5.

We have \(|A| = \frac{60}{2} = 30\), \(|B| = \frac{60}{3} = 20\), \(|C| = \frac{60}{5} = 12\), \(|A \cap B| = \frac{60}{6} = 10\), \(|A \cap C| = \frac{60}{10} = 6\), \(|B \cap C| = \frac{60}{15} = 4\), and \(|A \cap B \cap C| = \frac{60}{30} = 2\).

Apply the Principle Of Inclusion-Exclusion to find the number of integers in \(U\) that are divisible by at least one of 2, 3, or 5:

\(|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|\)

\(|A \cup B \cup C| = 30 + 20 + 12 - 10 - 6 - 4 + 2 = 44\)

There are 44 integers in \(U\) that are divisible by at least one of 2, 3, or 5, and there are \(60 - 44 = 16\) integers in \(U\) that are divisible by none of 2, 3, or 5.
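As a check on the arithmetic, the sketch below builds the three sets and compares the inclusion-exclusion formula with a direct count:

```python
U = set(range(1, 61))
A = {n for n in U if n % 2 == 0}    # multiples of 2
B = {n for n in U if n % 3 == 0}    # multiples of 3
C = {n for n in U if n % 5 == 0}    # multiples of 5

pie = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))

print(pie)                       # 44, from the formula
print(len(A | B | C))            # 44, counted directly
print(len(U - (A | B | C)))      # 16
```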

3.13. Cardinality Of Sets: Infinite Sets

Set \(A\) is called an infinite set if it is not a finite set. That is, \(A\) is not the empty set, and for every positive integer \(n\) there is no one-to-one correspondence between \(A\) and \(\{1, 2, \ldots , n \}.\)

Intuitively an infinite set \(A\) is at least as big as the set of positive integers. You may think that \(A\) must have the same size as the set of positive integers, but cardinality is a much more …​ "interesting" concept for infinite sets, as you will see.

First, we will say that two infinite sets \(A\) and \(B\) have the same cardinality if and only if there is a one-to-one correspondence between the two sets. As an example, the set of positive integers and the set of negative integers have the same cardinality since each nonzero integer \(n\) can be paired with its additive inverse, \(-n.\)

For finite sets, if \(A\) is a proper subset of \(B\) then it must be true that the cardinality of \(A\) is not the same as the cardinality of \(B.\) This fails spectacularly for infinite sets as the next few examples show.

Example 24 - The Natural Numbers and The Positive Integers
NtoNstar

The image shows that there is a one-to-one correspondence between the natural numbers and the positive integers, so these two sets have the same cardinality. But this isn’t so bad: We only have one more number in the set of natural numbers, which is why we can just shift everything over by 1 in the image.

Notice that this example also suggests why it really does not matter whether we start counting from 1 or 0…​ we can always "reindex" the counting if necessary.

Challenge
Write a formula for the function that represents the one-to-one correspondence between the natural numbers and the positive integers.
Hint
The function is a linear function. If you don’t remember how to find the equation of a linear function, see this appendix.
Example 25 - The Natural Numbers and The Integers
ZtoN

The image shows that there is a one-to-one correspondence between the set of all integers and the natural numbers.

NtoZ

The image shows the inverse of the one-to-one correspondence above, which is a one-to-one correspondence between the set of natural numbers and the set of integers.

These two sets have the same cardinality, which may surprise you since "intuitively" it would seem there should be about twice as many integers as natural numbers. However, this example shows that you can double (roughly) the size of an infinite set and get a new set that has the same cardinality.

Challenge
Write a formula for a function that represents one of the two one-to-one correspondences involving the natural numbers and the integers.
Hint
Either function will be defined by two linear expressions. The definition will be of the form \[ f(n) = \begin{cases} \text{some linear expression}, & \text{ if } n \geq 0 \\ \text{some linear expression}, & \text{ if } n < 0 \\ \end{cases} \] or \[ g(n) = \begin{cases} \text{some linear expression}, & \text{ if } n \text{ is even} \\ \text{some linear expression}, & \text{ if } n \text{ is odd} \\ \end{cases} \]
Example 26 - The Natural Numbers and Ordered Pairs of Natural Numbers
Cantor’s_Pairing_Function

This first image, which shows red points plotted in the \(xy\)-plane that have been labeled with natural numbers, suggests a way to define a one-to-one correspondence between the set of ordered pairs of natural numbers, \(\mathbb{N} × \mathbb{N},\) and the set of natural numbers \(\mathbb{N}.\)
Image credit: "Cantor’s Pairing Function" by crh23. The image is dedicated to the public domain under CC0.

PairsToSingles

This second image displays the same one-to-one correspondence in tabular form.

In the first image, notice that for each fixed value of the second coordinate \(y \in \mathbb{N},\) the horizontal row of red points of the form \(\{ (x, y) : x \in \mathbb{N} \} = \{ (0, y), (1, y), (2, y), (3, y), \ldots \}\) has the same cardinality as \(\mathbb{N}\) and that there is one such row for every natural number \(y \in \mathbb{N}.\) That is, the set of rows has the same cardinality as the set \(\mathbb{N},\) and each of the rows has the same cardinality as \(\mathbb{N}.\) There are, in essence, as many copies of \(\mathbb{N}\) (the rows of red points) as there are elements in \(\mathbb{N},\) and these copies are joined together to form the Cartesian product \(\mathbb{N} × \mathbb{N}\)…​ but this set still has the same cardinality as \(\mathbb{N}.\)

Notice something else about this example: It shows that each pair of natural numbers can be encoded as a single natural number. In fact, this example can be generalized to show that any element in the set of all finite-length sequences of natural numbers can be encoded uniquely as a natural number. So, for example, every possible finite-length string of Unicode characters/code points can be encoded as a natural number, which may or may not be surprising to you.

Mega-challenge!
Try to find an algebraic formula for the function \(n = f(x,y)\) that describes the one-to-one correspondence described by the images, then show that the function must map two different inputs (ordered pairs of natural numbers) to two different outputs (natural numbers) and also that every natural number is an output from the function for some input ordered pair of natural numbers.
Hint
It’s possible to write \(f(x,y)\) as a quadratic polynomial in the two variables \(x\) and \(y.\)

You may want to read about triangular numbers to get an idea of how the mapping of ordered pairs to numbers is being done. In the first image, the red points form a "triangle of infinite height" with a vertex at \((0,0)\) and sides lying along the \(x-\) and \(y-\)axes. "Row 0" of the triangle is the single point \((0,0),\) and "row \(n\)" of the triangle is made up of the red points with natural number coordinates \((x,y)\) that add up to \(n\) (that is, \(x+y = n\).)

A proof that this mapping of ordered pairs of natural numbers to individual natural numbers is in fact a one-to-one correspondence will be presented later in the textbook.
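Without giving away a closed formula, the row-by-row idea can be explored with a generator that walks through the ordered pairs one "row" \(x + y = n\) at a time and labels them in the order visited. The direction of travel within each row is an assumption here and may be the mirror image of the one used in the images.

```python
from itertools import count, islice

def diagonal_pairs():
    """Yield every ordered pair of natural numbers exactly once,
    visiting the rows x + y = 0, 1, 2, ... in turn."""
    for n in count():
        for y in range(n + 1):
            yield (n - y, y)

# Label the first 15 pairs (rows 0 through 4) with 0, 1, 2, ...
labels = {pair: k for k, pair in enumerate(islice(diagonal_pairs(), 15))}
for pair, k in labels.items():
    print(k, pair)
```

Because each row is finite, every pair is reached after finitely many steps, and no pair is visited twice.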

So far, every infinite set presented has the same cardinality as \(\mathbb{N}.\)

Maybe all infinite sets have the same cardinality as \(\mathbb{N}?\) Nope!

The next theorem shows that \(\mathcal{P}(\mathbb{N})\) cannot have the same cardinality as \(\mathbb{N}\) so there must be at least two "infinities."

Theorem

There is no one-to-one correspondence between \(\mathbb{N}\) and \(\mathcal{P}(\mathbb{N}).\)

Proof
This proof uses a technique called "Cantor’s diagonal argument" and is an example of the proof by contradiction technique that will be discussed later in the Proofs: Basic Techniques chapter.

SeqOfSets

Let’s suppose that we had such a one-to-one correspondence. As shown in the image, we could represent the one-to-one correspondence by a sequence \(S_0 , S_1 , S_2 , \ldots\) of subsets of \(\mathbb{N},\) which is what the elements of \(\mathcal{P}(\mathbb{N})\) are. In the one-to-one correspondence, every subset of \(\mathbb{N}\) (that is, every element of \(\mathcal{P}(\mathbb{N})\)) appears as one of the \(S_{n}\) in the sequence: Every subset has been paired with a natural number and every natural number has been paired with a subset.

Next, define a subset \(M \subseteq \mathbb{N}\) as \[M = \{ n \in \mathbb{N} : n \not\in S_n \}\] That is, \(M\) is defined by the rule that for each natural number \(n,\) we have \(n \in M\) if and only if \(n \not\in S_{n}.\) So, for example, 0 is an element of \(M\) if 0 is not an element of \(S_0,\) but is not an element of \(M\) if 0 is an element of \(S_0.\) Likewise, 1 is an element of \(M\) if 1 is not an element of \(S_1,\) but is not an element of \(M\) if 1 is an element of \(S_1.\) The same is true for each of the natural numbers 2, 3, and so on: The natural number \(n\) is an element of exactly one of the sets \(S_n\) and \(M.\)

Now we show that \(M\) must be missing from the sequence \(S_0 , S_1 , S_2 , \ldots\)
\(M\) cannot be \(S_0\) since one of those sets contains 0 and the other one does not. \(M\) cannot be \(S_1\) since one of those sets contains 1 and the other one does not. The same must be true for each of the natural numbers 2, 3, and so on. So, for every natural number \(n,\) \(n\) is an element of exactly one of the two sets \(S_n\) and \(M,\) which means \(M \neq S_n\) is true for every natural number \(n.\) This means that \(M\) cannot be any of the sets in the sequence…​ it is missing!

We assumed that every subset is listed in the sequence, but just showed that there is some subset that is not listed in the sequence. Notice that, even if we tried to use a new sequence (for example, insert \(M\) at position 0 and shift all the other subsets over by adding 1 to their subscripts) the diagonal argument could be used to define another subset that is missing from the new sequence. So, every possible sequence of subsets must be missing at least one subset.

Therefore, such a one-to-one correspondence cannot exist.
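The construction of \(M\) can be tried out on a finite list of sets. The sketch below is only a finite illustration of the diagonal idea (the theorem itself concerns an infinite sequence), and the four sets are made up for the example.

```python
# A finite stand-in for the sequence S_0, S_1, S_2, S_3
S = [
    {0, 2},          # S_0
    set(),           # S_1
    {0, 1, 2, 3},    # S_2
    {1, 3},          # S_3
]

# n is in M exactly when n is NOT in S_n
M = {n for n in range(len(S)) if n not in S[n]}
print(M)    # {1}

# M differs from S_n at the element n, so M is not in the list
for n, S_n in enumerate(S):
    print(n, (n in M) != (n in S_n), M != S_n)    # each line ends with True True
```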

3.13.1. Countable and Uncountable Sets

Set \(A\) is called countable if

  • \(A\) is a finite set or

  • there is a one-to-one correspondence between \(A\) and \(\mathbb{N}.\) In this case, \(A\) is also called countably infinite.

Set \(A\) is called uncountable if it is not a countable set. That is, \(A\) is infinite and there is no one-to-one correspondence between \(A\) and \(\mathbb{N}.\)

Several examples of countably infinite sets were given earlier in this section:

  • The set of positive integers \(\{ i \in \mathbb{N} \, | \, i > 0 \},\)

  • the set of integers \(\mathbb{Z},\) and

  • the set of ordered pairs of natural numbers, \(\mathbb{N} × \mathbb{N}.\)

On the other hand, the theorem proved earlier in this section shows that \(\mathcal{P}(\mathbb{N})\) is an uncountable set.

Infinite Cardinal Numbers

In advanced mathematics, the concept of "infinite cardinal number" is developed and used to represent the sizes of infinite sets. Mathematicians use these infinite cardinal numbers to make sense of cardinalities like \(|\mathbb{N}|\) and \(|\mathcal{P}(\mathbb{N})|.\) It can be proven that \[|\mathbb{Q}| = |\mathbb{N}|\] \[|\mathbb{N}| < |\mathcal{P}(\mathbb{N})|\] \[|\mathcal{P}(\mathbb{N})| = |\mathbb{R}|\] and that \[\text{for any infinite set } A, |A| < |\mathcal{P}(A)|\] which shows that there must be infinitely-many infinite cardinal numbers.

3.14. Exercises

Remixer’s Note: This section is taken from the original “Discrete Math” book with only minor changes.

  1. Consider as the universal set the set of all \(26\) lowercase letters of the English alphabet, \(U=\{a,b,c,…,v,w,x,y,z\}\), and the sets \(A=\{a,b,c,d,e,f,g,h\}\), \(B=\{f,g,h,i,j,k\}\), and \(C=\{x,y,z\}\). For each of the sets given below:

    1. List the sets below using roster form, and

    2. Draw Venn Diagrams for each of the sets

      1. \(A\cup B\)

      2. \(A\cap B\)

      3. \(A\cup C\)

      4. \(A\cap C\)

      5. \(A \setminus B\)

      6. \(B \setminus A\)

      7. \(A \setminus C\)

      8. \(C \setminus A\)

      9. \(A\cup C\)

      10. \(A\cap C\)

      11. \(\overline{A}\)

      12. \(\overline{B}\)

      13. \(\overline{C}\)

      14. \(\overline{B} \cap \overline{C}\)

      15. \( (\overline{A} \cap \overline{B}) \cup (\overline{B} \cap \overline{C})\)

  2. Using Venn Diagrams, determine which of the following are equivalent

    1. \(A \setminus (A \setminus B),\)

      \(A\cup B,\) and

      \(A\cap B\)

    2. \(A\cup \overline{A},\)

      \(A\cap \overline{A},\)

      \(U,\) and

      \(\emptyset\)

    3. \(\overline{A}\cap \overline{B}, \)

      \(\overline{A\cap B},\)

      \(\overline{A}\cup \overline{B},\) and

      \(\overline{A\cup B}\)

    4. \(A\cup (B\cap C),\)

      \(A\cap (B\cup C),\)

      \((A\cap B)\cup (A\cap C),\) and

      \((A\cup B)\cap (A\cup C),\)

    5. \(\overline{\overline{A}\cup(C \setminus B)},\)

      \(A\cap (B \cup \overline{C}),\) and

      \(A \setminus (C \setminus B)\)

  3. Write each of the following sets using set builder notation

    1. \(\{\ldots, -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, \ldots \}\)

    2. \(\{\ldots, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10,\ \ldots \}\)

    3. \(\{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 \}\)

    4. \(\left\{ 1,\frac{1}{2},\frac{1}{3},\frac{1}{4},\frac{1}{5},\ldots \right\}\)

    5. \(\{0, 1, 4, 9, 16, 25, 36, 49, \ldots \}\)

    6. \(\{\ldots,-10,-6, -2, 2, 6, 10, 14, 18, 22, \ldots \}\)

    7. \(\{ 3, 9, 27, 81, 243,\ldots\}\)

    8. \(\{ 1, 9, 25, 49, 81, \ldots \}\)

  4. Write each of the following sets in roster form

    1. \(\{x \in \mathbb{R} : |2x+5|=7\}\)

    2. \(\{10n : n \in \mathbb{N}\}\)

    3. \(\{10n : n \in \mathbb{Z}\}\)

    4. \(\left\{2^n : n \in \mathbb{N}\right\}\)

    5. \(\left\{2^n : n \in \mathbb{Z}\right\}\)

    6. \(\left\{x \in \mathbb{R} : x^2=4\right\}\)

    7. \(\left\{x \in \mathbb{R} : x^3=64\right\}\)

    8. \(\left\{x \in \mathbb{Z} : x^2=5\right\}\)

    9. \(\left\{x \in \mathbb{R} : x^2= -4\right\}\)

    10. \(\left\{x \in \mathbb{Z} : |x-5|=3\right\}\)

    11. \(\left\{3n+4 : n \in \mathbb{N}\right\}\)

    12. \(\left\{3n+4 : n \in \mathbb{Z}\right\}\)

    13. \(\left\{i^n : n \in\mathbb{N}\right\}\), where \(i\) is such that \(i^2=-1\) (the imaginary unit).

  5. Consider the sets \(A=\{1, 3, 5, 7, 9, 11, 13, 15, 17\}\), \(B=\{2, 5, 7, 11\}\), and \(C=\{1, 2, 3\}\),

    1. Determine the following cardinalities:

      1. \(|A|\)

      2. \(|A\cup B|\)

      3. \(|A\cap C|\)

      4. \(|\mathcal{P}(A)|\)

      5. \(|\mathcal{P}(B)|\)

      6. \(|\mathcal{P}(C)|\)

    2. Give the following power sets,

      1. \(\mathcal{P}(B)\)

      2. \(\mathcal{P}(C)\)

  6. Determine the cardinalities of the following sets:

    1. \(\{n \in \mathbb{Z} : |n|\leq 10\}\)

    2. \(\{A,B, \emptyset,\{2,5,6\}\}\)

    3. \(\{\{A,B\},\{\},\{\{2,5,6\}\},\{\{2,5,6\},C\},\{A,B,C\}\}\)

    4. \(\{\{\{A,B\},\emptyset,\{\{2,5,6\},C\},\{A,B,C\}\}\}\)

  7. Consider the sets, \(B=\{0, 1\}\), \( S=\{spring, summer, fall, winter\}\), and \(C=\{ a, b, c, d,e\}\). For each of the following sets:

    1. Determine the following Cartesian products.

    2. Calculate the cardinality of each Cartesian product.

      1. \(B \times S\)

      2. \(S \times B\)

      3. \(B \times C\)

      4. \(C \times B\)

      5. \(B \times B \times B \times B\)

      6. \(S \times B \times B\)

  8. Determine the following power sets,

    1. \(\mathcal{P}(\{Alabama, Georgia, Florida, Louisiana\} )\)

    2. \(\mathcal{P}(\emptyset )\)

    3. \(\mathcal{P}(\{\emptyset\} )\)

    4. \(\mathcal{P}(\{Alabama \} )\)

    5. \(\mathcal{P}(\{Alabama, Georgia, Florida \} )\)

    6. \(\mathcal{P}(\{\{Alabama, Georgia \}, \{Florida \} \} )\)

  9. Write the shaded regions in each of the following Venn diagrams using set notation.

    [Figure: Venn diagrams with shaded regions]

  10. Determine if each of the following are true or false. Explain your reasoning.

    1. \(\{7,4,6,2,11,3,5\}\subseteq \{1,2,3,4,5,6,7,8,9,10,11,12,13\}\)

    2. \(\{1,2,3,4,5,6,7,8,9,10,11,12,13\}\subseteq \{7,4,6,2,11,3,5\}\)

    3. \(\{7,4,6,2,11,3,5\}\subseteq \{7,4,6,2,11,3,5\}\)

    4. \(\{3,8\}\nsubseteq \{7,4,6,2,11,3,5\}\)

    5. \( \{3n+4 : n \in \mathbb{N}\} \nsubseteq \mathbb{Z}\)

    6. \(\mathbb{N}\subseteq \mathbb{Z}\subseteq \mathbb{Q}\subseteq \mathbb{R}\)

    7. \(\{x \in \mathbb{R} : |x|<3\}\subseteq \{x \in \mathbb{R} \, | \, |x|<5\}\)

    8. \(\{x \in \mathbb{R} : |x|>3\}\subseteq \{x \in \mathbb{R} \, | \, |x|>5\}\)

4. Logic

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on February 18, 2025.
Added link to DNF and CNF website
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Logic is the study of reasoning. Logic is used to create, analyze, and validate arguments, where an argument is a finite sequence of statements that ends with a conclusion based on inferences made from earlier statements in the argument.

Among the applications of logic to computer science are design of electronic circuits and validation of algorithms and programs.

Key terms and concepts covered in this chapter:

  • Propositional logic (also called "propositional calculus")

    • Logical operators (also called "logical connectives")

      • Negation ("not")

      • Conjunction ("and")

      • Disjunction ("or")

      • Conditional ("implication")

        • The converse, inverse, and contrapositive of a conditional

      • Biconditional ("if and only if," abbreviated as "iff")

    • Truth tables

    • Well-formed formulas

    • Satisfiability, tautology, and contradiction

    • Normal forms (conjunctive and disjunctive)

  • Predicate logic

    • Predicates as "statement-valued functions"

    • Quantification of Predicates

      • Universal quantifier

      • Existential quantifier

  • To be added to this chapter after May 23, 2025:

    • Limitations of propositional and predicate logic (e.g., expressiveness issues)

    • Boolean algebra and Boolean circuits

4.1. Propositional Logic

A proposition is a statement that declares a fact that is either True or False (but not both!).

Propositions
  • Atlanta is the capital of California.

  • \(1 + 1 = 2\)

  • \(1 + 1 = 3\)

Not propositions
  • How much is this cookie?

  • Please sit down.

  • Wow!

  • This sentence is false.

  • \(x + 1 = y\)

Propositional logic consists of a set of formal rules for combining propositions in order to derive new propositions.

A goal of propositional logic is to have a method for creating valid arguments that are sequences of propositions, where the correctness and validity of the argument are based solely on the propositions' truth values (True and False), ignoring the actual content of the propositions. Compare this to doing algebra: You can write \(2 (x + 3 y) = 2x + 6y\) because it is a correct and valid step to distribute multiplication over addition. You can do the algebra and ignore the specific numerical values (the "numerical content") that \(x\) and \(y\) stand for.

In propositional logic, it is traditional to use propositional variables such as p, q, and r to stand for the possible assignments of truth values to propositions; often, the propositional variables themselves are referred to as the propositions. Again, compare this to the algebraic example where you can treat \(x\) and \(y\) as numbers even though they are actually variables that stand for numbers.

What is the advantage of using symbols? A long time ago, philosophers discovered that it is easier to follow lines of reasoning when thoughts are put into symbols. This was an important step in the eventual development of modern technological society and, in particular, electronic computers. Before a computer can do its work, humans need to put their thoughts into it; however, a spoken language like English can be too difficult to use because many different phrases can represent the same logical statement.

Example 1 - Generalizing and abstracting an argument

Consider the following argument consisting of three propositions:

  • Sarah earned a B.S. in Computer Science.

  • Anyone who earned a B.S. in Computer Science must have earned a C or better in Discrete Math.

  • Therefore, Sarah earned a C or better in Discrete Math.

This argument is valid: If you ASSUME that the first two propositions are True, then you can correctly conclude that the third proposition (which follows the introductory phrase "Therefore,") must be True as well.

Notice that you could change the name "Sarah" to "Daniel" without affecting the validity of the argument.

You can generalize the argument by changing "Sarah" to "the student." In fact, you can generalize much more by noticing that the form of the argument matches the following argument.

  • Individual X is a member of category A.

  • Any individual that is a member of category A must also be a member of category B.

  • Therefore, individual X is a member of category B.

You can make this argument completely abstract using propositional variables.

  • p is true.

  • The implication "if p then q" is true, too.

  • Therefore, q is true.

You can build compound propositions (also called "propositional functions") from simpler propositions. For example, in the preceding example, the compound proposition "if p then q" was introduced as a new proposition built from the propositional variables p and q. In the next section, you will learn how to represent compound propositions using symbols.

4.2. Logical Operations and Truth Tables

In this section, compound propositions will be represented by using propositional variables and logical operator symbols (also called "logical operations" or "logical connectives"). Once again, you can compare this to how numerical and algebraic relationships can be represented in symbols using algebraic expressions built with the usual arithmetic operator symbols with variables and numerals.

In Python, we use \(\texttt{Boolean}\) variables to represent propositions and define functions for each compound proposition. Each compound proposition can be implemented using the \(\texttt{Boolean}\) operations \(\texttt{not}\), \(\texttt{and}\), and \(\texttt{or}\) discussed in the section Operators and Expressions in the chapter "Appendix: An Introduction to Python".

A truth table can be used to display the truth values of a compound proposition that is built from propositional variables and logical operators. A truth table is created with rows representing all possible interpretations of the propositional variables, that is, all possible assignments of truth values to the propositional variables. Each column of a truth table displays the truth values for either one of the propositional variables or a compound proposition built up from propositional variables and/or simpler compound propositions. As an analogy, think of how a table can be used to display the numerical input and output values of a function represented by an algebraic expression.

The most commonly-used logical operators are described in the rest of this section.

4.2.1. Negation

"I am not an astronaut."

The negation of a proposition \(p\), denoted in mathematics by \(\neg p\) and read as "not \(p\)", is the proposition "It is not the case that \(p\)". The proposition \(\neg p\) has the opposite truth value to \(p.\)
Other textbooks and sources may use \(\overline{p}\) or \(\sim \! p\) to represent \(\neg p.\)

\(p\)  | \(\neg p\)
True   | False
False  | True

For example, the negation of the proposition "Today is Friday." would be "It is not the case that today is Friday." or more succinctly "Today is not Friday."
For a proposition \(p,\) exactly one of \(p\) and \(\neg p\) is True and exactly one is False. Two propositions can be contrary (that is, they could not both be True) without being negations of each other: As an example, both of the propositions "Today is Friday." and "Today is Saturday." could be False, so "Today is Saturday." is not the negation of "Today is Friday."

Notice that the two propositions \(p\) and \(\neg (\neg p)\) must always have the same truth value (You can see this by inserting a column for \(\neg (\neg p)\) in the truth table shown earlier in this subsection.)

In the first few truth tables in this chapter, "True" and "False" are spelled out, but it is more often the case that these words are abbreviated to their first letters, "T" and "F" in truth tables.
Example 2 - Negation in Python

The code below prints the truth table for negation. Note that the values True and False are constants in Python, and that not p implements the negation \(\neg p\) in Python.

Try to predict the variable names, values, and data types at different steps in the execution.
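As a minimal sketch (not necessarily the book's own listing), the truth table for negation could be printed as follows. The function name `negation_table` is a hypothetical choice made for this illustration.

```python
def negation_table():
    """Return the rows (p, not p) of the truth table for negation."""
    rows = []
    for p in [True, False]:      # True and False are Python constants
        rows.append((p, not p))  # "not p" implements the negation of p
    return rows


print("p      not p")
for p, not_p in negation_table():
    print(f"{p!s:<6} {not_p!s}")
```

Each row pairs a truth value of p with the opposite truth value, matching the table above.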

4.2.2. Conjunction

"I am a rock and I am an island."

Let \(p\) and \(q\) be propositions. The conjunction of \(p\) and \(q\), denoted in mathematics by \(p \land q\) and read as "\(p\) and \(q\)", is True when both \(p\) and \(q\) are True and is False otherwise.

\(p\)  | \(q\)  | \(p \land q\)
True   | True   | True
True   | False  | False
False  | True   | False
False  | False  | False

Notice that the two propositions \(p \land q\) and \(q \land p\) always have the same truth value.

Example 3 - Conjunction in Python

The code below prints the truth table for conjunction. Try to predict the variable names, values, and data types at different steps in the execution.
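One way such a program might look is sketched here, assuming nested loops over the two truth values. The name `conjunction_table` is hypothetical.

```python
def conjunction_table():
    """Return the rows (p, q, p and q) of the truth table for conjunction."""
    rows = []
    for p in [True, False]:
        for q in [True, False]:
            rows.append((p, q, p and q))  # "p and q" implements conjunction
    return rows


print("p      q      p and q")
for p, q, result in conjunction_table():
    print(f"{p!s:<6} {q!s:<6} {result!s}")
```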

4.2.3. Disjunction

"They studied hard or they are extremely bright."

Let \(p\) and \(q\) be propositions. The disjunction of \(p\) and \(q\), denoted in mathematics by \(p \lor q\) and read as "\(p\) or \(q\)", is True when at least one of \(p\) and \(q\) is True and is False otherwise.

\(p\)  | \(q\)  | \(p \lor q\)
True   | True   | True
True   | False  | True
False  | True   | True
False  | False  | False

Notice that the two propositions \(p \lor q\) and \(q \lor p\) always have the same truth value.

Example 4 - Disjunction in Python

The code below prints the truth table for disjunction. Try to predict the variable names, values, and data types at different steps in the execution.
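A possible version is sketched below. It uses `itertools.product` from the Python standard library to generate the four interpretations, which is a design choice of this sketch and not something the text prescribes. The name `disjunction_table` is hypothetical.

```python
from itertools import product


def disjunction_table():
    """Return the rows (p, q, p or q) of the truth table for disjunction."""
    # product([True, False], repeat=2) yields (True, True), (True, False),
    # (False, True), (False, False), in that order.
    return [(p, q, p or q) for p, q in product([True, False], repeat=2)]


print("p      q      p or q")
for p, q, result in disjunction_table():
    print(f"{p!s:<6} {q!s:<6} {result!s}")
```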

4.2.4. Conditional

"If you get a 100 on the final exam, then you earn an A in the class."

Let \(p\) and \(q\) be propositions. The conditional statement \(p \rightarrow q\), read as "if p then q", "p implies q", or, more formally, "the conditional with hypothesis p and conclusion q", is the proposition that is False when p is True and q is False, and True otherwise. The conditional statement \(p \rightarrow q\) is also called "the implication \(p \rightarrow q\)".
The conditional \(p \rightarrow q\) can also be denoted by \(p \Rightarrow q\) or \(p \implies q.\) In addition, there are many other ways to express the conditional \(p \rightarrow q\) in English, two of which are "p only if q" and "q if p".

\(p\)  | \(q\)  | \(p \rightarrow q\)
True   | True   | True
True   | False  | False
False  | True   | True
False  | False  | True

  • \(p \rightarrow q\) and \(q \rightarrow p\) do NOT always have the same truth value!

  • The conditional can be considered a "contract" which fails only when the conditions are met and the results are not fulfilled.

  • The conditional may or may not represent a "cause-and-effect" relationship. For example, the conditional "if Shakespeare wrote Hamlet then \(2 + 2 = 4\)" is a True proposition because the conclusion "\(2 + 2 = 4\)" is True, but the arithmetic equation is not an effect that was caused by the authorship of Hamlet.

Example 5 - You Try: Conditional in Python

Complete the code below by replacing #FIX ME# with an expression involving \(p\), \(q\), and some of the Python operators not, and, and or. Once the expression is correctly defined, the truth table for the conditional statement should print.

The Converse, Contrapositive and Inverse of a Conditional Statement

Given propositions p and q, we can form three additional compound propositions that are related to the conditional \(p \rightarrow q\):

  • \(q \rightarrow p\), called the converse of \(p \rightarrow q\)

  • \( \neg q \rightarrow \neg p\), called the contrapositive of \(p \rightarrow q\)

  • \( \neg p \rightarrow \neg q\), called the inverse of \(p \rightarrow q\)

The extended truth table for the conditional and the three related propositions is shown below.

\(p\)  | \(q\)  | \(p \rightarrow q\) (conditional) | \(q \rightarrow p\) (converse) | \(\neg q \rightarrow \neg p\) (contrapositive) | \(\neg p \rightarrow \neg q\) (inverse)
True   | True   | True   | True   | True   | True
True   | False  | False  | True   | False  | True
False  | True   | True   | False  | True   | False
False  | False  | True   | True   | True   | True

From the truth table it can be seen that

  • \(p \rightarrow q\) and the converse \(q \rightarrow p\) do NOT always have the same truth value!

  • \(p \rightarrow q\) and its contrapositive \( \neg q \rightarrow \neg p\) MUST have the same truth value!

  • The converse \(q \rightarrow p\) and the inverse \( \neg p \rightarrow \neg q\) MUST have the same truth value.

In the section on logically equivalent propositions we will discuss the bullet points in the preceding note in more detail.
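The extended truth table can also be generated by a short program. The sketch below stores the truth table of the conditional as a lookup dictionary, so it does not commit to any particular formula built from not, and, and or. The names `CONDITIONAL`, `conditional`, and `related_conditionals` are hypothetical.

```python
from itertools import product

# Truth table of the conditional, copied from this subsection:
# the key is (p, q) and the value is the truth value of "if p then q".
CONDITIONAL = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}


def conditional(p, q):
    """Look up the truth value of the conditional with hypothesis p and conclusion q."""
    return CONDITIONAL[(p, q)]


def related_conditionals(p, q):
    """Return (conditional, converse, contrapositive, inverse) for p and q."""
    return (
        conditional(p, q),          # p -> q
        conditional(q, p),          # q -> p
        conditional(not q, not p),  # (not q) -> (not p)
        conditional(not p, not q),  # (not p) -> (not q)
    )


print("p, q, conditional, converse, contrapositive, inverse")
for p, q in product([True, False], repeat=2):
    print(p, q, *related_conditionals(p, q))
```

Comparing the printed columns row by row gives the same three observations as the bullet points above.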

The next example illustrates these four propositions.

Example 6 - Conditional, Converse, Contrapositive and Inverse.
  1. Translate the statement "If the number of students in class is divisible by 4, then the number of students in class is divisible by 2" using a conditional.

  2. Form and translate the converse, contrapositive, and inverse.

Solution
  1. Let

    \(p\) be the proposition "The number of students in class is divisible by 4."

    \(q\) be the proposition "The number of students in class is divisible by 2."

    The conditional \(p\rightarrow q\) translates as "If the number of students in class is divisible by 4, then the number of students in class is divisible by 2."

  2. The converse \(q \rightarrow p\) may be translated as "If the number of students in class is divisible by 2, then the number of students in class is divisible by 4."

    The contrapositive \( \neg q \rightarrow \neg p\) may be translated as "If the number of students in class is not divisible by 2, then the number of students in class is not divisible by 4."

    The inverse \( \neg p \rightarrow \neg q\) may be translated as "If the number of students in class is not divisible by 4, then the number of students in class is not divisible by 2."

Notice that in this example, the conditional must be True, based on properties of factors of integers, but its converse could be False: Consider the case where the number of students in class is equal to 26, so \(p\) is False and \(q\) is True, and \(p\rightarrow q\) is True but \(q\rightarrow p\) is False. This also shows, again, that the conditional need not represent a "cause-and-effect" relationship, since NOT being "divisible by 4" does not let us conclude anything about being "divisible by 2".

4.2.5. Biconditional

"It is raining outside if and only if it is a cloudy day."

Let \(p\) and \(q\) be propositions. The biconditional \(p \leftrightarrow q\), read as "p if and only if q", is the proposition that is True when p and q have the same truth value, and False otherwise. The biconditional is also called "the bi-implication". Note that \(p \leftrightarrow q\) can also be denoted by \(p \Leftrightarrow q\) or \(p \iff q.\)

\(p\)  | \(q\)  | \(p \leftrightarrow q\)
True   | True   | True
True   | False  | False
False  | True   | False
False  | False  | True

You can use a truth table to show that the two propositions \(p \leftrightarrow q\) and \(q \leftrightarrow p\) always have the same truth value.

The biconditional \(p \leftrightarrow q\) is read as "p if and only if q" because it has the same truth table as the conjunction of the two conditionals "p if q" (that is, \(q \rightarrow p\)) and "p only if q" (that is, \(p \rightarrow q\)).

Example 7 - You Try: Biconditional in Python

Complete the code below by replacing #FIX ME# with an expression involving \(p\), \(q\), and some of the Python operators not, and, and or. Once the expression is correctly defined, the truth table for the biconditional statement should print.

It is important to contrast the conditional with the biconditional. Consider the conditional example "If you get a 100 on the final exam, then you earn an A in the class." This means that when you get a 100 on the final you also get an A in the class. The conditional represents a one-way contract: You earn an A in the class if you get a 100 on the final exam. There is nothing said about the result (the grade you earn in the class) if you do NOT meet the condition (get a 100 on the final exam).

As a biconditional the example would say "You get a 100 on the final exam if and only if you earn an A in the class." This becomes a two-way contract: You earn an A in the class if you get a 100 on the final, but you do not earn an A in the class if you do not get a 100 on the final.

4.2.6. Other Compound Propositions

The negation, disjunction, conjunction, conditional, and biconditional are the most commonly-used logical operators for forming compound propositions and will be the ones used throughout the rest of this chapter. However, there are at least three others you should know about.

Exclusive Disjunction

"I took either 2 Advil or 2 Tylenol."

Let \(p\) and \(q\) be propositions. The exclusive disjunction of \(p\) and \(q\) (also known as xor), denoted in mathematics by \(p \oplus q\), is True when exactly one of \(p\) and \(q\) is True and is False otherwise.

\(p\)  | \(q\)  | \(p \oplus q\)
True   | True   | False
True   | False  | True
False  | True   | True
False  | False  | False

Notice that the two propositions \(p \oplus q\) and \(q \oplus p\) always have the same truth value.

The NAND and NOR Operators

The NAND and NOR operators correspond to two important digital logic gates used in electronic devices.

NAND

"An onion is not both a fruit and a vegetable."

Let \(p\) and \(q\) be propositions. In this textbook, the NAND of \(p\) and \(q\) is denoted by \(p \uparrow q\) and is False when both \(p\) and \(q\) are True and is True otherwise. That is, the NAND is the negation of \(p \land q\) - think of NAND as "Not AND."

\(p\)  | \(q\)  | \(p \uparrow q\)
True   | True   | False
True   | False  | True
False  | True   | True
False  | False  | True

Notice that the two propositions \(p \uparrow q\) and \(q \uparrow p\) always have the same truth value.

NOR

"This pen’s ink is neither red nor blue."

Let \(p\) and \(q\) be propositions. In this textbook, the NOR of \(p\) and \(q\) is denoted by \(p \downarrow q\) and is True when both of \(p\) and \(q\) are False and is False otherwise. The NOR is also referred to as "joint denial" since it is True exactly when neither \(p\) nor \(q\) is True. That is, the NOR is the negation of \(p \lor q.\)

\(p\)  | \(q\)  | \(p \downarrow q\)
True   | True   | False
True   | False  | False
False  | True   | False
False  | False  | True

Notice that the two propositions \(p \downarrow q\) and \(q \downarrow p\) always have the same truth value.
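The three operators in this subsection can be sketched in Python as small functions. The names `xor`, `nand`, and `nor` are hypothetical helpers for this illustration, not built-in Python operators, and the functions assume that p and q are Boolean values.

```python
from itertools import product


def xor(p, q):
    """Exclusive disjunction: True when exactly one of p and q is True."""
    return p != q


def nand(p, q):
    """NAND: the negation of the conjunction of p and q."""
    return not (p and q)


def nor(p, q):
    """NOR: the negation of the disjunction of p and q."""
    return not (p or q)


print("p      q      xor    nand   nor")
for p, q in product([True, False], repeat=2):
    print(f"{p!s:<6} {q!s:<6} {xor(p, q)!s:<6} {nand(p, q)!s:<6} {nor(p, q)!s}")
```

For Boolean arguments, `p != q` is True exactly when the two truth values differ, which is the defining property of the exclusive disjunction.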

4.2.7. Well-Formed Formulae and Operator Precedence (Order Of Operations)

A well-formed formula (or wff for short) is a string of symbols that represents a compound proposition.

Here is a recursive definition of wff:

  • A propositional variable is a well-formed formula.

  • If \(\alpha\) ("alpha") and \(\beta\) ("beta") are well-formed formulas, then the following are also well-formed formulas:

    • \(\left( \neg \alpha \right)\)

    • \(\left( \alpha \land \beta \right)\)

    • \(\left( \alpha \lor \beta \right)\)

    • \(\left( \alpha \rightarrow \beta \right)\)

    • \(\left( \alpha \leftrightarrow \beta \right)\)

The definition of wff allows you or, even better, a computer to analyze any string of symbols to determine whether the string of symbols is a wff. For example, \((p \land q \lor r)\) isn’t a wff but both \((p \land (q \lor r))\) and \(((p \land q) \lor r)\) are wffs. You could write code to implement an algorithm to validate a string as a wff.
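A minimal sketch of such a validator is given below. It follows the recursive definition directly and makes several assumptions that are not part of the text: propositional variables are single lowercase letters, the operators are written with the Unicode symbols ¬, ∧, ∨, →, and ↔, and spaces are ignored. The names `parse_wff` and `is_wff` are hypothetical.

```python
BINARY_OPERATORS = {"∧", "∨", "→", "↔"}


def parse_wff(s, i=0):
    """Try to read one wff starting at index i of s.

    Return the index just past that wff, or -1 if no wff starts at i.
    """
    if i >= len(s):
        return -1
    if "a" <= s[i] <= "z":  # base case: a propositional variable
        return i + 1
    if s[i] != "(":         # every other wff starts with "("
        return -1
    i += 1
    if i < len(s) and s[i] == "¬":
        i = parse_wff(s, i + 1)  # the form (¬α)
    else:
        i = parse_wff(s, i)      # the form (α op β): read α ...
        if i == -1 or i >= len(s) or s[i] not in BINARY_OPERATORS:
            return -1
        i = parse_wff(s, i + 1)  # ... then read β
    if i == -1 or i >= len(s) or s[i] != ")":
        return -1
    return i + 1


def is_wff(text):
    """Return True if text, with spaces removed, is exactly one wff."""
    s = text.replace(" ", "")
    return parse_wff(s) == len(s)


print(is_wff("(p ∧ q ∨ r)"))    # False
print(is_wff("(p ∧ (q ∨ r))"))  # True
print(is_wff("((p ∧ q) ∨ r)"))  # True
```

Because the sketch applies the definition literally, a string such as ¬p without its enclosing parentheses is rejected.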

It can be shown that every compound proposition can be represented by at least one wff. However, a wff may be difficult to read quickly if it contains many parentheses. As an example, it is not easy to read \(( (p \rightarrow q) \lor ( (\neg r) \land (s \leftrightarrow t) ) )\). For this reason, we can introduce operator precedence rules that allow us to eliminate some of the parentheses.

To evaluate a compound proposition, we start by evaluating

  • all expressions enclosed in parentheses from left to right, then

  • all negations from left to right, then

  • all conjunctions from left to right, then

  • all disjunctions from left to right, then

  • all conditionals from left to right, and finally

  • all biconditionals from left to right.

This allows us to drop some parentheses from a wff that represents a compound proposition.

For example, the compound proposition \(\neg p \lor q \land r \rightarrow s\) represents the same proposition as the wff \((((\neg p) \lor (q \land r)) \rightarrow s)\). At least some of the parentheses must be used if you want to represent a different proposition such as \((\neg p \lor q) \land (r \rightarrow s)\).
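Python's Boolean operators follow the same relative order for the first three operators: not is applied before and, and and is applied before or. Python has no built-in operator for the conditional, so the sketch below only looks at the part \(\neg p \lor q \land r\). The function names are hypothetical.

```python
from itertools import product


def without_parentheses(p, q, r):
    # Python applies "not" first, then "and", then "or".
    return not p or q and r


def fully_parenthesized(p, q, r):
    return ((not p) or (q and r))


def grouped_differently(p, q, r):
    return (not p or q) and r


interpretations = list(product([True, False], repeat=3))

# The first two versions agree on every interpretation.
agree = all(
    without_parentheses(*v) == fully_parenthesized(*v) for v in interpretations
)

# The third version is a different proposition: list where it disagrees.
differ = [
    v for v in interpretations
    if without_parentheses(*v) != grouped_differently(*v)
]

print(agree)
print(differ)
```

The list `differ` is not empty, which shows that moving the parentheses changes the proposition being represented.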

4.2.8. Truth Tables of Compound Propositions

To compute the truth values of a longer compound proposition or wff by hand, it can be useful to break up the proposition or wff into the smaller propositions or wffs that it was built from.

Example 8

The code below reveals the truth table of the compound proposition:

\((p \land q) \lor \neg q\)

Recall: \(\neg q\) is mathematical shorthand for not q.
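A sketch of a program of this kind is shown here; it is an illustration and may differ from the book's own listing. The names `compound` and `truth_table` are hypothetical.

```python
from itertools import product


def compound(p, q):
    """The compound proposition (p and q) or (not q)."""
    return (p and q) or (not q)


def truth_table(proposition):
    """Return the rows (p, q, value) for a proposition of two variables."""
    return [
        (p, q, proposition(p, q))
        for p, q in product([True, False], repeat=2)
    ]


print("p      q      (p and q) or not q")
for p, q, value in truth_table(compound):
    print(f"{p!s:<6} {q!s:<6} {value!s}")
```

In this sketch, only the `return` line inside `compound` needs to change to produce the table for a different proposition of p and q.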

You Try

Edit the code above to reveal the truth value of the compound proposition:

\((p \lor \neg q) \land \neg p\)

Hint: You only need to change the one line that computes the compound proposition.

When creating your own truth table, it is crucial to be systematic about ensuring you have all possible truth values for each of the simple propositions. Each simple proposition has two possible truth values, so the number of rows in the table should be \(2^n\), where \(n\) is the number of simple propositions. (Do you recall why the number of rows must be \(2^n\)?) You should also consider breaking complex propositions into smaller pieces.
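One systematic way to list every interpretation is to let a program generate them. The sketch below uses `itertools.product` from the Python standard library; the function name `interpretations` is hypothetical.

```python
from itertools import product


def interpretations(n):
    """Return all assignments of truth values to n propositional variables."""
    return list(product([True, False], repeat=n))


# The number of interpretations doubles each time a variable is added.
for n in range(1, 5):
    print(n, len(interpretations(n)), 2 ** n)

# The rows for three variables, in the usual truth-table order.
for row in interpretations(3):
    print(row)
```

The rows come out in the order commonly used in truth tables, starting with all True and ending with all False.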

Example 9

Create a truth table for the compound proposition:

\((p \land q) \rightarrow (p \land r)\) for all values of \(p, q, r\).

Solution

The truth table should have 8 rows, since there are three simple propositions and each one has two possible truth values.

\(p\) | \(q\) | \(r\) | \(p \land q\) | \(p \land r\) | \((p \land q) \rightarrow (p \land r)\)
T | T | T | T | T | T
T | T | F | T | F | F
T | F | T | F | T | T
T | F | F | F | F | T
F | T | T | F | F | T
F | T | F | F | F | T
F | F | T | F | F | T
F | F | F | F | F | T

4.3. Logically Equivalent Propositions

Recall that an interpretation of a proposition is an assignment of truth values to the propositional variables.

Two propositions are considered logically equivalent (or simply equivalent) if they have the same truth values for every possible interpretation. It is often easiest to see this by constructing a truth table for the two propositions and comparing.

Example 10

Consider the propositions \(\neg p \lor q\) versus \(p\rightarrow q\).

\(p\)  | \(q\)  | \(\neg p \lor q\) | \(p \rightarrow q\)
True   | True   | True   | True
True   | False  | False  | False
False  | True   | True   | True
False  | False  | True   | True

Since the two compound propositions have the same truth value in every row of the truth table, they are equivalent.

We use the symbol \(\equiv\) to denote that two propositions are logically equivalent. So in the preceding example, we would write \(\neg p \lor q \equiv p\rightarrow q\).
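Checking logical equivalence by truth table is mechanical, so it can be sketched as a small Python function that compares two propositions on every interpretation. The names `equivalent`, `conditional`, `not_p_or_q`, and `converse` are hypothetical, and the conditional is looked up from its truth table so that the check does not assume the result it is testing.

```python
from itertools import product

# Truth table of the conditional: key (p, q), value "if p then q".
CONDITIONAL = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}


def conditional(p, q):
    return CONDITIONAL[(p, q)]


def not_p_or_q(p, q):
    return (not p) or q


def converse(p, q):
    return CONDITIONAL[(q, p)]


def equivalent(f, g, number_of_variables):
    """Return True if f and g agree on every interpretation."""
    return all(
        f(*values) == g(*values)
        for values in product([True, False], repeat=number_of_variables)
    )


print(equivalent(not_p_or_q, conditional, 2))  # True
print(equivalent(conditional, converse, 2))    # False
```

The second check illustrates that a conditional and its converse are not logically equivalent.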

\(\equiv\) is NOT a logical operator used to build compound propositions, but instead is used to say that two propositions are logically equivalent. This is similar to how \(=\) is used in arithmetic: We can write \(2 + 2 = 5 - 1\) to say that \(2 + 2\) and \(5 - 1\) are numerically equivalent, but we don’t use the \(=\) sign as an arithmetic operator to actually do any arithmetic.
Saying that two propositions p and q are logically equivalent is the same as saying that the biconditional compound proposition \(p \leftrightarrow q\) is always True.
Example 11

Consider three compound propositions:

  1. \((p\land q) \rightarrow r\)

  2. \((p \rightarrow q) \land (p \rightarrow r)\)

  3. \(p \rightarrow (q \land r)\)

The code below reveals the truth table for 1. Modify it for 2 and 3 in order to determine which of the compound propositions are equivalent.

Hint: You only need to change the one line that defines the compound proposition.
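A sketch of a program for proposition 1 is given below; it is an illustration and may differ from the book's own listing. It assumes a helper `implies` defined through the equivalence \(\neg p \lor q \equiv p \rightarrow q\) from Example 10. The names `implies`, `proposition`, and `truth_table` are hypothetical.

```python
from itertools import product


def implies(p, q):
    """The conditional, written using the equivalence from Example 10."""
    return (not p) or q


def proposition(p, q, r):
    # Proposition 1: (p and q) -> r.
    # Change only the next line to study propositions 2 and 3.
    return implies(p and q, r)


def truth_table(prop):
    """Return the rows (p, q, r, value) for a proposition of three variables."""
    return [
        (p, q, r, prop(p, q, r))
        for p, q, r in product([True, False], repeat=3)
    ]


print("p      q      r      value")
for p, q, r, value in truth_table(proposition):
    print(f"{p!s:<6} {q!s:<6} {r!s:<6} {value!s}")
```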

4.3.1. Tautologies, Contradictions and Contingencies

A proposition is a tautology if its truth value is always True. That is, a tautology is True for every possible interpretation of its propositional variables.

A proposition is called satisfiable if there is at least one interpretation for which the proposition is True.

A proposition is unsatisfiable if there is no interpretation for which the proposition is True.

A proposition is a contradiction if its truth value is always False. That is, a contradiction is False for every possible interpretation of its propositional variables. This is just another way of saying that it is unsatisfiable.

A proposition that is neither a tautology nor a contradiction is said to be a contingency since its truth value can be either True or False, contingent on the truth value assigned to its propositional variables.
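These definitions translate directly into a check over all interpretations. The following is a minimal sketch in which a proposition is represented as a Python function of Boolean arguments; the names `classify` and `is_satisfiable` are hypothetical.

```python
from itertools import product


def truth_values(proposition, n):
    """Return the truth value of the proposition under each interpretation."""
    return [proposition(*v) for v in product([True, False], repeat=n)]


def is_satisfiable(proposition, n):
    """True if at least one interpretation makes the proposition True."""
    return any(truth_values(proposition, n))


def classify(proposition, n):
    """Return 'tautology', 'contradiction', or 'contingency'."""
    values = truth_values(proposition, n)
    if all(values):
        return "tautology"
    if not any(values):
        return "contradiction"
    return "contingency"


print(classify(lambda p: p or not p, 1))    # tautology
print(classify(lambda p: p and not p, 1))   # contradiction
print(classify(lambda p, q: p and q, 2))    # contingency
```

Note that a tautology and a contingency are both satisfiable, while a contradiction is not.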

Example 12 - Tautology and Contradiction

\(p \lor \neg p\) is an example of a tautology.

\(p \land \neg p\) is an example of a contradiction.

This can be seen in the truth table.

\(p\)  | \(\neg p\) | \(p \lor \neg p\) | \(p \land \neg p\)
True   | False  | True   | False
False  | True   | True   | False

Notice that the truth values for \(p \lor \neg p\) are all True and \(p \land \neg p\) are all False.

The two compound propositions in the previous example are so important that they have their own names.

The Law of Excluded Middle

Given any proposition \(p\), the compound proposition \[p \lor \neg p\] is a tautology (that is, the compound proposition is always True.)

The Law of Contradiction

Given any proposition \(p\), the compound proposition \[p \land \neg p\] is a contradiction (that is, the compound proposition is always False.)

4.3.2. De Morgan’s Laws

Two important logical equivalences are De Morgan’s Laws. These describe how to "distribute" the \(\neg\) operator across the \(\land\) and \(\lor\) operators.

De Morgan’s Laws

\(\neg (p \land q)\equiv \neg p \lor \neg q\)

\(\neg (p \lor q)\equiv \neg p \land \neg q\)

De Morgan’s Laws can be verified by creating truth tables for \(\neg (p \land q) \leftrightarrow \neg p \lor \neg q\) and \(\neg (p \lor q) \leftrightarrow \neg p \land \neg q\) to show that these propositions are True for every interpretation of \(p\) and \(q\).
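The truth-table verification can be sketched in Python, where the comparison `==` between two Boolean values plays the role of the biconditional. The function name `de_morgan_holds` is hypothetical.

```python
from itertools import product


def de_morgan_holds():
    """Check both of De Morgan's Laws on every interpretation of p and q."""
    for p, q in product([True, False], repeat=2):
        first = (not (p and q)) == ((not p) or (not q))
        second = (not (p or q)) == ((not p) and (not q))
        print(p, q, first, second)
        if not (first and second):
            return False
    return True


print(de_morgan_holds())
```

Each printed row shows the two biconditionals for one interpretation; both are True in all four rows.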

4.3.3. Some Other Logical Equivalencies

Here is a collection of additional equivalencies of compound propositions. Each of these can be verified by constructing a truth table to show that the biconditional of the left-hand side and the right-hand side of the logical equivalence is true for all interpretations of the propositional variables.

Double Negation: \[ p \equiv \neg (\neg p) \]

Commutative laws: \[ p \lor q \equiv q \lor p \] \[ p \land q \equiv q \land p \]

Associative laws: \[ p \lor (q \lor r) \equiv (p \lor q) \lor r \] \[ p \land (q \land r) \equiv (p \land q) \land r \]

Distributive laws: \[ p \lor (q \land r) \equiv (p \lor q) \land (p \lor r) \] \[ p \land (q \lor r) \equiv (p \land q) \lor (p \land r) \]
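As a sketch of how these verifications might be automated, the code below checks each listed equivalence on all eight interpretations of \(p\), \(q\), and \(r\). The dictionary `LAWS` and the function `holds` are hypothetical names; every entry takes all three variables, even when a law uses fewer, so that one loop can check them all.

```python
from itertools import product

# Each entry maps a label to (left-hand side, right-hand side).
LAWS = {
    "double negation": (
        lambda p, q, r: p,
        lambda p, q, r: not (not p),
    ),
    "commutative (or)": (
        lambda p, q, r: p or q,
        lambda p, q, r: q or p,
    ),
    "commutative (and)": (
        lambda p, q, r: p and q,
        lambda p, q, r: q and p,
    ),
    "associative (or)": (
        lambda p, q, r: p or (q or r),
        lambda p, q, r: (p or q) or r,
    ),
    "associative (and)": (
        lambda p, q, r: p and (q and r),
        lambda p, q, r: (p and q) and r,
    ),
    "distributive (or over and)": (
        lambda p, q, r: p or (q and r),
        lambda p, q, r: (p or q) and (p or r),
    ),
    "distributive (and over or)": (
        lambda p, q, r: p and (q or r),
        lambda p, q, r: (p and q) or (p and r),
    ),
}


def holds(lhs, rhs):
    """True if lhs and rhs agree on every interpretation of p, q, r."""
    return all(
        lhs(*v) == rhs(*v) for v in product([True, False], repeat=3)
    )


results = {name: holds(lhs, rhs) for name, (lhs, rhs) in LAWS.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```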

4.3.4. Disjunctive Normal Form (DNF)

It is traditional to focus on negation \(\neg\), conjunction \(\land\), and disjunction \(\lor\) as the three primary logical operations. This is because any compound proposition can be rewritten in terms of these three operations and the propositional variables present in the original compound proposition.

One way to justify this is by using an expression in disjunctive normal form (DNF), which is a disjunction of one or more conjunctions. In the form constructed here, at most one of the conjunctions can be true for any interpretation of the propositional variables. This description should become clearer after reading the following example.

Example 13 - Finding A Logically Equivalent Proposition (DNF) From A Truth Table

Suppose we have a truth table for an unknown compound proposition. Perhaps someone wrote the truth table but did not write down the expression for the compound proposition in the header of the rightmost column.

\(p\) \(q\) \(r\) \(\text{unknown}\)
T T T F
T T F F
T F T T
T F F F
F T T F
F T F T
F F T T
F F F F

We can write a new compound proposition that is equivalent to the unknown one, using the propositional variables \(p\), \(q\), and \(r\) and the logical operators \(\neg\), \(\land\) and \(\lor\) as follows:

  • For each row of the truth table that has T in the rightmost column, write the conjunction that would have a T in only that one row of its truth table.

  • Form the disjunction of all the conjunctions found in the previous step. This new expression is called a disjunctive normal form (DNF) for the unknown proposition.

For the truth table above, we have three rows with T in the rightmost column. The first of the three rows corresponds to \(p \land \neg q \land r\), which is only True if \(p\) is True, \(q\) is False, and \(r\) is True. In the same way, the second of the three rows corresponds to \(\neg p \land q \land \neg r\), and the third of the three rows corresponds to \(\neg p \land \neg q \land r\). We now form the disjunction of these three expressions. \[(p \land \neg q \land r) \lor (\neg p \land q \land \neg r) \lor (\neg p \land \neg q \land r)\]

This new compound proposition has a truth table that is the same as the one for the unknown proposition. This means that the expression we found is logically equivalent to the unknown proposition.

\(p\) \(q\) \(r\) \(\text{unknown}\) \((p \land \neg q \land r) \lor (\neg p \land q \land \neg r) \lor (\neg p \land \neg q \land r)\)
T T T F F
T T F F F
T F T T T
T F F F F
F T T F F
F T F T T
F F T T T
F F F F F
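The two-step procedure in this example can be sketched in Python. This is an illustrative sketch under stated assumptions: the dictionary `unknown` simply re-enters the truth table above, and the names `conjunction_for`, `dnf_text`, and `dnf_value` are hypothetical helpers, not part of the text.

```python
from itertools import product

names = ("p", "q", "r")

# Truth table from Example 13: (p, q, r) -> value of the unknown proposition.
unknown = {
    (True, True, True): False,
    (True, True, False): False,
    (True, False, True): True,
    (True, False, False): False,
    (False, True, True): False,
    (False, True, False): True,
    (False, False, True): True,
    (False, False, False): False,
}

def conjunction_for(row):
    """Text of the conjunction that is True only for this row."""
    literals = [n if v else f"¬{n}" for n, v in zip(names, row)]
    return "(" + " ∧ ".join(literals) + ")"

# Step 1: one conjunction per row with T in the rightmost column.
true_rows = [row for row, value in unknown.items() if value]

# Step 2: the disjunction of those conjunctions.
dnf_text = " ∨ ".join(conjunction_for(row) for row in true_rows)
print(dnf_text)

def dnf_value(p, q, r):
    """Evaluate the DNF by evaluating each conjunction literal by literal."""
    values = (p, q, r)
    return any(all(v if want else not v for v, want in zip(values, row))
               for row in true_rows)

# The DNF agrees with the unknown proposition on all eight rows.
matches = all(dnf_value(*row) == unknown[row]
              for row in product([True, False], repeat=3))
print(matches)
```

The printed expression is the same disjunction of three conjunctions found by hand in the example.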

4.3.5. Conjunctive Normal Form (CNF)

In some applications of propositional logic, it is more useful to find a logically equivalent expression for a given proposition that is written as a conjunction of several disjunctions. This conjunctive normal form (CNF) can be constructed as shown in the following example.

Example 14 - Finding A Logically Equivalent Proposition (CNF) From A Truth Table

Consider the same unknown proposition we used in the previous example.

\(p\) \(q\) \(r\) \(\text{unknown}\)
T T T F
T T F F
T F T T
T F F F
F T T F
F T F T
F F T T
F F F F

One way to find a CNF is as follows.

  • Find the disjunctive normal form for the negation of the unknown proposition.

  • Apply De Morgan’s Laws to the DNF for the negation of the unknown proposition found in the first step - the result will be a CNF for the double negation of the unknown proposition (which is logically equivalent to the unknown proposition).

\(p\) \(q\) \(r\) \(\text{unknown}\) \(\neg \text{unknown}\)
T T T F T
T T F F T
T F T T F
T F F F T
F T T F T
F T F T F
F F T T F
F F F F T

From the truth table above, we obtain the following DNF for the negation of the unknown proposition: \((p \land q \land r) \lor (p \land q \land \neg r) \lor (p \land \neg q \land \neg r) \lor (\neg p \land q \land r) \lor (\neg p \land \neg q \land \neg r)\).

Next, we negate the DNF, \[\neg [ (p \land q \land r) \lor (p \land q \land \neg r) \lor (p \land \neg q \land \neg r) \lor (\neg p \land q \land r) \lor (\neg p \land \neg q \land \neg r) ],\] and then apply De Morgan’s Laws and the Double Negation law to obtain the CNF we wanted to find, \[(\neg p \lor \neg q \lor \neg r) \land (\neg p \lor \neg q \lor r) \land (\neg p \lor q \lor r) \land (p \lor \neg q \lor \neg r) \land (p \lor q \lor r).\]

The last expression is logically equivalent to the unknown proposition.
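The construction above can also be sketched in Python. Negating the conjunction for a row with De Morgan’s Laws gives a disjunction that is False only on that row, so the CNF has one disjunction per row where the unknown proposition is False. The names `clause_for`, `cnf_text`, and `cnf_value` are hypothetical helpers chosen for this sketch.

```python
from itertools import product

names = ("p", "q", "r")

# Rows (p, q, r) where the unknown proposition of Example 14 is True.
true_rows = {(True, False, True), (False, True, False), (False, False, True)}
all_rows = list(product([True, False], repeat=3))
false_rows = [row for row in all_rows if row not in true_rows]

def clause_for(row):
    """Text of the disjunction that is False only for this row."""
    literals = [f"¬{n}" if v else n for n, v in zip(names, row)]
    return "(" + " ∨ ".join(literals) + ")"

cnf_text = " ∧ ".join(clause_for(row) for row in false_rows)
print(cnf_text)

def cnf_value(*values):
    """Evaluate the CNF: every clause must contain at least one True literal."""
    return all(any(v != want for v, want in zip(values, row))
               for row in false_rows)

# The CNF agrees with the unknown proposition on all eight rows.
matches = all(cnf_value(*row) == (row in true_rows) for row in all_rows)
print(matches)
```

The printed expression has five disjunctions, one for each row in which the unknown proposition is False.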

Here is a website that allows you to build the DNF and CNF for a given propositional function.

4.3.6. Functional Completeness

A set \(S\) of logical operators is called functionally complete if every compound proposition is logically equivalent to a compound proposition involving only operators that are members of \(S.\)

Theorem

The set \(\{ \neg , \land , \lor \}\) is functionally complete.

Proof
An informal justification can use the method shown in the previous examples for finding a DNF or CNF. A formal proof requires the mathematical induction proof technique and the recursive definition of well-formed formulae given earlier in this chapter.

The importance of the NAND and NOR operators in electronic circuits arises from the following theorem.

Theorem
  1. The set \(\{ \uparrow \}\) is functionally complete. That is, any compound proposition is logically equivalent to a compound proposition involving the same variables and only the \(\uparrow\) operator.

  2. The set \(\{ \downarrow \}\) is functionally complete. That is, any compound proposition is logically equivalent to a compound proposition involving the same variables and only the \(\downarrow\) operator.

Proof
These proofs are exercises for you. See the Challenge Exercises at the end of this chapter.

4.4. Predicates and Quantifiers

Up to this point, most of our propositions have been of the form "Sarah earned a B.S. in Computer Science" - the proposition describes a single individual constant (in this case, "Sarah.")

However, we often need to discuss an entire category of individuals at once, which is equivalent to replacing the constant "Sarah" by a variable. We will discuss this idea in this section.

4.4.1. Predicates

A predicate is a statement that includes one or more variables such that when values are assigned to the variables the predicate becomes a proposition.

Example 15 - Predicates
  • \(x \leq 3\)

  • Computer \(c\) is infected.

  • Country \(x\) is on continent \(y\).

Predicates are denoted as \(P(x)\) or \(Q(x,y)\) where \(P\) and \(Q\) represent the statements and \(x\) and \(y\) are variables. After a value is assigned to each variable, the predicate becomes a proposition which has a truth value. That is, we "evaluate" a predicate by substituting inputs into the variables and get a proposition as the output.

Example 16

Let \(P(x)\) be the predicate \(x \leq 3\).

What are the propositions \(P(5)\) and \(P(2)\)? What are the truth values of \(P(5)\) and \(P(2)\)?

Example 17

Let \(P(n)\) be the predicate "The sum of the first \(n\) positive odd integers is equal to \(n^{2}\)."

What are the propositions \(P(1{\small,}000)\) and \(P(1{\small,}000{\small,}000)\)? Evaluating the predicate at these inputs produces the two propositions, which code can output as strings (of type str in Python). The predicate does not tell us whether the propositions it outputs are True or False.
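A minimal sketch of this idea in Python follows. The function `P` returns each proposition as a string; the separate function `check` is an addition for this sketch (not part of the example) that tests a proposition by direct computation.

```python
def P(n):
    """Predicate from Example 17: returns the proposition for a given n as a string."""
    return f"The sum of the first {n:,} positive odd integers is equal to {n:,}^2."

def check(n):
    """Hypothetical helper: test the proposition P(n) by adding up the odd integers."""
    return sum(range(1, 2 * n, 2)) == n ** 2

print(P(1_000))
print(P(1_000_000))
print(type(P(1_000)).__name__)   # the output of P is a string
print(check(1_000))
```

Note the separation of roles: `P` only builds the statement, while deciding its truth value takes extra work, here done by brute-force summation.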

Example 18

Let \(Q(x,y)\) be the statement \(x-y=4\).

The three propositions \(Q(6,2)\), \(Q(1,5)\), and \(Q(-2,2)\) are \(6-2=4\), \(1-5=4\), and \(-2-2=4\), respectively. The first is True; the other two are False because \(1-5=-4\) and \(-2-2=-4\).
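One way to display these propositions and their truth values in Python is sketched below. The split into `Q_text` (the proposition as a string) and `Q` (its truth value) is a choice made for this sketch; both names are assumptions.

```python
def Q_text(x, y):
    """The proposition Q(x, y) written out as a string."""
    return f"({x}) - ({y}) = 4"

def Q(x, y):
    """The truth value of the proposition Q(x, y)."""
    return x - y == 4

results = []
for x, y in [(6, 2), (1, 5), (-2, 2)]:
    results.append(Q(x, y))
    print(f"Q({x},{y}) is the proposition {Q_text(x, y)}, which is {Q(x, y)}")
```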

4.4.2. Quantifiers

Consider the statements

  • For all integers \(x\), \(x^2\geq 0\).

  • Some student in the class has a birthday in July.

Each of these statements considers a proposition over an entire population or set, called the domain, and describes whether at least one element, or all of the elements in the domain satisfy the proposition. There are two commonly-used quantifiers, the universal quantifier and the existential quantifier.
The domain is also called the domain of discourse or the universe of discourse.

The Universal Quantifier, \(\forall,\) represents the statement "for all", "for every", "for each". When it comes before a statement, it means that statement is true for all values in the domain.

Example 19

Universal Quantifier \(\forall x, x + 1 \gt x\)

Let \(P(x)\) be the statement \(x + 1 \gt x\). Is this true for all integers x?

We use the example domain [-2, -1, 0, 1, 2] because code cannot check all integers.

Example 20

Universal Quantifier \(\forall x, x + x \gt x\)

Let \(P(x)\) be the statement \(x + x \gt x\). Is this true for all integers x?

The Existential Quantifier, \(\exists,\) represents the statement "there exists", "for some", "at least one". When it comes before a statement, it means the statement is true for at least one value in the domain.

Example 21

Existential Quantifier \(\exists x, x^2 = 4\)

Let \(P(x)\) be the statement \(x^2 = 4\). Is this true for at least one integer x?

Example 22

Existential Quantifier \(\exists x, x^3 = 4\)

Let \(P(x)\) be the statement \(x^3 = 4\). Is this true for at least one integer x?
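Examples 19 through 22 can be explored with Python's built-in `all` and `any` functions, which play the roles of \(\forall\) and \(\exists\) over a finite domain. This is a sketch only: a result computed over the sample domain says nothing certain about integers outside it, and the variable names are chosen for this illustration.

```python
domain = [-2, -1, 0, 1, 2]

# Example 19: for all x, x + 1 > x
ex19 = all(x + 1 > x for x in domain)

# Example 20: for all x, x + x > x
ex20 = all(x + x > x for x in domain)
counterexamples_20 = [x for x in domain if not (x + x > x)]

# Example 21: there exists x with x^2 = 4
ex21 = any(x ** 2 == 4 for x in domain)
witnesses_21 = [x for x in domain if x ** 2 == 4]

# Example 22: there exists x with x^3 = 4
ex22 = any(x ** 3 == 4 for x in domain)

print(ex19, ex20, counterexamples_20, ex21, witnesses_21, ex22)
```

A single counterexample is enough to show that a universally quantified statement is false, and a single witness is enough to show that an existentially quantified statement is true, so the results for Examples 20 and 21 do carry over to the full set of integers.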

Again, we use the example domain [-2, -1, 0, 1, 2] because code cannot check all integers.

Recall the previous example statements:

  • For all integers \(x\), \(x^2 \geq 0\).

Let \(P(x)\) be the predicate "\(x^2 \geq 0\)". Then we write the statement as \(\forall x P(x)\), where the domain is the set of all integers. This quantified statement will be true since anytime you square a nonzero integer it is positive and \(0^2=0\).

  • Some student in the class has a birthday in July.

Let \(Q(s)\) be the predicate "student \(s\) has a birthday in July". Then we write the statement as \(\exists s Q(s)\), where the domain is the set of all students in the class. This statement will be true as long as at least one student in the class has a birthday in July. It will be false, otherwise.

4.4.3. Negation of Quantifiers

It is important to consider the negation of a quantified expression.

  • "Every student in this class has taken Programming Fundamentals."

This is a universally quantified statement and can be expressed as \(\forall x P(x)\) where \(P(x)\) is the statement "\(x\) has taken Programming Fundamentals" and the domain consists of all the students in this class. The negation of the statement would be "It is not true that every student in this class has taken Programming Fundamentals." Equivalently,

  • "There is a student in this class who has NOT taken Programming Fundamentals."

This is an existentially quantified statement expressed as \(\exists x \neg P(x)\).

This demonstrates that the negation of a universally quantified statement is an existential statement. In symbols, we have \(\neg \forall x P(x)\equiv \exists x \neg P(x)\).

Similarly, the negation of an existential statement is a universal statement. \(\neg \exists x P(x) \equiv \forall x \neg P(x)\).

De Morgan’s Laws with Quantifiers

For any predicate \(P(x)\) \[\neg \forall x P(x)\equiv \exists x \neg P(x)\] and \[\neg \exists x P(x) \equiv \forall x \neg P(x)\]
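Over a finite domain, both laws can be spot-checked in Python. The sketch below uses a small sample domain and three sample predicates, all chosen arbitrarily for this illustration; it illustrates the laws but does not prove them.

```python
domain = range(-3, 4)   # the integers -3, -2, ..., 3

# Sample predicates (assumptions made for this sketch).
predicates = {
    "x is even": lambda x: x % 2 == 0,
    "x * x >= 0": lambda x: x * x >= 0,
    "x > 10": lambda x: x > 10,
}

results = []
for description, P in predicates.items():
    not_forall = not all(P(x) for x in domain)   # ¬∀x P(x)
    exists_not = any(not P(x) for x in domain)   # ∃x ¬P(x)
    not_exists = not any(P(x) for x in domain)   # ¬∃x P(x)
    forall_not = all(not P(x) for x in domain)   # ∀x ¬P(x)
    results.append((description, not_forall == exists_not, not_exists == forall_not))
    print(description, not_forall == exists_not, not_exists == forall_not)
```

The three predicates cover the cases where the predicate holds for some, all, and none of the domain elements.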

Example 23
  • Someone in the class can speak Latin.

Using quantifiers, we write this statement as \(\exists x L(x)\) where \(L(x)\) is the predicate "\(x\) can speak Latin" and the domain is the students in the class. Its negation is \(\forall x \neg L(x)\). In words,

  • No student in the class can speak Latin.

You Try

Find the negation of the statement "For all integers \(x\), \(x^2 \geq x\)."

The predicate of a quantified statement could be a compound statement. For instance,

  • Some dogs are big and fluffy.

This is written as \(\exists x (B(x) \land F(x))\) where \(B(x)\) is the predicate "\(x\) is big," \(F(x)\) is the predicate "\(x\) is fluffy," and the domain is dogs. Negating this statement would give

\(\neg \exists x (B(x) \land F(x)) \equiv \forall x \neg (B(x) \land F(x)) \equiv \forall x (\neg B(x) \lor \neg F(x))\)

In words,

  • Every dog is not big or is not fluffy.

4.4.4. Nested Quantifiers

There are times it will take more than one quantifier to express a statement.

  • For all integers \(x\), there exists an integer \(y\), such that \(x+y=0\).

This statement contains both a universal and an existential quantifier. \(\forall x \exists y S(x,y)\) where \(x\) and \(y\) are integers and \(S(x,y)\) is the proposition \(x+y=0\). This statement means, if you have any integer \(x\) (for instance \(x=5\)) then you can find an integer \(y\) (for instance \(y=-5\)) such that \(x+y=0\).

The order of the quantifiers matters. \(\exists x \forall y S(x,y)\) would be

  • There exists an integer \(x\), such that for all integers \(y\), \(x+y=0\).

Note that in this statement you find an integer \(x\) so that when you add any integer \(y\) to it you always get 0.

The first statement, for all integers \(x\), there exists an integer \(y\) such that \(x+y=0\), is true: for any integer \(x\) you can choose \(y=-x\), and then \(x+y=x+(-x)=0\). The second statement, there exists an integer \(x\) such that for all integers \(y\), \(x+y=0\), is false: whichever integer \(x\) is chosen, the integer \(y=1-x\) gives \(x+y=1 \neq 0\).
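The effect of swapping the quantifiers can be illustrated in Python by nesting `all` and `any`. This sketch replaces the set of all integers with a small symmetric sample domain (an assumption made so that \(-x\) is always available), so it illustrates the difference rather than proving it.

```python
domain = range(-3, 4)   # symmetric sample domain: -3, -2, ..., 3

# ∀x ∃y (x + y = 0): for each x, look for some y that works.
forall_exists = all(any(x + y == 0 for y in domain) for x in domain)

# ∃x ∀y (x + y = 0): look for one x that works for every y.
exists_forall = any(all(x + y == 0 for y in domain) for x in domain)

print(forall_exists)
print(exists_forall)
```

The order of the nested calls mirrors the order of the quantifiers: the outermost quantifier becomes the outermost function call.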

Example 24

Let \(Q(x,y)\) be the statement \(xy=0\). If the domain for both variables consists of all integers, what are the truth values of the following statements?

  • \(Q(0,3)\) is True since \(0\cdot 3=0\)

  • \(Q(6,2)\) is False since \(6\cdot 2=12\)

  • \(\exists x Q(x,4)\) is True. Use the value of \(x=0\), and since \(0\cdot 4=0\) there is at least one integer \(x\) so that \(x\cdot 4=0\).

  • \(\forall x \exists y Q(x,y)\) is True. If you have any integer \(x\), you can pick the value \(y=0\) and get \(x\cdot 0=0\).

You Try - Determine the truth value of each statement and justify the answer.
  • \(\forall y Q(1,y)\)

  • \(\exists x \forall y Q(x,y)\)

  • \(\forall x \forall y Q(x,y)\)

To negate nested quantifiers, repeatedly apply De Morgan’s Laws for quantifiers, moving the negation inward past one quantifier at a time until it reaches the predicate.

Namely, \(\neg \forall x P(x) \equiv \exists x \neg P(x)\) and \(\neg \exists x P(x) \equiv \forall x \neg P(x)\).

Example 25 - Negation of quantified statements

Find the negation of the statement "For all integers \(x\), there exists an integer \(y\) such that \(x=-y\)."

Solution

Using quantifiers, we write this statement as \(\forall x \exists y N(x,y)\) where \(N(x,y)\) is the proposition "\(x=-y\)." and the domain of \(x\) and \(y\) is the integers. Its negation would be \(\exists x \forall y \neg N(x,y)\).

  • There exists an integer \(x\), such that for all integers \(y\), \(x \neq -y\).

You Try

Find the negation of the statement "Some student in the class will solve every practice problem."

Hint: Let \(x\) be a student in the class, \(y\) be a practice problem, and \(P(x,y)\) be the statement "student \(x\) will solve practice problem \(y\)".

4.5. Applications of Logic

Remixer’s Note: This section is taken from the original “Discrete Math” book with minor edits to include base-two notation and a link to the NANDgame website.

In this section we consider two applications of logic to information technology and computer science. The first involves bitwise operations, and the second involves designing and analyzing logic circuits.

4.5.1. Bitwise operations

A bitwise operation is a Boolean operation that operates on the individual bits (\(0\)s or \(1\)s) of its operand(s). The bitwise operations are summarized below.

Bitwise Operations
  1. The bitwise AND, denoted by "&", applies the and \(\land\) to the corresponding bits of each operand.

  2. The bitwise OR, denoted by "\(|\)", applies the or \(\lor\) to the corresponding bits of each operand.

  3. The bitwise XOR, denoted by "\({}^{\wedge}\)", applies the exclusive or \(\oplus \) to the corresponding bits of each operand.

  4. The bitwise NOT, denoted by "~" in languages such as C, Java, and Python, applies the negation \(\neg\) (flips \(0\longleftrightarrow 1\)) to each bit of its single operand.

We summarize the truth tables for the bitwise Boolean operators.

\(p\) \(q\) \(AND\ \&\) \(OR\ |\) \(XOR\ {}^{\wedge}\) \(IF\ \Rightarrow\) \(IFF\ \Leftrightarrow\)
1 1 1 1 0 1 1
1 0 0 1 1 0 0
0 1 0 1 1 1 0
0 0 0 0 0 1 1

Example 26 - Bitwise Operations

Find the bitwise \(AND, OR, XOR\) for the following binary numbers,

\[ A = 111101\] \[ B = 001111\]

Solution

Using the truth tables for Boolean operators, where the results are noted in the bottom row, we have

Bitwise AND   Bitwise OR   Bitwise XOR
111101        111101       111101
001111        001111       001111
001101        111111       110010
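The results of this example can be reproduced with Python's bitwise operators `&`, `|`, and `^`. The sketch below also shows the bitwise NOT of \(A\), which Python writes as `~`. Because Python integers are not limited to six bits, the NOT result is masked to six bits; the variable names and the mask are choices made for this sketch.

```python
A = 0b111101
B = 0b001111

# format(..., "06b") writes the result as a 6-digit binary string.
and_bits = format(A & B, "06b")
or_bits = format(A | B, "06b")
xor_bits = format(A ^ B, "06b")

# Bitwise NOT of A, kept to 6 bits by masking with 111111.
not_a_bits = format(~A & 0b111111, "06b")

print(and_bits, or_bits, xor_bits, not_a_bits)
```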

4.5.2. Logic Circuits

Logic circuits are important in designing the arithmetic and logic units of a computer processor.

Consider the problem of adding two \(8\)-bit numbers in binary. In binary, \((0)_2 + (0)_2 = (0)_2\) and \((0)_2 + (1)_2 = (1)_2 + (0)_2 = (1)_2\), but as in decimal addition, \((1)_2 + (1)_2 = (10)_2\) requires a carry, that is, the "sum bit" is \(0\) with a "carry bit" of \(1\) to the next significant column on the left.
Note: The bits are enclosed in parentheses which are followed by the subscript \(_2\) to emphasize that binary notation, not decimal notation, is being used. This notation is covered in much more detail in the Number Bases chapter.

Adding a specific column of two binary digits, say \(A\) and \(B\), takes as input the bits \(A\) and \(B\) together with the carry in from the previous column, say \(C_{in}\). The output is the sum \(S\) and the carry out to the next column, say \(C_{out}\). These are the basic components of what is called a binary adder.

Figure 8. A binary adder

The logic table for binary addition based on the digital inputs \(A, B, C_{in}\), and digital outputs \(S\) and \(C_{out}\) is summarized in the table.

Table 1. Truth table for Binary adder
\(A\) \(B\) \(C_{in}\) \(\mathbf{S}\) \(\mathbf{C_{out}}\)
1 1 1 \(\mathbf{1}\) \(\mathbf{1}\)
1 1 0 \(\mathbf{0}\) \(\mathbf{1}\)
1 0 1 \(\mathbf{0}\) \(\mathbf{1}\)
1 0 0 \(\mathbf{1}\) \(\mathbf{0}\)
0 1 1 \(\mathbf{0}\) \(\mathbf{1}\)
0 1 0 \(\mathbf{1}\) \(\mathbf{0}\)
0 0 1 \(\mathbf{1}\) \(\mathbf{0}\)
0 0 0 \(\mathbf{0}\) \(\mathbf{0}\)

It can be shown that the logic for the outputs \(S\) and \(C_{out}\) is given by the following propositions \[ C_{out}=(A\land B)\lor \left(B\land C_{in}\right)\lor \left(A\land C_{in}\right)\] \[S=\left(\neg A\land \neg B\land C_{in}\right)\lor \left(\neg A\land B\land \neg C_{in}\right)\lor \left(A\land \neg B\land \neg C_{in}\right)\lor \left(A\land B\land C_{in}\right) \]
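These two propositions can be checked against ordinary arithmetic with a short Python sketch. The function name `full_adder` and the consistency check are assumptions made for this illustration: for every input row, the column total \(A + B + C_{in}\) should equal \(S + 2 \cdot C_{out}\).

```python
from itertools import product

def full_adder(a, b, c_in):
    """Return (S, C_out) computed from the two propositions for the binary adder."""
    c_out = (a and b) or (b and c_in) or (a and c_in)
    s = ((not a and not b and c_in) or (not a and b and not c_in)
         or (a and not b and not c_in) or (a and b and c_in))
    return s, c_out

table = []
for a, b, c_in in product([True, False], repeat=3):
    s, c_out = full_adder(a, b, c_in)
    # Store the row as 0s and 1s, in the same column order as the truth table.
    table.append((int(a), int(b), int(c_in), int(s), int(c_out)))
    print(int(a), int(b), int(c_in), int(s), int(c_out))

# The sum bit plus twice the carry bit equals the arithmetic total of the inputs.
consistent = all(a + b + c == s + 2 * c_out for a, b, c, s, c_out in table)
print(consistent)
```

The proposition for \(S\) is a disjunctive normal form read directly from the rows of the truth table where \(S\) is 1.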

These logical outputs are implemented from the inputs \((A,B, C_{in})\) using electronic circuits called logic gates.

The basic logic gates are the Inverter (or Not) gate, the And gate, the Or gate, and the Xor gate. The graphical representation for each is shown below.

Figure 9. Basic gates

We end this section by first analyzing logic circuits to give their outputs in terms of their input variables, and then, constructing logic circuits based on logical statements.

Example 27 - Output of a Logic Circuit in Terms of Its Inputs

Determine the output of the following logic circuit in terms of the input variables, \(p, q\), and \(r\).

logic gate 3

Solution

Proceeding left to right, determine the output of the leftmost gates first using the basic gate outputs.

logic gate 3a

The output of the logic circuit is \( ( p \lor q)\land ( \neg p \lor \neg q)\)

In the next two examples, we design logic circuits based on logical propositions. The idea is to work backward using order of operations from the right to the left.

Example 28 - Design a Logic Circuit

Design a logic circuit for \((p\vee\lnot\ q)\land\lnot\ p\).

Solution

Working backwards from right to left we have the following sequence of gates

1) An AND gate \((p\vee\lnot\ q)\underline{\land} \lnot\ p\).

2) The inputs to the AND gate are \((p\vee\lnot\ q)\) and \(\lnot\ p\).

3) These inputs come from the output of an INVERTER, for \(\underline{\lnot}\ p\) and an OR gate \((p \underline{\vee}\lnot\ q)\).

4) There are two inputs to the OR gate \((p \underline{\vee}\lnot\ q)\), being \(p\), and the output of an INVERTER, \(\underline{\lnot} q\).

Putting these now in left to right order we obtain the following logic circuit.

logic gate 4

Example 29 - Design a Logic Circuit

Design a logic circuit for \(r\land (p\lor (r\land \neg q))\).

Solution

Working backwards from right to left we have the following sequence of gates

1) An AND gate \(r\underline{\land} (p\lor (r\land \neg q))\).

2) The inputs to the AND gate are \(r\) and \(p\lor (r\land \neg q)\).

3) The input, \(p\lor (r\land \neg q)\), comes from the output of an OR gate for \(p \underline{\lor} (r\land \neg q)\).

4) The inputs to the OR gate, \(p \underline{\lor} (r\land \neg q)\), are \(p\) and \((r\land \neg q)\), which is an AND gate.

5) The inputs to the AND gate, \(r \underline{\land} \neg q\), are \(r\) and the output of an INVERTER, \(\underline{\neg} q\).

Putting these now in left to right order we obtain the following logic circuit.

logic gate 5

How about a game of NAND?
Here is a link to a website that lets you build a computer, starting from the most basic level of the NAND component.
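Returning to Examples 28 and 29, the gate-by-gate designs can be mimicked in Python by treating each gate as a small function and wiring the functions together in the same order as the circuit. This is only a sketch; the function names `NOT`, `AND`, `OR`, `example_28`, and `example_29` are assumptions made for this illustration.

```python
from itertools import product

def NOT(a):
    """Inverter gate."""
    return not a

def AND(a, b):
    """And gate."""
    return a and b

def OR(a, b):
    """Or gate."""
    return a or b

def example_28(p, q):
    """Circuit for (p ∨ ¬q) ∧ ¬p: two inverters feed an OR gate and a final AND gate."""
    return AND(OR(p, NOT(q)), NOT(p))

def example_29(p, q, r):
    """Circuit for r ∧ (p ∨ (r ∧ ¬q)), built from the innermost gate outward."""
    return AND(r, OR(p, AND(r, NOT(q))))

# Compare each circuit with its proposition on every combination of inputs.
ok_28 = all(example_28(p, q) == ((p or not q) and not p)
            for p, q in product([True, False], repeat=2))
ok_29 = all(example_29(p, q, r) == (r and (p or (r and not q)))
            for p, q, r in product([True, False], repeat=3))
print(ok_28, ok_29)
```

Reading each function from the inside out gives the left-to-right order of the gates, and reading it from the outside in gives the right-to-left order used when designing the circuit.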

4.6. Exercises

Remixer’s Note: This section is taken from the original “Discrete Math” book with no changes.

  1. Which of these statements are propositions? Explain your reasoning.

    1. Is Atlanta the capital of Georgia?

    2. All birds fly

    3. \(2\ \times\ \ 3\ =\ 5\)

    4. \(5\ +\ 7\ =\ 7+5\)

    5. \(x\ +\ 2\ =\ 11\)

    6. Answer this question.

    7. The rain in Spain

  2. Construct truth tables for,

    1. \(a\vee b\Rightarrow\lnot b\)

    2. \((a\vee\lnot b)\ \Leftrightarrow\ a\)

    3. \((a\Rightarrow b)\ \bigwedge\ (b\ \bigwedge\ \lnot c)\)

    4. \((a\ \bigvee\ b)\ \Rightarrow\ (\ \lnot c\ \bigvee\ a)\)

    5. \((a\ \bigvee\ b)\ \bigwedge\ (c\ \bigvee\lnot d\ )\)

    6. \((\lnot c\ \bigwedge\ \ b)\ \bigvee\ \ (a\Rightarrow\ \lnot d\ )\)

  3. Using truth tables, determine if each of the following is a tautology, contradiction, or neither (conditional)

    1. \(\neg ((a\lor b)\lor (\neg a\land \neg b))\)

    2. \(\left(\left(a\vee b\right)\land\lnot a\right)\Rightarrow b\)

    3. \(\left(\left(a\vee b\right)\land a\right)\Rightarrow b\)

    4. \((p\land r)\lor (\neg p\land \neg r)\)

    5. \(\neg ((p\lor q)\lor (\neg p\land (\neg q\lor r)))\)

    6. \(\neg (p\land q)\lor (q\lor r)\)

  4. Using truth tables determine which of the following are equivalent

    1. \(\left(p\Rightarrow q\right)\Rightarrow r\),

      \(\left(p\land\lnot q\right)\vee r,\) and

      \(\left(p\land\lnot q\right)\land r\)

    2. \((a\lor b)\land c,\)

      \((c\land a)\lor (c\land b),\) and

      \(\neg ((\neg a\land \neg b)\lor \neg c)\)

  5. Let \(C(x)\) be the statement "\(x\) has visited Canada." where the domain consists of the students at GGC. Express each of the quantifications in English.

    1. \(\exists x C(x)\)

    2. \(\forall x C(x)\)

    3. How would you determine whether each of these statements is true or false?

  6. Determine the truth value of each of these statements if the domain for all variables, \(m , n\) is the set of all integers, \(\mathbb{Z}\), explaining your reasoning.

    1. \(\forall n:\left(n^2\geq 1\right)\)

    2. \(\forall n:\left(n^2\geq 0\right)\)

    3. \(\ \exists\ n:(n^2=3)\)

    4. \(\ \exists\ m\forall\ n:(m+n=n-m)\)

    5. \(\forall\ n\exists\ m:\ (n\cdot\ m=m)\)

    6. \(\ \exists\ n\forall\ m:\ (n\cdot\ m=m)\)

    7. \(\ \exists\ n\forall\ m:\ (n\cdot\ m=n)\)

  7. Consider each of the compound propositions. (i) Translate each using logical symbols and letters, stating what each letter represents, (ii) Negate each using plain English sentences, and (iii) Translate the negated statements using logical symbols and quantifiers.

    1. If it snows today, then I will go skiing tomorrow.

    2. Mei walks or takes the bus to class.

    3. Every person in this class understands mathematical induction.

    4. In every mathematics class there is some student who falls asleep during lectures.

    5. There is a building on the campus of some college in the United States in which every room is painted white.

  8. Let \(p\) be the proposition “My bicycle needs a tire replaced,” \(q\) be the proposition “I will go cycling,” and \(r\) be the proposition “Rain is in the forecast.”

    1. Express each of these compound propositions using plain English sentences.

      1. \(\neg p\vee q\)

      2. \(\neg p\Rightarrow \neg q\)

      3. \((\neg p\wedge r)\Rightarrow q\)

      4. \((\neg p\wedge r)\Rightarrow q\)

      5. \((\neg p\wedge q)\vee r\)

    2. Write these compound propositions using \(p\), \(q\) and, \(r\) and logical connectives (including negation).

      1. If my bicycle tire does not need replacement, I will go cycling.

      2. My bicycle tire does not need replacement, there is rain in the forecast, but I will go cycling.

      3. Whenever there is rain in the forecast, I do not go cycling.

      4. If there is rain in the forecast or my tire needs replacement I will not go cycling.

      5. Rain is not forecast whenever I go cycling.

      6. Rain is not forecast and my tire does not need replacement whenever I go cycling.

  9. Design logic circuits with the following output

    1. \((p\lor (q\land \neg r))\lor \neg (p\land q)\)

    2. \((p\lor (q\land r))\land \neg (p\land q)\)

  10. Consider the predicate \(Q(x,y): x\ \cdot\ y=5\), where the domain of \(x\) and \(y\) is all positive real numbers \(\mathbb{R}^+\), or \(x,\ y\ >0\). Determine the truth value of each of the following, and explain your reasoning.

    1. \(Q(1,5)\)

    2. \(Q\left(2,\frac{5}{2}\right)\)

    3. \(\exists\ y,\ Q\left(7,y\right)\)

    4. \(\ \forall\ y,\ Q\left(7,y\right)\)

    5. \(\exists\ x\ \forall\ y,\ Q\left(x,y\right)\)

    6. \(\ \forall\ \ x\ \exists\ \ y,\ Q\left(x,y\right)\)

  11. Consider the predicate \(R(x,y):\ 2x+y=0\), where the domain of \(x\) and \(y\) is all rational numbers, \(\mathbb{Q}\). Determine the truth value of each of the following, and explain your reasoning.

    1. \(R(0,0)\)

    2. \(R(2,-1)\)

    3. \(R\left(\frac{1}{5},-\frac{2}{5}\right)\)

    4. \(\exists y,\ R\left(0.2,y\right)\)

    5. \(\ \forall y,\ R\left(7,y\right)\)

    6. \(\exists\ x\forall\ y,\ R\left(x,y\right)\)

    7. \(\ \forall\ x\ \exists\ y,\ R\left(x,y\right)\)

  12. Calculate the bitwise \(AND\), the bitwise \(OR\), and the bitwise \(XOR\) of the following pairs of bytes, or sequence of bytes

    1. \(01111111\) and \(11101001\)

    2. \(1110010111111010\) and \(0101110101100011\)

  13. Give the output for each of the logic circuits in terms of the input variables,

    1. The logic circuit, with input variables, \(p, q\), \(r\).

      logic gate 1

    2. The logic circuit, with input variables, \(a, b\), \(c\).

      logic gate 2

  14. Design a logic circuit for \(r\land (p\lor (r\land \neg q))\).

4.7. Challenge Exercises

  1. Complete parts (a) and (b) to show that every compound proposition is logically equivalent to one that uses the same propositional variables but only the NAND operator \(\uparrow.\)

    1. For each of the three propositions \(\neg p\), \(p \land q,\) and \(p \lor q,\) write a logically equivalent proposition that uses only \(p,\) \(q,\) and \(\uparrow.\)

    2. Use the theorem that every compound proposition is equivalent to a compound proposition in disjunctive normal form to justify that the compound proposition is also equivalent to one that uses only the NAND operator \(\uparrow.\)

  2. Complete parts (a) and (b) to show that every compound proposition is logically equivalent to one that uses the same propositional variables but only the NOR operator \(\downarrow.\)

    1. For each of the three propositions \(\neg p\), \(p \land q,\) and \(p \lor q,\) write a logically equivalent proposition that uses only \(p,\) \(q,\) and \(\downarrow.\)

    2. Use the theorem that every compound proposition is equivalent to a compound proposition in disjunctive normal form to justify that the compound proposition is also equivalent to one that uses only the NOR operator \(\downarrow.\)

5. Proofs: Basic Techniques

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on February 25, 2025
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Recall from the Logic chapter that an argument is a finite sequence of statements that ends with a final statement, the conclusion, that is based on inferences made from the earlier statements, called premises or hypotheses.

An argument is valid if it is of a form such that if all premises are True then the conclusion must be True, too. An argument is not valid just because it exists! Just as a proposition is a single statement that may be either True or False (but not both), an argument is a finite sequence of statements that may be either valid or invalid (but not both).

A proof is a valid argument made up of propositions. In a proof, some premises may be axioms or postulates, which are propositions that we simply ASSUME to be True. Other premises used in a proof may be previously-proven propositions called theorems. There are many other terms used for theorems depending on the context, such as lemma (a minor theorem needed to prove a more important major theorem) and corollary (a theorem that is a conclusion based on a premise that is a more important theorem), but each of these specialized terms describes a theorem.

Key terms and concepts covered in this chapter:

  • Propositional inference rules (concepts of modus ponens and modus tollens)

    • Notions of implication, equivalence, converse, inverse, contrapositive, negation, and contradiction

  • The structure of mathematical proofs

  • Proof techniques

    • Direct proofs

    • Proof by counterexample (Disproving by counterexample)

    • Proof by contraposition (proof by contrapositive)

    • Proof by contradiction

  • To be added to this chapter after May 23, 2025:

    • logical equivalence and circles of implication

5.1. Rules of Inference for Propositions

To create a proof, we must proceed from True propositions to other True propositions without introducing False propositions into the argument. To do this, we use rules of inference, which are ways to draw a True conclusion from one or more premises that are already known to be True (or assumed to be True). That is, a rule of inference is an argument form that corresponds to a tautology, and so is valid.

Example 1 - Notation For A Rule Of Inference

A rule of inference can be represented as follows.

  • \(p_{1}\)

  • \(p_{2}\)

  • \(\vdots\)

  • \(p_{n}\)

  • \(\therefore q\)

The propositional variables \(p_{1},\,p_{2},\,\ldots,\,p_{n}\) represent the premises of the argument, and \(q\) represents the conclusion of the argument. The symbol \(\therefore\) is read as "therefore".

This rule of inference is interpreted to mean that if all of the propositions \(p_{1},\,p_{2},\,\ldots,\,p_{n}\) are True then the proposition \(q\) MUST be True. The rule of inference in this example corresponds to the tautology \((p_{1} \land p_{2} \land \ldots \land p_{n})\rightarrow q\).

Note that the conclusion \(q\) must be the last proposition, but the order in which the premises \(p_{1},\,p_{2},\,\ldots,\,p_{n}\) are listed in the argument does not matter since we use the conjunction of all premises to prove the conclusion. The premises \(p_{1},\,p_{2},\,\ldots,\,p_{n}\) are usually presented in an order that follows the flow of thought.

Example 2 - Valid and Invalid Arguments

Consider the following three arguments.

A Valid Argument

The following argument was used as an example in the Logic chapter.

  • Sarah earned a B.S. in Computer Science.

  • Anyone who earned a B.S. in Computer Science must have earned a C or better in Discrete Math.

  • Therefore, Sarah earned a C or better in Discrete Math.

The argument is of the form

  • p

  • if p then q

  • \(\therefore\) q

This argument form is named modus ponens which is translated roughly from Latin as "method of affirming". Modus ponens is a valid argument form because it corresponds to the tautology \((p \land (p \rightarrow q)) \rightarrow q\).

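\[ \begin{array}{c|c|c|c|c|c}
p & q & p \rightarrow q & p \land (p \rightarrow q) & q & (p \land (p \rightarrow q)) \rightarrow q \\ \hline
\text{T} & \text{T} & \text{T} & \text{T} & \text{T} & \text{T} \\
\text{T} & \text{F} & \text{F} & \text{F} & \text{F} & \text{T} \\
\text{F} & \text{T} & \text{T} & \text{F} & \text{T} & \text{T} \\
\text{F} & \text{F} & \text{T} & \text{F} & \text{F} & \text{T}
\end{array} \]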

That is, modus ponens is a rule of inference.
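The truth-table check can also be carried out mechanically. The following is a minimal Python sketch, not part of the original text: the helper names `implies` and `is_tautology` are hypothetical, chosen only for this illustration. It evaluates a compound proposition for every assignment of truth values and reports whether it is a tautology.

```python
from itertools import product

def implies(a, b):
    """Truth value of the conditional a -> b."""
    return (not a) or b

def is_tautology(formula, num_vars):
    """Return True if formula is True for every assignment of truth values."""
    return all(formula(*values)
               for values in product([True, False], repeat=num_vars))

def modus_ponens(p, q):
    """The compound proposition (p and (p -> q)) -> q."""
    return implies(p and implies(p, q), q)

print(is_tautology(modus_ponens, 2))  # True
```

The call `product([True, False], repeat=2)` generates the same four rows as the truth table, so this is the truth-table method itself, automated for small numbers of variables.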

An Invalid Argument

Consider this second argument.

  • Arya earned a C or better in Discrete Math.

  • Anyone who earned a B.S. in Computer Science must have earned a C or better in Discrete Math.

  • Therefore, Arya earned a B.S. in Computer Science.

This argument is NOT valid. If we assume that the first two propositions are True, we do not have enough information to reach the conclusion that Arya earned a B.S. in Computer Science: Arya may have earned that degree, or may still be working towards the degree, or may have changed majors and earned a degree in the new major, or perhaps there is some other possibility - we cannot determine whether the conclusion is True or False based on the assumption that the two premises are True.

The argument corresponds to the argument form

  • q

  • if p then q

  • \(\therefore\) p

This argument form is an example of a fallacy or non sequitur, which is Latin for "it does not follow". The argument form is invalid because it corresponds to a proposition that is NOT a tautology, as can be seen in the truth table for the compound proposition \((q \land (p \rightarrow q)) \rightarrow p\).

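\[ \begin{array}{c|c|c|c|c|c}
p & q & p \rightarrow q & q \land (p \rightarrow q) & p & (q \land (p \rightarrow q)) \rightarrow p \\ \hline
\text{T} & \text{T} & \text{T} & \text{T} & \text{T} & \text{T} \\
\text{T} & \text{F} & \text{F} & \text{F} & \text{T} & \text{T} \\
\text{F} & \text{T} & \text{T} & \text{T} & \text{F} & \text{F} \\
\text{F} & \text{F} & \text{T} & \text{F} & \text{F} & \text{T}
\end{array} \]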

Notice that there is at least one row of the truth table in which both \(q\) and \(p \rightarrow q\) are True but \(p\) is False! This means that we can NOT infer that \(p\) is True whenever \((q \land (p \rightarrow q))\) is True. The argument form is invalid because \((q \land (p \rightarrow q)) \rightarrow p\) is not a tautology.

This particular fallacy is used by people so often that it has its own name: The converse error, or fallacy of the converse.
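To see exactly which assignment of truth values breaks the converse error, one option is to list every row of the truth table in which the compound proposition is False. This is a small Python sketch under the assumption that propositions are modeled as Boolean functions; the names `implies` and `falsifying_rows` are hypothetical.

```python
from itertools import product

def implies(a, b):
    """Truth value of the conditional a -> b."""
    return (not a) or b

def falsifying_rows(formula, num_vars):
    """List every assignment of truth values that makes formula False."""
    return [values
            for values in product([True, False], repeat=num_vars)
            if not formula(*values)]

def converse_error(p, q):
    """The compound proposition (q and (p -> q)) -> p."""
    return implies(q and implies(p, q), p)

print(falsifying_rows(converse_error, 2))  # [(False, True)]
```

The single falsifying row has \(p\) False and \(q\) True, which matches the Arya example: the premises hold, but the conclusion does not.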

A Third Argument - You Try

Write the argument form for the following argument.

  • Jing did not earn a B.S. in Computer Science.

  • Anyone who earned a B.S. in Computer Science must have earned a C or better in Discrete Math.

  • Therefore, Jing did not earn a C or better in Discrete Math.

Find a compound proposition that corresponds to the argument form you wrote, and write the truth table for that compound proposition.

Question
Is the argument valid?
Hint
The argument is valid if and only if the compound proposition is a tautology.
Answer
No. The argument is invalid and is an example of the inverse error or fallacy of the inverse.

Another way to see that this argument is invalid is to consider a case where Jing did earn a C or better in Discrete Math even though the two premises are True; for example, Jing could have earned a B.S. in Mathematics instead of Computer Science and also earned a C or better in Discrete Math.

In the following subsections we will discuss some of the more common rules of inference.

5.1.1. Transitivity Of The Conditional

The following rule of inference is called pure hypothetical syllogism, but we will use the less formal name transitivity. It is the basis of conditional proof in mathematics.

Transitivity (Pure Hypothetical Syllogism)
  • \(p \rightarrow q\)

  • \(q \rightarrow r\)

  • \(\therefore p \rightarrow r\)

This rule of inference corresponds to the tautology \(((p \rightarrow q) \land (q \rightarrow r)) \rightarrow (p \rightarrow r)\).

By applying transitivity multiple times, you can build a finite chain of implications of any length you want:

  • \(p \rightarrow p_{1}\)

  • \(p_{1} \rightarrow p_{2}\)

  • \(p_{2} \rightarrow p_{3}\)

  • \(\vdots\)

  • \(p_{k-1} \rightarrow p_{k}\)

  • \(p_{k} \rightarrow r\)

  • \(\therefore p \rightarrow r\)

5.1.2. Rules Of Inference And Fallacies Arising From The Conditional

Recall that if you have propositions \(p\) and \(q,\) you can form the conditional with hypothesis \(p\) and consequent \(q,\) which is written as \(p \rightarrow q\), as well as three other related conditionals.

  • \(p \rightarrow q\), the conditional

  • \(q \rightarrow p\), the converse of \(p \rightarrow q\)

  • \(\neg q \rightarrow \neg p\), the contrapositive of \(p \rightarrow q\)

  • \(\neg p \rightarrow \neg q\), the inverse of \(p \rightarrow q\)

Also, recall that \((p \rightarrow q) \equiv (\neg q \rightarrow \neg p)\). That is, \((p \rightarrow q) \leftrightarrow (\neg q \rightarrow \neg p)\) is a tautology. This means that the conditional is logically equivalent to its contrapositive. The conditional is NOT logically equivalent to either its converse or its inverse, as was shown using truth tables in the Logic chapter.

From the four conditionals you can get two rules of inference and two fallacies. Together, these four argument forms are referred to as the mixed hypothetical syllogisms.

First, here are the two rules of inference.

Modus Ponens ("Method Of Affirming")
  • \(p \rightarrow q\)

  • \(p\)

  • \(\therefore q\)

This rule of inference corresponds to the tautology \(((p \rightarrow q) \land p) \rightarrow q\).

Modus Tollens ("Method Of Denying")
  • \(p \rightarrow q\)

  • \(\neg q\)

  • \(\therefore \neg p\)

This rule of inference corresponds to the tautology \(((p \rightarrow q) \land \neg q) \rightarrow \neg p\).

This rule of inference corresponds to replacing the conditional \(p \rightarrow q\) by its logically equivalent contrapositive \(\neg q \rightarrow \neg p\) in the tautology, which gives \(((\neg q \rightarrow \neg p) \land \neg q) \rightarrow \neg p,\) then applying modus ponens to this new tautology.

Next, here are the two fallacies. They are included because they are very common errors to be aware of and to avoid.

Inverse Error
  • \(p \rightarrow q\)

  • \(\neg p\)

  • \(\therefore \neg q\)

This fallacy arises by mistakenly treating the inverse \(\neg p \rightarrow \neg q\) as if it were logically equivalent to \(p \rightarrow q\). It is also called the "fallacy of the inverse" and "fallacy of denying the hypothesis".

Converse Error
  • \(p \rightarrow q\)

  • \(q\)

  • \(\therefore p\)

This fallacy arises by mistakenly treating the converse \(q \rightarrow p\) as if it were logically equivalent to \(p \rightarrow q\). It is also called the "fallacy of the converse" and the "fallacy of affirming the consequent".

The following image uses an Euler diagram of two sets \(A \subseteq B\) to explain why the converse error is a fallacy. The image can also be used to explain why the inverse error is a fallacy.

ConverseErrorForSets

Suppose you were told that \(c\) is an element of \(B\) in the preceding image. Can you determine whether \(c\) is an element of \(A\), too?

5.1.3. Other Common Rules Of Inference

Any tautology of the form \(p \rightarrow q\) can be used to define a rule of inference. In particular, we can define a rule of inference corresponding to the tautology \((p_{1} \land p_{2} \land \ldots \land p_{n})\rightarrow p_{1}\) for each integer \(n \geq 1\). This means that there are at least as many possible tautologies as there are natural numbers! How do we deal with infinitely many tautologies?

In general, there is a small number of rules of inference that are used in most proofs. Proofs often are built up to a large size by applying just a few rules of inference multiple times.

Here are some of the more commonly-used rules of inference. In the remix author’s opinion, it’s better to practice using these rules of inference rather than to focus on memorizing them as formal rules with special names.

Proof by Cases
  • \(p \rightarrow r\)

  • \(q \rightarrow r\)

  • \(p \lor q\)

  • \(\therefore r\)

This rule of inference corresponds to the tautology \(((p \rightarrow r) \land (q \rightarrow r) \land (p \lor q)) \rightarrow r\).

Elimination (Disjunctive Syllogism)
  • \(p \lor q\)

  • \(\neg q\)

  • \(\therefore p\)

This rule of inference corresponds to the tautology \(((p \lor q) \land \neg q) \rightarrow p\).

Resolution
  • \(p \rightarrow q\)

  • \(\neg p \rightarrow r\)

  • \(\therefore q \lor r\)

This rule of inference corresponds to the tautology \(((p \rightarrow q) \land (\neg p \rightarrow r)) \rightarrow (q \lor r)\).

Notice that this rule of inference can also be written as
\(\neg p \lor q\)
\(p \lor r\)
\(\therefore q \lor r\)
This form of resolution is important to automated theorem-proving.
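As a rough illustration of why the clause form is convenient for programs, here is a minimal Python sketch of a single resolution step. The representation is an assumption made for this example only: a literal is a string such as `"p"` or `"~p"`, a clause is a `frozenset` of literals, and the function names `negate` and `resolve` are hypothetical. Real theorem provers use more elaborate data structures.

```python
def negate(literal):
    """Return the negation of a literal written as 'p' or '~p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(clause_a, clause_b, literal):
    """Resolve two clauses on a literal that appears in clause_a
    and whose negation appears in clause_b."""
    if literal not in clause_a or negate(literal) not in clause_b:
        raise ValueError("the clauses do not clash on this literal")
    return (clause_a - {literal}) | (clause_b - {negate(literal)})

# The clause form shown above: (~p or q), (p or r), therefore (q or r).
first = frozenset({"~p", "q"})
second = frozenset({"p", "r"})
resolvent = resolve(first, second, "~p")
print(sorted(resolvent))  # ['q', 'r']
```

Each step removes one complementary pair of literals and merges what is left, which is exactly the inference \(q \lor r\) drawn from \(\neg p \lor q\) and \(p \lor r\).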

Contradiction Rule
  • \(\neg p \rightarrow (q \land \neg q)\)

  • \(\therefore p\)

This rule of inference corresponds to the tautology \((\neg p \rightarrow (q \land \neg q)) \rightarrow p\).

Note that this tautology is often written in the alternate form \(((\neg p \rightarrow q) \land (\neg p \rightarrow \neg q)) \rightarrow p\), which can be more useful in some contexts.

There are many more rules of inference we could write down and give names to. Instead, we’ll just list a few tautologies.

You Try

For each of the tautologies shown, write the argument form for the corresponding rule of inference. If needed, refer to Example 2 in this chapter to see how the argument, argument form, and tautology are related.

  • \((p \land q) \rightarrow p\)

  • \(p \rightarrow (p \lor q)\)

  • \(p \rightarrow (q \rightarrow (p \land q))\)

5.2. Rules Of Inference for Quantified Statements

In this section, four rules of inference that apply to quantified predicates are presented. In all of these rules of inference, the values of the variable(s) are assumed to be restricted to a universal set \(U.\)

5.2.1. Rules of Inference for Universally-Quantified Predicates

Universal instantiation states that, from the premise that \(\forall x P(x)\) is True, where \(x\) ranges over all elements of the universal set \(U,\) you can conclude that \(P(c)\) must also be True, where \(c \in U\) is any arbitrarily-chosen element of \(U.\)

Universal Instantiation (Universal Specification)
  • \(\forall x P(x)\)

  • \(\therefore P(c) \text{ for any } c \in U\)

Universal generalization states that, from the premise that \(P(x)\) is True for every arbitrarily-chosen value of \(x\) that is an element of the universal set \(U,\) you can conclude \(\forall x P(x)\) must also be True, where \(x\) ranges over all elements of the universal set \(U.\)

Universal Generalization
  • \(P(c) \text{ for every } c \in U\)

  • \(\therefore \forall x P(x)\)

You will see later in the textbook that universal generalization is applied in every proof that uses the mathematical induction proof technique.

5.2.2. Rules of Inference for Existentially-Quantified Predicates

Existential instantiation states that, from the premise \(\exists x P(x),\) you can conclude that there must be at least one \(c \in U\) such that \(P(c)\) is true. This allows you to pick a "constant" \(c\) that makes the predicate \(P(x)\) True instead of needing to refer repeatedly to the existential quantifier.

Existential Instantiation (Existential Elimination)
  • \(\exists x P(x)\)

  • \(\therefore P(c) \text{ for some element } c \in U\)

Existential generalization states that, from the premise that \(P(c)\) is True for at least one \(c \in U,\) you can conclude that \(\exists x P(x)\) must also be True.

Existential Generalization
  • \(P(c) \text{ for some element } c \in U\)

  • \(\therefore \exists x P(x)\)
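When the universal set \(U\) is finite, the four rules can be mirrored directly in code. The sketch below is only an illustration under stated assumptions: the universal set `U` and the predicates `P` and `Q` are hypothetical choices, and checking every element is possible only because `U` is finite. In most proofs \(U\) is infinite, and the rules are applied by reasoning rather than by enumeration.

```python
U = range(1, 11)  # hypothetical universal set: the integers 1 through 10

def P(x):
    """Hypothetical predicate: x squared is at least x."""
    return x * x >= x

def Q(x):
    """Hypothetical predicate: x is a multiple of 7."""
    return x % 7 == 0

# Universal generalization: P(c) holds for every c in U, so "for all x, P(x)".
forall_P = all(P(c) for c in U)

# Universal instantiation: from "for all x, P(x)" conclude P(c) for any chosen c.
c = 4
assert (not forall_P) or P(c)

# Existential generalization: one witness is enough for "there exists x, Q(x)".
exists_Q = any(Q(c) for c in U)

# Existential instantiation: from "there exists x, Q(x)" name a witness.
witness = next(c for c in U if Q(c))

print(forall_P, exists_Q, witness)  # True True 7
```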

5.3. Proof Techniques

In this section several examples of formal mathematical proofs are given to illustrate different proof techniques. Many of the techniques correspond to certain rules of inference that were discussed earlier in this chapter.

Each proof starts by stating a conjecture, which is a proposition with undetermined truth value. The goal of each proof is to determine the truth value of the relevant conjecture.

To simplify the description of the proof techniques, we’ll only consider the case where the proof has a single premise \(p\); that is, we’ll always assume that our proof involves a single conditional \(p \rightarrow q\). This may seem like an oversimplification, but it is not: We are simply renaming the conjunction \((p_{1} \land p_{2} \land \ldots \land p_{n})\) of all of the actual premises by using the single propositional variable \(p\); that is, we are defining \(p\) by the logical equivalence \(p \equiv (p_{1} \land p_{2} \land \ldots \land p_{n}).\)

Here we’ll present examples of proofs using several different techniques. Most of these proofs establish an arithmetic fact that you probably have always known (or assumed) is True, so rather than concentrating on the fact itself, you can focus on the form of the proof: Note the steps that are used, and how the argument flows.

Another important proof technique, mathematical induction, will be discussed in a later chapter.

Example 3 - Why do we need proofs at all?

Consider the following proposition: "For all natural numbers \(n,\) the value of \(n^{2} + n + 41\) is a prime integer."

For each of the natural numbers 0, 1, …, 10, the predicate \(P(n)\): "\(n^{2} + n + 41\) is a prime integer." evaluates to a proposition that is True. In fact, \(P(n)\) evaluates to a proposition that is True for each of the natural numbers \(n\) that is less than or equal to 39.

It may seem that we have "checked enough cases" to conclude that \(P(n)\) will evaluate to a proposition that is True for every possible natural number value of \(n.\) However, \(P(40)\) is False, since \(40^{2} + 40 + 41 = 40 \cdot 41 + 41 = 41 \cdot 41\) is a composite number. Similarly, \(P(41)\) is the proposition "\(41^{2} + 41 + 41 \text{ is a prime integer}\)," which is False - notice that \(41^{2} + 41 + 41 = 41 \cdot (41 + 1 + 1) = 41 \cdot 43\) is a composite number.

This example shows that it is not enough to verify that a proposition or predicate is True for just a few cases, unless those cases happen to cover every possibility.
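A short program can search for the first natural number at which the pattern breaks. The following Python sketch is an illustration only; the function names `is_prime` and `first_failure` are hypothetical, and the primality test uses simple trial division, which is adequate for values this small.

```python
def is_prime(m):
    """Trial division; adequate for the small values used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def first_failure(limit):
    """Smallest n below limit for which n^2 + n + 41 is not prime, or None."""
    for n in range(limit):
        if not is_prime(n * n + n + 41):
            return n
    return None

print(first_failure(100))  # 40, since 40^2 + 40 + 41 = 41^2
```

Notice that the program can only find a counterexample; if it had returned `None`, that would say nothing about values of \(n\) beyond the limit that was searched.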

5.3.1. Direct Proof

In a direct proof, we make an argument that a conditional statement \(p \rightarrow q\) must be True. This means that we can assume that the premise \(p\) is True and apply modus ponens to prove that \(q\) must be True, too.

Theorem

If \(a,\) \(b,\) and \(c\) are integers such that both \(a\) and \(b\) are divisible by \(c,\) then for any integers \(m\) and \(n\) the integer \(ma+nb\) must be divisible by \(c.\)
This statement can be rephrased as "If the integers \(a\) and \(b\) are multiples of the integer \(c\), then any sum of integer multiples of \(a\) and \(b\) must also be a multiple of \(c."\)

Proof

Before starting the formal proof, let’s look at a specific example. The integers 10, 6, and 2 are such that both 10 and 6 are divisible by 2. Suppose we have a pair of integers that we will use as multipliers, say 11 and 7, to form a new number \((11)(10)+(7)(6).\) Is the new number also divisible by 2? The answer is obviously "Yes" since we can just simplify the sum and divide by 2, but how can we justify this for every choice of the pair of multipliers? Notice that we can factor 2 out of 10 and 6 in the sum that defines our new number: \[ (11)(10) + (7)(6) = (11)(5)(2) + (7)(3)(2) = [(11)(5)+(7)(3)](2) \] We can find the common factor of 2 and "factor it out" as if it were a variable. This appears to work no matter what values are chosen for the first three integers and the pair of multipliers, so should be generalizable to a formal proof.

For the formal proof, start by supposing that \(a,\) \(b,\) and \(c\) are integers such that both \(a\) and \(b\) are divisible by \(c.\) This means that there are integers \(q\) and \(t\) such that \[ a=qc \text{ and } b=tc \]

For any integers \(m\) and \(n\), we can rewrite the expression \(ma+nb\) as \[ ma+nb = m(qc)+n(tc) = (mq)c + (nt)c = (mq+nt)c \]

The last part of the extended equality shows that \(ma+nb\) is a multiple of \(c,\) that is, \(ma+nb\) is divisible by \(c.\)

This shows that the statement of the theorem is a True proposition.

Q.E.D.
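The proof is constructive: it tells you which integer to multiply by \(c\) to get \(ma+nb\), namely \(mq+nt\). Here is a minimal Python sketch that computes that integer for the numbers used in the informal example; the function name `linear_combination_quotient` is hypothetical. Running it for particular numbers illustrates the proof but does not replace it.

```python
def linear_combination_quotient(a, b, c, m, n):
    """Given that c divides a and b, return the integer mq + nt from the proof."""
    q, t = a // c, b // c          # a = q*c and b = t*c
    return m * q + n * t

a, b, c, m, n = 10, 6, 2, 11, 7    # the numbers from the informal example
k = linear_combination_quotient(a, b, c, m, n)
print(k, m * a + n * b == k * c)   # 76 True
```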

You Try

Write a proof of the following statement: If \(a,\) \(b,\) and \(c\) are integers such that \(a\) is divisible by \(b\) and \(b\) is divisible by \(c,\) then \(a\) must be divisible by \(c.\)

5.3.2. Proof By Contraposition

In a proof by contraposition, we make an argument that \(p \rightarrow q\) is True by instead first arguing that its contrapositive \(\neg q \rightarrow \neg p\) is True and secondly applying the logical equivalence of the conditional and its contrapositive. Start by assuming that the premise \(\neg q\) is True and apply modus ponens to prove that \(\neg p\) must be True, too, then apply logical equivalence.

Theorem

Let \(n\) represent an integer.

If \(n^{2}\) is even, then the integer \(n\) is even.

Proof

It is easier to prove "if it’s not the case that the integer \(n\) is even, then it’s not the case that \(n^{2}\) is even," so start by supposing that it’s not the case that the integer \(n\) is even.

This means that \(n\) must be odd, so there is an integer \(q\) such that \[ n = 2q + 1\]

This in turn means that \[ n^{2} = (2q + 1)^{2} = 4q^{2} + 4q + 1 = 2(2q^{2} + 2q) + 1\]

The last part of the extended equality shows that \(n^{2}\) is odd: When \(n^{2}\) is divided by 2, the remainder is 1. That is, \(n^2\) is not even.

This shows that the contrapositive of the statement of the theorem, "if the integer \(n\) is not even, then \(n^{2}\) is not even" is a True proposition. Since every conditional and its contrapositive are logically equivalent, this argument proves that "If \(n^{2}\) is even, then the integer \(n\) is even" is a True proposition.

Q.E.D.

You Try

Prove the following statement: If \(n^{2}\) is odd, then the integer \(n\) is odd.

5.3.3. Proof By Counterexample

In a proof by counterexample, we disprove a proposition of the form \((\forall x \in D) P(x)\) by arguing that there is at least one value \(c \in D\) such that \(\neg P(c)\) is True.

Conjecture

For every natural number \(n,\) there are natural numbers \(a,\) \(b,\) and \(c\) such that \(n = a^{2} + b^{2} + c^{2}.\)

Disproof of Conjecture

In this case, we can simply compute values of the expression \(a^{2} + b^{2} + c^{2}\) until we find a "gap." \[ 0^{2} + 0^{2} + 0^{2} = 0 \] \[ 1^{2} + 0^{2} + 0^{2} = 1 \] \[ 1^{2} + 1^{2} + 0^{2} = 2 \] \[ 1^{2} + 1^{2} + 1^{2} = 3 \] \[ 2^{2} + 0^{2} + 0^{2} = 4 \] \[ 2^{2} + 1^{2} + 0^{2} = 5 \] \[ 2^{2} + 1^{2} + 1^{2} = 6 \] \[ 2^{2} + 2^{2} + 0^{2} = 8 \] \[ 2^{2} + 2^{2} + 1^{2} = 9 \] \[ 3^{2} + 0^{2} + 0^{2} = 9 \] Notice that 7 does not appear in this list, and it cannot: the only squares of natural numbers that are at most 7 are 0, 1, and 4, and no three of these (with repetition allowed) add up to 7. So you cannot write 7 as a sum of three squares of natural numbers. (There may be other numbers that cannot be written in this form, too, but 7 is the least such number and we only need to find one counterexample.)

This proves the negation of the Conjecture, namely

Theorem

There exists at least one natural number \(n,\) such that for all natural numbers \(a,\) \(b,\) and \(c,\) \(n \neq a^{2} + b^{2} + c^{2}.\)

This may seem like a strange conjecture to consider until you find out that every natural number \(n\) can be written as \(n = a^{2} + b^{2} + c^{2} + d^{2}\) for some natural numbers \(a,\) \(b,\) \(c,\) and \(d,\) and that this may have been known about 1,800 years ago.
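The search for a "gap" is easy to automate. The following Python sketch is a brute-force illustration, with the hypothetical function name `is_sum_of_three_squares`; it lists the natural numbers below 32 that cannot be written as a sum of three squares of natural numbers.

```python
from math import isqrt

def is_sum_of_three_squares(n):
    """Return True if n = a^2 + b^2 + c^2 for some natural numbers a, b, c."""
    limit = isqrt(n)
    for a in range(limit + 1):
        for b in range(a, limit + 1):
            remainder = n - a * a - b * b
            if remainder < 0:
                break                      # larger b only makes it worse
            c = isqrt(remainder)
            if c * c == remainder:
                return True
    return False

gaps = [n for n in range(32) if not is_sum_of_three_squares(n)]
print(gaps)  # [7, 15, 23, 28, 31]
```

The first entry, 7, is the counterexample used in the disproof. For each number in the list the search is exhaustive, because no square larger than \(n\) can appear in a sum equal to \(n.\)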

5.3.4. Proof by Contradiction

In a proof by contradiction, we disprove the proposition \(\neg p\) by making an argument that the conditional \(\neg p \rightarrow (q \land \neg q)\) must be True for some proposition \(q\) and apply the Contradiction Rule to conclude that \(p\) must be True.
Note: Sometimes, we argue instead that the proposition \(((\neg p \rightarrow q) \land (\neg p \rightarrow \neg q))\) must be True, and use the fact that this proposition is logically equivalent to \(\neg p \rightarrow (q \land \neg q)\) and apply the Contradiction Rule.

Theorem

There are no positive integers \(a\) and \(b\) such that \(\displaystyle{ \left( \frac{a}{b} \right)^{2} } = 2.\)

Proof

It may be helpful to write the proposition out in symbols: \[ \neg (\exists a \in \mathbb{Z}) (\exists b \in \mathbb{Z}) \left( a > 0 \land b>0 \land \left( \frac{a}{b} \right)^{2} = 2 \right) \]

To prove this proposition by contradiction, we ASSUME that its negation is True, that is we use the premise \[ \text{ Premise: } \neg \neg (\exists a \in \mathbb{Z}) (\exists b \in \mathbb{Z}) \left( a > 0 \land b>0 \land \left( \frac{a}{b} \right)^{2} = 2 \right) \] In words, we assume: There are integers \(a\) and \(b\) such that \(a > 0,\) \(b > 0,\) and \(\displaystyle{ \left( \frac{a}{b} \right)^{2} } = 2.\)

We can reduce the fraction \(\displaystyle{\frac{a}{b}}\) to lowest terms, so we may assume that \(a\) and \(b\) have no common prime factors. (You know how to do this - just divide both numerator and denominator by their greatest common divisor. In fact, we will prove that this can be done in the mathematical induction chapter later in the textbook, but for now just treat it like a "known fact".)

To eliminate the fraction we can rewrite the equation as \[ a^{2} = 2 b^{2} \]

From this new equation, \(a^{2}\) must be divisible by 2. Use the theorem we proved earlier in this section to conclude that \(a\) must be divisible by 2, too. This means that there is an integer \(q\) such that \(a = 2q.\) Substitute the last expression for \(a\) in the equation to get \[ (2q)^{2} = 2 b^{2} \] which we can rewrite as \[ 4 q^{2} = 2 b^{2} \] or \[ 2 q^{2} = b^{2} \] From this equation, \(b^{2}\) is divisible by 2, and we can conclude that \(b\) is divisible by 2, too.

So…​ we have positive integers \(a\) and \(b\) that have no common prime factors, and we have proven that \(a\) and \(b\) have a common prime factor, namely 2. We have arrived at a contradiction.

Apply the Contradiction Rule to infer that the premise must be False.

We have proven the following theorem.

Theorem

There are no positive integers \(a\) and \(b\) such that \(\displaystyle{ \left( \frac{a}{b} \right)^{2} } = 2.\)

That is, the square root of 2 is not a rational number.

Here is another example of proof by contradiction.

Theorem - Generalized Pigeonhole Principle

Suppose that \(n\) and \(k\) are positive integers. If each of \(n\) objects is assigned to one of \(k\) categories, then at least one category contains at least \(\displaystyle{\left\lceil \frac{n}{k} \right\rceil}\) objects.

Proof

First, recall that \(\lceil x \rceil\) is the ceiling function whose output is the least integer that is greater than or equal to \(x\) (that is, \(\lceil x \rceil\) rounds a real number \(x\) up to the next greatest integer.) The graph of the function is shown in section 3 of this appendix.

Next, before starting a formal proof, let’s look at a specific example to understand what we need to prove. Suppose we want to assign 13 people to the 3 categories "high school student," "post-secondary student," and "other". It’s not hard to see that when we assign people to the categories, the sum of the numbers in the categories has to be 13, so at least one of the categories has at least \(5 = \displaystyle{\left\lceil \frac{13}{3} \right\rceil}\) people. That is, if each category had at most \(4 = \displaystyle{\left\lceil \frac{13}{3} \right\rceil - 1}\) people, then the sum of those numbers would be at most \((3)(4) = 12,\) but we know that the sum should be equal to 13 - this is a contradiction. You can formalize this argument to use \(n\) and \(k\) instead of the specific values 13 and 3, which will provide a formal proof using the "proof by contradiction" technique.

Now, to prove this proposition by contradiction, suppose that the conditional is False. That is, we assume that

  • It is True that \(n\) and \(k\) are positive integers.

  • It is True that each of \(n\) objects has been assigned to one of \(k\) categories.

  • It is False that "At least one category contains at least \(\displaystyle{\left\lceil \frac{n}{k} \right\rceil}\) objects."

So we are assuming that the negation of the last bulleted statement must be True, that is "Every category contains fewer than \(\displaystyle{\left\lceil \frac{n}{k} \right\rceil}\) objects" is True. You should verify that this is the correct negation by using De Morgan’s laws for quantifiers.

Label the categories with the integers \(1, 2, \ldots, k\) and let the integers \(c_1, c_2, \ldots, c_k\) be the counts of objects in each of the categories, that is, assume that the number of objects assigned to category \(i\) is \(c_i.\) From the assumption, every \(c_i\) is less than or equal to \(\displaystyle{\left\lceil \frac{n}{k} \right\rceil} - 1.\) The total number of objects assigned to categories is therefore \[ c_1 + c_2 + \cdots + c_k \leq k \left( \displaystyle{\left\lceil \frac{n}{k} \right\rceil} - 1 \right) \]

For any real number \(x,\) it is true by definition of the ceiling function that \(\displaystyle{x \leq \left\lceil x \right\rceil < x+1}\)

This means that \[ c_1 + c_2 + \cdots + c_k \leq k \left( \displaystyle{\left\lceil \frac{n}{k} \right\rceil} - 1 \right) < k \left( \displaystyle{\left( \frac{n}{k} + 1 \right)} - 1 \right) \] and the expression on the right simplifies to \(n.\)

So the number of objects assigned to the categories must be strictly less than \(n,\) but we also have as a premise that all \(n\) objects were assigned. This is a contradiction.

Apply the Contradiction Rule to infer that it is False that the last bulleted statement is False, that is, conclude that the conditional statement of the theorem must be True.

We have proven the theorem.

Theorem

Suppose that \(n\) and \(k\) are positive integers.

If each of \(n\) objects is assigned to one of \(k\) categories, then at least one category contains at least \(\displaystyle{\left\lceil \frac{n}{k} \right\rceil}\) objects.
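For small values of \(n\) and \(k\) the theorem can be checked by brute force, which is a useful sanity check but not a proof. The following Python sketch tries every possible assignment of \(n\) objects to \(k\) categories; the function names `ceil_div` and `pigeonhole_holds` are hypothetical, and the ranges of \(n\) and \(k\) are arbitrary small choices that keep the number of assignments, \(k^{n}\), manageable.

```python
from itertools import product
from collections import Counter

def ceil_div(n, k):
    """Ceiling of n/k for positive integers, using integer arithmetic only."""
    return -(-n // k)

def pigeonhole_holds(n, k):
    """Check every way of assigning n objects to k categories."""
    bound = ceil_div(n, k)
    for assignment in product(range(k), repeat=n):
        counts = Counter(assignment)       # objects per category
        if max(counts.values()) < bound:
            return False                   # would be a counterexample
    return True

print(ceil_div(13, 3))  # 5
print(all(pigeonhole_holds(n, k)
          for n in range(1, 7) for k in range(1, 4)))  # True
```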

5.3.5. Proof By Exhaustion (Proof By Cases)

Sometimes it is convenient to break a proof into a finite number of cases. For example, it may be easier to prove a statement that involves an integer by considering a first case where the integer is odd and a separate second case where the integer is even, then combining the two separate cases to create a single proof for all integers \(n.\)

In a general proof by cases, you make an argument that a conditional statement \((p_{1} \lor \cdots \lor p_{n}) \rightarrow r\) must be True. This means that if any one of the "cases" \(p_{i},\) where \(i \in \{ 1, 2, \ldots , n \},\) is True, you can apply the tautology \(p_{i} \rightarrow (p_{1} \lor \cdots \lor p_{n})\) and the transitivity rule of inference to prove that \(r\) must be True, too.

A proof by exhaustion is a special kind of proof by cases where the premise is of the form \((p_{1} \lor \cdots \lor p_{n} \lor \neg (p_{1} \lor \cdots \lor p_{n}) ).\) Notice that this premise must be True since if all of \(p_{1},\) \(p_{2},\)… \(p_{n}\) are False then \(\neg (p_{1} \lor \cdots \lor p_{n})\) is True.

If there are two cases, proof by exhaustion corresponds to using the "proof by cases" rule of inference discussed above in the section "Other Common Rules Of Inference." The tautology can be rewritten in the simpler form \(((p \rightarrow r) \land (\neg p \rightarrow r)) \rightarrow r\) because \(p \lor \neg p\) must always be True.

If there are more than two cases, this corresponds to using the tautology \(((p_{1} \rightarrow r) \land \cdots \land (p_{n} \rightarrow r) \land (\neg (p_{1} \lor \cdots \lor p_{n}) \rightarrow r)) \rightarrow r\).

Example 4 - Working with multiple cases

Let’s prove the following theorem: \[ \text{If \(n\) is an integer, then \(n(n+1)\) is an even integer.} \]

Proof

Consider the following two cases:

  • Case 1: \(n\) is an odd integer.
    In this case, \(n+1\) must be an even integer, and \(n(n+1)\) is the product of an odd integer and an even integer so must be even. (Note that this could be made more formal by stating that there is some integer \(j\) such that \(n+1 = 2j\) so that \(n(n+1) = n(2j) = (2j)n = 2(jn).\))

  • Case 2: \(n\) is an even integer.
    In this case, \(n+1\) must be an odd integer, and \(n(n+1)\) is the product of an even integer and an odd integer so must be even. (Note again that this could be made more formal by stating that there is some integer \(k\) such that \(n = 2k\) so that \(n(n+1) = (2k)(n+1) = 2((k)(n+1)).\))

Since the statement "\(n\) is an odd integer or \(n\) is an even integer" must be True no matter what value the integer \(n\) has, this shows that the statement of the theorem is a True proposition.

Q.E.D.
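The two cases of the proof translate naturally into the two branches of an `if` statement. The following Python sketch, with the hypothetical function name `half_of_product`, returns an integer \(h\) such that \(n(n+1) = 2h,\) using the integer \(j\) from Case 1 or the integer \(k\) from Case 2.

```python
def half_of_product(n):
    """Return an integer h with n(n+1) == 2h, following the two cases."""
    if n % 2 == 1:            # Case 1: n is odd, so n + 1 = 2j
        j = (n + 1) // 2
        return j * n
    else:                     # Case 2: n is even, so n = 2k
        k = n // 2
        return k * (n + 1)

print(all(n * (n + 1) == 2 * half_of_product(n)
          for n in range(-50, 51)))  # True
```

Because every integer is either odd or even, exactly one branch runs for each input, which is the same reason the two cases in the proof cover all integers.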

6. Number Bases

This chapter was last updated on April 1, 2025.

It’s likely that you learned how to represent positive integers using the base-ten place-value system when you were young. This system uses ten symbols, the Hindu-Arabic numerals ‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, and ‘9’, to represent any natural number.

In the base-ten place-value system that uses decimal notation, each natural number is represented by a numeral which is a string of one or more of the ten Hindu-Arabic digits. However, in some computer science contexts, it is more useful to represent natural numbers in a place-value system that uses a different number base such as binary (base-two), octal (base-eight), or hexadecimal (base-sixteen). Using or thinking about numbers represented by numerals in these other systems can help you develop more efficient algorithms and recognize when certain numerals can be interpreted as encodings of multiple pieces of information.

In this chapter, you will learn how to represent natural numbers using place-value systems with bases other than ten.

Key terms and concepts covered in this chapter:

  • Number bases

    • binary

    • hexadecimal

    • octal

    • decimal

6.1. Numbers, Numerals, and Digits

In everyday life, it is common to treat numbers and numerals as if they are the same. However, it is important in this chapter to distinguish these concepts.

  • A number is an abstract idea or mental concept of a count, a measure, a rank in an ordering, etc.

  • A numeral is a word, phrase, symbol, or string of symbols that is used to represent a number.

  • A digit is a single symbol that represents a number. A digit is a numeral, but multiple digits can be combined to create other numerals. For example, each of the ten Hindu-Arabic numerals is a digit that represents a natural number that is less than ten, and multiple Hindu-Arabic numerals can be combined to create numerals that represent numbers that are greater than or equal to ten.
    NOTE: The word "digit" comes from the Latin word digitus which means "finger" or "toe."

Example 1 - Numerals, Numbers, and Digits

Consider the following words:

  • "five",

  • "cinco",

  • "خمسة" ("khamsa"),

  • "पाँच" ("paanch"),

  • "五" ("wǔ").

These words, taken from different languages, are examples of numerals, words that represent a number. In fact, all those numerals represent the same number, but no one of the numerals is the number. The number itself is an abstract idea that can be referred to using any of those numerals.

In the same way,

  • the Roman "Ⅴ",

  • the Braille "⠼⠑",

  • the Coptic Epact "𐋥",

  • the Eastern Arabic "٥",

  • the Western Arabic "5".

are other ways of representing the same number that the words above represent.
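Programming languages make the same distinction between a numeral (a character or string) and the number it represents. As a small illustration, the Python standard library module `unicodedata` can look up the numeric value attached to a character. The three code points below are an assumption about which characters the list above shows: U+0035 for the Western Arabic digit, U+0665 for the Eastern Arabic digit, and U+2164 for the Roman numeral.

```python
import unicodedata

# Three different numerals (symbols), written as Unicode escapes.
numerals = ["5", "\u0665", "\u2164"]   # Western Arabic, Eastern Arabic, Roman

# The number that each numeral represents.
values = [unicodedata.numeric(symbol) for symbol in numerals]

print(values)                                # [5.0, 5.0, 5.0]
print(len(set(numerals)), len(set(values)))  # 3 1
```

There are three distinct symbols but only one distinct value: three numerals, one number.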

Semiotics
In semiotics, the study of signs, symbols, and signification, the number is a sign, made up of the signified mental concept and the signifier (the numeral.) Further discussion is beyond the scope of this textbook, and beyond the remixer’s expertise! Here is a link if you want to learn more about semiotics.
This is not a pipe!
Read this Wikipedia article and pay attention to the quote from the artist, René Magritte.

6.2. Review Of The Base-Ten Place Value System

It’s likely that everything you are about to read in this section is knowledge you’ve had for many years. The purpose of stating everything so explicitly is to provide an example you can compare with place-value systems that use other bases.

A base-ten numeral is a string formed from one or more digits (i.e., Hindu-Arabic numerals.)

  • The string is read from left-to-right.

  • Each digit in the string represents a multiple of a power of the base, ten, depending on its position in the string.

  • The rightmost place represents a multiple of 1 (which is ten raised to the power zero) and each of the other places represents a multiple of a power of ten that is one greater than the power of ten represented by the place to its right.

  • Notice that the base itself, the number ten, is represented by the string "10" in this place-value system. The string "10" represents the number described by the phrase "1 ten plus 0 ones".

  • As an example, the string "101" represents the number described by the phrase "1 hundred plus 0 tens, plus 1 ones," where 1 hundred is the same as ten tens. That is, \[ 101 = 1 \cdot 10 \cdot 10 + 0 \cdot 10 + 1 \cdot 1 \] The expression on the right-hand side of the previous equation is referred to as expanded form in school-level mathematics.

NOTE: The Hindu-Arabic numerals evolved from earlier Brahmi versions. You can look at the image at this web page which shows a few steps in this evolution. Also, you can learn more about the history of the Hindu-Arabic numerals from the links in the "Notes" and "References" sections of this Wikipedia page.

6.2.1. An Algorithm That Computes The Digits Of A Base-Ten Numeral

In this subsection, an algorithm is presented for computing the digits in the expanded form of a base-ten numeral of the natural number \(n.\) This may seem to be a complicated way to do something very simple since we could just read off the digits, but the important thing to notice about this algorithm is that the role that ten plays can be played by any other positive integer constant greater than one! This means that this algorithm can be adapted to find the "digits" in the expanded form of a base-\(b\) numeral for the number \(n\) for any base \(b\) we choose.

  • Task: Given the natural number \(n,\) compute an array of natural numbers \(s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]\) so that each of \(r_{0}, r_{1}, \ldots , r_{k}\) is represented by a single digit in base-10, and \[ n = r_{k} \cdot 10^{k} + \ldots + r_{1} \cdot 10^{1} + r_{0} \cdot 10^{0} \] where, for \(n \geq 1,\) \(k\) is the greatest natural number such that \(10^{k} \leq n\) (and \(k = 0\) when \(n = 0\)).

    • Input: The natural number \(n\)

    • Steps:

      1. Set \(a\) equal to \(n\)

      2. Set \(s\) to the empty array (We will append the values \(r_{0}, r_{1}, \ldots , r_{k}\) to the array \(s\) as we compute them)

      3. Divide \(a\) by 10 to find natural numbers \(q\) and \(r\) such that both \(a = q \cdot 10 + r\) and \(0 \leq r < 10.\)

      4. Append \(r\) to the end of array \(s.\)

      5. If \(q \neq 0\)

        1. set \(a\) equal to \(q\)

        2. go to step 3

      6. Return the array \(s.\)

    • Output: An array of natural numbers \(s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]\) where each number is represented by a single digit, and \[ n = r_{k} \cdot 10^{k} + \ldots + r_{1} \cdot 10^{1} + r_{0} \cdot 10^{0}.\]

That is, we rewrite \(n\) as \(r_{0} + 10 \cdot (r_{1} + 10 \cdot (r_{2} + \ldots + 10 \cdot (r_{k-1} + 10 \cdot r_{k}) \ldots )).\)
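The steps above can be sketched in Python as follows. This is only a minimal sketch, not part of the original algorithm statement: the function name `base_ten_digits` is an assumption made for illustration, and Python's built-in `divmod` is used to carry out the division in Step 3.

```python
def base_ten_digits(n):
    """Return [r_0, r_1, ..., r_k], the base-ten digits of n, rightmost digit first."""
    a = n                        # Step 1
    s = []                       # Step 2
    while True:
        q, r = divmod(a, 10)     # Step 3: a = q * 10 + r with 0 <= r < 10
        s.append(r)              # Step 4
        if q == 0:               # Step 5: continue only while the quotient is nonzero
            break
        a = q
    return s                     # Step 6


print(base_ten_digits(432))      # [2, 3, 4]
```

The loop replaces the "go to step 3" instruction; the digits come out in reverse reading order, exactly as in the array \(s.\)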

Example 2 - Finding The Digits Of A Base-Ten Numeral

The following equations summarize how the preceding algorithm determines the digits in the base-ten expanded form numeral for the number 432.

\begin{equation} \begin{aligned} 432 {} & = 43 \cdot 10 + 2 & q {} & = 43 & r {} & = 2 & s & = [2] \\ 43 {} & = 4 \cdot 10 + 3 & q {} & = 4 & r {} & = 3 & s & = [2, 3] \\ 4 {} & = 0 \cdot 10 + 4 & q {} & = 0 & r {} & = 4 & s & = [2, 3, 4] \\ \end{aligned} \end{equation}

Notice that the items in \(s = [ r_{0}, \, r_{1}, \, r_{2} ]\) are the numbers corresponding to the digits of the numeral \(“432”\) in reverse order, so \begin{equation} \begin{aligned} 432 & = r_{2} \cdot 10^{2} + r_{1} \cdot 10^{1} + r_{0} \cdot 10^{0} \\ & = 4 \cdot 10^{2} + 3 \cdot 10^{1} + 2 \cdot 10^{0} \end{aligned} \end{equation}

Notice that the algorithm is rewriting \(432\) as the sum \(2 + 10 \cdot (3 + 10 \cdot 4).\)

6.3. The Base-Two Place Value System (Binary Notation)

This section describes the base-two (binary) place value system. You will see that much of what is written here is the result of replacing "ten" by "two" in the description of the base-ten (decimal) system in the previous section.

A base-two numeral is a string formed from one or more of the two binary digits (or bits) ‘0’ and ‘1’.

  • The string is read from left-to-right.

  • Each digit in the string represents a multiple of a power of the base, two, depending on its position in the string.

  • The rightmost place represents a multiple of 1 (which is two raised to the power zero) and each of the other places represents a multiple of a power of two that is one greater than the power of two represented by the place to its right.

  • Notice that the base itself, the number two, is represented by the string "10" in this place-value system. The string "10" represents the number described by the phrase "1 two plus 0 ones".

  • As an example, the string "101" represents the number described by the phrase "1 four plus 0 twos, plus 1 ones," where 1 four is the same as two twos. That is, \[ 101 = 1 \cdot 10 \cdot 10 + 0 \cdot 10 + 1 \cdot 1 \text{ (🤯: Wait…​ WHAT?!?) }\] Yes, this equation, which may appear to be written in the base-ten system, is correct in the base-two place value system, too! "10" is how the number two is represented in base-two notation!
    As an analogy, the string "pie" signifies different things in English (a baked dessert) and Spanish (a foot.) You must take care to know which context you are working in!

It is traditional to use some extra notation to indicate when the strings "10" and "101" are not base-ten numerals to avoid confusion. In this textbook, numerals in any base other than ten will be written between a pair of parentheses followed by a subscript indicating the base. The subscript is written as a base-ten numeral. For example, we could rewrite the previous equation as \[(101)_{2} = (1)_2 \cdot (10)_2 \cdot (10)_2 + (0)_2 \cdot (10)_2 + (1)_2 \cdot (1)_2 \] which translates into base-ten as \(5 = 1 \cdot 2 \cdot 2 + 0 \cdot 2 + 1 \cdot 1.\) We can also write \(5 = (101)_2\) which is a way of saying that the base-ten numeral and the base-two numeral signify the same number.

NOTE: The reason we use base-ten numerals as the subscripts on numerals in other bases is that base-ten is so dominant: It is the "privileged" base, so we need to indicate when a different base is being used…​ and we don’t need to use the parentheses or subscripts if we are already working in base-ten.

The parentheses and subscript are not necessary if it is clear from the context that a numeral is not a base-ten numeral. For example, \[ \text{chmod 755 hello.txt} \] is a Unix/Linux command that changes the file permission bits (read, write, execute) of the file "hello.txt" for the file’s owner, the file’s group, and any other user. In this example, the string "755" is not a base-10 numeral, but is in octal (base-eight). Octal will be discussed later in the chapter. No subscript is used in the Unix/Linux command because it is natural to an experienced user of that operating system to use octal in the context.
In fact, the octal numeral "755" is used here as an encoding of three bitstrings, where each bitstring is of length 3; this idea is discussed in a later subsection of this chapter.

Also, we can omit the parentheses and subscripts if we want to tell a couple of "jokes:"

You are ready now to learn how to represent numbers using base-two numerals.

6.3.1. An Algorithm That Computes The Digits Of A Base-Two Numeral

In this subsection, an algorithm is presented for computing the digits in the expanded form of a base-two numeral of the natural number \(n.\) This algorithm has been adapted from the one stated for base-ten in the previous section. Notice that all numerals used in this algorithm are base-ten numerals unless otherwise indicated.

  • Task: Given the natural number \(n,\) compute an array of natural numbers \(s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]\) so that each of \(r_{0}, r_{1}, \ldots , r_{k}\) is represented by a single digit in base-2, and \[ n = r_{k} \cdot 2^{k} + \ldots + r_{1} \cdot 2^{1} + r_{0} \cdot 2^{0} \] where, for \(n \geq 1,\) \(k\) is the greatest natural number such that \(2^{k} \leq n\) (and \(k = 0\) when \(n = 0\)).

    • Input: The natural number \(n\)

    • Steps:

      1. Set \(a\) equal to \(n\)

      2. Set \(s\) to the empty array (We will append the values \(r_{0}, r_{1}, \ldots , r_{k}\) to the array \(s\) as we compute them)

      3. Divide \(a\) by 2 to find natural numbers \(q\) and \(r\) such that both \(a = q \cdot 2 + r\) and \(0 \leq r < 2.\)

      4. Append \(r\) to the end of array \(s.\)

      5. If \(q \neq 0\)

        1. set \(a\) equal to \(q\)

        2. go to step 3

      6. Return the array \(s.\)

    • Output: An array of natural numbers \(s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]\) where each number is represented by a single digit, and \[ n = r_{k} \cdot 2^{k} + \ldots + r_{1} \cdot 2^{1} + r_{0} \cdot 2^{0}.\]

That is, the algorithm rewrites \(n\) as \(r_{0} + 2 \cdot (r_{1} + 2 \cdot (r_{2} + \ldots + 2 \cdot (r_{k-1} + 2 \cdot r_{k}) \ldots )).\)

Example 3 - Finding The Digits Of A Base-Two Numeral (Binary Notation)

The following equations summarize how the preceding algorithm determines the digits in the base-two expanded form numeral for the number 13.

\begin{equation} \begin{aligned} 13 {} & = 6 \cdot 2 + 1 & q {} & = 6 & r {} & = 1 & s & = [1] \\ 6 {} & = 3 \cdot 2 + 0 & q {} & = 3 & r {} & = 0 & s & = [1, 0] \\ 3 {} & = 1 \cdot 2 + 1 & q {} & = 1 & r {} & = 1 & s & = [1, 0, 1] \\ 1 {} & = 0 \cdot 2 + 1 & q {} & = 0 & r {} & = 1 & s & = [1, 0, 1, 1] \\ \end{aligned} \end{equation}

Notice that the items in \(s = [ r_{0}, \, r_{1}, \, r_{2} , \, r_{3} ]\) are the numbers (in base-ten notation) corresponding to the digits of the numeral \(“(1101)_2”\) in reverse order, so \begin{equation} \begin{aligned} 13 & = r_{3} \cdot 2^{3} + r_{2} \cdot 2^{2} + r_{1} \cdot 2^{1} + r_{0} \cdot 2^{0} \\ & = 1 \cdot 2^{3} + 1 \cdot 2^{2} + 0 \cdot 2^{1} + 1 \cdot 2^{0} \end{aligned} \end{equation}

The algorithm rewrites \(13\) as \(1 + 2 \cdot (0 + 2 \cdot (1 + 2 \cdot 1)).\)
Again, notice that the items in the array \([1, 0, 1, 1]\) are listed in reverse order, so \(13 = (1101)_2\) where the base-ten numeral and the base-two numeral represent the same number, thirteen.
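As a quick check of this example, Python's built-in functions `bin` and `int` convert between a number and its base-two numeral. This is only an illustration: the `0b` prefix is Python's own way of marking binary numerals, not the parentheses-and-subscript notation used in this textbook, and the variable names are chosen just for this sketch.

```python
# bin() returns the base-two numeral as a string with Python's "0b" prefix.
numeral = bin(13)
print(numeral)                   # 0b1101

# int() with a second argument of 2 reads a string as a base-two numeral.
print(int("1101", 2))            # 13

# Reversing the array s from the example gives the digits in reading order.
s = [1, 0, 1, 1]
binary_string = "".join(str(r) for r in reversed(s))
print(binary_string)             # 1101
```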

Here is a link to an alternate method of finding the base-two numeral for a number.

If you made it to this sentence without skipping any of the discussion above, congratulations! If you did skip some of the discussion, go back and try your best to understand what the algorithm in the previous example is computing: The array \(s\) holds the digits, in reverse order, of the binary notation for the number \(n.\) Compare what is done in this algorithm to the one for base-ten in the previous section…​ they are both computing the digits for a numeral, but in different bases. If you can understand this algorithm, you will likely understand the rest of the chapter.

6.4. The Base-\(b\) Place Value System

If you made it here, you are ready to learn how to find, given any natural number \(n,\) the numeral that represents \(n\) in the base-\(b\) place value system (It is assumed that the base \(b\) is a natural number greater than or equal to 2.) You can compare the algorithm and example in this subsection to the ones in the preceding subsections for base-ten and base-two.

A base-\(b\) numeral is a string formed from one or more digits out of a set that contains \(b\) symbols, where each symbol is called a "base-\(b\) digit."

  • The string is read from left-to-right.

  • Each digit in the string represents a multiple of a power of the base, \(b,\) depending on its position in the string.

  • The rightmost place represents a multiple of 1 (which is \(b\) raised to the power zero) and each of the other places represents a multiple of a power of \(b\) that is one greater than the power of \(b\) represented by the place to its right.

  • Notice that the base itself, the number \(b,\) is represented by the string "10" in the base-\(b\) place value system. The string "10" represents the number described by the phrase "1 \(b\) plus 0 ones".

  • As an example, the string "101" represents the number described by the phrase "1 \(b\)-squared plus 0 \(b\)'s, plus 1 ones." That is, \[ 101 = 1 \cdot 10 \cdot 10 + 0 \cdot 10 + 1 \cdot 1 \text{ (🤯: Again?!?) }\] Yes, this equation is correct, too, in the base-\(b\) place value system!

To avoid confusion, you can enclose each numeral in a pair of parentheses followed by the subscript \(b\) to indicate the base, where \(b\) is written as a base-ten numeral. For example, the previous equation can be written as \[(101)_{b} = (1)_b \cdot (10)_b \cdot (10)_b + (0)_b \cdot (10)_b + (1)_b \cdot (1)_b \] which translates into base-ten as \(b^2 + 1 = 1 \cdot b \cdot b + 0 \cdot b + 1 \cdot 1.\)

6.4.1. An Algorithm That Computes The Digits Of A Base-\(b\) Numeral

This is an adaptation of the algorithm presented earlier for base-two. Notice that all numerals used in this algorithm are base-ten numerals unless otherwise indicated.

  • Task: Given the natural number \(n\) and a positive integer constant \(b > 1,\) compute an array of natural numbers \(s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]\) so that each of \(r_{0}, r_{1}, \ldots , r_{k}\) can be represented by a single digit in base-\(b\), and \[ n = r_{k} \cdot b^{k} + \ldots + r_{1} \cdot b^{1} + r_{0} \cdot b^{0} \] where, for \(n \geq 1,\) \(k\) is the greatest natural number such that \(b^{k} \leq n\) (and \(k = 0\) when \(n = 0\)).

    • Input: The natural number \(n\) and the base \(b\)

    • Steps:

      1. Set \(a\) equal to \(n\)

      2. Set \(s\) to the empty array (We will append the values \(r_{0}, r_{1}, \ldots , r_{k}\) to the array \(s\) as we compute them)

      3. Divide \(a\) by \(b\) to find natural numbers \(q\) and \(r\) such that both \(a = q \cdot b + r\) and \(0 \leq r < b.\)

      4. Append \(r\) to the end of array \(s.\)

      5. If \(q \neq 0\)

        1. set \(a\) equal to \(q\)

        2. go to step 3

      6. Return the array \(s.\)

    • Output: An array of natural numbers \(s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]\) where each number is represented by a single digit in base-\(b,\) and \[ n = r_{k} \cdot b^{k} + \ldots + r_{1} \cdot b^{1} + r_{0} \cdot b^{0}.\]

The algorithm rewrites \(n\) as \(r_{0} + b \cdot (r_{1} + b \cdot (r_{2} + \ldots + b \cdot (r_{k-1} + b \cdot r_{k}) \ldots )).\) The result \(s\) contains the numbers that let you write \(n\) in base-\(b\) notation.
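One possible Python sketch of the base-\(b\) algorithm is shown below, together with a helper that evaluates the nested expression from the inside out. The function names `base_b_digits` and `nested_value` are assumptions made for this illustration; they do not come from the original text.

```python
def base_b_digits(n, b):
    """Return [r_0, r_1, ..., r_k] with n = r_k*b**k + ... + r_1*b + r_0."""
    if b < 2:
        raise ValueError("the base b must be an integer greater than 1")
    a = n
    s = []
    while True:
        q, r = divmod(a, b)      # a = q * b + r with 0 <= r < b
        s.append(r)
        if q == 0:
            break
        a = q
    return s


def nested_value(s, b):
    """Evaluate r_0 + b*(r_1 + b*(r_2 + ...)), starting from the innermost term."""
    value = 0
    for r in reversed(s):
        value = r + b * value
    return value


print(base_b_digits(13, 2))                     # [1, 0, 1, 1]
print(nested_value(base_b_digits(13, 2), 2))    # 13
```

Calling `nested_value` on the output of `base_b_digits` returns the original number, which is one way to check a conversion by hand or by machine.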

6.4.2. Octal Notation (Base-8)

Example 4 - Finding The Digits Of A Base-8 Numeral (Octal Notation)

The following equations summarize how to determine the digits in the base-8 expanded form numeral for the number 100.

Note that for base-8 we use the eight digits ‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, and ‘7’.

\begin{equation} \begin{aligned} 100 {} & = 12 \cdot 8 + 4 & q {} & = 12 & r {} & = 4 & s & = [4] \\ 12 {} & = 1 \cdot 8 + 4 & q {} & = 1 & r {} & = 4 & s & = [4, 4] \\ 1 {} & = 0 \cdot 8 + 1 & q {} & = 0 & r {} & = 1 & s & = [4, 4, 1] \\ \end{aligned} \end{equation}

Notice that the items in \(s = [ 4, \, 4, \, 1 ]\) are the numbers (in base-ten notation) corresponding to the base-8 digits of the numeral \(“(144)_8”\) in reverse order. You can verify that \(100 = 1 \cdot 8^{2} + 4 \cdot 8^{1} + 4 \cdot 8^{0}.\) This means that \(100 = (144)_8.\)

6.4.3. Hexadecimal Notation (Base-16)

Example 5 - Finding The Digits Of A Base-16 Numeral (Hexadecimal Notation)

The following equations summarize how to determine the digits in the base-16 expanded form numeral for the number 500.

Note that for base-16, we need sixteen digits! It is traditional to use the ten Hindu-Arabic numerals followed by the first six uppercase English letters as the digits: ‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’, ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, and ‘F’. So \(10 = (A)_{16},\) \(11 = (B)_{16},\) \(12 = (C)_{16},\) \(13 = (D)_{16},\) \(14 = (E)_{16},\) and \(15 = (F)_{16}.\)
Note: Some programming languages, like Python, display hexadecimal digits using the lowercase letters 'a' through 'f' instead of the uppercase letters.

The remainders stored in the array \(s\) are represented in base-ten notation, and will need to be replaced by the corresponding hexadecimal digits in the base-16 numeral for 500.

\begin{equation} \begin{aligned} 500 {} & = 31 \cdot 16 + 4 & q {} & = 31 & r {} & = 4 & s & = [4] \\ 31 {} & = 1 \cdot 16 + 15 & q {} & = 1 & r {} & = 15 & s & = [4, 15] \\ 1 {} & = 0 \cdot 16 + 1 & q {} & = 0 & r {} & = 1 & s & = [4, 15, 1] \\ \end{aligned} \end{equation}

As before, we have \(500 = 1 \cdot 16^{2} + 15 \cdot 16^{1} + 4 \cdot 16^{0},\) which you can verify is true. To write the base-16 numeral for 500, you need to replace "15" in base-ten by \((F)_{16}.\) So \(500 = (1F4)_{16}.\)
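The last step of this example, replacing each remainder by its digit character, can be sketched in Python by indexing into a string of digit symbols. The names `DIGITS` and `to_numeral` are hypothetical, chosen only for this sketch, which is limited to bases 2 through 16.

```python
DIGITS = "0123456789ABCDEF"      # DIGITS[r] is the digit character for the value r


def to_numeral(n, b):
    """Return the base-b numeral for the natural number n as a string (2 <= b <= 16)."""
    if not 2 <= b <= len(DIGITS):
        raise ValueError("this sketch only supports bases 2 through 16")
    chars = []
    a = n
    while True:
        a, r = divmod(a, b)
        chars.append(DIGITS[r])  # replace the remainder by its digit character
        if a == 0:
            break
    return "".join(reversed(chars))   # the remainders were found in reverse order


print(to_numeral(500, 16))       # 1F4
print(to_numeral(100, 8))        # 144
```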

6.4.4. A Theorem (To Be Proven Later)

We can summarize what the algorithm does as a mathematical theorem, though technically at this point, it’s only a conjecture, an educated guess based on a few cases that seem to indicate that the algorithm will always work. You will learn a technique that will prove the theorem by validating the algorithm for all choices of natural numbers \(n\) and \(b>1\) in the Proofs: Mathematical Induction chapter.

Theorem

Let \(b\) be an integer greater than 1. Any positive integer \(n\) can be expressed uniquely in the form \[n = r_kb^k + r_{k - 1}b^{k-1} + \cdots + r_1b^1 + r_0b^0,\]where \(k\) is a nonnegative integer, \(r_0,r_1,\dots,r_k\) are nonnegative integers less than \(b,\) and \(r_k \neq 0.\)

6.5. Converting From Base-\(b\) to Base-Ten

In this section you will learn how to rewrite a base-\(b\) numeral in base-ten.

Example 6

What is the decimal expansion of the positive integer with base 7 expansion \((1063)_7\)?

Solution

We have

\[\begin{split} (1063)_7 &= 1 \cdot 7^3 + 0 \cdot 7^2 + 6\cdot 7^1 + 3 \cdot 7^0\\ &=1 \cdot 343 + 0 \cdot 49 + 6 \cdot 7 + 3 \cdot 1\\ &= 343 + 0 + 42 + 3\\ &= 388. \end{split}\]

Several common bases used in computer science are base \(2\), base \(8\), and base \(16\), which are referred to as binary, octal, and hexadecimal, respectively. Binary digits are often referred to as bits. Note that, when finding the hexadecimal expansion of a positive integer, in addition to the usual digits \(0\) through \(9,\) we require an additional 6 digits. We will represent these by the letters \(\mathrm{A}\) through \(\mathrm{F}\), where \((\mathrm{A})_{16} = 10,\) \((\mathrm{B})_{16} = 11,\) \((\mathrm{C})_{16} = 12,\) \((\mathrm{D})_{16} = 13,\) \((\mathrm{E})_{16} = 14,\) and \((\mathrm{F})_{16} = 15.\)

Example 7 - Hexadecimal expansion

Find the decimal expansion of the positive integer whose hexadecimal expansion is \((5\mathrm{B}\mathrm{F})_{16}.\)

Solution

We have

\[\begin{split} (5\mathrm{B}\mathrm{F})_{16} &= 5\cdot 16^2 + 11 \cdot 16^1 + 15 \cdot 16^0\\ &= 5\cdot 256 + 11 \cdot 16 + 15 \cdot 1\\ &= 1280 + 176 + 15\\ &= 1471. \end{split}\]
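The expanded-form computation used in the last two examples can be sketched in Python as follows. The names `DIGIT_VALUES` and `from_numeral` are assumptions made for this illustration; the final line uses Python's built-in `int`, which accepts a base as its second argument, as an independent check.

```python
# Map each digit character to the number it represents, e.g. "B" -> 11.
DIGIT_VALUES = {ch: value for value, ch in enumerate("0123456789ABCDEF")}


def from_numeral(numeral, b):
    """Return the number represented by the base-b numeral (a string of digits)."""
    k = len(numeral) - 1              # power of b for the leftmost digit
    total = 0
    for position, ch in enumerate(numeral):
        r = DIGIT_VALUES[ch.upper()]
        if r >= b:
            raise ValueError(f"{ch!r} is not a base-{b} digit")
        total += r * b ** (k - position)
    return total


print(from_numeral("1063", 7))        # 388
print(from_numeral("5BF", 16))        # 1471
print(int("5BF", 16))                 # 1471
```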

6.6. Base Conversion Among Binary, Octal, and Hexadecimal

One of the ways that octal (base-eight) and hexadecimal (base-sixteen) are used in computer science is to abbreviate long bitstrings. The following examples will show how this is done.

Suppose you need to convert a numeral from hexadecimal to binary. One method would be to first convert from hexadecimal to decimal, and then convert the result from decimal to binary. However, it is much more efficient to notice that since \(2^4 = 16,\) you can express each hexadecimal digit as a block of 4 bits (that is, a bitstring of length 4) as follows:

\[\begin{array}{llll} (0)_{16} = (0000)_2 & (1)_{16} = (0001)_{2}& (2)_{16} = (0010)_2 & (3)_{16} = (0011)_2 \\ (4)_{16} = (0100)_2& (5)_{16} = (0101)_2& (6)_{16} = (0110)_2 & (7)_{16} = (0111)_2\\ (8)_{16} = (1000)_2& (9)_{16} = (1001)_2& (\mathrm{A})_{16} = (1010)_2& (\mathrm{B})_{16} = (1011)_2\\ (\mathrm{C})_{16} = (1100)_2& (\mathrm{D})_{16} = (1101)_2& (\mathrm{E})_{16} = (1110)_2& (\mathrm{F})_{16} = (1111)_2. \end{array}\]

You can then concatenate the blocks, and remove any leading zeros if you need to.

Example 8 - Hexadecimal to Binary Conversion

Find the binary expansion of \((4\mathrm{C}\mathrm{A}7)_{16}.\)

Solution

Each hexadecimal digit can be replaced by a block of 4 bits:

\[\begin{array}{llll} (4)_{16} = (0100)_2 & (\mathrm{C})_{16} = (1100)_2 & (\mathrm{A})_{16} = (1010)_2 & (7)_{16} = (0111)_2. \end{array}\]

This means that you can write either \[(4\mathrm{C}\mathrm{A}7)_{16} = (0100110010100111)_{2}\] or, if the leading zero is not needed, \[(4\mathrm{C}\mathrm{A}7)_{16} = (100110010100111)_{2}.\] Why wouldn’t you always delete leading zeroes? Notice that a bitstring of length 4 can be used to encode a sequence of "Yes/No" or "True/False" answers. As an example, since \((6)_{16} = (0110)_2,\) the hexadecimal digit \((6)_{16}\) can be used to encode the sequence of 4 answers "No, Yes, Yes, No" to a Yes/No survey, and in this context the leftmost bit should be kept to make it clear that the answer to the first question was "No" (as opposed to the sequence "Yes, Yes, No, blank" where the fourth question was not answered.)

To convert a numeral from binary to hexadecimal, first break up the binary notation into blocks of 4 bits, adding a suitable number of leading zeros if necessary. Next, convert each block of 4 bits to a hexadecimal digit and concatenate the results, removing any leading zeros if necessary.

Example 9 - Binary to Hexadecimal Conversion

Find the hexadecimal expansion of \((110 1011 1111)_2.\)

Solution

There are three blocks of 4 bits: \[0110,\ 1011,\ 1111.\] Since \((0110)_2 = (6)_{16},\) \((1011)_2 = (\mathrm{B})_{16},\) and \((1111)_2 = (\mathrm{F})_{16},\) \[(11010111111)_{2} = (6\mathrm{B}\mathrm{F})_{16}.\]

A similar method can be used to convert between octal and binary. Since \(2^3 = 8,\) each octal digit can be written uniquely as a block of 3 bits as follows:

\[\begin{array}{llll} (0)_{8} = (000)_2 & (1)_{8} = (001)_{2}& (2)_{8} = (010)_2 & (3)_{8} = (011)_2 \\ (4)_{8} = (100)_2& (5)_{8} = (101)_2& (6)_{8} = (110)_2 & (7)_{8} = (111)_2. \end{array}\]

We then concatenate blocks, removing any leading zeros if necessary.
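The block-by-block method can be sketched in Python with a block size of 4 for hexadecimal or 3 for octal. The function names `to_bits` and `from_bits` are hypothetical; the sketch keeps leading zeros, so remove them yourself if they are not wanted.

```python
def to_bits(numeral, block_size):
    """Replace each digit by a block of bits (block_size 4 for hex, 3 for octal)."""
    base = 2 ** block_size
    return "".join(format(int(ch, base), f"0{block_size}b") for ch in numeral)


def from_bits(bits, block_size):
    """Group a bitstring into blocks and replace each block by one digit."""
    padding = (-len(bits)) % block_size          # leading zeros needed to fill a block
    bits = "0" * padding + bits
    blocks = [bits[i:i + block_size] for i in range(0, len(bits), block_size)]
    return "".join(format(int(block, 2), "X") for block in blocks)


print(to_bits("4CA7", 4))             # 0100110010100111
print(from_bits("11010111111", 4))    # 6BF
print(to_bits("755", 3))              # 111101101
```

The last line treats the octal numeral 755 from the chmod example earlier in the chapter as three blocks of 3 bits each.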

Also, the following table can be used to convert quickly between decimal, hexadecimal, octal, and binary in a similar way.

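Conversion table for different bases

\[\begin{array}{rrrr}
\text{Decimal} & \text{Hexadecimal} & \text{Octal} & \text{Binary} \\
\hline
0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 \\
2 & 2 & 2 & 10 \\
3 & 3 & 3 & 11 \\
4 & 4 & 4 & 100 \\
5 & 5 & 5 & 101 \\
6 & 6 & 6 & 110 \\
7 & 7 & 7 & 111 \\
8 & 8 & 10 & 1000 \\
9 & 9 & 11 & 1001 \\
10 & \mathrm{A} & 12 & 1010 \\
11 & \mathrm{B} & 13 & 1011 \\
12 & \mathrm{C} & 14 & 1100 \\
13 & \mathrm{D} & 15 & 1101 \\
14 & \mathrm{E} & 16 & 1110 \\
15 & \mathrm{F} & 17 & 1111
\end{array}\]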

6.7. Exercises

  1. Convert to decimal (base 10)

    1. \((10262)_7\)

    2. \((30A8)_{16}\)

    3. \((1000010001100)_2\)

    4. \((12307)_{60}\)

  2. Convert \(\left(2039\right)_{10}\) from decimal (base 10) to

    1. base 7

    2. binary

    3. hexadecimal (base 16)

    4. octal (base 8)

  3. Convert \(\left(2599\right)_{10}\) from decimal to

    1. base 5

    2. binary

    3. hexadecimal

    4. base 3

  4. Convert the following hexadecimal numerals to binary numerals

    1. \(\left(6F203\right)_{16}\)

    2. \(\left(3FA20C45\right)_{16}\)

    3. \(\left(FACE\right)_{16}\)

  5. Convert the following binary numerals to hexadecimal numerals

    1. \(\left(1111100111010101101\right)_2\)

    2. \(\left(10001111101011\right)_2\)

    3. \(\left(1100101011111110\right)_2\)

7. Sequences and Recursion

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on March 3, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Sequences are functions with domain a nonempty subset of the natural numbers. That is, sequences are ordered lists of objects indexed by some or all of the natural numbers. The indexed objects in the list are called the terms of the sequence and may be any kind of object - numbers, sets, functions, strings, steps of a proof, steps of an algorithm, etc.

Recursion is a process that you can use to define an object, compute a value, or describe the construction of an object or set of objects, by using a sequence of steps where each step after the initial step refers to one or more previously completed steps.

Key terms and concepts covered in this chapter:

  • Sequences

  • Recursive mathematical definitions

    • Recursive definitions of sequences and functions

      • Factorials

      • Arithmetic and geometric progressions

      • The Fibonacci sequence (also called the Fibonacci numbers)

      • Other sequences and functions

    • Recursive definitions of sets of objects (e.g., rooted trees, valid Java identifiers)

    • The "Towers of Hanoi" game

  • Recurrence relations

    • Solving recurrence relations

7.1. Sequences

A sequence is a function \(s\) from a nonempty subset of the natural numbers \(\mathbb{N} = \{ 0, 1, 2, 3, \ldots \}\) to a set \(C.\) That is, the domain of the sequence is some set of nonnegative integers, and the codomain can be any set. Each "input" value of \(n\) in the domain is called an index. The "outputs" of the sequence are called the terms of the sequence. A term is usually denoted by \(s_{n}\) instead of the function notation \(s(n),\) but the meaning is the same: the output value that corresponds to the input \(n.\)

7.1.1. Sequences of Numbers

Two common ways to describe or define a sequence of numbers are

  • a single formula, called a closed form for the sequence, that can be used to compute a term from the value of the index \(n,\) or

  • a recursive rule that includes

    • stating the values of the first few terms of the sequence, called the initial value(s) of the sequence, and

    • a recurrence relation that describes how the \(n\)th term of the sequence \(a_{n}\) can be computed using one or more terms that have index less than \(n.\)

In this subsection, several examples of sequences of numbers are presented. You may have seen some of these sequences in your previous mathematics experience but others may be new to you.

An arithmetic sequence is a sequence of numbers generated by a linear expression.

Example 1 - Arithmetic Sequences

Consider the sequence \(a_n=3n+1\) defined for all natural numbers \(n.\) The first 5 terms of this sequence are shown below.

\(a_0=\ 3\left(0\right)+1=1\)

\(a_1=\ 3\left(1\right)+1=4\)

\(a_2=3\left(2\right)+1=7\)

\(a_3=3\left(3\right)+1=10\)

\(a_4=3\left(4\right)+1=13\)

Notice that this sequence can be defined recursively using a recurrence relation: \[a_0=1 \text{ and } a_n=a_{n-1}+3 \text{ for } n \in \mathbb{N}_{>0}.\] So

\(a_0=1\)

\(a_1=a_{1-1}+3=a_0+3=1+3=4\)

\(a_2=a_{2-1}+3=a_1+3=4+3=7\)

\(a_3=a_{3-1}+3=a_2+3=7+3=10\)

\(a_4=a_{4-1}+3=a_3+3=10+3=13\)

Notice that an arithmetic sequence is determined by an initial term and a common difference. The arithmetic sequence \(a_n=3n+1\) is the sequence with initial term \(a_0=1\) and common difference \(d=3\). The general arithmetic sequence is \(a_n=c + d\cdot n\) with initial term \(c\) and common difference \(d.\)
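The two descriptions of this arithmetic sequence, the closed form and the recursive rule, can be compared with a short Python sketch. The function names `a_closed` and `a_recursive` are assumptions made for illustration.

```python
def a_closed(n):
    """Closed form: a_n = 3n + 1."""
    return 3 * n + 1


def a_recursive(n):
    """Recursive rule: a_0 = 1 and a_n = a_{n-1} + 3 for n > 0."""
    if n == 0:
        return 1
    return a_recursive(n - 1) + 3


print([a_closed(n) for n in range(5)])       # [1, 4, 7, 10, 13]
print([a_recursive(n) for n in range(5)])    # [1, 4, 7, 10, 13]
```

The closed form computes a term directly from its index, while the recursive version must work its way back to the initial term.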

A geometric sequence is a sequence of numbers generated by an exponential expression.

Example 2 - Geometric Sequences

Consider the sequence \(b_n=2\cdot\ 3^n\) defined for all natural numbers \(n.\) The first 5 terms of this sequence are shown below.

\(b_0=2\cdot\ 3^0=2\)

\(b_1=2\cdot\ 3^1=6\)

\(b_2=2\cdot\ 3^2=18\)

\(b_3=2\cdot\ 3^3=54\)

\(b_4=2\cdot\ 3^4=162\)

Notice that this sequence can be defined recursively using a recurrence relation: \[b_0=2 \text{ and } b_n=3\cdot b_{n-1} \text{ for } n \in \mathbb{N}_{>0}.\] So

\(b_0=2\)

\(b_1=3\cdot b_{1-1}=3\cdot2=6\)

\(b_2=3\cdot b_{2-1}=3\cdot6=18\)

\(b_3=3\cdot b_{3-1}=3\cdot18=54\)

\(b_4=3\cdot b_{4-1}=3\cdot54=162\)

Notice that a geometric sequence is determined by an initial term and a common ratio. The geometric sequence \(b_n=2\cdot\ 3^n\) is the sequence with initial term \(b_0=2\) and common ratio \(r=3\). The general geometric sequence is \(b_n=c \cdot r^n\) with initial term \(c\) and common ratio \(r.\)
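A recurrence relation can also be applied with a loop instead of a recursive function call. The sketch below builds the first few terms of a general geometric sequence this way; the function name `geometric_terms` and its parameter names are assumptions made for illustration.

```python
def geometric_terms(c, r, count):
    """Return the first `count` terms of the geometric sequence with b_0 = c and ratio r."""
    terms = []
    term = c                     # initial term b_0
    for _ in range(count):
        terms.append(term)
        term = r * term          # recurrence: b_n = r * b_{n-1}
    return terms


print(geometric_terms(2, 3, 5))  # [2, 6, 18, 54, 162]
```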

The factorial is usually defined using a recurrence relation, but with its own notation that does not use a subscript for the index.

Example 3 - The Factorial Function

The factorial of a natural number can be treated as a term of a sequence, called the factorial function. The commonly-used notation for a term of this sequence is \(n!\) and the sequence is defined as \[n! = n \cdot (n-1) \cdots 2 \cdot 1\] with \(1! = 1\) and \(0! = 1.\)

Notice that the factorial can be defined a bit more precisely by using the recurrence relation \[0!=1 \text{ and } n!=n\cdot (n-1)! \text{ for } n \in \mathbb{N}_{>0}.\] So the first few values of the factorial are

\(0!=1\)

\(1!=1\cdot (1-1)!=1\cdot1=1\)

\(2!=2\cdot (2-1)!=2\cdot1=2\)

\(3!=3\cdot (3-1)!=3\cdot2=6\)

\(4!=4\cdot (4-1)!=4\cdot6=24\)
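The recurrence relation for the factorial translates almost word for word into a recursive Python function. This is a sketch for illustration only; the standard library's `math.factorial` is used to check the result.

```python
import math


def factorial(n):
    """Recursive rule: 0! = 1 and n! = n * (n-1)! for n > 0."""
    if n == 0:
        return 1
    return n * factorial(n - 1)


print([factorial(n) for n in range(5)])       # [1, 1, 2, 6, 24]
print(factorial(10) == math.factorial(10))    # True
```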

The next sequence is the very famous Fibonacci numbers, named for the Italian mathematician Fibonacci who was also known as Leonardo of Pisa. However, this sequence and its properties were known and discussed by Indian poets and mathematicians such as Pingala, Virahanka, and Hemachandra before Fibonacci was born. The sequence became known to Europeans when Fibonacci used the sequence in his 1202 book Liber Abaci to solve a counting problem that involves breeding pairs of rabbits.
The book Liber Abaci ("Book of Calculation") is credited with popularizing the use of the base-ten Hindu-Arabic numeral system in Europe, too.

Example 4 - The Fibonacci Numbers

The Fibonacci sequence can be defined recursively as \[f_{0}=0, \, f_{1}=1, \text{ and } f_{n} = f_{n-1} + f_{n-2} \text{ for } n \geq 2.\]

\(f_{2} = f_{2-1} + f_{2-2} = f_{1} + f_{0} = 1+0 = 1\)

\(f_{3} = f_{3-1} + f_{3-2} = f_{2} + f_{1} = 1+1 = 2\)

\(f_{4} = f_{4-1} + f_{4-2} = f_{3} + f_{2} = 2+1 = 3\)

\(f_{5} = f_{5-1} + f_{5-2} = f_{4} + f_{3} = 3+2 = 5\)

\(f_{6} = f_{6-1} + f_{6-2} = f_{5} + f_{4} = 5+3 = 8\)

\(f_{7} = f_{7-1} + f_{7-2} = f_{6} + f_{5} = 8+5 = 13\)

Note: The definition used in this textbook matches the ones used in several textbooks, but be warned that others may use definitions that are slightly different (e.g., some sources state the initial values as \(f_{0}=1\) and \(f_{1}=1\)).
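The Fibonacci recurrence uses the two previous terms, so a sketch in Python only needs to remember two values at a time. The function name `fib` is an assumption made for illustration, and the sketch follows this textbook's initial values \(f_{0}=0\) and \(f_{1}=1.\)

```python
def fib(n):
    """Return f_n where f_0 = 0, f_1 = 1, and f_n = f_{n-1} + f_{n-2} for n >= 2."""
    previous, current = 0, 1     # f_0 and f_1
    for _ in range(n):
        previous, current = current, previous + current
    return previous


print([fib(n) for n in range(8)])    # [0, 1, 1, 2, 3, 5, 8, 13]
```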

7.1.2. Non-numerical Sequences

As mentioned above, the terms of a sequence can be any object. Here are some examples.

Example 5 - A sequence of functions

Consider the sequence \(p_{n}(x)\) of functions that are defined for real number inputs \(x.\)

\(p_{0}(x) = 1,\) that is, the constant function 1,

\(p_{1}(x) = x,\)

\(p_{2}(x) = x^{2},\)

\(p_{3}(x) = x^{3},\)

and in general,

\(p_{n}(x) = x^{n}.\)

This is the sequence of \(n\)th power functions. The subscript of each of the functions matches the power that the input, \(x,\) will be raised to.

Notice that we can define the sequence recursively by \[p_{0}(x)=1 \text{ and } p_{n}(x) = x \cdot p_{n-1}(x) \text{ for } n \geq 1.\]
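Since functions are values in Python, a sequence whose terms are functions can be sketched directly. In the sketch below, the hypothetical function `p` takes an index \(n\) and returns the power function \(p_{n},\) built from \(p_{0}(x) = 1\) and \(p_{n}(x) = x \cdot p_{n-1}(x).\)

```python
def p(n):
    """Return the function p_n, built from p_0(x) = 1 and p_n(x) = x * p_{n-1}(x)."""
    if n == 0:
        return lambda x: 1               # the constant function 1
    previous = p(n - 1)                  # the already-constructed term p_{n-1}
    return lambda x: x * previous(x)


p3 = p(3)
print(p3(2))       # 8
print(p(0)(5))     # 1
```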

The ordered list of steps used in an algorithm is a sequence.

Example 6 - An algorithm for long division
  • Task: Given two positive integers a and b, compute the quotient q and remainder r so that
    \(a = q \cdot b + r\) and \(0 \leq r < b.\)

  • Input: Two positive integers a and b

  • Steps:

    1. Get the input values a and b.

    2. Set r equal to a and set q equal to 0.

    3. If r is less than b, skip to Step 5.

    4. Set r equal to r - b and add 1 to q

    5. If r is greater than or equal to b, then repeat Step 3

    6. Return the output values q and r, and stop.

  • Output: Integers q and r such that both \(a = q \cdot b + r\) and \(0 \leq r < b.\)

    • q is the quotient, that is, the number of times Step 4 was executed.

    • r is the remainder, that is, the result of the last execution of Step 4 (or of Step 2 in cases where Step 4 is never executed.)
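The repeated-subtraction idea behind this algorithm can be sketched in Python with a `while` loop, which combines the two tests on r into a single condition. The function name `long_division` is an assumption made for illustration; the built-in `divmod` is shown only as a check.

```python
def long_division(a, b):
    """Return (q, r) with a = q * b + r and 0 <= r < b, for positive integers a and b."""
    r = a
    q = 0
    while r >= b:        # keep subtracting b while the remainder is still too large
        r = r - b
        q = q + 1
    return q, r


print(long_division(17, 5))    # (3, 2)
print(divmod(17, 5))           # (3, 2)
```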

7.2. Recursion

A recursive definition of a class of objects consists of two steps.

Basis Step

Specify the foundational (usually, the simplest) objects in the class of objects.

Recursion

Describe how to build new objects from one or more already-constructed objects in the class of objects.

7.2.1. Recursively-Defined Structures

For some mathematical objects, it is easier to describe the construction of the objects using a recursive definition.

You may recall that well-formed formulae were defined in the Logic chapter using a recursive definition. We formalize that definition here.

Example 7 - Well-Formed Formulae

The set of well-formed formulae is defined recursively as follows:

Basis Step

A propositional variable is a well-formed formula.

Recursion

We can construct new well-formed formulae from already-constructed well-formed formulae as follows. Suppose that \(\alpha\) and \(\beta\) are already-constructed well-formed formulae. We can construct the following new well-formed formulae:

  • \(\left( \neg \alpha \right)\)

  • \(\left( \alpha \land \beta \right)\)

  • \(\left( \alpha \lor \beta \right)\)

  • \(\left( \alpha \rightarrow \beta \right)\)

  • \(\left( \alpha \leftrightarrow \beta \right)\)

In the next example, we describe how to construct rooted trees, a type of graph.

Example 8 - Rooted Trees

A rooted tree is a type of graph. Graphs are described informally in the Introducing Discrete Mathematics chapter, and will be defined formally in the Graphs chapter.

The set of rooted trees is defined recursively as follows.

Basis Step

A single vertex r is a rooted tree. The vertex r is called the root of this rooted tree.

Recursion

We can construct a new rooted tree from already-constructed rooted trees as follows. Suppose that for some positive integer n we have n already-constructed rooted trees \(T_{1}, \, \ldots \, T_{n}\) where vertex \(r_{i}\) is the root of rooted tree \(T_{i}\) for positive integers \(i \leq n\) such that
(1) no vertex is in more than one of these rooted trees and
(2) no edge has endpoints in two of these rooted trees.
We can construct a new rooted tree by first adding a new vertex r that is not a vertex of any of the rooted trees \(T_{1}, \, \ldots \, T_{n}\) and then creating new edges from r to each of the already-constructed old root vertices \(r_{1}, \, \ldots \, r_{n}.\) The root of the new rooted tree is the vertex r that was added.

[Image: RootedTreeRecursionV2]

The preceding image shows the basis step and represents, in part, the results of the first and second uses of the recursion step; note that infinitely-many rooted trees are constructed at each use of the recursion step, so we cannot show all the rooted trees produced at any step other than the basis step. Also, we would need to complete infinitely many steps to construct all possible rooted trees, but any one particular rooted tree you want to construct will be produced after only finitely many uses of the recursion step.
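
The basis step and the recursion step for rooted trees can be sketched in Python. The class name `RootedTree` and the function name `join` are assumptions made for this sketch. Because each tree stores only its own edges, condition (2) holds automatically here, and `join` checks condition (1) explicitly.

```python
class RootedTree:
    """A root vertex together with a list of rooted subtrees."""

    def __init__(self, root, subtrees=()):
        self.root = root                  # basis step when subtrees is empty
        self.subtrees = list(subtrees)

    def vertices(self):
        """Return all vertex labels, root first."""
        result = [self.root]
        for subtree in self.subtrees:
            result.extend(subtree.vertices())
        return result

    def edges(self):
        """Return all (parent, child) edges."""
        result = []
        for subtree in self.subtrees:
            result.append((self.root, subtree.root))
            result.extend(subtree.edges())
        return result


def join(new_root, trees):
    """Recursion step: connect a new root to the roots of disjoint trees."""
    trees = list(trees)
    if not trees:
        raise ValueError("the recursion step needs at least one tree")
    used = [v for t in trees for v in t.vertices()]
    if new_root in used or len(used) != len(set(used)):
        raise ValueError("vertex labels must be distinct")
    return RootedTree(new_root, trees)


t = join("r", [RootedTree("a"), RootedTree("b")])
print(t.vertices(), t.edges())
```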

7.2.2. Recursively-Defined Functions

A recursively defined function has two parts:

  • Basis Step: Specify the value of the function at zero (or at the first few nonnegative integers).

  • Recursion Step: Give a rule for finding its value at an integer from its value at smaller integers.

This is similar to a recurrence relation, but using function notation.

Example 9

Consider again the Fibonacci numbers, but this time given by a function \(f(n)\) where \(f(0)=0\), \(f(1)=1\) and \(f(n)=f(n-1)+f(n-2)\) for integers \(n \geq 2.\)

Applying the formula gives \begin{align*} f(2)&=f(1)+f(0)=1+0=1\\ f(3)&=f(2)+f(1)=1+1=2\\ f(4)&=f(3)+f(2)=2+1=3\\ f(5)&=f(4)+f(3)=3+2=5\\ f(6)&=f(5)+f(4)=5+3=8\\ \end{align*} Thinking of this as a recurrence relation, we would write \(f_0=0,\) \(f_1=1,\) and \(f_n=f_{n-1}+f_{n-2},\) which generates the sequence \(0,1,1,2,3,5,8,\ldots\)
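
The recursive definition of \(f\) can be written almost word for word as a Python function. This is a direct sketch of the definition, not an efficient way to compute large Fibonacci numbers, since the same values are recomputed many times.

```python
def f(n):
    """Fibonacci numbers from f(0) = 0, f(1) = 1,
    and f(n) = f(n-1) + f(n-2) for n >= 2."""
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    if n == 0:
        return 0                      # basis step
    if n == 1:
        return 1                      # basis step
    return f(n - 1) + f(n - 2)        # recursion step


print([f(n) for n in range(7)])   # [0, 1, 1, 2, 3, 5, 8]
```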

7.3. Solving Recurrence Relations

Recall from earlier in this chapter that a recurrence relation is used to recursively define a sequence of numbers, based on one or more initial conditions, that is, the value(s) of the lowest-indexed term(s).

The phrase "solving a recurrence relation" means finding a closed form that defines the same sequence as the recurrence relation.

Example 10

Solve the recurrence relation \(a_n=a_{n-1}+3\) when \(a_1=2\).

Solution:

We are looking for a closed formula, so we will successively apply the recurrence relation until we see a pattern. \begin{align*} a_2&=a_1+3=2+3\\ a_3&=a_2+3=(2+3)+3 =2+3\cdot 2\\ a_4&=a_3+3=(2+2\cdot 3)+3=2+3\cdot 3\\ \vdots\\ a_n&=a_{n-1}+3=(2+3(n-2))+3=2+3(n-1)\\ \end{align*} So our closed formula is \(a_n=2+3(n-1)\).
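
A closed formula found by spotting a pattern can be checked against the recurrence relation for many values of \(n.\) The following Python sketch does this for Example 10; the function names are assumptions made for this sketch. Agreement on finitely many values is evidence, not a proof. A proof would use mathematical induction.

```python
def a_recursive(n):
    """a_1 = 2 and a_n = a_{n-1} + 3 for n >= 2."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return 2
    return a_recursive(n - 1) + 3


def a_closed(n):
    """The closed formula a_n = 2 + 3(n - 1)."""
    return 2 + 3 * (n - 1)


print(all(a_recursive(n) == a_closed(n) for n in range(1, 51)))
```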

You Try

Solve the recurrence relation \(b_n=3b_{n-1}\) when \(b_1=5\).

There are techniques used to solve certain classes of recurrence relations. For now, we will focus on only one case, the class of second-order linear homogeneous recurrence relations.

Example 11 - Solving a second-order linear homogeneous recurrence relation

Consider the recurrence relation \(a_n= b \cdot a_{n-1} + c \cdot a_{n-2}\) where \(b\) and \(c\) are constants and the initial values \(a_0\) and \(a_1\) will be ignored for now.

Notice that you can find at least one solution of the form \(a_n = r^{n}\) for a "suitable" nonzero value of \(r.\) A "suitable" value of \(r\) can be found by stating that you want the following equation to be True for all natural numbers \(n \geq 2\) (and showing that such a "suitable" value of \(r\) actually exists!) \[ r^{n} = b \cdot r^{n-1} + c \cdot r^{n-2} \]

Dividing both sides by \(r^{n-2}\) (which is allowed because \(r\) is nonzero) shows that \(r^{2} = b \cdot r^{1} + c \cdot r^{0}\) or more simply \[r^{2} = b \cdot r + c\] and you can solve for \(r\) by factoring or using the quadratic formula.

Notice that if the quadratic equation has two different solutions, then either one of those values can be used as the value of \(r,\) so you’ve actually found two solutions. In fact, you have found all that you need to find every solution, as described in the specific example below.

As an example, consider the recurrence relation \(a_n= 5 \cdot a_{n-1} - 6 \cdot a_{n-2}.\) Based on the previous argument, \(a_n = r^{n}\) is a solution as long as \(r^{2} = 5r - 6,\) that is, \(r^{2} - 5r + 6 = 0.\) The quadratic equation has two solutions: \(r = 2\) and \(r = 3,\) so each of the closed forms \(a_n = 2^{n}\) and \(a_n = 3^{n}\) describes a solution (but notice that the initial values for \(a_1\) are different). It is not too difficult to see that any constant multiple of either of the two solutions will give another solution; for example, \(a_n = (-7) \cdot 2^{n}\) and \(a_n = 5 \cdot 3^{n}\) are two more solutions. Also, a sum of any two solutions will still be a solution, so \(a_n = (-7) \cdot 2^{n} + 5 \cdot 3^{n}\) is yet another solution. In fact, we will be able to prove in the chapter on mathematical induction that every solution of this recurrence relation is of the general closed form \(a_n = \alpha \cdot 2^{n} + \beta \cdot 3^{n}\) where \(\alpha\) ("alpha") and \(\beta\) ("beta") are constants, which can be adjusted to match any initial values \(a_0\) and \(a_1\) you want to use: Notice that after substituting 0 for \(n\) you get \(a_0 = \alpha + \beta,\) and after substituting 1 for \(n\) you get \(a_1 = 2 \alpha + 3 \beta,\) so you need to solve a system of two linear equations in two unknowns to determine the values of \(\alpha\) and \(\beta.\)

Note: In the case where the quadratic has only one solution \(r\) (that is, \(r\) is a "double root"), the general closed form solution is \(a_n = (\alpha \cdot n + \beta) \cdot r^{n}\) where \(\alpha\) and \(\beta\) are constants.
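
The last step of Example 11, matching the constants to the initial values, can be sketched in Python for the recurrence relation \(a_n= 5 \cdot a_{n-1} - 6 \cdot a_{n-2}.\) The initial values \(a_0 = 1\) and \(a_1 = 4\) used below are assumptions chosen only for illustration, as are the function names. The sketch handles only the case of two different roots.

```python
from fractions import Fraction


def solve_constants(r1, r2, a0, a1):
    """Solve alpha + beta = a0 and alpha*r1 + beta*r2 = a1, with r1 != r2."""
    if r1 == r2:
        raise ValueError("this sketch assumes two different roots")
    alpha = Fraction(a1 - r2 * a0, r1 - r2)
    beta = a0 - alpha
    return alpha, beta


def closed_form(n, alpha, beta, r1=2, r2=3):
    """alpha * r1**n + beta * r2**n"""
    return alpha * r1 ** n + beta * r2 ** n


def by_recurrence(n, a0, a1):
    """a_n = 5*a_{n-1} - 6*a_{n-2}, computed by stepping forward from a_0, a_1."""
    previous, current = a0, a1
    for _ in range(n):
        previous, current = current, 5 * current - 6 * previous
    return previous


alpha, beta = solve_constants(2, 3, 1, 4)
print(alpha, beta)
print(all(closed_form(n, alpha, beta) == by_recurrence(n, 1, 4)
          for n in range(15)))
```

With these assumed initial values the system is \(\alpha + \beta = 1\) and \(2\alpha + 3\beta = 4,\) which gives \(\alpha = -1\) and \(\beta = 2.\)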

You Try

First, find the general closed form solution for the recurrence relation \(b_n=-b_{n-1}+20b_{n-2}.\) Next, find the constants \(\alpha\) and \(\beta\) if the initial conditions \(b_0 =1\) and \(b_1=2\) must also be satisfied.

7.4. Towers Of Hanoi

The Tower Of Hanoi

The Towers of Hanoi is a game that was introduced and sold by the French mathematician Édouard Lucas in the 1880s. Lucas stated that the game is "professedly of Indo-Chinese origin" but it seems that Lucas invented this story to market the game.
Image credit: "PSM V26 D464 The tower of hanoi.jpg". This work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer.

At the start of the game, a set of disks of different radii is stacked on a single peg to form a "tower." The disks are stacked so that the radii of the disks decrease as you move up the stack. There are also two empty pegs. The game is won by moving the stack of disks from the original peg to another peg using the following rules.

  1. Only one disk can be moved at a time.

  2. The disk at the top of a stack can be moved to the top of another stack or on to an empty peg.

  3. A disk can never be placed on top of a disk that has a smaller radius.
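
The three rules can be expressed as a short check in Python, which may be handy for experimenting with small games. The representation is an assumption made for this sketch: each peg is a list of disk radii from bottom to top, and the names `is_legal_move` and `move` are hypothetical.

```python
def is_legal_move(pegs, source, target):
    """Return True if moving the top disk of source onto target obeys the rules."""
    if source == target or not pegs[source]:
        return False                       # there is no disk to move
    disk = pegs[source][-1]                # Rule 2: only a top disk can move
    # Rule 3: the target is empty or its top disk has a larger radius
    return not pegs[target] or disk < pegs[target][-1]


def move(pegs, source, target):
    """Rule 1: move exactly one disk, if the move is legal."""
    if not is_legal_move(pegs, source, target):
        raise ValueError("illegal move")
    pegs[target].append(pegs[source].pop())


pegs = {"A": [3, 2, 1], "B": [], "C": []}
move(pegs, "A", "B")
print(pegs)
print(is_legal_move(pegs, "A", "B"))   # would place radius 2 on radius 1
```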

The Towers of Hanoi can be used to explore recursive algorithms, complexity of algorithms, and recurrence relations, based on the following questions.

  • What is the minimal number of moves needed to win the game when there is 1 disk? 2 disks? 3 disks? \(n\) disks?

  • What relationship (if any) is there between the minimal number of moves needed to win an \(n\)-disk game and the minimal number of moves needed to win an \((n-1)\)-disk game?

MORE TO COME!

8. Functions

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on March 11, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Informally, a function \(f\) from set D to set C is a rule that assigns to each input element in D exactly one output element from C. The set D is called the domain of the function, and the set C is called the codomain of the function. This informal definition was given in the chapter Introducing Discrete Mathematics.

The informal definition implies that every element in the domain D is an "input" that is assigned an output value in the codomain C. The informal definition does not imply that every element in the codomain C is an output for some input in the domain D. This second point may seem unimportant since you usually only care about the outputs you can actually get from a function, but the example presented in the next section shows why it is important to be precise about what set the codomain is. A formal definition of function is introduced in this chapter to address this need for precision.

Key terms and concepts covered in this chapter:

  • Functions

    • Domain

    • Codomain

    • Range

  • Properties of functions (injectivity, surjectivity, bijectivity)

    • A bijective function is the same as a one-to-one correspondence

    • An injective function is one that assigns every pair of different inputs to a pair of different outputs

    • A surjective function is one whose range is equal to its codomain (that is, every element of the codomain is an output assigned to one or more inputs)

    • A bijective function is both injective and surjective

  • Inverse functions

  • Composition of functions

8.1. Why Specifying the Codomain is Important

The following example compares two implementations, in different programming languages, of the "same" function: The input values are the same, and the rule that assigns to each input its one and only output is the same. However, the two implementations use different codomains, which affects how the output values can be used.

Example 1 - The floor() functions in Python and Java

In computing, integer data types are used to represent loop counters or indices into arrays, while floating-point data types are used to represent real numbers (like decimals) in scientific or financial calculations. In general, a number that can be stored using an integer type can also be stored using a floating-point type, but the ways in which that number can be used will depend on the data type. The following code examples show why it is important to keep this in mind when coding.

First, recall that the floor of x, written as \(\lfloor x \rfloor,\) is the greatest integer less than or equal to the real number x. The floor() function is available in both Python and Java as an implementation of \(\lfloor x \rfloor.\) In both programming languages floor() takes a double precision floating-point number as its input, but Python’s floor() returns a value of integer data type while Java’s floor() returns a value of floating-point data type.

More detail about floating point
On most hardware, both Python’s float data type and Java’s double data type are implementations of IEEE 754 double precision 64-bit floating-point numbers. For example, the decimal number 1.4 is encoded by the bitstring \[0011111111110110011001100110011001100110011001100110011001100110\] of length 64 whether you use a Python float with value 1.4 or a Java double with value 1.4. The underlying bitstring used to encode 1.4 is the same in both languages.
The floor() function in Python takes a float as input and returns an int.
The floor() function in Java takes a double as input and returns a double.
This means that the floor() functions in Python and Java have the same domain and use, essentially, the same rule to compute output values, but have different codomains (that are represented by different data types in the two languages.)
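
If you want to inspect the 64-bit encoding of a float yourself, the following Python sketch shows one possible way using the standard `struct` module. The helper name `double_bits` is an assumption made for this sketch.

```python
import struct


def double_bits(x):
    """Return the IEEE 754 double precision encoding of x as a 64-character bitstring."""
    # Pack x as a big-endian double, then reread the same 8 bytes
    # as an unsigned 64-bit integer.
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(as_int, "064b")


bits = double_bits(1.4)
print(bits)
print(len(bits))
```

The first bit is the sign, the next 11 bits are the exponent, and the remaining 52 bits are the fraction.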

Notice that in the Python code below, the return value from the floor() function is an int which we can then use as an index into list L.
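
A minimal sketch of such Python code follows. The contents of the list `L` and the value of `x` are assumptions chosen for illustration.

```python
import math

L = ["a", "b", "c", "d"]
x = 2.7

i = math.floor(x)               # in Python 3, math.floor returns an int
print(type(i).__name__, i)
print(L[i])                     # an int can be used directly as an index
```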


Now notice that in the Java code below, the return value from the floor() function is a double, which we cannot use as an index into array L without error. We must use a composition of functions (Java’s floor() function followed by the function that casts an input of type double to an output of type int) to get the correct data type for an array index.
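
The same situation can be imitated in Python. In the following sketch, the hypothetical helper `floor_as_float` stands in for a floor function whose codomain is a floating-point type, as with Java’s floor(); it is not Java code, and the list contents are assumptions chosen for illustration.

```python
import math


def floor_as_float(x):
    """Hypothetical stand-in for a floor function that returns a float."""
    return float(math.floor(x))


L = ["a", "b", "c", "d"]
y = floor_as_float(2.7)         # 2.0, a float rather than an int

try:
    L[y]                        # a float is rejected as a list index
except TypeError as error:
    print("TypeError:", error)

print(L[int(y)])                # composing with int() gives a usable index
```

The rule that computes the value is unchanged. Only the codomain differs, and that difference alone decides whether the output can be used as an index.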


The formal definition given in the next section will let us distinguish between two functions that use the same rule and have the same domain but have different codomains.

8.2. A Formal Definition Of Function

Definition

A function \(f\) from set \(A\) to set \(B\) is an ordered triple \((f,\, A,\, B)\) consisting of sets \(f,\) \(A,\) and \(B\) such that

  • \(f\) is a subset of the Cartesian product \(A \times B\) and

  • each element of \(A\) appears as the first coordinate of exactly one pair \(( a, \, b) \in f.\)

That is, \(f \subseteq A \times B\) and for each element \(a \in A\) there is exactly one \(b \in B\) such that \((a,\, b) \in f\). The set \(f\) of ordered pairs is called the graph of the function. The set \(A\) is called the domain of the function and the set \(B\) is called the codomain of the function.
Note: This definition of a function as an ordered triple is based on the Bourbaki definition in the 1970 book Théorie des ensembles.

Why would we need such a highly technical formal definition? The ordered triple is used in the definition because we need to be able to distinguish two functions that have the same graph, as a set of ordered pairs, but different codomains. (Notice that if two functions have the same graph then they must have the same domain.) If this is not clear, see the example "Three closely-related functions, no two of which are equal," which comes after the definitions listed below.

  • We write \(f : A \rightarrow B\) to state that \(f\) is a function from set \(A\) to set \(B.\)

  • We often refer to the ordered triple as "f" without explicitly mentioning the other two members of the ordered triple. That is, we refer to the function as its set of ordered pairs \(f\), but it is very important to remember that the actual definition includes the domain and codomain, too.

  • It is important to note that the graph of \(f\) is the set of ordered pairs which we often represent by plotting points, but that plot is only a representation of the graph (in the same way that "five" and "cinco" are verbal representations of a number but are not the number itself.)

  • We write \(f(a)=b\) instead of \((a,\, b) \in f\). The value \(b = f(a)\) is called the image of \(a\) assigned by \(f,\) and \(a\) is called the pre-image of \(b.\)

  • The range of \(f\) is the set \(\{ f(a) : a \in A \}\), that is, the set of all images (output values) assigned by \(f.\) The range is the set of \(b \in B\) such that there is at least one ordered pair \(( a, \, b) \in f.\)

  • Two functions are equal if they have the same graph, the same domain, and the same codomain. That is, the functions \((f,\, A,\, B)\) and \((g,\, S,\, T)\) are equal if they are identical as ordered triples: \(f = g\) and \(A = S\) and \(B = T.\) We can also simply say that "\(f\) and \(g\) are the same function."

  • Notice that the graph \(f\) in the formal definition replaces the rule used in the informal definition. Given the graph, which is the set of ordered pairs, we can state a rule as "given an input \(a \in A,\) the output is the one \(b \in B\) such that \((a,\, b) \in f.\)" This is exactly how you would use a table of values to represent a function: Find the row with the input value then choose the value in the output column.

The graph (i.e., the set of ordered pairs), the domain, and the codomain determine the function, NOT the formula, words, table, plot, or code used to describe a rule for the function.

The graph of a function determines how to assign each input to its output. For example, the functions \(f: \mathbb{R} \rightarrow \mathbb{R}\) and \(g: \mathbb{R} \rightarrow \mathbb{R}\) defined by the formulae \(f(x) = |x|\) and \(g(x) = \sqrt{x^{2}}\) are equal, and in fact are one and the same function, because \(f = g\) as sets, so the "two" functions have the same graph, the same domain, and the same codomain. The "two" functions are just two ways of describing the same ordered triple.

Example 2 - Three closely-related functions, no two of which are equal.

Consider functions \(f\), \(g\), and \(h\) defined as follows:

\(f : \mathbb{R} \rightarrow \mathbb{R}\) is defined by \(f = \{ (x,\,x^{2}) \mid x \in \mathbb{R} \}.\)

\(g : \mathbb{R} \rightarrow \mathbb R_{\ge 0}\) is defined by \(g = \{ (x,\,x^{2}) \mid x \in \mathbb{R} \}.\)

\(h : \mathbb{N} \rightarrow \mathbb{N}\) is defined by \(h = \{ (x,\,x^{2}) \mid x \in \mathbb{N} \}.\)

No two of these functions are equal even though they can all be described by the rule "the output is the square of the input" and have identical formulas: \(f(x) = x^{2},\) \(g(x) = x^{2},\) and \(h(x) = x^{2}.\) Each of the functions is defined for a different domain and/or codomain than the other two. In particular, \(f\) and \(g\) are not equal because they have different codomains, even though the two functions have the same graph and the same domain.
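
The ordered-triple definition can be explored in Python using small finite sets. Everything in this sketch is an assumption made for illustration: the helper `make_function`, and the finite sets of integers that stand in for the infinite domains and codomains of the examples above.

```python
import math


def make_function(domain, codomain, rule):
    """Return the ordered triple (graph, domain, codomain) for a rule."""
    domain, codomain = frozenset(domain), frozenset(codomain)
    graph = frozenset((a, rule(a)) for a in domain)
    if not all(b in codomain for (_, b) in graph):
        raise ValueError("some output is not in the codomain")
    return (graph, domain, codomain)


D = range(-3, 4)

# Same rule and same domain, but different codomains.
f = make_function(D, range(-10, 11), lambda x: x * x)
g = make_function(D, range(0, 11), lambda x: x * x)
print(f[0] == g[0])    # the graphs are the same set of ordered pairs
print(f == g)          # the triples differ

# Different formulas, but the same graph, domain, and codomain.
h1 = make_function(D, range(-10, 11), abs)
h2 = make_function(D, range(-10, 11), lambda x: math.isqrt(x * x))
print(h1 == h2)
```

Comparing the triples with `==` is exactly the definition of equality of functions given above: the graphs, the domains, and the codomains must all match.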

8.3. Properties of Functions

In this subsection you will learn about several properties of functions.

8.3.1. Injective Functions

A function \(f : A \rightarrow B\) is injective if distinct elements of the domain \(A\) are mapped to distinct elements of the range. That is, for all \(a_1\) and \(a_2\) in \(A,\) if \(a_1 \neq a_2\) then \(f(a_1) \neq f(a_2).\) Using the contrapositive, this can be stated as: For all \(a_1\) and \(a_2\) in \(A,\) if \(f(a_1) = f(a_2)\) then \(a_1 = a_2.\)
Note: Injective functions are also called one to one functions. The Remix avoids this term because it is easy to confuse "one to one function" with "one-to-one correspondence."

Example 3 - Injective Functions

Consider the functions
\(f : \mathbb{Z} \rightarrow \mathbb{Q}\) defined by \(f(n) = 2^{n},\)
\(g : \mathbb{Z} \rightarrow \mathbb{Z}\) defined by \(g(n) = n^{2},\)
\(h : \mathbb{Z} \rightarrow \mathbb{Z}\) defined by \(h(n) = n + 2,\) and
\(k : \mathbb{Z} \rightarrow \mathbb{Z}\) defined by \(k(n) = \frac{1}{4}((-1)^n (2n+1) - 1).\)

\(f\) is injective because different input values must be mapped to different output values. Notice that \(f(a) = f(c)\) means that \(2^{a} = 2^{c}\) from which \(2^{a} / 2^{c} = 1 = 2^{a-c}\) must be True, so \(a-c = 0\) must be True, that is, \(a = c.\)

\(g\) is not injective because the input values \(2\) and \(-2\) are mapped to the same output value: \((2)^2 = 4\) and \((-2)^2 = 4.\)

\(h\) is injective because \(h(a) = h(c)\) means that \(a+2 = c+2,\) which means that \(a=c.\)

\(k\) is not injective because the input values \(-1\) and \(0\) are mapped to the same output value, 0.

8.3.2. Surjective Functions

A function \(f\) from the set \(A\) to the set \(B\) is surjective if the image set of \(A\) is the entire set \(B\). This means that for each element \(b\) in the codomain \(B,\) there is some element \(a \in A\) with \(f(a)=b\).
Note: Surjective functions are also called onto functions.

Example 4 - Surjective Functions

Consider the functions
\(f : \mathbb{Z} \rightarrow \mathbb{Q}\) defined by \(f(n) = 2^{n},\)
\(g : \mathbb{Z} \rightarrow \mathbb{Z}\) defined by \(g(n) = n^{2},\)
\(h : \mathbb{Z} \rightarrow \mathbb{Z}\) defined by \(h(n) = n + 2,\) and
\(k : \mathbb{Z} \rightarrow \mathbb{Z}\) defined by \(k(n) = \frac{1}{4}((-1)^n (2n+1) - 1).\)

\(f\) is not surjective since it is not possible for \(2^{n}\) to have a value that is less than or equal to 0.

\(g\) is not surjective because it is not possible for \(n^{2}\) to have a value that is less than 0.

\(h\) is surjective because every \(b\) in the codomain \(\mathbb{Z}\) is an output for some input: Notice that \(h(b-2) = (b-2)+2 = b.\)

\(k\) is surjective because every \(b\) in the codomain \(\mathbb{Z}\) is an output for some nonnegative input - for inputs \(n \geq 0,\) the outputs \(k(n)\) are shown in the lower row of the image.

[Image: NtoZ]

Notice that whether a function is surjective depends on the function’s codomain. This is, again, why the formal definition of function is needed.
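
The claims about \(g,\) \(h,\) and \(k\) in the two preceding examples can be explored in Python on finite windows of integers. The helper names `is_injective_on` and `covers` are assumptions made for this sketch. A repeated output found by such a check does show that a function is not injective, but a check that passes on a finite window is only evidence, not a proof.

```python
def k(n):
    """k(n) = ((-1)^n (2n + 1) - 1) / 4 using integer arithmetic."""
    sign = 1 if n % 2 == 0 else -1            # (-1)**n without floats
    return (sign * (2 * n + 1) - 1) // 4      # the numerator is a multiple of 4


def is_injective_on(func, inputs):
    """True if no two of the given inputs share an output."""
    outputs = [func(a) for a in inputs]
    return len(outputs) == len(set(outputs))


def covers(func, inputs, targets):
    """True if every target is an output for at least one given input."""
    return set(targets) <= {func(a) for a in inputs}


print([k(n) for n in range(7)])                       # [0, -1, 1, -2, 2, -3, 3]
print(k(-1), k(0))                                    # the same output twice
print(is_injective_on(lambda n: n + 2, range(-5, 6)))
print(covers(k, range(0, 21), range(-10, 11)))
print(covers(lambda n: n * n, range(-20, 21), range(-10, 11)))
```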

8.3.3. Bijective Functions

A function \(f\) is bijective if it is both injective and surjective.

Example 5 - Verifying a function is bijective

Verify that the function \(f:\mathbb{R}\rightarrow \mathbb{R}\) defined by \(f\left(x\right)=3x+5\) is bijective.

Solution

For injectivity, suppose \(f\left(m\right)=f(n)\). We want to show \(m=n\).

\(f\left(m\right)=f(n)\)

\(3m+5=3n+5\)

Subtracting 5 from both sides gives \(3m=3n\), and then multiplying both sides by \(\frac{1}{3}\) gives \(m=n\).

To show that \(f\) is surjective, we need to show that for any \(c\in \mathbb{R}\), there is a corresponding \(x\) for which \(f\left(x\right)=c\). To do this, set \(f\left(x\right)=3x+5\) equal to \(c\) and solve for \(x\).

\(f\left(x\right)=3x+5=c\)

Well, \(3x+5=c\) gives \(3x=c-5\) or \( x=\frac{c-5}{3}\). So, for any \(c\), there is an \(x\), namely \(x=\frac{c-5}{3}\), for which \(f\left(x\right)=c\). Since \(f\) is both injective and surjective, \(f\) is bijective.

8.4. Inverse Functions

Informally, a function \(f\) is invertible if each \(b\) in the codomain \(B\) is assigned to exactly one input \(a\) in the domain \(A.\)

Formally, a function \(f : A \rightarrow B\) is invertible if the ordered triple \((\{(b, a) \, | \, (a, b) \in f \},\, B,\, A)\) is a function.

The set \(\{(b, a) \, | \, (a, b) \in f \}\) is usually denoted by \(f^{-1}\) even in cases when \(f\) is not invertible.

For example, if \(f\) is invertible and \((a,b) \in f,\) which corresponds to \(f(a)=b,\) then \((b,a) \in f^{-1}\) where \( f^{-1}: B \rightarrow A,\) which corresponds to \( f^{-1}(b)=a.\)

The following theorem shows that invertibility of a function is equivalent to bijectivity, or a function being both injective and surjective.

Theorem on Invertibility

A function \(f: A \rightarrow B\) is invertible if and only if \(f\) is bijective.

Being able to solve an equation amounts to being able to invert a function. Notationally, solving \(f(x) =b\) means solving for \(x\).

Using the inverse, the solution of \(f(x) =b\) is \(x=f^{-1}\left(b\right)\).

Consider, for example, \(f\left(x\right)=x^3.\) We know that

\(f^{-1}\left(x\right)=\sqrt[3]{x}.\)

Solving \(f\left(x\right)=8\) means solving \(x^3=8\). To solve \(f\left(x\right)=8\), we use \(x=f^{-1}\left(8\right)\), which in this case means

\(x=f^{-1}\left(8\right)=\sqrt[3]{8} = 2.\)

An easy check: \( f\left(2\right)=2^3=8\) and

\(f^{-1}\left(8\right)=\sqrt[3]{8} = 2.\)

Functions can, in many cases, be visualized graphically. For example, when mapping from the real line \(\mathbb{R}\) to the real line, such maps are viewed on a Cartesian plane.

In Appendix: Library of Functions, several functions and their plots are shown to illustrate the important concepts of functions, including domain, codomain, range, and invertibility.

8.5. The Algebra of Functions

If two functions \(f\left(x\right)\) and \(g\left(x\right)\) have the same domain \(A\) and same codomain \(\mathbb{R},\) then you can combine these functions using the operations addition, subtraction, multiplication, and division.

The Algebra of Functions
  1. \(\left(f+g\right)\left(x\right)=f\left(x\right)+g\left(x\right)\)

  2. \(\left(f-g\right)\left(x\right)=f\left(x\right)-g\left(x\right)\)

  3. \(\left(f\cdot\ g\right)\left(x\right)=f\left(x\right)\cdot\ g\left(x\right)\)

  4. \(\left(\frac{f}{g}\right)\left(x\right)=\frac{f\left(x\right)}{g\left(x\right)},\ \ g\left(x\right)\neq0\)

Example 6

Consider the functions \(f,\ g: \mathbb{R}_{\geq0} \rightarrow \mathbb{R}\) defined by \(f\left(x\right)=x^2+1\) and \(g\left(x\right)=\sqrt x.\) Find the rules for the functions \(\left(f+g\right)\), \(\left(f-g\right)\), \(\left(f\cdot\ g\right)\), and \(\left(\frac{f}{g}\right).\)

Solution

The common domain is \(\mathbb{R}_{\geq0}\), since the square root is real valued only for \(\ x\ \geq0\).

\(\left(f+g\right)\left(x\right)=f\left(x\right)+g\left(x\right)=x^2+1+\sqrt x\), for \(x \geq 0\)

\(\left(f-g\right)\left(x\right)=f\left(x\right)-g\left(x\right)=x^2+1- \sqrt x\), for \(x \geq 0\)

\(\left(f\cdot\ g\right)\left(x\right)=f\left(x\right)\cdot\ g\left(x\right)=\left(x^2+1\right)\cdot\ \sqrt x\), for \(x \geq 0\)

\(\left(\frac{f}{g}\right)\left(x\right)=\frac{f\left(x\right)}{g\left(x\right)}=\frac{x^2+1}{\sqrt x}\), for \(x > 0\).

Notice that the domain of \(\frac{f}{g}\) is \(x>0\), because \(g\left(0\right)=\sqrt0=0\), and division by \(0\) is not defined.
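
The four operations can be sketched in Python as functions that take two functions and return a new function. The helper names beginning with `pointwise_` are assumptions made for this sketch, and floating-point numbers stand in for real numbers.

```python
import math


def f(x):
    return x ** 2 + 1


def g(x):
    return math.sqrt(x)          # real valued only for x >= 0


def pointwise_sum(f, g):
    return lambda x: f(x) + g(x)


def pointwise_difference(f, g):
    return lambda x: f(x) - g(x)


def pointwise_product(f, g):
    return lambda x: f(x) * g(x)


def pointwise_quotient(f, g):
    def quotient(x):
        if g(x) == 0:
            raise ZeroDivisionError("(f/g)(x) is undefined where g(x) = 0")
        return f(x) / g(x)
    return quotient


print(pointwise_sum(f, g)(4))        # 17 + 2
print(pointwise_quotient(f, g)(4))   # 17 / 2
```

Calling `pointwise_quotient(f, g)(0)` raises an error, which mirrors the fact that 0 is not in the domain of \(\frac{f}{g}.\)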

8.6. Composition of Functions

Suppose \(g:A\rightarrow B\) and \(f:B\rightarrow C.\) Then the functions \(f\) and \(g\) can be composed to obtain a function \(h:A\rightarrow C\), denoted as follows:

\(h\left(x\right)=\left(f\circ g\right)\left(x\right)=f\left(g\left(x\right)\right)\) provided \(x\ \in\ A\) and \(g\left(x\right)\in B\).

Example 7

Consider \(f\left(x\right)=\frac{1}{x}\) and \(g\left(x\right)=2x-3\), where the inputs and outputs are real numbers. Notice that \(g\left(x\right)\) is defined for all real \(x\) and \(f\left(x\right)\) is defined for all real \(x\ \neq0\). Form the compositions \(h\left(x\right)=\left(f \circ g\right)\left(x\right)\) and \(k\left(x\right)=\left(g \circ f\right)\left(x\right)\). Also determine their respective domains.

Solution

\(h\left(x\right)=\left(f \circ g\right)\left(x\right)=f\left(g\left(x\right)\right)=f\left(2x-3\right)=\frac{1}{2x-3}\). Here \(x\) needs to be in the domain of \(g\), which allows any real \(x\), and \(g\left(x\right)\) needs to be in the domain of \(f\). In particular \(g\left(x\right)\neq 0\), that is, \(2x-3\ \neq 0\), or \(x\ \neq\frac{3}{2}\). So the domain of \(h\) is the set of all real numbers except \(\frac{3}{2}\).

By contrast, \(k\left(x\right)=\left(g\circ f\right)\left(x\right)=g\left(f\left(x\right)\right)=g\left(\frac{1}{x}\right)=2\left(\frac{1}{x}\right)-3=\frac{2}{x}-3\). Here \(x\) needs to be in the domain of \(f\), that is, \(x\ \neq 0\), and \(f\left(x\right)\) needs to be in the domain of \(g\), which places no further restriction because \(g\) is defined for every real number. So the domain of \(k\) is the set of all nonzero real numbers.
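
Composition can be sketched in Python with a helper that takes two functions and returns their composition. The helper name `compose` is an assumption made for this sketch, and exact fractions from the standard `fractions` module stand in for real numbers so that the arithmetic is exact.

```python
from fractions import Fraction


def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


def f(x):
    return 1 / Fraction(x)        # undefined at x = 0


def g(x):
    return 2 * Fraction(x) - 3    # defined for every input


h = compose(f, g)                 # h(x) = 1 / (2x - 3)
k = compose(g, f)                 # k(x) = 2/x - 3

print(h(2), k(2))                 # the two orders give different outputs

try:
    h(Fraction(3, 2))             # g(3/2) = 0 is not in the domain of f
except ZeroDivisionError:
    print("h is undefined at 3/2")
```

The order of the arguments to `compose` matters, just as \(f \circ g\) and \(g \circ f\) are in general different functions.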

Example 8 - composing inverse functions

Consider \(f\left(x\right)=x^3+1\) and \(g\left(x\right)=\sqrt[3]{x-1}\) where \(f,g:\mathbb{R}\rightarrow \mathbb{R}\). Show that \(\left(g \circ f\right)\left(1\right)=1, \left(g \circ f\right)\left(2\right)=2, \left(g\circ f\right)\left(3\right)=3\), and \(\left(g\circ f\right)\left(x\right)=x.\)

Solution

\(f\left(1\right)=1^3+1=2\)

\(f\left(2\right)=2^3+1=9\)

\(f\left(3\right)=3^3+1=28\)

\(f\left(x\right)=x^3+1\)

Therefore,

\(\left(g\circ f\right)\left(1\right)=g\left(f\left(1\right)\right)=g\left(2\right)=\sqrt[3]{2-1}=\sqrt[3]{1}=1\)

\(\left(g\circ f\right)\left(2\right)=g\left(f\left(2\right)\right)=g\left(9\right)=\sqrt[3]{9-1}=\sqrt[3]{8}=2\)

\(\left(g\circ f\right)\left(3\right)=g\left(f\left(3\right)\right)=g\left(28\right)=\sqrt[3]{28-1}=\sqrt[3]{27}=3\)

\(\left(g\circ f\right)\left(x\right)=g\left(f\left(x\right)\right)=g\left(x^3+1\right)=\sqrt[3]{x^3+1-1}=\sqrt[3]{x^3}=x\)

Notice, in the last example, that \(g\left(x\right)\) undoes \(f\left(x\right)\), in the following sense:

\(f:1\mapsto 2\) and \(g:2\mapsto 1\), or the ordered pair \(\left(1,2\right)\) in \(f\) corresponds to \(\left(2,1\right)\) in \(g\).

\(f:2\mapsto 9\) and \(g:9\mapsto 2\), or the ordered pair \(\left(2,9\right)\) in \(f\) corresponds to \(\left(9,2\right)\) in \(g\).

\(f:3\mapsto 28\) and \(g:28\mapsto 3\), or the ordered pair \(\left(3,28\right)\) in \(f\) corresponds to \(\left(28,3\right)\) in \(g\).

\(f:x\mapsto x^3+1\) and \(g:x^3+1\mapsto x\), or the ordered pair \(\left(x,x^3+1\right)\) in \(f\) corresponds to \(\left(x^3+1,x\right)\) in \(g\).

The function \(g\left(x\right)=\sqrt[3]{x-1}\) is said to be the inverse of the function \(f\left(x\right)=x^3+1\). We have shown explicitly that \(\left(g\circ f\right)\left(x\right)=x\).

8.6.1. Inverse Functions and Composition

Notice that if you happen to have two functions \(f : A \rightarrow B\) and \(g : B \rightarrow A\) such that \((g \circ f)(a) = g(f(a)) = a\) for every \(a \in A\) and \((f \circ g)(b) = f(g(b)) = b\) for every \(b \in B,\) then \(f\) and \(g\) are inverse functions.

Example 9 - finding an inverse

Find the inverse \(g\left(x\right)\) of the bijective function \(f\left(x\right)=3x+5\) for \(f,\ g:\mathbb{R}\rightarrow \mathbb{R}\). Verify the inverse and show \(\left(f \circ g\right)\left(x\right)=x=\left(g \circ f\right)\left(x\right)\).

Show specifically that \(f\left(2\right)=11\), and \(g\left(11\right)=2\).

Solution

If \(f:x\rightarrow y\) corresponds to \((x,y)\), then the inverse \(g:y\rightarrow x\) corresponds to \((y,x)\). This means that the inverse of the relation \(y=f\left(x\right)=3x+5\) is the relation \(x=f\left(y\right)=3y+5\).

Solving for \(y\) in \(x=f\left(y\right)\) gives \(f^{-1}(x)=y\). Solving for \(y\) in \(x=f\left(y\right)=3y+5\) gives \(x-5=3y\) or \(\frac{x-5}{3}=y=\ f^{-1}(x)=g(x)\).

We now verify that \(\left(f\circ g\right)\left(x\right)=x=\left(g \circ f\right)\left(x\right)\).

\(\left(f\circ g\right)\left(x\right)=f\left(\frac{x-5}{3}\right)=\ 3\left(\frac{x-5}{3}\right)+5=\left(x-5\right)+5=x\),

and \(\left(g \circ f\right)\left(x\right)=g\left(3x+5\right)=\ \frac{(3x+5)-5}{3}=\frac{3x+5-5}{3}=\frac{3x}{3}=x\).

Finally \(f\left(x\right)=3x+5\), and \(f\left(2\right)=3\left(2\right)+5=6+5=11\), or \(f:2\rightarrow 11\)

and \(g\left(x\right)=\frac{x-5}{3}\), and \(g\left(11\right)=\frac{11-5}{3}=\frac{6}{3}=2\), or \(g:11\rightarrow 2\).
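
The verification in Example 9 can be spot-checked in Python. This sketch uses exact fractions from the standard `fractions` module in place of real numbers, and the sample inputs are assumptions chosen for illustration. Checking finitely many inputs does not replace the algebraic verification above.

```python
from fractions import Fraction


def f(x):
    return 3 * x + 5


def g(x):
    return Fraction(x - 5, 3)     # the candidate inverse, (x - 5) / 3


samples = [Fraction(n, 4) for n in range(-20, 21)]
print(all(f(g(x)) == x and g(f(x)) == x for x in samples))
print(f(2), g(11))
```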

8.7. Exercises

Remixer’s Note: This section is taken from the original “Discrete Math” book with only minor changes.

  1. What can be said about the relation \(f:A\rightarrow B\), if

    1. \(\exists z\in B\forall x\in A,f\left(x\right)\neq z\)

    2. \(\exists x,y \in A, \exists z\in B,\left(x\neq y\right)\bigwedge\left(f\left(x\right)=f\left(y\right)=z\right)\)

    3. \(\forall x,y\in A, \left(f\left(x\right)=f\left(y\right)\right)\ \rightarrow\left(x=y\right)\)

    4. \(\forall x,y\in A,\left(x\neq y\right)\rightarrow\left(f\left(x\right)\neq f\left(y\right)\right)\)

    5. \(\forall z\in B, \exists x\in A,f\left(x\right)=z\)

    6. \(\exists x,y\in A,\left(f\left(x\right)=f\left(y\right)\right)\bigwedge\left(x\ \neq\ y\right)\)

  2. Explain why the exponential function \(f(x)=2^x\) is not surjective as a function \(f: \mathbb{R} \rightarrow \mathbb{R}\), but is in fact a bijection as a function \(f: \mathbb{R} \rightarrow \mathbb{R}^+\).

  3. Use properties of logarithms to show that \(f\left(x\right)=2^x\) and \(g\left(x\right)=\log_2{x}\), where \(f: \mathbb{R} \rightarrow \mathbb{R}^+\) and \(g: \mathbb{R}^+ \rightarrow \mathbb{R}\), are inverses by verifying that \(f\left(g\left(x\right)\right)=x\) and \(g\left(f\left(x\right)\right)=x\).

  4. Use properties of logarithms to show that \(f\left(x\right)=10^x\) and \(g\left(x\right)=\log{x}\), where \(f: \mathbb{R} \rightarrow \mathbb{R}^+\) and \(g: \mathbb{R}^+ \rightarrow \mathbb{R}\), are inverses by verifying that \(f\left(g\left(x\right)\right)=x\) and \(g\left(f\left(x\right)\right)=x\).

  5. Show that the function \(f\left(x\right)=5x-3\), from \(f: \mathbb{R} \rightarrow \mathbb{R}\), is bijective and find its inverse.

  6. Show that the function \(f\left(x\right)=2x^3-1\), from \(f: \mathbb{R} \rightarrow \mathbb{R}\) is bijective and find its inverse.

  7. Consider the function \(f(x) = \left \lceil x \right \rceil\) where \(f:\mathbb{R}\rightarrow\mathbb{Z}\).

    1. Is the function a surjection? Explain.

    2. Is the function an injection? Explain.

    3. Is the function a bijection? Explain.

    4. Is the inverse mapping a function? Why or why not?

    5. Evaluate

      1. \(f\left(-2.1\right)\)

      2. \(f\left(-1.9\right)\)

      3. \(f\left(1.5\right)\)

      4. \(f\left(1.9\right)\)

      5. \(f\left(2\right)\)

      6. \(f\left(2.3\right) \)

    6. Suppose \(g\left(x\right)=2x\), with \(f\left(x\right)=\left\lceil x\right\rceil\). Evaluate the following:

      1. \(f\left(g\left(2.3\right)\right)\)

      2. \(g\left(f\left(2.3\right)\right)\)

  8. Explain why the ceiling function \(f(x) = \left \lceil x \right \rceil\) is not surjective as a function \(f: \mathbb{R} \rightarrow \mathbb{R}.\)

  9. Consider the function \(f(x) = \left \lfloor x \right \rfloor\) where \(f:\mathbb{R}\rightarrow\mathbb{Z}\).

    1. Is the function a surjection? Explain.

    2. Is the function an injection? Explain.

    3. Is the function a bijection? Explain.

    4. Is the inverse mapping a function? Why or why not?

    5. Evaluate

      1. \(f\left(-5.1\right) \)

      2. \(f\left(-3.9\right)\)

      3. \(f\left(-3.2\right)\)

      4. \(f\left(5\right)\)

      5. \(f\left(5.3\right)\)

    6. Suppose \(g\left(x\right)=3x\), with \(f\left(x\right)=\left\lfloor x\right\rfloor\). Evaluate the following:

      1. \(f\left(g\left(5.3\right)\right)\)

      2. \(g\left(f\left(5.3\right)\right)\)

  10. The absolute value function, denoted \(f(x)=|x|\), where \(f\left(x\right):\mathbb{R} \rightarrow \mathbb{R}\), gives the distance from \(x\) to \(0\). For example, \(f\left(2.5\right)=\left|2.5\right|=2.5\). And \(f\left(-4.5\right)=\left|-4.5\right|=4.5\). Notice that if \(x \geq 0\), then \(\left|x\right|=x\). However if \(x<0\), then \(\left|x\right|=\ -x\). We can state this using the notation for piecewise functions:

    \[f(x) = |x| = \begin{cases} x, & \text{if } x \geq 0 \\ -x, & \text{if } x < 0 \end{cases}\]

    1. Graph \(f\left(x\right)=|x|\), for \(-10 \le x \le 10\).

    2. Evaluate

      1. \(f(-5)=|-5|\),

      2. \(f(-2.5)=|-2.5|\),

      3. \(f(3.5)=|3.5|\).

    3. Show that \(f\left(x\right)=\left|x\right|\), with \(f:\mathbb{R}\rightarrow \mathbb{R}\), is not injective.

    4. Show that \(f\left(x\right)=\left|x\right|\), with \(f:\mathbb{R}\rightarrow \mathbb{R}\), is not surjective.

    5. Consider \(g\left(x\right)=3x+2\), with \(g:\mathbb{R}\rightarrow \mathbb{R}\), and \(f\left(x\right)=|x|\). Find and simplify the following:

      1. \(\left(g\circ f\right)\left(x\right)\)

      2. \(\left(f\circ g\right)\left(x\right)\)

  11. A real-valued function, \(f: \mathbb{R} \rightarrow \mathbb{R}\), is said to be strictly increasing if whenever \(x<y\), then \(f(x)<f(y)\).

    1. State this using logical quantifiers.

    2. State a similar definition for a strictly decreasing function, and then translate using logical quantifiers.

9. Relations

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on March 11, 2025.
added additional example at end of chapter
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Relations are used to describe an association of data.

For example, imagine a university database of students. The database needs to have a record for each student, and the student’s record needs to include fields for the student’s name(s), the unique student ID number for the student, the student’s current status, a list of courses that the student has enrolled in or has completed (along with the grade earned in each completed course,) and possibly other data associated with the student. One way to visualize the database is as a two-dimensional table, similar to a spreadsheet worksheet, where each row corresponds to a record and each column corresponds to a field; each row can be treated as an ordered \(n\)-tuple, where \(n\) is the number of fields.

In this chapter, you will learn the formal definition of a relation, along with operations on relations and properties of relations. You will also learn about some special types of relations, namely partial orderings and equivalence relations. As a special case of equivalence relations, you will learn about congruence relations on the integers as well as modular arithmetic.

Key terms and concepts covered in this chapter:

  • Relations

  • Properties of relations (reflexivity, symmetry, transitivity, and other properties)

  • Equivalence relations

    • Equivalence classes

  • Modular arithmetic

    • Congruences

  • Partial orders

  • Well orderings

9.1. Definition of Relation

Informally, a relation on two or more sets is an association between elements of those sets. Formally, a relation is a subset of a Cartesian product of two or more sets.
For simplicity, it is assumed in this textbook that the number of sets used to form the Cartesian product is finite. It is possible to define relations as subsets of a Cartesian product of infinitely many sets, in which case the elements of the relation are infinite sequences.

Definition

An \(n\)-ary relation on the sets \(A_{1}, \, A_{2}, \, \ldots, \, A_{n},\) where \(n \geq 2\) is a natural number, is a subset \(R\) of the Cartesian product \(A_{1} \times A_{2} \times \cdots \times A_{n}.\) That is, any subset \(R \subseteq A_{1} \times A_{2} \times \cdots \times A_{n}\) is an \(n\)-ary relation on the sets \(A_{1}, \, A_{2}, \, \ldots, \, A_{n}.\)

The sets \(A_{1}, \, A_{2}, \, \ldots \, A_{n}\) are called the domains of the relation \(R.\)

The number of sets \(n\) is called the degree of the relation \(R.\)

Here are several examples of relations.

Example 1 - Relations
  • Let \(A\) be the set of names of students currently enrolled at a college, \(B\) be the set of all possible student ID numbers, and \(C\) be the set of all classes currently offered at the college. The set \[ R = \{ (x, y, z) \mid \text{Student } x \text{ has ID number } y \text{ and is enrolled in class } z \} \] is a 3-ary relation, also called a ternary relation, on \(A,\) \(B,\) and \(C.\)

  • Let \(S\) be the set of names of students currently enrolled at a college. The set \[ R = \{ (x, y) \mid \text{Student } x \text{ is enrolled in a class section in which student } y \text{ also is enrolled} \} \] is a 2-ary relation, usually called a binary relation, on \(S\) (since the two sets in \(S \times S\) are the same set). Most of the focus of this chapter will be on binary relations.

  • Let \(f : A \rightarrow B\) be any function, as defined formally in the Functions chapter. Recall that part of the formal definition states that \(f\) is a subset of \(A \times B,\) so the set \(f\) is a binary relation on the domains \(A\) and \(B.\)

Notice that for any sets \(A_{1}, \, A_{2}, \, \ldots \, A_{n},\) where \(n \geq 2\) is a natural number, we always have the following two relations:

  • \(\emptyset,\) called the empty relation. This relation is also called the void relation and trivial relation in other sources.

  • \(A_{1} \times A_{2} \times \cdots \times A_{n},\) called the universal relation.

9.2. Binary Relations on a Single Set

In many cases, the two domains of a binary relation are the same set; for example, the relation may involve comparing two elements of a set \(S\) in some way. In this case, the domain \(S\) is mentioned only once: A binary relation on the set \(S\) is any subset \(R\) of the Cartesian product \(S \times S\), that is, \(R \subseteq S \times S.\)

In the case of a binary relation on \(S,\) we often write \(aRb\) to mean the same thing as \((a,b) \in R.\) This notation may make more sense after reading the following example.

Example 2 - Binary Relations on a Set

Here are some examples of binary relations on one set.

  • \(R = \{ (x, y) \in \mathbb{R} \times \mathbb{R} \, | \, x \leq y \}\). \(R\) is a binary relation on \(\mathbb{R}.\) We write \(x \leq y\) instead of \((x,y) \in R\) (Notice that in this case, we are writing \(xRy\) but replacing the \(R\) by the symbol \(\leq.\))

  • \(D = \{ (a, b) \in \mathbb{Z} \times \mathbb{Z} \, | \, a \text{ is a divisor of } b \}\). \(D\) is a binary relation on \(\mathbb{Z}.\) In this case, we often write \(a | b\) instead of \((a,b) \in D.\) You may not be surprised to learn that this relation is called the divisibility relation on the integers.

  • \(M = \{ (a, b) \in \mathbb{Z} \times \mathbb{Z} \, | \, 2 \text{ is a divisor of } (a-b) \}\). \(M\) is a binary relation on \(\mathbb{Z},\) called congruence modulo 2. This textbook uses the common but nonstandard notation \(a \equiv_{2} b\) instead of \((a,b) \in M.\)
    Note that the ISO standard notation for this relation is \(a \equiv b \ \text{mod } 2\) and that other sources use \(a \equiv b \ (\text{mod } 2).\)

  • For a set \(S,\) recall that \(\mathcal{P}(S)\) is the power set of \(S,\) that is, the set whose elements are all possible subsets of \(S.\) Let \(R = \{ (A, B) \in \mathcal{P}(S) \times \mathcal{P}(S) \, | \, A \subseteq B \}.\) \(R\) is a binary relation on \(\mathcal{P}(S),\) and we write \(A \subseteq B\) instead of \((A,B) \in R.\)

  • For a set \(S,\) we can also define a different relation \(R = \{ (A, B) \in \mathcal{P}(S) \times \mathcal{P}(S) \, | \, A \subset B \},\) that is, \(A\) is a proper subset of \(B.\) This relation is also a binary relation on \(\mathcal{P}(S),\) and we write \(A \subset B\) instead of \((A,B) \in R.\)

  • Let \(S\) be any nonempty set. The set \(\mathbf{id}_S = \{ (x, x) \in S \times S \, | \, x \in S \}\) is a binary relation on \(S\) which we refer to as the identity relation on \(S.\) (Other sources call this the diagonal of \(S,\) or simply the equality relation on \(S.\)) We write \(a=b\) instead of \((a,b) \in \mathbf{id}_S.\)

  • \(L = \{ \text{("rock", "paper"), ("paper", "scissors"), ("scissors", "rock")} \}\) is a binary relation on the set \(\{ \text{"rock", "paper", "scissors"} \}.\) We write \(xLy\) (that is, "\(x\) loses to \(y\)") instead of \((x,y) \in L.\)
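To make the idea of "a relation is a set of ordered pairs" concrete, here is a minimal Python sketch that stores a few of the relations above as sets of tuples. Since a program cannot hold all of \(\mathbb{Z} \times \mathbb{Z}\), the divisibility and congruence relations are restricted to the finite set \(\{1, 2, \ldots, 6\}\); that restriction, and the variable names `S`, `D`, `M`, and `L`, are assumptions made for illustration.

```python
# A binary relation on a finite set, stored as a set of ordered pairs.
S = range(1, 7)  # stands for the finite set {1, 2, 3, 4, 5, 6}

# Divisibility restricted to S: (a, b) is in D when a is a divisor of b.
D = {(a, b) for a in S for b in S if b % a == 0}

# Congruence modulo 2 restricted to S: 2 is a divisor of (a - b).
M = {(a, b) for a in S for b in S if (a - b) % 2 == 0}

# The "loses to" relation from the rock-paper-scissors example.
L = {("rock", "paper"), ("paper", "scissors"), ("scissors", "rock")}

print((2, 6) in D)             # True, since 2 | 6
print((4, 6) in D)             # False, since 4 is not a divisor of 6
print((1, 5) in M)             # True, since 2 divides 1 - 5
print(("rock", "paper") in L)  # True, rock loses to paper
```

Writing `(a, b) in D` in code plays the same role as writing \(a | b\) or \(aDb\) on paper.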

9.2.1. Operations on Binary Relations on a Set

Given binary relations \(Q\) and \(R\) on a set \(S,\) we can define several other relations in terms of \(Q\) and \(R.\) These operations are likely familiar to you as operations on functions but they also work for binary relations on a single set.

  • The inverse of \(R\) is the relation \(R^{-1} = \{ (b,a) \, | \, (a,b) \in R \}.\)

  • The composition of \(Q\) and \(R\) is \(R \circ Q = \{ (a,c) \, | \, (a,b) \in Q \land (b,c) \in R \text{ for some } b \in S \}.\)

  • The \(n\)th power of \(R\) is defined recursively for all \(n \in \mathbb{N}\) as follows.

    • \(R^{0} = \mathbf{id}_S\)

    • \(R^{k+1} = R \circ R^{k}\) for natural numbers \(k \geq 0.\)
      The recursion step uses \(k\) instead of \(n\) in preparation for the type of arguments used in the chapter on proof by mathematical induction.

Building on the \(n\)th powers of \(R,\) we can define two relations.

  • \(R^{+}\) is the relation \(\{ (a,b) \in S \times S \, | \, (a,b) \in R^{k} \text{ for some positive integer } k \}.\) That is, \(R^{+}\) is the union of all the positive \(n\)th powers of \(R.\)

  • \(R^{*}\) is the relation \(\{ (a,b) \in S \times S \, | \, (a,b) \in R^{k} \text{ for some natural number } k \}.\) That is, \(R^{*}\) is the union of all the natural number \(n\)th powers of \(R.\)

Notice that \(R^{*} = \mathbf{id}_S \cup R^{+}.\)
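For a finite set, each of these operations can be computed directly. The sketch below is one possible Python rendering, not the only one; the helper names (`identity`, `inverse`, `compose`, `power`, `plus`) and the sample relation are hypothetical. It relies on the fact that, when \(S\) has \(n\) elements, the union of \(R^{1}\) through \(R^{n}\) already equals \(R^{+}.\)

```python
def identity(S):
    """id_S: every element paired with itself."""
    return {(x, x) for x in S}

def inverse(R):
    """R^(-1): reverse every ordered pair."""
    return {(b, a) for (a, b) in R}

def compose(R, Q):
    """R o Q: pairs (a, c) with (a, b) in Q and (b, c) in R for some b."""
    return {(a, c) for (a, b) in Q for (b2, c) in R if b == b2}

def power(R, S, n):
    """R^0 = id_S and R^(k+1) = R o R^k."""
    result = identity(S)
    for _ in range(n):
        result = compose(R, result)
    return result

def plus(R, S):
    """R^+: union of R^1, ..., R^n with n = |S| (enough on a finite set)."""
    result = set()
    for k in range(1, len(S) + 1):
        result |= power(R, S, k)
    return result

S = {1, 2, 3, 4}
R = {(1, 2), (2, 3), (3, 4)}   # "a is immediately followed by b"

print(sorted(inverse(R)))      # [(2, 1), (3, 2), (4, 3)]
print(sorted(power(R, S, 2)))  # [(1, 3), (2, 4)]
print(sorted(plus(R, S)))      # all pairs (a, b) with a < b
star = identity(S) | plus(R, S)
```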

9.2.2. Properties of Binary Relations on a Set

In this subsection we define five properties that a relation may satisfy.

Definitions

Let \(R\) be a binary relation on the set \(S.\)

  • \(R\) is reflexive if and only if for all \(a \in S,\) \((a, a) \in R.\)

  • \(R\) is irreflexive if and only if for all \(a \in S,\) \((a, a) \not\in R.\)

  • \(R\) is symmetric if and only if for all \(a \in S\) and \(b \in S,\) \((a, b) \in R \rightarrow (b,a) \in R.\)

  • \(R\) is antisymmetric if and only if for all \(a \in S\) and \(b \in S,\) \((a, b) \in R \land (b, a) \in R \rightarrow a = b.\)
    Equivalently, \(R\) is antisymmetric if and only if for all \(a \in S\) and \(b \in S,\) \((a, b) \in R \land a \neq b \rightarrow (b,a) \not\in R.\)

  • \(R\) is transitive if and only if for all \(a \in S,\) \(b \in S,\) and \(c \in S,\) \((a, b) \in R \land (b, c) \in R \rightarrow (a,c) \in R.\)

The following theorem can make it easier to determine whether a relation has each of the five properties. The proof of the theorem is an exercise.

Theorem

Let \(R\) be a binary relation on the set \(S.\)

  • \(R\) is reflexive if and only if \(\mathbf{id}_S \subseteq R.\)

  • \(R\) is irreflexive if and only if \(\mathbf{id}_S \cap R = \emptyset.\)

  • \(R\) is symmetric if and only if \(R^{-1} = R.\)

  • \(R\) is antisymmetric if and only if \(R^{-1} \cap R \subseteq \mathbf{id}_S.\)

  • \(R\) is transitive if and only if \(R^{2} \subseteq R.\)
    Recall that \(R^{2}\) is defined to be the composition \(R \circ R.\)
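Each line of the theorem translates almost word for word into a set comparison. The following Python sketch assumes relations on a finite set are stored as sets of pairs; the function names and the two sample relations (divisibility on \(\{1, \ldots, 6\}\) and the rock-paper-scissors relation from an earlier example) are chosen here for illustration.

```python
def identity(S):
    return {(x, x) for x in S}

def inverse(R):
    return {(b, a) for (a, b) in R}

def compose(R, Q):
    """R o Q: pairs (a, c) with (a, b) in Q and (b, c) in R for some b."""
    return {(a, c) for (a, b) in Q for (b2, c) in R if b == b2}

def is_reflexive(R, S):
    return identity(S) <= R                 # id_S is a subset of R

def is_irreflexive(R, S):
    return not (identity(S) & R)            # id_S and R share no pairs

def is_symmetric(R):
    return inverse(R) == R

def is_antisymmetric(R, S):
    return (inverse(R) & R) <= identity(S)

def is_transitive(R):
    return compose(R, R) <= R               # R^2 is a subset of R

S = range(1, 7)
D = {(a, b) for a in S for b in S if b % a == 0}  # divisibility on S
T = {"rock", "paper", "scissors"}
L = {("rock", "paper"), ("paper", "scissors"), ("scissors", "rock")}

print(is_reflexive(D, S), is_antisymmetric(D, S), is_transitive(D))    # True True True
print(is_irreflexive(L, T), is_antisymmetric(L, T), is_transitive(L))  # True True False
```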

9.2.3. Closures of Binary Relations with Respect to a Property

For each of the properties reflexivity, symmetry, and transitivity, we define the closure with respect to the property of a relation \(R\) as follows: The closure is the smallest relation that has the property and includes all the elements of \(R.\) That is, you start with \(R\) and insert just enough ordered pairs, if any are needed, to make sure that the new relation has the desired property.

The following theorem justifies that the reflexive closure, symmetric closure, and transitive closure exist for any relation \(R.\) The proof of the theorem is an exercise.

Theorem

Let \(R\) be a binary relation on the set \(S.\)

  • The reflexive closure of \(R\) is the relation \(R \cup \mathbf{id}_S.\)

  • The symmetric closure of \(R\) is the relation \(R \cup R^{-1}.\)

  • The transitive closure of \(R\) is the relation \(R^{+}.\)

Notice that we can also define the reflexive and transitive closure of a relation \(R\) as the relation \(R^{*},\) which is the reflexive closure of the transitive closure of \(R.\)
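The three closures can be computed for a relation on a finite set. This is a minimal Python sketch with hypothetical function names. Note that `transitive_closure` does not literally form the union of the powers \(R^{k}\); it uses a different but equivalent technique, repeatedly adding the pairs forced by transitivity until nothing new appears.

```python
def reflexive_closure(R, S):
    """R together with id_S."""
    return R | {(x, x) for x in S}

def symmetric_closure(R):
    """R together with its inverse."""
    return R | {(b, a) for (a, b) in R}

def transitive_closure(R):
    """Add (a, c) whenever (a, b) and (b, c) are present, until stable."""
    closure = set(R)
    while True:
        new_pairs = {(a, c)
                     for (a, b) in closure
                     for (b2, c) in closure if b == b2}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

S = {1, 2, 3}
R = {(1, 2), (2, 3)}

print(sorted(reflexive_closure(R, S)))  # adds (1, 1), (2, 2), (3, 3)
print(sorted(symmetric_closure(R)))     # adds (2, 1), (3, 2)
print(sorted(transitive_closure(R)))    # adds (1, 3)
```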

However, for some properties, the closure of a relation \(R\) with respect to the property may not exist!

Informal Exercise

The irreflexive closure and antisymmetric closure only exist if \(R\) satisfies certain conditions.

  1. Find a description of the relations \(R\) that do have an irreflexive closure.

  2. Find a description of the relations \(R\) that do have an antisymmetric closure.

Hint
Use the theorem from the previous subsection that describes irreflexive relations and antisymmetric relations in terms of intersections of sets.

9.3. Equivalence Relations

A binary relation \(R\) on a set \(S\) is called an equivalence relation on \(S\) if \(R\) is reflexive, symmetric, and transitive.

A first example of an equivalence relation is the diagonal, that is, the equality relation. Another example is given below.

Example 3 - The Parity Relation on the Integers

Consider the set \(R = \{ (a,b) \in \mathbb{Z} \times \mathbb{Z} \, | \, \text{Both } a \text{ and } b \text{ are odd, or both } a \text{ and } b \text{ are even.} \}.\)

Let’s show that \(R\) is an equivalence relation.

  • \(R\) is reflexive, since \(aRa\) for every \(a \in \mathbb{Z}.\) That is, both \(a\) is odd and \(a\) is odd, or both \(a\) is even and \(a\) is even (since \(p \land p \leftrightarrow p\) is a tautology for any propositional variable \(p.\))

  • \(R\) is symmetric, since \(aRb\) implies \(bRa\) for every pair \(a, b \in \mathbb{Z}.\) That is, both \(a\) and \(b\) are odd whenever both \(b\) and \(a\) are odd, and both \(a\) and \(b\) are even whenever both \(b\) and \(a\) are even (since \(p \land q \leftrightarrow q \land p\) is a tautology for any propositional variables \(p\) and \(q.\))

  • \(R\) is transitive, since \(aRb\) and \(bRc\) implies \(aRc\) for every triple \(a, b, c \in \mathbb{Z}.\) That is, if both \(a\) and \(b\) are odd and both \(b\) and \(c\) are odd, then both \(a\) and \(c\) are odd, and if both \(a\) and \(b\) are even and both \(b\) and \(c\) are even, then both \(a\) and \(c\) are even (since \((p \land q) \land (q \land r) \rightarrow (p \land r)\) is a tautology for any propositional variables \(p,\) \(q,\) and \(r.\))

It is not difficult to see that this relation can also be defined as \(R = \{ (a,b) \in \mathbb{Z} \times \mathbb{Z} \, | \, 2 \text{ is a divisor of } (a-b) \}.\) So this relation is the same as the \(\equiv_{2}\) relation discussed in an earlier example.

Given an equivalence relation \(R\) on the set \(S\) and an element \(x \in S,\) we define the equivalence class of \(x\) to be \([ x ]_{R} = \{ y \in S \, | \, (x,y) \in R \}.\)

Theorem

Let \(R\) be a binary relation on the set \(S.\)

If \(R\) is an equivalence relation, then the set of all equivalence classes \(\{ [ x ]_{R} \, | \, x \in S \}\) is a partition of \(S.\)

Conversely, if \(\Pi\) is a partition of \(S\), then the relation defined by \(R = \{ (x, y) \, | \, x \text{ and } y \text{ are elements of the same subset in } \Pi \}\) is an equivalence relation.
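The first half of the theorem can be observed on a small example. The sketch below assumes the parity relation restricted to the finite set \(\{-4, \ldots, 4\}\) (the restriction and the helper name `equivalence_class` are choices made here). It collects the equivalence classes and checks that they cover the set without overlapping.

```python
def equivalence_class(x, R, S):
    """[x]_R: every y in S with (x, y) in R."""
    return frozenset(y for y in S if (x, y) in R)

S = range(-4, 5)
# Parity relation: 2 is a divisor of (a - b).
R = {(a, b) for a in S for b in S if (a - b) % 2 == 0}

classes = {equivalence_class(x, R, S) for x in S}
for c in classes:
    print(sorted(c))

# Partition check: the classes cover S and distinct classes are disjoint.
covers = set().union(*classes) == set(S)
disjoint = all(c1 == c2 or not (c1 & c2) for c1 in classes for c2 in classes)
print(covers, disjoint)  # True True
```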

9.4. Order Relations on a Set

It is often useful to be able to compare elements of a set, based on some key property. In this section, several examples of such order relations will be discussed.

9.4.1. Partial Orderings

A binary relation \(R\) on a set \(S\) is called a partial order on \(S\) if \(R\) is reflexive, antisymmetric, and transitive.

Example 4 - Partial Orders
  • For any set \(S,\) the relation \(\subseteq\) is a partial order on the power set \(\mathcal{P}(S).\)

Total Orderings

A total ordering of a set \(S\) is a partial order \(R\) on \(S\) that has the additional property \((\forall x \in S)(\forall y \in S)(xRy \lor yRx).\)

Example 5 - Total Orderings
  • For the set of real numbers \(\mathbb{R}\) the usual order relations \(\leq\) and \(\geq\) are total orders.

Well-Orderings

A well-ordering of a set \(S\) is a total ordering \(R\) on \(S\) that has the additional property that every nonempty subset of \(S\) contains a least element with respect to the order relation.

Axiom

The relation \(\leq\) on the set \(\mathbb{N}\) of natural numbers is a well-ordering.

Notice that the above statement is not a theorem…​ it is an axiom that we assume to be true about the natural numbers!

9.5. Modular Arithmetic

For any positive integer \(m,\) you can define congruence modulo \(m\) as the relation \[\{ (a, b) \in \mathbb{Z} \times \mathbb{Z} \, : \, m \text{ divides } (a-b) \}.\] The symbol \(\equiv_{m}\) is used to represent this relation, that is \[a \equiv_{m} b \text{ if and only if } m \text{ divides } (a-b)\]

For each positive integer \(m,\) \(\equiv_{m}\) is an equivalence relation.

For any integer \(a,\) you can use the division algorithm to find the quotient and remainder such that \(a = q \cdot m + r\), where \(q\) and \(r\) are integers and \(0 \leq r < m,\) so every integer is congruent modulo \(m\) to one of the integers in the set \(\{ 0, 1, \ldots, m-1 \}.\) The set of equivalence classes \(\{ [ 0 ]_{\equiv_{m}}, \, [ 1 ]_{\equiv_{m}}, \, [ 2 ]_{\equiv_{m}}, \, \ldots, \, [ m-1 ]_{\equiv_{m}} \}\) is the partition of \(\mathbb{Z}\) that corresponds to the relation \(\equiv_{m}.\)
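In Python, the built-in `divmod(a, m)` returns a quotient and remainder with \(0 \leq r < m\) whenever \(m\) is positive, which matches the division algorithm described above even for negative \(a.\) The following sketch groups a small window of integers by remainder; the window \(-6, \ldots, 6\), the modulus \(m = 3,\) and the helper name `remainder` are assumptions made for illustration.

```python
def remainder(a, m):
    """Return r with a = q*m + r and 0 <= r < m (for m > 0)."""
    q, r = divmod(a, m)
    assert a == q * m + r and 0 <= r < m
    return r

m = 3
window = range(-6, 7)

# Group the integers in the window by their remainder modulo m.
classes = {r: [a for a in window if remainder(a, m) == r] for r in range(m)}
for r, members in classes.items():
    print(r, members)
# 0 [-6, -3, 0, 3, 6]
# 1 [-5, -2, 1, 4]
# 2 [-4, -1, 2, 5]
```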

Example 6 - Arithmetic with Even and Odd Numbers

You likely learned, when you were quite young, that some integers are called "even" and other integers are called "odd."

Notice that every integer is either even or odd but not both, which means that the set \[\{ \text{the set of all even integers}, \, \text{the set of all odd integers} \}\] forms a partition of the set \(\mathbb{Z}\) of integers. This partition corresponds to the relation \(\equiv_{2},\) that is, congruence modulo 2: The set of all even numbers is the equivalence class \([ 0 ]_{\equiv_{2}}\) and the set of all odd numbers is the equivalence class \([ 1 ]_{\equiv_{2}}.\)

You may have learned how to do arithmetic with "even" and "odd," too, as shown in the tables.

[Figure "EvenAndOdd": addition and multiplication tables for "even" and "odd"]

When you did arithmetic with "even" and "odd," you were really doing arithmetic with the equivalence classes \([ 0 ]_{\equiv_{2}}\) and \([ 1 ]_{\equiv_{2}}:\) For example, it will not matter which two odd numbers you add, the result must be even because the two numbers were odd. You can just do the operations on the remainders that you get after dividing by 2.
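The arithmetic of "even" and "odd" can be regenerated by operating on the remainders 0 and 1, as the paragraph above describes. This is a small Python sketch; the names `PARITY`, `add_parity`, and `mul_parity` are hypothetical.

```python
PARITY = {0: "even", 1: "odd"}

def add_parity(a, b):
    """Parity of a sum, computed from the remainders a and b modulo 2."""
    return PARITY[(a + b) % 2]

def mul_parity(a, b):
    """Parity of a product, computed from the remainders a and b modulo 2."""
    return PARITY[(a * b) % 2]

for a in (0, 1):
    for b in (0, 1):
        print(f"{PARITY[a]} + {PARITY[b]} = {add_parity(a, b)}",
              f"{PARITY[a]} x {PARITY[b]} = {mul_parity(a, b)}",
              sep="    ")
```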

The following theorem proves that you can do addition and multiplication with the remainders (or, what amounts to the same thing, the equivalence classes) in the same way as was done with Evens and Odds in the previous example for the relation \(\equiv_{m},\) where \(m\) can be any integer greater than 1.

Theorem

If \(m\) is an integer greater than 1, and \(a,\) \(b,\) \(c,\) and \(d\) are integers, and \(a \equiv_{m} b\) and \(c \equiv_{m} d,\)
then \(a + c \equiv_{m} b + d\) and \(a \cdot c \equiv_{m} b \cdot d.\)

Proof

Assume that \(m,\) \(a,\) \(b,\) \(c,\) and \(d\) are integers with \(m > 1,\) and that \(a \equiv_{m} b\) and \(c \equiv_{m} d.\) This means that \(m\) is a divisor of both \((a-b)\) and \((c-d),\) that is, both \((a-b)\) and \((c-d)\) are multiples of \(m.\)

The sum \((a-b) + (c-d)\) must also be a multiple of \(m,\) and this sum can be rewritten using properties of addition as \((a+c) - (b+d).\) This shows that \(m\) is a divisor of \((a+c) - (b+d),\) which can also be stated as \[(a+c) \equiv_{m} (b+d).\]

The expressions \((a-b) \cdot c\) and \(b \cdot (c-d)\) must be multiples of \(m,\) and the sum of those expressions is \((a-b) \cdot c + b \cdot (c-d),\) which can be simplified using properties of multiplication and addition to \((a \cdot c) - (b \cdot d).\) This shows that \(m\) is a divisor of \((a \cdot c) - (b \cdot d),\) which can also be stated as \[(a \cdot c) \equiv_{m} (b \cdot d).\]

Q.E.D.
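A program cannot prove the theorem, but it can test it on many cases. The sketch below checks both conclusions for every choice of \(a, b, c, d\) in a small window and every modulus from 2 to 5; the window, the range of moduli, and the helper name `congruent` are arbitrary choices made here.

```python
def congruent(a, b, m):
    """True when m divides (a - b)."""
    return (a - b) % m == 0

values = range(-5, 6)
checked = 0
for m in range(2, 6):
    for a in values:
        for b in values:
            if not congruent(a, b, m):
                continue
            for c in values:
                for d in values:
                    if congruent(c, d, m):
                        # Both conclusions of the theorem must hold.
                        assert congruent(a + c, b + d, m)
                        assert congruent(a * c, b * d, m)
                        checked += 1

print("cases checked:", checked)
```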

For example, we can write \(9 + 5 \equiv_{12} 2.\) (This is an example of "clock arithmetic" using a 12-hour clock: 5 hours after 9 o’clock will be 2 o’clock.)

Example 7 - Solving a linear congruence

If it is 12 o’clock now, what is the least number of 7-hour intervals that must pass before the clock will read 4 o’clock?

This question is equivalent to finding the smallest natural number \(n\) that solves the linear congruence \(7n \equiv_{12} 4.\)

One way to solve this congruence is to treat this as a "clock arithmetic" problem:
After one 7-hour interval passes, the clock will read 7 o’clock.
After two 7-hour intervals pass, the clock will read 2 o’clock, because \(7+7=14\) and \(14 \equiv_{12} 2.\)
After three 7-hour intervals pass, the clock will read 9 o’clock, because \(2+7 = 9.\)
After four 7-hour intervals pass, the clock will read 4 o’clock, because \(9+7 = 16\) and \(16 \equiv_{12} 4.\)
So the least number of 7-hour intervals that must pass before the clock will read 4 o’clock is four.

Another way to solve this congruence is to consider the remainders of natural number-multiples of 7 after dividing by 12:
\(1 \cdot 7 = 7\) and \(7 \equiv_{12} 7\)
\(2 \cdot 7 = 14\) and \(14 \equiv_{12} 2\) since \(14 = 1 \cdot 12 + 2\)
\(3 \cdot 7 = 21\) and \(21 \equiv_{12} 9\) since \(21 = 1 \cdot 12 + 9\)
\(4 \cdot 7 = 28\) and \(28 \equiv_{12} 4\) since \(28 = 2 \cdot 12 + 4\)
Since \(4 \cdot 7 \equiv_{12} 4,\) four 7-hour intervals must pass before the clock will read 4 o’clock.
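The second method amounts to a search through the multiples of 7, which can be automated. This is a minimal brute-force sketch; the function name `smallest_solution` is hypothetical. It tries \(n = 0, 1, \ldots, m-1,\) which is enough because \(a(n+m) \equiv_{m} an,\) so the remainders repeat after \(m\) steps.

```python
def smallest_solution(a, b, m):
    """Smallest natural number n with a*n congruent to b modulo m.

    Returns None when no n in 0, 1, ..., m-1 works, in which case
    no natural number works at all.
    """
    for n in range(m):
        if (a * n - b) % m == 0:
            return n
    return None

print(smallest_solution(7, 4, 12))  # 4
print(smallest_solution(2, 1, 4))   # None, since 2n is never odd
```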

You try
  • Find the smallest natural number \(n\) such that \(3n \equiv_{11} 5.\)

  • Explain why there is no natural number \(n\) that solves the congruence \(4n \equiv_{10} 5.\)

MORE TO COME!

10. Counting: Permutations and Combinations

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on March 17, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Many problems can be solved by counting the number of possible outcomes when choosing \(r\) elements from a set that contains \(n\) elements, with no repetition allowed (that is, each element can only be counted once). To be clear, this assumes that \(r\) and \(n\) are natural numbers and \(r \leq n.\)

Some of these problems involve choosing an ordered sequence of \(r\) elements while other problems involve choosing a subset of \(r\) elements.

This chapter presents techniques for doing this type of counting. These techniques are built on the ones you studied in the Counting: Arithmetic Techniques chapter.

Key terms and concepts covered in this chapter:

  • Permutations and combinations

    • Basic definitions

    • Pascal’s identity

    • The binomial theorem

  • binomials

10.1. The Factorial Of A Number

The factorial is a function defined for all natural numbers as described below.

Definition - The Factorial

Given a natural number \(n,\) the function \(n!\) is defined recursively as \[0! = 1\] \[ n! = n \cdot (n-1)! \]

That is, for \(n \geq 1,\) \[n! = n(n-1)(n-2) \cdots (2)(1)(1),\] where the final factor of \(1\) is \(0!.\) This function is called the factorial of \(n\) and also "\(n\)-factorial."
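The recursive definition carries over directly to code. Here is a minimal Python sketch; the function name `factorial` is chosen here, and it assumes its argument is a natural number.

```python
def factorial(n):
    """n! computed from the recursive definition: 0! = 1, n! = n * (n-1)!."""
    if n == 0:
        return 1                     # base case
    return n * factorial(n - 1)      # recursive step

print(factorial(0))  # 1
print(factorial(5))  # 120
```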

10.2. Permutations

A permutation of a set of elements is an ordered arrangement of the elements without repetition of elements. A permutation may involve every element of the set, or only some of the elements of the set. That is, a permutation is a sequence of elements from the set, where no element can appear more than once in the sequence.
Note: If you studied outside the USA, you may have learned that a permutation must involve every element of the set, and that the term variation is used when the arrangement involves only some (but not all) of the elements of the set. In this textbook, all of these are called permutations.

Consider the last four uppercase English letters \(W, X, Y, Z\).

  • \(WXYZ, XWYZ, WXZY\) are permutations of the letters taken four at a time

  • \(XZW, WXY, WXZ\) are permutations of the letters taken three at a time

  • \(WX, WY, ZX\) are permutations of the letters taken two at a time

Note: To be more formal, the sequences above could be written as tuples of the form \((W, X, Y, Z),\) \((X, W, Y, Z),\) \((W, X, Z, Y),\) and so on, but the notation used is simpler without the extra symbolic clutter of parentheses and commas.

Notice that no letter is repeated in a permutation.

How can you count all the possible permutations of a set? For small sets like the set of the last four uppercase English letters, you could list all the possible permutations of \(W, X, Y, Z\) to determine that there are 24 permutations of the letters taken four at a time, 24 permutations of the letters taken three at a time, and 12 permutations of the letters taken two at a time. But what if you wanted to find all the possible permutations of all twenty six uppercase English letters \(A, B, \ldots , W, X, Y, Z\)? It would be very time consuming to write out all possible permutations of, say, the twenty six uppercase English letters taken twenty one at a time, let alone all the other possible permutations. We will develop a technique for doing this kind of counting now.

Definition

Given a set of \(n\) elements, an ordered arrangement of \(r \le n\) of the elements is called an \(r\)-permutation or a permutation of \(n\) elements taken \(r\) at a time.

The notation \(P(n,r)\) represents the number of permutations of \(n\) elements taken \(r\) at a time. Note that \(_nP_r\) is another commonly-used notation for this count.

Suppose you have a set that contains \(n\) elements and want to construct a \(2\)-permutation of the elements. There are \(n\) possible choices for the first element, and once that element is chosen, there are \(n-1\) possible choices for the second element. The product rule lets you conclude that there are \(n(n-1)\) ways to choose the two elements, taking into account that the order of the choices matters.

Now we can generalize this argument, informally: Suppose you have a set that contains \(n\) elements and want to construct an \(r\)-permutation, where \(r \le n.\) There are \(n\) possible choices for the first element, \(n-1\) possible choices for the second element, and so on, until we have \(n-(r-1)\) possible choices for the \(r\text{th}\) element. Apply the product rule repeatedly to conclude that there are \( n(n-1)\cdots (n-r+1)\) \(r\)-permutations of the \(n\) elements.

The process in the previous paragraph can also be viewed recursively. Suppose you have a set that contains \(n\) elements and want to construct an \(r\)-permutation, where \(r \le n.\) There are \(n\) possible choices for the first element, and the product rule lets us conclude that the number of \(r\)-permutations of the \(n\) elements, \(P(n,r),\) is the product of \(n\) and \(P(n-1,r-1).\) Noticing that \(P(n-r,0) = 1\) lets us draw the same conclusion as in the previous paragraph: There are \(P(n,r) = n \cdot \left((n-1)\cdots (n-r+1) \right)\) permutations of \(n\) elements taken \(r\) at a time.

Theorem

For natural numbers \(r\) and \(n\) with \(r \leq n,\) \[P(n,r) = n(n-1)(n-2) \cdots (n-r+1).\]

Example 8 - Permutations of 5 elements taken 3 at a time

How many lines of output will be printed by the following code?

You could trace through the code using the Next button to answer the question, but it would be tedious…​ try using the formula for \(r\)-permutations instead…​ and remember to count the final print statement after the loop, too.
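The interactive code for this example is not reproduced in this text. As a stand-in, here is a sketch of what such code might look like, assuming three nested loops over a five-element list that output every arrangement of three distinct elements, followed by one final line. The list `letters`, the variable `lines`, and the closing message are all hypothetical.

```python
letters = ["A", "B", "C", "D", "E"]  # a hypothetical 5-element set

lines = []
for x in letters:
    for y in letters:
        for z in letters:
            # Keep only arrangements with no repeated element.
            if x != y and x != z and y != z:
                lines.append(x + y + z)

lines.append("Done!")  # the final print statement after the loops

print("\n".join(lines))
```

Compare `len(lines)` with the value that the formula for \(P(5,3)\) predicts, plus one for the final line.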

Video Example

The following video example features Dr. Joshua Roberts, Associate Professor of Mathematics at Georgia Gwinnett College.

Example 9 - Counting Permutations

The code below calculates the number of permutations given \(n\) and \(r\). Try to predict the variable names, values, and data types at different steps in the execution. Use the Next button to check your answers.

Question
How many permutations are there of the twenty six uppercase English letters taken twenty one at a time?
Hint
Edit the code so that it will compute ProdCount(26,21), a relatively "small" natural number…​ "small" meaning that its decimal expansion has only 25 digits! 😎
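The interactive code for this example is not reproduced in this text. The sketch below shows one way a function named `ProdCount` could be written, assuming it multiplies the \(r\) factors \(n, n-1, \ldots, n-r+1\) from the product formula; only the name comes from the hint above, and the body is an assumption.

```python
def ProdCount(n, r):
    """Number of r-permutations of n elements: n(n-1)...(n-r+1)."""
    count = 1
    for k in range(r):
        count = count * (n - k)   # factors n, n-1, ..., n-r+1
    return count

print(ProdCount(5, 3))  # 60
print(ProdCount(4, 0))  # 1 (an empty product)
```

Python integers have arbitrary precision, so calling the function with larger arguments such as 26 and 21 does not overflow.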

Here is another way to think about counting the number of permutations of \(n\) elements taken \(r\) at a time: Imagine writing down all possible permutations of \(n\) elements taken \(n\) at a time, that is, all possible ordered arrangements of all \(n\) elements. Notice that if we only care about the first \(r\) elements in the list, then there are \(n-r\) elements at the end of the list that we can rearrange in any of \((n-r)!\) ways without changing the order of the first \(r\) elements. Now apply the division rule from the Counting: Arithmetic Techniques chapter: We have a procedure, ordering all \(n\) elements, that can be completed in \(n!\) possible ways, but for each way of completing this procedure there are \((n-r)!\) possible ways with the same outcome for ordering the first \(r\) elements. The division rule lets you conclude that there are \(\frac{n!}{(n-r)!}\) ways to order the first \(r\) elements.

Theorem

For natural numbers \(r\) and \(n\) with \(r \leq n,\) \[P(n,r) = \displaystyle \frac{n!}{(n-r)!}\]

Video Example

The following video example features Dr. Joshua Roberts, Associate Professor of Mathematics at Georgia Gwinnett College.

Example 10

If there are 10 runners in a race, how many different ways can the gold, silver, and bronze medals be awarded?

Solution

There are 10 elements (the runners) and we are choosing 3 to win medals. So the number of ways they can be awarded is

\(P(10,3) = \displaystyle \frac{10!}{(10-3)!} = \frac{10!}{7!} = 10 \times 9 \times 8 = 720\)

Alternatively, we could have used the product rule and noticed that there are 10 ways to award the gold, then there are 9 ways to award the silver, then 8 ways to award the bronze.
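The count can be confirmed by listing every possible podium. This sketch uses `itertools.permutations` from Python's standard library; numbering the runners 1 through 10 is an assumption made for illustration.

```python
from itertools import permutations

runners = range(1, 11)  # hypothetical runner numbers 1..10

# Each ordered triple is (gold, silver, bronze).
podiums = list(permutations(runners, 3))

print(len(podiums))  # 720
print(podiums[0])    # (1, 2, 3)
```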

You Try

From a group of 100 workers in a union, how many ways can there be a union President, Vice President, and Treasurer?

Example 11

How many permutations of the digits 0123456789 are there that contain the string 456?

Solution

We can regard this as a permutation of 8 elements: the string "456" and the other 7 individual digits. These 8 elements can occur in any order, so the number of permutations is

\(P(8,8) = 8! = 40320.\)
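The "treat the string as a single element" idea can be checked by brute force on a smaller case. The sketch below uses the seven digits 0123456 so that it runs quickly; the helper name `count_with_block` is an assumption made for illustration.

```python
from itertools import permutations
from math import factorial


def count_with_block(symbols, block):
    """Count the permutations of `symbols` that contain `block` as consecutive characters."""
    return sum(1 for p in permutations(symbols) if block in "".join(p))


# With "456" treated as one element there are 4 other digits + 1 block = 5 elements.
print(count_with_block("0123456", "456"), factorial(5))  # 120 120
```

Calling `count_with_block("0123456789", "456")` checks Example 11 itself, but it loops over all \(10! = 3,\!628,\!800\) permutations, so it takes noticeably longer to run.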

Challenge
How many different six-letter words (including nonsense words) can be formed using the letters in "HEROIC" where the vowels must appear together and no letters are repeated?
Hint
You can regard this as a permutation of the 3 consonants and a string of 3 consecutive vowels… but notice that the order of the vowels matters.
Example 12 - Python functions for Factorial and Counting Permutations

This code sample illustrates how to use functions in Python’s math module to compute \(n!\) and \(_nP_r.\)
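A minimal sketch of such a code sample, assuming Python 3.8 or later (the function `math.perm` was added in version 3.8). The values of `n` and `r` are chosen only for illustration.

```python
import math

n, r = 10, 3

print(math.factorial(n))                           # n!  -> 3628800
print(math.perm(n, r))                             # permutations of n taken r at a time -> 720
print(math.factorial(n) // math.factorial(n - r))  # same count from n!/(n-r)! -> 720
```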

10.3. Combinations

A selection of elements from a set of \(n\) elements where order of selection does not matter is called a combination. Notice that each combination corresponds to a subset of the set of \(n\) elements.

Consider the letters \(W, X, Y, Z\) and choose three at a time where the order does not matter. In this case we are selecting a subset of size three instead of a sequence of length three. The sets \(\{ W, X, Y \}\) and \(\{ X, Y, W \}\) are just two ways of describing the same set since the order in which elements of a set are listed does not matter.

  • There are four possible ways to choose three letters without regard to order: \(\{ W, X, Y \}\), \(\{ W, X, Z \}\), \(\{ W, Y, Z \}\), and \(\{ X, Y, Z \}\).

  • There are six possible ways to choose two letters without regard to order: \(\{ W, X \}\), \(\{ W, Y \}\), \(\{ W, Z \}\), \(\{ X, Y \}\), \(\{ X, Z \}\), and \(\{ Y, Z \}\).

  • There are four possible ways to choose one letter (without regard to order): \(\{ Z \}\), \(\{ Y \}\), \(\{ X \}\), and \(\{ W \}\). (Keep reading to find out why this "reversed" ordering of the subsets was used.)

This shows that there are 4 combinations of 4 elements taken 3 at a time, 6 combinations of 4 elements taken 2 at a time, and 4 combinations of 4 elements taken 1 at a time.

Definition

Given a set of \(n\) elements, an unordered selection of \(r \le n\) of the elements is called an \(r\)-combination or a combination of \(n\) elements taken \(r\) at a time.

The notation \(C(n,r)\) represents the number of combinations of \(n\) elements taken \(r\) at a time. Note that \(_nC_r\) is another commonly-used notation for this count, as is the binomial coefficient \(n\choose r\). Any of these notations can be read as "\(n\) choose \(r.\)"

Next, notice that every \(r\)-combination corresponds to \(P(r, r) = r!\) different \(r\)-permutations. That is, to select an \(r\)-permutation we could instead first select an \(r\)-combination without regard to order and then order the \(r\) elements.

As an example, suppose we have \(n\) elements and want to construct a \(3\)-permutation. We could instead first choose a \(3\)-combination of the elements. There are \(C(n,3)\) possible \(3\)-combinations. Once we have chosen a specific \(3\)-combination, we can reorder the 3 elements in \(P(3,3) = 3!\) ways. This argument shows that \(P(n,3) = C(n,3) \cdot P(3,3) = C(n,3) \cdot 3!\), so \(C(n,3) = \displaystyle \frac{P(n,3)}{3!}\).

We can generalize this argument for each natural number \(r \leq n\) to arrive at the next theorem.

Theorem

\(C(n,r) = \displaystyle \frac{P(n,r)}{r!} = \frac{n!}{r!(n-r)!}\)
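The relationship between combinations and permutations can be checked on the letters \(W, X, Y, Z\) from earlier in this section. This is a small sketch that assumes Python 3.8 or later for `math.comb` and `math.perm`; the variable names are illustrative.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

letters = ["W", "X", "Y", "Z"]
r = 3

r_combinations = list(combinations(letters, r))
r_permutations = list(permutations(letters, r))

# Each 3-combination can be ordered in 3! ways, which accounts for every 3-permutation.
print(len(r_combinations), len(r_permutations))                    # 4 24
print(len(r_permutations) == len(r_combinations) * factorial(r))   # True
print(comb(4, 3) == perm(4, 3) // factorial(3))                    # True
```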

Example 13 - Combinations of 5 elements taken 3 at a time

The code below calculates the number of and lists combinations given \(n\) and \(r\).

How many lines of output will be printed by the following code? Remember to count the final print statement after the loop, too.

You try

Edit the code to list and count combinations of 5 choose 4.
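A minimal sketch of the kind of code Example 13 describes, assuming Python's `itertools.combinations` is used to generate the combinations. The variable names `n`, `r`, `elements`, and `count` are assumptions made for illustration.

```python
from itertools import combinations

n = 5
r = 3
elements = range(1, n + 1)  # the set {1, 2, ..., n}

count = 0
for combo in combinations(elements, r):
    print(combo)  # one line of output per combination
    count += 1

print("Number of combinations:", count)  # final line, printed after the loop
```

Changing `r` to 4 adapts the sketch to the "You try" task.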

Video Example

Video Example

Example 14

How many ways can five cards be dealt from a standard 52-card deck?

Solution

We are choosing 5 cards from 52 cards and the order does not matter, so the number of ways is \(C(52,5)=\displaystyle \frac{52!}{5!47!} = 2,\!598,\!960.\)

Example 15

How many bit strings of length \(n\) contain exactly \(r\) 0s?

Solution

Choosing the positions of the \(r\) 0s corresponds to the \(r\)-combinations of the set \(\{1, 2, 3, \dots, n\}\). Thus there are exactly \(C(n,r)\) such bit strings.
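For small values of \(n\) the count can be confirmed by generating every bit string and keeping those with exactly \(r\) 0s. The following is a sketch under that brute-force assumption; the function name `count_bit_strings` is illustrative.

```python
from itertools import product
from math import comb


def count_bit_strings(n, r):
    """Count bit strings of length n with exactly r zeros by checking all 2**n strings."""
    return sum(1 for bits in product("01", repeat=n) if bits.count("0") == r)


n, r = 8, 3
print(count_bit_strings(n, r), comb(n, r))  # 56 56
```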

10.3.1. Properties Of Combinations

In this subsection you will learn about some properties of \(C(n, r)\) and a famous "number triangle;" you will see that the values listed in the triangle coincide with the values of \(C(n,r).\)

Pascal’s Triangle

Consider the following number triangle. \[{\displaystyle {\begin{array}{c}1\\1\quad 1\\1\quad 2\quad 1\\1\quad 3\quad 3\quad 1\\1\quad 4\quad 6\quad 4\quad 1\\1\quad 5\quad 10\quad 10\quad 5\quad 1\\1\quad 6\quad 15\quad 20\quad 15\quad 6\quad 1\\1\quad 7\quad 21\quad 35\quad 35\quad 21\quad 7\quad 1\end{array}}}\]

We refer to the top row of this triangle as "row 0," and the left side of the triangle as "column 0" (so the columns are actually drawn diagonally). Notice that each number that is in row 2 or lower (and not on one of the sides of the triangle) is the sum of the two numbers directly above it in the triangle. For example, in row 5, column 2, the number 10 is the sum of the numbers in row 4, column 1 (the number 4) and row 4, column 2 (the number 6).

This number triangle is often called "Pascal’s Triangle" after the French mathematician Blaise Pascal, who wrote about the triangle in the mid-1600s A.D. However, the number triangle was known for centuries before Pascal lived. You may also want to see the "History" section of this Wikipedia page for additional information.

RECOMMENDATION: The "Binomial" activity can replace the rest of this subsection.

The Numbers in the Triangle Are The Values Of \(C(n,r)\)

Notice that \(C(0,0),\) \(C(1,0),\) and \(C(1,1)\) are all equal to 1. These numbers match the values in row 0 and row 1 of the triangle.

Now, consider an alternative way we can compute \(C(n+1,r).\) Imagine we have a set containing \(n+1\) elements, where one of the elements is "special" - in how many ways can we choose \(r\) of the elements? There are two cases: We can choose the "special" element and then choose \(r-1\) other elements from the remaining \(n\) elements, or we can ignore the "special" element and choose \(r\) elements from the remaining \(n\) elements.

As a specific example, suppose we want to choose 2 elements from the set of letters \(\{ J, K, L, M, N\}\). We could treat \(J\) as a special element. In the first case, we would always choose \(J\) and then choose 1 of the 4 remaining letters; there are \(C(4,1)\) ways to do this. In the second case, we never choose \(J\), and must choose two other letters; there are \(C(4,2)\) ways to do this. In all, there are \(C(5,2)\) ways to choose 2 letters from the set, and the sum rule from the Counting: Arithmetic Techniques chapter can be applied to show that \(C(5,2)\) must be equal to \(C(4,1) + C(4,2).\)

In the general case of combinations of \(n+1\) elements taken \(r\) at a time, we have the following theorem.

Theorem - Pascal’s identity

For natural numbers \(n\) and \(r\) such that \(1 \leq r \leq n,\) \[C(n+1,r) = C(n, r-1) + C(n,r)\]

This theorem shows that the numbers in each row of the triangle are the same numbers we can compute as \(C(n,r).\) That is, since row 0 and row 1 contain the values for \(C(0,0),\) \(C(1,0),\) and \(C(1,1),\) row 2 must contain the values for \(C(2,0),\) \(C(2,1),\) and \(C(2,2),\) and row 3 must contain the values for \(C(3,0),\) \(C(3,1),\) \(C(3,2),\) and \(C(3,3).\) We can continue this pattern for all rows of the triangle. This is an informal proof that the number triangle is made up of the values of \(C(n,r)\) for all natural numbers \(r\) and \(n\) with \(r \leq n.\)
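The row-by-row argument can be sketched in code: start with row 0, place 1s on the sides, and fill each interior entry using Pascal's identity. The function name `pascal_rows` is an assumption for illustration, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def pascal_rows(num_rows):
    """Build rows 0 .. num_rows-1 using only the 1s on the sides and Pascal's identity."""
    rows = [[1]]
    for n in range(num_rows - 1):
        prev = rows[n]
        # C(n+1, r) = C(n, r-1) + C(n, r) for the interior entries 1 <= r <= n
        interior = [prev[r - 1] + prev[r] for r in range(1, n + 1)]
        rows.append([1] + interior + [1])
    return rows


triangle = pascal_rows(8)
for row in triangle:
    print(row)

# Every entry agrees with the factorial-based value of C(n, r).
print(all(triangle[n][r] == comb(n, r) for n in range(8) for r in range(n + 1)))  # True
```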

\(C(n,r) = C(n,n-r)\)

You may recall from an earlier example that there are \(C(4,3) = 4\) possible ways to choose three letters from the set \(\{ W, X, Y, Z \}\) and that there are \(C(4,1) = 4\) possible ways to choose one letter from the set \(\{ W, X, Y, Z \}.\) The equation \(C(4,3) = C(4,1)\) does not "just happen to be true" but in fact must be true: Each combination of 4 elements taken 3 at a time corresponds to a combination of 4 elements taken 1 at a time, and vice versa - we can choose 3 of the 4 elements by "throwing out" the 1 element we don’t want to keep. That is, we choose the 3-element subset we care about indirectly by instead choosing the 1 element we do not care about. There is a one-to-one correspondence between the subsets that contain 3 letters and the subsets that contain 1 letter: \[\{ W, X, Y \} \text{ corresponds to } \{ Z \}\] \[\{ W, X, Z \} \text{ corresponds to } \{ Y \}\] \[\{ W, Y, Z \} \text{ corresponds to } \{ X \}\] \[\{ X, Y, Z \} \text{ corresponds to } \{ W \}\]

In general, there is always a one-to-one correspondence between the combinations of \(n\) elements taken \(r\) at a time and the combinations of \(n\) elements taken \(n-r\) at a time: Choosing \(r\) elements for a subset corresponds to choosing the \(n-r\) elements to leave out of the subset. This is an informal proof of the following theorem.

Theorem

For natural numbers \(n\) and \(r\) such that \(r \leq n,\) \[C(n,r) = C(n,n-r)\]

Alternatively, this theorem can be proven algebraically using the formula \(C(n,r) = \frac{n!}{r!(n-r)!}.\)
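One way the algebraic proof might go is to substitute \(n-r\) for \(r\) in the formula and simplify:

\begin{equation} \begin{aligned} C(n,n-r) {} & = \frac{n!}{(n-r)! \, (n-(n-r))!} \\ & = \frac{n!}{(n-r)! \, r!} \\ & = \frac{n!}{r! \, (n-r)!} \\ & = C(n,r) \end{aligned} \end{equation}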

10.4. The Binomial Theorem

An algebraic expression that is the product of a number and powers of zero or more variables is called a term. Two terms are called like terms if the two terms have the exact same variables and those variables appear with the exact same exponents. For example, \(a,\) \(ab,\) \(5a,\) and \(3a^{2}\) are terms, and \(a\) and \(5a\) are like terms. Like terms can be added to make a new term, for example, \(a + 5a\) is \(6a\) where we’ve used the distributive property of multiplication over addition to write \[ a + 5a = 1a + 5a = (1+5)a = 6a \]

Now consider the product of two algebraic expressions \(a+b\) and \(x+y\) where the variables represent real numbers. We can rewrite \((a+b)(x+y)\) in an expanded form by using the distributive property of multiplication over addition: \[ (a+b)(x+y) = a(x+y) + b(x+y) = ax + ay + bx + by. \]

Another way to view this multiplication is as follows: You need to sum all the possible products you can form by choosing a first factor from the set \(\{ a, \, b \}\) and a second factor from the set \(\{ x, \, y \}.\) Apply the product rule to compute that there must be \((2)(2) = 4\) possible products, namely, \(ax,\) \(ay,\) \(bx,\) and \(by.\) In this way, we can calculate the same result, \[ (a+b)(x+y) = ax + ay + bx + by. \]

In case both factors are \((a+b)\), we get \[ (a+b)(a+b) = a(a+b) + b(a+b) = aa + ab + ba + bb = a^{2} + 2ab + b^{2}. \]

Notice that the coefficients can be thought of as the number of ways to choose \(b\) for each term during the multiplication: \[ a^{2} + 2ab + b^{2} = C(2,0)a^{2} + C(2,1)ab + C(2,2)b^{2} = {2\choose0}a^{2} + {2\choose1}ab + {2\choose2}b^{2}. \]

Look at the next highest powers:

\begin{equation} \begin{aligned} (a+b) \left( a^{2} + 2ab + b^{2} \right) {} & = a \left( a^{2} + 2ab + b^{2} \right) + b \left( a^{2} + 2ab + b^{2} \right) \\ & = a^{3} + 2a^{2} b + ab^{2} + a^{2} b + 2ab^{2} + b^{3} \\ & = (1)a^{3} + (2+1) a^{2} b + (1+2) ab^{2} +(1) b^{3} \\ & = a^{3} + 3 a^{2} b + 3 ab^{2} + b^{3} \end{aligned} \end{equation}

\begin{equation} \begin{aligned} (a+b) \left( a^{3} + 3 a^{2} b + 3 ab^{2} + b^{3} \right) {} & = a \left( a^{3} + 3 a^{2} b + 3 ab^{2} + b^{3} \right) + b \left( a^{3} + 3 a^{2} b + 3 ab^{2} + b^{3} \right) \\ & = a^{4} + 3 a^{3} b + 3 a^{2} b^{2} + ab^{3} + a^{3} b + 3 a^{2} b^{2} + 3 ab^{3} + b^{4} \\ & = (1)a^{4} + (3 + 1) a^{3} b + (3 + 3) a^{2} b^{2} + (1 + 3) ab^{3} + (1)b^{4} \\ & = a^{4} + 4 a^{3} b + 6 a^{2} b^{2} + 4 ab^{3} + b^{4} \end{aligned} \end{equation}

Notice that the coefficients in the algebraic expansions above are the same numbers that appear in Pascal’s arithmetic triangle.

We will prove in the Proofs: Mathematical Induction chapter that \[ (a+b)^{n} = \sum\limits_{i=0}^{n} {n\choose i} a^{n-i} b^{i} \] for every natural number \(n.\)
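The expansions worked out above can be reproduced mechanically. In the sketch below, a list stores the coefficient of \(a^{n-i} b^{i}\) at index \(i\) for \(i = 0, 1, \dots, n,\) and each pass through the loop multiplies by \((a+b)\) and combines like terms. The function name `expand_binomial` is an assumption for illustration.

```python
from math import comb


def expand_binomial(n):
    """Return the coefficients of (a+b)**n, ordered from the a**n term to the b**n term."""
    coeffs = [1]  # (a+b)**0 = 1
    for _ in range(n):
        times_a = coeffs + [0]  # multiplying by a leaves each power of b unchanged
        times_b = [0] + coeffs  # multiplying by b raises each power of b by 1
        coeffs = [x + y for x, y in zip(times_a, times_b)]  # combine like terms
    return coeffs


print(expand_binomial(4))                                     # [1, 4, 6, 4, 1]
print(expand_binomial(4) == [comb(4, i) for i in range(5)])   # True
```

Adding the two shifted lists is the same computation as adding the two numbers "directly above" an entry in the number triangle.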

10.5. Exercises

  1. List all the permutations of \(\{1, 2, 3\}\).

  2. How many permutations are there of the set \(\{1, a, 2, b, 3, c, 5\}\)?

  3. Let \(A=\{a, b, c, d\}\)

    1. List all the 3-permutations of \(A\).

    2. List all the 3-combinations of \(A\).

  4. Let \(A=\{a, b, c, d, e\}\)

    1. List all the 2-permutations of \(A\).

    2. List all the 2-combinations of \(A\).

  5. Find the value of the following

    1. \(P(5,2)\)

    2. \(P(10,8)\)

    3. \(P(14,10)\)

    4. \(P(12,8)\)

    5. \(C(5,2)\)

    6. \(C(10,8)\)

    7. \(C(14,10)\)

    8. \(C(12,8)\)

  6. How many bit strings of length 10 contain:

    1. Exactly five 1s?

    2. At most five 1s?

    3. At least four 1s?

    4. The same number of 0s and 1s?

  7. How many permutations of the digits \(12345678\) contain:

    1. The string 284?

    2. The string 3581?

    3. The strings 21 and 57?

  8. How many ways are there to choose 9 cards from a standard 52 card deck?

  9. How many ways can you be dealt a pair in a 5 card hand (2 cards of the same rank and 3 cards of a different rank)?

  10. How many ways can you be dealt a full house in a 5 card hand (2 cards of the same rank and 3 cards of the same rank)?

  11. How many license plates consist of 4 letters followed by 3 digits if:

    1. Repetition is allowed?

    2. Repetition is not allowed?

  12. Using \(C(n,r) = \displaystyle \frac{n!}{r!(n-r)!}\), evaluate the terms of this triangular table. Will you need the formula to extend the table to more rows?

\begin{array}{ccccccccccccc} &&&&&&&C(0,0)&&&&&&\\ &&&&&& C(1,0) && C(1,1) &&&&&\\ &&&&& C(2,0) && C(2,1) && C(2,2) &&&&\\ &&&& C(3,0) && C(3,1) && C(3,2) && C(3,3) &&&\\ &&& C(4,0) && C(4,1) && C(4,2) && C(4,3) && C(4,4) &&\\ \end{array}

11. Proofs: Mathematical Induction

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on April 7, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

In this chapter, you will learn how to use the mathematical induction proof technique to create a single proof of infinitely many different but related propositions. This proof technique will also be used to validate algorithms.

Key terms and concepts covered in this chapter:

  • Mathematical induction

    • Weak and strong induction (i.e., First and Second Principle of Induction)

    • examples of mathematical induction

  • Well-ordering and Induction

  • Structural induction

11.1. Why Is Mathematical Induction Needed?

We often encounter an infinite sequence of related propositions, all of which appear to be True. To prove all of the propositions, we might try to write a proof for each individual proposition and then combine all those proofs together as a single infinitely long "proof"…​ but recall from the chapter on Proofs: Basic Techniques that a proof consists of a finite sequence of propositions of finite length. As an analogy, imagine you had an "algorithm" that required infinitely many steps to complete its task…​ such an algorithm would not be useful in the real world since it might never complete the task!

The next example introduces the ideas we will use to prove an infinite sequence of related propositions.

Example 1 - Why Use Mathematical Induction?

Let’s examine a conjecture about a certain type of polygonal number, namely, square numbers.

Square as sum of gnomons

Consider the image that shows 16 colored disks arranged as a square. By starting at the lower left corner of the square and grouping disks of the same color, we can count the total number of disks in the figure as follows. \begin{equation} \begin{aligned} 1 {} & = 1^{2} \\ 1 + 3 {} & = 4 = 2^{2} \\ 1 + 3 + 5 {} & = 9 = 3^{2} \\ 1 + 3 + 5 + 7 {} & = 16 = 4^{2} \\ \end{aligned} \end{equation}

Notice that the sum of the odd integers on the left-hand side of each equation is equal to the square of the number of odd integers on the left-hand side of the equation. This means that the predicate \[P(n)\text{: "The sum of the first } n \text{ positive odd integers is equal to } n^{2} \text{."}\] is a True statement for the natural numbers \(n \in \{ 1, 2, 3, 4 \},\) that is, each of the following four propositions is True:

  • \(P(1)\): "The sum of the first \(1\) positive odd integers is equal to \(1^{2}\)." (I know this is not proper English, but please bear with me!)

  • \(P(2)\): "The sum of the first \(2\) positive odd integers is equal to \(2^{2}\)."

  • \(P(3)\): "The sum of the first \(3\) positive odd integers is equal to \(3^{2}\)."

  • \(P(4)\): "The sum of the first \(4\) positive odd integers is equal to \(4^{2}\)."

In the following code snippet, function \(P\) implements the predicate used in this example. Recall that the predicate’s output is a proposition - the output is just a string of symbols and does not indicate whether the proposition is True or False.
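A minimal sketch of such a function, written here as an assumption about what the snippet might look like. It only builds the text of the proposition and never evaluates whether the proposition is True.

```python
def P(n):
    """Return the proposition P(n) as a string; no truth value is computed."""
    return f"The sum of the first {n} positive odd integers is equal to {n}^2."


for n in range(1, 5):
    print(P(n))
```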

Square as sum of gnomons

Next, consider this second image, which shows how you could change the first image to one that can be used to prove that \(P(5)\) is True.
It’s easy to show that the proposition \(P(5)\) is True because you can just write out and verify that \(1 + 3 + 5 + 7 + 9 = 25 = 5^{2}.\) The goal here is to relate the proposition \(P(5)\) to the "preceding" proposition \(P(4),\) in a way similar to what was done for sequences of numbers in a recurrence relation.

Notice that the second image shows that, given the square arrangement of disks that has 4 disks along each side, we can construct a square arrangement that has 5 disks along each side by inserting 4 new disks above the top row, 4 new disks in a column to the right of the rightmost column, and 1 new disk in the upper right corner to complete the square arrangement.

In fact, there is nothing special about the number 4 in the previous paragraph: If \(k\) is any positive natural number and we have a \(k \times k\) square arrangement of disks, we can enlarge it to a \((k+1) \times (k+1)\) square arrangement of disks by inserting \(k\) disks above, \(k\) disks to the right, and \(1\) disk in the upper right corner of the \(k \times k\) square to complete the \((k+1) \times (k+1)\) square. Algebraically, we can account for the total number of disks in the \((k+1) \times (k+1)\) square by writing \[ k^{2} + k + k + 1 = (k+1)^{2} \] which is True for any natural number \(k\) (Just simplify the left-hand side and expand the right-hand side of the equation to see that the equation must be True.)

Based on this second image, we can make a conjecture that \(P(n)\) must be True for every positive natural number \(n.\) We now need to prove this conjecture.

Notice that if we combine the propositions

  1. \(P(1)\) and \(P(2)\) and \(P(3)\) and \(P(4)\)

  2. For all \(k \in \mathbb{N},\) \(P(k)\) implies \(P(k+1).\)

then we can build a proof for all the integers up to and including any value of \(n \in \mathbb{N}\) that we want.

For example, to prove \(P(1,\!000,\!000),\) we could start by asserting that \(P(1)\) is True, then apply the conditional \(P(k) \rightarrow P(k+1)\) along with the rule of inference modus ponens \(999,\!999\) times to prove that \(P(1,\!000,\!000)\) is True. This proof is finite - I never claimed that the proof would be short!

As an analogy, think of repeatedly applying modus ponens to the conditional as using a loop in code. We are just repeating the same argument \(( P(k) \land ( P(k) \rightarrow P(k+1) ) ) \rightarrow P(k+1)\) over and over again as the value of the variable \(k\) is incremented by 1 at the end of each loop iteration until we reach the value that we want to stop at. In the following code snippet, the user-defined function addTheOdds implements the computations used in this example’s argument. Notice how the Basis Step corresponds to validating the loop initialization and how the Induction Step corresponds to validating the output at the end of each loop iteration (assuming that the values were correct at the start of that loop iteration.) You can change the value of \(n\) in the code to confirm the truth value of \(P(n)\) for any integer you’d like, assuming you have enough time and computing resources.
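The loop analogy can be sketched as follows. The function name addTheOdds comes from the text, but the body is an assumption about how it might be written, and it assumes \(n \geq 1.\) The `assert` statements mark where the Basis Step and the Induction Step are being checked.

```python
def addTheOdds(n):
    """Add the first n positive odd integers (n >= 1), checking P(k) after each iteration."""
    total = 1  # the sum of the first 1 positive odd integers
    k = 1
    assert total == k ** 2  # Basis Step: P(1) holds at loop initialization
    while k < n:
        # Induction hypothesis: P(k) holds here, that is, total == k**2
        total += 2 * k + 1  # add the (k+1)th positive odd integer
        k += 1
        assert total == k ** 2  # P(k) holds again for the incremented k
    return total


n = 1000
print(addTheOdds(n) == n ** 2)  # True
```

Running the loop confirms \(P(n)\) only for the particular value of \(n\) supplied; it is the general argument for \(P(k) \rightarrow P(k+1)\) that covers every positive natural number.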

Since we can now, in principle, build a proof of \(P(n)\) for any value of \(n \in \mathbb{N}\) that we could choose, we apply the rule of inference universal generalization to conclude that \((\forall n \in \mathbb{N})P(n).\) That is, we have proven the proposition \[ \text{"For every positive natural number } n \text{, the sum of the first } n \text{ positive odd integers is equal to } n^{2} \text{."}\] Another way to look at this is to define \(s(n)\) to be the number of disks in an \(n \times n\) square arrangement of disks. We have shown that \(s(1) = 1\) and that \(s(k+1) = s(k) + 2k + 1\) for every positive natural number value of \(k,\) and have concluded that \(s(n) = n^{2}\) for every positive natural number \(n.\)

We will rewrite the above example more formally (with full algebraic detail) later in the chapter.

11.2. The Principle of Mathematical Induction

A proof by mathematical induction of a predicate \(P(n)\) defined for natural numbers \(n \in \mathbb{N}\) consists of three steps.

Basis Step

Prove the predicate \(P(n)\) is True for some small value of \(n;\) in most but not all cases, you prove either \(P(0)\) or \(P(1).\) (You can also prove \(P(n)\) for other values if it helps you get a feel for what needs to be proven, as was done in "the sum of the first \(n\) consecutive odd natural numbers is the perfect square \(n^{2}\)" in the previous section.)

Induction Step

Prove that the conditional statement \(P(k) \rightarrow P(k+1)\) is True for any natural number \(k\) that is greater than or equal to the value used in the Basis Step.
In this context, the predicate \(P(k)\) is called the induction hypothesis and is assumed to be True, where \(k\) represents an arbitrary natural number. Using modus ponens allows you to infer that \(P(k+1)\) must also be True based on the assumption that \(P(k)\) is True.

Conclusion Step

Conclude that \(P(n)\) is True for all natural numbers \(n\) that are greater than or equal to the small(est) value used in the Basis Step. This conclusion uses universal generalization as the rule of inference.

The three steps above are referred to as "The Principle of Mathematical Induction" which is often abbreviated as PMI.
Some textbooks and sources do not include the Conclusion Step as part of PMI, but the remixer wanted to stress that this step is needed to complete the proof.

You can compare the first two steps of PMI to the two steps used in a recursive definition as in the Sequences and Recursion chapter. Note that a recursive definition is used to describe and define a process for constructing objects or a set of objects or a structure, but a proof by mathematical induction is used to justify and validate such a process.

Note that each of the three steps will be a proof of finite length, but will allow us to conclude that \(P(n)\) is true for every natural number \(n\) greater than or equal to some small natural number \(n_{0} \geq 0\). That is, we can conclude that \((\forall n \geq n_{0}) P(n)\) is a True proposition.

As an analogy, imagine we are building a tower using interlocking toy blocks. How tall can the tower be? The basis step involves placing a foundation on the ground (either a flat surface for \(n = 0\), or a first block for \(n = 1\)), and the induction step justifies that if we have built a tower that has height \(k\) then we can build a tower of height \(k+1\) by placing one more block on the tower. The conclusion step states that we can build a tower that is of any finite height \(n\) (as long as we have \(n\) or more blocks and ignore issues arising from real-world physics!) Note that we never build an infinitely tall tower.
Note: Some textbooks and sources use an "infinite ladder" analogy for mathematical induction, but this is not quite correct. A better analogy is a ladder that can be extended to any finite height you need, but that is always of finite height.

As an example, here is a proof of the Handshake Theorem for graphs.

Example 2 - Proof of the Handshake Theorem

We will prove the following proposition using mathematical induction.

Theorem

If \(G\) is a graph with vertex set \(V\) and edge set \(E,\) where both \(V\) and \(E\) are finite sets, then the sum of the degrees of all vertices in \(V\) is equal to 2 times the number of edges in \(E.\)

Notice first that since this needs to be proven for any graph with any finite number of vertices and edges, it is a good candidate for proof by mathematical induction.

Which number should be the one we use for induction?

  • We could try using induction on the number of vertices, but notice that we can add an isolated vertex without affecting either the sum of the degrees or the number of edges. This indicates that the number of vertices is not the correct variable to use for a proof by induction.

  • We could try using induction on the sum of the degrees of the vertices, but we’d have to figure out how to add 1 to that sum…​ but notice that, as above, adding a new vertex \(v\) to the graph

    • either leaves the sum of degrees unchanged (if \(v\) is isolated)

    • or changes the sum of degrees by 2 (because either \(v\) is an endpoint of a loop or \(v\) comes along with a new edge that connects to another vertex of the graph.) Since we cannot meaningfully add 1 to the sum of the degrees, this is also not the correct variable to use for induction.

  • We could try using induction on the number of edges. This could work since adding a new edge will increase the sum of degrees of the vertices by 2 (whether the vertices are "new" or "old").

So, to prove the theorem, let \(n\) represent the number of edges in a graph and let \(P(n)\) be the predicate \[P(n)\text{: "The sum of the degrees of the vertices for a graph with } n \text{ edges is equal to } 2 \cdot n \text{."}\] We will prove that \((\forall n \in \mathbb{N})P(n).\)

Basis Step: \(P(0)\) is the proposition "The sum of the degrees of the vertices for a graph with \(0\) edges is equal to \(2 \cdot 0.\)" Since a graph with 0 edges is just a collection of isolated vertices, and each of the isolated vertices has degree 0, the sum of the degrees of the vertices must also be 0, which is 2 times the number of edges. This means that \(P(0)\) is True, and the Basis has been established.

Induction Step: First, we assume that the induction hypothesis \(P(k)\) is True for some natural number \(k\).

Secondly, we will prove that the conditional \(P(k) \rightarrow P(k+1)\) must be True, which means we can use modus ponens (or the equivalent tautology \(( P(k) \land ( P(k) \rightarrow P(k+1) ) ) \rightarrow P(k+1)\)) to show that \(P(k+1)\) is also True.

For the natural number \(k,\) the predicate \(P(k)\) is the proposition "The sum of the degrees of the vertices for a graph with \(k\) edges is equal to \(2 \cdot k.\)"

Suppose that we have a graph with \(k\) edges. We insert one new edge \(e\) into the graph. There are, essentially, two possible cases to consider.

  • \(e\) has two different endpoints, in which case the degree of each endpoint increases by 1 when \(e\) is added to the graph, so the sum of the degrees of the vertices increases by 2, or

  • \(e\) is a loop with only one endpoint, in which case the degree of that endpoint increases by 2, so the sum of the degrees of the vertices increases by 2.
    Notice that in either case, it does not matter whether the new edge \(e\) has endpoints that are "new" to the graph or were "old" vertices that were in the graph already.

Notice that any graph with \(k+1\) edges can be built up this way from a graph that had \(k\) edges. So, using the fact that \(2 \cdot k + 2 = 2 \cdot (k+1),\) we have proven that "The sum of the degrees of the vertices for a graph with \(k+1\) edges is equal to \(2 \cdot (k + 1).\)" That is, we have proven that \(P(k) \rightarrow P(k+1)\).

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can use universal generalization to conclude that \((\forall n \in \mathbb{N})P(n),\) which translates to "For every natural number \(n,\) the sum of the degrees of the vertices for a graph with \(n\) edges is equal to \(2 \cdot n.\)"

Q.E.D.
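The structure of this proof can be mirrored in code by starting from a graph with no edges and inserting edges one at a time. This is only an illustrative sketch: the function name `degree_sum_after_each_edge` and its parameters are assumptions, and the randomly chosen edges may include loops and repeated edges between the same pair of vertices.

```python
import random


def degree_sum_after_each_edge(num_vertices, num_edges, seed=0):
    """Add random edges (loops allowed) one at a time and record the sum of the degrees."""
    rng = random.Random(seed)
    degree = [0] * num_vertices  # Basis Step: 0 edges, so every degree is 0
    sums = [sum(degree)]
    for _ in range(num_edges):
        u = rng.randrange(num_vertices)
        v = rng.randrange(num_vertices)
        # A loop (u == v) adds 2 to one degree; otherwise 1 is added to each endpoint.
        degree[u] += 1
        degree[v] += 1
        sums.append(sum(degree))
    return sums


sums = degree_sum_after_each_edge(num_vertices=6, num_edges=10)
print(sums)                                                  # [0, 2, 4, ..., 20]
print(all(total == 2 * n for n, total in enumerate(sums)))   # True
```

Whichever edges are chosen, each insertion raises the sum of the degrees by exactly 2, which is the content of the Induction Step.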

11.3. More Example Proofs Using Mathematical Induction

Example 3 - An Algebraic Expression for Positive Odd Integers

Let \(P(n)\) be the predicate \[P(n)\text{: "The } n\text{th positive odd integer is equal to } 2n-1 \text{."}\] We will prove that \((\forall n \in \mathbb{N}_{>0})P(n),\) that is, for all positive integers \(n.\)
Notice that you can find evidence for the conjecture that the \(k\text{th}\) positive odd integer is \(2k - 1\) by making a table and finding an algebraic formula that matches the table. See this appendix if you don’t remember how to do this.

Basis Step: \(P(1)\) is the proposition "The \(1\)th positive odd integer is equal to \(2(1)-1.\)" This is True since \(2(1)-1 = 1\), in spite of the poor English, which should use "\(1\)st" instead of "\(1\)th."

Induction Step: First, we assume that the induction hypothesis \(P(k)\) is True for some positive natural number \(k\). That is, we assume that "The \(k\)th positive odd integer is equal to \(2k-1\)" is True for some positive natural number \(k.\)

Secondly, we will prove that the conditional \(P(k) \rightarrow P(k+1)\) must be True, which means we can use modus ponens (or the equivalent tautology \(( P(k) \land ( P(k) \rightarrow P(k+1) ) ) \rightarrow P(k+1)\)) to show that \(P(k+1)\) is also True.

If the \(k\)th positive odd integer is equal to \(2k-1,\) then the \((k+1)\)th positive odd integer is obtained by adding \(2\) to \(2k-1,\) that is the \((k+1)\)th positive odd integer is equal to \((2k-1) + 2,\) which can be rewritten using algebra as \begin{equation} \begin{aligned} (2k-1) + 2 {} & = 2k - 1 + 2 \\ & = 2k + 2 - 1 \\ & = 2(k+1) - 1 \end{aligned} \end{equation}

We have proven that if the \(k\)th positive odd integer is equal to \(2k-1\) then the \((k+1)\)th positive odd integer is equal to \(2(k+1)-1.\) That is, we have proven \(P(k) \rightarrow P(k+1)\).

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can use universal generalization to conclude that for all positive integers \(n,\) the \(n\)th positive odd integer is equal to \(2n-1.\)

Q.E.D.

The next example proof is the formal proof that corresponds to the first example of this chapter.

Example 4 - Square Numbers

Let \(P(n)\) be the predicate \[P(n)\text{: "The sum of the first } n \text{ positive odd integers is equal to } n^{2} \text{."}\] We will prove that \((\forall n \in \mathbb{N}_{>0})P(n).\)

Basis Step: \(P(1)\) is the proposition "The sum of the first \(1\) positive odd integers is equal to \(1^{2}.\)" If we allow a sum to have only one addend, then \(P(1)\) is True since \(1 = 1^{2}\), so \(P(1)\) can be used as the basis of our induction proof. If we want to be sure that there are at least two addends in the sum, we can use \(P(2)\) as an additional basis. \(P(2)\) is the proposition "The sum of the first \(2\) positive odd integers is equal to \(2^{2}\)" which is True because \(1 + 3 = 2^{2}.\)

Induction Step: First, we assume that the induction hypothesis \(P(k)\) is True for some positive natural number \(k\).

Secondly, we will prove that the conditional \(P(k) \rightarrow P(k+1)\) must be True, which means we can use modus ponens (or the equivalent tautology \(( P(k) \land ( P(k) \rightarrow P(k+1) ) ) \rightarrow P(k+1)\)) to show that \(P(k+1)\) is also True.

Based on the formula for the \(k\text{th}\) positive odd integer, we can rewrite the sum of the first \(k\) positive odd integers \(1+3+...+(2k-1)\) using summation notation as \(\sum\limits_{i=1}^{k}(2i-1)\) and rewrite the predicate \(P(k)\) in algebraic form as \[ P(k): \sum\limits_{i=1}^{k}(2i-1) = k^{2}.\] Note that \(P(k)\) is still a proposition - it is stating that a certain equation holds.
A common error is to treat \(P(k),\) when written algebraically, as a function that gives numerical outputs, but this is incorrect. \(P(k)\) is still a predicate that gives propositions as outputs.

We can now prove that the conditional \(P(k) \rightarrow P(k+1)\) must be True using algebra. \begin{equation} \begin{aligned} \sum\limits_{i=1}^{k+1}(2i-1) {} & = \left(\sum\limits_{i=1}^{k}(2i-1)\right) + (2(k+1)-1) \\ & = k^{2} + (2(k+1)-1) \end{aligned} \end{equation} where we have substituted \(k^{2}\) for the sum \(\sum\limits_{i=1}^{k}(2i-1)\) based on the induction hypothesis.

Now we simplify this using algebra and show that it is the same as the right-hand side of \(P(k+1),\) namely \((k+1)^{2}.\) \begin{equation} \begin{aligned} \sum\limits_{i=1}^{k+1}(2i-1) {} & = k^{2} + (2(k+1)-1) \\ & = k^{2} + 2k + 2 - 1 \\ & = k^{2} + 2k + 1 \\ & = (k + 1)^{2} \end{aligned} \end{equation}

We have proven that the equation \(1+3+...+(2k-1) = \sum\limits_{i=1}^{k}(2i-1) = k^{2}\) implies the equation \(1+3+...+(2k-1) + (2k+1) = \sum\limits_{i=1}^{k+1}(2i-1) = (k+1)^{2}\), that is, \(P(k) \rightarrow P(k+1)\).

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can use universal generalization to conclude that \(1+3+...+(2n-1) = \sum\limits_{i=1}^{n}(2i-1) = n^{2}\) for all positive integers \(n\).

Q.E.D.
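The claim in Example 4 can also be spot-checked numerically. The short Python sketch below is only a sanity check under the assumption that testing \(n = 1, \ldots, 1000\) is enough to catch a typo in the formula; the function name `sum_first_odds` is chosen here for illustration. A finite check is evidence, not a proof, which is why the induction argument above is still needed.

```python
def sum_first_odds(n):
    """Return 1 + 3 + ... + (2n - 1), the sum of the first n positive odd integers."""
    return sum(2 * i - 1 for i in range(1, n + 1))


# Compare the sum with n^2 for n = 1, ..., 1000.
all_hold = all(sum_first_odds(n) == n ** 2 for n in range(1, 1001))
print(all_hold)  # True
```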

Example 5 - The Factorial Grows Faster Than An Exponential Function

Let \(P(n)\) be the predicate \[P(n): 2^{n} < n!\] We will prove that \((\forall n \in \mathbb{N}_{\geq 4})P(n).\)

Basis Step: First notice that the propositions \(P(0),\) \(P(1),\) \(P(2),\) and \(P(3)\) are all False! This is why we must use \(P(4)\) as the basis. \(P(4)\) is the proposition \(2^{4} < 4!\) which is a True statement since \(16 < 24.\)

Induction Step: First, we assume that the induction hypothesis \(P(k)\) is True for some positive natural number \(k\). In this context we can assume \(k \geq 4.\)

Secondly, we will prove that the conditional \(P(k) \rightarrow P(k+1)\) must be True, which means we can use modus ponens (or the equivalent tautology \(( P(k) \land ( P(k) \rightarrow P(k+1) ) ) \rightarrow P(k+1)\)) to show that \(P(k+1)\) is also True.

Assume that \(P(k)\) is True with \(k \geq 4,\) that is, \(2^{k} < k!.\) Then \begin{equation} \begin{aligned} 2^{k+1} {} & = 2 \cdot 2^{k} \\ & < (k+1) \cdot 2^{k} \text{ (since \(k \geq 4,\) we must have \(k+1 \geq 5 > 2\))}\\ & < (k+1) \cdot k! \text{ by the induction hypothesis} \end{aligned} \end{equation}

Notice that the expression on the last line is equal to \((k+1)!,\) so we have shown that \((2^{k} < k!) \rightarrow (2^{k+1} < (k+1)!)\) as long as \(k \geq 4.\) That is, we’ve proven that \(P(k) \rightarrow P(k+1)\).

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can use universal generalization to conclude that for all positive integers \(n \geq 4\), \(2^{n} < n!.\)

Q.E.D.
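The reason for starting the induction at \(n = 4\) can be seen by evaluating the predicate directly. The following Python sketch is a numerical illustration only; the helper name `p` and the upper limit of 200 are arbitrary choices made here, not part of the proof.

```python
from math import factorial


def p(n):
    """Evaluate the proposition P(n): 2^n < n!."""
    return 2 ** n < factorial(n)


small_cases = [p(n) for n in range(4)]         # P(0), P(1), P(2), P(3)
from_four = all(p(n) for n in range(4, 200))   # P(4), ..., P(199)

print(small_cases)  # [False, False, False, False]
print(from_four)    # True
```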

Exercise - Setting Up A Proof By Induction (The Towers Of Hanoi)

Set up the proof of the following statement.

The minimal number of moves needed to solve the Towers Of Hanoi puzzle with \(n\) discs is \(2^{n}-1.\)

What is the predicate \(P(n)\)?

What value of \(n\) is best to use in the Basis Step?

What is the Induction Hypothesis?

What property of the puzzle is the key to proving the Induction Step? You can describe this in words or use an algebraic equation.

MORE PROOFS TO COME!

11.4. Strong Induction

Strong induction is used when it is easier to use the assumption that all the propositions \(P(0), P(1), \ldots , P(n-1)\) are True in order to prove that \(P(n)\) is True.

Basis Step

Prove the predicate \(P(n)\) is True for one or more consecutive small values of \(n;\) in most but not all cases, you prove both \(P(0)\) and \(P(1).\) You can also prove \(P(n)\) for other values as well if it helps you get a feel for what needs to be proven.

Induction Step

Prove that the conditional statement \(( P(0) \land P(1) \land \cdots \land P(k) ) \rightarrow P(k+1)\) is True for any natural number \(k.\)
Using modus ponens allows you to infer that \(P(k+1)\) must also be True based on the assumption that all of the propositions \(P(0), P(1), \ldots , P(k)\) are True.

Conclusion Step

Conclude that \(P(n)\) is True for all natural numbers \(n\) that are greater than or equal to the small(est) value used in the Basis Step. This conclusion uses universal generalization as the rule of inference.

In spite of the name, strong induction and "weak" induction are equivalently powerful techniques in the sense that any proposition that you can prove using strong induction can also be proven by "weak" induction, and any proposition that you can prove using "weak" induction can also be proven by strong induction. The choice of which of the two proof techniques to use is based on convenience only, not power.

Example 6 - An Upper Bound for the Fibonacci numbers

Recall that the Fibonacci numbers are defined by the following recurrence relation: \[f_{0}=0, \, f_{1}=1, \text{ and } f_{n} = f_{n-1} + f_{n-2} \text{ for } n \geq 2.\]

We will prove that the predicate \[P(n): f_{n} < \displaystyle \left( \frac{1+\sqrt{5}}{2} \right)^{n}\] is True for all natural numbers \(n.\) The proof will use strong induction because, for each \(n \geq 2,\) the value of \(f_{n}\) is defined in terms of both \(f_{n-1}\) and \(f_{n-2}.\)
The base of the power on the right-hand side of the inequality is a famous constant called the golden ratio, which is usually denoted by the lowercase Greek letter \(\phi\) ("phi"): \[ \phi = \frac{1+\sqrt{5}}{2} \] \(\phi\) is an irrational number whose first few digits are \(1.618 \ldots .\) Also, \(\phi\) is the positive solution of the equation \(x^{2} = x + 1,\) which is a fact that we will use in the proof. The other solution of \(x^{2} = x + 1\) is \(1 - \phi,\) a fact you can use when you attempt the Challenge question at the end of this example. \[ 1 - \phi = \frac{1-\sqrt{5}}{2} \] Notice that you can verify that \(\phi\) and \(1-\phi\) are the two roots of \(x^{2} - x - 1 = 0\) by using the quadratic formula.

Basis Step: In this case, we need to prove that the conjunction \(P(0) \land P(1)\) is True as our basis for strong induction.

  • \(P(0)\) is the proposition \(f_{0} < \phi^{0}\) which is True since \(0 < 1.\)

  • \(P(1)\) is the proposition \(f_{1} < \phi^{1}\) which is True since \(1 < \phi.\)

Since \(P(0)\) and \(P(1)\) are True, you can use the tautology \(q \rightarrow ( r \rightarrow (q \land r) )\) to conclude that \(P(0) \land P(1)\) is True, too.

Induction Step: First, we assume as the induction hypothesis that \[P(i) \text { is True for all natural numbers } i \leq k \] where we can assume that the integer \(k\) is greater than or equal to \(1\) (since \(P(0)\) and \(P(1)\) were already dealt with in the Basis Step, the first proposition left to prove is \(P(2),\) which is \(P(k+1)\) with \(k = 1\).) That is, we assume that the single proposition \(P(0) \land P(1) \land \cdots \land P(k)\) is True, where \(k\) is some integer greater than or equal to 1.

Secondly, we will prove that the conditional \(( P(0) \land P(1) \land \cdots \land P(k) ) \rightarrow P(k+1)\) must be True, which means we can use modus ponens to show that \(P(k+1)\) is also True.

If the inequality \(f_{i} < \phi^{i}\) is True for each \(i \leq k,\) then \begin{equation} \begin{aligned} f_{k+1} {} & = f_{k} + f_{k-1} \\ & < \phi^{k} + \phi^{k-1} \text{ by the induction hypothesis} \\ & = \phi^{k-1} \cdot (\phi + 1) \text{ by algebra} \\ & = \phi^{k-1} \cdot \phi^{2} \text{ since } \phi^{2} = \phi + 1 \\ & = \phi^{(k-1)+2} \text{ using one of the laws of exponents} \\ & = \phi^{k+1} \end{aligned} \end{equation}

We have proven that if \(f_{i} < \phi^{i}\) is True for each natural number \(i \leq k,\) then \(f_{k+1} < \phi^{k+1}\) must also be True. That is, we have proven \(( P(0) \land P(1) \land \cdots \land P(k) ) \rightarrow P(k+1)\).

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can use universal generalization to conclude that for all natural numbers \(n,\) \(f_{n} < \phi^{n}.\)

Q.E.D.
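As a numerical companion to Example 6, the Python sketch below compares \(f_{n}\) with \(\phi^{n}\) for the first sixty Fibonacci numbers. It is a sanity check only: it uses floating-point arithmetic for \(\phi^{n}\), and the names `PHI` and `fib_list` and the cutoff of sixty terms are assumptions made here for illustration.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # floating-point approximation of the golden ratio


def fib_list(count):
    """Return [f_0, f_1, ..., f_(count-1)] computed from the recurrence."""
    fibs = [0, 1]
    while len(fibs) < count:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:count]


fibs = fib_list(60)

# Check f_n < phi^n for n = 0, ..., 59.
bound_holds = all(f < PHI ** n for n, f in enumerate(fibs))
print(bound_holds)  # True
```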

Challenge
Use strong induction to prove that the closed form of the Fibonacci numbers is \[ f_{n} = \displaystyle \frac{\phi^{n} - (1-\phi)^{n}}{\sqrt{5}} \text{ for all natural numbers }n. \]
Hint
For the Induction Step, use the fact that both \(\phi^{2} = \phi + 1\) and \((1-\phi)^{2} = (1-\phi) + 1\) are True and use steps similar to the ones in the Induction Step in the proof above but working with equations instead of inequalities. Using the two equations in the previous sentence will help you avoid working directly with expressions like \[f_{n} = \displaystyle \frac{ \left( \displaystyle \frac{1 + \sqrt{5}}{2} \right)^{n} - \left( \displaystyle \frac{1 - \sqrt{5}}{2} \right)^{n}}{\sqrt{5}}\] or \[ f_{n} = \displaystyle \frac{ \left( 1 + \sqrt{5} \right)^{n} - \left( 1 - \sqrt{5} \right)^{n} }{ 2^{n} \sqrt{5} } \] which would be unnecessarily difficult and time-consuming.
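Before attempting the Challenge, it may help to see numerically that the two facts in the hint and the closed form itself are plausible. The Python sketch below is a floating-point check for \(n = 0, \ldots, 29\) only, so it does not replace the proof; the names `PHI`, `PSI`, and `closed_form` are chosen here for illustration.

```python
from math import isclose, sqrt

PHI = (1 + sqrt(5)) / 2
PSI = 1 - PHI  # the other root of x^2 = x + 1

# Both roots should satisfy x^2 = x + 1 (up to rounding error).
roots_ok = isclose(PHI ** 2, PHI + 1) and isclose(PSI ** 2, PSI + 1)


def closed_form(n):
    """Floating-point value of (phi^n - (1 - phi)^n) / sqrt(5)."""
    return (PHI ** n - PSI ** n) / sqrt(5)


fibs = [0, 1]
for _ in range(28):
    fibs.append(fibs[-1] + fibs[-2])  # appends f_2, ..., f_29

# Rounding removes the small floating-point error before comparing.
matches = all(round(closed_form(n)) == f for n, f in enumerate(fibs))
print(roots_ok, matches)  # True True
```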

Next, we’ll prove the following theorem.

The Fundamental Theorem of Arithmetic

Every positive integer \(n\) that is greater than or equal to \(2\) is either a prime number or can be written as a product of two or more prime numbers.

Proof

Let \(P(n)\) be the predicate \[P(n) \text{: "} n \text{ is either prime or the product of two or more primes."}\] We will prove that \((\forall n \in \mathbb{N}_{\geq 2})P(n),\) that is, each positive integer \(n \geq 2\) is either a prime or a product of two or more prime numbers.

Basis Step: \(P(2)\) is True since \(2\) is a prime number. In this case, we could use only \(P(2)\) as the basis, but it is easy to prove \(P(3)\) and \(P(4)\) are True since \(3\) is prime and \(4 = 2 \cdot 2\) is a product of two primes.

Induction Step: First, we assume as the induction hypothesis that \[P(i) \text { is True for all positive natural numbers } i \text{ such that } 2 \leq i \leq k \] where we can assume that the integer \(k\) is greater than or equal to \(4\) (since the cases where \(k \in \{ 2, 3, 4 \}\) were already dealt with in the Basis Step.) That is, we assume that the single proposition \(P(2) \land P(3) \land \cdots \land P(k)\) is True, where \(k\) is some integer greater than or equal to 4.

Secondly, we will prove that the conditional \(( P(2) \land P(3) \land \cdots \land P(k) ) \rightarrow P(k+1)\) must be True, which means we can use modus ponens to show that \(P(k+1)\) is also True.

There are two cases: Either \(k+1\) is prime or it is not prime (that is, it is composite.)

  1. If \(k+1\) is prime, then \(P(k+1)\) is True immediately, because \(P(k+1)\) only requires \(k+1\) to be prime or a product of two or more primes.

  2. If \(k+1\) is not prime, then there are two integers \(a\) and \(b\) that are both greater than \(1\) such that \(k+1 = ab.\) Notice that both \(a\) and \(b\) must be less than \(k+1\) because if either one were greater than or equal to \(k+1\) then the product \(ab\) would be greater than or equal to \(2(k+1),\) which is greater than \(k+1.\) So \(2 \leq a \leq k\) and \(2 \leq b \leq k,\) and by the induction hypothesis both \(P(a)\) and \(P(b)\) are True. Each of \(a\) and \(b\) is therefore either a prime or a product of two or more primes, which means that the product \(ab\) is a product of at least two primes (if both \(a\) and \(b\) are primes, they are the only two prime factors in the product; otherwise, the product has more than two prime factors.) Since \(k+1 = ab\), this proves that \(P(k+1)\) is True in the case when \(k+1\) is a composite number.

In either case, we have shown that \(P(k+1)\) must be True. We have proven that if \(P(i)\) is True for each natural number \(i\) with \(2 \leq i \leq k,\) then \(P(k+1)\) must also be True. That is, we have proven \(( P(2) \land P(3) \land \cdots \land P(k) ) \rightarrow P(k+1)\).

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can use universal generalization to conclude that for all natural numbers \(n \geq 2,\) \(n\) is either a prime number or a product of two or more prime numbers.

Q.E.D.
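The two cases in the Induction Step translate directly into a recursive procedure. The Python sketch below is one possible illustration, not an efficient factoring method: `smallest_factor` and `prime_factors` are names chosen here, and trial division is assumed as the way to find the factor \(a.\) The recursive calls on \(a\) and \(b\) play the role of the induction hypothesis, since both are smaller than \(n.\)

```python
def smallest_factor(n):
    """Return the smallest integer a > 1 that divides n (n itself when n is prime)."""
    a = 2
    while a * a <= n:
        if n % a == 0:
            return a
        a += 1
    return n


def prime_factors(n):
    """Return a list of primes whose product is n, for an integer n >= 2."""
    a = smallest_factor(n)
    if a == n:
        return [n]          # Case 1: n is prime.
    b = n // a              # Case 2: n = a * b with 1 < a < n and 1 < b < n.
    return prime_factors(a) + prime_factors(b)


print(prime_factors(97))   # [97]
print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```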

11.5. Validating An Algorithm Using Induction

In this section, we’ll prove that the Euclidean Algorithm as described below correctly computes the greatest common divisor of two positive integers.

  • Task: Given the positive integers \(a\) and \(b\) with \(a > b\), compute the greatest common divisor (or "g.c.d.") of \(a\) and \(b.\) That is, compute the greatest integer that is a factor of both \(a\) and \(b.\)

    • Input: Two positive integers

    • Steps:

      1. Set \(a\) to the greater and \(b\) to the lesser of the two input values.

      2. Compute the remainder \(r\) when \(a\) is divided by \(b\) (using long division of integers, not "floating-point" decimals.)

      3. If \(r > 0\)

        1. Set \(a\) equal to \(b\)

        2. Set \(b\) equal to \(r\)

        3. Go to step 2

      4. Return the value stored in \(b.\)

    • Output: The greatest positive integer that is a factor of both input values.

Example 7 - The Euclidean Algorithm in Python

The code below implements the Euclidean Algorithm for positive integers a and b.
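A minimal Python sketch of the steps listed above follows. The function name `euclid_gcd` and the sample inputs are chosen here for illustration; the comments point back to the numbered steps of the algorithm description.

```python
def euclid_gcd(x, y):
    """Greatest common divisor of two positive integers, following the steps above."""
    a, b = max(x, y), min(x, y)   # Step 1: a is the greater value, b the lesser.
    r = a % b                     # Step 2: integer remainder of a divided by b.
    while r > 0:                  # Step 3: repeat while the remainder is nonzero.
        a = b
        b = r
        r = a % b                 # back to Step 2
    return b                      # Step 4: b holds the last nonzero remainder.


print(euclid_gcd(1071, 462))  # 21
```

For the inputs 1071 and 462 the remainders are 147, 21, and 0, so the loop body runs twice and the value returned is 21.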

First, we will prove a lemma (that is, a minor theorem) that we’ll use in the induction step of the main proof.

Lemma

If \(a\) and \(b\) are integers such that \(a > b > 0\) and the integers \(q\) and \(r\) satisfy \[ a = q \cdot b + r \text{ and } 0 \leq r < b \] then the set of positive integers that divide both \(a\) and \(b\) is the same as the set of positive integers that divide both \(b\) and \(r.\)

Proof

Notice that \(q\) is the quotient and \(r\) is the remainder that result when doing a long division of \(a\) by \(b.\)

Also notice that we can rewrite the equation \(a = q \cdot b + r\) in the equivalent form \(r = a - q \cdot b.\)

We can now prove the lemma.

  • If the integer \(c\) divides both \(a\) and \(b\) then there are integers \(a'\) and \(b'\) so that \(a = c \cdot a'\) and \(b = c \cdot b'.\) Substitute these two new expressions in the equation \(r = a - q \cdot b\) to get \begin{equation} \begin{aligned} r {} & = a - q \cdot b \\ & = (c \cdot a') - q \cdot (c \cdot b') \\ & = c \cdot a' - c \cdot (q \cdot b') \\ & = c \cdot (a' - q \cdot b') \end{aligned} \end{equation} which shows that \(c\) is also a divisor of \(r.\) Since we already assumed that \(c\) divides \(b,\) this means that \(c\) is a divisor of both \(b\) and \(r.\) Therefore, the set of positive integers that divide both \(a\) and \(b\) is a subset of the set of positive integers that divide both \(b\) and \(r.\)

  • If the integer \(k\) divides both \(b\) and \(r\) then there are integers \(b''\) and \(r''\) so that \(b = k \cdot b''\) and \(r = k \cdot r''.\) Substitute these two new expressions in the equation \(a = q \cdot b + r\) to get \begin{equation} \begin{aligned} a {} & = q \cdot b + r \\ & = q \cdot (k \cdot b'') + (k \cdot r'') \\ & = k \cdot (q \cdot b'') + k \cdot r'' \\ & = k \cdot (q \cdot b'' + r'') \end{aligned} \end{equation} which shows that \(k\) is also a divisor of \(a.\) Since we already assumed that \(k\) divides \(b,\) this means that \(k\) is a divisor of both \(a\) and \(b.\) Therefore, the set of positive integers that divide both \(b\) and \(r\) is a subset of the set of positive integers that divide both \(a\) and \(b.\)

So we have two subsets, each of which is a subset of the other, which means that the two sets must be equal. That is, the set of positive integers that divide both \(a\) and \(b\) is equal to the set of positive integers that divide both \(b\) and \(r;\) more plainly, the two set descriptions define the same set.

Q.E.D.

From the lemma we can conclude that the greatest common divisor of \(a\) and \(b\) is equal to the greatest common divisor of \(b\) and \(r.\)
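The lemma can be illustrated on a concrete pair of inputs. The Python sketch below builds both sets of common divisors by brute force and compares them; `common_divisors` is a helper named here for illustration, and the pair \(a = 1071,\) \(b = 462\) is an arbitrary example.

```python
def common_divisors(m, n):
    """Set of positive integers that divide both m and n (m > 0, n >= 0)."""
    return {d for d in range(1, m + 1) if m % d == 0 and n % d == 0}


a, b = 1071, 462
q, r = divmod(a, b)   # a = q * b + r with 0 <= r < b

same_set = common_divisors(a, b) == common_divisors(b, r)
print(q, r)                          # 2 147
print(sorted(common_divisors(a, b))) # [1, 3, 7, 21]
print(same_set)                      # True
```

Since the two sets are equal, their greatest elements are equal as well, which is the statement about greatest common divisors made above.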

We are now ready to prove the main result by induction.

Theorem

The Euclidean Algorithm correctly computes the greatest common divisor (g.c.d.) of the two positive integers \(a\) and \(b.\)

Proof

We use mathematical induction on the number of times \(n\) we must compute a new remainder (that is, the number \(n\) of iterations of the code block inside the loop), and prove that the algorithm computes the correct g.c.d. no matter what the value of \(n\) is.

Let \(P(n)\) be the predicate "If the loop executes \(n\) times, then the last nonzero remainder is the g.c.d. of the two initial inputs." We will prove that \((\forall n \in \mathbb{N})P(n)\) is True.

Basis Step: Notice that the number of iterations of the loop is \(n=0\) if and only if the value of \(r = a\%b\) is equal to \(0\), which is True if and only if \(b\) divides \(a.\) This means that the "last nonzero remainder" is \(b,\) and \(b\) is the greatest common divisor of \(a\) and \(b,\) which means that \(P(0)\) is True.

We can also prove that \(P(1)\) is True in case the proof of \(P(0)\) is unsatisfying. In the case when \(n=1,\) the loop executes \(1\) time, which means that \(a = q \cdot b + r\) and \(r\) is a nonzero divisor of \(b,\) so \(r\) is the g.c.d. of \(b\) and \(r,\) and we can use the lemma to conclude that \(r\) is also the g.c.d. of \(a\) and \(b.\) This proves that \(P(1)\) is True.

Induction Step: First, we assume that the induction hypothesis \(P(k)\) is True for some positive natural number \(k\).

Secondly, we will prove that the conditional \(P(k) \rightarrow P(k+1)\) must be True, which means we can use modus ponens (or the equivalent tautology \(( P(k) \land ( P(k) \rightarrow P(k+1) ) ) \rightarrow P(k+1)\)) to show that \(P(k+1)\) is also True.

Assume that \(P(k)\) is True for the positive integer \(k,\) that is, if the loop executes \(k\) times, then the last nonzero remainder is the g.c.d. of the two numbers we started with. We can assume \(k \geq 1\) since the cases when \(k \in \{ 0, 1 \}\) were proved in the Basis Step. Suppose that we have numbers \(a\) and \(b\) such that the loop executes \(k+1\) times in order to reach the last nonzero remainder: We need to prove that this last nonzero remainder is actually the g.c.d. of \(a\) and \(b.\) Now, notice that if we find the first remainder so that \(r = a - q \cdot b\) and \(0 < r < b,\) then the Euclidean Algorithm requires \(k\) loop iterations to find the last nonzero remainder for the pair of inputs \(b\) and \(r,\) and this is the same last nonzero remainder that is reached when starting from \(a\) and \(b.\) Therefore, we know from the induction hypothesis that the last nonzero remainder for the initial values \(b\) and \(r\) is the g.c.d. of \(b\) and \(r.\) Now apply the lemma to conclude that the g.c.d. of \(a\) and \(b,\) computed after \(k+1\) loop iterations, is equal to the g.c.d. of \(b\) and \(r,\) computed after \(k\) loop iterations. Therefore, \(P(k+1)\) is True, too. This proves that \(P(k) \rightarrow P(k+1)\).

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can use universal generalization to conclude that for all natural numbers \(n,\) if the loop executes \(n\) times then the last nonzero remainder is the g.c.d. of \(a\) and \(b.\) That is, the Euclidean Algorithm correctly computes the g.c.d. of \(a\) and \(b\) no matter how many loop iterations are required to compute the last nonzero remainder.

Q.E.D.

Notice that if \(a = f_{n+2}\) and \(b = f_{n+1}\) are consecutive Fibonacci numbers with \(n \geq 1,\) then the Euclidean Algorithm computes exactly \(n\) remainders (counting the final remainder, which is \(0\)) before it stops.
The French mathematician Gabriel Lamé proved in 1844 that if the Euclidean Algorithm computes \(n\) remainders (again counting the final remainder \(0\)) for two given positive integer inputs \(a\) and \(b\) (with \(a>b\)) then it must be True that both \(f_{n+2} \leq a\) and \(f_{n+1} \leq b.\) A proof by induction of Lamé’s theorem is given at this Wikipedia page.
The "worst-case complexity" for the Euclidean Algorithm is described at that webpage as well: The Euclidean Algorithm is \(O(\log_{\phi}(b)).\) (Complexity and "big \(O\)" notation are discussed in the Rates of Growth of Functions chapter.)
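The behavior of the Euclidean Algorithm on consecutive Fibonacci numbers can be observed directly. The Python sketch below counts remainder computations (division steps), including the final one that produces the remainder \(0;\) note that this count is one more than the number of times the loop body runs. The function name `division_steps` and the range \(n = 1, \ldots, 27\) are choices made here for illustration.

```python
def division_steps(a, b):
    """Count the remainders computed by the Euclidean Algorithm for a > b > 0."""
    steps = 1            # the first remainder computation
    r = a % b
    while r > 0:
        a, b = b, r
        r = a % b
        steps += 1       # one more remainder computed
    return steps


fibs = [0, 1]
while len(fibs) < 30:
    fibs.append(fibs[-1] + fibs[-2])   # f_0, ..., f_29

# For n = 1, ..., 27 compare n with the count for the inputs f_(n+2) and f_(n+1).
counts_match = all(
    division_steps(fibs[n + 2], fibs[n + 1]) == n for n in range(1, 28)
)
print(division_steps(5, 3))  # 3, since 5 = f_5 and 3 = f_4
print(counts_match)          # True
```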

11.6. Exercises

  1. Prove by induction:

    1. For all \(n ≥ 1, \) \(1+2+3+\ldots+n=\displaystyle\sum_{i=1}^{n}i=\displaystyle \frac{n\left(n+1\right)}{2}\)

    2. For all \(n ≥ 1, \) \(1^2+2^2+3^2+\ldots+n^2=\displaystyle\sum_{i=1}^{n}i^2=\frac{1}{6} n (n+1) (2 n+1)\)

    3. For all \(n ≥ 1, \) \(1^3+2^3+3^3+\ldots+n^3=\displaystyle\sum_{k=1}^{n}k^3=\frac{1}{4} n^2 (n+1)^2\)

    4. For all \(n ≥ 1, \) \({23}^n-1\) is divisible by 11.

  2. Prove by induction that \(n^2+n = n(n+1)\) is even for all integers \(n ≥ 1\).

  3. Find an appropriate \(N \in \mathbb{Z}\), and prove by induction that \(n^3 +3n^2\) is even for all \(n ≥ N\).

  4. Find an appropriate \(N \in \mathbb{Z}\), and prove by induction that \(n^3 +2n\) is divisible by 3 for all \(n ≥ N\). (Hint: You may use the result that \(n(n+1)\) is even for every integer \(n\).)

  5. Prove by induction that \(7\) divides \(2^{4n+2} + 3^{2n+1}\) for all nonnegative integers \(n\).

  6. Prove that for any \(n ≥ 1\) and \(x ≥ 0\) that \(\left(1+x\right)^n\geq1+nx\).

  7. For all \(n ≥ 5\), prove that \(n^2 < 2^n.\)

  8. Graph \(n!\) and \(2^n\), and then prove by induction that \( 2^n < n!\) for \(n>3\).

  9. Graph \(n^3\) and \(5n+12\), and then use your graph to find an appropriate \(N \in \mathbb{Z}\) to prove by induction that \(5n+12 < n^3\) whenever \(n>N\).

  10. Prove by induction that a set \(A\) with cardinality \(|A|=n\) has \(2^n\) subsets.

  11. Prove by induction that there are \(3^n\) numbers in base 3 (using the digits 0 ,1, 2) made up of \(n\) digits.

  12. Prove by induction that there are \(4^n\) numbers in base 4 (using the digits 0 ,1, 2, 3) made up of \(n\) digits.

  13. State the principle of mathematical induction using a conditional logical statement.

  14. Consider the sequence defined recursively as \[a_1=1,a_2=5, \text{ and } a_n=5a_{n-1}-6a_{n-2} \text{ for } n \geq 3\]

    1. Calculate the first eight terms of the recursive sequence.

    2. Prove by induction that the closed-form formula for the sequence is \(a_{n} = 3^{n} - 2^{n}.\)
      (Hint: You can use the fact that \(2\) and \(3\) are the solutions of the quadratic equation \(x^{2} = 5x - 6.\))

  15. Consider the sequence defined recursively as \[a_1=1 \text{ and } a_n=2a_{n-1}+n \text{ for } n \geq 2\]

    1. Calculate the first eight terms of the recursive sequence.

    2. Prove by induction that the recursive sequence is given by the formula \(a_n=4\cdot 2^{n-1}-n-2\).

  16. Recall that the Fibonacci numbers are defined by the following recurrence relation: \[f_{0}=0, \, f_{1}=1, \text{ and } f_{n} = f_{n-1} + f_{n-2} \text{ for } n \geq 2.\]

    1. Prove by induction that \[ f_0+f_1+f_2+\ldots+f_n= f_{n+2}-1. \]

    2. Prove by induction that \[ f_0^2+f_1^2+f_2^2+ \cdots + f_n^2 =f_n \cdot f_{n+1}. \]

12. Rates of Growth of Functions

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on April 14, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

You have seen that some tasks can be completed by more than one algorithm. Two questions to ask are

  1. "How do you choose which algorithm to use?"

  2. "Why is it important to make such a choice?"

This chapter will discuss tools you can use to help answer these questions. In particular, ways of comparing the rates of growth of functions will allow us to compare how two algorithms perform as the size of their input increases with no upper bound.

Key terms and concepts covered in this chapter:

  • Complexity

  • Big-\(\Theta\) notation

  • Big-\(O\) notation

12.1. Complexity of Algorithms

In order to implement an algorithm, there are issues of the space needed to do the work and the time needed to complete all the steps.

For example, imagine that you are asked to complete a few Algebra homework exercises by hand, and each exercise involves solving linear equations by hand using paper and pencil. Now suppose that "few" means "one hundred" and that each linear equation involves multiple steps to solve like the one below. \[ \text{Exercise 1. Solve for } x \text{: } 40(x+6)-9(3x+5) = 65(x+7)-(7x+132)\] It is not difficult to solve linear equations like this one because it is clear what steps you need to use…​ but it is tedious and will likely require a lot of paper! That is, it will consume a lot of time and space to solve even the first one of these equations, and you’ll have only ninety-nine more to do after that!

In this textbook, the focus will be on time complexity and asymptotics, that is, the comparison of the time needed by the algorithms as the size of the input becomes larger and larger without any bound.

12.2. The Order of a Function and Big Theta Notation

In this section, we will define a relation that describes what it means to say that "two functions grow at the same rate." More precisely, "two functions grow at the same rate, asymptotically, as the input variable grows without an upper bound." We will also introduce big \(\Theta\) notation.
Note: \(\Theta\) is the uppercase Greek letter "Theta."

Definition

Suppose that \(f\) and \(g\) are two functions, both having domain \(\mathbb{R}\) and codomain \(\mathbb{R}\). Define the relation f has the exact same order as g to mean that

There exist positive real number constants \(A,\) \(B,\) and \(x_{0}\) so that \[ A|g(x)| \leq |f(x)| \leq B|g(x)| \text{ for all } x > x_{0}.\]

The constants \(A,\) \(B,\) and \(x_{0}\) are called witnesses: The constants confirm the "has the exact same order as" relationship between the two functions.
The Remix’s definition of "has the exact same order as" is based on notations and descriptions proposed by Donald Knuth in the 1976 letter "Big Omicron and Big Omega and Big Theta" published in ACM SIGACT News.

Illustration: Two functions that have the exact same order

Consider the functions \(f\) and \(g,\) each with domain \(\mathbb{R}\) and codomain \(\mathbb{R},\) described by the rules \[ f(x) = x + \sin(x), \, g(x) = x.\] The function g is linear and the function f is not, but f can be thought of as asymptotically linear in the sense that its growth is more like that shown in a straight line plot of a linear function than, say, a parabolic plot for a quadratic function or a square root function. This is what we are describing with the relation "has the exact same order as."

Let’s plot the graphs of f and two constant multiples of g to illustrate what the relation "has the exact same order as" means.

BigThetaLinear010

The preceding image shows the plots of the graphs of the functions \(f,\) \(0.9g,\) and \(1.1g.\) Notice that for x between -10 and 10, the corresponding y output for f is sometimes between the two straight lines, sometimes above both lines, and sometimes below both lines.

BigThetaLinear010

However, if we zoom out, it appears that the plotted graph of \(f\) lies between the two straight lines for all values of the input x that have a large absolute value. In fact, the image seems to indicate that \[ 0.9g(x) \leq f(x) \leq 1.1g(x) \text{ for all } x \geq 100.\] That is, the image suggests that f has the exact same order as g. Notice that we could prove, more rigorously, that the inequality is true, by using some algebra and the fact that the sine function’s outputs are in the interval \([-1, \, 1]\), so the functions have the exact same order. The two functions \(f\) and \(g\) grow at the same rate, asymptotically, as the input variable grows without an upper bound.
It appears from the zoomed-out plot that we could choose a value less than 100 for the bound \(x_{0},\) but the definition of "has the exact same order as" does not require us to find optimal values of any of the constants \(A,\) \(B,\) and \(x_{0}.\) In fact, if there exists at least one ordered triple \(( A, \, B, \, x_{0})\) that witnesses the "has the exact same order as" relation between two functions, then there must be infinitely many other ordered triples that witness the same relationship. As an example, we’ve used the ordered triple \(( 0.9, \, 1.1, \, 100)\) for the two multipliers and the lower bound for inputs, but we could use the same two multipliers and any value greater than 100 for \(x_{0}\) instead. Also, the triple \(( \frac{1}{4}, \, 4, \, 0)\) witnesses the same relationship since \[ \frac{1}{4}g(x) \leq f(x) \leq 4g(x) \text{ for all } x \geq 0.\]
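The witnesses discussed above can be tested at sample points. The Python sketch below checks the inequality \(A|g(x)| \leq |f(x)| \leq B|g(x)|\) on a finite grid of inputs, which can support or refute a choice of witnesses but cannot prove the inequality for all \(x > x_{0};\) the helper name `witnessed` and the sample grids are assumptions made here for illustration.

```python
from math import sin


def f(x):
    return x + sin(x)


def g(x):
    return x


def witnessed(A, B, xs):
    """Check A*|g(x)| <= |f(x)| <= B*|g(x)| at each sample point in xs."""
    return all(A * abs(g(x)) <= abs(f(x)) <= B * abs(g(x)) for x in xs)


samples_from_100 = [100 + 0.5 * i for i in range(2000)]  # 100.0 up to 1099.5
samples_from_0 = [0.5 * i for i in range(1, 2000)]       # 0.5 up to 999.5

print(witnessed(0.9, 1.1, samples_from_100))  # True
print(witnessed(0.25, 4, samples_from_0))     # True
print(witnessed(0.9, 1.1, [1.0, 5.0]))        # False: the bounds fail for small x
```

The last line shows why the bound \(x_{0}\) matters: the multipliers \(0.9\) and \(1.1\) do not work at \(x = 1,\) where \(f(1) \approx 1.84.\)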

BigThetaLinear010_alt

We have the following theorem about the "has the exact same order as" relation.

Theorem

For any functions \(f,\) \(g,\) and \(h\) with domain \(\mathbb{R}\) and codomain \(\mathbb{R},\)

(1) \(f\) has the exact same order as \(f,\)
(2) if \(f\) has the exact same order as \(g,\) then \(g\) has the exact same order as \(f,\)
(3) if \(f\) has the exact same order as \(g,\) and \(g\) has the exact same order as \(h,\) then \(f\) has the exact same order as \(h.\)

Proof
For statement (1), choose any positive real number \(x_{0}\) together with \(A = 1\) and \(B = 1\) as witnesses. Since \[1 \cdot |f(x)| \leq |f(x)| \leq 1 \cdot |f(x)| \text{ for all } x > x_0\] must be True, \(f\) has the exact same order as \(f.\) Notice that we could have used other values for the witnesses \(A\) and \(B\) such as \(A = 0.99\) and \(B = 1.01.\)

For statement (2), assume that \(f\) has the exact same order as \(g,\) so there are positive real number constants \(A,\) \(B,\) and \(x_{0}\) such that \[A|g(x)| \leq |f(x)| \leq B|g(x)| \text{ for all } x > x_{0}.\] Notice that the extended inequality above can be broken into the two inequalities \[A|g(x)| \leq |f(x)| \text{ and } |f(x)| \leq B|g(x)|\] which are both True for all \(x > x_{0}.\) The two inequalities can be rewritten as \[|g(x)| \leq \frac{1}{A}|f(x)| \text{ and } \frac{1}{B}|f(x)| \leq |g(x)|\] which shows that \[\frac{1}{B}|f(x)| \leq |g(x)| \leq \frac{1}{A}|f(x)| \text{ for all } x > x_{0}.\] The last extended inequality above shows that \(g\) has the exact same order as \(f.\)

For statement (3), assume both that \(f\) has the exact same order as \(g\) and that \(g\) has the exact same order as \(h.\) This means that there are positive real number constants \(A,\) \(B,\) and \(x_{0}\) such that \[A|g(x)| \leq |f(x)| \leq B|g(x)| \text{ for all } x > x_{0}\] and also positive real number constants \(C,\) \(D,\) and \(x_{1}\) such that \[C|h(x)| \leq |g(x)| \leq D|h(x)| \text{ for all } x > x_{1}.\] By breaking up the extended inequalities, then doing some algebra and recombining inequalities, you can get \[AC|h(x)| \leq A|g(x)| \leq |f(x)| \text{ and } |f(x)| \leq B|g(x)| \leq BD|h(x)|\] which are True for all \(x > \max(x_{0}, x_{1}).\) So \[AC|h(x)| \leq |f(x)| \leq BD|h(x)| \text{ for all } x > \max(x_{0}, x_{1})\] which shows that \(f\) has the exact same order as \(h,\) witnessed by the constants \(AC,\) \(BD,\) and \(\max(x_{0}, x_{1}).\)

These three properties let you conclude that the "has the exact same order as" relation is an equivalence relation, so the relation partitions the set \(S = \{ f \, | \, f \text{ is a function with domain and codomain } \mathbb{R} \}\) into disjoint sets. For each function \(g \in S\) we can define \(\Theta(g)\) to be the equivalence class \[ \Theta(g) = \{ f \, | \, f \text{ has the exact same order as } g \} \] Every function with domain and codomain \(\mathbb{R}\) is an element of at least one of the \(\Theta(g)\) and for any two functions \(g\) and \(h,\) the sets \(\Theta(g)\) and \(\Theta(h)\) must either be equal or have empty intersection. For example, the earlier example shows that \(\Theta(x + \sin x)\) and \(\Theta(x)\) are the same set, so we can say that the function \(f(x) = x + \sin x\) is of linear order.

Mathematicians and computer scientists are very different beasts…​ well, they are all human but they have developed different cultures so they often use the same symbols in different ways.

A mathematician, like the author of the Remix, would write the very formal \(f \in \Theta(g)\) and state "f is an element of Theta g" to mean that "f has the exact same order as g." In the earlier example, a mathematician could abbreviate this a little bit and write "\(x + \sin(x)\) is in \(\Theta(x).\)"

Computer scientists have traditionally written this relation as \(f(x) = \Theta(g(x))\) and state "\(f(x)\) is big Theta of \(g(x)\)." In the earlier example, a computer scientist could write "\(x + \sin(x) = \Theta(x)\)." As a mathematician, I need to point out that the function f is not equal, in the mathematical sense, to the equivalence class containing g because it’s just one of the infinitely many functions in that equivalence class.

I believe that both mathematicians and computer scientists agree that Θ(g(x)) = f(x) is just too hideous a notation to use…​ so please do not ever, ever use it!

12.3. Big O notation

Traditionally, computer scientists are much more interested in the idea that "f grows at most at the rate of g". This corresponds to the second part of the inequality used to define big Theta in the previous section.

Definition

f is of order at most g means that there exist positive real number constants \(B\) and \(x_{0}\) so that \[ |f(x)| \leq B|g(x)| \text{ for all } x > x_{0}.\] This is usually stated (by computer scientists) as "\(f(x)\) is Big O of \(g(x)\)" and written as \(f(x) = O(g(x)).\)

[Figure: graph for \(x + \sin x\)]

Note that Big O only gives an upper bound on the growth rate of functions. That is, the function \(f(x) = x + \sin(x)\) with domain and range \(\mathbb{R},\) used in an earlier example, is \(O(x)\) but also is \(O(x^{2})\) and is \(O(2^{x}).\)

Big O is typically used to analyze the worst case complexity of an algorithm. If, for example, \(n\) is the size of the input, then big O really only cares about what happens in the "worst-case" when \(n\) becomes arbitrarily large. Mathematically, we want to consider time complexity in this asymptotic sense, when \(n\) is arbitrarily large, so we may ignore constants. That we can ignore constants will make sense after discussing how limits, borrowed from continuous mathematics (that is, calculus), can be used to compare the rates of growth of two different functions.

12.3.1. Common Complexities To Consider

The complexities most commonly used, stated in terms of the input size \(n\) and ordered from smallest to largest, are as follows.

  • Constant complexity: \(O(1)\)

  • Logarithmic complexity: \(O(\log (n))\)

  • Radical complexity: \(O(\sqrt{n})\)

  • Linear complexity: \(O(n)\)

  • Linearithmic complexity: \(O(n\log (n))\)

  • Quadratic complexity: \(O(n^2)\)

  • Cubic complexity: \(O(n^3)\)

  • Exponential complexity: \(O(b^n)\), \(b > 1\)

  • Factorial complexity: \(O(n!)\)

To understand how these complexities compare, we will look at the graphs of functions; it is easier to consider these functions as ones that are defined for any real value input instead of just the natural numbers. This will also allow us to use continuous mathematics (that is, calculus) to analyze and compare the growth of different functions.

Radical growth is larger than logarithmic growth:

[Figure: graphs comparing logarithmic growth and radical growth]

In the preceding graph, we’ve used \(\text{Log}[x]\) to label the graph of a logarithmic function without stating the base for the logarithm: Is this the function \(y = \log_{2}(x)\), \(y = \log_{10}(x)\), \(y = \ln(x) = \log_{e}(x)\), or a logarithm to some other base? For the purposes of studying growth of functions, it does not matter which of these logarithms we use: You may recall that one of the properties of logarithms states that for two different positive constant bases \(a\) and \(b\) we must have \(\log_{a}(x) = \log_{a}(b) \cdot \log_{b}(x)\), where \(\log_{a}(b)\) is also a constant. As stated earlier, we may ignore constants when considering the growth of functions.

Polynomial growth is larger than radical growth:

[Figure: graphs comparing radical growth and polynomial growth]

Exponential growth is larger than polynomial growth:

[Figure: graphs comparing polynomial growth and exponential growth]

Factorial growth is larger than exponential growth:

[Figure: graphs comparing exponential growth and factorial growth]

In the preceding graph, we’ve used \(x!\) to label the graph of the function \(y = \Gamma(x+1)\), where \(\Gamma\) is the Gamma function, which is defined and continuous for all positive real numbers, so \(y = \Gamma(x+1)\) is defined and continuous for all nonnegative real numbers \(x\). The Gamma function satisfies \(n! = \Gamma(n+1)\) for every \(n \in \mathbb{N}\). Further study of the Gamma function is beyond the scope of this textbook.

Using the graphical analysis of the growth of typical functions we have the following growth ordering, also presented graphically on a logarithmic scale graph.

Ordering of Basic Functions by Growth
\(1,\ \log n,\ \sqrt[3]{n},\ \sqrt{n},\ n,\ n^2,\ n^3,\ 2^n,\ 3^n,\ n!,\ n^n\)

[Figure: the basic functions plotted on a logarithmic scale]
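
The ordering above can be spot-checked numerically. The sketch below is only an illustration, since a finite table cannot prove an asymptotic claim; the helper name `growth_row` is hypothetical, and the natural logarithm is assumed for \(\log n\).

```python
import math

def growth_row(n):
    """Return the values of the basic functions at n, in the order listed above."""
    return [
        1,
        math.log(n),        # natural logarithm (an assumption; the base only changes a constant)
        n ** (1 / 3),       # cube root of n
        math.sqrt(n),
        n,
        n ** 2,
        n ** 3,
        2 ** n,
        3 ** n,
        math.factorial(n),
        n ** n,
    ]

for n in (10, 100):
    row = growth_row(n)
    # True when the values already appear in increasing order at this n.
    print(n, row == sorted(row))
```

For \(n = 10\) the check prints False, because \(\ln 10 \approx 2.30\) is still larger than \(\sqrt[3]{10} \approx 2.15\); for \(n = 100\) it prints True. The ordering describes what happens for sufficiently large \(n\), not for every \(n\).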

The asymptotic behavior of a function should be determined by its most dominant term for large inputs. For example, \(f(x)=x^{3} + 2x^{2}-2x\) is dominated by the term \(x^3\) for large \(x\): \(f(1000) = 1.001998 \times 10^9 \approx 1 \times 10^9 = 1000^3\). That is, \(f(x) \approx x^3\) for large \(x\), or asymptotically, \(f(x)\) behaves as \(x^3\). We write \(f(x)=O(x^3),\) that is, \(x^3 +2x^2-2x=O(x^3).\)

Likewise, if \(c\) is a nonzero constant, we want to say that \(c \cdot f(x)\) and \(f(x)\) have the same asymptotic behavior for large \(x\), that is, \(O(c \cdot f(x))=O(f(x))\).

Example 1

Show that \(f\left(x\right)=2x^2 +4x\) is \(O(x^2)\)

Solution

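While intuitively we may understand that the dominant term for large \(x\) is \(x^2\) so that \(f(x) = O\left(x^2\right)\), we show this formally by producing the witnesses \(A=3\) and \(n=4\) (these play the roles of \(B\) and \(x_{0}\) in the definition). When \(x > 4\) we have \(4x < x^2\), so \(2x^2 + 4x < 2x^2 + x^2 = 3x^2\), as illustrated in the following graph.

[Figure: graph supporting the witnesses \(A=3\) and \(n=4\)]

Example 2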

Show that \(f(x) =2x^3 +3x\) is \(O(x^3)\), with \(A=3\) and \(n=2\). Support your answer graphically.

Solution

Notice that \( x^3 > 3x\) when \( x \geq 2\). This means \(2x^3 +x^3 > 2x^3 +3x \) when \(x >2 \). In other words \( 3x^3 > 2x^3 +3x\) whenever \( x>2\), confirming \(A=3\) and \(n=2\) as witnesses, and supported by the following graph.

[Figure: graph supporting the witnesses \(A=3\) and \(n=2\)]
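
Witnesses like the ones in Examples 1 and 2 can also be sanity-checked numerically. The following is a minimal sketch, not a proof: it only tests the inequality \(|f(x)| \leq A|g(x)|\) at finitely many sample points, and the names `holds_on_samples` and `samples` are hypothetical helpers introduced for this illustration.

```python
def holds_on_samples(f, g, A, n, samples):
    """Check |f(x)| <= A*|g(x)| at every sample point x with x > n."""
    return all(abs(f(x)) <= A * abs(g(x)) for x in samples if x > n)

# Sample points 0.1, 0.2, ..., 1000.0
samples = [k / 10 for k in range(1, 10001)]

# Example 1: f(x) = 2x^2 + 4x with witnesses A = 3, n = 4
print(holds_on_samples(lambda x: 2 * x**2 + 4 * x, lambda x: x**2, 3, 4, samples))

# Example 2: f(x) = 2x^3 + 3x with witnesses A = 3, n = 2
print(holds_on_samples(lambda x: 2 * x**3 + 3 * x, lambda x: x**3, 3, 2, samples))

# A choice that fails for Example 1: A = 3 with n = 1 (try x = 2)
print(holds_on_samples(lambda x: 2 * x**2 + 4 * x, lambda x: x**2, 3, 1, samples))
```

The first two checks print True and the third prints False, since \(2 \cdot 2^2 + 4 \cdot 2 = 16 > 12 = 3 \cdot 2^2\). A passing check does not prove that the witnesses work for all \(x > n\), but a failing check does show that a proposed pair of witnesses is wrong.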

To show that a function \(f(x)\) is not \(O(g(x))\), we must show that no constant \(A\) can scale \(g(x)\) so that \(A g(x) \geq f(x)\) for all sufficiently large \(x\), as in the following example.

Example 3

Show that \( f(x) = x^2\) is not \( O( \sqrt{x})\).

Solution

Consider the graphs of \( \sqrt{x}\), \( 2 \sqrt{x}\), \( 3\sqrt{x}\), and the graph of \(x^2\). Notice that eventually, or for \(x\) large enough, \(x^2\) is larger than any \(A \sqrt{x}\) as in the figure below.

[Figure: graphs of \(\sqrt{x}\), \(2\sqrt{x}\), \(3\sqrt{x}\), and \(x^2\)]

Suppose \(A>1\) is given and fixed, then if \( f(x) = x^2\) is \( O(g(x))=O( \sqrt{x})\) , there is a corresponding \(n\), also fixed, for which \(A \sqrt{x} \geq x^2\) whenever \(x>n\).

We solve the inequality \(A \sqrt{x} \geq x^2\) by dividing both sides by \(\sqrt{x} =x^{1/2}\), to obtain \(A \geq x^{3/2}\).

But \(A\) is fixed and cannot be greater than all arbitrarily large \( x^{3/2}\). Hence no such \(n\) can exist for a given fixed \(A\).

For example, consider \(g(x)=A \sqrt{x}\) and \(f(x) =x^2\) with \(A>1\). When \(x= A^2\) we obtain \(g(A^2) = A \sqrt{A^2}= A^2\) and \(f(A^2) = \left(A^2\right)^2 = A^4\), so \(f(A^2)= A^4 > A^2 = g(A^2)\).

12.4. Properties of Big O notation

Suppose \(f(x)\) is \(O(F(x))\) and \(g(x)\) is \(O(G(x))\).

Properties of Big O Notation
  1. \(c \cdot f(x)\) is \(O(F(x))\) for any constant \(c\)

  2. \( f(x)+g(x)\) is \(O(\max \left ( F(x), G(x) \right ))\)

  3. \( f(x) \cdot g(x)\) is \(O(F(x) \cdot G(x))\)

We can use these properties to show for instance \( 2x^2\) is \(O\left(x^2\right)\). Likewise if \(f(x) =2x^2\) and \(g(x) =4x\), then \( 2x^2\) is \(O(x^2)\) and \( 4x\) is \(O(x)\), and the maximum gives that \(2x^2+4x\) is \( O(\max(x^2, x)) =O(x^2)\).

It is true in general that if a polynomial \(f(x)\) has degree \(n\) then \(f(x)\) is \(O(x^n)\).

Big O for Polynomials

\(p(x)=a_nx^n +a_{n-1}x^{n-1} +a_{n-2}x^{n-2}+\ldots +a_2x^2 +a_1x^1+a_0\) is \(O(x^n)\)

For example, \(f(x)= x^3+1\) is \(O(x^3)\) and \(g(x)=x^2-x\) is \(O(x^2)\), so by the product property \(f(x) \cdot g(x)\) is \(O(x^3 \cdot x^2) =O(x^5)\). This is verified explicitly by multiplying \(f(x) \cdot g(x)= (x^3+1) \cdot (x^2-x)= x^5 -x^4+x^2-x,\) which is a polynomial of degree 5 and therefore is \(O(x^5)\).

Example 4 - ordering by growth

Order the following functions by growth: \(n \cdot \log_2 n\), \(n^2\), \(n^{4/3}\)

Solution

Recall the ordering,

\(\log_2 n\), \(n^{1/3}\), and \(n\),

which is ordered by logarithmic, then radical, and then polynomial (or linear) growth.

Notice also that multiplying each by \(n\) preserves the order.

\(n \cdot \log_{2} n=n\times \log_{2} n\)

\(n^{4/3} =n \times n^{1/3}\)

\(n^2=n \times n\)

Then, using the original ordering \(\log_2 n\), \(n^{1/3}\), \(n\), we also obtain the ordering \(n \cdot \log_2 n\), \(n^{4/3}\), \(n^2\).

As a final example we consider ordering three functions by growth using the basic properties for Big O and the basic orderings.

Example 5

Find the Big O of each of the following and then rank by Big \(O\) growth:

\(f\left(x\right)=\left({3x}^3+x\right)2^x+\left(x+x!\right)x^4\)

\(g\left(x\right)=x^x(2^x+x^2)\)

\(h\left(x\right)=5x!+4x^3\log{x}\)

Solution

First consider \(f\left(x\right)\) and using the polynomial property observe that \(\left({3x}^3+x\right)\) is \(O(x^3)\). Using the multiplicative property, conclude that \(\left({3x}^3+x\right)2^x\) is \(O(x^32^x)\). Likewise using the sum property, \(\left(x+x!\right)\) is \(O\left(\max{\left(x,x!\right)}\right)= O (x!)\). Then using the multiplicative property, \(\left(x+x!\right)x^4\) is \(O (x^4x!)\). Then \(f\left(x\right)=\left({3x}^3+x\right)2^x+\left(x+x!\right)x^4\) is \(O\left(\max{\left(x^32^x,x^4x!\right)}\right)=O\left(x^4x!\right)\).

For \(g(x)\), notice using the maximum property for the sum, that \(2^x+x^2\) is \(O(2^x)\). Then using the multiplicative property, \(x^x(2^x+x^2)\) is \(O(2^xx^x)\).

For \(h\left(x\right)\), we want \(O\left(\max{\left(x!,\ x^3\log{x}\right)}\right)=O(x!)\). Notice here, that \(4x^3\log{x}\) is \(O(x^4)\), and \(x^4\) has smaller asymptotic growth than \(x!\). In fact, \(x^4\) is \(O(x!)\).

So, \(f(x)\) is \(O\left(x^4x!\right)\), \(g(x)\) is \(O\left(2^xx^x\right)\), and \(h(x)\) is \(O\left(x!\right)\).

We conclude that from an ordering perspective, we have by increasing growth order, \(h(x)\), \(f(x)\), and \(g(x)\). To convince yourself that \(g(x)\) grows faster than \(f(x)\), use the facts that \(2^x\) grows faster than \(x^4\), and \(x^x\) grows faster than \(x!\).

12.5. Using Limits to Compare the Growth of Two Functions (CALCULUS I REQUIRED!)

In general, the Remix avoids using calculus methods because calculus is part of continuous mathematics, not discrete mathematics. However, it can be useful to use calculus to compare the growth of two functions \(f(x)\) and \(g(x)\) that are defined for real numbers \(x\), are differentiable functions on the interval \((0,\, \infty)\), and satisfy \(\lim_{x \to \infty} f(x) = \lim_{x \to \infty} g(x) = \infty\). To avoid needing to use the absolute value, we can assume that \(0 < f(x)\) and \(0 < g(x)\) for all \(x \geq 0\) (This assumption is safe to make since both functions go to infinity as \(x\) increases without bound, which means that both functions are positive for all \(x\) values greater than or equal to some number \(x_{0}\)…​ we are just assuming that \(x_{0}=0\) which is the equivalent of shifting the plots of \(f\) and \(g\) to the left by \(x_0\) units.)

If \(f(x)\) and \(g(x)\) are such functions and \(\lim_{x \to \infty} \frac{f(x)}{g(x)} = L\), where \(0 \leq L < \infty\), then \(f(x)\) is \(O(g(x)),\) and if \(0 < L < \infty\) then \(f(x)\) is \(\Theta(g(x)).\)

To see this, recall that \(\lim_{x \to \infty} \frac{f(x)}{g(x)} = L\) means that we can make the value of \(\frac{f(x)}{g(x)}\) be as close to \(L\) as we want by choosing \(x\) values that are sufficiently large. First suppose that \(L > 0.\) Then we can make \(L-\frac{L}{2} < \frac{f(x)}{g(x)} < L+\frac{L}{2}\) be true for all \(x\) greater than some real number \(x_{0}\). Now we can use the earlier stated assumption that \(0 < g(x)\) to rewrite the inequality as \((L-\frac{L}{2}) \cdot g(x) < f(x) < (L+\frac{L}{2}) \cdot g(x)\), which is true for all \(x >x_{0}\). We can choose for our witnesses \(B = L + \frac{L}{2}\) and \(x_{0}.\) This means that \(f(x) < B \cdot g(x)\) whenever \(x > x_{0},\) which shows that \(f(x)\) is \(O(g(x))\). Furthermore, we can choose \(A = L - \frac{L}{2}\) as a witness for the lower bound, too, which means that \( A \cdot g(x) < f(x) < B \cdot g(x)\) whenever \(x > x_{0},\) so \(f(x)\) is \(\Theta(g(x))\). If instead \(L = 0,\) the interval above is empty, but we can still make \(\frac{f(x)}{g(x)} < 1\) be true for all \(x\) greater than some real number \(x_{0}\), so \(f(x) < 1 \cdot g(x)\) whenever \(x > x_{0},\) and the witnesses \(B = 1\) and \(x_{0}\) show that \(f(x)\) is \(O(g(x))\).

Note that using this method does not focus on determining the actual numerical values of the witnesses but just guarantees that the witnesses exist, which is all that is needed to show that \(f(x)\) is \(O(g(x))\).

Example 6

Show that \(100,000 n + n \cdot log (n)\) is \(O(n \cdot log (n))\).

Solution

Notice that the expressions \(100,000 x + x \cdot log (x)\) and \(x \cdot log (x)\) can be used to define differentiable functions on the interval \((0,\, \infty)\). We changed the variable from \(n\) to \(x\) to stress that we are treating the variable as a real number in this example. Also, we will assume that \(log (x)\) is the natural logarithm; as mentioned earlier, any other base for the logarithm results in a constant multiple of the natural logarithm and will not affect the Big-\(O\) computations.

Let \(f(x) = 100,000 x + x \cdot log (x)\) and \(g(x) = x \cdot log (x)\). It is easy to see that \(\lim_{x \to \infty} f(x) = \lim_{x \to \infty} g(x) = \infty\).

Now let’s compute \(\lim_{x \to \infty} \frac{f(x)}{g(x)}\), that is, \(\lim_{x \to \infty} \frac{100,000 x + x \cdot log(x)}{x \cdot log (x)}\). Direct computation gives the indeterminate form \(\frac{\infty}{\infty}\), so we can use L’Hôpital’s rule to write \(\lim_{x \to \infty} \frac{100,000 x + x \cdot log(x)}{x \cdot log(x)} = \lim_{x \to \infty} \frac{100,000 + (1 \cdot log (x) + x \cdot \frac{1}{x})}{1 \cdot log(x) + x \cdot \frac{1}{x}} = \lim_{x \to \infty} \frac{100,000 + log (x) + 1}{log(x) + 1}\). This limit still gives us an indeterminate form if we try to directly find the limits of the numerator and denominator separately without some simplification, but we can divide both numerator and denominator by \(log (x)\) to rewrite the last limit as the equivalent limit \(\lim_{x \to \infty} \frac{\frac{100,001}{log (x)} + 1}{1 + \frac{1}{log(x)}} = \frac{0+1}{1+0} = 1\). Since the limit is a positive finite number, \(100,000 x + x \cdot log (x)\) is \(\Theta(x \cdot log (x)),\) which means that it is also \(O(x \cdot log (x)).\) As mentioned above, we do not need to find the actual values of the witnesses when using this limit method.
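
As a cross-check on the value of this limit, the same ratio can also be simplified algebraically before taking the limit, which avoids L’Hôpital’s rule altogether. This is only an alternative sketch of the computation, valid for \(x > 1\) so that \(\log(x) \neq 0\):

\[\frac{100,000 x + x \cdot \log(x)}{x \cdot \log(x)} = \frac{100,000}{\log(x)} + 1 \longrightarrow 0 + 1 = 1 \text{ as } x \to \infty.\]

The simplified form also shows how slowly the ratio approaches 1: it is still equal to 2 when \(\log(x) = 100,000.\) The limit describes behavior for arbitrarily large \(x\), not for the input sizes you are likely to meet in practice.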

12.6. Exercises

  1. Give Big O estimates for

    1. \(f\left(x\right)=4\)

    2. \(f\left(x\right)=3x-2\)

    3. \(f\left(x\right)=5x^6-4x^3+1\)

    4. \(f\left(x\right)=2\sqrt{x}+5\)

    5. \(f\left(x\right)=x^5+4^x\)

    6. \(f\left(x\right)=x\log{x}+3x^2\)

    7. \(f\left(x\right)=5x^2e^x+4x!\)

    8. \(f\left(x\right)=\displaystyle \frac{x^6}{x^2+1}\) (Hint: Use long division.)

  2. Give Big O estimates for

    1. \(f\left(x\right)=2^5\)

    2. \(f\left(x\right)=5x-2\)

    3. \(f\left(x\right)=5x^8-4x^6+x^3\)

    4. \(f\left(x\right)=4\sqrt[3]{x}+3\)

    5. \(f\left(x\right)=3^x+4^x\)

    6. \(f\left(x\right)=x^2\log{x}+5x^3\)

    7. \(f\left(x\right)=5x^6 \cdot 10^x+4x!\)

    8. \(f\left(x\right)=\displaystyle \frac{x^5+2x^4-x+2}{x+2}\) (Hint: Use long division.)

  3. Show, using the definition, that \(f\left(x\right)=3x^2+5x\) is \(O(x^2)\) with \(A=4\) and \(n=5\). Support your answer graphically.

  4. Show, using the definition, that \(f\left(x\right)=x^2+6x+2\) is \(O(x^2)\) with \(A=3\) and \(n=6\). Support your answer graphically.

  5. Show, using the definition, that \(f\left(x\right)=2x^3+6x^2+3\) is \(O(x^3)\). State witnesses \(A\) and \(n\), and support your answer graphically.

  6. Show, using the definition, that \(f\left(x\right)=3x^3+10x^2+1000\) is \(O(x^3)\). State the witnesses \(A\) and \(n\), and support your answer graphically.

  7. Show that \(f\left(x\right)=\sqrt{x}\) is \(O\left(x^3\right)\), but \(g\left(x\right)=x^3\) is not \(O(\sqrt{x})\).

  8. Show that \(f\left(x\right)= x^2\) is \(O\left(x^3\right)\), but \(g\left(x\right)=x^3\) is not \(O(x^2)\).

  9. Show that \(f\left(x\right)=\sqrt{x}\) is \(O\left(x\right)\), but \(g\left(x\right)=x\) is not \(O(\sqrt{x})\).

  10. Show that \(f\left(x\right)=\sqrt[3]{x}\) is \(O\left(x^2\right)\), but \(g\left(x\right)=x^2\) is not \(O(\sqrt[3]{x})\).

  11. Show that \(f\left(x\right)=\sqrt[3]{x}\) is \(O\left(x\right)\), but \(g\left(x\right)=x\) is not \(O(\sqrt[3]{x})\).

  12. Order the following functions by growth: \(x^\frac{7}{3},\ e^x,\ 2^x,\ x^5,\ 5x+3,\ 10x^2+5x+2,\ x^3,\ \log{x},\ x^3\log{x}\)

  13. Order the following functions by growth from slowest to fastest: \(3x!,\ {10}^x,\ x\cdot\log{x},\ \log{x}\cdot\log{x},\ 2x^2+5x+1,\ \pi^x,\ x^\frac{3}{2},\ 4^5,\ \sqrt{x}\cdot\log{x}\)

  14. Consider the functions \(f\left(x\right)=2^x+2x^3+e^x\log{x}\) and \(g\left(x\right)=\sqrt x+x\log{x}\). Find the best big \(O\) estimates of

    1. \((f+g)(x)\)

    2. \((f\cdot\ g)(x)\)

  15. Consider the functions \(f\left(x\right)=2x+3x^3+5\log{x}\) and \(g\left(x\right)=\sqrt x+x^2\log{x}\). Find the best big \(O\) estimates of

    1. \((f+g)(x)\)

    2. \((f\cdot\ g)(x)\)

  16. State the definition of "\( f(x)\) is \( O(g(x))\)" using logical quantifiers and witnesses \(A\) and \(n\).

  17. Negate the definition of "\( f(x)\) is \( O(g(x))\)" using logical quantifiers, and then state in words what it means that \( f(x)\) is not \( O(g(x))\).

13. Algorithms and Their Analysis

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on April 14, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

An algorithm is a step-by-step process, defined by a set of instructions to be executed sequentially, that is used to compute a value or solve a problem. Algorithms are used in discrete mathematics to perform numerical computations, to sort lists, to do searches, insertions, and deletions of items in data structures, to solve optimization problems, and more.
The word algorithm is derived from the name of Abū ‘Abd Allāh Muḥammad ibn Mūsā al-Khwārizmī. Western Europeans learned how to do arithmetic with Hindu-Arabic numerals and the base-ten place-value system from a 12th-century AD Latin translation of one of al-Khwārizmī’s books. (The Latin translation was done about 300 years after al-Khwārizmī wrote the original book in Arabic; the original is now lost.) By the way, the word algebra is derived from the title of another one of al-Khwārizmī’s books.

In this chapter, several algorithms are implemented in Python using non-optimal code to illustrate all steps needed. As mentioned in the About this text chapter, this was a deliberate choice because the code examples are designed for teaching the mathematical concepts and for neither presenting the most efficient implementations nor illustrating the "Pythonic" way of coding. However, a few code examples in this chapter do show how you can complete the same task by using some of Python’s built-in methods.

Key terms and concepts covered in this chapter:

  • Algorithm

    • Properties of an algorithm

  • Types of algorithm

    • Arithmetic

    • Search

    • Sorting

13.1. What Is An Algorithm? What Is Not An Algorithm?

There are several problem-solving strategies that humans use, but not all of these strategies are algorithms.

An algorithm should have the following properties, based on Donald E. Knuth’s description in his famous The Art of Computer Programming, Volume 1: Fundamental Algorithms, 3rd Edition:

Definiteness

Each step must be precisely defined. The actions taken at each step must be unambiguously specified.

Effectiveness

Each step must be so basic that it can be done exactly and in a finite length of time.

Finiteness

The process must terminate after a finite number of steps (although the number of steps may be a very large natural number.)

Input

Zero or more input quantities are taken from a specified set. Inputs can be introduced before the initial step of the algorithm or dynamically before any later step of the algorithm.

Output

The algorithm produces one or more outputs that have a specified relation to the inputs. Furthermore, the algorithm should produce the correct outputs for each set of inputs.

13.1.1. An Example: Finding the Minimum in a List of Integers

Consider the following algorithm for finding the minimum element in a finite sequence of integers.

  • Task: Given a finite list of integers, find the minimum value in the list.

  • Input: A finite list of integers L.

  • Steps:

    1. Define a variable \(min\) and set its value to the initial value in the list. Set the current index to 0, the index of that initial value.

    2. While there are values in the list that have not been examined,

      1. Increase the index by 1.

      2. If the value of the element at the current index is less than the value of \(min,\) set the value of \(min\) to the value of the element at the current index.

  • Output: The value of \(min.\)

Does this satisfy all the requirements for an algorithm? Each step is precisely defined and can be done exactly and in a finite length of time. The process terminates after a finite number of steps (when \(min\) has been compared to the element at the highest index of the list.) The input and output are specified. This process satisfies the requirements for an algorithm.

A Python implementation of this algorithm is given below.

Example 1 - Minimum in Python

The Python code below uses a while loop to implement an algorithm that finds the minimum in a list of integers.
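
A minimal sketch of the steps listed above, assuming the list is non-empty. The function name `find_minimum` and the variable name `minimum` (used instead of \(min\) to avoid hiding Python's built-in function) are choices made for this sketch.

```python
def find_minimum(L):
    """Return the minimum value in a non-empty list of integers."""
    minimum = L[0]              # step 1: start with the initial value in the list
    index = 0
    while index < len(L) - 1:   # step 2: values remain that have not been examined
        index = index + 1
        if L[index] < minimum:
            minimum = L[index]
    return minimum

print(find_minimum([7, 3, 9, -2, 5]))  # prints -2
```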

Note that Python provides a built-in function to find the minimum of a list: The minimum value of the list L can be computed using min(L).

13.2. Arithmetic Algorithms

Historically, algorithms first arose as ways to solve arithmetic problems. In this section, algorithms for such operations are presented.

13.2.1. Division By Repeated Subtraction

A simple division algorithm is presented in this subsection.

Example 2 - Division of Integers by Repeated Subtraction

The code below implements integer division of the positive integer a by the positive integer b. This algorithm is very ancient, dating back to at least Euclid’s Elements.
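
A minimal sketch of division by repeated subtraction, assuming that both a and b are positive integers. The function name `divide_by_subtraction` is a choice made for this sketch.

```python
def divide_by_subtraction(a, b):
    """Return (quotient, remainder) for positive integers a and b."""
    quotient = 0
    remainder = a
    # Subtract the divisor until what is left is smaller than the divisor.
    while remainder >= b:
        remainder = remainder - b
        quotient = quotient + 1
    return quotient, remainder

print(divide_by_subtraction(17, 5))  # prints (3, 2)
```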

Big-O Analysis Of Division By Repeated Subtraction

Notice that the "division by repeated subtraction" algorithm has a worst-case scenario when the divisor b equals 1. In this worst-case scenario you must subtract the divisor \(b=1\) exactly \(a\) times to exit the loop, so "division by repeated subtraction" is \(O(a),\) that is, of at most linear order in the larger number \(a.\)

13.2.2. Long Division Of Natural Numbers

You can revise the previous algorithm to work faster by using place-value thinking.

Example 3 - Long Division

The code below implements integer division of the integer a by the positive integer b.
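
One way to sketch binary long division is shown here, assuming that a is a nonnegative integer and b is a positive integer. The function name `long_division` and the variable names are choices made for this sketch; it may differ in detail from other presentations of the algorithm.

```python
def long_division(a, b):
    """Return (quotient, remainder) for an integer a >= 0 and an integer b > 0."""
    # Shift the divisor left (multiply by 2) until it exceeds a, counting the shifts.
    shifted = b
    shifts = 0
    while shifted <= a:
        shifted = shifted << 1
        shifts = shifts + 1

    quotient = 0
    remainder = a
    # Undo the shifts one at a time; each step produces one binary digit of the quotient.
    while shifts > 0:
        shifted = shifted >> 1
        shifts = shifts - 1
        quotient = quotient << 1
        if remainder >= shifted:
            remainder = remainder - shifted
            quotient = quotient + 1
    return quotient, remainder

print(long_division(17, 5))  # prints (3, 2)
```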

This long division algorithm uses powers of two (instead of ten) but otherwise is like the standard algorithm, using decimal notation, that you may have learned in school.
Notice that the shift operators << and >> multiply or divide by 2 (>> gives only the quotient, without keeping track of the remainder.)

Big-O Analysis Of Long Division

The worst-case scenario corresponds to the divisor \(b=1,\) which requires you to shift as many times as there are binary digits in \(a.\) Notice that \(a\) is bounded above by 2 raised to the number of binary digits; this means that the long division algorithm is \(O(\log (a)).\)

Python provides built-in operators to find the quotient and remainder for variables a and b of type int: a//b is the quotient and a%b is the remainder. There is also the built-in function divmod(a,b) that returns an ordered pair (of data type tuple) containing both the quotient and the remainder, but divmod is less efficient than // and % and should only be used when you need to find both the quotient and the remainder.

13.2.3. Greatest Common Divisors: The Euclidean Algorithm

It is often necessary to find the greatest common divisor of two integers. This next algorithm goes back to Euclid.

Example 4 - The Euclidean Algorithm

The code below implements the Euclidean algorithm for finding the greatest common divisor of the positive integers a and b.
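
A minimal sketch of the Euclidean algorithm, assuming that a and b are positive integers. The function name `euclid_gcd` is a choice made for this sketch.

```python
def euclid_gcd(a, b):
    """Return the greatest common divisor of the positive integers a and b."""
    while b != 0:
        remainder = a % b   # remainder when a is divided by b
        a = b               # the old divisor becomes the new dividend
        b = remainder       # the remainder becomes the new divisor
    return a

print(euclid_gcd(48, 18))  # prints 6
```

For the inputs 48 and 18 the successive remainders are 12, 6, and 0, so the last nonzero remainder, 6, is the greatest common divisor.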

Big-O Analysis Of The Euclidean Algorithm

The Euclidean Algorithm takes two positive integer inputs \(a\) and \(b\) with \(a > b.\) It can be proven using mathematical induction that the worst-case scenario for this algorithm is \(O(\log (b)).\) The proof uses the closed-form for the Fibonacci numbers.

Python’s math module includes a function gcd that will compute the greatest common divisor of two or more integers.

13.3. Search Algorithms

In the first two subsections you will see two algorithms for searching for a target integer within a list of integers. In the third subsection, Python’s built-in search methods are discussed.

RECOMMENDATION: The "Algorithms and Recursive Functions" activity can replace one or both of the first two subsections.

13.3.1. The Linear Search Algorithm

Linear search compares a target integer, t, to each element in a list of distinct integers, starting at index 0, and returns either the index i at which the target integer was found or a value indicating that the target integer was not found in the list.

A Python implementation of the linear search algorithm is given below.

Example 5 - Linear Search Algorithm in Python

The Python code below uses a while loop to implement the linear search algorithm. The code prints either the index at which the target was found in the list or the built-in constant None to indicate that the target was not found.
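
A minimal sketch of linear search using a while loop. The function name `linear_search` is a choice made for this sketch.

```python
def linear_search(L, t):
    """Return the index of target t in list L, or None if t is not in L."""
    index = 0
    while index < len(L):
        if L[index] == t:
            return index      # found the target, so stop searching
        index = index + 1
    return None               # every element was examined without finding t

print(linear_search([2, 4, 7], 7))  # prints 2
print(linear_search([2, 4, 7], 5))  # prints None
```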

Why not use -1 to indicate that the target integer was not found?

Negative integers can be valid indices for a Python list! This is very different than other languages like Java in which indices must be natural numbers. As an example, for the list \(L = \lbrack 2,4,7 \rbrack\) we have \(L \lbrack -1 \rbrack = 7,\) \(L \lbrack -2 \rbrack = 4,\) and \(L \lbrack -3 \rbrack = 2.\) If you are coding in Python it may be safer either to raise an exception or to use the built-in constant None to indicate that no index for the target was found.

The linear search algorithm iterates across a list of \(n\) data elements. If the first element in the list is the target element, the algorithm stops. Otherwise, it moves to the next element and continues in this way until either the target element is found or the end of the list is reached. If the target element is not in the search list, the algorithm exhaustively searches through every single element.

This is the worst case scenario with linear search in which the algorithm inspects every single element, either because the target element is the last element of the array, or the target element is not actually in the search list at all. The algorithm runs in \(O(n)\) time in the worst case.

13.3.2. The Binary Search Algorithm

The binary search algorithm searches a sorted list L of integers for a target value t. The algorithm starts looking for t in the middle of the sorted list. If t is greater than the value in the middle, the algorithm continues the binary search in the upper half of the list, otherwise the algorithm continues the binary search in the lower half of the list. The algorithm continues in this way until we reach a list of length 1 that either does or does not have t as its only element.

Example 6 - Binary Search Algorithm in Python

The binary search algorithm searches for a target element \(x\) in a list of \(n\) elements by comparing the middle element in the sorted data set with the target \(x\). The algorithm stops if the middle element \(a_m\) is the target element. Otherwise the search continues with half the data set: the half to the left if the middle element is larger than the target \(x\) or the half to the right if the middle element is smaller than the target.
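
A minimal iterative sketch of binary search, assuming that the list is already sorted in increasing order. The function name `binary_search` and the variable names `low`, `high`, and `middle` are choices made for this sketch.

```python
def binary_search(L, t):
    """Return an index of target t in the sorted list L, or None if t is not in L."""
    low = 0
    high = len(L) - 1
    while low <= high:
        middle = (low + high) // 2
        if L[middle] == t:
            return middle
        elif L[middle] < t:
            low = middle + 1      # continue in the half to the right
        else:
            high = middle - 1     # continue in the half to the left
    return None

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # prints 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))  # prints None
```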

The number of steps in the binary search then is the number of times we have to split the data set until we locate the target element, or determine that the target element is not in the search list after splitting down to 1 element.

The number of times we need to split the data set of size \(n\), in the worst case then, is \(p\) which is found by solving the exponential equation,

\(2^p = n\).

The algorithm then is \(O(p)\).

The solution of the exponential equation, \(2^p = n\), is in log form,

\(p=\log_2{n}\).

The binary search algorithm then is \(O(p)=O(\log{n})\).

13.3.3. Searching Within a List using Python

In this subsection you will see how to use Python to efficiently search lists.

Example 7 - Searching a List in Python

You can search for the index of a target value in a Python list by calling the list.index(x) method. This method returns the least natural number index of the target value if it is found in the list, otherwise it raises a ValueError.

If you need to know whether the target value x is in the list s but do not need the least index, you can use x in s which returns a Boolean.

If you need to know how many times the target value x occurs in the list s you can use the list.count(x) method.
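
The three operations described above can be tried out as follows. The list `s` and its values are made up for this illustration.

```python
s = [5, 3, 8, 3, 1]

print(s.index(3))   # prints 1, the least index at which 3 occurs
print(8 in s)       # prints True
print(s.count(3))   # prints 2

# index raises a ValueError when the target is not in the list.
try:
    s.index(42)
except ValueError:
    print("42 is not in the list")
```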

13.4. Sorting Algorithms

In this section you will see three algorithms for sorting a list of real numbers. Two of these algorithms, bubble sort and insertion sort, are inefficient but are presented here as in many other textbooks because they are easy to understand and analyze. The third algorithm, merge sort, is an efficient recursive algorithm.

In Python, the elements of a list L can be sorted into increasing order by calling the list.sort() method. This built-in method uses one of two sorting algorithms that won’t be discussed in this textbook: the Timsort algorithm in Python versions 2.3 to 3.10, or the Powersort algorithm in Python versions 3.11 to 3.12 (the current version as of this writing).
Python built-in sort() method

The code below uses Python’s built-in sort() method.
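
A short illustration of the built-in sort() method, using made-up lists `L` and `M`. It also shows the built-in function sorted, which returns a new list instead of changing the original.

```python
L = [5, 2, 9, 1, 7]
L.sort()            # sorts the list in place; the method itself returns None
print(L)            # prints [1, 2, 5, 7, 9]

M = [5, 2, 9, 1, 7]
print(sorted(M))    # prints [1, 2, 5, 7, 9]
print(M)            # prints [5, 2, 9, 1, 7] because sorted does not change M
```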

13.4.1. Bubble Sort

[Figure: an array with six integer values]

The bubble sort algorithm is a simple sorting procedure. It is typically used to sort a list of n data elements in either increasing or decreasing order.

NOTE: This algorithm is called "bubble sort" because "the lighter items bubble up to the top" of the list, closer to index 0, like bubbles in a drink.

WARNING: The bubble sort algorithm produces the correct result but is very inefficient. You should almost never use bubble sort in code that you write. In almost every application that requires sorting a list, there is an algorithm that can be used that is much more efficient than bubble sort. You have been warned!
In fact, most modern programming languages have built-in sort methods for you to use; these built-in methods are implemented using efficient algorithms.

We describe the bubble sort algorithm for arranging a list of \(n\) real numbers in increasing order.

  • The algorithm compares the first two elements of the list and swaps them if they are out of order.

  • It continues by traversing the list in order of increasing index, comparing each pair of adjacent elements and swapping them if they are out of order until we reach the last entry in the list at index \(n-1\).

  • The last entry in the list will then be the largest element of the original list.

  • After the largest element has been sorted into position \(n-1\), the algorithm continues by again comparing the first two elements and swapping if they are out of order.

  • Continue traversing the list and comparing and swapping adjacent elements that are out of order until position \(n-2\) of the array, after which the 2nd largest element is at index \(n-2\). The elements, now at indices \(n-1\) and \(n-2\) are sorted.

  • Continue to sort at indices \(n-3,\) then \(n-4,\) and so on, until all elements are in increasing order.

A Python implementation of the bubble sort algorithm is given below.

Python implementation of the Bubble Sort Algorithm

The code below uses two nested while loops to implement the Bubble Sort algorithm.
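The steps described above can be sketched as follows. This is a minimal illustration, not necessarily the textbook's own listing; the function name `bubble_sort` and the variable names `items`, `last`, and `i` are assumptions chosen for this sketch.

```python
def bubble_sort(values):
    """Return a new list with the items of values in increasing order."""
    items = list(values)       # sort a copy so the input list is unchanged
    last = len(items) - 1      # index at which the current pass ends
    while last > 0:
        i = 0
        while i < last:
            # Swap adjacent elements that are out of order.
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
            i += 1
        # The largest remaining element is now at index last.
        last -= 1
    return items


print(bubble_sort([5, 2, 9, 1, 7, 3]))   # [1, 2, 3, 5, 7, 9]
```

Each pass of the inner loop moves the largest remaining element to index `last`, so the outer loop can shrink the unsorted region by one position per pass.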

Big-O Analysis of Bubble Sort

We analyze the bubble sort algorithm beginning with a concrete list of size \(n=5\) and generalize the analysis.

Consider the case of a list of size \(n=5\). The naive bubble sort algorithm in this case will involve 4 passes.

In the first pass, there will be 4 comparisons and up to 4 swaps, after which the element in position 5 is in its correct position.

In the second pass, there will be 3 comparisons and up to 3 swaps, after which the element in position 4 is in its correct position.

In the third pass, there will be 2 comparisons and up to 2 swaps, after which the element in position 3 is in its correct position.

In the fourth pass, there will be 1 comparison and up to 1 swap, after which the elements in positions 1 and 2 are both in their correct positions.

Adding the comparisons from each pass we obtain,

\(4+3+2+1=1+2+3+4\).

In general, if the list is of size \(n\), there will be \(n-1\) passes, and the total number of comparisons is

\((n-1)+(n-2)+...+2+1 = 1+2+...+(n-2)+(n-1)\).

You can use mathematical induction to prove that \[1+2+\cdots+(n-2)+(n-1)= \frac{(n-1)\cdot n}{2}\] and since \(\frac{(n-1)\cdot n}{2} =\frac{1}{2}n^2-\frac{1}{2}n\), the bubble sort algorithm is \(O(n^2)\).
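As a quick numerical check of this count, the sketch below adds up the comparisons made in each pass and compares the total with \(\frac{(n-1)\cdot n}{2}\). The function name `count_comparisons` is an assumption made for this illustration.

```python
def count_comparisons(n):
    """Count the comparisons made by the naive bubble sort on a list of size n."""
    comparisons = 0
    last = n - 1
    while last > 0:
        comparisons += last    # a pass that ends at index last makes last comparisons
        last -= 1
    return comparisons


for n in (5, 10, 100):
    print(n, count_comparisons(n), (n - 1) * n // 2)
```

For \(n=5\) both values are \(10\), which matches \(4+3+2+1\) from the analysis above.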

13.4.2. Insertion Sort

The insertion sort works through a list and classifies two sections as sorted and unsorted.

  • The insertion sort scans through each element of the list using an outer loop with a variable, say \(i\).

  • At each stage, the list is divided into a sorted section, say the left section, and a section that is not sorted, say the right.

  • The location up to which the list is sorted is denoted by a pointer or index, called a key.

  • At the current stage, the next element from the unsorted section, on the right, is inserted into its appropriate position in the sorted section on the left.

  • Inserting the element into the sorted section on the left involves shifting larger elements to the right, using a variable, say \(j\).

A Python implementation of the Insertion Sort Algorithm is given below:

Example 8 - Insertion Sort in Python

The code below uses two nested while loops to implement the Insertion Sort algorithm.
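A minimal sketch of the idea described above is shown here; it is an illustration under stated assumptions rather than the textbook's own listing. The names `insertion_sort`, `items`, and `current` are chosen for this sketch, while `i` and `j` play the roles described in the bullet list.

```python
def insertion_sort(values):
    """Return a new list with the items of values in increasing order."""
    items = list(values)       # sort a copy so the input list is unchanged
    i = 1                      # items[0:i] is the sorted section
    while i < len(items):
        current = items[i]     # next element from the unsorted section
        j = i - 1
        # Shift larger elements of the sorted section one place to the right.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current  # insert into its position in the sorted section
        i += 1
    return items


print(insertion_sort([5, 2, 9, 1, 7, 3]))   # [1, 2, 3, 5, 7, 9]
```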

Big-O Analysis of Insertion Sort

It is left as an exercise to verify that the insertion sort algorithm is \(O(n^2)\).

13.4.3. Merge Sort

Merge Sort is a recursive sorting algorithm. The general idea is to divide a list of length \(n\) into \(n\) sublists of length \(1\), where a list of length one is considered already sorted. Subsequently, the algorithm repeatedly merges the sublists to make sorted lists until only a single list of length \(n\) is remaining. This will be a sorted list consisting of the elements of the original list.

The picture below illustrates this with a list of length seven.
Image credit: "Merge Sort Algorithm Diagram" by VineetKumar. This work has been released into the public domain by its author, VineetKumar at English Wikipedia. This applies worldwide.

GGC
Example 9 - The Merge Sort Algorithm in Python
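One possible recursive implementation is sketched below. It is an illustration, not necessarily the textbook's own listing; the helper name `merge` and the sample list of length seven are assumptions made for this sketch.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])    # at most one of these two slices is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(values):
    """Return a new sorted list using the recursive split-then-merge idea."""
    if len(values) <= 1:       # a list of length 0 or 1 is already sorted
        return list(values)
    middle = len(values) // 2
    return merge(merge_sort(values[:middle]), merge_sort(values[middle:]))


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))   # [3, 9, 10, 27, 38, 43, 82]
```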
Big-O Analysis of Merge Sort

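In the first part of Merge Sort, the list of size \(n\) is recursively split in half until the sublists have length \(1\). As with Binary Search, which is \(O(\log{n})\), this takes about \(\log_2{n}\) levels of splitting. In the second part of Merge Sort, the sublists are merged back together, and at each level a total of \(n\) elements are merged, which is \(O(n)\) work per level. Since there are \(O(\log{n})\) levels and \(O(n)\) work is done at each level, we can multiply to find that the complexity of Merge Sort is \(O(n \log{n})\).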

MORE TO COME!

14. Graphs

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on May 3, 2025.
Inserted a version of Dijkstra’s algorithm for a minimal-weight path between two given vertices.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Graphs are discrete mathematical structures that have many applications in a diversity of fields including chemistry, network analysis, algorithms, and social sciences.

Key terms and concepts covered in this chapter:

  • Graphs

    • Undirected graphs

    • Directed graphs

    • Weighted graphs

  • Creating new graphs from old graphs

    • Subgraphs

    • Unions and intersections of graphs

  • Graph isomorphism

  • Graph coloring

  • Connectivity of graphs

    • Eulerian circuits

    • Hamiltonian circuits

    • Shortest-path problems

      • Dijkstra’s algorithm

      • Traveling Salesperson Problem (TSP)

14.1. Introduction and Basic Definitions

A graph consists of a set of vertices (also called nodes) and a set of edges, where each edge connects either two different vertices or one vertex to itself.

  • For each edge, its endpoints are the vertices that it connects. The edge is said to be incident with each endpoint, and to connect the endpoints.

  • If an edge has only one endpoint, it is called a loop.

  • If two or more edges connect the same endpoints (or endpoint if the edges are loops), the graph is called a multigraph. Two edges that have the same endpoints are called parallel edges.

  • Two vertices are adjacent if they are the endpoints of at least one edge. Adjacent vertices are also called neighbors.

  • The degree of a vertex \(v\) in a graph \(G\) is the number of edges of \(G\) that are incident with \(v,\) where a loop at \(v\) is counted twice.

  • An isolated vertex is a vertex that is not an endpoint of any edge.

  • A simple graph is a graph that has no loops and does not have two or more edges that connect the same endpoints.

Example 1

The graph shown has 7 vertices and 7 edges.

graphMKD1

This is one graph that is made up of three separate connected components (connectivity will be defined in detail later in the chapter, but is introduced informally here).

  • One connected component contains the vertices \(A\) and \(C\) and two edges that connect them.

  • A second connected component contains the vertices \(B\), \(D\), \(E\), and \(F\) and the edges that are incident to those vertices.

  • A third connected component contains the single isolated vertex \(G\) and no edges.

In the second connected component, the graph is drawn so that the edge with endpoints \(B\) and \(F\) and the edge with endpoints \(D\) and \(E\) cross, but the point of intersection is ignored because it is not a vertex. We could redraw this graph so that the two edges do not cross; for example, we could move \(E\) inside the triangle. However, there are some graphs which cannot be drawn in 2 dimensions without some edge crossings.

This graph is a multigraph because there are multiple edges that connect the pair of vertices \(\{A,C\}\).

This graph is not simple because (1) it contains a loop and (2) it has a pair of vertices that are connected by two different edges.

14.2. Simple Graphs

A simple graph is a graph that has no loops and no parallel edges (that is, no two edges can connect the same endpoints.) This means that each edge is determined by its two distinct endpoints. This allows us to give a relatively simple but formal set-theoretic definition of "simple graph". This formal definition can be useful if you need to implement a simple graph in code. Graphs discussed in this textbook are assumed to be simple unless stated otherwise.

14.2.1. A Formal Definition

A simple graph \(G=\left(V,\ E\right)\) is an ordered pair consisting of a set \(V\) of objects called vertices (also called nodes) and a set \(E\) of objects called edges. An edge \(e\in\ E\) is a set of the form \(e=\left\{x,y\right\}\), where the vertices \(x\) and \(y\) are two different elements of \(V\). The two vertices \(x\) and \(y\) in the edge \(e=\left\{x,y\right\}\) are said to be adjacent or connected or neighbors, and \(x\) and \(y\) are called the endpoints of the edge \(e\).

Example 2 - a simple graph.

The graph shown has vertex set \(\left\{A,\ B,\ C,\ D,\ E,\ F\right\}\) and edge set \( \{\{A,C\},\{A,D\},\{B,D\},\{B,F\},\{C,F\},\{D,F\},\{E,F\}\}.\)

graph1
Example 3

The degrees of the vertices in the undirected graph \(G\) with vertex set \(V=\{A,B,C,D,E,F,G\}\) and edge set \(E=\{\{A,C\},\{A,D\},\{B,D\},\{B,F\},\{C,F\},\{D,F\},\{F,G\}\}\) are:

\(d\left(A\right)=2\)

\(d\left(B\right)=2\)

\(d\left(C\right)=2\)

\(d\left(D\right)=3\)

\(d\left(E\right)=0\)

\(d\left(F\right)=4\)

\(d\left(G\right)=1\)

Notice that the sum of all the degrees is \(d\left(A\right)+d\left(B\right)+d\left(C\right)+d\left(D\right)+d\left(E\right)+d\left(F\right)+d\left(G\right)=14\), which is twice the number of edges, since \(\left|E\right|=7\) and \(2 \cdot \left|E\right|=14\). In fact, this is true for any undirected graph with finitely many edges as long as any loops are counted twice. This result is called the Handshaking Theorem. A formal proof of the Handshaking Theorem can be written using mathematical induction on the predicate \(P(n):\) "If graph \(G\) has \(n\) edges, then the sum of the degrees of the vertices of \(G\) is equal to \(2n.\)"

Handshaking Theorem

The sum of the degrees of the vertices of a graph \(G=\left(V,\ E\right)\) is equal to twice the number of edges in \(G\). That is, \(\displaystyle \sum_{v\in V}{d\left(v\right)=2\ \left|E\right|}\).

A useful consequence of this to keep in mind is that the sum of the degrees of a graph is always even.
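The theorem can be checked on the graph from Example 3 with a few lines of Python. This is a small sketch; the variable names `vertices`, `edges`, and `degree` are chosen for this illustration.

```python
vertices = ["A", "B", "C", "D", "E", "F", "G"]
edges = [("A", "C"), ("A", "D"), ("B", "D"), ("B", "F"),
         ("C", "F"), ("D", "F"), ("F", "G")]

# Each edge adds 1 to the degree of each of its two endpoints.
degree = {v: 0 for v in vertices}
for x, y in edges:
    degree[x] += 1
    degree[y] += 1

print(degree)                                  # E has degree 0: it is isolated
print(sum(degree.values()), 2 * len(edges))    # 14 14
```

The loop also suggests why the theorem holds: every edge contributes exactly 2 to the total of the degrees.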

14.3. Directed Graphs

The main focus of this chapter will be undirected simple graphs, but we will briefly discuss directed graphs in this section.

A directed graph (or digraph) is a graph in which the edges are directed from one vertex to another vertex. Each edge has an initial vertex \(u\) and a terminal vertex \(v;\) the edge is drawn as an arrow pointing from \(u\) to \(v.\)

The out-degree of a vertex \(w\) is the number of edges that have \(w\) as the initial vertex. The in-degree of a vertex \(w\) is the number of edges that have \(w\) as the terminal vertex.

Example 4 - A directed graph.

The graph \(G=(V,E)\) with vertex set \(V=\{A,B,C,D,E,F\}\) and edge set \(E=\{ (A,C),(D,A),(B,D),(F,B),(C,F),(D,F),(F,E) \}\). The first coordinate of each edge is the initial vertex and the second coordinate is the terminal vertex.

graph2
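The in-degree and out-degree of each vertex of the directed graph in Example 4 can be computed directly from the list of ordered pairs. This is a small sketch; the dictionary names `out_degree` and `in_degree` are chosen for this illustration.

```python
vertices = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "C"), ("D", "A"), ("B", "D"), ("F", "B"),
         ("C", "F"), ("D", "F"), ("F", "E")]

out_degree = {v: 0 for v in vertices}
in_degree = {v: 0 for v in vertices}
for initial, terminal in edges:
    out_degree[initial] += 1    # the edge leaves its initial vertex
    in_degree[terminal] += 1    # the edge enters its terminal vertex

print(out_degree)   # {'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 0, 'F': 2}
print(in_degree)    # {'A': 1, 'B': 1, 'C': 1, 'D': 1, 'E': 1, 'F': 2}
```

Both sets of degrees add up to 7, the number of edges, because each directed edge has exactly one initial vertex and one terminal vertex.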
Example 5 - The game "rock, paper, scissors"

The graph \(G=(V,E)\) with vertex set \(V = \{ \text{"rock", "paper", "scissors"} \}\) and edge set \(E = \{ \text{("rock", "paper"), ("paper", "scissors"), ("scissors", "rock")} \}\) can be used to represent the game "rock, paper, scissors."

rock paper scissors digraph

Each directed edge has for its initial vertex the loser and for its terminal vertex the winner.

14.3.1. Simple Directed Graphs

We can give a formal set-theoretic definition of simple directed graph as well. To indicate the directed edges, ordered pairs of vertices are used instead of 2-element sets.

A simple directed graph \(G=\left(V,\ E\right)\) is an ordered pair consisting of a set \(V\) of objects called vertices (or nodes) and a set \(E\) of objects called edges. A directed edge \(e\in\ E\) is an ordered pair of the form \(e=\left(x,y\right)\), where the vertices \(x\) and \(y\) are two different elements of \(V\). Vertex \(x\) is the initial vertex of \(e\) and vertex \(y\) is the terminal vertex of edge \(e\).

14.4. Examples of Simple Graphs

This section presents several classes of graphs.

KompletGraphOn4Vertices

The complete graph \(K_n\) is the simple graph with \(n\) vertices such that any two vertices are adjacent, that is, every pair of vertices are the endpoints of an edge. The image shows \(K_{4},\) the complete graph on 4 vertices. Click here to see images of \(K_{n}\) for the positive integers that are less than or equal to \(12.\)

nCubesv1

The n-cube \(Q_{n}\) can be described as the graph that has vertex set consisting of the \(2^{n}\) bitstrings of length \(n,\) and edges such that two vertices are adjacent if and only if the bitstrings differ in exactly one bit position. The image shows the three graphs \(Q_{1},\) \(Q_{2},\) and \(Q_{3};\) these graphs can be used as a way to represent the power sets of sets that have \(1,\) \(2,\) and \(3\) elements, respectively. Notice that \(Q_{2}\) can be drawn as a square and that \(Q_{3}\) can be represented as a cube in \(3\)-dimensional space (or by a drawing of a cube in a \(2\)-dimensional plane.)
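The description of \(Q_{n}\) translates almost directly into code. The sketch below builds the vertex and edge sets for a positive integer \(n\); the function name `n_cube` and the choice to store each edge as a `frozenset` of two bitstrings are assumptions made for this illustration.

```python
from itertools import product


def n_cube(n):
    """Return (vertices, edges) of the n-cube, with bitstrings as vertex labels."""
    vertices = ["".join(bits) for bits in product("01", repeat=n)]
    edges = set()
    for v in vertices:
        for position in range(n):
            # Flip exactly one bit of v to get an adjacent vertex w.
            flipped = "1" if v[position] == "0" else "0"
            w = v[:position] + flipped + v[position + 1:]
            edges.add(frozenset((v, w)))   # a set, so {v, w} is stored only once
    return vertices, edges


for n in (1, 2, 3):
    vertices, edges = n_cube(n)
    print(n, len(vertices), len(edges))
```

For \(n = 1, 2, 3\) this gives \(2, 4, 8\) vertices and \(1, 4, 12\) edges, matching a line segment, a square, and a cube.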

A bipartite graph is a simple graph whose set of vertices can be partitioned into two disjoint nonempty sets such that every vertex is in exactly one of the two sets and every edge has one endpoint in each of the two sets. One way to think of a bipartite graph is that each vertex can be assigned one of two colors so that every edge must connect vertices of different colors. Notice that \(Q_{1},\) \(Q_{2},\) and \(Q_{3}\) are all examples of bipartite graphs. (Question: Is \(Q_{n}\) a bipartite graph for every natural number \(n?\) Why or why not?)

3cubev2

This image shows the graph \(K_{2,3}\) and is another example of a bipartite graph. Notice that \(K_{2,3}\) has an additional property: Every pair of vertices \(\{a, b \}\) with \(a\) in the set of \(2\) "upper" vertices and \(b\) in the set of \(3\) "lower" vertices are the endpoints of an edge. A bipartite graph that has this additional property is called a complete bipartite graph. In general, the symbol \(K_{m,n}\) represents the complete bipartite graph that has two disjoint sets of vertices, one of cardinality \(m\) and the other of cardinality \(n,\) such that every pair of vertices that come from the different sets are joined by an edge. Notice that \(Q_{1} = K_{1,1}\) and \(Q_{2} = K_{2,2}\) are complete bipartite graphs, but that \(Q_{3}\) is not a complete bipartite graph because, for example, there is no edge joining \(000\) and \(111.\)
NOTE: The phrase "complete bipartite" needs to be read as a single term used to indicate that a bipartite graph has all the edges it can possibly have. For example, \(K_{2,3}\) is a bipartite graph such that if you tried to enlarge it by inserting an additional edge into the graph, that edge would join either the \(2\) "upper" vertices, \(2\) of the "lower" vertices, or \(2\) vertices that are already joined; in this sense, \(K_{2,3}\) is "complete" as a bipartite graph. \(K_{2,3}\) is not a "complete graph" in the sense of the earlier example in this section. In fact, since a "complete graph" must contain an edge for every pair of distinct vertices, the only graph that can be both a "complete graph" and a "complete bipartite graph" is \(Q_{1} = K_{2} = K_{1,1}.\) Mathematicians recycle and reuse a lot of words…

14.5. Representing Simple Graphs

In addition to the vertex-edge drawing, a simple graph can be represented in other ways that are more useful for computing.

First, recall that if \(u\) is a vertex of a simple graph, then vertex \(v\) is said to be adjacent to \(u\) if and only if \(u\) and \(v\) are the endpoints of an edge of the graph.

One way to represent a simple graph is by using an adjacency list. This list can be written as a table, where each row has two columns. In each row, the entry in the first column is a single vertex \(v\) and the entry in the second column is a list of all vertices of the graph that are adjacent to \(v.\)

Another way to represent a simple graph is by using an adjacency matrix. The adjacency matrix of a simple graph represents the graph in table form, and contains an entry for each pair of vertices. For each vertex of the graph, there is a row and also a column. If vertices \(u\) and \(v\) are adjacent (that is, connected by some edge), then the adjacency matrix will contain a \(1\) in the position that corresponds to the row for \(u\) and the column for \(v;\) otherwise, the matrix contains a \(0\) at that position. The next example may help make this more clear.

Example 6 - Representing A Simple Graph

The graph with vertex set \(\left\{A,\ B,\ C,\ D,\ E,\ F\right\}\) and edge set \(\{\{A,C\},\{A,D\},\{B,D\},\{B,F\},\{C,F\},\{D,F\},\{E,F\}\}\) can be represented by

the drawing

graph1

or the adjacency list

Vertex    Adjacent Vertices
A         C, D
B         D, F
C         A, F
D         A, B, F
E         F
F         B, C, D, E

or the adjacency matrix

\(\mathbf{M}=\left(\begin{matrix}0&0&1&1&0&0\\0&0&0&1&0&1\\1&0&0&0&0&1\\1&1&0&0&0&1\\0&0&0&0&0&1\\0&1&1&1&1&0\\\end{matrix}\right)\)
For example, in matrix \(\mathbf{M}\) the rows, from top to bottom, correspond to the vertices \(A,\ B,\ C,\ D,\ E,\ F\) and the columns, from left to right, correspond to vertices \(A,\ B,\ C,\ D,\ E,\ F.\) The values in row 3, which corresponds to vertex \(C\), indicate whether the vertex for that column is adjacent to \(C.\) If we use the symbol \(M_{r,c}\) to stand for the value in row \(r\) and column \(c,\) then \(M_{3,5} = 0\) because there is no edge in the graph with endpoints \(C\) and \(E,\) and \(M_{3,6} = 1\) because there is an edge in the graph with endpoints \(C\) and \(F\).
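Both representations in Example 6 can be built from the vertex and edge sets. The sketch below is one way to do this in Python; the names `adjacency_list`, `index`, and `matrix` are chosen for this illustration, and Python's rows and columns are numbered from 0 rather than from 1.

```python
vertices = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "C"), ("A", "D"), ("B", "D"), ("B", "F"),
         ("C", "F"), ("D", "F"), ("E", "F")]

# Adjacency list: each vertex maps to the list of its neighbors.
adjacency_list = {v: [] for v in vertices}
for x, y in edges:
    adjacency_list[x].append(y)
    adjacency_list[y].append(x)

# Adjacency matrix: row and column order follow the list of vertices.
index = {v: i for i, v in enumerate(vertices)}
matrix = [[0] * len(vertices) for _ in vertices]
for x, y in edges:
    matrix[index[x]][index[y]] = 1
    matrix[index[y]][index[x]] = 1   # undirected, so the matrix is symmetric

print(adjacency_list["F"])   # ['B', 'C', 'D', 'E']
for row in matrix:
    print(row)
```

With 0-based indexing, the entry written \(M_{3,6}\) in the text is `matrix[2][5]` in this sketch.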

14.6. Weighted Graphs

In some applications, each edge of a graph has a weight, which is some nonnegative number. The weight could represent the physical distance between the two endpoint nodes, or could represent the cost to travel or transmit data between the endpoint nodes.

You can use an adjacency matrix to describe a weighted graph, but instead of using a \(1\) to represent that there is an edge between two vertices, you place the weight of the edge in the correct position of the adjacency matrix, as shown in the following example.

Example 7 - Weighted Graph

Consider the following weighted simple graph

graph5

The adjacency matrix of this weighted graph is \( \left(\begin{matrix}0&2&5&0\\2&0&3&0\\5&3&0&1\\0&0&1&0\\\end{matrix}\right). \)

14.7. Creating New Graphs From Old Graphs

Given a set of one or more graphs, there are several ways to create new graphs using the graphs in the set.

14.7.1. Subgraphs

Given a simple graph \(G,\) you can form a subgraph \(H\) by choosing a subset of the vertices of \(G\) along with a subset of the edges of \(G\) such that each edge has endpoints in the set of vertices you chose. That is, \(H\) is a subgraph of \(G\) if \(H\) is a graph such that every vertex of \(H\) is a vertex of \(G\) and every edge of \(H\) is an edge of \(G.\)
More formally, \(H = (V_{H}, E_{H})\) is a subgraph of \(G = (V,E)\) if and only if all three of the following statements are True: \(V_{H} \subseteq V,\) \(E_{H} \subseteq E,\) and for every edge \(e \in E_{H}\) the endpoints of \(e\) are in \(V_{H}.\)

If \(v\) is a vertex of \(G,\) we denote by \(G-v\), the subgraph obtained from \(G\) by removing the vertex \(v\) along with all edges in \(E\) that have \(v\) as an endpoint.

The image shows a graph \(G\), and the subgraph \(G-d\) formed by removing the vertex \(d\).

graph7

In the same way, you can obtain a subgraph by removing multiple vertices along with the edges associated with the removed vertices. The subgraph obtained is called the subgraph induced by removing those vertices.
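Removing vertices can be sketched with Python sets. In this illustration each edge is stored as a `frozenset` of its two endpoints; the function name `remove_vertices` and the small sample graph are hypothetical and are not the graph shown in the image.

```python
def remove_vertices(vertices, edges, removed):
    """Return the subgraph obtained by deleting the vertices in removed."""
    kept_vertices = vertices - removed
    # Keep only the edges that have no endpoint among the removed vertices.
    kept_edges = {e for e in edges if not (e & removed)}
    return kept_vertices, kept_edges


V = {"a", "b", "c", "d"}
E = {frozenset(pair) for pair in ["ab", "ad", "bc", "bd", "cd"]}

V_new, E_new = remove_vertices(V, E, {"d"})
print(sorted(V_new))                       # ['a', 'b', 'c']
print(sorted(sorted(e) for e in E_new))    # [['a', 'b'], ['b', 'c']]
```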

Example 8

Below is a graph \(G=(V,E)\) and the subgraph obtained by removing the vertices \(a\) and \(d,\) which leaves the vertex set \(V-\{a,d\}.\) With a slight abuse of notation, this induced subgraph is written \(G-\{a,d\}.\)

graph8

14.7.2. Unions and Intersections Of Graphs

Given two simple graphs \(G_{1}\) and \(G_{2}\), you can form the union of the graphs by taking the union of the two sets of vertices to get a new set of vertices, and taking the union of the two sets of edges to get a new set of edges. Notice that any edge that is in both graphs will only appear once in the new graph because you took the union of the sets of edges, that is, you can’t create parallel edges by forming the union.

In the same way, you can form the intersection of two simple graphs by taking the intersection of the two sets of vertices to get a new set of vertices, and taking the intersection of the two sets of edges to get a new set of edges.
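Because the vertices and edges of a simple graph are sets, Python's set operators give the union and intersection directly. The two small graphs below are hypothetical examples chosen for this sketch, with each edge stored as a `frozenset` of its endpoints.

```python
V1 = {"A", "B", "C"}
E1 = {frozenset(("A", "B")), frozenset(("B", "C"))}

V2 = {"B", "C", "D"}
E2 = {frozenset(("B", "C")), frozenset(("C", "D"))}

# Union: combine the vertex sets and the edge sets.
V_union, E_union = V1 | V2, E1 | E2

# Intersection: keep only the vertices and edges common to both graphs.
V_inter, E_inter = V1 & V2, E1 & E2

print(sorted(V_union), len(E_union))   # ['A', 'B', 'C', 'D'] 3
print(sorted(V_inter), len(E_inter))   # ['B', 'C'] 1
```

The edge \(\{B,C\}\) belongs to both graphs but appears only once in the union, so no parallel edges are created.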

14.8. Graph Isomorphism

Recall that a graph is determined by its set of vertices and how those vertices are connected by edges, but not the drawing you use to represent the graph.

Example 9 - The Same Graph Can Be Drawn In More Than One Way

Consider the two graphs shown in the image.

Isomorphism2av2

Notice that these two graphs are different-looking drawings of the same graph that has vertex set \(\{ A, B, C, D\}\) and edge set \(\{\{A,B\},\{A,C\},\{A,D\},\{B,C\},\{B,D\},\{C,D\}\}.\) Also, notice that the drawing on the left appeared earlier in the chapter, but with unlabeled vertices: This is a drawing of \(K_{4},\) the complete graph on \(4\) vertices.

Notice that using either the adjacency list

Vertex    Adjacent Vertices
A         B, C, D
B         A, C, D
C         A, B, D
D         A, B, C

or the adjacency matrix \[\left(\begin{matrix}0&1&1&1\\1&0&1&1\\1&1&0&1\\1&1&1&0\\\end{matrix}\right)\] makes it easier to see that the two drawings represent the exact same graph.

You can imagine the graph on the right being the result of dragging the vertex \(C\) inside the "triangle" with vertices \(A,\) \(B,\) and \(D.\)

Sometimes, different graphs may be essentially the same graph, as in the next example.

Example 10 - Two Graphs That Are Essentially The Same Graph

Consider the two graphs, each with \(4\) vertices and \(6\) edges, shown in the image.

Isomorphism2av3

These graphs are not equal since the graph on the left has vertex set \(\{ A, B, C, D\}\) and the graph on the right has vertex set \(\{ W, X, Y, Z\}.\) However, by comparing the graph on the right to the one on the right in the previous example, you can see that there is a one-to-one correspondence between the two sets of vertices that preserves adjacency (that is, if two vertices in the upper row are endpoints of an edge of the graph on the left, then the corresponding vertices in the lower row are endpoints of an edge of the graph on the right.)

K4Isomporphismv1

A one-to-one correspondence between the set of vertices of two simple graphs that preserves adjacency is called a graph isomorphism, and the two graphs are said to be isomorphic. Informally, you can think of two isomorphic graphs as a pair of graphs where one graph can be relabeled and/or reshaped to obtain the other graph (That is, the two graphs are the same graph but have drawings that are labeled and/or shaped differently.)
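For very small graphs, the definition can be tested by brute force: try every one-to-one correspondence between the vertex sets and check whether it preserves adjacency. The sketch below does this; the function name `find_isomorphism` is an assumption, each edge list is assumed to have no repeated edges, and the approach is only practical for a handful of vertices because the number of correspondences grows like \(n!\).

```python
from itertools import permutations


def find_isomorphism(vertices1, edges1, vertices2, edges2):
    """Return a dict mapping vertices1 to vertices2 that preserves adjacency, or None."""
    if len(vertices1) != len(vertices2) or len(edges1) != len(edges2):
        return None
    edge_set2 = {frozenset(e) for e in edges2}
    for image in permutations(vertices2):
        mapping = dict(zip(vertices1, image))
        # With equal edge counts, sending every edge to an edge is enough.
        if all(frozenset((mapping[x], mapping[y])) in edge_set2 for x, y in edges1):
            return mapping
    return None


k4_left = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
k4_right = [("W", "X"), ("W", "Y"), ("W", "Z"), ("X", "Y"), ("X", "Z"), ("Y", "Z")]
print(find_isomorphism("ABCD", k4_left, "WXYZ", k4_right))

# A path and a "star", each with 4 vertices and 3 edges, are not isomorphic.
path = [(1, 2), (2, 3), (3, 4)]
star = [("a", "b"), ("a", "c"), ("a", "d")]
print(find_isomorphism([1, 2, 3, 4], path, "abcd", star))   # None
```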

Example 11 - Using Graph Isomorphism

Using graph isomorphisms can help identify properties of a graph.

Isomorphism1av2

The three graphs in the image are isomorphic; it is an exercise for you to write out the one-to-one correspondences.

You Try

Write out the one-to-one correspondences between the sets of vertices that define the graph isomorphisms.

Once you have shown that the three graphs are isomorphic, you can use the fact that they are different representations of the same graph. For example,

  • It is not immediately clear that the graphs on the left and right are bipartite, but the arrangement of the vertices in the middle graph into "upper" and "lower" rows makes this easy to see.

  • Also, it is not immediately clear that the graph in the middle or the graph on the right is planar (that is, the graph can be redrawn in a \(2\)-dimensional plane so that no edges cross) but this is obvious for the graph on the left.
    Note: This textbook does not discuss planar graphs in detail, but it is worth mentioning that it can be proven that neither \(K_{5}\) nor \(K_{3,3}\) is planar. If you’d like to learn more about planar graphs, one source is the section "Planar Graphs" in Oscar Levin’s Discrete Mathematics: An Open Introduction, 4th edition.

Challenge
Write out the adjacency matrix for each of the three graphs, using alphabetical order of the vertex labels, then identify a connection between the three adjacency matrices.
Hint
Look for rows and columns in the different matrices that are identical. The order of the rows and columns would change if you use non-alphabetical reorderings of vertices that correspond to the graph isomorphisms you wrote for the "You try" exercise above.

14.9. Graph Coloring

In some contexts, it can be useful to partition either the set of vertices or the set of edges of a graph into disjoint subsets to make it easier to understand the graph and the network it represents. This act of partitioning is usually referred to as "coloring" since using different colors can make it easy to see and interpret the properties of the partition when the graph is drawn. Notice that you could instead create the partition by assigning labels like "group 1," "group 2," and so on, to each vertex (or edge.)

Petersen_graph_3-coloring.svg

For example, the image shows a graph called the Petersen graph with its vertex set partitioned into 3 subsets so that each edge’s endpoints are in two different subsets of the partition (That is, each edge’s endpoints have different colors.)
Image credit: "Petersen_graph_3-coloring.svg" by Д.Ильин. The copyright holder of this work has released this work into the public domain. This applies worldwide.
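Checking that a vertex coloring is valid only requires looking at the endpoints of every edge. The sketch below builds the Petersen graph with vertices labeled \(0\) through \(9\) (an assumed labeling: \(0\) to \(4\) on the outer pentagon, \(5\) to \(9\) on the inner star) and tests one possible 3-coloring, which is not necessarily the coloring shown in the image.

```python
outer = [(i, (i + 1) % 5) for i in range(5)]            # outer pentagon
spokes = [(i, i + 5) for i in range(5)]                 # outer-to-inner edges
inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]    # inner five-pointed star
petersen_edges = outer + spokes + inner


def is_proper_coloring(edges, color):
    """Return True if the endpoints of every edge have different colors."""
    return all(color[x] != color[y] for x, y in edges)


coloring = {0: "red", 1: "green", 2: "red", 3: "green", 4: "blue",
            5: "green", 6: "blue", 7: "blue", 8: "red", 9: "red"}

print(len(petersen_edges))                             # 15
print(is_proper_coloring(petersen_edges, coloring))    # True
```

The dictionary `coloring` is the same kind of partition described above: the three colors play the role of the labels "group 1," "group 2," and "group 3."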

The next example discusses an application of vertex coloring.

Example 12 - Redrawing a Map as a Graph

The following image represents a "map" showing four countries; the blue region represents one country (not a body of water) that is surrounded by three other countries.

MapsAndGraph.png

The map can be represented as a graph with vertices colored to match the regions, as shown on the right. If it helps you to connect the graph to the map, imagine that each vertex represents a capital city of the corresponding country.

This way of representing a map was used to prove the Four Color Theorem which states, roughly, that

Four Color Theorem

Any map of countries that can be drawn in a plane such that
(1) every country has a color and
(2) no two adjacent countries have the same color
requires at most four different colors.
In this context "two adjacent countries" share a border that is not just a single point.

The first proof of the theorem was announced in 1976, and a corrected version of the first proof was published in 1989 after some errors were fixed (Yes, professional mathematicians do make mistakes!) The proof was considered controversial by many mathematicians at the time because it was the first major computer-assisted proof: Over one thousand five hundred different cases needed to be checked!

198px-K44_arboricity.svg

In other contexts, it is more appropriate to use edge coloring. That is, each edge of the graph is assigned a color so that the set of edges is partitioned into disjoint subsets. For example, the graph in the image shows that the complete bipartite graph \(K_{4,4}\) can be partitioned as a union of 3 disjoint graphs called forests (Forests are defined later in this textbook, in the Trees chapter.)
Image credit: "K44 arboricity.svg" by David Eppstein. The copyright holder of this work has released this work into the public domain. This applies worldwide.

14.10. Connectivity of Undirected Graphs

A walk on a graph \(G=\left(V,E\right)\) is a finite, non-empty, alternating sequence of vertices and edges of the form, \(v_0e_1v_1e_2\ldots e_nv_n\), with vertices \(v_i\in V\) and edges \(e_i\in E\), where for each integer \(i\) with \(1 \leq i \leq n\) the endpoints of \(e_i\) are the vertices \(v_{i-1}\) and \(v_i.\) The integer \(n\) is called the length of the walk.

If we restrict ourselves to simple undirected graphs, there is at most one edge joining each pair of adjacent vertices, so a walk can be specified simply by listing the sequence of vertices \(v_0v_1\ldots v_n\) (That is, we don’t need to write down the edges.)

  • A trail is a walk that does not repeat an edge. That is, all edges in a trail are distinct.

  • A path is a trail that does not repeat a vertex (but we allow for the possibility that the initial vertex \(v_0\) and terminal vertex \(v_n\) of the path are the same vertex; when \(v_0=v_n\) the path is called a closed path or a circuit.)

  • A cycle is a closed path of length at least 1.

The distance \(d(u,v)\) between two vertices \(u\) and \(v\) in a graph \(G\) is the number of edges in a shortest path connecting them, assuming such a path exists.

Note that different textbooks use different terminology for walks, paths, and so on. The Remix uses terminology consistent with Handbook of Graph Theory, Second Edition by Gross, Yellen, and Zhang.

Example 13 - Trails, Paths, and Cycles

In the graphs below the first shows a trail \(CFDBFE\). It is not a path since the vertex \(F\) is repeated. The second shows a path \(CADFB\), and the third a cycle \(CADFC\). Also note the following distances, \(d(A,D)=1\), while \(d(A,F)=2\), and \(d(A,E)=3\).

graph9
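The distances quoted in this example can be checked with a breadth-first search, which explores the graph one "layer" of neighbors at a time. This is a sketch using the edge set from Example 2; the function name `distance` and the dictionary `adjacency` are chosen for this illustration.

```python
from collections import deque

adjacency = {"A": ["C", "D"], "B": ["D", "F"], "C": ["A", "F"],
             "D": ["A", "B", "F"], "E": ["F"], "F": ["B", "C", "D", "E"]}


def distance(adjacency, start, goal):
    """Return the number of edges in a shortest path, or None if no path exists."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            return dist[v]
        for w in adjacency[v]:
            if w not in dist:          # first visit gives the shortest distance
                dist[w] = dist[v] + 1
                queue.append(w)
    return None


print(distance(adjacency, "A", "D"))   # 1
print(distance(adjacency, "A", "F"))   # 2
print(distance(adjacency, "A", "E"))   # 3
```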

14.11. Connected Graphs

A graph \(G\) is connected if there is a path between any pair of vertices.

Example 14 - A graph that is not connected

The graph \(G\) below is not connected since, as just one example, there is no path between vertex \(a\) and vertex \(e.\)

graph10

\(G\) has adjacency matrix

\( \left(\begin{matrix}0&1&1&0&0\\1&0&1&0&0\\1&1&0&0&0\\0&0&0&0&1\\0&0&0&1&0\\\end{matrix}\right). \)

In the previous example, the graph \(G\) can be treated as a union of two connected subgraphs, called the connected components of \(G.\) It can be proven by mathematical induction that any simple undirected graph that has a finite number of vertices can be written as a union of a finite number of connected components.
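The connected components can be found from the adjacency matrix by repeatedly picking an unvisited vertex and collecting every vertex reachable from it. The sketch below uses the matrix from Example 14 and assumes that its rows and columns correspond to the labels \(a, b, c, d, e\) in that order; the function name `connected_components` is also an assumption.

```python
def connected_components(matrix, labels):
    """Return a list of components, each given as a sorted list of vertex labels."""
    unvisited = set(range(len(matrix)))
    components = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack, component = [start], [start]
        while stack:
            v = stack.pop()
            for w, entry in enumerate(matrix[v]):
                if entry and w in unvisited:   # w is an unvisited neighbor of v
                    unvisited.remove(w)
                    stack.append(w)
                    component.append(w)
        components.append(sorted(labels[i] for i in component))
    return components


matrix = [[0, 1, 1, 0, 0],
          [1, 0, 1, 0, 0],
          [1, 1, 0, 0, 0],
          [0, 0, 0, 0, 1],
          [0, 0, 0, 1, 0]]

print(connected_components(matrix, "abcde"))   # [['a', 'b', 'c'], ['d', 'e']]
```

A graph is connected exactly when this function returns a single component.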

14.12. Eulerian Graphs

An Euler path on a graph is a trail that uses each edge of the graph exactly once.

An Euler circuit (also called a closed Eulerian trail) is a closed trail containing each edge of the graph \(G\) exactly once and returning to the start vertex. A graph with an Euler circuit is called Eulerian or is said to be an Eulerian graph.

In the following, the first graph is Eulerian. The sequence of edges \(e_1 e_2 e_3 e_4 e_5 e_6 e_7\) describes an Euler circuit (Notice that some vertices are visited multiple times; it is the edges that must appear exactly once in an Euler path.) The second graph is not an Eulerian graph. Convince yourself of this fact by looking at all necessary trails or closed trails.

graph11 MKD

The following are useful characterizations of graphs with Euler circuits and Euler paths and are due to Leonhard Euler.

Theorem on Euler Circuits and Euler Paths
  1. A finite connected graph has an Euler circuit if and only if each vertex has even degree.

  2. A finite connected graph has an Euler path if and only if it has at most two vertices with odd degree.

Euler solved a famous problem about the seven bridges of Königsberg by representing the problem as a graph (with parallel edges.)
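The theorem reduces the question to counting vertices of odd degree. The sketch below applies it to the Königsberg multigraph, with the four land masses given the hypothetical labels \(A\) (the central island), \(B\) and \(C\) (the two river banks), and \(D\) (the eastern land mass). The function name `euler_status` is an assumption, and the function assumes that the graph is connected, as the theorem requires.

```python
from collections import Counter


def euler_status(edges):
    """Classify a connected graph by its number of odd-degree vertices."""
    degree = Counter()
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1     # a loop (x == y) is counted twice, as required
    odd = [v for v, d in degree.items() if d % 2 == 1]
    if len(odd) == 0:
        return "Euler circuit"
    if len(odd) == 2:
        return "Euler path but no Euler circuit"
    return "neither"


# Seven bridges: parallel edges are listed once per bridge.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]

print(euler_status(konigsberg))   # neither
```

All four land masses have odd degree (\(5, 3, 3, 3\)), so no walk can cross every bridge exactly once.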

14.13. Hamiltonian Graphs

A cycle in a graph \(G\) is called a Hamiltonian cycle if it visits every vertex of \(G\) exactly once, except for the starting and ending vertex, which appears only at the start and at the end of the cycle.

A graph is Hamiltonian, or said to be a Hamiltonian graph, if it contains a Hamiltonian cycle.

The first graph shown is Hamiltonian, and a Hamiltonian cycle \(ABCDA\) is highlighted (Notice that a Hamiltonian cycle does not need to use every edge of the graph; it is the vertices, other than the starting and ending vertex, that must appear exactly once in a Hamiltonian cycle.) The second graph is not Hamiltonian.

graph12
Theorem (Dirac) on Hamiltonian graphs

A simple graph with \(n\geq 3\) vertices is Hamiltonian if every vertex \(v\) has degree \(d(v)\geq \frac{n}{2}\).

14.14. Finding A Shortest Path in a Weighted Graph: Dijkstra’s Algorithm

In some applications of graph theory, you need to find a "shortest path" between two vertices of a weighted graph. In this context, "shortest" may mean "of least distance" but could mean "of least cost" or something else, depending on what the edge weights represent.

Edsger Dijkstra published a paper in 1959 that describes an algorithm for finding the path of "minimum total weight" between two given vertices of a simple connected graph with weighted undirected edges.
Dijkstra’s original paper is also available in the ACM Digital Library at this link.

Here is a description of the algorithm, based on Dijkstra’s original.

  • Task: Given two vertices \(a\) and \(z,\) find the edges of a path between the two vertices that has the minimum possible sum of weights.

  • Input: The list \(V\) of all vertices of the graph, and the list \(E\) of all weighted edges of the graph.
    For example, an adjacency matrix for the graph could be given.

  • Steps:

    1. Define lists \(V_{chosen},\) \(V_{candidates},\) \(E_{chosen},\) and \(E_{candidates}.\)
      Initialize each of the four lists to the empty list.

    2. Append vertex \(a\) to the end of \(V_{chosen}.\)

    3. While vertex \(z\) has not been appended to \(V_{chosen}\)

      1. Set \(v\) to the last vertex appended to \(V_{chosen}.\)

      2. For each vertex \(w\) that is not in \(V_{chosen}\) but is connected to vertex \(v\)

        1. If \(w\) is in \(V_{candidates}\)

          1. If the edge \(e\) that connects \(v\) and \(w\) is part of a path between \(a\) and \(w\) that has total weight less than the weight of the known path that uses the corresponding edge in list \(E_{candidates},\) remove that edge from \(E_{candidates}\) and append \(e\) to \(E_{candidates}.\)

        2. Otherwise, \(w\) is in neither list \(V_{chosen}\) nor list \(V_{candidates},\) so append vertex \(w\) to the end of \(V_{candidates}\) and append the edge \(e\) that connects \(v\) and \(w\) to the end of \(E_{candidates}.\)

      3. After exiting the "for" loop,

        1. find the vertex \(w\) in list \(V_{candidates}\) that has the minimal-weight path to the starting vertex \(a\) and append \(w\) to the end of \(V_{chosen},\) and remove \(w\) from \(V_{candidates},\) and

        2. append the edge in \(E_{candidates}\) that has \(w\) as one of its endpoints to the end of \(E_{chosen}\) and remove that edge from \(E_{candidates}.\)

  • Output: The list \(E_{chosen}\) of weighted edges.

Notice that the list \(E_{chosen}\) is constructed so that it contains edges for only one possible path between \(a\) and \(z,\) and that path must be a minimal-weight path.
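The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Dijkstra's own formulation: it assumes nonnegative edge weights, represents each weighted edge as a tuple (u, v, weight), merges \(V_{candidates}\) and \(E_{candidates}\) into one dictionary named candidates, and adds a dictionary dist (not part of the description above) that records the total weight of the best known path from \(a.\) The names dijkstra_edges and path_from_edges are made up for this illustration.

```python
def dijkstra_edges(vertices, weighted_edges, a, z):
    """Return (E_chosen, total weight of a minimal-weight path from a to z)."""
    # neighbors[v] maps each vertex adjacent to v to the weight of the connecting edge.
    neighbors = {v: {} for v in vertices}
    for u, v, weight in weighted_edges:
        neighbors[u][v] = weight
        neighbors[v][u] = weight

    dist = {a: 0}        # assumption: helper table of best known path weights from a
    v_chosen = [a]       # step 2
    e_chosen = []
    candidates = {}      # candidate vertex w -> candidate edge (v, w, weight)

    while z not in v_chosen:                        # step 3
        v = v_chosen[-1]                            # step 3.1
        for w, weight in neighbors[v].items():      # step 3.2
            if w in v_chosen:
                continue
            # New candidate, or a lighter path to an existing candidate.
            if w not in candidates or dist[v] + weight < dist[w]:
                dist[w] = dist[v] + weight
                candidates[w] = (v, w, weight)
        if not candidates:
            raise ValueError("z cannot be reached from a")
        w = min(candidates, key=lambda c: dist[c])  # step 3.3.1
        v_chosen.append(w)
        e_chosen.append(candidates.pop(w))          # step 3.3.2
    return e_chosen, dist[z]


def path_from_edges(e_chosen, a, z):
    """Walk backward from z to a using the chosen edges."""
    parent = {w: v for v, w, _ in e_chosen}
    path = [z]
    while path[-1] != a:
        path.append(parent[path[-1]])
    return path[::-1]


# A small made-up graph for illustration.
V = ["A", "B", "C", "D", "Z"]
E = [("A", "B", 4), ("A", "C", 1), ("C", "B", 2),
     ("B", "D", 1), ("C", "D", 5), ("D", "Z", 3)]
chosen, total = dijkstra_edges(V, E, "A", "Z")
print(path_from_edges(chosen, "A", "Z"), total)  # ['A', 'C', 'B', 'D', 'Z'] 7
```

The list returned as e_chosen may contain edges that are not on the path from \(a\) to \(z;\) the helper path_from_edges picks out the one path from \(a\) to \(z\) that those edges contain.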

Also notice that if the loop condition is changed to "while there is a vertex that is not in \(V_{chosen}\)" then the algorithm’s output \(E_{chosen}\) will contain the edges of a minimal-weight path between vertex \(a\) and every other vertex in the graph.

Question: What change would be needed to the input if you had a graph with unweighted edges and needed to find a path between \(a\) and \(z\) that uses the smallest number of edges possible?

This Wikipedia page has some animations that illustrate an alternate implementation of Dijkstra’s algorithm.

14.14.1. The Traveling Salesperson Problem (TSP)

A traveling salesperson needs to visit a number of cities and then return to the starting point. To save time and energy, the salesperson wants to determine the shortest path for the trip.

You can represent the cities and the distances between them by a weighted, complete, undirected graph. The problem then is to find a cycle of minimum total weight that visits each vertex exactly once.

Notice that for a graph with \(n\) vertices there are \(\frac{1}{2}(n-1)!\) different cycles for the specified starting point (division by 2 reflects that reversing a cycle gives the same cycle).

At present, there is no known algorithm with polynomial worst-case time complexity that solves the TSP.
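To see why the number of cycles matters, here is a minimal brute-force sketch in Python that simply tries every ordering of the cities. It is only practical for a handful of vertices. The function name tsp_brute_force, the nested-dictionary input format, and the example distances are assumptions made for this illustration, and the sketch assumes a complete graph with at least three vertices.

```python
from itertools import permutations


def tsp_brute_force(weights, start):
    """weights[u][v] is the weight of the edge joining u and v in a complete graph."""
    others = [v for v in weights if v != start]
    best_cycle, best_weight = None, float("inf")
    # Tries (n-1)! orderings, so each cycle is examined twice (once in each direction).
    for order in permutations(others):
        cycle = (start, *order, start)
        total = sum(weights[u][v] for u, v in zip(cycle, cycle[1:]))
        if total < best_weight:
            best_cycle, best_weight = cycle, total
    return best_cycle, best_weight


# Made-up distances between four cities.
distances = {
    "A": {"B": 10, "C": 15, "D": 20},
    "B": {"A": 10, "C": 35, "D": 25},
    "C": {"A": 15, "B": 35, "D": 30},
    "D": {"A": 20, "B": 25, "C": 30},
}
print(tsp_brute_force(distances, "A"))  # (('A', 'B', 'D', 'C', 'A'), 80)
```

With 4 cities only \(3! = 6\) orderings are checked, but with 20 cities the loop would run \(19!\) times, which is more than \(10^{17}.\)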

14.15. Additional topics will be added to this chapter soon!

  • Traveling Salesperson Problem (TSP)

  • Algorithms for Graphs

  • Shortest-path algorithms (Dijkstra’s and Floyd’s algorithms)

  • Transitive closure (Floyd’s algorithm)

  • Topological sort

MORE TO COME!

15. Trees

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on May 7, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

A tree is a connected simple graph that contains no cycles. Trees are used to model decisions, to sort data, and to optimize networks.

Key terms and concepts covered in this chapter:

  • Trees

    • Properties

    • trees (binary, spanning)

    • expression trees

  • Traversal strategies

  • Spanning trees/forests

  • Spanning trees

  • tree traversals

15.1. Definitions, Examples, and Properties of Trees and Forests

Recall that a simple graph has no loops and no parallel edges, that is, no edge connects a vertex to itself, and no two edges can connect the same pair of vertices. In a simple graph, each edge is determined by its two vertices.

A tree is an undirected simple graph that is connected and has no cycles, which you may recall are paths that start and end at the same vertex. Some sources use the term acyclic to mean "has no cycles."

A forest is the union of several trees. A forest is a simple graph that has one or more connected components, where each connected component is a tree.

Trees1v1

The image shows a forest composed of three trees.

Notice that if you choose any pair of vertices in one of the trees in the image, there is only one path that joins that pair of vertices. In fact, this property is True for any tree.

Theorem

Suppose that \(G\) is an undirected simple graph.

\(G\) is a tree if and only if for every pair of vertices of \(G\) there is exactly one path between the vertices.

Proof

First, assume that \(G\) is a tree; we will prove, using proof by contradiction, that for every pair of vertices of \(G\) there is exactly one path between the vertices. By assumption, \(G\) is connected and for every pair of vertices in \(G\) there is at least one path joining those two vertices, so we only need to show that there cannot be two different paths that connect the same pair of vertices. So suppose that for some pair of vertices \(u\) and \(v\) there are two different paths between \(u\) and \(v.\) Then we can go from \(u\) to \(v\) along the first path and then "turn around" and go back to \(u\) along the second path. This means that there must be a cycle in \(G\) that starts and ends at \(u.\)

Why must there be a cycle?
If we can go all the way from \(u\) to \(v\) along the first path and then go all the way back, in reverse order, from \(v\) to \(u\) along the second path without repeating any edges or vertices (except \(u\)) then we have found a cycle that starts and ends at \(u.\)
If the two paths have some common edges or vertices, we can use only part of each path to create a cycle that does not repeat any edges or vertices. To do this, write the first path as \(v_{0}v_{1}...v_{n-1}v_{n}\) where \(u = v_{0}\) and \(v = v_{n}\) and let \(j\) stand for the smallest positive index such that vertex \(v_{j}\) appears in both the first and second paths. Now create a cycle by using the beginning of the first path to go from \(u\) to \(v_{j}\) and then using the beginning part of the second path that goes from \(u\) to \(v_j\), but in reverse order, to go back from \(v_{j}\) to \(u.\) Notice that the index \(j\) is chosen so that no edge or vertex (except \(u\) and \(v_j\)) that we use can belong to both of these shorter paths from \(u\) to \(v_{j},\) so no edges are repeated when we use this route to go from \(u\) back to \(u\) - we have created a cycle.

To continue with the proof by contradiction, we’ve shown that there is a cycle in \(G,\) but this contradicts the assumption that \(G\) is a tree that has no cycles. This means that the assumption "there is a pair of vertices in \(G\) that are connected by two different paths" must be False. We have proven that for any pair of vertices in \(G\) there is exactly one path joining that pair of vertices.

Secondly, assume that for every pair of vertices of \(G\) there is exactly one path between the vertices. We will prove, using proof by contradiction, that \(G\) must be a tree. By assumption, \(G\) is connected because for any pair of vertices there is a path between those vertices, so we only need to prove that \(G\) has no cycles. So suppose that \(G\) does have a cycle \(v_{0}v_{1} \cdots v_{n-1}v_{n}\) where \(v_{0} = v_{n}\) (that is, \(v_{0}\) and \(v_{n}\) are the same vertex, but no other vertex is repeated in the cycle.) Notice that, because \(G\) is a simple graph, the integer \(n\) must be greater than or equal to 3, so we have two different paths, \(v_{0}v_{1}\) and \(v_{1} \cdots v_{n},\) that join \(v_0\) and \(v_1\) (Recall that \(v_{0}\) and \(v_{n}\) are the same vertex.) Reversing the order of the vertices in the second path gives a path from \(v_{0}\) to \(v_{1}\) that is different from the path \(v_{0}v_{1}\) - but we assumed that for every pair of vertices of \(G\) there is exactly one path between those vertices. We have gotten a contradiction, which means that \(G\) cannot have a cycle. Therefore, \(G\) is connected and has no cycles, which by definition means that \(G\) is a tree.

Q.E.D.

Here is another characterization of trees.

Theorem

For every positive integer \(n,\) if a tree has \(n\) vertices then the tree has \(n-1\) edges.

Proof

We use mathematical induction on the number of vertices \(n.\)

Let \(P(n)\) be the predicate \[P(n): \text{If a tree has } n \text{ vertices then the tree has } n-1 \text{ edges.}\] We will prove that the proposition \((\forall n \in \mathbb{N}_{>0})P(n)\) is True.

Basis Step: \(P(1)\) is True since a tree that has \(1\) vertex has \(0\) edges (otherwise, the edge would have to be a loop, but trees are simple graphs that don’t have loops.)
We can also prove that \(P(2)\) is True in case the proof of \(P(1)\) feels unsatisfying. If a tree has \(2\) vertices, then since a tree is a connected simple graph, the \(2\) vertices must be connected by a path, and since a tree cannot have parallel edges, the vertices are the endpoints of exactly \(1\) edge. This proves that \(P(2)\) is True.

Induction Step: First, we assume that the induction hypothesis \(P(k)\) is True for some positive natural number \(k\).

Secondly, we will prove that the conditional \(P(k) \rightarrow P(k+1)\) must be True, which means we can use modus ponens (or the equivalent tautology \(( P(k) \land ( P(k) \rightarrow P(k+1) ) ) \rightarrow P(k+1)\)) to show that \(P(k+1)\) is also True.

Assume that \(P(k)\) is True for the positive integer \(k,\) that is, if a tree has \(k\) vertices then the tree has \(k-1\) edges. We can assume \(k \geq 2\) since the cases when \(k \in \{ 1, 2 \}\) were proved in the Basis Step. Suppose that we have a tree \(T\) that has \(k+1\) vertices; we will prove that the tree must have \(k\) edges. First, there must be at least one vertex \(v\) in \(T\) such that the degree of \(v\) is \(1:\) No vertex has degree \(0\) because \(T\) is connected and has more than one vertex, and if every vertex had degree at least \(2,\) then we could find a cycle in \(T,\) which cannot be True since \(T\) is a tree. Remove one vertex \(v\) that has degree \(1\) along with the edge that has \(v\) as an endpoint to obtain the subgraph \(T-v.\) Notice that for every pair of vertices of \(T-v\) there is exactly one path between the vertices, so from the previous theorem, \(T-v\) is a tree. Also, \(T-v\) has \(k\) vertices because we removed only \(v,\) and we can apply the Induction Hypothesis to conclude that \(T-v\) has \(k-1\) edges. Now reinsert vertex \(v\) and the edge that was removed to obtain the tree \(T\) that has \(k+1\) vertices and \(k\) edges. Therefore, if \(P(k)\) is True then \(P(k+1)\) is True, too. That is, \(P(k) \rightarrow P(k+1)\).

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can use universal generalization to conclude that for all positive integers \(n,\) if a tree has \(n\) vertices then the tree has \(n-1\) edges.

Q.E.D.
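The theorem can be checked on small examples with a short Python sketch. The function below tests the definition of a tree directly (connected and no cycles) using a depth-first search, so the edge count can then be compared with \(n-1.\) The function name is_tree, the input format (a list of vertices and a list of vertex pairs), and the example graph are assumptions made for this illustration; the sketch assumes an undirected graph with no parallel edges.

```python
def is_tree(vertices, edges):
    """Return True if the undirected graph is connected and has no cycles."""
    if not vertices:
        return False
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)

    start = vertices[0]
    parent = {start: None}   # records how each vertex was first reached
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adjacent[v]:
            if w not in parent:
                parent[w] = v
                stack.append(w)
            elif parent[v] != w:
                # w was already reached by a different route, so there is a cycle.
                return False
    # Connected exactly when the search reached every vertex.
    return len(parent) == len(vertices)


tree_vertices = [1, 2, 3, 4, 5]
tree_edges = [(1, 2), (1, 3), (3, 4), (3, 5)]
print(is_tree(tree_vertices, tree_edges))             # True
print(len(tree_edges) == len(tree_vertices) - 1)      # True
print(is_tree(tree_vertices, tree_edges + [(4, 5)]))  # False
```

Adding the edge joining 4 and 5 creates the cycle 3, 4, 5, 3, and the graph then has 5 edges rather than 4.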

15.2. Additional Topics

Students in CSC 230 Spring 2025 can refer to the slide decks (from Spring 2024 and Fall 2024) that are posted in Canvas.

  • Spanning Trees and Spanning Forests

    • Kruskal’s Algorithm

  • Binary Trees

    • Tree Traversal Strategies

    • Expression Trees

  • Algorithms for Binary search trees

    • Algorithms for Depth- and breadth-first traversals

16. Appendix: On-Demand Math Resources

This chapter was last updated on August 25, 2024.

This appendix discusses material that you have likely seen before but may need some review.

16.1. Linear Functions And Their Equations

A linear function is one that has a constant rate of change.

\begin{array}{|c|c|c|c|c|c|} \hline x & 1 & 2 & 3 & 4 & 5 \\ \hline y & 1 & 3 & 5 & 7 & 9 \\ \hline \end{array}

The table above displays a function with independent variable \(x\) and dependent variable \(y\).

Notice that the value of \(y\) increases by \(2\) for each increase in \(x\) by \(1\). The rate of change of this function is \(2\); this corresponds to the slope \(m\) of the continuous line that passes through the points with \(xy\)-coordinates given in the table.

The vertical intercept \(b\) (in this case, the \(y\)-intercept) is the \(y\)-value that corresponds to \(x = 0\), that is, \((0,\,b)\) is on the same continuous line as the points represented in the table. In this example, \(0\) is not a value of \(x\), but we can still find the vertical intercept by subtracting \(1\) from the smallest \(x\)-value and subtracting \(2\) from the corresponding \(y\)-value, which tells us that the point \((0,\,-1)\) lies on the same continuous line as the points represented in the table.

The equation of the linear function determined by the points in the table is \(y = 2 \cdot x + (-1)\), which can be written more simply as \(y = 2x - 1\). This also is the equation of the continuous line that passes through the points with \(xy\)-coordinates given in the table, but the linear function can be restricted to a smaller domain as needed by the context where it is being used, for example, we may only need to use inputs \(x\) from the set of positive integers or possibly just the set \(\{ 1,\,2,\,3,\,4,\,5 \}\).

16.2. Arithmetic Sequences

An arithmetic sequence or arithmetic progression is a sequence of numbers \(a_{0}, \, a_{1}, \, a_{2}, \, \ldots\) such that there is a constant \(d\) so that \[ a_{i+1}-a_{i} = d \text{ for all } i \in \mathbb{N}\] The constant \(d\) is called the common difference of the sequence. The sequence can be infinite, defined for every \(i \in \mathbb{N},\) or finite, defined only for \(i \in \mathbb{N}\) less than some greatest index \(n\).

As an example, the sequence \(1, 4, 7, 10, 13, 16\) is a finite arithmetic sequence with common difference 3 and index set \(\{ 0, 1, 2, 3, 4, 5 \}.\) We can extend that sequence to an infinite arithmetic sequence \(1, 4, 7, 10, 13, 16, \ldots\) using a recursive definition \[a_{0} = 1, \text{ and } a_{i+1} = a_{i} + 3 \text{ for integer } i \in \mathbb{N} \]

Notice that there is also a nonrecursive definition for this sequence: Since the difference between two consecutive terms of the sequence is always \(d\) the points \((i, \, a_{i})\) must lie on a line in the xy-plane. The slope of this line is \(d\) and the y-intercept of the line is the initial value \(a_{0},\) so the arithmetic sequence can also be described as \[ a_{i}= d \cdot i + a_{0} \text{ for all } i \in \mathbb{N}\]

For the example \(1, 4, 7, 10, 13, 16, \ldots\), the nonrecursive definition is \[a_{i} = 3i + 1 \text{ for all } i \in \mathbb{N}\]
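As a quick check that the recursive and nonrecursive definitions agree, here is a minimal Python sketch. The function names arithmetic_recursive and arithmetic_closed and the parameter count (the number of terms to generate) are made up for this illustration.

```python
def arithmetic_recursive(a0, d, count):
    """Build the first `count` terms using a_{i+1} = a_i + d."""
    terms = [a0]
    for _ in range(count - 1):
        terms.append(terms[-1] + d)
    return terms


def arithmetic_closed(a0, d, count):
    """Build the first `count` terms using a_i = d * i + a_0."""
    return [d * i + a0 for i in range(count)]


print(arithmetic_recursive(1, 3, 6))  # [1, 4, 7, 10, 13, 16]
print(arithmetic_closed(1, 3, 6))     # [1, 4, 7, 10, 13, 16]
```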

16.3. Geometric Sequences

A geometric sequence or geometric progression is a sequence of numbers \(a_{0}, \, a_{1}, \, a_{2}, \, \ldots\) such that there is a constant \(r\) so that \[ a_{i+1} = r \cdot a_{i} \text{ for all } i \in \mathbb{N}\] The constant \(r\) is called the common ratio of the sequence. The sequence can be infinite, defined for every \(i \in \mathbb{N},\) or finite, defined only for \(i \in \mathbb{N}\) less than some greatest index \(n\).

As an example, the sequence \(5, 10, 20, 40, 80\) is a finite geometric sequence with common ratio 2 and index set \(\{ 0, 1, 2, 3, 4 \}.\) We can extend that sequence to an infinite geometric sequence \(5, 10, 20, 40, 80, \ldots\) using a recursive definition \[a_{0} = 5, \text{ and } a_{i+1} = a_{i} \cdot 2 \text{ for integer } i \in \mathbb{N} \]

There is also a nonrecursive definition for this sequence: Since the ratio between a term and its predecessor in the sequence is always \(r\) the points \((i, \, a_{i})\) must lie on the graph of an exponential function in the xy-plane. The y-intercept of the graph is the initial value \(a_{0},\) so the geometric sequence can be described as \[ a_{i} = a_{0} \cdot r^{i} \text{ for all } i \in \mathbb{N}\]

For the example \(5, 10, 20, 40, 80, \ldots\), the nonrecursive definition is \[a_{i} = 5 \cdot 2^{i} \text{ for all } i \in \mathbb{N}\]
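The same comparison can be made for geometric sequences. This is a minimal Python sketch; the function names geometric_recursive and geometric_closed and the parameter count (the number of terms to generate) are made up for this illustration.

```python
def geometric_recursive(a0, r, count):
    """Build the first `count` terms using a_{i+1} = r * a_i."""
    terms = [a0]
    for _ in range(count - 1):
        terms.append(terms[-1] * r)
    return terms


def geometric_closed(a0, r, count):
    """Build the first `count` terms using a_i = a_0 * r**i."""
    return [a0 * r**i for i in range(count)]


print(geometric_recursive(5, 2, 5))  # [5, 10, 20, 40, 80]
print(geometric_closed(5, 2, 5))     # [5, 10, 20, 40, 80]
```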

MORE TO COME!

17. Appendix: Library of Functions

This chapter was last updated on August 25, 2024.

Recall that for any function,

  • the domain is the given set of input values for the function,

  • the codomain is a given set that contains all possible output values (but may contain other values that are not outputs, too), and

  • the range is the set that contains only the output values of the function.

Functions can in many cases be visualized graphically. For example, a function that maps the real line \(\mathbb{R}\) to the real line can be plotted on a Cartesian plane.

17.1. Polynomial Functions

A polynomial is an algebraic expression of the form \(a_{n}x^{n} + a_{n-1}x^{n-1} + \ldots + a_{1}x^{1} + a_{0}\), that is, \(\sum\limits_{i=0}^{n}a_{i}x^{i}\), where n is a natural number, x is a variable, and \(a_{n}, a_{n-1}, \ldots, a_{1}, a_{0}\) are real numbers. Examples of such expressions are

  • 7, a constant,

  • \(2x + 7\), a linear polynomial,

  • \(3x^{2} + 2x + 7\), a quadratic polynomial.

A polynomial can be evaluated by substituting a number for each occurrence of x. For example, if we substitute \(-1\) for x in each of the three polynomials above, we get

  • 7 evaluated at \(x = -1\) is 7,

  • \(2x + 7\) evaluated at \(x = -1\) is \(2 \cdot (-1) + 7 = 5\),

  • \(3x^{2} + 2x + 7\) evaluated at \(x = -1\) is \(3 \cdot (-1)^{2} + 2 \cdot (-1) + 7 = 8.\)

In this way, every polynomial can be used to define a corresponding polynomial function with domain \(\mathbb{R}\) and codomain \(\mathbb{R}.\)
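The evaluations above can be reproduced with a short Python sketch that computes the sum \(\sum\limits_{i=0}^{n}a_{i}x^{i}\) directly. The function name evaluate_polynomial and the choice to store the coefficients in a list, with \(a_{0}\) first, are assumptions made for this illustration.

```python
def evaluate_polynomial(coefficients, x):
    """coefficients[i] holds a_i, so [7, 2, 3] stands for 3x^2 + 2x + 7."""
    return sum(a * x**i for i, a in enumerate(coefficients))


print(evaluate_polynomial([7], -1))        # 7
print(evaluate_polynomial([7, 2], -1))     # 5
print(evaluate_polynomial([7, 2, 3], -1))  # 8
```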

17.1.1. Quadratic Function

The function \(f(x) =x^2\) denotes the association \((a,b) =(x, x^2)\) with \(f : \mathbb{R} \rightarrow \mathbb{R}\). We notice that the range is the set of real numbers \([0, \infty)= \mathbb{R}^{+}\). The function is not invertible, since it is not injective. For example, we have both \(f(-3) =9\) and \(f(3)=9\). With \(f : \mathbb{Z} \rightarrow \mathbb{Z}\), notice that the range is now the set of perfect squares \(\{0, 1, 4, 9, \ldots\}\), which is a subset of \(\mathbb{N}.\)

\begin{array}{lccccccccccc} & x & -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \\ & x^2 & 25 & 16 & 9 & 4 & 1 & 0 & 1 & 4 & 9 & 16 & 25 \end{array}

The graph of \(x^2\)
geometricsequence

17.1.2. The Cubic function

The function \(f(x) =x^3\) denotes the association \((a,b) =(x, x^3)\) with \(f : \mathbb{R} \rightarrow \mathbb{R}\). Also, we notice that the range is the set of all real numbers \((- \infty , \infty)=\mathbb{R}\). The function is bijective and so invertible. With \(f : \mathbb{Z} \rightarrow \mathbb{Z}\), notice that the range is the set of perfect cubes \(\{\ldots, -8, -1, 0, 1, 8, \ldots\}\), which is a proper subset of \(\mathbb{Z};\) the function is still injective but is no longer surjective.

\begin{array}{llcccccccccl} & x & -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \\ & x^3 & -125 & -64 & -27 & -8 & -1 & 0 & 1 & 8 & 27 & 64 & 125 \end{array}

The graph of \(x^3\)
geometricsequence

17.1.3. The Square Root and Cube Root Functions

For the purposes of completeness and for comparing how fast functions \(f(x)\) grow for large x, we present the inverses of the functions \(f(x)= x^2\) and \(f(x)= x^3\) when \(f:\mathbb{R}^{+} \rightarrow \mathbb{R}^{+}\). These are, respectively, the functions \(f(x)=\sqrt{x}\) and \(f(x)=\sqrt[3]{x}\).

\begin{array}{lcccccccccclll} & x & 0 & 1 & 4 & 9 & 16 & 25 & 36 & 49 & 64 & 81 & 100 & 121 & 144 \\ & \sqrt{x} & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \end{array}

The graph of \(\sqrt{x}\)
geometricsequence

\begin{array}{lcccccl} & x & 0 & 1 & 8 & 27 & 64 & 125 \\ & \sqrt[3]{x} & 0 & 1 & 2 & 3 & 4 & 5 \end{array}

The graph of \(\sqrt[3]{x}\)
geometricsequence

17.2. Exponential and Logarithmic Functions

We begin by summarizing important properties of exponentials.

Properties of Exponentials
  1. For \(a>0, a \neq 1\), \(a^m \cdot a^n=a^{m+n}\). For example, \(3^4\cdot 3^5=3^{4+5}=3^9\).

  2. \(\frac{a^m}{a^n}=a^{m-n}\). For example, \(\frac{3^5}{3^2}=3^{5-2}=3^3 \).

  3. \(\left(a^m\right)^n=a^{m \cdot n}\). For example, \(\left(3^4\right)^3=3^{4\cdot 3}=3^{12}\).

  4. \(\left(a \cdot b\right)^m=a^m b^m\). For example, \(\left(3x\right)^4=3^4 \cdot x^4\).

  5. \(a^0=1\).

  6. \(a^{-1}=\frac{1}{a}\). For example, \(3^{-1}=\frac{1}{3}\).

  7. \(a^\frac{1}{n}=\sqrt[n]{a}\).

17.2.1. Exponential Functions

Exponential functions are of the form \(f\left(x\right)=b^x\), where \(b\) is the base and the variable \(x\) is in the exponent. The base must satisfy \(b>0\) and \(b \neq 1\). Properties of exponential functions come from properties of exponents. When the base \(b\) is greater than 1 the exponential function is increasing exponentially, as in the case \(f(x) = 2^x\).

\begin{array}{llcccccccccl} & x & -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \\ & 2^x & \frac{1}{32} & \frac{1}{16} & \frac{1}{8} & \frac{1}{4} & \frac{1}{2} & 1 & 2 & 4 & 8 & 16 & 32 \end{array}

The graph of \(2^x\)
geometricsequence

When the base \(b\) is between 0 and 1 the exponential function is decreasing exponentially, as in the case \(f(x) = \left(\frac{1}{3}\right) ^x\).

\begin{array}{llcccccccccl} & x & -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \\ & (\frac{1}{3})^x & 243 & 81 & 27 & 9 & 3 & 1 & \frac{1}{3} & \frac{1}{9} & \frac{1}{27} & \frac{1}{81} & \frac{1}{243} \end{array}

The graph of \(\left(\frac{1}{3}\right)^x\)
geometricsequence

17.2.2. Logarithmic Functions

Logarithmic functions are the inverse functions corresponding to exponential functions and are used to solve exponential equations. For example, \(y=2^x\) is solved for \(x\) by inverting to get \(x=\log_2{y}\). Properties of logarithms follow from this relationship between exponentials and logarithms and properties of the exponentials.

We summarize three important properties of logarithms.

Properties of Logarithms
  1. The exponential function \(f\left(x\right)=y=b^x\), written in logarithmic form, is \(\log_b{f\left(x\right)}=\log_b{y}=x\). Its inverse is the logarithmic function \(x=b^y\), which is denoted \(y=\log_b{x}\).

  2. The power rule for logarithms states that \(\log_b m^x=x\cdot \log_b m\).

  3. Comparing the two solutions of \(2^x=5\), namely \(x=\log_2{5}\) and \(x=\frac{\log_{10}{5}}{\log_{10}{2}}\), gives \(\log_2{5}=\frac{\log_{10}{5}}{\log_{10}{2}}\), which, essentially, is the change of base formula \(\log_b{A}=\frac{\log_a{A}}{\log_a{b}}\).

All other properties of logarithmic functions come from properties relating the logarithm as the inverse of the exponential and the equivalence of the logarithm \(a =\log_b m\) with \(b^a=m\).
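These properties can be checked numerically with Python's standard math module. This is a minimal sketch; the variable names are made up for this illustration, and math.isclose is used because floating-point results are only approximations of the exact values.

```python
import math

# Change of base: log_2(5) equals log_10(5) / log_10(2).
direct = math.log2(5)
via_base_10 = math.log10(5) / math.log10(2)
print(math.isclose(direct, via_base_10))  # True

# Power rule: log_2(8^3) equals 3 * log_2(8).
power_left = math.log(8**3, 2)
power_right = 3 * math.log(8, 2)
print(math.isclose(power_left, power_right))  # True

# Inverse relationship: 2 raised to log_2(5) gives back 5.
round_trip = 2 ** math.log2(5)
print(math.isclose(round_trip, 5))  # True
```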

When the base \(b\) is greater than 1, the logarithm function is increasing, as in the case \(f(x) = \log_2 x\).

\begin{array}{llllllcccccc} & x & \frac{1}{32} & \frac{1}{16} & \frac{1}{8} & \frac{1}{4} & \frac{1}{2} & 1 & 2 & 4 & 8 & 16 & 32 \\ & \log_2 x & -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \end{array}

The graph of \(\log_2 x\)
geometricsequence

When the base \(b\) is between 0 and 1, the logarithm function is decreasing, as in the case \(f(x) = \log_{\frac{1}{3}} \ x\).

\begin{array}{llllllcccccl} & x & \frac{1}{243} & \frac{1}{81} & \frac{1}{27} & \frac{1}{9} & \frac{1}{3} & 1 & 3 & 9 & 27 & 81 & 243 \\ & \log_{\frac{1}{3}} x & 5 & 4 & 3 & 2 & 1 & 0 & -1 & -2 & -3 & -4 & -5 \end{array}

The graph of \(\log_{\frac{1}{3}} \ x\)
geometricsequence

17.3. The Floor and Ceiling Functions

The floor and ceiling functions round a real number input to an integer.

  • The floor of x, written as \(\lfloor x \rfloor,\) is the greatest integer that is less than or equal to x. In older textbooks you may see this function named as the greatest integer function and denoted by \([ x ] .\) For example, \(\lfloor -1.5 \rfloor = -2\).

  • The ceiling of x, written as \(\lceil x \rceil,\) is the least integer that is greater than or equal to x. For example, \(\lceil -1.5 \rceil = -1\).

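On a number line, \(\lfloor x \rfloor \leq x \leq \lceil x \rceil\). If x is not an integer, the floor of x and the ceiling of x are the two consecutive integers on either side of x; if x is an integer, the floor of x and the ceiling of x are both equal to x.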

The floor and ceiling functions are step functions: In the plane, their plots look like they are made of horizontal steps.

id_floor_and_ceil

Note that the plot of the floor function, shown in green, is always at the same height or below the graph of the line \(y = x\), and that the plot of the ceiling function, shown in red, is always at the same height or above the graph of the line \(y = x.\)
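In Python, the floor and ceiling functions are available in the standard math module. The following minimal sketch repeats the examples above; the variable names are made up for this illustration. Note that converting with int() is not the same as taking the floor, because int() drops the fractional part and so rounds negative numbers toward zero.

```python
import math

x = -1.5
floor_x = math.floor(x)   # greatest integer <= x
ceil_x = math.ceil(x)     # least integer >= x
print(floor_x, ceil_x)    # -2 -1

# For an integer input, the floor and the ceiling are equal.
print(math.floor(3), math.ceil(3))  # 3 3

# int() truncates toward zero, so it differs from floor for negative inputs.
truncated = int(x)
print(truncated)          # -1
```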

17.4. Other Functions

MORE TO COME!!

18. Appendix: An Introduction to Python

18.1. Programming Basics

Computers are programmable machines that process information by manipulating data. As the data can represent any real world information, and programs can be readily changed, computers are capable of solving many kinds of problems.

18.1.1. Programming Languages and Environments

There are many different programming languages for programmers to choose from. Each language has its own advantages and disadvantages, and new languages gain popularity while older ones slowly lose ground. In this book, we use the Python 3 programming language. It is popular in both academia and industry, and was designed with education in mind.

18.1.2. PythonTutor

PythonTutor is an environment for creating very short and simple Python programs and visualizing their execution. This enables beginners to see the data as it gets manipulated by the instructions.

Example 1 - A Simple Program

Use the Next button to step through the program below and watch the data get created and modified. Notice how the arrows move to indicate what instruction the program execution is on.

18.1.3. Comments

Program files can contain source code and comments. Comments are not instructions for the computer to follow, but instead notes for programmers to read. Comments in Python start with a pound sign (#). Anything following the pound sign that is on the same line as the pound sign will not be executed. Often, at the very beginning of a program, comments are used to indicate the author and other information. Comments are also used to explain tricky sections of code or disable code from executing.

# This line is not Python code, it is a comment.

score = 9001 # over 9000!!!

# The next line of code is disabled because it starts with a #.
# score = 8000

18.2. Data Types

Programming is all about information processing. Information is categorized by data types. Four basic data types we will be considering are int, float, bool, and str. Int consists of integers, which are whole numbers written without a decimal point. This includes positive and negative whole numbers as well as zero. Float consists of floating-point numbers, which are numbers that are written with a decimal point. Bool consists of Boolean values (named after the mathematician George Boole). The only Boolean values are True and False. Str consists of strings, which are sequences of text characters including punctuation, symbols, and whitespace. Every value in Python has a corresponding data type. The table below shows examples of values of each of these four data types.

Table 2. Basic Data Types

Data Type    Example Values
int          2, -2, 0, 834529
float        3.14, -2.3333, 7.0
bool         True, False
str          "Hello World!", 'Coconut', "0", '4 + 6'

Strings and Quotation Marks

Strings are always surrounded by quotation marks. Python allows either single (') or double (") quotation marks for single line strings. Python also allows triple quotation marks (either ''' or """) for a string that spans multiple lines.

18.3. Variables

Variables are (virtual) boxes that store values for reuse later. A variable has a name and a current value. Each variable can only hold one value at a time. Variables are assigned a value using the single equal sign (=). As Python executes one line at a time, variables come into existence on the line where they are first assigned. Each variable only stores the most recent value assigned to it.

Example 2 - Basic Variables and Data Types

Use the Next button to step through the program below and watch the variables get created.

Variable Names

Variables can have complex names like player1_score. In general, never start a variable name with a number and never use spaces in variable names.
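Apart from the interactive example, here is a minimal standalone sketch of variables and data types. The variable names and values are made up for this illustration. The built-in type() function reports the data type of a value.

```python
player1_score = 9001        # an int
average = 3.14              # a float
is_winner = True            # a bool
greeting = "Hello World!"   # a str

print(type(player1_score).__name__)  # int
print(type(average).__name__)        # float
print(type(is_winner).__name__)      # bool
print(type(greeting).__name__)       # str

# A variable only stores the most recent value assigned to it.
player1_score = 8000
print(player1_score)                 # 8000
```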

18.4. Operators and Expressions

Example 3 - Numerical Operators and Expressions

Try to predict the variable names, values, and data types in the code below.

Expression Evaluation

When Python encounters a line with an expression, it always evaluates the expression first. Consider the following line of code:

x = (3 + 4) * 2

Python first calculates the value of the expression to the right of the equal sign by using the standard order of operations starting inside the parentheses. The value given by the above expression is calculated to be equal to 14. Then, Python creates the variable x and assigns the value 14 to this variable. The variable only stores the calculated value, not the entire expression that generated that value.

Example 4 - Boolean Operators and Expressions

Note how each expression returns a Boolean value. These are called Boolean expressions.
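Apart from the interactive examples, here is a minimal standalone sketch of some numerical and Boolean operators. The variable names are made up for this illustration, and each comment gives the value that is stored.

```python
# Numerical operators
total = 6 + 8        # 14
quotient = 7 / 2     # 3.5 (division with / always produces a float)
whole = 7 // 2       # 3 (integer division discards the remainder)
remainder = 7 % 2    # 1 (the remainder after integer division)
power = 2 ** 3       # 8
x = (3 + 4) * 2      # 14 (parentheses are evaluated first)

# Boolean expressions evaluate to True or False
is_small = x < 10                # False
in_range = 0 < x and x < 20      # True
either = is_small or in_range    # True
opposite = not is_small          # True

print(total, quotient, whole, remainder, power, x)
print(is_small, in_range, either, opposite)
```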

18.5. Strings and Printing

Besides creating and storing values in variables, we can also output text on a screen by calling the print() function.

Example 5 - Strings and print()

Try to predict the printed output. Look at the small window in the top-right as you use the Next button.

18.6. If Statements

A block of code is a collection of lines of code that are either all executed (in sequential order) or all skipped. Blocks always start with a colon (:) on the previous line and require every line in the block to be indented the same amount using tabs or spaces. One way in which Python can execute or skip over a block involves using an if command and a Boolean expression. If the expression is true, then the block executes. Otherwise, the block is skipped.

Example 6 - If Statements
  • Notice that all un-indented lines and the second block execute, while the first block does not execute.

  • Which blocks execute if age = 22? What about if age = 15?

When you want to force exactly 1 of 2 blocks to execute (as opposed to just skipping a block), you can use the else command in addition to the if command. If the expression following the if command is true, then the first block executes. Otherwise, the second block executes.

Example 7 - If-Else Statements
  • No matter how you change the scores, only 1 print() function executes.

  • Try making the scores the same.

In order to force exactly 1 of more than 2 blocks to execute, you can use the elif command in addition to the if and else commands. Each elif command must be followed by a Boolean expression. When using if and elif commands, each expression is checked in sequential order, and the block following the first true expression executes. If none of the expressions are true, the block following the else command is executed.

Example 8 - If-Elif-Else Statements
  • Even though several of the Boolean expressions are true when temp = 83, only the block after the first such expression executes.

  • Try several different values of temp and see what is printed.
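Apart from the interactive examples, the three forms described in this section can be sketched as follows. The variable names, values, and thresholds are made up for this illustration.

```python
# if: the indented block is skipped when the expression is False.
age = 15
message = "not yet an adult"
if age >= 18:
    message = "adult"
print(message)   # not yet an adult

# if-else: exactly one of the two blocks executes.
home_score = 3
away_score = 5
if home_score > away_score:
    result = "home team is ahead"
else:
    result = "home team is not ahead"
print(result)    # home team is not ahead

# if-elif-else: only the block after the first True expression executes.
temp = 83
if temp > 90:
    label = "hot"
elif temp > 70:
    label = "warm"
elif temp > 50:
    label = "mild"
else:
    label = "cold"
print(label)     # warm
```

In the last part, both temp > 70 and temp > 50 are True when temp is 83, but only the block for the first of them runs.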

18.7. While Statements

Python can execute a block repeatedly using a while statement and a Boolean expression. The block repeats until the Boolean expression is false.

Example 9 - While Statements

What numbers do you think will print? Notice that, without line 3, the loop would run forever.

The += operator increases the value of the variable written to the left of the operator by the value written to the right of the operator.
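Apart from the interactive example, here is a minimal standalone sketch of a while statement. The variable names are made up for this illustration. The loop prints the numbers 1 through 5 and adds them up.

```python
count = 1
total = 0
while count <= 5:
    print(count)
    total += count   # same as total = total + count
    count += 1       # without this line, count <= 5 would stay True forever

print(total)  # 15
print(count)  # 6
```

The loop stops the first time the expression count <= 5 is False, which happens when count reaches 6.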

18.8. Lists and Loops

When you need to consider many values at once, use a list.

Example 10 - List Indexing

Try index -2.

When you want to consider every value in a list, use a for loop.

Example 11 - For Loops With Indices
  • What does the variable i represent?

  • What line creates the variable i?

  • What line modifies it?

The range() function returns a sequence of numbers. The sequence starts at the value given by the first argument, increments by 1, and ends at one less than the value given by the second argument. For example, range(2, 5) returns 2, 3, 4. If only one argument is given, that argument is considered the second argument and the first argument is set to 0 by default. For example, range(4) returns 0, 1, 2, 3.

Example 12 - For Loops Without Indices
  • What line creates the variable x?

  • What line modifies it?

Example 13 - Summing with Loops
  • What line creates the variable g?

  • What line modifies it?
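Looping over the items directly and keeping a running sum might look like the sketch below. The names x and g follow the questions above, while the list grades and its values are assumptions.

```python
grades = [90, 80, 70]   # hypothetical list

g = 0                   # running total starts at 0
for x in grades:        # x holds each item in turn; no index is needed
    g += x              # add the current item to the total
print(g)
```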

18.9. List Appending and Slicing

We can append to lists with the concatenation operator (+). We can also slice a list using bracket notation and two indices separated by a colon (:). The first index specifies where the slice starts. The second index is one more than the index of the last item included, so the item at the second index is not part of the slice.

Example 14 - List Appending and Slicing

Try to predict the variable names, values, and data types in the code below. Use the Next button to check your answers.
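As a rough illustration of concatenation and slicing (the lists here are made up for the sketch and are not the ones in the interactive example):

```python
a = [10, 30, 20, 90]
b = a + [5]     # concatenation builds a new list; a itself is unchanged
c = a[1:3]      # items at indices 1 and 2 only
d = a[:2]       # an omitted first index defaults to 0

print(a, b, c, d)
```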

18.10. Lists versus Arrays

Python has both lists and arrays. Lists are convenient because the items in a list can be of different data types, whereas all items in an array must have the same data type. Arrays are more efficient because the items themselves are stored in the array, whereas a list stores only a reference (or pointer) to each item. For the purposes of the MKD Remix, lists are preferred, but be aware that in some cases an array may be a better choice than a list. You can read more about arrays here.
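The difference can be sketched with the array module from Python's standard library. Assuming that module is the kind of array meant here is a guess on our part, and the try/except statement (not covered in this chapter) is used only so the sketch keeps running after the error.

```python
from array import array

mixed = [1, "two", 3.0]          # a list may mix data types
ints = array('i', [1, 2, 3])     # type code 'i': every item must be an int

try:
    ints.append("four")          # a string is not an int, so this raises TypeError
    accepted = True
except TypeError:
    accepted = False

print(mixed, list(ints), accepted)
```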

18.11. Defining Custom Functions

In the examples above we have called several functions like print() and len(). You can define your own functions using def. A function definition includes zero or more parameter variables. The values given to those parameter variables when the function is called are referred to as the arguments of the function.

Example 15 - Defining Functions
  • What line creates the variables a and b?

  • When does that line execute?

  • How many times?

  • Where do the variables a and b get their values from?
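A minimal sketch of defining and calling a function. The function name average and the argument values are assumptions chosen for illustration.

```python
def average(a, b):      # a and b are the parameter variables
    return (a + b) / 2  # this line runs only when the function is called

x = average(4, 10)      # arguments 4 and 10: a = 4, b = 10
y = average(1, 2)       # a second call creates a and b again with new values
print(x, y)
```

Each call gives a and b fresh values taken from the arguments written in the parentheses of that call.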

18.12. Exercises

  1. Given the following Python code, what is the value and data type of each variable?

    a = 6 + 8
    large = a // 4
    b = 22 // 3
    c = 22 % 3
    d = False or True
    e = True and False
    sheep = (True or (b > 10))
  2. Given the following Python code, determine the printed output.

    print("Hello World!")
    a = "The answer is"
    b = 6 * 7
    print(a, b)
    print(False, "Hobbit", 1, "Ring")
  3. For the following code, determine the value of the variable letter when the score is 92, 84 and 59.

    score = #an integer between 0 and 100
    if score >= 90:
    	letter = 'A'
    elif score >= 80:
    	letter = 'B'
    elif score >= 70:
    	letter = 'C'
    elif score >= 60:
    	letter = 'D'
    else:
    	letter = 'F'
  4. For the following code, determine the value of the variable ans for each case given below.

    if outside == False:
    	if (n >= 2 and n <= 20):
    		ans = True
    	else:
    		ans = False
    else:
    	if (n <= 2 or n >= 20):
    		ans = True
    	else:
    		ans = False
    1. n = 3, outside = False

    2. n = 15, outside = False

    3. n = 15, outside = True

    4. n = 12, outside = True

  5. What will this code print out?

    while count > 0:
    	print("Welcome")
    	count -= 1
  6. Write Python code to satisfy the following conditions. Then test your code on the values of the variables given.

    1. Given an int n, return the absolute difference between n and 10, except return triple the absolute difference if n is over 10. It should return 1 when n=9. It should return 33 when n=21. What will the code return when n=7 or n=35?

    2. We have a loud talking robot. The "hour" parameter is the current hour time in the range 0 to 23. We are in trouble if the robot is talking and the hour is before 6 or after 21. Return True if we are in trouble. It should return True when the robot is talking and the hour is 5. It should return False when the robot is not talking and the hour is 8. What does it return if the robot is talking and the hour is 9?

  7. What will the following code print out?

    numbers = [1, 3, 5, 7, 10]
    sq = 0
    for val in numbers:
    	sq = val * val
    	print(sq)
  8. What will the following code print out?

    for i in range(1, 20, 2):
    	print(i)
  9. Use the following definition of the function front3() to find the output of the program for the list [1, 3, 5, 7].

    def front3(nums):
    	i = 0
    	while (i < len(nums) and i < 5):
    		if nums[i] == 3:
    			return True
    		i += 1
    	return False
  10. Write a function that takes, as input, two lists of integers, a and b, both of length 3, and returns, as output, a new list of length 2 containing the last elements of a and b. For example, if a = [1, 2, 3] and b = [10, 20, 30], then the function should return the list [3, 30].

19. Appendix: Python Syntax Examples

# the '#' character makes a COMMENT separating Python from English
x = 3                        # create the VARIABLE with NAME x and STORE INT VALUE 3
Sebastien_Score = 9001       # variable names can be long, but no spaces!
y = 1.0 * 3                  # y stores EXPRESSION's RETURN FLOAT value 3.0
z = "Hi There!"              # z stores a STRING value
w = False                    # w stores a BOOLEAN value
v = [3, 30, "Hello World"]   # v stores a LIST of values
print(z)                     # print function displays output ("Hi There!")

# Maths
a = 3
b = 3.0      # b stores 3.0 (float values are decimal approximations)
c = 7 // 2   # c stores 3 (int division always gives ints)
d = 7 % 2    # d stores 1 (Mod or Remainder of the division)
a = 5        # change the value of a to 5
a += 1       # INCREMENT the value of a by 1 (to 6)

# Boolean Operators
a = (3 > 2)           # a stores True because 3 is greater than 2
a = (2 >= 2)          # a stores True because 2 is greater than or equal to 2
a = (3 < 2)           # a stores False because 3 is not less than 2
a = (2 <= 2)          # a stores True because 2 is less than or equal to 2
a = (3 != 2)          # a stores True because 3 is not equal to 2
a = (3 == 3)          # a stores True because 3 is equal to 3
a = (True and False)  # a stores False, AND returns True only when both sides are True
a = (True or False)   # a stores True, OR returns True if at least 1 side is True
a = (not False)       # a stores True, NOT returns opposite

# BLOCKS are sections of any code chunked together with INDENTATION
# BLOCKS start with a ':' and continue with each INDENTED line
x = 7
if x > 8:                 # if CONDITION is True, then execute block, otherwise skip block.
    print("Hello")        # since x stores 7, this will skip
    print("I Am Sam.")    # since x stores 7, this will skip
elif x > 2:               # elif condition is True AND previous if was False, execute block
    print("Hi")           # since x stores 7, this will execute
    print("I am Sally.")  # since x stores 7, this will execute
else:                     # if all previous conditions are False, execute block.
    print("Yo")           # since x stores 7, this will skip
    print("I'm Bob.")     # since x stores 7, this will skip

while x > 3:              # repeat a block until condition becomes False
    print("Apples")
    x += -1

# Lists store multiple values
a = [10, 30, 20, 90]      # create a new list
x = len(a)                # x stores 4 (the length)
b = a[0]                  # INDEX into the list, 0 is first value, b stores 10
c = a[3]                  # c stores 90
d = a[-1]                 # -1 is last value, d stores 90
e = a[1:3]                # slice a from index 1 up to index 3, e stores [30, 20]
a[1] = 50                 # modify the second element in the list, a is now [10, 50, 20, 90]
f = a + [5, 15]           # f stores [10, 50, 20, 90, 5, 15], CONCATENATION not addition
g = list(range(0, 4))     # range gives 0 up to (but not including) 4, g stores [0, 1, 2, 3]

# For Loops
for c in "Elephant!":     # repeat block with c storing each character 1 at a time
    print(c)              # prints one character per line
for x in [10, 30, 20]:
    print(x)              # prints one number per line

# Custom Functions
def myfunc(a, b):         # DEFINES a new function that takes 2 INPUT PARAMETER values
    c = 2 * a + b         # executes only when function is called
    return c              # RETURNS a value back to the calling code

x = myfunc(10, 5)         # Calls the myfunc() function, x stores return value 25
y = myfunc(1, 3)          # Calls the myfunc() function, y stores return value 5